DETAILED ACTION
Notice of Pre-AIA or AIA Status
1. The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
2. The Action is responsive to the Applicant’s Remarks filed 01/28/2022.
3. Claims 1-24 are pending, of which claims 1-6, 8-14, 16-22 and 24 are rejected and claims 7, 15 and 23 are objected to. Claims 1, 9 and 17 are independent.
Response to Arguments
4. Applicant's arguments in the Remarks filed 01/28/2022 have been fully considered. For the Examiner’s responses, please refer to the discussion below.
4.1. As per claim 1, the Applicant argued that “Mysore describes ‘to perform source separation at operation 1145, the mask may be applied to the mixture to isolate contributions from its corresponding source.’ Id., at para. [0104]. However, isolating contributions from its corresponding source does not teach or suggest one or more processors to execute instructions to identify query audio as a cover rendition of reference audio based on a comparison between a query data structure and a reference data structure associated with the reference audio.” In response, the Examiner respectfully submits the following.
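Concerning the limitation “identifying, using the one or more processors, the query audio as a cover rendition of the reference audio based on a comparison between the query data structure and a reference data structure associated with the reference audio”: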
As cited, Guralnick teaches searching a song database for a desired song title. Searching for the title of the query audio reads on identifying the query audio as a cover, i.e., by its title. The search renders the reference, i.e., the desired song title. The Examiner also interprets the song title as a data structure.
At col. 47, lines 56-60, Guralnick teaches “The retrieval may be performed by parsing the characters of the song title 112 selected by the user and comparing this sequence of characters with those stored in memory in the identifier portion of each master song edit metadata of the song edit data 101 for each song title 112”; that is, retrieving the song titles for the search (identification) involves comparing the song titles, which explicitly teaches a rendition of the reference audio based on a comparison between the query data structure and a reference data structure associated with the reference audio.
At col. 26, lines 14-18, Guralnick teaches “The musical genre input may, for instance, lead to sorting of the song titles listed in, for example, a scroll menu of song titles of the song database, leaving only the song titles sharing the designated musical genre available for further selection by the user”, in which sorting the song titles implicitly teaches a comparison of the song titles. Guralnick thus also appears to implicitly teach, through the comparison of the song titles, a rendition of the reference audio based on a comparison between the query data structure and a reference data structure associated with the reference audio.
Therefore, Guralnick reasonably appears to teach the limitation “identifying, using the one or more processors, the query audio as a cover rendition of the reference audio based on a comparison between the query data structure and a reference data structure associated with the reference audio”.
4.2. As per the “Allowable Subject Matter” previously set forth, the Applicant expressly reserved the right to do so in the future, which the Examiner respects. To expedite the prosecution of the instant application, the Applicant is respectfully encouraged to exercise that right promptly.
4.3. As per the Double Patenting Rejections, the Applicant requested that the rejection “be held in abeyance until agreement on the scope of allowable claims enables evaluation”. The Examiner respectfully submits that, under the Office’s policy and current practice, the rejections appear to be required to be maintained during prosecution of the application.
Because the conflicting limitations of the instant application and US Patent 10885021 are extensively intertwined, a table listing all of the claims is attached, together with a summary of the main conflicting limitations, in the Double Patenting Rejection below.
Double Patenting Rejection
5. The nonstatutory double patenting rejection is based on a judicially created 
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP §§ 706.02(l)(1) - 706.02(l)(3) for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-1.jsp.
Claims 1-24 are provisionally rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-20 of U.S. Patent 10803119 (issued to the parent application 15698557). Although the claims at issue are not identical, they are not patentably distinct from each other because the current application and the Patent are both directed to identifying query audio from a content source based on a search query using rights metadata associated with the query audio, by:
performing a constant Q transform on multiple reference time slices of the reference audio;
binarizing the constant Q transformed reference time slices;
executing (performing) a two-dimensional Fourier transform on query time windows within the binarized and constant Q transformed query time slices to generate two-dimensional Fourier transforms of the query time windows. These are therefore common features, and the claims are not patentably distinct from one another.
This is a provisional nonstatutory double patenting rejection because the patentably indistinct claims have not in fact been patented.
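For illustration only, the common features listed above (constant Q transform, binarization, and a two-dimensional Fourier transform over time windows) can be sketched in a few lines of NumPy. This is a hypothetical sketch, not the implementation of the instant application or of the Patent. It assumes the constant Q magnitudes are already computed (for example with `librosa.cqt`), uses the median-based binarization recited in the Patent's claims, and treats the range size `half_range`, the window length `win` and the hop `hop` as illustrative assumptions.

```python
import numpy as np


def binarize_by_median(cqt_mag: np.ndarray, half_range: int = 4) -> np.ndarray:
    """Binarize a (bins x time slices) constant Q magnitude matrix.

    For each time slice t, a per-bin median is taken over the range of slices
    [t - half_range, t + half_range] (clipped at the edges), a range that
    encompasses slice t; a coefficient becomes 1 if it exceeds that median.
    """
    n_slices = cqt_mag.shape[1]
    binary = np.zeros(cqt_mag.shape, dtype=np.uint8)
    for t in range(n_slices):
        lo, hi = max(0, t - half_range), min(n_slices, t + half_range + 1)
        median = np.median(cqt_mag[:, lo:hi], axis=1)
        binary[:, t] = (cqt_mag[:, t] > median).astype(np.uint8)
    return binary


def fingerprint(cqt_mag: np.ndarray, win: int = 20, hop: int = 5,
                half_range: int = 4) -> np.ndarray:
    """2D FFT magnitudes of successive time windows, kept in time order."""
    binary = binarize_by_median(cqt_mag, half_range)
    n_slices = binary.shape[1]
    if n_slices < win:
        raise ValueError("audio shorter than one time window")
    ffts = [np.abs(np.fft.fft2(binary[:, s:s + win].astype(float)))
            for s in range(0, n_slices - win + 1, hop)]
    return np.stack(ffts)  # shape: (n_windows, n_bins, win)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cqt = rng.random((24, 100))  # stand-in for |CQT|: 24 bins x 100 slices
    print(fingerprint(cqt).shape)  # (17, 24, 20)
```

Because the magnitude of a two-dimensional DFT is unchanged by circular shifts of its input, circularly shifting the constant Q bins (a rough stand-in for transposing a song to another key) leaves the sequence of magnitudes unchanged, which is one plausible reason such features suit cover-rendition matching.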
Instant Application claims 1-24
Patent 10885021 claims 1-20
17. A method comprising:

identifying, using one or more processors, query audio from a content source based on a search query using rights metadata associated with the query audio;

executing, using the one or more processors, a constant Q transform on query time slices of the query audio;

binarizing, using the one or more processors, the constant Q transformed query time slices;
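
executing, using the one or more processors, a two-dimensional Fourier transform on query time windows within the binarized and constant Q transformed query time slices to generate two-dimensional Fourier transforms of the query time windows;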
generating, using the one or more processors, a query data structure based on a sequential order of the two-dimensional Fourier transforms;
selecting, using the one or more processors, a subset of a reference database
based on the rights metadata, the subset including reference audio; and
identifying, using the one or more processors, the query audio as a cover rendition of the reference audio based on a comparison between the query data structure and a reference data structure associated with the reference audio.
18. The method of claim 17, wherein the rights metadata includes at least one of
an artist, a publisher, license information, right holder information, royalty
information, or a title of the query audio, and further including:
obtaining the reference audio via a user interface in communication with a network; and
registering the reference audio based on storing the reference audio in the
reference database and storing the rights metadata in a rights database.
19. The method of claim 17, wherein the content source is at least one of (i) a
stream of a live broadcast, (ii) a music sharing site, (iii) a video sharing site, or (iv)
a social networking feed, a post, update, or a tweet of a social network.
20. The method of claim 17, wherein the one or more processors are to select the
content source based on at least one of information provided by a right holder
associated with the query audio, the rights metadata, a popularity of the query
content source, or a likelihood of the query content source having potentially
unlicensed cover songs.
21. The method of claim 17, further including:
executing a constant Q transform on reference time slices of the reference
audio;
binarizing the constant Q transformed reference time slices;
executing a two-dimensional Fourier transform on reference time windows
within the binarized and constant Q transformed reference time slices to generate
two-dimensional Fourier transforms of the reference time windows; and
generating the reference data structure by sequentially ordering the two-dimensional
Fourier transforms of the reference time windows.
22. The method of claim 17, further including:
generating a similarity matrix that indicates degrees to which reference
portions of the reference data structure are similar to query portions of the query
data structure;
computing a distance measure between the query data structure and the
reference data structure based on the similarity matrix; and
storing an association in a database between the reference audio and the
query audio based on a computed distance measure, the association identifying the
query audio as the cover rendition.
23. The method of claim 22, further including:
convolving the similarity matrix with a checkerboard kernel to generate a
first convolved similarity matrix, the first convolved similarity matrix including
positive elements and negative elements; and
replacing the negative elements with zeros to generate a second convolved
similarity matrix; and wherein:
the computing of the distance measure between the query data structure and
the reference data structure is based on the second convolved similarity matrix.
24. The method of claim 17, further including:
grouping the binarized and constant Q transformed query time slices of the
query audio into the query time windows prior to the executing of the two-dimensional
Fourier transform on the query time windows, the query time windows
including overlapping query time windows of uniform duration; and
applying a blur algorithm to the two-dimensional Fourier transforms of the
query time windows prior to the sequential ordering of the two-dimensional Fourier
transforms in the query data structure.
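Claims 22 and 23 (like claims 6-7 and 14-15) recite a similarity matrix, a convolution with a checkerboard kernel, and replacement of the negative elements with zeros before a distance measure is computed. The following is a minimal sketch under stated assumptions: cosine similarity between flattened two-dimensional Fourier transforms stands in for the unspecified "degrees" of similarity, the kernel size `half` is arbitrary, and the `1 / (1 + max)` distance is a hypothetical placeholder, since the claims do not specify the distance measure.

```python
import numpy as np


def similarity_matrix(ref_seq: np.ndarray, query_seq: np.ndarray) -> np.ndarray:
    """Cosine similarity between every reference portion and every query portion.

    Rows index reference portions, columns index query portions; each portion
    (e.g. one 2D Fourier transform) is flattened to a vector.
    """
    r = ref_seq.reshape(len(ref_seq), -1).astype(float)
    q = query_seq.reshape(len(query_seq), -1).astype(float)
    r /= np.linalg.norm(r, axis=1, keepdims=True) + 1e-12
    q /= np.linalg.norm(q, axis=1, keepdims=True) + 1e-12
    return r @ q.T


def checkerboard_kernel(half: int) -> np.ndarray:
    """(2*half x 2*half) kernel: +1 in top-left/bottom-right quadrants, -1 elsewhere."""
    return np.kron(np.array([[1.0, -1.0], [-1.0, 1.0]]), np.ones((half, half)))


def convolve_same(matrix: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Zero-padded 2D convolution with an output the same shape as `matrix`.

    The checkerboard kernel equals its own 180-degree rotation, so sliding it
    without flipping computes the convolution.
    """
    kh, kw = kernel.shape
    padded = np.pad(matrix, ((kh // 2, kh - 1 - kh // 2), (kw // 2, kw - 1 - kw // 2)))
    windows = np.lib.stride_tricks.sliding_window_view(padded, kernel.shape)
    return np.einsum("ijkl,kl->ij", windows, kernel)


def distance_from_similarity(sim: np.ndarray, half: int = 2):
    """Return (hypothetical distance, second convolved similarity matrix)."""
    first = convolve_same(sim, checkerboard_kernel(half))  # positive and negative elements
    second = np.maximum(first, 0.0)                         # negatives replaced with zeros
    # Placeholder distance: stronger diagonal evidence gives a smaller distance.
    return 1.0 / (1.0 + second.max()), second
```

Because the checkerboard kernel sums to zero, a constant offset in the similarity values cancels out, and the positive response concentrates where similar portions line up along a diagonal, i.e., where query and reference progress in step.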
1. An apparatus comprising:
memory; and
one or more processors to execute instructions to:
identify query audio from a content source based on a search
query using rights metadata associated with the query audio;
execute a constant Q transform on query time slices of the
query audio;
binarize the constant Q transformed query time slices;
execute a two-dimensional Fourier transform on query time
windows within the binarized and constant Q transformed query time
slices to generate two-dimensional Fourier transforms of the query
time windows;
generate a query data structure based on a sequential order of
the two-dimensional Fourier transforms;
select a subset of a reference database based on the rights
metadata, the subset including reference audio; and
identify the query audio as a cover rendition of the reference
audio based on a comparison between the query data structure and a
reference data structure associated with the reference audio.
2. The apparatus of claim 1, wherein the rights metadata includes at least one
of an artist, a publisher, license information, right holder information, royalty
information, or a title of the query audio, and the one or more processors are to:
obtain the reference audio via a user interface in communication with a
network; and
register the reference audio based on storing the reference audio in the
reference database and storing the rights metadata in a rights database.
3. The apparatus of claim 1, wherein the content source is at least one of (i) a
stream of a live broadcast, (ii) a music sharing site, (iii) a video sharing site, or (iv)
a social networking feed, a post, update, or a tweet of a social network.
4. The apparatus of claim 1, wherein the one or more processors are to select
the content source based on at least one of information provided by a right holder
associated with the query audio, the rights metadata, a popularity of the query
content source, or a likelihood of the query content source having potentially
unlicensed cover songs.
5. The apparatus of claim 1, wherein the one or more processors are to:
execute a constant Q transform on reference time slices of the reference
audio;
binarize the constant Q transformed reference time slices;
execute a two-dimensional Fourier transform on reference time windows
within the binarized and constant Q transformed reference time slices to generate
two-dimensional Fourier transforms of the reference time windows; and
generate the reference data structure by sequentially ordering the two-dimensional
Fourier transforms of the reference time windows.
6. The apparatus of claim 1, wherein the one or more processors are to:
generate a similarity matrix that indicates degrees to which reference
portions of the reference data structure are similar to query portions of the query
data structure;
compute a distance measure between the query data structure and the
reference data structure based on the similarity matrix; and
store an association in a database between the reference audio and the query
audio based on a computed distance measure, the association identifying the query
audio as the cover rendition.
7. The apparatus of claim 6, wherein the one or more processors are to:
convolve the similarity matrix with a checkerboard kernel to generate a first
convolved similarity matrix, the first convolved similarity matrix including positive
elements and negative elements; and
replace the negative elements with zeros to generate a second convolved
similarity matrix; and wherein:
the computing of the distance measure between the query data structure and
the reference data structure is based on the second convolved similarity matrix.
8. The apparatus of claim 1, wherein the one or more processors are to:
group the binarized and constant Q transformed query time slices of the
query audio into the query time windows prior to the executing of the two-dimensional
Fourier transform on the query time windows, the query time windows
including overlapping query time windows of uniform duration; and
apply a blur algorithm to the two-dimensional Fourier transforms of the
query time windows prior to the sequential ordering of the two-dimensional Fourier
transforms in the query data structure.
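Claim 8 (like claims 16 and 24) adds two details: grouping the binarized time slices into overlapping windows of uniform duration before the two-dimensional Fourier transform, and blurring the transforms before they are ordered. The claims do not name a blur algorithm, so the sketch below assumes a simple box (mean) blur applied to the transform magnitudes; `win`, `hop` and `radius` are hypothetical parameters.

```python
import numpy as np


def overlapping_windows(binary: np.ndarray, win: int, hop: int) -> list:
    """Group (bins x time slices) data into windows of `win` slices every `hop` slices."""
    if not 0 < hop < win:  # hop < win guarantees consecutive windows overlap
        raise ValueError("expected 0 < hop < win")
    n_slices = binary.shape[1]
    return [binary[:, s:s + win] for s in range(0, n_slices - win + 1, hop)]


def box_blur(image: np.ndarray, radius: int = 1) -> np.ndarray:
    """Hypothetical blur: mean over a (2*radius+1)^2 neighbourhood, edges replicated."""
    size = 2 * radius + 1
    padded = np.pad(image, radius, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, (size, size))
    return windows.mean(axis=(-2, -1))


def blurred_fft_sequence(binary: np.ndarray, win: int = 20, hop: int = 5,
                         radius: int = 1) -> np.ndarray:
    """Blurred 2D FFT magnitude of each window, stacked in time order."""
    blurred = [box_blur(np.abs(np.fft.fft2(w.astype(float))), radius)
               for w in overlapping_windows(binary, win, hop)]
    if not blurred:
        raise ValueError("input shorter than one window")
    return np.stack(blurred)
```

Blurring before ordering smooths small differences between windows (for example, slight tempo or timing deviations in a cover), at the cost of some detail; the blur radius trades robustness against discrimination.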
9. A non-transitory machine-readable medium comprising instructions that,
when executed, cause one or more processors to at least:
identify query audio from a content source based on a search query using
rights metadata associated with the query audio;
execute a constant Q transform on query time slices of the query audio;
binarize the constant Q transformed query time slices;
execute a two-dimensional Fourier transform on query time windows within
the binarized and constant Q transformed query time slices to generate two-dimensional
Fourier transforms of the query time windows;
generate a query data structure based on a sequential order of the two-dimensional
Fourier transforms;
select a subset of a reference database based on the rights metadata, the
subset including reference audio; and
identify the query audio as a cover rendition of the reference audio based on
a comparison between the query data structure and a reference data structure
associated with the reference audio.
10. The non-transitory machine-readable medium of claim 9, wherein the rights
metadata includes at least one of an artist, a publisher, license information, right
holder information, royalty information, or a title of the query audio, and the
instructions, when executed, cause the one or more processors to:
obtain the reference audio via a user interface in communication with a
network; and
register the reference audio based on storing the reference audio in the
reference database and storing the rights metadata in a rights database.
11. The non-transitory machine-readable medium of claim 9, wherein the
content source is at least one of (i) a stream of a live broadcast, (ii) a music sharing
site, (iii) a video sharing site, or (iv) a social networking feed, a post, update, or a
tweet of a social network.
12. The non-transitory machine-readable medium of claim 9, wherein the
instructions, when executed, cause the one or more processors to select the content
source based on at least one of information provided by a right holder associated
with the query audio, the rights metadata, a popularity of the query content source,
or a likelihood of the query content source having potentially unlicensed cover
songs.
13. The non-transitory machine-readable medium of claim 9, wherein the
instructions, when executed, cause the one or more processors to:
execute a constant Q transform on reference time slices of the reference
audio;
binarize the constant Q transformed reference time slices;
execute a two-dimensional Fourier transform on reference time windows
within the binarized and constant Q transformed reference time slices to generate
two-dimensional Fourier transforms of the reference time windows; and
generate the reference data structure by sequentially ordering the two-dimensional
Fourier transforms of the reference time windows.
14. The non-transitory machine-readable medium of claim 9, wherein the
instructions, when executed, cause the one or more processors to:
generate a similarity matrix that indicates degrees to which reference
portions of the reference data structure are similar to query portions of the query
data structure;
compute a distance measure between the query data structure and the
reference data structure based on the similarity matrix; and
store an association in a database between the reference audio and the query
audio based on a computed distance measure, the association identifying the query
audio as the cover rendition.
15. The non-transitory machine-readable medium of claim 14, wherein the
instructions, when executed, cause the one or more processors to:
convolve the similarity matrix with a checkerboard kernel to generate a first
convolved similarity matrix, the first convolved similarity matrix including positive
elements and negative elements; and
replace the negative elements with zeros to generate a second convolved
similarity matrix; and wherein:
the computing of the distance measure between the query data structure and
the reference data structure is based on the second convolved similarity matrix.
16. The non-transitory machine-readable medium of claim 9, wherein the
instructions, when executed, cause the one or more processors to:
group the binarized and constant Q transformed query time slices of the
query audio into the query time windows prior to the executing of the two-dimensional
Fourier transform on the query time windows, the query time windows
including overlapping query time windows of uniform duration; and
apply a blur algorithm to the two-dimensional Fourier transforms of the
query time windows prior to the sequential ordering of the two-dimensional Fourier
transforms in the query data structure.

1. A computerized method comprising:
accessing, using one or more hardware processors, reference audio to be represented by a reference data structure to be generated and stored in a reference database; 
generating, using the one or more hardware processors, the reference data structure from the reference audio by at least: 
performing a constant Q transform on multiple reference time slices of the reference audio; 
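binarizing the constant Q transformed reference time slices of the reference audio by, for each constant Q transformed reference time slice, calculating a median value of a range of constant Q transformed reference time slices that encompasses the constant Q transformed reference time slice and binarizing the constant Q transformed reference time slices based on the calculated median value of the range;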
performing a two-dimensional Fourier transform on multiple reference time windows within the binarized and constant Q transformed reference time slices of the reference audio to obtain two-dimensional Fourier transforms of the reference time windows; and 
sequentially ordering the two-dimensional Fourier transforms of the reference time windows in the reference data structure;
creating, within the reference database, a data association between the reference audio and the generated reference data structure that includes the sequentially ordered two-dimensional Fourier transforms of the reference time windows, the created data association indicating that the reference data structure is an identifier of the reference audio; 
accessing, using the one or more hardware processors, metadata associated with the reference audio; 
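accessing, using the one or more hardware processors, a content source using the metadata to obtain query audio to be represented by a query data structure for comparison to the reference data structure that represents the reference audio;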
comparing, using the one or more hardware processors, the query audio to the reference audio based on the query data structure and the reference data structure; 
generating, using the one or more hardware processors, a ranking of the query audio based on the comparison; and 
in response to the ranking satisfying a threshold, generating, using the one or more hardware processors, a notification based on identifying the query audio as a cover rendition of the reference audio.

2. The computerized method of claim 1, further including: 
grouping the binarized and constant Q transformed reference time slices of the reference audio into the multiple reference time windows prior to the performing of the two-dimensional Fourier transform on the multiple reference time windows, the multiple reference time windows including overlapping reference time windows of uniform duration.

3. The computerized method of claim 1, wherein: 
the generating of the reference data structure includes applying a blur algorithm to each of the two-

4. The computerized method of claim 1, further including: 
receiving a request to identify the query audio, the request being received from a device; and 
controlling the device by causing the device to present a notification that the query audio is a cover rendition of the reference audio.

5. A computerized method comprising: 
generating, using one or more hardware processors, a query data structure from query audio by at least:
performing a constant Q transform on multiple query time slices of the query audio; 
binarizing the constant Q transformed query time slices of the query audio by, for each constant Q transformed query time slice, calculating a median value of a range of constant Q transformed query time slices that encompasses the constant Q transformed query time slice and binarizing the constant Q transformed query time slices based on the calculated median value of the range; 
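performing a two-dimensional Fourier transform on multiple query time windows within the binarized and constant Q transformed query time slices of the query audio to obtain two-dimensional Fourier transforms of the query time windows; and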
sequentially ordering the two-dimensional Fourier transforms of the query time windows in the query data structure; 
creating, within a reference database, a data association between reference audio and the query audio based on a match between the query data structure and a reference data structure, the created data association indicating that the query audio is a cover rendition of the reference audio; 
accessing, using the one or more hardware processors, metadata associated with the reference audio; 
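accessing, using the one or more hardware processors, a content source using the metadata to obtain query audio to be represented by a query data structure for comparison to the reference data structure that represents the reference audio;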
comparing, using the one or more hardware processors, the query audio to the reference audio based on the query data structure and the reference data structure; 
generating, using the one or more hardware processors, a ranking of the query audio based on the comparison; and 
in response to the ranking satisfying a threshold, generating, using the one or more hardware processors, a notification based on identifying the query audio as a cover rendition of the reference audio.

6. The computerized method of claim 5, further including: grouping the binarized and constant Q transformed query time slices of the query audio into the multiple query time windows prior to the performing of the two-dimensional Fourier transform on the multiple query time windows, the query time windows including overlapping query time windows of uniform duration.

7. A computerized method comprising: 
accessing, using one or more hardware processors, metadata associated with reference audio;
accessing, using the one or more hardware processors, a content source using the metadata to obtain query audio to be represented by a query data structure for comparison to a reference data structure that represents the reference audio; 
comparing, using the one or more hardware processors, the query audio to the reference audio based on the query data structure and the reference data structure; 
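generating, using the one or more hardware processors, a ranking of the query audio based on the comparison; in response to the ranking satisfying a threshold, generating, using the one or more hardware processors, a notification based on identifying the query audio as a cover rendition of the reference audio;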
generating a similarity matrix that indicates degrees to which reference portions of the reference data structure are similar to query portions of the query data structure; 
computing a distance measure between the query data structure and the reference data structure based on the generated similarity matrix; 
creating, within a reference database, a data association between the reference audio and the query audio based on the computed distance measure between the query data structure and the reference data 
convolving the generated similarity matrix with a checkerboard kernel, the convolved similarity matrix including positive elements and negative elements; and 
replacing the negative elements of the convolved similarity matrix with zeros; and 
wherein: the computing of the distance measure between the query data structure and the reference data structure is based on the convolved similarity matrix with the negative elements replaced with zeros.

8. The computerized method of claim 7, wherein: the computing of the distance measure between the query 
identifying diagonals in the convolved similarity matrix with the negative elements replaced with zeros; 
computing lengths and sums of the diagonals in the convolved similarity matrix; 
computing multiplicative products of the lengths and the sums of the diagonals in the convolved similarity matrix; 
ranking the diagonals based on the multiplicative products of the length and the sums of the diagonals in the convolved similarity matrix; 
identifying a dominant subset of the diagonals based on the ranking of the diagonals in the convolved similarity 
wherein: the computing of the distance measure between the query data structure and the reference data structure is based on the summed multiplicative products of lengths and sums of the dominant subset of the ranked diagonals in the convolved similarity matrix.
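Claim 8 of the Patent details one way to turn the zero-clipped convolved similarity matrix into a distance: diagonals are identified, their lengths and sums are multiplied, the diagonals are ranked by those products, and the products of a dominant subset are summed. The sketch below is one hypothetical reading, not the Patent's implementation: a "diagonal" is taken to be a maximal run of consecutive positive entries along a top-left-to-bottom-right diagonal, the dominant subset is the `top_n` highest products, and the `1 / (1 + score)` mapping to a distance is an assumption.

```python
import numpy as np


def diagonal_runs(conv_pos: np.ndarray) -> list:
    """(length, sum) of each maximal run of consecutive positive entries
    along every top-left-to-bottom-right diagonal of the matrix."""
    runs = []
    n_rows, n_cols = conv_pos.shape
    for offset in range(-(n_rows - 1), n_cols):
        length, total = 0, 0.0
        for value in np.diagonal(conv_pos, offset=offset):
            if value > 0:
                length += 1
                total += float(value)
            elif length:
                runs.append((length, total))
                length, total = 0, 0.0
        if length:  # close a run that reaches the matrix edge
            runs.append((length, total))
    return runs


def diagonal_distance(conv_pos: np.ndarray, top_n: int = 3) -> float:
    """Rank runs by length * sum, sum the dominant top_n products, map to a distance."""
    products = sorted((length * total for length, total in diagonal_runs(conv_pos)),
                      reverse=True)
    score = sum(products[:top_n])
    return 1.0 / (1.0 + score)


if __name__ == "__main__":
    m = np.eye(5)     # one diagonal run: length 5, sum 5
    m[0, 4] = 2.0     # an isolated entry: length 1, sum 2
    print(diagonal_distance(m, top_n=1))  # equals 1 / 26
```

Multiplying length by sum favours long, consistently strong diagonals (sustained matching passages) over short bursts of high similarity.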

9. A system comprising: 
one or more hardware processors; and memory including instructions that, when executed, cause the one or more hardware processors to: 
access reference audio to be represented by a reference data structure to be generated and stored in a reference database;
generate the reference data structure from the reference audio by at least: 
performing a constant Q transform on multiple reference time slices of the reference audio; 
binarizing the constant Q transformed reference time slices of the reference audio by, for each constant Q transformed reference time slice, calculating a median value of a range of constant Q transformed reference time slices that encompasses the constant Q transformed reference time slice and binarizing the constant Q transformed reference time slices based on the calculated median value of the range;
performing a two-dimensional Fourier transform on multiple reference time windows within the binarized and constant Q transformed reference time slices of the reference audio to obtain two-dimensional Fourier transforms of the reference time windows; and 
sequentially ordering the two-dimensional Fourier transforms of the reference time windows in the reference data structure; 
create, within the reference database, a data association between the reference audio and the generated reference data structure that includes the sequentially ordered two-dimensional Fourier transforms of the reference time windows, the created data association indicating that the reference data structure is an identifier of the reference audio;
access metadata associated with the reference audio; access a content source using the metadata to obtain query audio to be represented by a query data structure for comparison to the reference data structure that represents the reference audio; 
compare the query audio to the reference audio based on the query data structure and the reference data structure; generate a ranking of the query audio based on the comparison; and 
in response to the ranking satisfying a threshold, generate a notification based on identifying the query audio as a cover rendition of the reference audio.

10. The system of claim 9, wherein the one or more hardware processors are to: 
group the binarized and constant Q transformed reference time slices of the reference audio into the multiple reference time windows prior to the performing of the two-dimensional Fourier transform on the multiple reference time windows, the multiple reference time windows including overlapping reference time windows of uniform duration.

11. The system of claim 9, wherein: the generating of the reference 

12. A system comprising: 
one or more hardware processors; and memory including instructions that, when executed, cause the one or more hardware processors to: 
generate a query data structure from query audio by at least: 
performing a constant Q transform on multiple query time slices of the query audio; 
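binarizing the constant Q transformed query time slices of the query audio by, for each constant Q transformed query time slice, calculating a median value of a range of constant Q transformed query time slices that encompasses the constant Q transformed query time slice and binarizing the constant Q transformed query time slices based on the calculated median value of the range;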
performing a two-dimensional Fourier transform on multiple query time windows within the binarized and constant Q transformed query time slices of the query audio to obtain two-dimensional Fourier transforms of the query time windows; and 
sequentially ordering the two-dimensional Fourier transforms of the query time windows in the query data structure;
create, within a reference database, a data association between reference audio and the query audio based on a match between the query data structure and a reference data structure, the created data association indicating that the query audio is a cover rendition of the reference audio; 
access metadata associated with the reference audio; 
access a content source using the metadata to obtain query audio to be represented by a query data structure for comparison to the reference data structure that represents the reference audio; 
compare the query audio to the reference audio based on the query data structure and the reference data structure;
generate a ranking of the query audio based on the comparison; and in response to the ranking satisfying a threshold, 
generate a notification based on identifying the query audio as a cover rendition of the reference audio.

13. The system of claim 12, wherein the one or more hardware processors are to: group the binarized and constant Q transformed query time slices of the query audio into the multiple query time windows prior to the performing of the two-dimensional Fourier transform on the multiple query time windows, the query time windows including overlapping query time windows of uniform duration.

14. A system comprising: 
one or more hardware processors; and memory including instructions that, when executed, cause the one or more hardware processors to: 
access metadata associated with reference audio; 
access a content source using the metadata to obtain query audio to be represented by a query data structure for comparison to a reference data structure that represents the reference audio; 
compare the query audio to the reference audio based on the query data structure and the reference data structure;
generate a ranking of the query audio based on the comparison; 
in response to the ranking satisfying a threshold, generate a notification based on identifying the query audio as a cover rendition of the reference audio; 
generate a similarity matrix that indicates degrees to which reference portions of the reference data structure are similar to query portions of the query data structure; 
compute a distance measure between the query data structure and the reference data structure based on the generated similarity matrix; 
create, within a reference 
convolve the generated similarity matrix with a checkerboard kernel, the convolved similarity matrix including positive elements and negative elements; and 
replace the negative elements of the convolved similarity matrix with zeros; and wherein: the computing of the distance measure between the query data structure and the reference data structure is based on the convolved similarity matrix with the negative elements replaced with zeros.

15. The system of claim 14, wherein the one or more hardware processors are to: 
access the reference audio to be represented by the reference data structure to be generated and stored in a reference database; 
generate the reference data structure from the reference audio by at least: 
performing a constant Q transform on multiple reference time slices of the reference audio; 
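binarizing the constant Q transformed reference time slices of the reference audio by, for each constant Q transformed reference time slice, calculating a median value of a range of constant Q transformed reference time slices that encompasses the constant Q transformed reference time slice and binarizing the constant Q transformed reference time slices based on the calculated median value of the range;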
performing a two-dimensional Fourier transform on multiple reference time windows within the binarized and constant Q transformed reference time slices of the reference audio to obtain two-dimensional Fourier transforms of the reference time windows; and 
sequentially ordering the two-dimensional Fourier transforms of the reference time windows in the reference data structure; and 
create, within the reference database, a data association between 
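Claim 15 recites three further steps. The constant Q transformed time slices are binarized against a median calculated over a range of surrounding slices. A two-dimensional Fourier transform is then taken of each time window, and the transforms are ordered sequentially. For illustration only, a minimal Python sketch of these steps follows, under stated assumptions:

- The constant Q magnitudes are already computed, as one list of bin magnitudes per time slice.
- "A range that encompasses the slice" is read as slices t - radius through t + radius, with a separate median per frequency bin. This is one possible reading, not the applicant's.
- Windows do not overlap.
- Only the magnitude of a naive two-dimensional DFT is kept.

The function names and default parameters are hypothetical.

```python
import cmath
import math
from statistics import median


def binarize_running_median(cqt_slices, radius=2):
    """1 where a bin exceeds the median of that bin over slices t-radius..t+radius
    (clipped at the ends of the recording), else 0. ASSUMED per-bin reading."""
    n = len(cqt_slices)
    binary = []
    for t in range(n):
        lo, hi = max(0, t - radius), min(n, t + radius + 1)
        row = []
        for b, value in enumerate(cqt_slices[t]):
            m = median(cqt_slices[s][b] for s in range(lo, hi))
            row.append(1 if value > m else 0)
        binary.append(row)
    return binary


def dft2_magnitude(window):
    """Magnitude of the naive 2-D DFT of a (slices x bins) window."""
    rows, cols = len(window), len(window[0])
    result = []
    for k_time in range(rows):
        row = []
        for k_bin in range(cols):
            acc = 0j
            for t in range(rows):
                for b in range(cols):
                    phase = -2.0 * math.pi * (k_time * t / rows + k_bin * b / cols)
                    acc += window[t][b] * cmath.exp(1j * phase)
            row.append(abs(acc))
        result.append(row)
    return result


def build_data_structure(cqt_slices, window_len=4, radius=2):
    """Binarize, cut into non-overlapping windows (ASSUMED), 2-D DFT each window,
    and keep the transforms in their original time order."""
    binary = binarize_running_median(cqt_slices, radius)
    starts = range(0, len(binary) - window_len + 1, window_len)
    return [dft2_magnitude(binary[s:s + window_len]) for s in starts]
```

Keeping only the magnitude makes each window's transform insensitive to circular shifts in time and in pitch. That property is a plausible reason to use a two-dimensional Fourier transform for matching transposed or re-timed renditions, though the claim itself does not state it.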

16. The system of claim 14, wherein the one or more hardware processors are to: 
generate the query data structure from the query audio by at least: performing a constant Q transform on multiple query time slices of the query audio; 
binarizing the constant Q transformed query time slices of the 
performing a two-dimensional Fourier transform on multiple query time windows within the binarized and constant Q transformed query time slices of the query audio to obtain two-dimensional Fourier transforms of the query time windows; and 
sequentially ordering the two-dimensional Fourier transforms of the query time windows in the query data structure; and 
create, within the reference database, a further data association between the reference audio and the query audio based on a match between the query data structure and the reference data structure, the created further data association indicating that the query audio is a cover rendition of the reference audio.

17. A non-transitory machine-readable storage medium comprising instructions that, when executed, cause one or more processors of a machine to at least: 
access reference audio to be represented by a reference data structure to be generated and stored in a reference database; 
generate the reference data structure from the reference audio by at least: 
performing a constant Q transform on multiple reference time slices of the reference audio; 
binarizing the constant Q transformed reference time slices of the reference audio by, for each constant Q transformed reference time slice, calculating a median value of a range of constant Q transformed reference time slices that encompasses the constant Q transformed reference time slice and binarizing the constant Q transformed reference time slices based on the calculated median value of the range; 
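performing a two-dimensional Fourier transform on multiple reference time windows within the binarized and constant Q transformed reference time slices of the reference audio to obtain two-dimensional Fourier transforms of the reference time windows; 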
sequentially ordering the two-dimensional Fourier transforms of the reference time windows in the reference data structure; 
create, within the reference database, a data association between the reference audio and the generated reference data structure that includes the sequentially ordered two-dimensional Fourier transforms of the reference time windows, the created data association indicating that the reference data structure is an identifier 
access metadata associated with reference audio; 
access a content source using the metadata to obtain query audio to be represented by a query data structure for comparison to a reference data structure that represents the reference audio; 
compare the query audio to the reference audio based on the query data structure and the reference data structure; 
generate a ranking of the query audio based on the comparison; and in response to the ranking satisfying a threshold, generate a notification based on identifying the query audio as a cover rendition of the reference audio.
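Claim 17 recites performing a constant Q transform on multiple reference time slices. For illustration only, the following minimal Python sketch evaluates a constant-Q transform of one time slice directly:

- Bin k is centered at f_min * 2^(k / bins_per_octave).
- All bins share one quality factor, Q = 1 / (2^(1/bins_per_octave) - 1).
- Bin k is analyzed over a Hamming-windowed span of ceil(Q * sample_rate / f_k) samples.

The parameter names and the Hamming window are assumptions, not taken from the application. A practical system would more likely use an FFT-based implementation.

```python
import cmath
import math


def constant_q_slice(samples, sample_rate, f_min, bins_per_octave, n_bins):
    """Direct (non-FFT) constant-Q transform of one time slice.

    Bin k is centered at f_k = f_min * 2**(k / bins_per_octave). All bins share
    the quality factor Q = 1 / (2**(1 / bins_per_octave) - 1), so bin k is
    analyzed over N_k = ceil(Q * sample_rate / f_k) samples: long windows for
    low notes, short windows for high notes. Returns one magnitude per bin.
    The Hamming window is an ASSUMED choice.
    """
    q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)
    magnitudes = []
    for k in range(n_bins):
        f_k = f_min * 2.0 ** (k / bins_per_octave)
        n_k = math.ceil(q * sample_rate / f_k)
        if n_k > len(samples):
            raise ValueError(f"time slice too short for bin {k}: need {n_k} samples")
        acc = 0j
        for n in range(n_k):
            hamming = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (n_k - 1))
            # Q / N_k cycles per sample is approximately f_k / sample_rate.
            acc += hamming * samples[n] * cmath.exp(-2j * math.pi * q * n / n_k)
        magnitudes.append(abs(acc) / n_k)
    return magnitudes
```

For example, take a 220 Hz sine sampled at 8 kHz, with f_min = 110 Hz, 12 bins per octave and 24 bins. The largest magnitude should fall in bin 12, which is centered at 220 Hz, one octave above f_min.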

18. The non-transitory machine-readable storage medium of claim 17, wherein the instructions, when executed, cause the one or more processors to: 
generate a similarity matrix that indicates degrees to which reference portions of the reference data structure are similar to query portions of the query data structure; 
compute a distance measure between the query data structure and the reference data structure based on the generated similarity matrix; and 
create, within the reference database, the data association between the reference audio and the query audio based on the computed distance 

19. The non-transitory machine-readable storage medium of claim 18, wherein the instructions, when executed, cause the one or more processors to: 
convolve the generated similarity matrix with a checkerboard kernel, the convolved similarity matrix including positive elements and negative elements; and 
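replace the negative elements of the convolved similarity matrix with zeros; and wherein: the computing of the distance measure between the query data structure and the reference data structure is based on the convolved similarity matrix with the negative elements replaced with zeros.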

20. The non-transitory machine-readable storage medium of claim 17, wherein the instructions, when executed, cause the one or more processors to: 
group the binarized and constant Q transformed reference time slices of the reference audio into the multiple reference time windows prior to the performing of the two-dimensional Fourier transform on the multiple reference time windows, the multiple reference time windows including overlapping reference time windows of uniform duration.
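Claim 20 recites grouping the binarized and constant Q transformed reference time slices into overlapping reference time windows of uniform duration before the two-dimensional Fourier transform is performed. For illustration only, a minimal Python sketch follows. It assumes that the window length and the hop are both counted in time slices; these are hypothetical parameters, and the claim fixes neither value.

```python
def overlapping_windows(slices, window_len, hop):
    """Group consecutive time slices into windows of `window_len` slices,
    starting a new window every `hop` slices. hop < window_len yields
    overlapping windows; a trailing remainder too short to fill a window is
    dropped so that every window has the same (uniform) duration."""
    if window_len <= 0 or hop <= 0:
        raise ValueError("window_len and hop must be positive")
    last_start = len(slices) - window_len
    return [slices[start:start + window_len] for start in range(0, last_start + 1, hop)]
```

As a usage example, with seven slices, a window length of four and a hop of two, the sketch returns windows covering slices 0-3 and 2-5. Slice 6 is dropped because it cannot fill a full window. Each window then spans window_len times the duration of one slice.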


Claim Rejections - 35 USC § 103
6. The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.
The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
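3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.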


This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

6.1. Claims 17-21, 1-5 and 9-13 are rejected under 35 U.S.C. § 103 as being unpatentable over 
Mysore et al.: "SEMI-SUPERVISED SOURCE SEPARATION USING NON-NEGATIVE TECHNIQUES", (U.S. Patent Application Publication 20130132077 A1, filed May 27, 2011 and published May 23, 2013, hereafter "Mysore"), and in view of 
Guralnick: "WORKOUT MUSIC PLAYBACK MACHINE", (U.S. Patent US 9,880,805 B1, filed September 23, 2014 and issued April 2, 2015, hereafter "Guralnick").

As per claim 17, Mysore teaches a method comprising:
identifying, using one or more processors, query audio from a content source based on a search query using rights metadata associated with the query audio (See [0025]-[0026], techniques for modeling signals originated from single sources, followed by techniques for modeling signals originated from multiple sources may be used in music audio search and retrieval, and recording and processing. Here audio searching and retrieving teaches identifying the query audio while techniques for modeling signals 
executing, using the one or more processors, a constant Q transform on query time slices of the query audio (See FIGS. 7A-E and [0086], a spectrogram of a synthesized saxophone playing a C major arpeggio four times; four repetitions of the sequence C-E-G may therefore be identified. The spectrogram was computed using an STFT with a window size of 100 ms and a hop size of 25 ms, and a constant-Q transform was used for display purposes, to show the fundamental frequencies of the different notes and the relationships between them.);
binarizing, using the one or more processors, the constant Q transformed query time slices (See [0032], algorithms or symbolic representations of operations on binary digital signals stored within a memory and manipulating or transforming signals, typically represented as physical electronic or magnetic quantities within memories);
executing, using the one or more processors, a two-dimensional Fourier transform on query time windows within the binarized and constant Q transformed query time slices to generate two-dimensional Fourier transforms of the query time windows (See [0044]-[0045], the spectrogram may be a spectrogram generated as the magnitude of the short time Fourier transform (STFT) of a signal and construct a dictionary for each segment of the spectrogram. The various segments may be, for example, time frames of the spectrogram);
generating, using the one or more processors, a query data structure based on a sequential order of the two-dimensional Fourier transforms (See [0044]-[0045], the spectrogram may be generated, in sequential order, as the magnitude of the short time Fourier transform (STFT) of a signal, and a dictionary may be constructed for each segment of the spectrogram. The various segments may be, for example, time frames of the spectrogram. Here the dictionary for each segment of the spectrogram is interpreted as the data structure).
Mysore does not explicitly teach selecting, using the one or more processors, a subset of a reference database based on the rights metadata, the subset including reference audio.
However, Guralnick teaches selecting, using the one or more processors, a subset of a reference database based on the rights metadata, the subset including reference audio (See col. 33, lines 54-57 and col. 37, lines 9-11 and 22-24, retrieving from memory the song recording data, by for instance accessing the song recording metadata, for each of original musical recordings 110 corresponding to the song titles; and the user may then set the song title 112 searching through the song database 108 for the desired song title 112, then the song title 112 corresponding to the original musical recording 110 may be added to a slot row 305, forming part of the selection 114b of song titles 112. Here the song title is a data structure for searching and selecting audio via the song titles).

Mysore in view of Guralnick teaches the following:
identifying, using the one or more processors, the query audio as a cover rendition of the reference audio based on a comparison between the query data structure and a reference data structure associated with the reference audio (See Guralnick: col. 37, lines 9-11 and 22-24 and col. 26, lines 14-18, the user may then set the song title 112 searching through the song database 108 for the desired song title 112, then the song title 112 corresponding to the original musical recording 110 may be added to a slot row 305, forming part of the selection 114b of song titles 112. Here the song title is a data structure for searching audio via the song titles; and the musical genre input may, for instance, lead to sorting of the song titles listed in, for example, a scroll menu of song titles of the song database, leaving only the song titles sharing the designated musical genre available for further selection by the user.).

at least one of an artist, a publisher, license information, right holder information, royalty information, or a title of the query audio, and further including:
obtaining the reference audio via a user interface in communication with a network (See Guralnick: col. 33, lines 54-57 and col. 37, lines 9-11 and 22-24, retrieving from memory the song recording data, by for instance accessing the song recording metadata, for each of original musical recordings 110 corresponding to the song titles; and the user may then set the song title 112 searching through the song database 108 for the desired song title 112, then the song title 112 corresponding to the original musical recording 110 may be added to a slot row 305, forming part of the selection 114b of song titles 112. Here the retrieving and selecting teaches obtaining); and
registering the reference audio based on storing the reference audio in the reference database and storing the rights metadata in a rights database (See Guralnick: col. 19, lines 32-34 and col. 37, lines 9-11 and 22-24, the song title information stored in the song recording metadata may be accessed to identify the song recording data of the original musical recording; the user may then set the song title 112 searching through the song database 108 for the desired song title 112, then the song title 112 corresponding to the original musical recording 110 may be added to a slot row 305, forming part of the selection 114b of song titles 112. Here the song title is a data 

As per claim 19, Mysore in view of Guralnick teaches the method of claim 17, wherein the content source is at least one of (i) a stream of a live broadcast, (ii) a music sharing site, (iii) a video sharing site, or (iv) a social networking feed, a post, update, or a tweet of a social network (See Guralnick: col. 4, lines 50-51, retrieving video data corresponding to said song recording data).

As per claim 20, Mysore in view of Guralnick teaches the method of claim 17, wherein the one or more processors are to select the content source based on at least one of information provided by a right holder associated with the query audio, the rights metadata, a popularity of the query content source, or a likelihood of the query content source having potentially unlicensed cover songs (See Guralnick: col. 36, lines 13-16: The video source 140 may also receive instructions to retrieve certain video files from memory and stream the video files as video data (in, e.g. a digital compressed or uncompressed format) to a screen display 102.).

As per claim 21, Mysore in view of Guralnick teaches the method of claim 17, further including:

binarizing the constant Q transformed reference time slices (See Mysore: [0032], algorithms or symbolic representations of operations on binary digital signals stored within a memory and manipulating or transforming signals, typically represented as physical electronic or magnetic quantities within memories);
executing a two-dimensional Fourier transform on reference time windows within the binarized and constant Q transformed reference time slices to generate two-dimensional Fourier transforms of the reference time windows (See Mysore: [0044]-[0045], the spectrogram may be a spectrogram generated as the magnitude of the short time Fourier transform (STFT) of a signal and construct a dictionary for each segment of the spectrogram. The various segments may be, for example, time frames of the spectrogram); and
generating the reference data structure by sequentially ordering the two-dimensional Fourier transforms of the reference time windows (See Mysore: [0044]-

As per claims 1-5, the claims recite an apparatus comprising memory and one or more processors to execute instructions (Mysore: [0034], program instructions 140 that may be executable by the processor(s) 110 to implement aspects of the techniques described herein may be partly or fully resident within the memory 120) to perform the steps of the methods as recited in claims 17-21 above, respectively, and rejected under 35 U.S.C. § 103 as unpatentable over Mysore in view of Guralnick.
Therefore, claims 1-5 are rejected along the same rationale that rejected claims 17-21, respectively.

As per claims 9-13, the claims recite a non-transitory machine-readable medium comprising instructions that, when executed, cause one or more processors (Mysore: [0034], program instructions 140 that may be executable by the processor(s) 110 to implement aspects of the techniques described herein may be partly or fully resident within the memory 120) to perform the steps of the methods as recited in claims 17-21 above, respectively, and rejected under 35 U.S.C. § 103 as unpatentable over Mysore in view of Guralnick.
Therefore, claims 9-13 are rejected along the same rationale that rejected claims 17-21, respectively.

6.2. Claims 22, 6 and 14 are rejected under 35 U.S.C. § 103 as being unpatentable over Mysore, in view of Guralnick, as applied to claims 17, 1 and 9 above, and further in view of


Harutyunyan et al.: "METHODS AND SYSTEMS TO IDENTIFY ANOMALOUS BEHAVING COMPONENTS OF A DISTRIBUTED COMPUTING SYSTEM", (U.S. Patent Application Publication 20180165142 A1, filed December 12, 2016 and published June 14, 2018, hereafter "Harutyunyan").

As per claim 22, Mysore in view of Guralnick does not explicitly teach the method of claim 17, further including: generating a similarity matrix that indicates degrees to which reference portions of the reference data structure are similar to query portions of the query data structure.
However, Harutyunyan teaches the method of claim 17, further including:
generating a similarity matrix that indicates degrees to which reference portions of the reference data structure are similar to query portions of the query data structure (See FIG. 27C and [0107], a similarity matrix of similarities calculated for each pair of the seven event sources; for example, the similarity between (ES_B, ES_F) and ES_A is 0.5 and the similarity between (ES_E, ES_E) and ES_E is 0.333, as revealed by the corresponding matrix elements).
It would have been obvious to a person of ordinary skill in the computer art before the effective filing date of the claimed invention to combine Harutyunyan's teaching with the Mysore in view of Guralnick reference. Mysore is dedicated to differentiating between constituent sound sources, Guralnick focuses on sophisticated looping sequences, and Harutyunyan is dedicated to identifying anomalous behaving components of a distributed computing system. The combined teaching would have enabled the Mysore in view of Guralnick reference to use similarity matrix techniques to differentiate between constituent sound sources.
Mysore in view of Guralnick and further in view of Harutyunyan teaches the following:
computing a distance measure between the query data structure and the reference data structure based on the similarity matrix (See Harutyunyan: Fig. 32 and [0128], a k-distance neighborhood is determined for each event source based on the k-th nearest neighbor distance of the event source); and
association in a database between the reference audio and the query audio based on a computed distance measure, the association identifying the query audio as the cover rendition (See Harutyunyan: [0072] and [0111], the distance is calculated between each pair of event sources in the cluster C; and the administration computer 1412 collects and may store the received event messages in a data-storage device or appliance 1418 as event logs 1420-1424.).


As per claim 6, the claim recites an apparatus comprising memory and one or more processors to execute instructions (Mysore: [0034], program instructions 140 that may be executable by the processor(s) 110 to implement aspects of the techniques described herein may be partly or fully resident within the memory 120) to perform the steps of the method as recited in claim 22 above, and rejected under 35 U.S.C. § 103 as unpatentable over Mysore in view of Guralnick and further in view of Harutyunyan.
Therefore, claim 6 is rejected along the same rationale that rejected claim 22.

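As per claim 14, the claim recites a non-transitory machine-readable medium comprising instructions that, when executed, cause one or more processors (Mysore: [0034], program instructions 140 that may be executable by the processor(s) 110 to implement aspects of the techniques described herein may be partly or fully resident within the memory 120) to perform the steps of the method as recited in claim 22 above, and rejected under 35 U.S.C. § 103 as unpatentable over Mysore in view of Guralnick and further in view of Harutyunyan.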
Therefore, claim 14 is rejected along the same rationale that rejected claim 22.

6.3. Claims 24, 8 and 16 are rejected under 35 U.S.C. § 103 as being unpatentable over Mysore, in view of Guralnick, as applied to claims 17, 1 and 9 above, and further in view of


Wong et al.: "METHOD OF AND APPARATUS FOR GENERATING A DEPTH MAP UTILIZED IN AUTOFOCUSING", (U.S. Patent Application Publication 20070297784 A1, filed June 22, 2006 and published December 27, 2007, hereafter "Wong").

As per claim 24, Mysore in view of Guralnick teaches the method of claim 17, further including:
grouping the binarized and constant Q transformed query time slices of the query audio into the query time windows prior to the executing of the two-dimensional Fourier transform on the query time windows, the query time windows including overlapping query time windows of uniform duration (See Mysore: [0044]-[0045], the 
Mysore in view of Guralnick does not explicitly teach applying a blur algorithm to the two-dimensional Fourier transforms of the query time windows prior to the sequential ordering of the two-dimensional Fourier transforms in the query data structure.
However, Wong teaches applying a blur algorithm to the two-dimensional Fourier transforms of the query time windows prior to the sequential ordering of the two-dimensional Fourier transforms in the query data structure (See [0005] and [0046], Computing the two-dimensional Fourier transforms of recorded images and implementing a pillbox blur based imaging system using an algorithm based on a Gaussian blur approximation).
It would have been obvious to a person of ordinary skill in the computer art before the effective filing date of the claimed invention to combine Wong's teaching 

As per claim 8, the claim recites an apparatus comprising memory and one or more processors to execute instructions (Mysore: [0034], program instructions 140 that may be executable by the processor(s) 110 to implement aspects of the techniques described herein may be partly or fully resident within the memory 120) to perform the steps of the method as recited in claim 24 above, and rejected under 35 U.S.C. § 103 as unpatentable over Mysore in view of Guralnick and further in view of Wong.
Therefore, claim 8 is rejected along the same rationale that rejected claim 24.

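As per claim 16, the claim recites a non-transitory machine-readable medium comprising instructions that, when executed, cause one or more processors (Mysore: [0034], program instructions 140 that may be executable by the processor(s) 110 to implement aspects of the techniques described herein may be partly or fully resident within the memory 120) to perform the steps of the method as recited in claim 24 above, and rejected under 35 U.S.C. § 103 as unpatentable over Mysore in view of Guralnick and further in view of Wong.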
Therefore, claim 16 is rejected along the same rationale that rejected claim 24.
Allowable Subject Matter
7. Claims 23, 7 and 15 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
References
8.1. The prior art made of record:
A. U.S. Patent US-9880805-B1.
F. U.S. Patent Application Publication US-20130132077-A1.
G. U.S. Patent Application Publication US-20180165142-A1.
H. U.S. Patent Application Publication US-20070297784-A1.
8.2. The prior art made of record and not relied upon is considered pertinent to applicant's disclosure: 
B. U.S. Patent Application Publication US-20140185815-A1.
C. U.S. Patent Application Publication US-20160012857-A1.
D. U.S. Patent Application Publication US-20150094835-A1.
E. U.S. Patent Application Publication US-20110058685-A1.
Conclusion
9.1. THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
9.2. Examiner has cited particular columns and line numbers in the references applied to the claims above for the convenience of the applicant. Although the specified citations are representative of the teachings of the art and are applied to specific limitations within the individual claim, other passages and figures may apply as well. In preparing responses, the applicant is respectfully requested to fully consider the references in their entirety as potentially teaching all or part of the claimed invention, as well as the context of the passage as taught by the prior art or disclosed by the Examiner. SEE MPEP 2141.02 [R-5] VI. PRIOR ART MUST BE CONSIDERED IN ITS ENTIRETY, INCLUDING DISCLOSURES THAT TEACH AWAY FROM THE CLAIMS: A prior art reference 
9.3. When amending the claimed invention, Applicant is respectfully requested to indicate the portion(s) of the specification that dictate the structure relied on for proper interpretation, and to verify and ascertain the metes and bounds of the claimed invention. 
Contact Information
10. Any inquiry concerning this communication or earlier communications from the Examiner should be directed to KUEN S LU whose telephone number is (571)272-4114. The examiner can normally be reached on M-F, 8-19, Mid-Flex 2 hours.
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Mrs. Tamara T Kyle, can be reached at 571-272-4241. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you 
KUEN S LU  /Kuen S Lu/
Art Unit 2156
Primary Patent Examiner
March 8, 2022