DETAILED ACTION 
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This action is in reply to the communications filed on July 1, 2022.  The Applicant’s Amendment and Request for Reconsideration has been received and entered. 
	Claims 1-4, 7-9 and 11-18 are currently pending and have been examined. Claims 1, 9, and 15 have been amended.  Claims 5, 6, and 10 have been cancelled.

Response to Arguments
Applicant’s amendments necessitated the new grounds of rejection.
The previous rejection of claims 1-4, 7-9 and 11-16 under 35 USC 101 has been withdrawn in view of Applicant’s amendments.
Applicant’s remaining arguments have been fully considered but they are not persuasive.  Particularly, Applicant’s arguments are directed to the instantly amended claims, and are thus moot in view of the new grounds of rejection. 



Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 1-4, 7-9 and 11-18 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor, or for pre-AIA, the applicant, regards as the invention.
Claim 1, and similarly claims 9 and 15, recite “store in a map representing at least one of the first and second versions, an indication of the temporal location of the matching region and/or a region of difference, depending, respectively, on whether a matching region is found or not found.”  As recited, this limitation is unclear because the “and” branch of “and/or” would require an indication of the temporal location of the matching region AND a region of difference to both be stored in the map, while the word “respectively” ties each stored item to the mutually exclusive outcomes of a matching region being found or not found.  Generally, an indication of the temporal location of the matching region or a region of difference would be stored in a map, depending, respectively, on whether a matching region is found or not found.  For examination purposes, the Examiner has interpreted this limitation as merely requiring storing, in a map representing at least one of the first and second versions, an indication of the temporal location of the matching region or a region of difference.
Claims 2-4, 7, 8, and 16-18 depend from claim 1 and thus inherit the deficiencies of claim 1.
Claims 11-14 depend from claim 9 and thus inherit the deficiencies of claim 9.
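
For illustration only, the interpretation adopted above can be sketched as a map in which each stored entry is either a matching region or a region of difference, never both. This is a minimal sketch under stated assumptions, not the Applicant's implementation: the function name `build_map`, the tuple layout, and the labels "match" and "difference" are hypothetical and introduced here, and the matching regions are assumed not to overlap.

```python
def build_map(duration, matching_regions):
    """Return (start, end, label) entries covering [0, duration).

    Assumption for illustration: matching_regions is a list of
    non-overlapping (start, end) spans within one version.
    """
    entries = []
    cursor = 0
    for start, end in sorted(matching_regions):
        if start > cursor:
            # Nothing matched between the previous region and this one.
            entries.append((cursor, start, "difference"))
        entries.append((start, end, "match"))
        cursor = max(cursor, end)
    if cursor < duration:
        # Trailing span after the last matching region.
        entries.append((cursor, duration, "difference"))
    return entries
```

Each temporal span receives exactly one label, which corresponds to the "or" reading applied for examination purposes.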


Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1-4, 7-9, and 11-15 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Wang (US PGP 2014/0280127).
As per claim 1, Wang teaches [a]n apparatus to map the differences between versions of an audiovisual product comprising: 
a processing engine configured to perform a process to: (Wang: Para [0006])
generate: 
a plurality of audio fingerprints each associated with a respective temporal location within a first version of the audiovisual product, and  (Wang: Fig. 5; Para [0054]-[0065]; Fig. 3 (302); Para [0037]; Para [0018]) 
a plurality of audio fingerprints each associated with a respective temporal location within a second version of the audiovisual product, (Wang: Fig. 3; Para [0031]-[0033] (At block 204, the method 200 includes performing a content recognition of the sample of media content using a data file including a concatenation of representations for each of a plurality of media content recordings. The concatenation may include a plurality of respective representations (e.g., fingerprints or set of fingerprints) per media content recording and arranged in sequential time order per media content recording in the data file. A representation for a given media content recording may include a set of fingerprints determined or extracted at respective landmark positions within the given media content recording, and each fingerprint corresponds to a global position within the data file. The data file also may have associated identifiers per groupings of representations (e.g., per sets of fingerprints) for each of the plurality of media content recordings. In an example where the media content recordings include songs, the identifiers may include any of a title of a song, an artist, genre, etc.); Para [0037]; Fig. 5; Para [0054]-[0065]; Para [0018]);
wherein each audio fingerprint comprises a fingerprint hash of a section of audio at the respective associated temporal location, the fingerprint hash being based on a frequency characteristic of the section; (Wang: Fig. 5; Para [0054]-[0065] (At block 502, the method 500 includes determining fingerprints in the data file that substantially match to one or more fingerprints of the sample of media content. Fingerprints of the received sample of media content are created by processing a query media sample into a set of sample landmark and fingerprint pairs. The sample fingerprints are then used to retrieve matching KV pairs in the KV data file of concatenated media content, where the key K is a fingerprint and the value V is the payload, which in this case is a concatenated global position value. At block 504, the method 500 includes pairing corresponding global positions of the substantially matching fingerprints with corresponding respective landmark positions of the one or more fingerprints in the sample of media content to provide global position-landmark position pairs. Thus, a retrieved global position value is paired with the sample landmark value. At block 508, the method 500 includes determining clusters of the global position-landmark position pairs that are substantially linearly related (or have some associated temporal correspondence). As one example, to verify if there is a match, a histogram scan can be performed to search for a significant peak in the sorted time offset difference data (e.g., number of data points occurring within a predetermined window width or number of points in a histogram bin). A presence of a peak in the number of points above a threshold within a window or bin can be interpreted as evidence for a match. 
Each occurrence of a significant peak in the long concatenated timeline of time offset differences indicates a candidate match, and candidate matches may be further processed individually to ascertain whether the candidates matches are exact, possibly using a different algorithm to verify a match. As one example, the time offset differences may be filtered using a predetermined window width of a few milliseconds.); Fig. 3 (302); Para [0037] (FIG. 3 illustrates a diagram of an example method to form a concatenation of representations of media content recordings. Generally, media content can be identified by computing characteristics or fingerprints of a media sample and comparing the fingerprints to previously identified fingerprints of reference media files. Particular locations within the sample at which fingerprints are computed may depend on reproducible points in the sample. Such reproducibly computable locations are referred to as “landmarks.” One landmarking technique, known as Power Norm, is to calculate an instantaneous power at many time points in the recording and to select local maxima. One way of doing this is to calculate an envelope by rectifying and filtering a waveform directly. FIG. 3 illustrates a media content recording being input to a fingerprint extractor 302 (or fingerprint generator) that is configured to determine fingerprints of the media content recording. An example plot of dB (magnitude) of a sample vs. time is shown, and the plot illustrates a number of identified landmark positions (L1 to L8). Once the landmarks have been determined, the fingerprint extractor 302 is configured to compute a fingerprint at or near each landmark time point in the recording. The fingerprint is generally a value or set of values that summarizes a set of features in the recording at or near the landmark time point. In one example, each fingerprint is a single numerical value that is a hashed function of multiple features. 
Other examples of fingerprints include spectral slice fingerprints, multi-slice fingerprints, LPC coefficients, cepstral coefficients, and frequency components of spectrogram peaks.); Para [0018]);
identify an audio fingerprint associated with a first temporal location within the first version of an audiovisual product; (Wang: Fig. 5; Para [0054]-[0065] (At block 502, the method 500 includes determining fingerprints in the data file that substantially match to one or more fingerprints of the sample of media content. Fingerprints of the received sample of media content are created by processing a query media sample into a set of sample landmark and fingerprint pairs. The sample fingerprints are then used to retrieve matching KV pairs in the KV data file of concatenated media content, where the key K is a fingerprint and the value V is the payload, which in this case is a concatenated global position value. At block 504, the method 500 includes pairing corresponding global positions of the substantially matching fingerprints with corresponding respective landmark positions of the one or more fingerprints in the sample of media content to provide global position-landmark position pairs. Thus, a retrieved global position value is paired with the sample landmark value. At block 508, the method 500 includes determining clusters of the global position-landmark position pairs that are substantially linearly related (or have some associated temporal correspondence). As one example, to verify if there is a match, a histogram scan can be performed to search for a significant peak in the sorted time offset difference data (e.g., number of data points occurring within a predetermined window width or number of points in a histogram bin). A presence of a peak in the number of points above a threshold within a window or bin can be interpreted as evidence for a match. Each occurrence of a significant peak in the long concatenated timeline of time offset differences indicates a candidate match, and candidate matches may be further processed individually to ascertain whether the candidates matches are exact, possibly using a different algorithm to verify a match. 
As one example, the time offset differences may be filtered using a predetermined window width of a few milliseconds.); Fig. 3 (302); Para [0037] (FIG. 3 illustrates a diagram of an example method to form a concatenation of representations of media content recordings. Generally, media content can be identified by computing characteristics or fingerprints of a media sample and comparing the fingerprints to previously identified fingerprints of reference media files. Particular locations within the sample at which fingerprints are computed may depend on reproducible points in the sample. Such reproducibly computable locations are referred to as “landmarks.” One landmarking technique, known as Power Norm, is to calculate an instantaneous power at many time points in the recording and to select local maxima. One way of doing this is to calculate an envelope by rectifying and filtering a waveform directly. FIG. 3 illustrates a media content recording being input to a fingerprint extractor 302 (or fingerprint generator) that is configured to determine fingerprints of the media content recording. An example plot of dB (magnitude) of a sample vs. time is shown, and the plot illustrates a number of identified landmark positions (L1 to L8). Once the landmarks have been determined, the fingerprint extractor 302 is configured to compute a fingerprint at or near each landmark time point in the recording. The fingerprint is generally a value or set of values that summarizes a set of features in the recording at or near the landmark time point. In one example, each fingerprint is a single numerical value that is a hashed function of multiple features. Other examples of fingerprints include spectral slice fingerprints, multi-slice fingerprints, LPC coefficients, cepstral coefficients, and frequency components of spectrogram peaks.); Para [0018]); 
search the plurality of audio fingerprints each associated with a respective temporal location within the second version of an audiovisual product for an audio fingerprint associated with a temporal location within a second version of the audiovisual product which matches the audio fingerprint associated with the first temporal location within the first version, to identify a first matching pair of audio fingerprints in the first version and the second version of the audiovisual product (Wang: Fig. 5; Para [0054]-[0060] (At block 502, the method 500 includes determining fingerprints in the data file that substantially match to one or more fingerprints of the sample of media content. Fingerprints of the received sample of media content are created by processing a query media sample into a set of sample landmark and fingerprint pairs. The sample fingerprints are then used to retrieve matching KV pairs in the KV data file of concatenated media content, where the key K is a fingerprint and the value V is the payload, which in this case is a concatenated global position value. At block 504, the method 500 includes pairing corresponding global positions of the substantially matching fingerprints with corresponding respective landmark positions of the one or more fingerprints in the sample of media content to provide global position-landmark position pairs. Thus, a retrieved global position value is paired with the sample landmark value.))
when a first matching pair of audio fingerprints is found, (Wang: Fig. 5; Para [0054]-[0060] (At block 502, the method 500 includes determining fingerprints in the data file that substantially match to one or more fingerprints of the sample of media content. Fingerprints of the received sample of media content are created by processing a query media sample into a set of sample landmark and fingerprint pairs. The sample fingerprints are then used to retrieve matching KV pairs in the KV data file of concatenated media content, where the key K is a fingerprint and the value V is the payload, which in this case is a concatenated global position value. At block 504, the method 500 includes pairing corresponding global positions of the substantially matching fingerprints with corresponding respective landmark positions of the one or more fingerprints in the sample of media content to provide global position-landmark position pairs. Thus, a retrieved global position value is paired with the sample landmark value.))
determine whether a region comprising the first temporal location associated with the audio fingerprint within the first version matches a corresponding region comprising the temporal location associated with the audio fingerprint within the second version by comparing the versions progressively away from the respective temporal locations to identify at least one further matching pair of audio fingerprints and thereby identify a matching region comprising the first matching pair of audio fingerprints and the at least one further matching pair of audio fingerprints in the first version and the second version of the audiovisual product; and (Wang: Fig. 5; Para [0054]-[0060] (At block 502, the method 500 includes determining fingerprints in the data file that substantially match to one or more fingerprints of the sample of media content. Fingerprints of the received sample of media content are created by processing a query media sample into a set of sample landmark and fingerprint pairs. The sample fingerprints are then used to retrieve matching KV pairs in the KV data file of concatenated media content, where the key K is a fingerprint and the value V is the payload, which in this case is a concatenated global position value. At block 504, the method 500 includes pairing corresponding global positions of the substantially matching fingerprints with corresponding respective landmark positions of the one or more fingerprints in the sample of media content to provide global position-landmark position pairs. Thus, a retrieved global position value is paired with the sample landmark value. A time offset between the two positions may then be determined, for each global position-landmark position pair, by subtracting the global position value from the sample landmark value for matching fingerprints. 
Instead of storing the time offset pair differences (generated by subtracting corresponding time offsets from matching sample versus reference fingerprints) into many buckets where each bucket corresponds to a sound_ID index, all time offset differences can be stored in a single bucket. At block 506, the method 500 includes sorting the global position-landmark position pairs. In other examples, the method 500 may include sorting the time offset differences generated from the global position-landmark position pairs. At block 508, the method 500 includes determining clusters of the global position-landmark position pairs that are substantially linearly related (or have some associated temporal correspondence). As one example, to verify if there is a match, a histogram scan can be performed to search for a significant peak in the sorted time offset difference data (e.g., number of data points occurring within a predetermined window width or number of points in a histogram bin). A presence of a peak in the number of points above a threshold within a window or bin can be interpreted as evidence for a match. Each occurrence of a significant peak in the long concatenated timeline of time offset differences indicates a candidate match, and candidate matches may be further processed individually to ascertain whether the candidates matches are exact, possibly using a different algorithm to verify a match.); Fig. 6; Para [0061] (Initially, fingerprint and landmark pairs (F1/L1, F2/L2, . . . , Fn/Ln) can be determined and the fingerprints can be used to find matching fingerprints within the concatenated data file of known media content recordings. Global positions within the data file can be paired with landmarks in the sample for matching fingerprints. A scatter plot of landmarks of the sample and global positions of the known reference files can be determined. 
After generating a scatter plot, clusters of landmark pairs having linear correspondences can be identified, and the clusters can be scored according to the number of pairs that are linearly related.))
store in a map representing at least one of the first and second versions, an indication of the temporal location of the matching region and/or a region of difference, depending, respectively, on whether a matching region is found or not found; and (Wang: Para [0057]-[0058] (At block 508, the method 500 includes determining clusters of the global position-landmark position pairs that are substantially linearly related (or have some associated temporal correspondence). As one example, to verify if there is a match, a histogram scan can be performed to search for a significant peak in the sorted time offset difference data (e.g., number of data points occurring within a predetermined window width or number of points in a histogram bin). A presence of a peak in the number of points above a threshold within a window or bin can be interpreted as evidence for a match. Each occurrence of a significant peak in the long concatenated timeline of time offset differences indicates a candidate match, and candidate matches may be further processed individually to ascertain whether the candidates matches are exact, possibly using a different algorithm to verify a match. As one example, the time offset differences may be filtered using a predetermined window width of a few milliseconds. At block 510, the method 500 includes identifying a matching media content recording to the sample of media content as a media content recording having a cluster with a largest number of global position-landmark position pairs that are substantially linearly related. Thus, the candidate match that has the most time offset differences within a predetermined window width can be deemed the winning matching file, for example.); Fig. 6; Para [0061] (Initially, fingerprint and landmark pairs (F1/L1, F2/L2, . . . , Fn/Ln) can be determined and the fingerprints can be used to find matching fingerprints within the concatenated data file of known media content recordings. 
Global positions within the data file can be paired with landmarks in the sample for matching fingerprints. A scatter plot of landmarks of the sample and global positions of the known reference files can be determined. After generating a scatter plot, clusters of landmark pairs having linear correspondences can be identified, and the clusters can be scored according to the number of pairs that are linearly related.); Para [0032] (In one example, the content recognition can be performed by determining a representation in the data file that matches to a portion of the sample of media content, and then to identify a mapping between the matching portion in the data file and an identifier for a respective media content recording. The mapping may be between a global position of the representation in the data file and the identifier.); Fig. 5; Para [0054]-[0060] (In some examples, the method 500 may further include determining a sound identifier of the matching media content recording based on the corresponding global position of the substantially matching fingerprints in the data file. For example, global positions of representations of the given media content recording in the data file can be associated or mapped to respective sound identifiers, and the mapping may be referenced when a winning global position is identified. At block 504, the method 500 includes pairing corresponding global positions of the substantially matching fingerprints with corresponding respective landmark positions of the one or more fingerprints in the sample of media content to provide global position-landmark position pairs. Thus, a retrieved global position value is paired with the sample landmark value. A time offset between the two positions may then be determined, for each global position-landmark position pair, by subtracting the global position value from the sample landmark value for matching fingerprints. 
Instead of storing the time offset pair differences (generated by subtracting corresponding time offsets from matching sample versus reference fingerprints) into many buckets where each bucket corresponds to a sound_ID index, all time offset differences can be stored in a single bucket.); Fig. 6; Para [0061] (Initially, fingerprint and landmark pairs (F1/L1, F2/L2, . . . , Fn/Ln) can be determined and the fingerprints can be used to find matching fingerprints within the concatenated data file of known media content recordings. Global positions within the data file can be paired with landmarks in the sample for matching fingerprints. A scatter plot of landmarks of the sample and global positions of the known reference files can be determined. After generating a scatter plot, clusters of landmark pairs having linear correspondences can be identified, and the clusters can be scored according to the number of pairs that are linearly related.); Claim 1 (storing, by the computing device, a mapping between an identifier for a respective media content recording and a global position in the data file that corresponds to the representation of the respective media content recording.))
repeat the process by identifying further first matching pairs of audio fingerprints associated with temporal locations outside of any matching region(s), identifying further matching regions and region(s) of difference between the versions of the audiovisual product based on said further first matching pairs of audio fingerprints, and updating the map with indications of the temporal locations of the further matching regions and region(s) of difference between the versions of the audiovisual product. (Wang: Para [0032]; Fig. 5; Para [0054]-[0060] (At block 504, the method 500 includes pairing corresponding global positions of the substantially matching fingerprints with corresponding respective landmark positions of the one or more fingerprints in the sample of media content to provide global position-landmark position pairs. Thus, a retrieved global position value is paired with the sample landmark value. A time offset between the two positions may then be determined, for each global position-landmark position pair, by subtracting the global position value from the sample landmark value for matching fingerprints. Instead of storing the time offset pair differences (generated by subtracting corresponding time offsets from matching sample versus reference fingerprints) into many buckets where each bucket corresponds to a sound_ID index, all time offset differences can be stored in a single bucket.  At block 508, the method 500 includes determining clusters of the global position-landmark position pairs that are substantially linearly related (or have some associated temporal correspondence). As one example, to verify if there is a match, a histogram scan can be performed to search for a significant peak in the sorted time offset difference data (e.g., number of data points occurring within a predetermined window width or number of points in a histogram bin). 
A presence of a peak in the number of points above a threshold within a window or bin can be interpreted as evidence for a match. Each occurrence of a significant peak in the long concatenated timeline of time offset differences indicates a candidate match, and candidate matches may be further processed individually to ascertain whether the candidates matches are exact, possibly using a different algorithm to verify a match. As one example, the time offset differences may be filtered using a predetermined window width of a few milliseconds.); Fig. 6; Para [0061]-[0065] (A scatter plot of landmarks of the sample and global positions of the known reference files can be determined. After generating a scatter plot, clusters of landmark pairs having linear correspondences can be identified, and the clusters can be scored according to the number of pairs that are linearly related. The offset values may be differences between landmark time positions and the global positions where a fingerprint matches. FIG. 6 illustrates an example histogram of offset values. The reference file may be given a score that is related to the number of points in a peak of the histogram (e.g., score=28 in FIG. 6). The entire concatenated data file may be processed in this manner using a single bulk operation to determine histogram peaks and a score for each peak, and the media content recording corresponding to the global position resulting in the highest score may be determined to be a match to the sample.); Claim 1)
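
For illustration only, the offset-histogram matching described in the cited passages of Wang (Para [0054]-[0060]; Fig. 6) can be sketched as follows. This is a minimal sketch, not Wang's implementation: the function names `build_index` and `best_offset`, the `bin_width` parameter, and the representation of fingerprints as (position, hash) tuples are assumptions introduced here.

```python
from collections import Counter, defaultdict


def build_index(reference):
    """Key-value index: fingerprint hash -> global positions in the reference."""
    index = defaultdict(list)
    for position, fingerprint in reference:
        index[fingerprint].append(position)
    return index


def best_offset(sample, index, bin_width=1):
    """Return (offset_bin, score) for the fullest bin of time offset differences.

    sample is a list of (landmark, fingerprint) pairs. The score is the
    number of matching fingerprint pairs that share one offset bin.
    """
    histogram = Counter()
    for landmark, fingerprint in sample:
        for position in index.get(fingerprint, ()):
            # Offset: global position subtracted from the sample landmark.
            histogram[(landmark - position) // bin_width] += 1
    if not histogram:
        return None, 0
    return histogram.most_common(1)[0]
```

A peak whose score exceeds a chosen threshold would be treated as evidence of a match; the threshold itself is left to the caller in this sketch.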

As per claim 2, Wang teaches wherein if no further matching pair of audio fingerprints is identified within a certain temporal distance of the temporal location of the second version then there is no matching region.  (Wang: Fig. 5; Para [0054]-[0060]; Fig. 6; Para [0061]-[0065])

As per claim 3, Wang teaches wherein if at least one further matching pair of audio fingerprints is identified within a certain temporal distance of the temporal location of the second version then there is a matching region.  (Wang: Fig. 5; Para [0054]-[0060]; Fig. 6; Para [0061]-[0065])

As per claim 4, Wang teaches, wherein the matching region grows until no further matching pair of audio fingerprints are found within a threshold temporal distance.  (Wang: Fig. 5; Para [0054]-[0060]; Fig. 6; Para [0061]-[0065])
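
For illustration only, the region behavior recited in claims 2-4 can be sketched as growing a region outward from a seed matching pair until the next matching pair lies beyond a threshold temporal distance. This is a minimal sketch under stated assumptions and is neither Wang's disclosure nor the Applicant's implementation: the function name `grow_region`, the `max_gap` parameter, and the use of a sorted list of match times in one version are hypothetical.

```python
def grow_region(match_times, seed_index, max_gap):
    """Grow a matching region around match_times[seed_index].

    match_times: sorted temporal locations at which matching pairs of
    fingerprints were found. Returns (start, end), or None when no
    further matching pair lies within max_gap of the seed.
    """
    lo = hi = seed_index
    # Compare progressively earlier locations.
    while lo > 0 and match_times[lo] - match_times[lo - 1] <= max_gap:
        lo -= 1
    # Compare progressively later locations.
    while hi < len(match_times) - 1 and match_times[hi + 1] - match_times[hi] <= max_gap:
        hi += 1
    if lo == hi:
        return None  # isolated seed: no matching region
    return match_times[lo], match_times[hi]
```

An isolated seed yields no region (claim 2), one or more neighbors within the threshold yield a region (claim 3), and growth stops at the first gap larger than the threshold (claim 4).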

As per claim 7, Wang teaches wherein a matching pair of audio fingerprints is found when a threshold number of audio fingerprint hashes match between the respective audio fingerprints.  (Wang: Para [0037], Fig. 5; Para [0054]-[0060]; Fig. 6; Para [0061]-[0065])

As per claim 8, Wang teaches wherein to identify an audio fingerprint associated with a temporal location within the first version of the audiovisual product, the processing engine is configured to randomly select an audio fingerprint associated with a temporal location within the first version of the audiovisual product.  (Wang: Fig. 3; Para [0037]-[0043])

As per claims 9 and 15, these claims recite limitations substantially similar to claim 1 and are therefore rejected in the same manner as this claim, as set forth above. 

As per claim 11, Wang teaches further comprising: 
adding additional content to the second version of the audiovisual product by: 
using the map to identify at least one of: 
new regions of the second version of the audiovisual product; and
matching regions of the second version of the audiovisual product; and
adding additional content to at least one of the new regions and the matching regions. (Wang: Fig. 5; Para [0054]-[0060]; Fig. 6; Para [0061]-[0065])

As per claim 12, Wang teaches wherein the audiovisual content comprises at least one of textual content and audio content.  (Wang: Para [0018]; Para [0023]; Fig. 5; Para [0054]-[0060])

As per claim 13, Wang teaches wherein the audio content comprises dubbing content. (Wang: Para [0018]; Para [0023])
Examiner note: While prior art has been applied, the Examiner notes that the audio content comprising dubbing content is merely nonfunctional descriptive material and is not functionally involved in the steps recited.  The searching and matching audiovisual content steps would be performed the same regardless of the specific type of audio content.  This descriptive material will not distinguish the claimed invention from the prior art in terms of patentability, see In re Gulack, 703 F.2d 1381, 1385, 217 USPQ 401 (Fed. Cir. 1983); In re Lowry, 32 F.3d 1579, 32 USPQ2d 1031 (Fed. Cir. 1994).

As per claim 14, Wang teaches wherein the textual content comprises at least one of captioning content and subtitling content. (Wang: Para [0018]; Para [0023])
Examiner note: While prior art has been applied, the Examiner notes that textual content comprising at least one of captioning content and subtitling content is merely nonfunctional descriptive material and is not functionally involved in the steps recited.  The searching and matching audiovisual content steps would be performed the same regardless of the specific type of textual content.  This descriptive material will not distinguish the claimed invention from the prior art in terms of patentability, see In re Gulack, 703 F.2d 1381, 1385, 217 USPQ 401 (Fed. Cir. 1983); In re Lowry, 32 F.3d 1579, 32 USPQ2d 1031 (Fed. Cir. 1994).


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.


Claim 16 is rejected under 35 U.S.C. 103 as being unpatentable over Wang in view of Holm (US PGP 2005/0065976).
As per claim 16, Wang teaches wherein generating each fingerprint hash comprises generating the peak frequencies and their associated times (Wang: Fig. 5; Para [0054]-[0065]; Fig. 3 (302); Para [0037] (FIG. 3 illustrates a diagram of an example method to form a concatenation of representations of media content recordings. Generally, media content can be identified by computing characteristics or fingerprints of a media sample and comparing the fingerprints to previously identified fingerprints of reference media files. Particular locations within the sample at which fingerprints are computed may depend on reproducible points in the sample. Such reproducibly computable locations are referred to as “landmarks.” One landmarking technique, known as Power Norm, is to calculate an instantaneous power at many time points in the recording and to select local maxima. One way of doing this is to calculate an envelope by rectifying and filtering a waveform directly. FIG. 3 illustrates a media content recording being input to a fingerprint extractor 302 (or fingerprint generator) that is configured to determine fingerprints of the media content recording. An example plot of dB (magnitude) of a sample vs. time is shown, and the plot illustrates a number of identified landmark positions (L1 to L8). Once the landmarks have been determined, the fingerprint extractor 302 is configured to compute a fingerprint at or near each landmark time point in the recording. The fingerprint is generally a value or set of values that summarizes a set of features in the recording at or near the landmark time point. In one example, each fingerprint is a single numerical value that is a hashed function of multiple features. Other examples of fingerprints include spectral slice fingerprints, multi-slice fingerprints, LPC coefficients, cepstral coefficients, and frequency components of spectrogram peaks.); Para [0018])
Wang does not explicitly disclose the following known technique which is taught by Holm:
by applying a Fast Fourier Transform to the respective section of audiovisual content. (Holm: Para [0033]-[0039] (The process starts, and in step 100, the fingerprint extraction engine 18 or a separate fourier transform engine (not shown) calculates a Fast Fourier Transform (FFT) or the like, of the audio signal of the preprocessed audio piece for transforming the signal waveform in the time domain into a signal in the frequency domain. Based on the FFT calculation, the fingerprint extraction engine 18 generates, in step 102, a T×F matrix A, where T≧F.))
This known technique is applicable to the method of Wang as they both share characteristics and capabilities, namely, they are directed to generating audio fingerprints. 
One of ordinary skill in the art before the effective filing date of the claimed invention would have recognized that applying the known technique of Holm would have yielded predictable results and resulted in an improved method.  It would have been recognized that applying the technique of Holm to the teachings of Wang would have yielded predictable results because the level of ordinary skill in the art demonstrated by the references applied shows the ability to incorporate such Fast Fourier Transform features into similar methods.  Further, applying the Fast Fourier Transform of Holm to the respective section of audiovisual content when generating each fingerprint hash of Wang, which comprises the peak frequencies and their associated times, would have been recognized by those of ordinary skill in the art as resulting in an improved method that provides a fingerprinting system capable of reliable, fast, and robust identification of audio pieces (Holm: Para [0005]).
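For illustration only, the combination discussed above (peak frequencies and their associated times obtained by applying a Fast Fourier Transform to sections of audio) can be sketched as follows. This is a minimal sketch and is not taken from Wang, Holm, or the application. The function names, the non-overlapping framing, the frame size of 256 samples, the 8000 Hz sample rate, and the two-tone test signal are all assumptions chosen for the example.

```python
import cmath
import math


def fft(samples):
    """Recursive radix-2 Cooley-Tukey FFT; len(samples) must be a power of two."""
    n = len(samples)
    if n == 1:
        return [complex(samples[0])]
    even = fft(samples[0::2])
    odd = fft(samples[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def peak_frequencies(signal, sample_rate, frame_size=256):
    """Return (time_seconds, peak_frequency_hz) for each non-overlapping frame."""
    peaks = []
    for start in range(0, len(signal) - frame_size + 1, frame_size):
        spectrum = fft(signal[start:start + frame_size])
        # Consider bins 1..N/2-1 only: skip DC and the mirrored upper half.
        magnitudes = [abs(c) for c in spectrum[1:frame_size // 2]]
        peak_bin = 1 + max(range(len(magnitudes)), key=magnitudes.__getitem__)
        peaks.append((start / sample_rate, peak_bin * sample_rate / frame_size))
    return peaks


# Hypothetical input: 0.5 s of a 1000 Hz tone followed by 0.5 s of a 2000 Hz tone.
SAMPLE_RATE = 8000
tone = [math.sin(2 * math.pi * (1000 if n < 4000 else 2000) * n / SAMPLE_RATE)
        for n in range(SAMPLE_RATE)]
peaks = peak_frequencies(tone, SAMPLE_RATE)
```

Each entry of `peaks` pairs a frame start time with the strongest frequency in that frame. A practical system would likely use overlapping, windowed frames and keep several peaks per frame; the sketch keeps one peak per frame for brevity.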


Claims 17 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Wang in view of Ye (US PGP 2016/0275588).
As per claim 17, Wang teaches wherein generating each audio fingerprint comprises generating the fingerprint hash by creating a single value . . . (Wang: Fig. 5; Para [0054]-[0065]; Fig. 3 (302); Para [0037] (FIG. 3 illustrates a media content recording being input to a fingerprint extractor 302 (or fingerprint generator) that is configured to determine fingerprints of the media content recording. An example plot of dB (magnitude) of a sample vs. time is shown, and the plot illustrates a number of identified landmark positions (L1 to L8). Once the landmarks have been determined, the fingerprint extractor 302 is configured to compute a fingerprint at or near each landmark time point in the recording. The fingerprint is generally a value or set of values that summarizes a set of features in the recording at or near the landmark time point. In one example, each fingerprint is a single numerical value that is a hashed function of multiple features. Other examples of fingerprints include spectral slice fingerprints, multi-slice fingerprints, LPC coefficients, cepstral coefficients, and frequency components of spectrogram peaks.); Para [0018])
Wang does not explicitly disclose the following known technique which is taught by Ye:
wherein generating each audio fingerprint comprises generating the fingerprint hash by creating a single value representing a pair of peak frequencies of the section of audio and the time difference between the pair of peak frequencies (Ye: Para [0131]-[0137] (By performing steps (1) and (2), for any peak feature point Sn(tk,fk), a paired peak feature point Sn(tb,fb) can be obtained, where n represents a sequence number of a phase channel or a sequence number of a time-frequency sub-diagram, and 0<n≦M; b represents a sequence number of the paired peak feature point in a peak feature point sequence n, and b is a positive integer; tb represents a time when the paired peak feature point Sn(tb,fb) appears in the nth time-frequency sub-diagram; and fb represents a frequency value of the paired peak feature point. In this embodiment, a four-tuple (tk,fk,Δfk,Δtk)n is defined to represent any peak feature point pair in a peak feature point pair sequence of any phase channel, where n represents a sequence number of a phase channel or a sequence number of a time-frequency sub-diagram; Δtk represents a time difference between two peak feature points in a peak feature point pair, and Δtk=tb−tk; and Δfk represents a frequency difference between two peak feature points in a peak feature point pair, and Δfk=fb−fk. Step 12): Perform a Hash operation according to a peak feature point pair corresponding to each time-frequency sub-diagram of the audio data to obtain the audio fingerprint.  As described in the foregoing, the four-tuple (tk,fk,Δfk,Δtk)n is used to represent any peak feature point pair in a peak feature point pair sequence of any phase channel. Parameters in the four-tuple may be understood as follows: (fk, Δfk,Δtk) represents a feature part of a peak feature point pair, and tk represents a time when (fk, Δfk, Δtk) appears and represents a collection timestamp. 
In this step, the Hash operation may be performed on (fk, Δfk, Δtk), (fk, Δfk,Δtk) is represented using a Hash code with a fixed bit quantity, as follows: hashcodek=H (fk, Δfk, Δtk). Through the calculation in this step, any peak feature point pair in a peak feature point pair sequence of any phase channel may be represented by (tk,hashcodek)n, where n represents a sequence number of a phase channel or a sequence number of a time-frequency sub-diagram, tk represents a time when hashcodek appears and (tk, hashcodek)n is an audio fingerprint and may represent a peak feature point pair. An audio fingerprint is represented by a collection timestamp and a Hash value.))
This known technique is applicable to the method of Wang as they both share characteristics and capabilities, namely, they are directed to generating audio fingerprints. 
One of ordinary skill in the art before the effective filing date of the claimed invention would have recognized that applying the known technique of Ye would have yielded predictable results and resulted in an improved method.  It would have been recognized that applying the technique of Ye to the teachings of Wang would have yielded predictable results because the level of ordinary skill in the art demonstrated by the references applied shows the ability to incorporate such features, namely a single value representing a pair of peak frequencies of the section of audio and the time difference between the pair of peak frequencies, into similar methods.  Further, generating the fingerprint hash of Wang by creating a single value representing a pair of peak frequencies of the section of audio and the time difference between the pair of peak frequencies would have been recognized by those of ordinary skill in the art as resulting in an improved method that would allow audio recognition to be performed accordingly to determine the matched channel identity (Ye: Para [0131]-[0137]).
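For illustration only, the limitation discussed above (a single value representing a pair of peak frequencies and the time difference between them) can be sketched as follows. This is a minimal sketch under stated assumptions and is not the hash function H of Ye or the method of the application. The bit-packing scheme, the field widths `freq_bits` and `dt_bits`, the `fan_out` pairing rule, and the example peaks are hypothetical. The sketch packs the two frequency bins directly, which carries the same information as the (fk, Δfk) form quoted from Ye.

```python
def pair_hash(f1_bin, f2_bin, dt_frames, freq_bits=10, dt_bits=12):
    """Pack two peak-frequency bins and their time difference into one integer."""
    if not (0 <= f1_bin < 1 << freq_bits and 0 <= f2_bin < 1 << freq_bits):
        raise ValueError("frequency bin out of range")
    if not 0 <= dt_frames < 1 << dt_bits:
        raise ValueError("time difference out of range")
    return (f1_bin << (freq_bits + dt_bits)) | (f2_bin << dt_bits) | dt_frames


def fingerprints(peaks, fan_out=3):
    """Pair each peak with up to `fan_out` later peaks.

    `peaks` is a list of (frame_index, frequency_bin) sorted by frame index.
    Returns (anchor_frame, hash) tuples, i.e. a timestamp plus a hash value.
    """
    result = []
    for i, (t1, f1) in enumerate(peaks):
        for t2, f2 in peaks[i + 1:i + 1 + fan_out]:
            result.append((t1, pair_hash(f1, f2, t2 - t1)))
    return result


# Hypothetical peaks as (frame index, frequency bin).
example = fingerprints([(0, 32), (4, 64), (9, 48)], fan_out=2)
```

Because the three fields occupy disjoint bit ranges, two peak pairs yield the same value only when both frequency bins and the time difference agree, which is the property that lets a single number stand in for the pair.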


As per claim 18, Wang teaches wherein each audio fingerprint comprises a plurality of fingerprint hashes of the section of audio at the respective associated temporal location, and generating each audio fingerprint comprising generating the plurality of fingerprint hashes by, for each fingerprint hash, creating a single value . . . (Wang: Fig. 5; Para [0054]-[0065]; Fig. 3 (302); Para [0037] (FIG. 3 illustrates a media content recording being input to a fingerprint extractor 302 (or fingerprint generator) that is configured to determine fingerprints of the media content recording. An example plot of dB (magnitude) of a sample vs. time is shown, and the plot illustrates a number of identified landmark positions (L1 to L8). Once the landmarks have been determined, the fingerprint extractor 302 is configured to compute a fingerprint at or near each landmark time point in the recording. The fingerprint is generally a value or set of values that summarizes a set of features in the recording at or near the landmark time point. In one example, each fingerprint is a single numerical value that is a hashed function of multiple features. Other examples of fingerprints include spectral slice fingerprints, multi-slice fingerprints, LPC coefficients, cepstral coefficients, and frequency components of spectrogram peaks.); Para [0018])
Wang does not explicitly disclose the following known technique which is taught by Ye:
. . .  for each fingerprint hash, creating a single value representing a pair of frequencies of the section of audio and the time difference between the pair of frequencies. (Ye: Para [0131]-[0137] (By performing steps (1) and (2), for any peak feature point Sn(tk,fk), a paired peak feature point Sn(tb,fb) can be obtained, where n represents a sequence number of a phase channel or a sequence number of a time-frequency sub-diagram, and 0<n≦M; b represents a sequence number of the paired peak feature point in a peak feature point sequence n, and b is a positive integer; tb represents a time when the paired peak feature point Sn(tb,fb) appears in the nth time-frequency sub-diagram; and fb represents a frequency value of the paired peak feature point. In this embodiment, a four-tuple (tk,fk,Δfk,Δtk)n is defined to represent any peak feature point pair in a peak feature point pair sequence of any phase channel, where n represents a sequence number of a phase channel or a sequence number of a time-frequency sub-diagram; Δtk represents a time difference between two peak feature points in a peak feature point pair, and Δtk=tb−tk; and Δfk represents a frequency difference between two peak feature points in a peak feature point pair, and Δfk=fb−fk. Step 12): Perform a Hash operation according to a peak feature point pair corresponding to each time-frequency sub-diagram of the audio data to obtain the audio fingerprint.  As described in the foregoing, the four-tuple (tk,fk,Δfk,Δtk)n is used to represent any peak feature point pair in a peak feature point pair sequence of any phase channel. Parameters in the four-tuple may be understood as follows: (fk, Δfk,Δtk) represents a feature part of a peak feature point pair, and tk represents a time when (fk, Δfk, Δtk) appears and represents a collection timestamp. 
In this step, the Hash operation may be performed on (fk, Δfk, Δtk), (fk, Δfk,Δtk) is represented using a Hash code with a fixed bit quantity, as follows: hashcodek=H (fk, Δfk, Δtk). Through the calculation in this step, any peak feature point pair in a peak feature point pair sequence of any phase channel may be represented by (tk,hashcodek)n, where n represents a sequence number of a phase channel or a sequence number of a time-frequency sub-diagram, tk represents a time when hashcodek appears and (tk, hashcodek)n is an audio fingerprint and may represent a peak feature point pair. An audio fingerprint is represented by a collection timestamp and a Hash value.))
This known technique is applicable to the method of Wang as they both share characteristics and capabilities, namely, they are directed to generating audio fingerprints. 
One of ordinary skill in the art before the effective filing date of the claimed invention would have recognized that applying the known technique of Ye would have yielded predictable results and resulted in an improved method.  It would have been recognized that applying the technique of Ye to the teachings of Wang would have yielded predictable results because the level of ordinary skill in the art demonstrated by the references applied shows the ability to incorporate such features, namely a single value representing a pair of frequencies of the section of audio and the time difference between the pair of frequencies, into similar methods.  Further, generating each of the fingerprint hashes of Wang by creating a single value representing a pair of frequencies of the section of audio and the time difference between the pair of frequencies would have been recognized by those of ordinary skill in the art as resulting in an improved method that would allow audio recognition to be performed accordingly to determine the matched channel identity (Ye: Para [0131]-[0137]).

	

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
Steele “Apple Rolls out More Accurate Itunes Matching for Apple Music.” Engadget, 13 May 2021 (https://www.engadget.com/2016/07/18/apple-music-itunes-match-update/) – matching audio fingerprints.

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JENNIFER V LEE whose telephone number is (571)272-4778. The examiner can normally be reached Monday - Friday 9AM - 5PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, JEFFREY A. SMITH can be reached on (571)272-6763. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/JENNIFER V LEE/ Examiner, Art Unit 3625
/Jeffrey A. Smith/ Supervisory Patent Examiner, Art Unit 3625