DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 18 - 20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter. The claims do not fall within at least one of the four categories of patent eligible subject matter because the claims recite a “computer-readable medium.” The phrase “computer-readable medium” is not defined in the specification. The specification discloses examples of computer-readable media: “Non-limiting examples of a computer-readable medium include a magnetic disk, a memory chip, a ROM, a RAM, an ASIC, optical storage, magnetic tape or other magnetic storage, or any other medium from which a processing device can read instructions.” (Spec., ¶ [0045]). However, the examples of computer-readable media provided by the specification do not limit the scope of the phrase “computer-readable medium.” The broadest reasonable interpretation of “computer-readable medium” covers both non-transitory tangible media and transitory propagating signals per se in view of the ordinary and customary meaning of computer-readable media, particularly when the specification is silent. See MPEP 2111.01. When the broadest reasonable interpretation of a claim covers a signal per se, the claim must be rejected under 35 U.S.C. § 101 as covering non-statutory subject matter. See Subject Matter Eligibility of Computer Readable Media, 1351 OG 212 (26 Jan 2010).
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1 - 5, 8, 13 - 16, and 18 - 20 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Jansen et al (U.S. PG Pub. No. 2020/0349921).
With regards to claim 1, Jansen discloses estimating, by a neural network, “feature vectors” comprising numerical representations of the audio portions, respectively, wherein the neural network (e.g., “artificial neural network”) is trained to estimate (using a “mapping”) the “feature vectors” based on a trained model such that distances between the feature vectors and a positive feature vector indicate a level of similarity between respective audio portions (e.g., “positive segment” and “negative segment”) and selected audio portions (e.g., “anchor segment”) of the audio recording at ¶¶ [0020]-[0021] and ¶¶ [0024]-[0025]: “[A] mapping (e.g., an artificial neural network, …) between segments of audio in an n-dimensional feature space, where proximity within the feature space represents semantic similarity between the contents of the audio segment… [A] mapping or other algorithm to map segments of the audio recording (e.g., segments of the audio recording that respectively contain the sounds 110a, 110b, 110c) to respective feature vectors in a semantically-encoded n-dimensional feature space.” See, also, ¶¶ [0027]-[0029]. Jansen further discloses the neural network uses the selected audio portions (e.g., “anchor segment”) to estimate the positive feature vector at ¶ [0022]; e.g.: “[A] positive segment is provided by randomly selecting an additional audio segment from the same audio recording as an anchor segment… [A] positive segment is sampled by generating a weighted combination of randomly-selected anchor and negative segments… [A] positive segment is 
Jansen discloses comparing the feature vectors associated with the audio portions, respectively, to the positive feature vector and to a negative feature vector representing negative samples to generate an audio score (distance) associated with the audio portions, respectively at ¶¶ [0029]-[0031]: “Thus, the level of semantic similarity or dissimilarity between the contents of segments of audio can be determined by determining the distance, within the semantically encoded n-dimensional feature space defined by such a mapping, between feature vectors determined by applying the mapping to spectrograms determined from the segments of audio.” See, also, ¶ [0036]-[0040]: “It can be difficult to define the quantitative degree to which pairs of audio segments are more or less semantically alike. It may be more straightforward to define whether a particular audio segment (an “anchor” segment) is more like a first audio segment (a “positive” segment) than a second audio segment (a “negative” segment)… A mapping can then be trained using such triplets such that the mapping defines an n-dimensional feature space such that feature vectors determined for the anchor segments of the training triplets are closer, within the n-dimensional feature space, to feature vectors determined for corresponding positive segments than to feature vectors determined for corresponding negative segments.”
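The distance comparison described in the quoted passages can be sketched as follows. This is an illustrative sketch only: the function names, the use of Euclidean distance, and the sign-based decision rule are assumptions made for the example and are not taken from Jansen or from the claims.

```python
import math


def audio_score(feature, positive, negative):
    """Hypothetical score: distance to the negative vector minus distance
    to the positive vector. A larger value means the portion lies closer
    to the positive feature vector in the feature space."""
    return math.dist(feature, negative) - math.dist(feature, positive)


def classify(features, positive, negative):
    """Label each audio portion by the sign of its score (assumed rule)."""
    labels = []
    for feature in features:
        score = audio_score(feature, positive, negative)
        labels.append("match" if score > 0 else "no match")
    return labels


if __name__ == "__main__":
    positive_vec = [1.0, 0.0]   # stands in for the positive feature vector
    negative_vec = [0.0, 1.0]   # stands in for the negative feature vector
    portions = [[0.9, 0.1], [0.1, 0.8]]
    print(classify(portions, positive_vec, negative_vec))
```

In this sketch the first portion is nearer the positive vector and the second is nearer the negative vector, so the two receive different labels.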
Jansen discloses classifying, with the audio scores (distances), a first subset of the audio portions into a first class (i.e., closer to positive) representing a match with the selected audio portions and a second subset of the audio portions into a second class representing no match (i.e., closer to negative) with the selected audio portions and outputting the classification of the audio portions at ¶ [0036]-[0040]: “It can be difficult to define the quantitative degree to which pairs of audio segments are more or less semantically alike. It may be more straightforward to define whether a particular audio segment (an “anchor” segment) is more like a first audio segment (a “positive” segment) than a second audio segment (a “negative” 
With regards to claim 2, Jansen discloses the neural network uses at least some of the audio portions as the negative samples for estimating the feature vectors at ¶¶ [0045]-[0047]: “The negative spectrogram for the triplet could then be generated by selecting another audio segment, that differs from the selected anchor segment, from the training set of audio recordings and determining a spectrogram therefrom.” 
With regards to claim 3, Jansen discloses the feature vectors associated with the selected audio portions are combined to generate the positive feature vector at ¶ [0046]: “The positive audio segment for the triplet could then be generated by generating a weighted combination of the anchor and negative segments, with the weighted combination weighted more heavily toward the anchor spectrogram... [T]he positive spectrogram could be generated according to $x_p = x_a + \alpha\,[E(x_a)/E(x_n)]\,x_n$ where $x_p$, $x_n$, and $x_a$ are the positive, negative, and anchor spectrograms…, respectively, $E(\cdot)$ is the total energy of an audio segment/spectrogram, and $\alpha$ is a weighting parameter set to a positive value that is less than one.”
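The weighted combination quoted from ¶ [0046] can be worked through numerically. The sketch below is a minimal illustration under stated assumptions: a spectrogram is represented as a flat list of magnitudes, and total energy is taken to be the sum of squared values. Neither choice is stated in the quoted passage, and the function names are hypothetical.

```python
def total_energy(spectrogram):
    """Assumed definition of E(.): sum of squared magnitudes."""
    return sum(value * value for value in spectrogram)


def mix_positive(anchor, negative, alpha=0.5):
    """Compute x_p = x_a + alpha * [E(x_a) / E(x_n)] * x_n element-wise.

    alpha must be positive and less than one, as in the quoted passage.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be positive and less than one")
    negative_energy = total_energy(negative)
    if negative_energy == 0.0:
        raise ValueError("negative spectrogram has zero energy")
    scale = alpha * total_energy(anchor) / negative_energy
    return [a + scale * n for a, n in zip(anchor, negative)]


if __name__ == "__main__":
    anchor = [2.0, 0.0]     # E(x_a) = 4.0
    negative = [0.0, 1.0]   # E(x_n) = 1.0
    print(mix_positive(anchor, negative, alpha=0.5))
```

With these inputs the scale factor is 0.5 × 4.0 / 1.0 = 2.0, so the mixed spectrogram is [2.0, 2.0].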
With regards to claim 4, Jansen discloses time shifting the selection of the selected audio portions to generate additional selected audio portions, wherein the neural network uses the additional selected audio portions as positive samples for estimating the feature vectors at ¶ [0045]: “In another example, applying a small shift in time and/or frequency to the spectrogram of an audio segment should not, in general, alter the classification and/or semantic content of the time- and/or frequency-shifted audio segment. Accordingly, a training triplet of spectrograms could be generated by selecting an anchor audio segment from a training set of audio recordings and generating a spectrogram therefrom. The positive audio segment for the triplet could then be generated by applying a shift in time and/or frequency to the anchor spectrogram.”
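A time shift of the kind quoted from ¶ [0045] can be sketched as below. This is an illustration only: it assumes a spectrogram stored as a list of time frames, a whole-frame shift, and zero padding for frames shifted in from outside the segment. None of these details is specified in the quoted passage, and the function name is hypothetical.

```python
def shift_time(frames, shift):
    """Shift a spectrogram by `shift` frames along the time axis.

    Positive values delay the content and negative values advance it.
    Frames with no source data are filled with zeros (assumed padding).
    """
    count = len(frames)
    if count == 0:
        return []
    width = len(frames[0])
    shifted = []
    for t in range(count):
        source = t - shift
        if 0 <= source < count:
            shifted.append(list(frames[source]))
        else:
            shifted.append([0.0] * width)
    return shifted


if __name__ == "__main__":
    anchor_frames = [[1.0], [2.0], [3.0]]
    # The shifted copy would serve as the positive example for the anchor.
    print(shift_time(anchor_frames, 1))
```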
With regards to claim 5, as a matter of claim construction, the phrase “on the classification” in the limitation “receiving user feedback on the classification of the audio portions” might ordinarily imply a temporal order to the receipt of user feedback; i.e., that the user feedback was received after the step of “classifying… into a first class”. However, this interpretation is inconsistent with the specification as filed at p. 2, par. [0004], which states: “The neural network uses the user feedback (i.e. in the form of labeled positive and/or negative examples) for estimating the feature vectors and to identify and present changes to the positive samples and/or to the negative samples to the user.” Accordingly, consistent with the specification, “receiving user feedback on the classification of the audio portions” has been interpreted as user feedback related to what the classification should be, such as labeled positive or negative examples.
Jansen discloses receiving user feedback on the classification of the audio portions at ¶¶ [0019](“This could include obtaining manually-generated labels for the audio recordings. The manually-generated labels could then be used to train the machine learning algorithm via a supervised learning process.”), [0035](“This training can include obtaining a plurality of training 
Jansen discloses recalculating, by the neural network, the positive feature vector, the negative feature vector, or the positive feature vector and the negative feature vector using the user feedback to identify changes in the feature vectors used for calculating the positive feature vector, the negative feature vector, or the positive feature vector and the negative feature vector, respectively, at ¶¶ [0037]-[0040]: “The distance (e.g., the Euclidean distance), within the feature space, between the anchor 310a and positive 310p feature vectors is indicated by distance “Dp1” and the distance between the anchor 310a and negative 310n feature vectors is indicated by distance “Dn1” ... The distances or some other information about the relative locations of the feature vectors could be used to update or otherwise train the mapping (e.g., to decrease Dp1 and/or to increase Dn1)… A loss function could be provided that receives the relative locations of the feature vectors of segments of such training triplets (e.g., that receives the Euclidean distances between the anchor feature vector and each of the positive and negative feature vectors) and outputs a loss value that could be used to update and/or train the mapping in order to improve the ability of the mapping to project the training segments into an n-dimensional feature space such that the anchor segments are projected to respective feature vectors that are closer to the feature vectors of their respective positive segments than to feature vectors of their respective negative segments.”
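The quoted passage describes a loss function that receives the anchor-to-positive and anchor-to-negative distances but does not give its form. One common choice, shown here only as an assumed example, is a hinge loss with a margin; the margin value and the function name are hypothetical and are not drawn from Jansen.

```python
import math


def triplet_hinge_loss(anchor, positive, negative, margin=1.0):
    """Assumed hinge form: max(0, Dp - Dn + margin).

    Dp is the Euclidean distance from the anchor to the positive feature
    vector and Dn is the distance from the anchor to the negative one.
    The loss is zero once Dn exceeds Dp by at least the margin.
    """
    d_pos = math.dist(anchor, positive)
    d_neg = math.dist(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


if __name__ == "__main__":
    anchor = [0.0, 0.0]
    near = [3.0, 4.0]    # distance 5 from the anchor
    far = [6.0, 8.0]     # distance 10 from the anchor
    print(triplet_hinge_loss(anchor, near, far))   # positive is closer
    print(triplet_hinge_loss(anchor, far, near))   # positive is farther
```

A nonzero loss in the second call reflects a triplet in which the anchor is closer to the negative than to the positive, which is the condition training would seek to correct.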
With regards to claim 8, Jansen discloses the selected audio portions (e.g., “anchor segment”) are selected by a user at ¶ [0046] (“Accordingly, a training triplet of spectrograms could be generated by selecting an anchor audio segment from a training set of audio recordings.”).
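With regards to claim 13, the steps performed by the apparatus of this claim are anticipated by Jansen for the same reasons as were presented with respect to claim 1, which is a method claim reciting these same steps.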
With regards to claim 14, the steps performed by the apparatus of this claim are anticipated by Jansen for the same reasons as were presented with respect to claim 2, which is a method claim reciting these same steps.
With regards to claim 15, the steps performed by the apparatus of this claim are anticipated by Jansen for the same reasons as were presented with respect to claim 3, which is a method claim reciting these same steps.
With regards to claim 16, the steps performed by the apparatus of this claim are anticipated by Jansen for the same reasons as were presented with respect to claim 4, which is a method claim reciting these same steps.
With regards to claim 18, the steps of the instructions stored in the computer readable medium of this claim are anticipated by Jansen for the same reasons as were presented with respect to claim 1, which is a method claim reciting these same steps.
With regards to claim 19, the steps of the instructions stored in the computer readable medium of this claim are anticipated by Jansen for the same reasons as were presented with respect to claim 2, which is a method claim reciting these same steps.
With regards to claim 20, the steps of the instructions stored in the computer readable medium of this claim are anticipated by Jansen for the same reasons as were presented with respect to claim 3, which is a method claim reciting these same steps.






Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 6 - 7, 9 - 10, 12, and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Jansen et al (U.S. PG Pub. No. 2020/0349921) in view of Abe et al (U.S. PG Pub. No. 2013/0255473).
With regards to claim 6, at ¶ [0018], Jansen discloses various audio portions such as, “noises related to the operation of machinery, weather, the movements of people or animals, sirens or other alert sounds, barks or other sounds generated by animals, or other sounds.” Jansen does not specify that any of these audio portions overlap. However, this limitation was known in the art:
Abe discloses that a first audio portion (“transient noise component”) overlaps a second audio portion (“tonal component”) of the audio recording at ¶ [0002]. See, also, ¶¶ [0005], [0041]-[0042]. At the time of the filing of the present application, it would have been obvious to a person of ordinary skill in the art to consider overlapping audio portions, as taught by Abe, when classifying audio portions according to the method taught by Jansen. The motivation for doing so comes from Abe, which discloses, “Thus, useful information for many application techniques such as voice analysis, coding, noise reduction, and high-quality sound reproduction can be obtained.” (¶ [0066]). Therefore, it would have been obvious to combine Abe with Jansen to obtain the invention specified in this claim.
With regards to claim 7
With regards to claim 9, Jansen discloses converting the audio recording into a spectrogram at ¶¶ [0027]-[0031], but does not specify generating the audio portions by selecting frames of the spectrogram. However, this limitation was known in the art:
Abe discloses converting the audio recording into a spectrogram and generating the audio portions by selecting frames (“time-frequency region”) of the spectrogram at ¶¶ [0036]-[0039]. The motivation for this combination is the same as was previously presented.
With regards to claim 10, Abe discloses deleting, from the audio recording, the first subset (“transient noise component”) of the audio portions at ¶¶ [0003], [0066](“noise reduction”).
With regards to claim 12, Abe discloses performing key frame (“time-frequency region”) generation with the first subset of the audio portions at ¶¶ [0036]-[0039]. The motivation for this combination is the same as was previously presented.
With regards to claim 17, the steps performed by the apparatus of this claim are obvious over the combination of Jansen and Abe for the same reasons as were presented with respect to claim 7, which is a method claim reciting these same steps.
Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Jansen et al (U.S. PG Pub. No. 2020/0349921) in view of Donaldson et al (U.S. PG Pub. No. 2022/0067557).
With regards to claim 11, Jansen discloses converting the audio recording into a spectrogram at ¶¶ [0027]-[0031], but does not specify highlighting the first subset of the audio portions in the spectrogram for display on a user device. However, this limitation was known in the art:
Donaldson discloses converting the audio recording into a spectrogram and highlighting the first subset of the audio portions in the spectrogram for display on a user device at ¶ [0072](“[A]udio clips may be displayed by a time-frequency spectrogram, where masks identify with frequency components over specific time periods, and optimal differential masks explain 
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DAVID F DUNPHY whose telephone number is (571)270-1230. The examiner can normally be reached 9 am - 5 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Vu Le, can be reached at 571-272-7332. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/DAVID F DUNPHY/Primary Examiner, Art Unit 2668