DETAILED ACTION
Introduction
This office action is in response to Applicant’s submission filed on 03/10/2021. Claims 1-11 are pending in the application and have been examined.
	
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 2, 4, 5-9 and 11 are rejected under 35 U.S.C. 103 as being unpatentable over LeBoeuf et al., US Patent Application Publication 2011/0075851, in view of Do, H.M., et al., Human-assisted sound event recognition for home service robots, Robot. Biomim. 3, 7 (2016).
Regarding claim 1, LeBoeuf teaches a model construction method for audio recognition, comprising: obtaining an audio data (see LeBoeuf, Fig. 1 and [0024], which describe an audio signal source 110 that passes input data as a digital signal, which may be a live stream from a microphone or received over a network, or a file retrieved from a database or other storage mechanism); determining a predicted result of the audio data by using a classification model, wherein the classification model is trained based on a machine learning algorithm (see LeBoeuf, [0057-0058], which describe a symbolic layer based on pre-trained machine-learning methodologies to provide labels to the audio segments; the symbolic layer is interpreted as the classification model), and the predicted result comprises a label defined by the classification model (see LeBoeuf, [0056], which describes a symbolic layer based on machine-learning methodologies to provide labels to the audio segments). However, LeBoeuf does not teach providing a prompt message according to a loss level of the predicted result, wherein the loss level is related to a difference between the predicted result and a corresponding actual result, and the prompt message is provided to query a correlation between the audio data and the label; and modifying the classification model according to a confirmation response of the prompt message, wherein the confirmation response is related to a confirmation of the correlation between the audio data and the label.
However, Do teaches providing a prompt message according to a loss level of the predicted result, wherein the loss level is related to a difference between the predicted result and a corresponding actual result, and the prompt message is provided to query a correlation between the audio data and the label (see Do, pg. 5, col. 2, which states: as shown in Fig. 2, we propose to let the robot and the human caregiver collaborate to recognize it. Basically, the robot sends the segment of non-voice sound to the caregiver, who then recognizes it and labels it through a user interface. Such an interface can be on a computer, or a mobile device such as a tablet or smartphone); and modifying the classification model according to a confirmation response of the prompt message, wherein the confirmation response is related to a confirmation of the correlation between the audio data and the label (see Do, pg. 5, col. 2, which states: The sound library (SoundLib) consists of labelled sound events, which can be used in the training of the sound event recognition algorithm, therefore enabling incremental learning. When sufficient labelled data are available, the robot will be able to use the recognition algorithm to accurately recognize the event sounds; the sound library is interpreted as the classification model).
LeBoeuf and Do are considered to be analogous to the claimed invention because they relate to audio recognition algorithms for audio event labeling and processing. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of LeBoeuf on analyzing and classifying the audio signal with the human labeling teachings of Do to improve the recognition of the diversity of the sounds associated with the same event (see Do, pg. 2).
Regarding claim 2, LeBoeuf in view of Do teaches the model construction method for audio recognition according to claim 1. Do further teaches wherein the prompt message comprises the audio data and an inquiry content, the inquiry content is to query whether the audio data belongs to the label, and the steps of providing the prompt message comprises: playing the audio data and providing the inquiry content (see Do, pg. 4, col. 2, which states: The framework allows the robot not only to capture and separate acoustic events but also to estimate the context of sound events and send the audio data along with their contextual information to human caregivers for labelling).
Regarding claim 4, LeBoeuf in view of Do teaches the model construction method for audio recognition according to claim 1. Do further teaches adopting a label and the audio data corresponding to the confirmation response as training data of the classification model, and the classification model is retrained accordingly (see Do, pg. 5, col. 2 The sound library (SoundLib) consists of labelled sound events, which can be used in the training of the sound event recognition algorithm, therefore enabling incremental learning. When sufficient labelled data are available, the robot will be able to use the recognition algorithm to accurately recognize the event sounds;  interpreted as classification model retrained with incremental learning).
Regarding claim 5, LeBoeuf in view of Do teaches the model construction method for audio recognition according to claim 1.  LeBoeuf further teaches analyzing properties of an original audio data to determine a noise component of the original audio data (see LeBoeuf, [0027] the statistical features extracted from the audio signal include: zero crossing rate, which is a count of how many times the signal changes from positive amplitude to negative amplitude during a given period and which correlates to the "noisiness" of the signal); and eliminating the noise component from the original audio data to generate the audio data (see LeBoeuf, [0094] A "noise gate" or "gate" is a widely-used algorithm which only allows a signal to pass if its amplitude exceeds a certain threshold. Otherwise, no sound is output).
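For illustration only, the two signal-processing notions quoted above from LeBoeuf (a zero-crossing count and an amplitude-threshold noise gate) can be sketched as follows. This is a minimal sketch and not LeBoeuf's implementation; the function names, the convention of counting sign changes in either direction, and the sample values are assumptions introduced here.

```python
def zero_crossing_count(samples):
    """Count sign changes between consecutive samples.

    Assumption: crossings in either direction are counted and a sample
    of exactly zero is treated as non-negative.
    """
    count = 0
    for prev, curr in zip(samples, samples[1:]):
        if (prev >= 0) != (curr >= 0):
            count += 1
    return count


def noise_gate(samples, threshold):
    """Pass a sample only if its absolute amplitude exceeds the threshold.

    Samples at or below the threshold are replaced with silence (0.0).
    """
    return [s if abs(s) > threshold else 0.0 for s in samples]


if __name__ == "__main__":
    signal = [0.5, -0.05, 0.2, -0.8]  # hypothetical sample values
    print(zero_crossing_count(signal))
    print(noise_gate(signal, threshold=0.1))
```

A higher crossing count over a fixed window generally indicates a noisier signal, while the gate simply suppresses low-amplitude samples.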
Regarding claim 6, LeBoeuf in view of Do teaches the model construction method for audio recognition according to claim 5. LeBoeuf further teaches decomposing the original audio data to generate a plurality of mode components of the original audio data, wherein each of the mode components corresponds to an intrinsic mode function (see LeBoeuf, [0027], which teaches zero crossing rate, which is a count of how many times the signal changes from positive amplitude to negative amplitude during a given period and which correlates to the "noisiness" of the signal); determining an autocorrelation of each of the mode components (see LeBoeuf, [0031] spectral kurtosis, which is a measure of the peaked-ness of the signal, and is calculated as the fourth standard moment of the spectrum; [0032] spectral flatness measure, which quantifies how tone-like a sound is, and is based on the resonant structure and the spiky nature of a tone compared to the flat spectrum of a noise-like sound. Spectral flatness is calculated as the ratio of geometric mean of spectrogram to arithmetic mean of spectrum; interpreted to determine the autocorrelation of each of the mode components); and selecting one of the mode components as the noise component according to the autocorrelation of the mode components (see LeBoeuf, [0036] spectral roll-off, which is the frequency in which 85% of the spectrum energy is contained and used to distinguish between harmonic and noisy sounds).
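For illustration only, the spectral flatness and spectral roll-off definitions quoted above from LeBoeuf can be sketched as follows. This is a minimal sketch and not LeBoeuf's implementation; it assumes the input is a non-empty list of non-negative power-spectrum values, and the function names and the choice to return a bin index rather than a frequency in Hz are assumptions introduced here.

```python
import math


def spectral_flatness(power_spectrum):
    """Ratio of the geometric mean to the arithmetic mean of the spectrum.

    Values near 1.0 indicate a flat, noise-like spectrum; values near 0.0
    indicate a peaked, tone-like spectrum.
    """
    if not power_spectrum:
        raise ValueError("power_spectrum must be non-empty")
    n = len(power_spectrum)
    arithmetic_mean = sum(power_spectrum) / n
    # A zero bin makes the geometric mean zero, so flatness is zero.
    if arithmetic_mean == 0 or any(p <= 0 for p in power_spectrum):
        return 0.0
    geometric_mean = math.exp(sum(math.log(p) for p in power_spectrum) / n)
    return geometric_mean / arithmetic_mean


def spectral_rolloff_bin(power_spectrum, fraction=0.85):
    """Index of the first bin at which the cumulative energy reaches
    the given fraction (85% by default) of the total energy."""
    if not power_spectrum:
        raise ValueError("power_spectrum must be non-empty")
    target = fraction * sum(power_spectrum)
    running = 0.0
    for index, power in enumerate(power_spectrum):
        running += power
        if running >= target:
            return index
    return len(power_spectrum) - 1


if __name__ == "__main__":
    flat = [1.0, 1.0, 1.0, 1.0]      # hypothetical noise-like spectrum
    peaked = [100.0, 1.0, 1.0, 1.0]  # hypothetical tone-like spectrum
    print(spectral_flatness(flat), spectral_flatness(peaked))
    print(spectral_rolloff_bin(flat), spectral_rolloff_bin(peaked))
```

To express the roll-off as a frequency, the returned bin index would be multiplied by the spectrum's bin width, which depends on the sample rate and transform size used.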
Regarding claim 7, LeBoeuf in view of Do teaches the model construction method for audio recognition according to claim 1. LeBoeuf further teaches extracting a sound feature from the audio data (see LeBoeuf, [0026] Some of the statistical features extracted from the audio signal include pitch contour, various onsets, stereo/surround spatial features, mid-side diffusion, and inter-channel spectral differences); determining a target segment and a non-target segment in the audio data according to the sound feature (see LeBoeuf, [0060] For example, a SVM classifier might be trained to identify snare drums. Traditionally, the output of a SVM is a binary output regarding the membership in a class of data for the input feature vector (e.g., class 1 would be "snare drum" and class 2 would be "not snare drum"). A probabilistic extension to SVMs may be used, which outputs a probability measure of the signal being a snare drum given the input feature vector (e.g., 85% certainty that the input feature vector is class 1--"snare drum"); snare drum is interpreted as the target segment); and retaining the target segment, and removing the non-target segment (see LeBoeuf, [0095] Using an embodiment of the presently disclosed invention, one could vastly improve a gate algorithm to use instrument recognition to control the gate--rather than the relatively naive amplitude parameter. For example, a user could allow the gate on their snare drum track to allow "snare drums only" to pass through it--any other detected sounds would not pass; improving a gate algorithm is interpreted as retaining the target segment).
Regarding claim 8, LeBoeuf in view of Do teaches the model construction method for audio recognition according to claim 5. Claim 8 is rejected on similar grounds as claim 7.
Regarding claim 9, LeBoeuf in view of Do teaches the model construction method for audio recognition according to claim 7. LeBoeuf further teaches wherein the target segment is a voice content, the non-target segment is not the voice content, the voice features comprises a short time energy and a zero crossing rate, and the step of extracting the sound feature from the audio data comprises: determining two end points of the target segment in the audio data according to the short time energy and the zero crossing rate of the audio data, wherein the two end points are related to a boundary of the target segment in a time domain (see LeBoeuf, [0052-0056] teaches determining the boundary of the target segment in the time domain based on creating time-stamped event labels).
Regarding claim 11, LeBoeuf in view of Do teaches the model construction method for audio recognition according to claim 1. Do further teaches providing the classification model that is transmitted through a network (see Do, pg. 6, col 1 and Fig 7, depicts the SoundLib stored in audio server); loading the classification model obtained through the network to recognize a voice input (see Do, Fig. 3 and pg. 4 col 1 based on the VAD proposed in [28]); and providing an event notification based on a recognition result of the voice input (see Do, Fig. 5 and pg. 5 col. 2 teaches the labeling and sound event decision).
Claims 3 and 10 are rejected under 35 U.S.C. 103 as being unpatentable over LeBoeuf et al., US Patent Application Publication 2011/0075851, in view of Do, H.M., et al., Human-assisted sound event recognition for home service robots, Robot. Biomim. 3, 7 (2016), further in view of Wang et al., US Patent Application Publication 2020/0118042.
Regarding claim 3, LeBoeuf in view of Do teaches the model construction method for audio recognition according to claim 2. However, LeBoeuf in view of Do fails to teach receiving an input operation, wherein the input operation corresponds to an option of the inquiry content, and the option is that the audio data belongs to the label or the audio data does not belong to the label; and determining the confirmation response based on the input operation.
However, Wang teaches receiving an input operation, wherein the input operation corresponds to an option of the inquiry content, and the option is that the audio data belongs to the label or the audio data does not belong to the label (see Wang, [0075] Labeling module 320 receives user input labeling the presented data. Such labeling input can be in any suitable format, such as a selection from a displayed list or one or more selections from a set of checkboxes, free-form text input, a voice annotation (such as “yes” or “no”), or provided using another method); and determining the confirmation response based on the input operation (see Wang, [0080] Application 300 uses labeling analysis module 360 and the results of the recommendations of recommendation model 350, as shown by a specific user's subsequent behavior and labeling results, to adapt both accuracy prediction model 340 and recommendation model 350 to this specific user).
LeBoeuf, Do and Wang are considered to be analogous to the claimed invention because they relate to audio recognition algorithms for audio event labeling and processing. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of LeBoeuf and Do on analyzing and classifying the audio signal with the user-adapted labeling teachings of Wang to increase the labeled datasets for machine training (see Wang, [0002]).
Regarding claim 10, LeBoeuf in view of Do teaches the model construction method for audio recognition according to claim 7. Do further teaches training the classification model according to a second confirmation response of the second prompt message, wherein the second confirmation response comprises the label corresponding to the target segment (see Do, pg. 5, col. 2, which teaches sending segments of the sound to the human caregiver, who labels them using the user interface, and the labels are in turn used to train the sound event recognition algorithm; interpreted as the second confirmation response for the label corresponding to the target segment). However, LeBoeuf in view of Do fails to teach providing a second prompt message according to the target segment, wherein the second prompt message is provided to request the label be assigned to the target segment.
However, Wang teaches providing a second prompt message according to the target segment, wherein the second prompt message is provided to request the label be assigned to the target segment (see Wang, [0075], in which the labeling module provides a voice annotation feature to respond with "yes" or "no", which is interpreted as the second prompt message requesting that the label be assigned to the target segment).


Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Kwon et al., US Patent Application Publication 2019/0206389, teaches that updating of the speech recognition model may include personalizing the speech recognition model for the user by re-training the speech recognition model based on the obtained feedback data (see Kwon, [0007]).
Braho et al., US Patent Application Publication 2007/0192101, teaches speech recognition model adaptation based on the confidence factors of the expected response (see Braho, Fig. 2).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to NANDINI SUBRAMANI whose telephone number is (571)272-3916. The examiner can normally be reached Monday - Friday, 12:00 pm - 5:00 pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh M Mehta, can be reached on (571)272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/NANDINI SUBRAMANI/Examiner, Art Unit 2656

/BHAVESH M MEHTA/Supervisory Patent Examiner, Art Unit 2656