DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

This Office Action is sent in response to Applicant’s communication received 6/28/2022 for application number 16/672,633.

Claims 1, 4, 6-8, 11, 13-15, 18 and 20 are pending.  Claims 1, 8 and 15 are independent claims.  Claims 2-3, 5, 9-10, 12, 16-17 and 19 have been cancelled.


Response to Arguments
Applicant’s arguments with respect to independent claims 1, 8 and 15 have been fully considered, but are moot because the claims were amended by Applicant to include new features that were not previously presented. Therefore, the scope of claims 1, 8 and 15, and of their dependent claims, has changed. However, Examiner asserts that the newly added features are taught by newly cited prior art. See the rejection presented below.

Claim Rejections - 35 USC § 103
  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.


The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1, 4, 6-8, 11, 13-15, 18 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Torok et al. (US Patent 9,324,322; hereinafter Torok) in view of Gutkin et al. (US Patent Application 2016/0093295; hereinafter Gutkin), further in view of Gruenstein et al. (US Patent Application 2018/0130469; hereinafter Gruenstein).

As to independent claim 1, Torok discloses a computer-implemented method comprising:
receiving, by one or more processors, a first external audio external to a media player that a user is listening to [Col 7, line 65 – Col 8, line 3 - FIG. 4 is a flow chart illustrating an example algorithm for implementing the steps shown in FIG. 1 on the ASR device 200 in FIG. 2. At the start 410, the system generates an output 420 corresponding to operation of a device by a user and receives audio input 422 corresponding to the audio of the user's present environment.  Although shown sequentially these activities may be performed in any order or at the same time, and at least the receiving of audio input 422 may be continuous.];
detecting, by one or more processors, a first interruption action from the user of the media player [Col 12, lines 39-46 - the speaker recognition engine 692 may be programmed to trigger an event in response to specific known voices];
automatically buffering, by one or more processors, the first external audio [Col 10, lines 1-13 - if the AFE 250 processes both the speech and noise components together, providing a single processed audio feed as output, that processed audio is stored in the FIFO buffer 646. The duration of processed audio held in the buffer 646 is preferably at least a few seconds, as this data will later be used to create new acoustic fingerprints/models];
generating, by one or more processors, a first acoustic fingerprint, based on the first external audio, which identifies the first external audio [Col 12, lines 52-60 - If a voice or other user command is received to store a new audio interruption (742 "Yes"), then a new fingerprint/model is created and added (748) to fingerprint storage (244). FIG. 9 expands on how a new fingerprint/model is created. The audio interruption data currently stored (730) in the FIFO buffer 646 is automatically analyzed to isolate the acoustic signature of the audio interruption (950) and determine whether an acoustically distinct audio interruption occurred (952) immediately prior to the voice command]; and
saving, by the one or more processors, the first acoustic fingerprint to a database [Col 12, lines 52-56 - If a voice or other user command is received to store a new audio interruption (742 "Yes"), then a new fingerprint/model is created and added (748) to fingerprint storage];
receiving, by one or more processors, a second external audio [Fig. 4, Col 8, lines 7-11 – The audio input may originate with the audio capture device 212, or as noted earlier, may be received from somewhere else such as a peripheral or a network connection via input/output device interfaces 202 - Examiner notes that Fig. 4 and 7 show a continuous loop where the process determines whether the audio interruption is new by matching with stored acoustic fingerprints];
determining, by one or more processors, that the second external audio matches the first acoustic fingerprint [Col 4, lines 55-67 - The acoustic fingerprinting engine 242 of the classifier system 252 compares a non-speech audio interruption included in the audio data with stored acoustic fingerprints or models. When a match is found, the acoustic fingerprinting engine 242 may trigger a predefined interrupt of controller/processor 204; Col 10, lines 18-24 - If the acoustic fingerprint engine 242 determines (434 "Yes") that an audio interruption matches an acoustic fingerprint/model stored in the fingerprint storage 244, any of several modification actions may be undertaken depending upon the action or actions associated with the acoustic fingerprint/model, a context-based rule set, and/or user preferences (460)];
responsive to the second external audio matching the first acoustic fingerprint, interrupting, by one or more processors, the media player [Col 10, lines 46-60 - the audio output volume may be attenuated or increased, or audio output may be muted or paused. Which modification is made may be based upon a user setting, may be uniform for all events, or may be event dependent];
saving, by one or more processors, the second external audio to the database [Col 13, lines 28-35 - If a distinctive audio interruption is isolated (952 "Yes"), the corresponding noise data (e.g., noise feature vectors) may be used to generate (954) an acoustic fingerprint/model, which is then stored (956) in fingerprint storage 244];
updating, by one or more processors, the first acoustic fingerprint based on the second external audio [Col 13, lines 38-56 - Along with storing the new signature, an event may be triggered modifying the output (460), including, among other things, suspending notifications, pausing, muting, loudening or attenuating the output. Whether the output is loudened, attenuated, muted or paused might depend upon the voice command used to trigger (742) the storage (956) of the new noise.  For example, if a user says "pause" upon hearing a new noise, the device 600 will store (956) the acoustic fingerprint of the distinctive noise (if any) that the user heard before saying "pause," pause output, and note in fingerprint storage 244 that the "pause" action should be taken if the fingerprint/model is detected again in the future]; 
receiving, by one or more processors, a third external audio [Fig. 4, Col 8, lines 7-11 – The audio input may originate with the audio capture device 212, or as noted earlier, may be received from somewhere else such as a peripheral or a network connection via input/output device interfaces 202 - Examiner notes that Fig. 4 and 7 show a loop where the process determines whether the audio interruption is new by matching with stored acoustic fingerprints];
determining, by one or more processors, that the third external audio does not match the first acoustic fingerprint [Col 11, lines 54-65 - if speaker recognition does not trigger an event (772 “No”), speech recognition 740 is performed on the processed audio signal. An added speech command in this expanded algorithm may store new noises as event-triggering audio interruptions];
detecting, by one or more processors, a second interruption action from the user who is listening to a media player and reacts to the third external audio [Col 13, lines 38-56 - an event may be triggered modifying the output (460), including, among other things, suspending notifications, pausing, muting, loudening or attenuating the output. Whether the output is loudened, attenuated, muted or paused might depend upon the voice command used to trigger (742) the storage (956) of the new noise.  For example, if a user says "pause" upon hearing a new noise, the device 600 will store (956) the acoustic fingerprint of the distinctive noise (if any) that the user heard before saying "pause," pause output, and note in fingerprint storage 244 that the "pause" action should be taken if the fingerprint/model is detected again in the future – Examiner notes that the system detects user’s different outputs such as loudening, muting and attenuating upon listening to a new noise];
generating, by one or more processors, a second acoustic fingerprint, based on the third external audio, which identifies the third external audio [Col 11, lines 54-65 - If a voice or other user command is received to store a new audio interruption (742 “Yes”), then a new fingerprint/model is created and added (748) to fingerprint storage (244). FIG. 9 expands on how a new fingerprint/model is created. The audio interruption data currently stored (730) in the FIFO buffer 646 is automatically analyzed to isolate the acoustic signature of the audio interruption (950) and determine whether an acoustically distinct audio interruption occurred (952) immediately prior to the voice command]; and
saving, by the one or more processors, the second acoustic fingerprint to the database [Col 13, lines 28-35 - If a distinctive audio interruption is isolated (952 "Yes"), the corresponding noise data (e.g., noise feature vectors) may be used to generate (954) an acoustic fingerprint/model, which is then stored (956) in fingerprint storage 244].
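The loop mapped to Torok above (buffer incoming audio, match it against stored fingerprints, and store a new fingerprint when the user interrupts playback) can be pictured with a minimal Python sketch. This is an illustration only and is not taken from Torok or from the application: the fingerprint is a toy per-segment magnitude profile, and the names `fingerprint`, `InterruptionMonitor`, `on_audio` and `on_user_action` are hypothetical.

```python
from collections import deque


def fingerprint(samples, segments=4):
    # Toy stand-in for an acoustic fingerprint (assumption, not Torok's method):
    # mean magnitude per segment, rounded so identical sounds give the same tuple.
    size = max(1, len(samples) // segments)
    parts = [samples[i * size:(i + 1) * size] for i in range(segments)]
    return tuple(round(sum(abs(s) for s in p) / len(p)) if p else 0 for p in parts)


class InterruptionMonitor:
    def __init__(self, buffer_frames=3):
        self.buffer = deque(maxlen=buffer_frames)  # FIFO of recent audio frames
        self.store = {}                            # fingerprint -> stored action
        self.player_state = "playing"

    def on_audio(self, frame):
        """Buffer external audio; apply the stored action if it matches."""
        self.buffer.append(frame)
        action = self.store.get(fingerprint(frame))
        if action is not None:
            self.player_state = action
            return True
        return False

    def on_user_action(self, action):
        """User interrupted playback: fingerprint the newest buffered frame."""
        self.player_state = action
        if self.buffer:
            self.store[fingerprint(self.buffer[-1])] = action


monitor = InterruptionMonitor()
doorbell = [0, 8, 8, 0, 0, 8, 8, 0]   # made-up sample values
monitor.on_audio(doorbell)            # first occurrence: nothing stored yet
monitor.on_user_action("paused")      # user pauses, so a fingerprint is stored
monitor.player_state = "playing"      # user resumes playback
matched = monitor.on_audio(doorbell)  # second occurrence matches and pauses
```

In this sketch the second occurrence of the same sound pauses playback without any user action, which mirrors the "match, then interrupt" limitations; a sound with a different profile falls through to the "no match" branch.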
Although Torok teaches a continuous loop where the process determines whether the audio interruption is new by matching with stored acoustic fingerprints as illustrated in Figs. 4 and 7, Torok does not appear to teach explicitly:
updating, by one or more processors, a list of acoustic fingerprints including the first acoustic fingerprint in the database.
However, Gutkin teaches in the same field of endeavor [Abs - Gutkin teaches a learning system that updates fingerprints in the database as it continuously trains the model]:
updating, by one or more processors, a list of acoustic fingerprints including the first acoustic fingerprint in the database [Para 0035 - The system may further include a database of updated recorded training utterances 160 that stores updated recorded utterances of speech segmented into acoustic units, a fingerprint similarity engine 170 that compares acoustic fingerprints of differing acoustic units and identifies similar acoustic fingerprints, a prior probability assigner 180 that assigns an existing probability estimate of occurrence of an acoustic unit in a corpus of text to the updated acoustic units, and a database 190 that stores the updated acoustic unit, its identified similar acoustic fingerprint and assigned probability of occurrence in data triples].
It would have been obvious to one of ordinary skill in the art, having the teachings of Torok and Gutkin at the time of filing, to modify a method for automatic volume attenuation for speech enabled devices disclosed by Torok to include the concept of statistical unit selection language models based on acoustic fingerprinting disclosed by Gutkin to obtain a database of recorded training utterances and output a probabilistic language model that can be efficiently encoded in a finite state transducer framework to enable speech synthesis [Gutkin, Para 0003].
One of ordinary skill in the art would have been motivated to include the concept of statistical unit selection language models based on acoustic fingerprinting disclosed by Gutkin to obtain a database of recorded training utterances and output a probabilistic language model that can be efficiently encoded in a finite state transducer framework to enable speech synthesis [Gutkin, Para 0003].
Torok and Gutkin do not appear to teach:
Wherein saving the second external audio includes:
Automatically buffering the second external audio when the receiving the second external audio,
Saving the buffered second external audio,
Recording the second external audio with additional time until the user resumes the media player, and
Saving both the buffered and recorded second external audio to the database;
However, Gruenstein teaches in the same field of endeavor:
Wherein saving the second external audio includes:
Automatically buffering the second external audio when the receiving the second external audio [Para 0022 - The computing device 106 stores the audio data 104 in an audio buffer 124],
Saving the buffered second external audio [Para 0022 - The computing device 106 stores the audio data 104 in an audio buffer 124],
Recording the second external audio with additional time until the user resumes the media player [Para 0032 - Upon detecting the hotword 105, the computing device 106 may send the audio data 104 with the hotword 105 and any additional audio data following or preceding or both following and preceding the hotword 105 to a server 116. Once the server 116 receives the required amount of audio data 104, the server 116 may perform the audio fingerprinting processes to determine if the audio data 104 is a part of pre-recorded media], and
Saving both the buffered and recorded second external audio to the database [Para 0035 - The computing device 106 might store a predetermined amount of audio data 104 in the audio buffer 124. Once the hotword 105 is detected, the audio fingerprinter 118 might process the audio data 104 including the hotword 105 and the data in the audio buffer 124. For example, the computing device 106 might store 10 seconds of audio in the audio buffer 124 at all times. If a television commercial is playing on a television, when the hotword triggers, the computing device 124 might fingerprint the 10 seconds of audio data 104 from the audio buffer 124];
It would have been obvious to one of ordinary skill in the art, having the teachings of Torok, Gutkin and Gruenstein at the time of filing, to modify a method for automatic volume attenuation for speech enabled devices disclosed by Torok and statistical unit selection language models based on acoustic fingerprinting disclosed by Gutkin to include the concept of recorded media hotword trigger suppression disclosed by Gruenstein to enable users to orally query the system from essentially anywhere in the environment without the need to have a computer or other device in front of him/her or even nearby [Gruenstein, Para 0003].
One of ordinary skill in the art would have been motivated to include the concept of recorded media hotword trigger suppression disclosed by Gruenstein to enable users to orally query the system from essentially anywhere in the environment without the need to have a computer or other device in front of him/her or even nearby [Gruenstein, Para 0003].
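The "buffer, then keep recording until the user resumes, then save both" limitation discussed above can be pictured with a short Python sketch. It is offered only as an illustration under stated assumptions and is not code from Gruenstein or from the application: `PreRollRecorder` and its methods are hypothetical names, audio frames are plain strings, and the "database" is an in-memory list.

```python
from collections import deque


class PreRollRecorder:
    def __init__(self, pre_roll_frames=3):
        self.pre_roll = deque(maxlen=pre_roll_frames)  # rolling buffer of recent frames
        self.recording = None                          # frames captured while interrupted
        self.database = []                             # saved clips (stand-in for a database)

    def on_frame(self, frame):
        if self.recording is not None:
            self.recording.append(frame)  # interruption active: keep recording
        else:
            self.pre_roll.append(frame)   # otherwise only refresh the rolling buffer

    def on_interrupt(self):
        # Start the clip with the audio buffered before the interruption.
        self.recording = list(self.pre_roll)

    def on_resume(self):
        # Save buffered plus recorded audio as one clip, then reset.
        if self.recording is not None:
            self.database.append(self.recording)
        self.recording = None
        self.pre_roll.clear()


recorder = PreRollRecorder(pre_roll_frames=3)
for frame in ["a", "b", "c", "d"]:
    recorder.on_frame(frame)      # only the last three frames stay buffered
recorder.on_interrupt()           # media player is interrupted
for frame in ["e", "f"]:
    recorder.on_frame(frame)      # additional audio until the user resumes
recorder.on_resume()              # user resumes: the whole clip is saved
```

The saved clip here contains the three buffered frames followed by the two frames recorded during the interruption, which is the combination the limitation recites.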

As to dependent claim 4, Torok, Gutkin and Gruenstein disclose the computer-implemented method of claim 1.
Torok further teaches the method further comprising:
responsive to the second external audio matching the first acoustic fingerprint, providing, by one or more processors, a notification of the second external audio to the user [Col 14, lines 42 – 45 - the device 600 may provide the user feedback when instructed to store a new noise, such as an affirmative beep or negative beep to indicate that a distinctive sound was or was not successfully stored].

As to dependent claim 6, Torok, Gutkin and Gruenstein disclose the computer-implemented method of claim 1.
Torok further teaches:  wherein the first interruption action is selected from the group consisting of: muting the media player, pausing the media player, and removal of a listening device [Col 10, lines 46-60 - the audio output volume may be attenuated or increased, or audio output may be muted or paused. Which modification is made may be based upon a user setting, may be uniform for all events, or may be event dependent].

As to dependent claim 7, Torok, Gutkin and Gruenstein disclose the computer-implemented method of claim 1.
Torok further teaches:  wherein generating the first acoustic fingerprint comprises saving, by one or more processors, the first external audio to the database [Col 12, lines 52-56 - If a voice or other user command is received to store a new audio interruption (742 "Yes"), then a new fingerprint/model is created and added (748) to fingerprint storage].

As to independent claims 8 and 15, the claims are substantially similar to claim 1 and are rejected on the same ground.  

As to dependent claims 11 and 18, the claims are substantially similar to claim 4 and are rejected on the same ground.  

As to dependent claims 13 and 20, the claims are substantially similar to claim 6 and are rejected on the same ground.  

As to dependent claim 14, the claim is substantially similar to claim 7 and is rejected on the same ground.  

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.

Yu et al. (US PGPUB 2016/0381436) teaches a method for automatically recognizing media contents comprising steps of capturing media content from the Internet and/or devices, extracting fingerprints from captured contents and transferring to the backend servers for identification, and backend servers processing the fingerprints and replying with identified result.

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action.
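The timing rule in the preceding paragraph can be restated as a small Python sketch. It is a simplified illustration and not an official calculation: the function names and the example dates are hypothetical, month arithmetic is clamped to the last day of shorter months as an assumption, and adjustments such as weekends, holidays and extension fees are ignored.

```python
import calendar
from datetime import date


def add_months(d, months):
    # Same day-of-month, clamped to the last day of the target month (assumption).
    index = d.month - 1 + months
    year, month = d.year + index // 12, index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def shortened_period_end(mailed, first_reply=None, advisory_mailed=None):
    """End of the shortened statutory period under the rule quoted above."""
    shortened = add_months(mailed, 3)  # THREE MONTHS from mailing
    maximum = add_months(mailed, 6)    # never later than SIX MONTHS
    replied_early = first_reply is not None and first_reply <= add_months(mailed, 2)
    if replied_early and advisory_mailed is not None and advisory_mailed > shortened:
        # Early first reply and a late advisory action: the period runs to
        # the advisory action's mailing date, capped at six months.
        return min(advisory_mailed, maximum)
    return shortened


mailed = date(2022, 9, 15)  # hypothetical mailing date, not taken from this action
default_end = shortened_period_end(mailed)
shifted_end = shortened_period_end(
    mailed, first_reply=date(2022, 11, 1), advisory_mailed=date(2022, 12, 20)
)
```

With these made-up dates the default period ends on 2022-12-15, while an early first reply followed by a late advisory action moves the end to the advisory action's mailing date, 2022-12-20.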

Any inquiry concerning this communication or earlier communications from the examiner should be directed to SANG H KIM whose telephone number is (571)270-5285.  The examiner can normally be reached on M-F 9am-6pm.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kavita Stanley, can be reached on (571) 272-8352.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/SANG H KIM/Primary Examiner, Art Unit 2176