Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.


Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claim(s) 1-4 is/are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Colangelo et al. (Enhancing audio surveillance with hierarchical recurrent neural networks, hereinafter “Colangelo”).
In regard to claim 1, Colangelo discloses a system for event detection and classification, the system comprising:
a processor circuit (CPU, section 4.3); and 
a processor-readable media comprising instructions (RAM, section 4.3) that, when performed by the processor circuit, configure the processor circuit to:
receive audio information about an audio event (see Fig. 1, an audio signal s is received, section 3); 
identify a multi-dimensional spectrogram of the audio information as-received (the audio signal s is partitioned into frames, converted to the frequency domain, and grouped into sets of Tf frames, resulting in an M by Tf signal x, section 3); and 
apply information about the spectrogram at an input to a multiple-stage machine learning algorithm and provide, from the machine learning algorithm, identification of the audio event as a particular event (two classifiers Cl1 and Cl2 are fed by x to identify an event class label yn, section 3).
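
For illustration only, the frame-and-group operation mapped to the spectrogram limitation above can be sketched as follows. This is a minimal sketch and is not code from Colangelo: the function name, the frame length, the hop size, the Hann window, and the value of Tf are all assumptions chosen for the example. It partitions a signal s into frames, converts each frame to the frequency domain, and groups the result into blocks of Tf frames, each block being an M by Tf array x.

```python
import numpy as np

def frame_spectrogram(s, frame_len=256, hop=128, tf=8):
    """Return an array of shape (n_blocks, M, tf) built from signal s.

    frame_len, hop, and tf are illustrative values, not taken from the reference.
    Assumes len(s) >= frame_len.
    """
    n_frames = 1 + (len(s) - frame_len) // hop
    window = np.hanning(frame_len)
    # Partition the signal into overlapping, windowed frames.
    frames = np.stack([s[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    # Convert each frame to the frequency domain (magnitude spectrum).
    spectra = np.abs(np.fft.rfft(frames, axis=1))  # (n_frames, M)
    # Group consecutive frames into blocks of tf frames; drop any remainder.
    n_blocks = n_frames // tf
    blocks = spectra[: n_blocks * tf].reshape(n_blocks, tf, -1)
    return blocks.transpose(0, 2, 1)  # (n_blocks, M, tf)
```

With these assumed parameters, M equals frame_len // 2 + 1, so each block x is a 129 by 8 array.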

In regard to claim 2, Colangelo discloses the instructions to configure the processor circuit to apply information about the spectrogram at the input to the multiple-stage machine learning algorithm include instructions to configure the processor circuit to: 
(1) at a first stage of the algorithm, analyze the information about the spectrogram and coarsely classify the audio event as corresponding to a particular event type (classifier Cl1 detects the presence or absence of an event, section 3), and 
(2) at a different second stage of the algorithm, identify the audio event as a particular event within the classification of the particular event type (classifier Cl2 then labels the event class if a relevant event is determined to be present, section 3).
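
For illustration only, the two-stage arrangement mapped above (a first stage that decides whether an event is present, followed by a second stage that labels the event) can be sketched as a simple cascade. This is a sketch under stated assumptions: the toy threshold detector, the toy band-energy labeler, and the labels "event_A" and "event_B" are hypothetical stand-ins and do not reproduce the classifiers Cl1 and Cl2 of Colangelo, which the Office action identifies as using LSTM neural cells.

```python
from typing import Callable, Optional
import numpy as np

def classify_two_stage(x: np.ndarray,
                       detect: Callable[[np.ndarray], bool],
                       label: Callable[[np.ndarray], str]) -> Optional[str]:
    """Run the second stage only when the first stage reports an event."""
    if not detect(x):
        return None  # first stage: no relevant event present
    return label(x)  # second stage: assign a specific event class

def toy_detect(x: np.ndarray) -> bool:
    # Hypothetical first stage: mean-energy threshold.
    return float(np.mean(x)) > 0.5

def toy_label(x: np.ndarray) -> str:
    # Hypothetical second stage: compare low-band and high-band energy.
    half = x.shape[0] // 2
    return "event_B" if x[half:].sum() > x[:half].sum() else "event_A"
```

Both stages receive the same input x, consistent with the mapping in which the two classifiers are fed by x.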

In regard to claim 3, Colangelo discloses the instructions to identify the audio event as a particular event within the classification of the particular event type include instructions to use the different second stage of the algorithm to identify the audio event as indicating one or more of a glass break event, a gun shot event, a dog bark event, a security alarm event, a fire alarm event, a smoke alarm event, a water alarm event, or a human voice, shout, or cry event (exemplary events include glass breaking, gunshots and screams, section 4.1).

In regard to claim 4, Colangelo discloses the multiple-stage machine learning algorithm comprises a neural network-based deep learning algorithm (LSTM neural cells, section 3 and section 4.2).


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 5-6 is/are rejected under 35 U.S.C. 103 as being unpatentable over Colangelo, in view of Gunderson et al. (U.S. Patent No. 9,830,932, hereinafter “Gunderson”).
In regard to claim 5, Colangelo discloses the processor-readable media further comprises reference data for the neural network-based deep learning algorithm, wherein the reference data includes positive target samples (the classifiers are trained with examples of the events, section 4.2).
Colangelo does not expressly disclose training data comprising hard negatives, wherein the hard negatives comprise training data that is based on false alarms.
Gunderson discloses a system for audio event detection and classification, comprising processor-readable media comprising training data comprising hard negatives, wherein the hard negatives comprise training data that is based on false alarms (a neural network for classifying sound events is trained with false alarms as negative examples, column 7, lines 37-63).
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to include training data comprising hard negatives, because it would allow the neural network to differentiate between the true target sound events and similar false alarm sound events, as suggested by Gunderson (column 2, line 59 to column 3, line 4).
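
For illustration only, the general idea of building hard negatives from false alarms can be sketched as follows. This is a generic sketch and is not the procedure of Gunderson: the function name, the 0/1 label convention, and the threshold predictor in the usage note are assumptions made for the example. A sample is collected as a hard negative when its true label is negative but the current classifier flags it as a target event.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")

def mine_hard_negatives(samples: Sequence[T],
                        true_labels: Sequence[int],
                        predict: Callable[[T], int]) -> List[T]:
    """Return the samples that the classifier falsely flags as target events.

    Labels use 1 for a target event and 0 for a non-target (assumed convention).
    """
    return [x for x, y in zip(samples, true_labels)
            if y == 0 and predict(x) == 1]
```

The returned samples would then be added to the training set as negative examples before the classifier is retrained, so that it learns to separate true target events from similar-sounding false alarms.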

In regard to claim 6, Colangelo discloses the processor-readable media further comprises reference data for the neural network-based deep learning algorithm, wherein the reference data includes positive target samples (the classifiers are trained with examples of the events, section 4.2).
Colangelo does not expressly disclose training data comprising hard negatives, wherein the hard negatives comprise training data that is based on false alarms.
Gunderson discloses a system for audio event detection and classification, comprising processor-readable media comprising training data comprising hard negatives, wherein the hard negatives comprise training data that is based on false alarms (a neural network for classifying sound events is trained with false alarms as negative examples, column 7, lines 37-63).
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to include training data comprising hard negatives, because it would allow the neural network to differentiate between the true target sound events and similar false alarm sound events, as suggested by Gunderson (column 2, line 59 to column 3, line 4).



Claim(s) 7-13, 15-18, and 20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Colangelo, in view of Nongpiur et al. (U.S. Patent Application Pub. No. 2016/0379456, hereinafter “Nongpiur”).
In regard to claim 7, Colangelo does not expressly disclose a microphone configured to receive acoustic information from an environment and provide the audio information to the processor circuit.
Nongpiur discloses a practical implementation of a system for event detection and classification, comprising a microphone configured to receive acoustic information from an environment and provide the audio information to the processor circuit (in a smart home event detector, a microphone receives audio information from the environment, paragraph [0047]).
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to include a microphone to receive acoustic information, because it would allow a user to monitor their home environment and be notified when non-normal sound events occurred, as suggested by Nongpiur (paragraph [0002]).

In regard to claim 8, Colangelo does not disclose the processor circuit, the microphone, and the processor-readable media comprise a portion of a smart speaker or camera device.
Nongpiur discloses a practical implementation of a system for event detection and classification, wherein the processor circuit, the microphone, and the processor-readable media comprise a portion of a smart speaker or camera device (smart entry detectors comprising a microphone and camera, paragraph [0073]).
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to include the processor circuit, the microphone, and the processor-readable media as a portion of a smart speaker or camera device, because it would provide a home security system capable of detecting forced entry into a user’s home, as suggested by Nongpiur (paragraph [0073]).

In regard to claim 9, Colangelo discloses the processor circuit, the microphone, and the processor-readable media comprise a portion of a security system (surveillance security system, section 1).

In regard to claim 10, Colangelo does not disclose one or more environment sensors coupled to the security system, and wherein the processor circuit is configured to provide the identification of the audio event as the particular event based in part on information from the one or more environment sensors.
Nongpiur discloses one or more environment sensors coupled to the security system, and wherein the processor circuit is configured to provide the identification of the audio event as the particular event based in part on information from the one or more environment sensors (sensors in a smart-home environment, paragraph [0049]).
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to include one or more environment sensors coupled to the security system, because correlation between the audio and additional sensors helps to train the system to differentiate between security events and typical household activities, as suggested by Nongpiur (paragraph [0047]).

In regard to claim 11, Colangelo does not disclose the one or more environment sensors comprise at least one of a window sensor, door sensor, lock sensor, motion sensor, image sensor, acceleration sensor, position sensor, temperature sensor, humidity sensor, pressure sensor, proximity sensor, or gas sensor.
Nongpiur discloses the one or more environment sensors comprise at least one of a window sensor, door sensor, lock sensor, motion sensor, image sensor, acceleration sensor, position sensor, temperature sensor, humidity sensor, pressure sensor, proximity sensor, or gas sensor (motion, camera, acceleration, location, temperature, proximity, etc., paragraph [0049]).
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to include at least one of a window sensor, door sensor, lock sensor, motion sensor, image sensor, acceleration sensor, position sensor, temperature sensor, humidity sensor, pressure sensor, proximity sensor, or gas sensor, because correlation between the audio and additional sensors helps to train the system to differentiate between security events and typical household activities, as suggested by Nongpiur (paragraph [0047]).

In regard to claim 12, Colangelo discloses a security system for event detection and classification, the security system comprising: 
a processor circuit (CPU, section 4.3) configured to: 
receive an audio signal, and the audio signal comprising acoustic information about an audio event (see Fig. 1, an audio signal s is received, section 3); 
identify a multi-dimensional spectrogram from a portion of the audio signal corresponding to the audio event (the audio signal s is partitioned into frames, converted to the frequency domain, and grouped into sets of Tf frames, resulting in an M by Tf signal x, section 3); and 
apply information about the spectrogram at an input to a multiple-stage machine learning algorithm (two classifiers Cl1 and Cl2 are fed by x to identify an event class label yn, section 3) and (1) at a first stage of the algorithm, analyze the information about the spectrogram and coarsely classify the audio event as corresponding to a particular event type (classifier Cl1 detects the presence or absence of an event, section 3), and (2) at a different second stage of the algorithm, identify the audio event as a particular event within the classification of the particular event type (classifier Cl2 then labels the event class if a relevant event is determined to be present, section 3).
Colangelo does not expressly disclose a microphone, the microphone configured to monitor acoustic information in an environment protected by the security system.
Nongpiur discloses a practical implementation of a system for event detection and classification, comprising a microphone, the microphone configured to monitor acoustic information in an environment protected by the security system (in a smart home event detector, a microphone receives audio information from the environment, paragraph [0047]).
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to include a microphone to receive acoustic information, because it would allow a user to monitor their home environment and be notified when non-normal sound events occurred, as suggested by Nongpiur (paragraph [0002]).

In regard to claim 13, Colangelo does not disclose one or more environment sensors coupled to the processor circuit, and wherein the processor circuit is configured to identify the audio event as the particular event based in part on information from the one or more environment sensors.
Nongpiur discloses one or more environment sensors coupled to the processor circuit, and wherein the processor circuit is configured to identify the audio event as the particular event based in part on information from the one or more environment sensors (sensors in a smart-home environment, paragraph [0049]).
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to include one or more environment sensors coupled to the processor circuit, because correlation between the audio and additional sensors helps to train the system to differentiate between security events and typical household activities, as suggested by Nongpiur (paragraph [0047]).

In regard to claim 15, Colangelo discloses a method for using artificial intelligence-based processing to classify audio information, the method comprising:
receiving audio information about an audio event (see Fig. 1, an audio signal s is received, section 3); 
identifying a multi-dimensional spectrogram from a portion of the audio signal as-received (the audio signal s is partitioned into frames, converted to the frequency domain, and grouped into sets of Tf frames, resulting in an M by Tf signal x, section 3); and 
receiving the spectrogram at an input to a multiple-stage machine learning algorithm (two classifiers Cl1 and Cl2 are fed by x to identify an event class label yn, section 3) and (1) at a first stage of the algorithm, analyzing the information about the spectrogram and coarsely classifying the audio event as corresponding to a particular event type (classifier Cl1 detects the presence or absence of an event, section 3), and (2) at a different second stage of the algorithm, identifying the audio event as a particular event within the classification of the particular event type (classifier Cl2 then labels the event class if a relevant event is determined to be present, section 3).
Colangelo does not expressly disclose a microphone.
Nongpiur discloses a practical implementation of a system for event detection and classification, comprising a microphone (in a smart home event detector, a microphone receives audio information from the environment, paragraph [0047]).
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to include a microphone to receive acoustic information, because it would allow a user to monitor their home environment and be notified when non-normal sound events occurred, as suggested by Nongpiur (paragraph [0002]).

In regard to claim 16, Colangelo discloses at the first stage of the algorithm, analyzing the multi-dimensional spectrogram includes classifying the audio event as corresponding to a normal event type or an abnormal event type (classifier Cl1 detects the presence or absence of an event, section 3), and at the different second stage of the algorithm, identifying the audio event includes identifying the audio event as a particular event within the classification of the event type (classifier Cl2 then labels the event class if a relevant event is determined to be present, section 3).

In regard to claim 17, Colangelo does not disclose receiving environment security information from a sensor other than the microphone and applying the environment security information as an input to the first or different second stage of the applied machine learning algorithm to thereby influence a result of the particular event identification.
Nongpiur discloses receiving environment security information from a sensor other than the microphone and applying the environment security information as an input to the first or different second stage of the applied machine learning algorithm to thereby influence a result of the particular event identification (sensors in a smart-home environment, paragraph [0049]).
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to receive environment security information from a sensor other than the microphone and apply the environment security information as an input to the first or different second stage of the applied machine learning algorithm to thereby influence a result of the particular event identification, because correlation between the audio and additional sensors helps to train the system to differentiate between security events and typical household activities, as suggested by Nongpiur (paragraph [0047]).

In regard to claim 18, Colangelo discloses generating an alert for communication to a security system administrator when the event type or the particular event corresponds to a potential security breach (alerts provided to security operators, section 1).

In regard to claim 20, Colangelo discloses using at least a portion of the same spectrogram information as respective inputs to the first and different second stages of the applied machine learning algorithm (both classifiers are fed input x, section 3).


Claim(s) 14 and 19 is/are rejected under 35 U.S.C. 103 as being unpatentable over Colangelo, in view of Nongpiur, and further in view of Gunderson.
In regard to claim 14, Colangelo discloses a memory circuit that includes a reference data set for the first or different second stages of the algorithm, wherein the reference data set includes positive target samples (the classifiers are trained with examples of the events, section 4.2).
Colangelo and Nongpiur do not disclose the reference data set includes hard negatives, wherein the hard negatives comprise training data that is based on false alarms.
Gunderson discloses a reference data set that includes hard negatives, wherein the hard negatives comprise training data that is based on false alarms (a neural network for classifying sound events is trained with false alarms as negative examples, column 7, lines 37-63).
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to include hard negatives comprising training data that is based on false alarms, because it would allow the neural network to differentiate between the true target sound events and similar false alarm sound events, as suggested by Gunderson (column 2, line 59 to column 3, line 4).

In regard to claim 19, Colangelo and Nongpiur do not disclose training the applied machine learning algorithm using hard negatives by selecting, as training data, results for which the algorithm provides a false alarm.
Gunderson discloses training the applied machine learning algorithm using hard negatives by selecting, as training data, results for which the algorithm provides a false alarm (a neural network for classifying sound events is trained with false alarms as negative examples, column 7, lines 37-63).
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to train the applied machine learning algorithm using hard negatives by selecting, as training data, results for which the algorithm provides a false alarm, because it would allow the neural network to differentiate between the true target sound events and similar false alarm sound events, as suggested by Gunderson (column 2, line 59 to column 3, line 4).



Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Horling discloses an additional sound event monitoring system that first determines whether a sound is likely to be a sound event, and if so, subsequently identifies the sound event. Dennis et al. disclose the use of spectrogram image features for sound event classification.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to BRIAN LOUIS ALBERTALLI whose telephone number is (571)272-7616. The examiner can normally be reached Mon-Thurs 9AM-3PM (Part time).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta, can be reached at 571-272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





BLA 12/5/22
/BRIAN L ALBERTALLI/               Primary Examiner, Art Unit 2656