DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments
Applicant's arguments filed 6/27/22 have been fully considered but they are not persuasive.
Applicant submits that the trained phrases/sounds of Mozer are abnormal events. The examiner respectfully disagrees. Mozer (¶0023) discloses the security system maintaining “information regarding a group of “known and approved” users within the home, and any voice that does not correspond to one of the known and approved users can be classified as an unknown voice/user.” The known and approved user voices are normal usage in the environment (i.e. at home) while the unknown voices/users are the abnormal events.
Regarding claims 3 and 13, Applicant requested documentary evidence that superimposing abnormal events on normal audio clips is well-known and conventional. See at least Yamada (US 2002/0055840 A1) which discloses, “In a conventional speech recognition in a noisy environment, noise data are superimposed on speech samples and, by using the noise superimposed speech samples, untrained acoustic models are trained to produce acoustic models for speech recognition, corresponding to the noisy environment” (Yamada, ¶0004).
Applicant’s arguments regarding amended claims 18-19 are moot because the new ground of rejection does not rely on the combination of references applied in the prior rejection of record for any teaching or matter specifically challenged in the argument. The scope of the claims has changed with the amendments.



Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-4, 8 and 10-13 are rejected under 35 U.S.C. 103 as being unpatentable over Mozer (US 2017/0064262 A1).
As to claim 1, Mozer discloses a method for identifying abnormal sounds in a particular environment, the method comprising: 
capturing a real time normal audio stream during normal usage of the particular environment from a microphone located in the particular environment when no abnormal audio events are present (¶0022, Fig. 2. Audio in home captured in ongoing basis using audio capture devices.); 
using at least part of the real time normal audio stream as a baseline for subsequently processing an incoming audio stream with a processor to determine whether the incoming audio stream from the microphone in the particular environment includes an abnormal audio event for the particular environment (¶0022-0023, Fig. 2. “The home security system 102 can maintain information regarding a group of “known and approved” users within the home, and any voice that does not correspond to one of the known and approved users can be classified as an unknown voice/user.” Abnormal sound detected in normal audio stream.); 
when it is determined that the incoming audio stream includes an abnormal audio event for the particular environment, determine a location of the abnormal audio event in the particular environment (¶0026, Fig. 2. “EAPVT component 110 can trigger (e.g., send a signal to) the video capture devices 106 located in the area where the audio signal was captured to begin video recording (block 208).” Implicit that location of abnormal event is determined if system identifies camera in the area.); 
identifying a video camera with a field of view that includes the location of the abnormal audio event in the particular environment (¶0026, Fig. 2. “EAPVT component 110 can trigger (e.g., send a signal to) the video capture devices 106 located in the area where the audio signal was captured to begin video recording (block 208).”); and 
retrieving and displaying on a display a video stream from the identified video camera (¶0026, Fig. 2. Alert sent to homeowner or third party indicating video surveillance triggered. “This can be useful if the homeowner or third party wishes to view the captured video footage in real-time (via, e.g., a smartphone app, live video monitor, etc.) so that they can observe the situation in the home and take appropriate steps as needed.”).
Mozer does not expressly disclose accessing a database to determine the location.
However, as cited above, Mozer (¶0026) does disclose triggering a video capture device located in the area where the audio signal was captured. It is therefore obvious that the relationship between the locations of the audio capture devices and the locations of the video capture devices is known beforehand. 
Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art that the relationship between the locations of the audio capture devices and the video capture devices could be stored in a database. The motivation would have been that a database is a well-known and conventional way of storing information.
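For illustration only, and not as a characterization of how Mozer is implemented, a stored relationship of this kind could be as simple as the following sketch. The table layout, the device identifiers ("mic-1", "cam-A", etc.) and the function names are hypothetical and do not appear in the cited reference.

```python
import sqlite3


def build_device_db():
    """Create an in-memory table pairing each microphone with the camera
    covering its area. The schema and rows are hypothetical examples."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE device_map ("
        "mic_id TEXT PRIMARY KEY, area TEXT, camera_id TEXT)"
    )
    conn.executemany(
        "INSERT INTO device_map VALUES (?, ?, ?)",
        [("mic-1", "front door", "cam-A"), ("mic-2", "kitchen", "cam-B")],
    )
    conn.commit()
    return conn


def camera_for_microphone(conn, mic_id):
    """Return (area, camera_id) for the microphone that captured the audio,
    or None if the microphone is not in the table."""
    return conn.execute(
        "SELECT area, camera_id FROM device_map WHERE mic_id = ?", (mic_id,)
    ).fetchone()
```

Under these assumptions, looking up the capturing microphone yields both the location of the event and the camera to trigger in a single query.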
As to claim 2, Mozer discloses wherein the real time normal audio stream is captured during a training mode (¶0023. “The known and approved users can be enrolled into the system via, e.g., a predetermined training process whereby each user speaks a number of training phrases so that system 102 can build and save a voice model for the user.” System trained on normal audio conditions.).
As to claim 3, Mozer discloses wherein during the training mode, the method further comprises: training an audio classification model to identify abnormal audio events in the particular environment using one or more of the plurality of normal audio clips and the plurality of abnormal audio clips (¶0011 and ¶0024. “If the computer system recognizes a voice, speech phrase, and/or environmental sound that corresponds to a condition previously defined by a user (e.g., the homeowner), the computer system can automatically turn on one or more video cameras, thereby initiating video surveillance of the location. For example, the computer system may recognize the phrase “Fire!” or “Help!”, which may each correspond to a condition defined by the homeowner for triggering video surveillance.”  “Yet another example of such a predefined condition is the recognition of an environmental sound such as glass breaking, an explosion, an alarm, a dog barking, a person screaming, a baby crying, and so on. These various predefined conditions can be configured/enabled by the homeowner via a setup user interface provided by computer system 102.” System trained with abnormal audio clips.).
Mozer does not expressly disclose dividing the real time normal audio stream into a plurality of normal audio clips; and preparing a plurality of abnormal audio clips by superimposing known abnormal audio events onto one or more of the plurality of normal audio clips.
However, training the system with abnormal sounds such as “Fire,” “Help,” glass breaking, baby crying, etc. is disclosed by Mozer (¶0011 and ¶0024) as cited above. Superimposing these sounds over clips of a normal recording is an obvious variant. 
Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to superimpose the abnormal sounds. The motivation would have been that it is an obvious variant of configuring the system for predefined sounds. 
Per Applicant’s request for documentary evidence that superimposing sounds over clips of normal audio is well-known and conventional, further see Yamada (US 2002/0055840 A1) which discloses, “In a conventional speech recognition in a noisy environment, noise data are superimposed on speech samples and, by using the noise superimposed speech samples, untrained acoustic models are trained to produce acoustic models for speech recognition, corresponding to the noisy environment” (Yamada, ¶0004).
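For illustration only, the superimposition referred to above amounts to sample-wise addition of one recording onto another. The sketch below is not taken from Mozer or Yamada; the function names, the representation of audio as lists of floats in the range -1.0 to 1.0, and the clamping step are assumptions made for the example.

```python
def split_into_clips(samples, clip_len):
    """Split a list of samples into consecutive clips of clip_len samples,
    dropping any shorter remainder at the end."""
    return [
        samples[i:i + clip_len]
        for i in range(0, len(samples) - clip_len + 1, clip_len)
    ]


def superimpose(normal_clip, event, offset=0, gain=1.0):
    """Return a copy of normal_clip with the event samples added starting at
    offset. Event samples past the end of the clip are ignored, and the
    result is clamped to the assumed [-1.0, 1.0] sample range."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    mixed = list(normal_clip)
    for i, sample in enumerate(event):
        j = offset + i
        if j >= len(mixed):
            break
        mixed[j] = max(-1.0, min(1.0, mixed[j] + gain * sample))
    return mixed
```

The original clip is left unmodified, so the same normal clip can serve both as a normal training example and as the background for one or more abnormal examples.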
As to claim 4, Mozer discloses wherein the audio classification model is used in determining whether the incoming audio stream from the particular environment during an operation mode includes an abnormal audio event for the particular environment (¶0024, Fig. 2. Determined if predefined condition (abnormal sounds such as “Fire,” glass breaking, crying, etc.) is met.).
As to claim 8, Mozer discloses wherein determining the location of the abnormal audio event in the particular environment is based at least in part on a physical location of the microphone in the particular environment stored in the electronic database (¶0026, Fig. 2. “EAPVT component 110 can trigger (e.g., send a signal to) the video capture devices 106 located in the area where the audio signal was captured to begin video recording (block 208).”).
As to claim 10, Mozer does not expressly disclose wherein the video camera comprises a housing that houses the microphone, and provides the incoming audio stream and the video stream.
However, video cameras with microphones are well-known in the art and would have been obvious to one of ordinary skill in the art before the effective filing date of the invention. The motivation would have been that it would have been obvious to try, since there are only two options: either the microphone and video camera are separate devices, or the microphone and video camera are the same device.
As to claim 11, Mozer discloses wherein the video camera is housed separately from the microphone, and the microphone is addressed separately from the video camera (¶0016, Fig. 1. “Video capture devices 106 and audio capture devices 108 may be placed at various locations within or outside home 104.”).
As to claim 12, Mozer discloses sending an alert to an operator when it is determined that the incoming audio stream includes an abnormal audio event for the particular environment (¶0026 and Fig. 2. “EAPVT component 110 can also cause home security system 102 to send an alert to the homeowner, or a third party such as a security service provider, indicating that video surveillance has been triggered.”).
As to claim 13, Mozer discloses a method for identifying abnormal sounds in a particular environment, the method comprising: 
entering a training mode (¶0023. “Predetermined training process.”) and while in the training mode: 
capturing real time training audio from a plurality of microphones in the particular environment when no abnormal audio events are present in the particular environment (¶0023. “The known and approved users can be enrolled into the system via, e.g., a predetermined training process whereby each user speaks a number of training phrases so that system 102 can build and save a voice model for the user.”); 
splitting the real time training audio into a plurality of audio clips (¶0023. Implicit that the training phrases are stored and storing audio in clips is well-known and conventional in the art. ¶0039 further discloses a file storage subsystem.); 
training an audio classification model using the normal audio clips and the abnormal audio clips (¶0011 and ¶0024. “If the computer system recognizes a voice, speech phrase, and/or environmental sound that corresponds to a condition previously defined by a user (e.g., the homeowner), the computer system can automatically turn on one or more video cameras, thereby initiating video surveillance of the location. For example, the computer system may recognize the phrase “Fire!” or “Help!”, which may each correspond to a condition defined by the homeowner for triggering video surveillance.”  “Yet another example of such a predefined condition is the recognition of an environmental sound such as glass breaking, an explosion, an alarm, a dog barking, a person screaming, a baby crying, and so on. These various predefined conditions can be configured/enabled by the homeowner via a setup user interface provided by computer system 102.” System trained with abnormal audio clips.); 
entering an operational mode (Fig. 2), and while in the operational mode: 
capturing real time operational audio from each of the plurality of microphones (¶0022, Fig. 2. “EAPVT component 110 can use audio capture devices 108 to listen for audio in various areas of home 104 on an ongoing basis.”); 
splitting the real time operational audio into a plurality of operational audio clips (¶0032. Captured audio stored locally. Storing audio clips is well-known and conventional in the art. ¶0039 further discloses a file storage subsystem.); 
processing the operational audio clips using the audio classification model via a processor to identify one or more abnormal audio signatures in the particular environment (¶0022-0023, Fig. 2. “The home security system 102 can maintain information regarding a group of “known and approved” users within the home, and any voice that does not correspond to one of the known and approved users can be classified as an unknown voice/user.” Abnormal sound detected in normal audio stream.); 
determining a location of one of the abnormal audio signatures in the particular environment (¶0026, Fig. 2. “EAPVT component 110 can trigger (e.g., send a signal to) the video capture devices 106 located in the area where the audio signal was captured to begin video recording (block 208).” Implicit that location of abnormal event is determined if system identifies camera in the area.); and 
retrieving and displaying on a display a video stream from a video camera that has a field of view that includes the location (¶0026, Fig. 2. Alert sent to homeowner or third party indicating video surveillance triggered. “This can be useful if the homeowner or third party wishes to view the captured video footage in real-time (via, e.g., a smartphone app, live video monitor, etc.) so that they can observe the situation in the home and take appropriate steps as needed.”).
Mozer does not expressly disclose saving at least some of the plurality of audio clips as normal audio clips containing normal audio signatures for the particular environment; 
superimposing abnormal audio signatures onto at least some of the plurality of normal audio clips and saving the resulting files as abnormal audio clips containing abnormal audio signatures.
However, training the system with abnormal sounds such as “Fire,” “Help,” glass breaking, baby crying, etc. is disclosed by Mozer (¶0011 and ¶0024) as cited above. Superimposing these sounds over clips of a normal recording is an obvious variant.
Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to superimpose the abnormal sounds. The motivation would have been that it is an obvious variant of configuring the system for predefined sounds.
Per Applicant’s request for documentary evidence that superimposing sounds over clips of normal audio is well-known and conventional, further see Yamada (US 2002/0055840 A1) which discloses, “In a conventional speech recognition in a noisy environment, noise data are superimposed on speech samples and, by using the noise superimposed speech samples, untrained acoustic models are trained to produce acoustic models for speech recognition, corresponding to the noisy environment” (Yamada, ¶0004).

Claims 5-6, 9 and 16-17 are rejected under 35 U.S.C. 103 as being unpatentable over Mozer, as applied to claims 3, 8 and 13 above, in view of Brav et al. (US 2016/0163168 A1) hereinafter “Brav.”
As to claim 5, Mozer does not expressly disclose wherein the audio classification model is a self-learning model.
Mozer in view of Brav discloses wherein the audio classification model is a self-learning model (Brav, ¶0050. “The audio surveillance nodes automatically update predetermined alert conditions by machine learning.”).
Mozer and Brav are analogous art because they are from the same field of endeavor with respect to surveillance systems.
Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to use machine learning, as taught by Brav. The motivation would have been so the system can learn which noises are acceptable or unacceptable in different situations, such as different times of day (Brav, ¶0050).
As to claim 6, Mozer in view of Brav discloses wherein the self-learning model uses reinforcement learning and/or transfer learning (Brav, ¶0050. “The audio surveillance nodes automatically update predetermined alert conditions by machine learning.”).
The motivation is the same as claim 5 above.
	As to claim 9, Mozer discloses wherein determining the location of the abnormal audio event in the particular environment is based at least in part on the physical location of the microphone in the particular environment (¶0026, Fig. 2. “EAPVT component 110 can trigger (e.g., send a signal to) the video capture devices 106 located in the area where the audio signal was captured to begin video recording (block 208).” Implicit that location of microphone known.).
Mozer does not expressly disclose wherein the microphone is a directional microphone with a directional orientation, and wherein determining the location of the abnormal audio event in the particular environment is based at least in part on the directional orientation of the microphone.
	Mozer in view of Brav discloses wherein the microphone is a directional microphone with a directional orientation, and wherein determining the location of the abnormal audio event in the particular environment is based at least in part on the directional orientation of the microphone (Brav, ¶0028. “Microphone 210 may include omnidirectional, bidirectional, and unidirectional characteristics, where the directionality characteristics indicate the direction(s) in which microphone 210 may detect sound.” Implicit that locating the sound is based on the directivity of the microphone as the abnormal sound would be picked up by the microphone based on the microphone directivity.).
Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to use directional microphones, as taught by Brav. The motivation would have been to better control the areas the user wishes to monitor, such as focusing on a doorway (Brav, ¶0028).
As to claims 16-17, they are rejected under claim 13 using the same motivation as claims 5-6 above.

Claims 7 and 14-15 are rejected under 35 U.S.C. 103 as being unpatentable over Mozer in view of Brav, as applied to claims 3 and 13 above, and further in view of Hwang et al. (US 2017/0154638 A1) hereinafter “Hwang.”
As to claim 7, Mozer in view of Brav discloses presenting one or more determined abnormal audio event to an operator (Mozer, ¶0026 and ¶0033, Fig. 2. “EAPVT component 110 can also cause home security system 102 to send an alert to the homeowner, or a third party such as a security service provider.” “Home security system 102 can be configured with an option to let the homeowner screen the audio in question before beginning video surveillance.”).
Mozer in view of Brav does not expressly disclose receiving a classification from the operator that the determined abnormal audio event is indeed an abnormal audio event or should be considered a normal audio event for the particular environment; and 
updating the audio classification model based on the classification received from the operator.
Mozer in view of Brav as modified by Hwang discloses receiving a classification from the operator that the determined abnormal audio event is indeed an abnormal audio event or should be considered a normal audio event for the particular environment (Hwang, ¶0034 and Fig. 2. “The user 210 may play or reproduce the input sound transmitted from the electronic device 110 associated with the audio event by pressing an icon 240. If the user does not wish to receive a notification for the audio event occurring at the indicated location (i.e., screaming sound at the front door), the user may press an icon 250. In response, the electronic device 110 may not generate a notification for the audio event “screaming sound” occurring at the front door.” Pressing 240 confirms abnormal sound while pressing 250 indicates that the sound isn’t abnormal.); and
updating the audio classification model based on the classification received from the operator (Hwang, ¶0034 and Fig. 2. “In response, the electronic device 110 may not generate a notification for the audio event “screaming sound” occurring at the front door.” Brav, ¶0050, further discloses updating the model based on lack of response from the user.).
Mozer, Brav and Hwang are analogous art because they are from the same field of endeavor with respect to audio event detection.
Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to receive user input regarding the abnormal sound, as taught by Hwang. The motivation would have been to allow for more accurate user notifications in the future (Hwang, ¶0034).
As to claim 14, Mozer in view of Brav does not expressly disclose receiving a user input confirming or rejecting the identification of the abnormal audio signatures.
Mozer in view of Brav as modified by Hwang discloses receiving a user input confirming or rejecting the identification of the abnormal audio signatures (Hwang, ¶0034 and Fig. 2. “The user 210 may play or reproduce the input sound transmitted from the electronic device 110 associated with the audio event by pressing an icon 240. If the user does not wish to receive a notification for the audio event occurring at the indicated location (i.e., screaming sound at the front door), the user may press an icon 250. In response, the electronic device 110 may not generate a notification for the audio event “screaming sound” occurring at the front door.” Pressing 240 confirms abnormal sound while pressing 250 indicates that the sound isn’t abnormal.).
The motivation is the same as claim 7 above.
As to claim 15, Mozer in view of Brav does not expressly disclose wherein the audio classification model is updated based on the user input.
Mozer in view of Brav as modified by Hwang discloses wherein the audio classification model is updated based on the user input (Hwang, ¶0034 and Fig. 2. “In response, the electronic device 110 may not generate a notification for the audio event “screaming sound” occurring at the front door.” Brav, ¶0050, further discloses updating the model based on lack of response from the user.).
The motivation is the same as claim 7 above.

Claims 18-19 are rejected under 35 U.S.C. 103 as being unpatentable over Brav in view of DiPoala (US 2010/0128446 A1). 
As to claim 18, Brav discloses an audio-video system for capturing audio and video of a particular environment (Figs. 1a and 2b), the audio-video camera comprising: 
a camera for providing a video stream (¶0034, Fig. 2b. Camera 216 for capturing video.); 
two or more directional microphones each configured to receive sound from a primary audio direction (¶0028, Fig. 2b. “Microphone 210 may comprise an array of microphone elements, such as a beamforming array or a directional microphone array.”); and 
a controller, the controller operatively coupled to the camera and each of the one or more directional microphones (Fig. 2b. Control unit 201 coupled to camera 216 and microphone 210.), 
the controller configured to provide an audio and video output to a remote device (¶0039. “Monitoring device 113 may be configured to receive alerts from audio surveillance nodes. For example, upon the plurality of audio surveillance nodes detecting a sound of a certain classification, monitoring device 113 may receive an alert message indicating that human intervention is necessary. The alert message may include a recording of the detected sound, an image associated with the detected sound, a predetermined alert image, a predetermined alert sound, etc.”).
Brav does not expressly disclose an audio-video camera with a housing,
wherein the primary audio direction for each of the two or more directional microphones is orientated in a different direction such that an approximate direction of a sound event emanating from the particular environment can be determined, and 
wherein the primary directions of the two or more directional microphones are orientated to have a uniform angular spacing.
	Brav in view of DiPoala discloses an audio-video camera with a housing (DiPoala, ¶0009 and ¶0031, Fig. 1. Audio-video security system with modular housing),
wherein the primary audio direction for each of the two or more directional microphones is orientated in a different direction such that an approximate direction of a sound event emanating from the particular environment can be determined (DiPoala, ¶0031, Fig. 1. Microphones 38 oriented in different directions for location identification.), and 
wherein the primary directions of the two or more directional microphones are orientated to have a uniform angular spacing (DiPoala, ¶0031, Fig. 1. Microphones 38 uniformly spaced around the microphone ring 36a.).
Brav and DiPoala are analogous art because they are from the same field of endeavor with respect to surveillance systems.
Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to uniformly space the microphones in different directions, as taught by DiPoala. The motivation would have been for improved location identification (DiPoala, ¶0031).
	As to claim 19, Brav in view of DiPoala discloses wherein the camera is a Pan Tilt Zoom (PTZ) camera having a field of view (Brav, ¶0034-0035, Fig. 2b. “Camera 216 may be configured to move in various directions, for example, to pan left and right, tilt up and down, or zoom in and out on a particular target.”), and wherein the controller is configured to: 
determine the approximate direction of the sound event emanating from the particular environment using the one or more directional microphones (Brav, ¶0027-0028, Fig. 2b. Control unit 201 receives input from microphone 210 and determines location of the detected sound.); and 
control the field of view of the PTZ camera to face the determined approximate direction of the sound event in order to capture a video stream of a source of the sound event (Brav, ¶0034-0035, Fig. 2b. “Upon determining the location of detected sound, control unit 201 may position camera 216 to capture an image of the source location of the detected sound. In one embodiment, control unit 201 may use camera 216 to zoom in on the source location of the detected sound when appropriate.”).
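For illustration only, one simple way an approximate direction could be derived from directional microphones with uniform angular spacing is a level-weighted average of the microphones' primary directions. The sketch below is an assumption made for the example and is not asserted to be the method of Brav or DiPoala; the function names and the use of a single non-negative level per microphone are hypothetical.

```python
import math


def mic_directions(n):
    """Primary directions, in degrees, of n microphones spaced uniformly
    around a full circle (hypothetical layout: microphone 0 faces 0 degrees)."""
    return [360.0 * k / n for k in range(n)]


def estimate_direction(levels):
    """Estimate the bearing of a sound, in degrees within [0, 360), as the
    level-weighted vector sum of the microphones' primary directions.
    Returns None when the weighted sum cancels out (e.g. all levels zero)."""
    angles = mic_directions(len(levels))
    x = sum(lv * math.cos(math.radians(a)) for lv, a in zip(levels, angles))
    y = sum(lv * math.sin(math.radians(a)) for lv, a in zip(levels, angles))
    if math.hypot(x, y) < 1e-12:
        return None
    return math.degrees(math.atan2(y, x)) % 360.0
```

Under these assumptions, a sound picked up equally by the microphones facing 0 and 90 degrees is estimated at roughly 45 degrees, which a controller could then use as the pan target for the PTZ camera.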

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to JAMES K MOONEY whose telephone number is (571)272-2412. The examiner can normally be reached Monday-Thursday, 8:30-6:30 EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Vivian Chin, can be reached at (571) 272-7848. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/JAMES K MOONEY/Primary Examiner, Art Unit 2654