Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The submitted information disclosure statement (IDS) complies with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.
Claim Rejections - 35 USC § 102
3.	In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1-10 and 12-19 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Thomsen (EP 3407348).
As per claim 1, Thomsen teaches a method implemented by one or more processors (Fig. 2), the method comprising: 
detecting, via one or more microphones of an assistant device located in an ecosystem that includes a plurality of assistant devices, audio data that captures an acoustic event (figure 3 with paragraph [0033] ("speech recognition engine 204 in a local VRD 104 locally detects a speech event") together with figure 1 and paragraphs [0010] ("VRD network 102 includes multiple VRDs 104(1)-(N)") and [0011] ("A VRD 104 includes at least one microphone for capturing audio commands")); 
processing, using an event detection model that is stored locally at the assistant device, the audio data that captures the acoustic event to generate a measure associated with the acoustic event ([0034], [0018], processing the audio data and generating input quality metrics associated with the recognized speech based on a measure of energy of the received audio signal and the background noise in the audio signal, output quality metrics associated with the recognized speech based on a confidence that the recognized speech is accurate and/or correct relative to the spoken words, and timing information associated with the recognized speech); 
detecting, via one or more additional microphones of an additional assistant device located in the ecosystem, additional audio data that also captures the acoustic event, the additional assistant device being in addition to the assistant device, and the additional assistant device being co-located in the ecosystem with the assistant device ([0035]-[0038], detecting additional audio data by additional assistant devices, i.e. external VRDs 104, in view of any of paragraphs [0014] ("At a given time, the audio signals 108 associated with a speech event are incident on the microphones included in several VRDs 104.") or [0020] ("other VRDs 104 in the VRD network 102 (referred to herein as "external VRDs 104") that detected the same speech event")); 
processing, using an additional event detection model that is stored locally at the additional assistant device, the additional audio data that captures the acoustic event to generate an additional measure associated with the acoustic event ([0036], generating a second set of characteristics associated with the same speech event that was detected by an external VRD 104, wherein it is apparent from figures 1 and 2 together with paragraphs [0015] ("a given VRD 104 [ ... ] includes[ ... ] a speech recognition engine 204") and [0034] ("speech recognition engine 204 generates a set of characteristics associated with the locally detected speech event") that any VRD of the VRD network 102 uses a local speech recognition engine 204 to generate the characteristics); 
processing both the measure and the additional measure to determine whether the acoustic event detected by at least both the assistant device and the additional assistant device is an actual acoustic event (paragraph [0022] ("to determine whether the locally detected speech event is the same as an externally detected speech event, the VRD selection engine 208 evaluates metadata corresponding to each of the detected speech events."), i.e. processing, by evaluating, the metadata measures from two different VRDs to determine whether the speech event detected by both VRDs is the same, with this "same event" being interpretable as "an actual acoustic event"; or paragraph [0039] ("VRD selection engine 208 [ ... ] compares the first set of characteristics with the second set of characteristics to determine whether the local VRD 104 is better suited to process audio commands corresponding to the speech event relative to external VRDs 104 that also detected the same speech event."), wherein, in case of determining that the local VRD is better suited for further processing, the speech event represents (i) an acoustic event actually suited for being processed by the local VRD and, thus, (ii) an actual acoustic event); and 
in response to determining that the acoustic event is the actual acoustic event, causing an action associated with the actual acoustic event to be performed ([0024], "to select the VRD 104 that will further process the recognized speech, the VRD selection engine 208 evaluates the metadata and/or the content of the recognized speech corresponding to each of the detected speech events.").
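For illustration only, the sequence of steps mapped above for claim 1 can be sketched as follows. This is a minimal sketch under stated assumptions and is taken neither from Thomsen nor from the application: the `Detection` type, the `is_actual_event` and `handle_event` functions, the use of a single confidence value as the "measure", and the 0.6 threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Detection:
    """Hypothetical per-device result of a locally stored event detection model."""
    device_id: str
    confidence: float  # assumed measure in [0, 1]; not a value defined by Thomsen


def is_actual_event(first: Detection, second: Detection, threshold: float = 0.6) -> bool:
    """Treat the event as actual only when both devices' measures meet the threshold.

    The threshold and the "both must agree" rule are illustrative assumptions.
    """
    return first.confidence >= threshold and second.confidence >= threshold


def handle_event(first: Detection, second: Detection,
                 action: Callable[[], str]) -> Optional[str]:
    """Perform the action only in response to a positive determination."""
    if is_actual_event(first, second):
        return action()
    return None
```

Under these assumptions, `handle_event(Detection("kitchen", 0.9), Detection("hall", 0.7), lambda: "notify")` performs the action, whereas a second measure of 0.2 would suppress it.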
As per claim 2, Thomsen teaches wherein the acoustic event comprises a hotword detection event, wherein the event detection model that is stored locally at the assistant device comprises a hotword detection model that is trained to detect whether a particular word or phrase is captured in the audio data, and wherein the additional event detection model that is stored locally at the additional assistant device comprises an additional hotword detection model that is trained to detect whether the particular word or phrase is captured in the additional audio data ([0013]-[0014], detecting, by each of the VRDs 104, whether a particular word or phrase is captured in an audio event, and determining, based on the detected word or phrase, which of the VRDs 104 is best suited to process the audio command(s) corresponding to the speech event).
As per claim 3, Thomsen teaches wherein the measure associated with the acoustic event comprises a confidence level corresponding to whether the audio data captures the particular word or phrase, and wherein the additional measure associated with the acoustic event comprises an additional confidence level corresponding to whether the additional audio data captures the particular word or phrase ([0018], [0028], generating, by the voice recognition device and the additional device, quality metrics based on confidence that the recognized speech is accurate).
As per claim 4, Thomsen teaches wherein determining that the acoustic event is the actual acoustic event comprises determining the particular word or phrase is captured in both the audio data and the additional audio data based on the confidence level and the additional confidence level ([0021]-[0022], determining whether the locally detected speech event is the same as an externally detected speech event based on characteristics/metadata extracted by the voice recognition device and the external voice recognition device 104; see also [0018] and [0028], wherein the voice recognition device and the additional device generate quality metrics based on confidence that the recognized words or phrases are accurate).
As per claim 5, Thomsen teaches wherein causing the action associated with the actual acoustic event to be performed comprises activating one or more components of an automated assistant in response to determining the acoustic event data indicates the audio data or the additional audio data captures the particular word or phrase ([0024], activating further processing of recognized speech).
As per claim 6, Thomsen teaches wherein the hotword detection model that is stored locally at the assistant device is a distinct hotword model that is distinct from the additional hotword detection model that is stored locally at the additional assistant device (figures 1 and 2 with paragraphs [0013] and [0015], from which it is apparent that any VRD of the VRD network 102 uses a distinct local speech recognition engine 204 representing inter alia a keyword/hotword detection model).
As per claim 7, Thomsen teaches wherein the acoustic event comprises a sound detection event, wherein the event detection model that is stored locally at the assistant device comprises a sound detection model that is trained to detect whether a particular sound is captured in the audio data, and wherein the additional event detection model that is stored locally at the additional assistant device comprises an additional sound detection model that is trained to detect whether the particular sound is captured in the additional audio data (Fig. 3 with paragraph [0033], wherein speech recognition engine 204 in a local VRD 104 locally detects a speech event, together with Fig. 1 and paragraphs [0010]-[0015], including [0010] ("VRD network 102 includes multiple VRDs 104(1)-(N)") and [0011] ("A VRD 104 includes at least one microphone for capturing audio commands")).
As per claim 8, Thomsen teaches wherein the measure associated with the acoustic event comprises a confidence level corresponding to whether the audio data captures the particular sound, and wherein the additional measure associated with the acoustic event comprises an additional confidence level corresponding to whether the additional audio data captures the particular sound ([0018], [0028], wherein the set of characteristics includes a confidence score associated with spoken content recognized from the speech event detected locally, and the second set of characteristics includes a second confidence score associated with spoken content recognized from the speech event detected by the external device).
As per claim 9, Thomsen teaches wherein determining that the acoustic event is the actual acoustic event comprises determining the particular sound is captured in both the audio data and the additional audio data based on the confidence level and the additional confidence level ([0028], where in the event that two or more detected speech events have comparable quality and/or confidence values, the VRD selection engine 208 performs a tie-breaking operation).
As per claim 10, Thomsen teaches wherein causing the action associated with the actual acoustic event to be performed comprises: generating a notification that indicates an occurrence of the actual acoustic event; and causing the notification to be presented to a user that is associated with the ecosystem via a computing device of the user ([0013], [0031], notifying the user of the result of processing the audio commands and any associated actions. The notification can be visual and/or audio-based).
As per claim 12, Thomsen teaches wherein processing both the measure and the additional measure to determine whether the acoustic event detected by both the assistant device and the additional assistant device is the actual acoustic event is by a given assistant device, wherein the given assistant device comprises one or more of: the assistant device, the additional assistant device, or a further additional assistant device that is co-located in the ecosystem with the assistant device and the additional assistant device ([0022]-[0023], one or more external VRDs 104 also detect the speech event).
As per claim 13, Thomsen teaches transmitting, by the assistant device, and to a remote system, the audio data; and transmitting, by the additional assistant device, and to the remote system, the additional audio data, wherein processing both the measure and the additional measure to determine whether the acoustic event detected by both the assistant device and the additional assistant device is the actual acoustic event is by the remote system ([0012], each VRD 104 is connected via a network connection to the processing system 106 that is remote from the VRD network 102. In one embodiment, the VRD 104 operates in conjunction with the processing system 106 to process audio commands captured via the microphones).
As per claim 14, Thomsen teaches wherein the audio data temporally corresponds to the additional audio data ([0014], [0025]).
As per claim 15, Thomsen teaches wherein processing both the measure and the additional measure to determine whether the acoustic event detected by both the assistant device and the additional assistant device is the actual acoustic event is in response to determining that a timestamp associated with the audio data temporally corresponds to an additional timestamp associated with the additional audio data ([0025], using timing information to determine whether the acoustic event detected by both the assistant device and the additional assistant device…).
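For illustration only, the timestamp comparison discussed for claims 14 and 15 can be sketched as a simple gate in front of the joint determination. This is a sketch under stated assumptions and is not Thomsen's method: the function names and the 0.5 second tolerance are hypothetical, and timestamps are assumed to be seconds on a clock shared by both devices.

```python
def temporally_corresponds(timestamp_a: float, timestamp_b: float,
                           tolerance_s: float = 0.5) -> bool:
    """Return True when two detection timestamps fall within the assumed tolerance."""
    return abs(timestamp_a - timestamp_b) <= tolerance_s


def should_compare_measures(timestamp_a: float, timestamp_b: float) -> bool:
    """Gate: compare the two devices' measures only for temporally matching detections."""
    return temporally_corresponds(timestamp_a, timestamp_b)
```

With these assumed values, detections at 10.0 s and 10.3 s would be compared, while detections at 10.0 s and 11.0 s would be treated as unrelated.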
As per claim 16, Thomsen teaches in response to detecting the audio data via the one or more microphones of the assistant device: anticipating detection of the additional audio data via the one or more additional microphones of the additional assistant device based on a plurality of historical acoustic events being detected at both the assistant device and the additional assistant device ([0011], [0013], [0014], anticipating detection of additional audio data via one or more additional microphones of the additional assistant device…).
As per claim 17, Thomsen teaches detecting, via the one or more microphones of the assistant device, subsequent audio data that captures a subsequent acoustic event; processing, using the event detection model, the subsequent audio data that captures the subsequent acoustic event to generate a subsequent measure associated with the subsequent acoustic event; detecting, via one or more further additional microphones of a further additional assistant device located in the ecosystem, additional subsequent audio data that also captures the subsequent acoustic event, the further additional assistant device being in addition to the assistant device, and the further additional assistant device being co-located in the ecosystem with the assistant device; processing, using a further additional event detection model that is stored locally at the further additional assistant device, the additional subsequent audio data that captures the acoustic event to generate an additional subsequent measure associated with the acoustic event; processing both the measure and the additional measure to determine whether the subsequent acoustic event detected by both the assistant device and the further additional assistant device is an actual subsequent acoustic event; and in response to determining that the subsequent acoustic event is the actual subsequent acoustic event, causing a subsequent action associated with the actual subsequent acoustic event to be performed ([0003], wherein voice recognition devices operate independently such that a given device will process every voice-based command that the device receives. The rest of the limitations are rejected for the same reasons as set forth with regard to claim 1).

As per claim 18, Thomsen teaches in response to detecting the subsequent audio data via the one or more microphones of the assistant device: anticipating detection of the additional subsequent audio data via the one or more further additional microphones of the further additional assistant device, and without anticipating detection of any audio data via the one or more additional microphones of the additional assistant device, based on an additional plurality of historical acoustic events being detected at both the assistant device and the further additional assistant device ([0011], [0013], [0014], anticipating detection of additional audio data via one or more additional microphones of the additional assistant device…).
As per claim 19, Thomsen teaches detecting, via one or more microphones of an assistant device located in an ecosystem that includes a plurality of assistant devices, audio data that captures an acoustic event (Fig. 3 and paragraph [0033] ("speech recognition engine 204 in a local VRD 104 locally detects a speech event") together with figure 1 and paragraphs [0010] ("VRD network 102 includes multiple VRDs 104(1)-(N)") and [0011] ("A VRD 104 includes at least one microphone for capturing audio commands")); 
identifying, based on a location of the assistant device within the ecosystem, at least one additional assistant device that should have detected, via one or more respective microphones of the at least one additional assistant device in the ecosystem, additional audio data that temporally corresponds to the audio data ([0022], wherein a local voice recognition device and an external voice recognition device are identified within the ecosystem, and data from speech events received by each of the local and external voice recognition devices is analyzed for further processing. See also [0014], wherein audio signals 108 associated with a speech event are incident on the microphones included in several VRDs 104; and [0025] ("timestamps corresponding to the detected speech events… are the same or close in time"), indicating that the audio data captured by the VRDs have temporal correspondence);
in response to determining that the at least one additional assistant device detected the additional audio data that temporally corresponds to the audio data, processing, using an event detection model that is stored locally at the assistant device, the audio data that captures the acoustic event to generate a measure associated with the acoustic event ([0034], [0018], processing the audio data and generating input quality metrics associated with the recognized speech based on a measure of energy of the received audio signal and the background noise in the audio signal, output quality metrics associated with the recognized speech based on a confidence that the recognized speech is accurate and/or correct relative to the spoken words, and timing information associated with the recognized speech), and processing, using a respective event detection model stored locally at the at least one additional assistant device, the additional audio data that captures the acoustic event to generate an additional measure associated with the acoustic event ([0035]-[0038], generating a second set of characteristics associated with the same speech event that was detected by an external VRD 104, wherein it is apparent from figures 1 and 2 together with paragraphs [0015] ("a given VRD 104 [ ... ] includes[ ... ] a speech recognition engine 204") and [0034] ("speech recognition engine 204 generates a set of characteristics associated with the locally detected speech event") that any VRD of the VRD network 102 uses a local speech recognition engine 204 to generate the characteristics); 
determining, based on both the measure and the additional measure, whether the acoustic event detected by both the assistant device and the additional assistant device is an actual acoustic event (paragraph [0022] ("to determine whether the locally detected speech event is the same as an externally detected speech event, the VRD selection engine 208 evaluates metadata corresponding to each of the detected speech events."), i.e. processing, by evaluating, the metadata measures from two different VRDs to determine whether the speech event detected by both VRDs is the same, with this "same event" being interpretable as "an actual acoustic event"; or paragraph [0039] ("VRD selection engine 208 [ ... ] compares the first set of characteristics with the second set of characteristics to determine whether the local VRD 104 is better suited to process audio commands corresponding to the speech event relative to external VRDs 104 that also detected the same speech event."), wherein, in case of determining that the local VRD is better suited for further processing, the speech event represents (i) an acoustic event actually suited for being processed by the local VRD and, thus, (ii) an actual acoustic event); and
in response to determining that the acoustic event is the actual acoustic event, causing an action associated with the actual acoustic event to be performed ([0024], "to select the VRD 104 that will further process the recognized speech, the VRD selection engine 208 evaluates the metadata and/or the content of the recognized speech corresponding to each of the detected speech events.").


Claim Rejections - 35 USC § 103
4.	In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Thomsen (EP 3407348) in view of Horling (US 2018/0330589).
As per claim 11, Thomsen does not explicitly disclose wherein the particular sound comprises one or more of: glass breaking, a dog barking, a cat meowing, a doorbell ringing, a smoke alarm sounding, a carbon monoxide detector sounding, a baby crying, or a door knocking.
Horling, in the same field of endeavor, teaches, at paragraph [0167] and in the example of Fig. 7B, that the audio event has been classified as glass breaking in the kitchen.  Therefore, it would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, to use the sound detection feature of Horling with the system of Thomsen, in order to enhance audible activity detection and provide better assistance to the user.

Claim 20 is rejected under 35 U.S.C. 103 as being unpatentable over Thomsen (EP 3407348). 
As per claim 20, Thomsen does not explicitly disclose in response to determining that the at least one additional assistant device did not detect any audio data that temporally corresponds to the audio data, discarding the audio data.  However, Thomsen teaches using the timing information, the metrics, and the content of the recognized speech to filter certain VRDs 104 before a final selection ([0024]). If, for a given detected speech event, less than a threshold amount of the energy of the audio signal is within that range, then the detected speech event is less likely to be human speech or may include significant noise combined with human speech, and is therefore to be discarded or filtered ([0026]).  Therefore, it would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, for the system of Thomsen to discard the audio data as claimed.  This would enhance audible activity detection and provide better assistance to the user.
Conclusion
5.	The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.  See PTO-892.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ABDELALI SERROU whose telephone number is (571) 272-7638. The examiner can normally be reached M-F 9 AM - 5 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre-Louis Desir, can be reached at 571-272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/ABDELALI SERROU/Primary Examiner, Art Unit 2659