Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION
This office action is in response to the correspondence of 06/10/22 regarding application 16/878,802, in which claims 21-40 were amended and new claims 41-42 were added. Claims 21-42 are pending in the application and have been considered.

Response to Arguments
Applicant argues that Padilla teaches a voice assistant device receiving a command to broadcast audio, and broadcasting it to all networked voice assistant devices in the example household, but does not teach receiving an utterance to output content using multiple devices, determining two different devices detecting presence of users, and based at least in part thereon, sending output data to the foregoing two devices. 
In response, Applicant is correct that the previously cited sections of Padilla such as [0014-0016] describe broadcasting audio to all the voice assistant devices in the example household. However, in response to Applicant’s amendments, the examiner has conducted a thorough review of Padilla and discovered additional evidence that applies to the amended claims:

[0017] FIG. 2 is a schematic illustration of the example household 100 of FIG. 1 having the same voice assistant devices located in the different rooms. In the illustrated example of FIG. 2, a household member that is located in the kitchen 102 recites a wake phrase and a corresponding task, such as the phrase “Alexa, open a channel to the Garage.” In response to receiving the aforementioned phrase, the example kitchen intercom engine 126 causes a room channel to be established between the originating voice assistant device (i.e., the kitchen voice assistant device 114) and a destination voice assistant device (i.e., the garage voice assistant device 118). As described in further detail below, each voice assistant device includes an associated room name to enable voice communications to particular rooms within the example household 100. Unlike the previous example, in which a broadcast channel is established for all voice assistant devices, the illustrated example of FIG. 2 establishes the room channel that is restricted to include only those voice assistant devices identified by the recited phrase, as shown by an example channel indication arrow 142 between the example kitchen 102 and the example garage 106. In other words, the room channel is established for the example kitchen voice assistant device 114 and the example garage voice assistant device 118, but the example living room voice assistant device 116, the example first bedroom voice assistant device 120, the example second bedroom voice assistant device 122, and the example third bedroom voice assistant device 124 do not participate in the established communication channel/session. In some examples, three or more rooms may be identified for multi-room communication, such as by reciting the phrase “Alexa, open a channel to the Garage and the Living Room.”

[0032] In some examples, the participant does not know where a desired household member is located and, additionally, may not want to conduct an intercom session with all networked voice assistant devices in the example household 100. In such circumstances, and assuming the participant is located near the example kitchen voice assistant device 114, the participant may recite “OK Google, open a channel with Jane.” The example target determiner 516 determines that an intercom session is to be enabled with a particular person, but the location of that person is unknown to the requesting participant. As such, the example target determiner 516 invokes the example device map engine 518 to query the location database 520 for a voice assistant device associated with Jane. In particular, the example device map engine 518 queries the example location table 400 stored in the location database 520 for the household member “Jane,” and identifies that Jane is associated with the example garage voice assistant device 118 (see row 414 of the illustrated example of FIG. 4). In view of a match between the desired household member and a corresponding voice assistant device, the example device map engine 518 invokes the example broadcast engine 514 to enable an intercom session between the example kitchen voice assistant device 114 and the example garage voice assistant device 118.

[0045] In the illustrated example of FIG. 7, the example environment detector 510 determines if sound has been detected by one or more of the example voice assistant devices of the example household 100 (block 702). If not, then the example environment detector 510 waits for the next scheduled, periodic, aperiodic and/or manual iteration (block 702). However, in response to detecting a sound near one or more of the example voice assistant devices (block 702), the example member discovery engine 522 determines if the detected sound matches a household member's voice signature (block 704). If so, then the example device map engine 518 updates the example location table 400 (e.g., stored in the example location database 520) with a pairing/association between the detected voice signature and a corresponding voice assistant device that detected the signature (block 706). Such associations may occur when household members are speaking, but may not necessarily be interacting with the corresponding voice assistant device in the proximity of their speaking activity (e.g., talking on the phone, talking to other occupants, etc.).

[0047] In other examples, the detected sound is not associated with human speech (block 704), but may be indicative of activity in a particular room. For example, sounds relating to typing, office chair squeaking and/or eating may be detected by the example member discovery engine 522 to indicate presence of an occupant near one or more of the example voice assistant devices. Accordingly, the example device map engine 518 may flag those particular locations as candidate locations within which a household member may be located (block 708). Additionally, the example device map engine 518 updates the example location table 400 with an indication of human presence near the corresponding voice assistant device, as shown in the example candidate occupant column 408 of FIG. 4 (see row 416). In such circumstances, a voice command may target only those rooms having (a) a confirmed occupant or (b) a candidate occupant when enabling a communication session (e.g., an intercom session). This may be particularly useful when one or more of the rooms in which a voice assistant device is present includes a sleeping occupant, such as the example first bedroom that functions as a nursery (see row 418) of FIG. 4. In such circumstances, voice commands to locate members of the household may be targeted to all rooms of the example household 100 except the first bedroom 108 so that the sleeping occupant is not disturbed. As described above, if a particular room has an associated do-not-disturb indication active, such as the “YES” designation in the example do-not-disturb column 422 of FIG. 4 (see row 418), then such designated rooms will not be inundated with audio prompts.

As the above evidence demonstrates, in addition to the ability to broadcast to all rooms, Padilla also teaches allowing the user to specify which rooms or users to broadcast the audio to, and broadcasting only to rooms in which a user or candidate user has been detected using attributes such as matching a voice signature. The examiner contends that based on the above evidence, Padilla is reasonably considered to teach receiving an utterance to output content using multiple devices, determining two different devices detecting presence of users, and based at least in part thereon, sending output data to the foregoing two devices, and that the independent claims even as amended are still considered anticipated by Padilla.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.



Claims 21, 22, 24, 25, 28, 29, 33-35, 38, and 39 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Padilla et al. (US 20180288104 A1).

Consider claim 21, Padilla discloses a computer-implemented method comprising: 
receiving, from a first device, first input audio data corresponding to a first utterance (monitoring a room with a microphone, [0011], and recording “Alexa, open a channel to the Garage and the Living Room” spoken by a household member, [0017]); 
performing speech processing using the first input audio data to determine the first utterance requests first content be broadcast using multiple devices (using voice recognition engine, [0027], to determine the user said “Alexa, open a channel to the Garage and the Living Room”, [0017], the “first content” to be broadcast being whatever speech the user utters after the open channel command, such as “Dinner is ready”, [0016]); 
generating first output data corresponding to the first content (output during the intercom broadcast session via rendering audio to the speakers, [0030]); 
based at least in part on the first utterance requesting the first content be broadcast using multiple devices:
	determining a second device detecting presence of a first user (when a household member is speaking but not necessarily interacting with the voice assistant device in the proximity, the member discovery engine determines the sound matches a household member’s voice and updates location table 400, [0045], for example, device 3 detects Jane in the Garage, Fig. 4);
determining a third device detecting presence of a second user (when a household member is speaking but not necessarily interacting with the voice assistant device in the proximity, the member discovery engine determines the sound matches a household member’s voice and updates location table 400, [0045], for example, device 5 detects Robert in Bedroom 2, Fig. 4);
based at least in part on the second device detecting presence of the first user, sending the first output data to a second device (the voice command targets only those rooms having a confirmed occupant when enabling a communication, [0047], therefore when Jane is confirmed in the garage, the audio output is considered “based… on” the second device detecting presence of the first user, [0045], [0047]); 
based at least in part on the third device detecting presence of the second user, sending the first output data to a third device (the voice command targets only those rooms having a confirmed occupant when enabling a communication, [0047], therefore when Robert is confirmed in Bedroom 2, the audio output is considered “based… on” the third device detecting presence of the second user, [0045], [0047]).


Consider claim 28, Padilla discloses a system comprising: 
at least one processor (processor, [0027]); and 
at least one memory comprising instructions (memory storing instructions, [0036]) that, when executed by the at least one processor, cause the system to: 
receive, from a first device, first input audio data corresponding to a first utterance (monitoring a room with a microphone, [0011], and recording “Alexa, open a channel to the Garage and the Living Room” spoken by a household member, [0017]); 
perform speech processing using the first input audio data to determine the first utterance requests first content be broadcast using multiple devices (using voice recognition engine, [0027], to determine the user said “Alexa, open a channel to the Garage and the Living Room”, [0017], the “first content” to be broadcast being whatever speech the user utters after the open channel command, such as “Dinner is ready”, [0016]); 
generate first output data corresponding to the first content (output during the intercom broadcast session via rendering audio to the speakers, [0030]); 
based at least in part on the first utterance requesting the first content be broadcast using multiple devices:
	determine a second device detecting presence of a first user (when a household member is speaking but not necessarily interacting with the voice assistant device in the proximity, the member discovery engine determines the sound matches a household member’s voice and updates location table 400, [0045], for example, device 3 detects Jane in the Garage, Fig. 4);
determine a third device detecting presence of a second user (when a household member is speaking but not necessarily interacting with the voice assistant device in the proximity, the member discovery engine determines the sound matches a household member’s voice and updates location table 400, [0045], for example, device 5 detects Robert in Bedroom 2, Fig. 4);
based at least in part on the second device detecting presence of the first user, send the first output data to a second device (the voice command targets only those rooms having a confirmed occupant when enabling a communication, [0047], therefore when Jane is confirmed in the garage, the audio output is considered “based… on” the second device detecting presence of the first user, [0045], [0047]); 
based at least in part on the third device detecting presence of the second user, send the first output data to a third device (the voice command targets only those rooms having a confirmed occupant when enabling a communication, [0047], therefore when Robert is confirmed in Bedroom 2, the audio output is considered “based… on” the third device detecting presence of the second user, [0045], [0047]).

Consider claim 35, Padilla discloses a computer-implemented method comprising: 
receiving, from a first device, first input audio data corresponding to a first utterance (monitoring a room with a microphone, [0011], and recording “Alexa, open a channel to the Garage and the Living Room” spoken by a household member, [0017]); 
performing speech processing using the first input audio data to determine the first utterance requests first content be broadcast using multiple devices (using voice recognition engine, [0027], to determine the user said “Alexa, open a channel to the Garage and the Living Room”, [0017], the “first content” to be broadcast being whatever speech the user utters after the open channel command, such as “Dinner is ready”, [0016]); 
generating first output data corresponding to the first content (output during the intercom broadcast session via rendering audio to the speakers, [0030]); 
based at least in part on the first utterance requesting the first content be broadcast using multiple devices:
	determining a second device detecting presence of a first user (when a household member is speaking but not necessarily interacting with the voice assistant device in the proximity, the member discovery engine determines the sound matches a household member’s voice and updates location table 400, [0045], for example, device 3 detects Jane in the Garage, Fig. 4);
determining a third device detecting presence of a second user (when a household member is speaking but not necessarily interacting with the voice assistant device in the proximity, the member discovery engine determines the sound matches a household member’s voice and updates location table 400, [0045], for example, device 5 detects Robert in Bedroom 2, Fig. 4);
based at least in part on the second device detecting presence of the first user, sending the first output data to a second device (the voice command targets only those rooms having a confirmed occupant when enabling a communication, [0047], therefore when Jane is confirmed in the garage, the audio output is considered “based… on” the second device detecting presence of the first user, [0045], [0047]); 
based at least in part on the third device detecting presence of the second user, sending the first output data to a third device (the voice command targets only those rooms having a confirmed occupant when enabling a communication, [0047], therefore when Robert is confirmed in Bedroom 2, the audio output is considered “based… on” the third device detecting presence of the second user, [0045], [0047]);
 receiving, from the second device, second input audio data corresponding to a second utterance, the second utterance corresponding to a first response to the first output audio data (“Okay, be there in five minutes”, [0016]); 
receiving, from the third device, third input audio data corresponding to a third utterance, the third utterance corresponding to a second response to the first output audio data (whatever response comes from a user, Guest #1, Fig 4, in the living room, [0016], [0014], noting that “voice discussions in any of the household rooms” are broadcast to each of the voice assistant devices); 
generating second output data representing the first response (the response from the garage generates a broadcast to all the other voice assistant devices, [0016]); 
generating third output data representing the second response (the response from the living room generates a broadcast to all the other voice assistant devices, [0016]); 
sending the second output data to the first device (the other voice assistant devices will cause the person from the garage’s audio to be emanated from the speaker, which includes the kitchen voice assistant device 114, [0016], [0014]); and 
sending the third output data to the first device (the other voice assistant devices will cause the person from the living room’s audio to be emanated from the speaker, which includes the kitchen voice assistant device 114, [0016], [0014]).

Consider claim 22, Padilla discloses the first device, the second device, and the third device are associated with a first profile (the device names, [0014], Fig 4). 

Consider claim 24, Padilla discloses: generating the first output data to include at least a portion of the first input audio data (“Everyone, dinner is ready”, [0016]). 

Consider claim 25, Padilla discloses: receiving, from the second device, second input data corresponding to a response to the first content (whatever response comes from a user, Guest #1, Fig 4, in the living room, [0016], [0014], noting that “voice discussions in any of the household rooms” are broadcast to each of the voice assistant devices); generating second output data corresponding to the response (the response from the living room generates a broadcast to all the other voice assistant devices, [0016]); and sending the second output data to the first device (the other voice assistant devices will cause the person from the living room’s audio to be emanated from the speaker, which includes the kitchen voice assistant device 114, [0016], [0014]). 

Consider claim 29, Padilla discloses the at least one memory further comprises instructions that, when executed by the at least one processor, further cause the system to: receive, from the second device, second input data corresponding to a response to the first content (whatever response comes from a user, Guest #1, Fig 4, in the living room, [0016], [0014], noting that “voice discussions in any of the household rooms” are broadcast to each of the voice assistant devices); generate second output data corresponding to the response (the response from the living room generates a broadcast to all the other voice assistant devices, [0016]); and send the second output data to the first device (the other voice assistant devices will cause the person from the living room’s audio to be emanated from the speaker, which includes the kitchen voice assistant device 114, [0016], [0014]). 

Consider claim 33, Padilla discloses the first device, the second device and the third device are associated with a first profile (the device names, [0014], Fig 4). 

Consider claim 34, Padilla discloses the at least one memory further comprises instructions that, when executed by the at least one processor, further cause the system to: generate the first output data to include at least a portion of the first input audio data (“Everyone, dinner is ready”, [0016]). 

Consider claim 38, Padilla discloses: including in the first output audio data at least a portion of the first input audio data (“Everyone, dinner is ready”, [0016]). 

Consider claim 39, Padilla discloses the first device, the second device, and the third device are associated with a first profile (the device names, [0014], Fig 4). 

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.


Claims 26, 27, 31, 32, 36, and 37 are rejected under 35 U.S.C. 103 as being unpatentable over Padilla et al. (US 20180288104 A1) in view of Bostick et al. (US 20170330585 A1).

Consider claim 26, Padilla discloses the second input data comprises second input audio data (“Okay, be there in five minutes”, [0016]). 
Padilla does not specifically mention: performing automatic speech recognition on the second input audio data to generate text data corresponding to the second utterance, and generating the second output data to comprise a representation of the text data. 
Bostick discloses performing automatic speech recognition on announcement audio data to generate text data, and generating the output data to comprise a representation of the text data (presenting a set of announcements, which have been transcribed from audio to text, to a user in an AR environment, [0033]). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Padilla by performing automatic speech recognition on the second input audio data to generate text data, and generating the second output data to comprise a representation of the text data in order to offer the advantage of permitting a user to review prior announcements that the user may have missed, as suggested by Bostick ([0033]).

Consider claim 27, Padilla discloses: causing the first device to output audio corresponding to the second input audio data (“Okay, be there in five minutes” is output in the kitchen via kitchen voice assistant device, [0016]).

Consider claim 31, Padilla discloses the second input data comprises second input audio data (“Okay, be there in five minutes”, [0016]). 
Padilla does not specifically mention: performing automatic speech recognition on the second input audio data to generate text data corresponding to the second utterance, and generating the second output data to comprise a representation of the text data. 
Bostick discloses performing automatic speech recognition on announcement audio data to generate text data, and generating the output data to comprise a representation of the text data (presenting a set of announcements, which have been transcribed from audio to text, to a user in an AR environment, [0033]). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Padilla by performing automatic speech recognition on the second input audio data to generate text data, and generating the second output data to comprise a representation of the text data for reasons similar to those for claim 26.

Consider claim 32, Padilla discloses: causing the first device to output audio corresponding to the second input audio data (“Okay, be there in five minutes” is output in the kitchen via kitchen voice assistant device, [0016]).

Consider claim 36, Padilla discloses the second input data comprises second input audio data (“Okay, be there in five minutes”, [0016]). 
Padilla does not specifically mention: performing automatic speech recognition on the second input audio data to generate text data corresponding to the second utterance, and generating the second output data to comprise a representation of the text data. 
Bostick discloses performing automatic speech recognition on announcement audio data to generate text data, and generating the output data to comprise a representation of the text data (presenting a set of announcements, which have been transcribed from audio to text, to a user in an AR environment, [0033]). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Padilla by performing automatic speech recognition on the second input audio data to generate text data, and generating the second output data to comprise a representation of the text data for reasons similar to those for claim 26.

Consider claim 37, Padilla discloses: causing the first device to output audio corresponding to the second input audio data (“Okay, be there in five minutes” is output in the kitchen via kitchen voice assistant device, [0016]).


Claim 41 is rejected under 35 U.S.C. 103 as being unpatentable over Padilla et al. (US 20180288104 A1) in view of Todasco, Michael (US 20180007210 A1).

Consider claim 41, Padilla does not, but Todasco discloses a second device detecting presence of a first user is based at least in part on image data captured by a camera of the second device (utilizing the camera for facial recognition, [0032], in which there are at least two devices and two users, Fig 1). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Padilla such that a second device detecting presence of a first user is based at least in part on image data captured by a camera of the second device in order to improve automation, as suggested by Todasco ([0003]).


Claims 23, 30, and 40 are rejected under 35 U.S.C. 103 as being unpatentable over Padilla et al. (US 20180288104 A1) in view of Sapp, Kevin (US 20080233932 A1).


Consider claim 23, Padilla does not, but Sapp discloses: causing a second device to display a first virtual button corresponding to a first response to a first content, and a second virtual button corresponding to a second response to the first content (client computing devices 102, 108, etc., includes a “second” device [0022], selecting a pre-programmed response in response to receiving the communication, [0038], Fig 6A).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Padilla by causing a second device to display a first virtual button corresponding to a first response to a first content, and a second virtual button corresponding to a second response to the first content in order to allow a more personal or contextually accurate response, as suggested by Sapp ([0003]). 

Consider claim 30, Padilla does not, but Sapp discloses: causing a second device to display a first virtual button corresponding to a first response to a first content, and a second virtual button corresponding to a second response to the first content (client computing devices 102, 108, etc., includes a “second” device [0022], selecting a pre-programmed response in response to receiving the communication, [0038], Fig 6A).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Padilla by causing a second device to display a first virtual button corresponding to a first response to a first content, and a second virtual button corresponding to a second response to the first content for reasons similar to those for claim 23. 

Consider claim 40, Padilla does not, but Sapp discloses: causing a second device to display a first virtual button corresponding to a first response, and a second virtual button corresponding to a second response (client computing devices 102, 108, etc., includes a “second” device [0022], selecting a pre-programmed response in response to receiving the communication, [0038], Fig 6A).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Padilla by causing a second device to display a first virtual button corresponding to a first response, and a second virtual button corresponding to a second response for reasons similar to those for claim 23. 



Allowable Subject Matter
Claim 42 is objected to as being dependent on a rejected base claim, but would be allowable if rewritten in independent form including all limitations of the base and any intervening claims.

The following is the examiner’s statement of reasons for indicating subject matter allowable over the prior art:

With respect to claim 42, the prior art does not fairly teach or suggest “… performing the speech processing using the first input audio data further determines the first utterance includes an indication of a time, and wherein the computer-implemented method further comprises: determining an electronic calendar entry associated with the time indicated in the first utterance; determining a participant of the electronic calendar entry is associated with the second device; and after determining the participant of the electronic calendar entry is associated with the second device, determining the second device is detecting presence of the first user.”

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Jesse Pullias whose telephone number is 571/270-5135. The examiner can normally be reached on M-F 8:00 AM - 4:30 PM. The examiner’s fax number is 571/270-6135.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Andrew Flanders, can be reached at 571/272-7516. 

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).


/Jesse S Pullias/
Primary Examiner, Art Unit 2655                                   07/28/22