DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.

Response to Amendment
This communication is responsive to the applicant’s amendment dated 08/26/2022.  The applicant(s) amended claims 1-12 and 15 and canceled claims 13-14.

Response to Arguments
Applicant's arguments filed 08/26/2022 have been fully considered but they are not persuasive.  

Regarding claim 1, the Applicant argues, “Specifically, Huang does not involve associating the voice input of the user with the place and the time of the voice input, and checking a stored user setting value for determining selection of an endpoint device, and further does not involve wherein any such stored user setting value is associated with the time and place when and where the user's voice was received. Merely selecting a device based on a location of the user is not equivalent to the above features.” (Remarks: pg. 9) The Examiner respectfully disagrees.
Considering the prior art as a whole, Huang teaches associating the voice input of the user with the place and the time of the voice input and checking a stored user setting value for determining selection of an endpoint device (par. 0085; ‘The I/O processing module selects the endpoint devices to output the informational answers, audio of the media item, and/or the destination device based on the user's voice input, and/or based on the location of the user when the user's requested task is performed.’; timestamp of the user input, par. 0086; ‘In some embodiments, the multi-channel input collection interface 360 provides a timestamp and associates a respective input device ID with the voice input received from each respective input channel of the multi-channel input collection interface.’; See also par. 0090 regarding timestamp). Regarding a stored user setting, this can be a voiceprint, which is used to authorize a user (voiceprint, par. 0065; ‘The authentication is optionally performed based on the voice input (e.g., a comparison between a voiceprint extracted from the user's voice input and a voiceprint of an authorized user that is stored at the digital assistant server).’). Therefore, the Applicant’s arguments are not persuasive.

Claim Rejections - 35 USC § 102
Claims 1-4, 6-10, 12, and 15 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Huang et al. (US 20190295542 A1).

Regarding claim 1, Huang teaches:
“A device for controlling a plurality of voice recognition devices” (par. 0069; ‘As shown in FIG. 3A, in some embodiments, the I/O processing module 328 interacts with the user, a user device (e.g., a user device 104 in FIG. 1B), and other devices (e.g., endpoint devices 124 in FIG. 1B) through the network communications interface to obtain user input (e.g., a speech input) and to provide responses to the user input.’), the device comprising:
“a user identification unit configured to identify a user based on a voice spoken by the user” (Speaker recognition, par. 0069; ‘In some embodiments, when a user request is received by the I/O processing module 328 and the user request contains a speech input, the I/O processing module 328 forwards the speech input to speaker recognition module 340 for speaker recognition and subsequently to the speech-to-text (STT) processing module 330 for speech-to-text conversions.’);
“a user setting storage unit configured to store a setting value of the user” (par. 0069; ‘In some embodiments, person-specific speech-to-text models are selected to perform the speech-to-text conversion based on the speaker recognition result.’; voiceprint, par. 0065; ‘The authentication is optionally performed based on the voice input (e.g., a comparison between a voiceprint extracted from the user's voice input and a voiceprint of an authorized user that is stored at the digital assistant server).’);
“a model storage unit configured to analyze an intention of the user based on the voice spoken by the user and select a first voice recognition device corresponding to the analyzed intention” (par. 0085; ‘It is to be noted that, the endpoint devices that capture the user's voice inputs, the endpoint devices that output the follow-up questions, and the endpoint devices that output the informational answer, audio of the media item, and the endpoint devices to which the encoded commands are sent, can all be different devices, and are dynamically selected during the interactions between the user and the digital assistant for a single executable intent/requested task.’);
“a processor configured to cause the first voice recognition device to execute a function corresponding to the voice or the intention, wherein the model storage unit selects the first voice recognition device based on the stored setting value of the user indicating selection of the first voice recognition device based on a time when the voice is spoken by the user and a place where the voice was received” (par. 0076; ‘The natural language processor 332 can use the user-specific information to supplement the information contained in the user input to further define the user intent.’ This user-specific information may be the voiceprint; selecting an endpoint device to output a response based on the location of the user, par. 0085; ‘The I/O processing module selects the endpoint devices to output the informational answers, audio of the media item, and/or the destination device based on the user's voice input, and/or based on the location of the user when the user's requested task is performed.’; par. 0086; ‘In some embodiments, the multi-channel input collection interface 360 provides a timestamp and associates a respective input device ID with the voice input received from each respective input channel of the multi-channel input collection interface.’; par. 0087; ‘For example, based on the device identifier or network address of an input endpoint device that has transmitted a voice input to the digital assistant through an incoming channel of the multi-channel input collection interface 360, the input channel identification module 362 identifies the device name, device type, and optionally, device location, and/or device capabilities of the input endpoint device from which a current voice input has been transmitted to the I/O processing module 328.’).

Regarding claims 2 (dep. on claim 1) and 8 (dep. on claim 6), Huang further teaches:
“an intention analysis model configured to analyze the intention” (par. 0071; ‘The natural language processing module 332 (“natural language processor”) of the digital assistant 106′ takes the sequence of words or tokens (“token sequence”) generated by the speech-to-text processing module 330, and attempts to associate the token sequence with one or more “actionable intents” recognized by the digital assistant. As used herein, an “actionable intent” represents a task that can be performed by the digital assistant 106′ and/or devices subject to control by the digital assistant system, and has an associated task flow implemented in the task flow models 354.’); and
“a device selection model configured to select the first voice recognition device, wherein the first voice recognition device is a voice recognition device that the user wants to use among a plurality of voice recognition devices” (par. 0032; ‘The voice-based digital assistant processes the selected audio stream containing the voice input, and the voice-based digital assistant selects, from among multiple endpoint devices, a destination device that is to output an audio output (e.g., a confirmation output, an informational answer, etc.) and/or perform a requested task.’; par. 0071; Task flow model).

Regarding claims 3 (dep. on claim 2) and 9 (dep. on claim 8), Huang further teaches:
“wherein the intention analysis model includes: a specific user intention analysis model configured to analyze intention of a specific user, wherein the specific user is defined as a user whose setting value is stored in the user setting storage unit” (par. 0065; ‘The authentication is optionally performed based on the voice input (e.g., a comparison between a voiceprint extracted from the user's voice input and a voiceprint of an authorized user that is stored at the digital assistant server).’; par. 0069; ‘In some embodiments, person-specific speech-to-text models are selected to perform the speech-to-text conversion based on the speaker recognition result.’); and
“a common user intention analysis model configured to analyze individual intention of a common user, wherein the common user is defined as a user different from the specific user and whose setting value is not stored in the user setting storage unit” (Huang teaches that some embodiments use person-specific models. This suggests that there are embodiments in which non-person-specific models may be selected; par. 0069; ‘In some embodiments, person-specific speech-to-text models are selected to perform the speech-to-text conversion based on the speaker recognition result.’).

Regarding claims 4 (dep. on claim 3) and 10 (dep. on claim 9), Huang further teaches:
“wherein the device selection model includes: a specific user device selection model configured to select the first voice recognition device among the plurality of voice recognition devices in response to the intention of the specific user” (par. 0069; ‘In some embodiments, person-specific speech-to-text models are selected to perform the speech-to-text conversion based on the speaker recognition result.’; par. 0076; ‘The natural language processor 332 can use the user-specific information to supplement the information contained in the user input to further define the user intent. In some embodiments, the user data also includes the user's specific voiceprint for user authentication or speech samples for speaker recognition training.’); and
“a common user device selection model configured to select a voice recognition device corresponding to the individual intention of the common user among the plurality of voice recognition devices, wherein the voice recognition device corresponding to the individual intention of the common user includes at least one of the first voice recognition device and a second voice recognition device that is a device different from the first voice recognition device” (par. 0099; ‘In some embodiments, the output channel selection module 370 uses a set of prioritization rules to determine which of the set of available output endpoint devices identified by the destination device identification module 368 should be selected as the output device for the current audio output. The prioritization rules may be based on the location of the output devices relative to the location of the user (e.g., the device closest to the user is chosen), the audio quality of the output devices (e.g., the device with the highest sound quality is chosen), the type of output that is to be delivered to the user (e.g., different output devices are suitable for outputting alarm sound vs. music vs. speech), the power usage considerations, etc.’).

Regarding claim 6, Huang teaches:
“a plurality of voice recognition devices; a server networked with each of the plurality of voice recognition devices; and a user terminal configured to perform data communication with the server and the voice recognition devices” (par. 0043; ‘The digital assistant system includes a client-side portion 102 (hereafter “digital assistant (DA) client 102”) executed on a user device 104 (e.g., a smartphone, a tablet, or a central communication hub), and a server-side portion 106 (hereafter “digital assistant (DA) server 106”) executed on a server system 108.’), wherein the server includes:
“a user identification unit configured to identify a user based on a voice spoken by the user; a user setting storage unit that stores a setting value of the user; a model storage unit configured to analyze an intention of the user based on the voice spoken by the user and select a first voice recognition device corresponding to the analyzed intention; a processor configured to cause the first voice recognition device to execute a function corresponding to the voice or the intention, wherein the model storage unit selects the first voice recognition device based on the stored setting value of the user indicating selection of the first voice recognition device based on a time when the voice is spoken by the user and a place where the voice was received” (see claim 1).

Regarding claim 7 (dep. on claim 6), Huang further teaches:
“wherein the plurality of voice recognition devices includes at least a TV, an air conditioner, an air cleaner, a refrigerator, a kimchi refrigerator, a water purifier, a dishwasher, a microwave, a washing machine, a dryer, a styler, a cleaning robot, a massage chair, a PC or a projector” (par. 0060; ‘In some implementations, the smart home environment 122 includes various devices 124, such as a plurality of appliances 212, such as refrigerators, stoves, ovens, televisions, washers, dryers, lights, stereos, intercom systems, garage-door openers, floor fans, ceiling fans, wall air conditioners, pool heaters, irrigation systems, security systems, space heaters, window AC units, motorized duct vents, and so forth. In some embodiments, some of the devices 124 may be intelligent, multi-sensing, and network enabled.’ See also par. 0051 and 0061).

Regarding claim 12, Huang teaches:
“recognizing a voice of a user by the voice recognition devices” (speech input, par. 0069; ‘As shown in FIG. 3A, in some embodiments, the I/O processing module 328 interacts with the user, a user device (e.g., a user device 104 in FIG. 1B), and other devices (e.g., endpoint devices 124 in FIG. 1B) through the network communications interface to obtain user input (e.g., a speech input) and to provide responses to the user input.’);
“identifying the user based on the voice” (Speaker recognition, par. 0069; ‘In some embodiments, when a user request is received by the I/O processing module 328 and the user request contains a speech input, the I/O processing module 328 forwards the speech input to speaker recognition module 340 for speaker recognition and subsequently to the speech-to-text (STT) processing module 330 for speech-to-text conversions.’);
“determining whether a setting value of the user is stored by a user setting storage unit” (par. 0069; ‘In some embodiments, person-specific speech-to-text models are selected to perform the speech-to-text conversion based on the speaker recognition result.’ See also claim 1);
“selecting a first voice recognition device among the plurality of voice recognition devices based on the stored setting value of the user indicating selection of the first voice recognition device based on a time when the voice is spoken by the user and a place where the voice was received” (par. 0032; ‘The voice-based digital assistant processes the selected audio stream containing the voice input, and the voice-based digital assistant selects, from among multiple endpoint devices, a destination device that is to output an audio output (e.g., a confirmation output, an informational answer, etc.) and/or perform a requested task.’ See also claim 1); and
“causing a function to be executed corresponding to the voice of the user by the first voice recognition device based on the selection” (par. 0071; ‘As used herein, an “actionable intent” represents a task that can be performed by the digital assistant 106′ and/or devices subject to control by the digital assistant system, and has an associated task flow implemented in the task flow models 354.’).

Regarding claim 15 (dep. on claim 12), Huang further teaches:
“receiving feedback from the user in response to execution of the function by the first voice recognition device” (par. 0069; ‘In some embodiments, the I/O processing module 328 also sends follow-up questions to, and receives answers from, the user regarding the user request.’);
“determining whether selection of the first voice recognition device correctly corresponded to the intention of the specific user based on feedback” (par. 0074; ‘In some embodiments, additional factors are considered in selecting the node as well, such as whether the home assistant system 300 has previously correctly interpreted a similar request from a user.’; par. 0099; ‘In some embodiments, the output channel selection module 370 uses a set of prioritization rules to determine which of the set of available output endpoint devices identified by the destination device identification module 368 should be selected as the output device for the current audio output.’); and
“updating an intention analysis model and a device selection model depending on the determination” (par. 0101; ‘In some embodiments, if the output channel selection module 370 determines that one of the available output devices is a mobile device that moves with the user, the output channel selection module chooses to use that mobile device as the output channel for a continuous exchange between the user and the digital assistant.’).

Claim Rejections - 35 USC § 103
Claims 5 and 11 are rejected under 35 U.S.C. 103 as being unpatentable over Huang in view of Finkelstein et al. (US 20180260680 A1).

Regarding claims 5 (dep. on claim 1) and 11 (dep. on claim 6), Huang does not expressly teach:
“a user feedback analysis unit configured to collect and analyze a reaction of the user, wherein the user feedback analysis unit is configured to compare the intention with the reaction of the user, and determine whether the device that the user wanted to use matches the first voice recognition device.”
Finkelstein teaches:
“a user feedback analysis unit configured to collect and analyze a reaction of the user, wherein the user feedback analysis unit is configured to compare the intention with the reaction of the user, and determine whether the device that the user wanted to use matches the first voice recognition device” (par. 0110; ‘Using this additional user input 98, the user intent confidence classifier 230 may determine that the previously derived user intent 84 was at least partially incorrect. In this example, while the portion of the derived user intent 84 related to providing driving directions was correct, the additional user input 98 correcting the destination indicates that the destination portion was incorrect. Accordingly, the user intent confidence classifier 230 may determine a user intent confidence value 234 of the previously derived user intent 84 that reflects the incomplete accuracy of the user intent.’).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to modify Huang’s natural language processing module by incorporating Finkelstein’s user intent confidence classifier in order to determine whether the intent determined was correct.

Conclusion
Other pertinent prior art is listed in the PTO-892 for consideration.
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to MARK VILLENA, whose telephone number is (571) 270-3191. The examiner can normally be reached 10 am - 6 pm EST, Monday through Friday.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Richemond Dorvil, can be reached at (571) 272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

MARK VILLENA
Examiner
Art Unit 2658



/MARK VILLENA/Examiner, Art Unit 2658