DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Priority
Receipt is acknowledged of papers submitted under 35 U.S.C. 119(a)-(d), which papers have been placed of record in the file.

Drawings
The drawings were submitted on 06/30/2022. These drawings have been reviewed and are accepted by the examiner.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1-4, 6-10, and 12-15 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Huang et al. (US 20190295542 A1).

Regarding claim 1, Huang teaches:
“A device for controlling a plurality of voice recognition devices” (par. 0069; ‘As shown in FIG. 3A, in some embodiments, the I/O processing module 328 interacts with the user, a user device (e.g., a user device 104 in FIG. 1B), and other devices (e.g., endpoint devices 124 in FIG. 1B) through the network communications interface to obtain user input (e.g., a speech input) and to provide responses to the user input.’), comprising:
“a user identification unit that identifies who a user is by using a voice spoken by the user” (Speaker recognition, par. 0069; ‘In some embodiments, when a user request is received by the I/O processing module 328 and the user request contains a speech input, the I/O processing module 328 forwards the speech input to speaker recognition module 340 for speaker recognition and subsequently to the speech-to-text (STT) processing module 330 for speech-to-text conversions.’);
“a user setting storage unit that stores a setting value of the user” (par. 0069; ‘In some embodiments, person-specific speech-to-text models are selected to perform the speech-to-text conversion based on the speaker recognition result.’);
“a model storage unit that receives the voice from the user identification unit, analyzes intention of the user, and selects a first voice recognition device corresponding to the analyzed intention” (par. 0071; ‘The natural language processing module 332 (“natural language processor”) of the digital assistant 106′ takes the sequence of words or tokens (“token sequence”) generated by the speech-to-text processing module 330, and attempts to associate the token sequence with one or more “actionable intents” recognized by the digital assistant. As used herein, an “actionable intent” represents a task that can be performed by the digital assistant 106′ and/or devices subject to control by the digital assistant system, and has an associated task flow implemented in the task flow models 354.’);
“a data learning unit that can change models stored in the model storage unit by artificial intelligence learning” (par. 0070; ‘In some embodiments, the speech-to-text processing module 330 uses various acoustic and language models to recognize the speech input as a sequence of phonemes, and ultimately, a sequence of words or tokens written in one or more languages. The speech-to-text processing module 330 is implemented using any suitable speech recognition techniques, acoustic models, and language models, such as Hidden Markov Models, Dynamic Time Warping (DTW)-based speech recognition, and other statistical and/or analytical techniques.’); and
“a processor that controls the first voice recognition device to execute a function corresponding to the voice or the intention, wherein the model storage unit selects the first voice recognition device based on a point in time when the voice is spoken from the user or a place where the user spoke the voice” (Selecting endpoint device to output a response based on the location of user, par. 0085; ‘The I/O processing module selects the endpoint devices to output the informational answers, audio of the media item, and/or the destination device based on the user's voice input, and/or based on the location of the user when the user's requested task is performed.’).

Regarding claims 2 (dep. on claim 1), 8 (dep. on claim 6), Huang further teaches:
“an intention analysis model for analyzing the intention” (par. 0071; ‘The natural language processing module 332 (“natural language processor”) of the digital assistant 106′ takes the sequence of words or tokens (“token sequence”) generated by the speech-to-text processing module 330, and attempts to associate the token sequence with one or more “actionable intents” recognized by the digital assistant. As used herein, an “actionable intent” represents a task that can be performed by the digital assistant 106′ and/or devices subject to control by the digital assistant system, and has an associated task flow implemented in the task flow models 354.’); and
“a device selection model for selecting the first voice recognition device, wherein the first voice recognition device is a voice recognition device that the user wants to use among a plurality of voice recognition devices” (par. 0032; ‘The voice-based digital assistant processes the selected audio stream containing the voice input, and the voice-based digital assistant selects, from among multiple endpoint devices, a destination device that is to output an audio output (e.g., a confirmation output, an informational answer, etc.) and/or perform a requested task.’; par. 0071; Task flow model).

Regarding claims 3 (dep. on claim 2), 9 (dep. on claim 8), Huang further teaches:
“wherein the intention analysis model includes: a specific user intention analysis model for analyzing intention of a specific user” (par. 0069; ‘In some embodiments, person-specific speech-to-text models are selected to perform the speech-to-text conversion based on the speaker recognition result.’); and
“a common user intention analysis model for analyzing individual intention of a common user that is another user in addition to the specific user, wherein the specific user is a user whose setting value is stored in the user setting storage unit, and the common user is a user whose setting value is not stored in the user setting storage unit” (Huang teaches that some embodiments use person-specific models. This suggests that there are embodiments in which non-person-specific models may be selected; par. 0069; ‘In some embodiments, person-specific speech-to-text models are selected to perform the speech-to-text conversion based on the speaker recognition result.’).

Regarding claims 4 (dep. on claim 3), 10 (dep. on claim 9), Huang further teaches:
“wherein the device selection model includes: a specific device selection model for selecting the first voice recognition device among the plurality of voice recognition devices in response to the intention of the specific user” (par. 0069; ‘In some embodiments, person-specific speech-to-text models are selected to perform the speech-to-text conversion based on the speaker recognition result.’; par. 0076; ‘The natural language processor 332 can use the user-specific information to supplement the information contained in the user input to further define the user intent. In some embodiments, the user data also includes the user's specific voiceprint for user authentication or speech samples for speaker recognition training.’); and
“a common device selection model for selecting a voice recognition device corresponding to the individual intention of the common user among the plurality of voice recognition devices, wherein the voice recognition device corresponding to the individual intention of the common user includes at least one of the first voice recognition device and a second voice recognition device that is a device different from the first voice recognition device” (par. 0099; ‘In some embodiments, the output channel selection module 370 uses a set of prioritization rules to determine which of the set of available output endpoint devices identified by the destination device identification module 368 should be selected as the output device for the current audio output. The prioritization rules may be based on the location of the output devices relative to the location of the user (e.g., the device closest to the user is chosen), the audio quality of the output devices (e.g., the device with the highest sound quality is chosen), the type of output that is to be delivered to the user (e.g., different output devices are suitable for outputting alarm sound vs. music vs. speech), the power usage considerations, etc.’).

Regarding claim 6, Huang teaches:
“a plurality of voice recognition devices; a server networked with each of the plurality of voice recognition devices; and a user terminal that can perform data communication with the server and the voice recognition devices” (par. 0043; ‘The digital assistant system includes a client-side portion 102 (hereafter “digital assistant (DA) client 102”) executed on a user device 104 (e.g., a smartphone, a tablet, or a central communication hub), and a server-side portion 106 (hereafter “digital assistant (DA) server 106”) executed on a server system 108.’), wherein the server includes:
“a user identification unit that identifies who a user is by using a voice spoken by the user; a user setting storage unit that stores a setting value of the user; a model storage unit that receives the voice from the user identification unit, analyzes intention of the user, and selects a first voice recognition device corresponding to the analyzed intention; a data learning unit that can change models stored in the model storage unit by artificial intelligence learning; and a processor that controls the first voice recognition device to execute a function corresponding to the voice or the intention, wherein the model storage unit selects the first voice recognition device based on a point in time when the voice is spoken from the user or a place where the user spoke the voice” (see claim 1).

Regarding claim 7 (dep. on claim 6), Huang further teaches:
“wherein the plurality of voice recognition devices include a TV, an air conditioner, an air cleaner, a refrigerator, a kimchi refrigerator, a water purifier, a dishwasher, a microwave, a washing machine, a dryer, a styler, a cleaning robot, a massage chair, a PC and a projector” (par. 0060; ‘In some implementations, the smart home environment 122 includes various devices 124, such as a plurality of appliances 212, such as refrigerators, stoves, ovens, televisions, washers, dryers, lights, stereos, intercom systems, garage-door openers, floor fans, ceiling fans, wall air conditioners, pool heaters, irrigation systems, security systems, space heaters, window AC units, motorized duct vents, and so forth. In some embodiments, some of the devices 124 may be intelligent, multi-sensing, and network enabled.’ See also par. 0051 and 0061).

Regarding claim 12, Huang teaches:
“recognizing a voice of a user by the voice recognition devices” (speech input, par. 0069; ‘As shown in FIG. 3A, in some embodiments, the I/O processing module 328 interacts with the user, a user device (e.g., a user device 104 in FIG. 1B), and other devices (e.g., endpoint devices 124 in FIG. 1B) through the network communications interface to obtain user input (e.g., a speech input) and to provide responses to the user input.’);
“identifying who the user is through the voice” (Speaker recognition, par. 0069; ‘In some embodiments, when a user request is received by the I/O processing module 328 and the user request contains a speech input, the I/O processing module 328 forwards the speech input to speaker recognition module 340 for speaker recognition and subsequently to the speech-to-text (STT) processing module 330 for speech-to-text conversions.’);
“checking whether a setting value of the user is stored and determining whether to apply the setting value to selection for a first voice recognition device” (par. 0069; ‘In some embodiments, person-specific speech-to-text models are selected to perform the speech-to-text conversion based on the speaker recognition result.’);
“selecting the first voice recognition device” (par. 0032; ‘The voice-based digital assistant processes the selected audio stream containing the voice input, and the voice-based digital assistant selects, from among multiple endpoint devices, a destination device that is to output an audio output (e.g., a confirmation output, an informational answer, etc.) and/or perform a requested task.’);
“performing a response to the user by the first voice recognition device” (par. 0098; ‘The output channel selection module 370 selects one of the output endpoint devices that have been identified by the destination device identification module 368 as the endpoint device to output the audio output.’);
“executing a function corresponding to the voice of the user by the first voice recognition device” (par. 0071; ‘As used herein, an “actionable intent” represents a task that can be performed by the digital assistant 106′ and/or devices subject to control by the digital assistant system, and has an associated task flow implemented in the task flow models 354.’); and
“checking feedback of the user, wherein the first voice recognition device is a voice recognition device that the user wants to use” (par. 0085; ‘The I/O processing module 328 outputs follow-up questions to the user to clarify ambiguities in the user's earlier voice inputs, and obtain necessary parameters for completing a task that is requested by the user using the earlier voice inputs.’).

Regarding claim 13 (dep. on claim 12), Huang further teaches:
“selecting the first voice recognition device among the plurality of voice recognition devices in response to the intention of the specific user” (par. 0105; ‘When the digital assistant determines the user's intent and generates the audio output (e.g., today's weather report), the digital assistant selects a single output endpoint device from multiple available output end devices that are identified to be in the vicinity of the user.’); and
“selecting a voice recognition device corresponding to the individual intention of another user in addition to the specific user among the plurality of voice recognition devices” (Devices may be user-specific, which reads on “another user”; par. 0075; ‘In some embodiments, the named entity database 305 also includes the aliases of the home devices that are provided by individual users during the device registration stage for the different home devices.’).

Regarding claim 14 (dep. on claim 13), Huang further teaches:
“performing a response corresponding to the intention of the specific user by the first voice recognition device” and “performing a response corresponding to the individual intention by the voice recognition device corresponding to the individual intention of the common user” (par. 0115; ‘For example, the audio response to the first voice command is sent to the thermostat, while the audio response to the second voice command is sent to the refrigerator.’).

Regarding claim 15 (dep. on claim 13), Huang further teaches:
“identifying from which user the feedback generated” (par. 0085; ‘The I/O processing module 328 outputs follow-up questions to the user to clarify ambiguities in the user's earlier voice inputs, and obtain necessary parameters for completing a task that is requested by the user using the earlier voice inputs.’ See also par. 0080);
“classifying feedback generated from the specific user” (par. 0080; ‘Once answers are received from the user, the dialogue processing module 334 populates the structured query with the missing information, or passes the information to the task flow processor 336 to complete the missing information from the structured query.’);
“generating life pattern data of the specific user based on the feedback” (par. 0072; ‘The context information includes, for example, user preferences, hardware and/or software states of the user device, sensor information collected before, during, or shortly after the user request, prior interactions (e.g., dialogue) between the digital assistant and the user, and the like.’);
“analyzing the feedback and learning the intention of the specific user included in the feedback” (par. 0072; ‘The context information includes, for example, user preferences, hardware and/or software states of the user device, sensor information collected before, during, or shortly after the user request, prior interactions (e.g., dialogue) between the digital assistant and the user, and the like.’);
“determining whether the first voice recognition device corresponding to the intention of the specific user is selected” (par. 0099; ‘In some embodiments, the output channel selection module 370 uses a set of prioritization rules to determine which of the set of available output endpoint devices identified by the destination device identification module 368 should be selected as the output device for the current audio output.’); and
“updating an intention analysis model and a device selection model depending on the determining whether the first voice recognition device corresponding to the intention of the specific user is selected” (par. 0101; ‘In some embodiments, if the output channel selection module 370 determines that one of the available output devices is a mobile device that moves with the user, the output channel selection module chooses to use that mobile device as the output channel for a continuous exchange between the user and the digital assistant.’).

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 5 and 11 are rejected under 35 U.S.C. 103 as being unpatentable over Huang in view of Finkelstein et al. (US 20180260680 A1).

Regarding claims 5 (dep. on claim 1) and 11 (dep. on claim 6), Huang does not expressly teach:
“a user feedback analysis unit that collects and analyzes a reaction of the user, wherein the user feedback analysis unit compares the intention with the reaction of the user, and analyzes whether the device that the user wants to use matches the first voice recognition device.”
Finkelstein teaches:
“a user feedback analysis unit that collects and analyzes a reaction of the user, wherein the user feedback analysis unit compares the intention with the reaction of the user, and analyzes whether the device that the user wants to use matches the first voice recognition device” (par. 0110; ‘Using this additional user input 98, the user intent confidence classifier 230 may determine that the previously derived user intent 84 was at least partially incorrect. In this example, while the portion of the derived user intent 84 related to providing driving directions was correct, the additional user input 98 correcting the destination indicates that the destination portion was incorrect. Accordingly, the user intent confidence classifier 230 may determine a user intent confidence value 234 of the previously derived user intent 84 that reflects the incomplete accuracy of the user intent.’).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to modify Huang’s natural language processing module by incorporating Finkelstein’s user intent confidence classifier in order to determine whether the intent determined was correct.

Conclusion
Other prior art is listed in the PTO-892 for consideration.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MARK VILLENA whose telephone number is (571)270-3191. The examiner can normally be reached 10 am - 6 pm EST Monday through Friday.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Richemond Dorvil, can be reached on (571) 272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

MARK VILLENA
Examiner
Art Unit 2658



/MARK VILLENA/Examiner, Art Unit 2658