DETAILED ACTION
1.	This communication is in response to the Amendments and Arguments filed on 11/24/2021. Claims 1-3, 5-20 are pending and have been examined. Claim 4 is cancelled.
Response to Amendments and Arguments
2.	Applicant's arguments with respect to claim rejections under 35 U.S.C. 103 have been fully considered, but they are not persuasive.
In particular, the applicant argues that the references do not teach “calculating a probability score of each of the plurality of candidate devices such that a specific device is selected among the plurality of candidate devices based on the calculated probability scores.”
In response, the examiner respectfully disagrees. Note that the limitations recited above are the limitations as they read before the filed amendment. As such, for the corresponding newly amended limitations, the applicant is referred to the rejection rationale (see bold text) detailed in the next section.
Claim Rejections - 35 USC § 103
3.	Claims 1-3, 5-20 are rejected under 35 U.S.C. 103 as being unpatentable over Weinstein, et al. (US 20150039299; hereinafter WEINSTEIN) in view of Mahoney (US 6813341; hereinafter MAHONEY).
As per claim 1, WEINSTEIN (Title: Context-based speech recognition) discloses “An information processing device, comprising: a processing unit configured to: calculate a selection score for each of a plurality of selection target candidates based on [an intention of a first user] and utterance information associated with an utterance of the first user, wherein the utterance corresponds to an interaction with the first user; and select a selection target from among the plurality of selection target candidates based on the calculated selection score for each of the plurality of selection target candidates (WEINSTEIN, [0004], speech recognition systems may use neural networks that incorporate multiple speech recognition models <read on selection target candidates, which can be broadly interpreted>. These neural networks can indicate likelihoods that audio data corresponds to certain units of speech. Input to a neural network used for speech recognition can include information about the acoustic features of a portion of an utterance, as well as context information about the utterance <read on interaction>; [0038-0040], The classifications may be represented by probabilities or likelihoods <read on selection score> that the inputs correspond to a particular speech recognition model .. selects a speech recognition model based on the output of the statistical .. select the speech recognition model having the highest probability).”
WEINSTEIN does not expressly disclose “an intention of a first user ..” However, the feature is taught by MAHONEY (Title: Voice activated/voice responsive item locator). 
In the same field of endeavor, MAHONEY teaches: [col. 8, lines 58-67] “the voice data is sent to interpretation 67 <read on user’s intention>. If it is understood 69, an action 75 if process and feedback function 77 is performed. Additionally, signal 79 prompts user 71. Likewise, if the interpretation is not understood 69, user 71 is prompted and via signal 73 ..” [col. 7, lines 43-51] “recognition/non-recognition response 7 results from processing the user inputs to central processor 1, and audio and/or video response unit(s) 9 provide feedback 11 to the user, either by answering the inquiry, conditionally defaulting, e.g., asking for a repeat or a restate the question ..”

As per claim 2 (dependent on claim 1), WEINSTEIN in view of MAHONEY further discloses “calculate the selection score for each of the plurality of selection target candidates based on state information that indicates a current state of the first user or a surrounding state of the first user (WEINSTEIN, [0004], Input to a neural network used for speech recognition can include information about the acoustic features of a portion of an utterance, as well as context information about the utterance. The context information may be information about the speaker of the utterance .. the circumstances in which the utterance was made .. a geographic location of the source <read on state information which can be broadly interpreted>; [0038-0040], The classifications may be represented by probabilities or likelihoods <read on selection score> that the inputs correspond to a particular speech recognition model).” 
As per claim 3 (dependent on claim 2), WEINSTEIN in view of MAHONEY further discloses “calculate the selection score for each of the plurality of selection target candidates based on action history information that indicates an action history of the first user (WEINSTEIN, [0004], Input to a neural network used for speech recognition can include .. context information about the utterance .. the context information may include .. a search history <read on action history> associated with the speaker).”
As per claim 5 (dependent on claim 3), WEINSTEIN in view of MAHONEY further discloses “dynamically update the selection score for each of a plurality of selection target candidates based on the utterance information repeatedly obtained from the interaction with the first user, the state information, and the action history information; and select the selection target from among the plurality of selection target candidates based on the updated selection score (WEINSTEIN, [0004], Input to a neural network used for speech recognition can include information about the acoustic features of a portion of an utterance; [0038-0040], The classifications may be represented by probabilities or likelihoods that the inputs correspond to a particular speech recognition model <read on ‘dynamically update .. based on the utterance information repeatedly obtained’ for every new frame of the utterance processed> .. select the speech recognition model having the highest probability. Also see Claims 2, 3).” 
As per claim 6 (dependent on claim 5), WEINSTEIN in view of MAHONEY further discloses “receive new utterance information based on a response to the first user in a case where the selection target device is not able to be uniquely specified; and update the selection score for each of the plurality of selection target candidates based on a basis of new utterance information (Examiner’s Note: ‘the selection target device’ lacks antecedent basis and is subject to rejection under 35 U.S.C. 112(b); MAHONEY, [col. 8, lines 58-67], the voice data is sent to interpretation 67 .. if the interpretation is not understood 69, user 71 is prompted and via signal 73 ..; [col. 7, lines 43-51], recognition/non-recognition response 7 results from processing the user inputs to central processor 1, and audio and/or video response unit(s) 9 provide feedback 11 to the user, either by answering the inquiry, conditionally defaulting, e.g., asking for a repeat or a restate the question ..; WEINSTEIN, [0004], Input to a neural network used for speech recognition can include information about the acoustic features of a portion of an utterance <where, for every new frame of the utterance obtained and processed, the selection score is updated>).”
As per claim 7 (dependent on claim 6), WEINSTEIN in view of MAHONEY further discloses “acquire the new utterance information based on an inquiry to specify the selection target to the first user (MAHONEY, [col. 7, lines 46-51], audio and/or video response unit(s) 9 provide feedback 11 to the user, either by answering the inquiry, conditionally defaulting, e.g., asking for a repeat or a restate the question, or fully defaulting, e.g. directing the user to a courtesy desk or check out counter for help. Also see Claim 6 rejection).”
As per claim 8 (dependent on claim 7), WEINSTEIN in view of MAHONEY further discloses “wherein the processing unit includes a state of each of the plurality of selection target candidates at a point of time of the inquiry as the inquiry to the first user (WEINSTEIN, [0004], The context information may be information about the speaker of the utterance .. the circumstances in which the utterance was made .. a geographic location of the source <read on state information, which can be broadly interpreted>; MAHONEY, [col. 7, lines 46-51], audio and/or video response unit(s) 9 provide feedback 11 to the user, either by answering the inquiry, conditionally defaulting, e.g., asking for a repeat or a restate the question .. <The applicant is requested to clarify ‘a state of each of the plurality of selection target candidates’>).”
As per claim 9 (dependent on claim 7), WEINSTEIN in view of MAHONEY further discloses “wherein the selection target is selectable at a point of time of the inquiry as the inquiry to the first user (WEINSTEIN, [0109], instructions .. to control the operation of, data processing apparatus; [0038-0040], select the speech recognition model having the highest probability <read on a ready mechanism to make selection>).”
As per claim 10 (dependent on claim 3), WEINSTEIN in view of MAHONEY further discloses “control an operation of the selection target (WEINSTEIN, [0109], instructions .. to control the operation of, data processing apparatus; [0038-0040], select the speech recognition model having the highest probability).”
As per claim 11 (dependent on claim 10), WEINSTEIN in view of MAHONEY further discloses “determine operation content of the selection target based on device information and the intention of the first user, wherein the device information is associated with the selection target, and control the operation of the selection target based on the determined operation content (WEINSTEIN, [0109], instructions .. to control the operation of, data processing apparatus; [0004], Input to a neural network used for speech recognition can include information about the acoustic features of a portion of an utterance <read on operation content, which can be broadly interpreted>, as well as context information about the utterance; MAHONEY, [col. 8, lines 60-62], the voice data is sent to interpretation 67 <read on user’s intention>. If it is understood 69, an action 75 if process and feedback function 77 is performed; [col. 3, lines 20-25], Sensors continually monitor the driving condition of the motor vehicle in which the telephone is installed. When the operation of the steering wheel or brakes of the motor vehicle exceeds a predetermined threshold or the speed of the motor vehicle is excessive, the sensors generate safety signals that inhibit the operation of the telephone <The applicant is requested to clarify the ambiguous ‘device information’, which can be broadly interpreted>).”
As per claim 12 (dependent on claim 11), WEINSTEIN in view of MAHONEY further discloses “wherein, based on determination to not immediately execute the operation of the selection target, the processing unit is further configured to modify or redetermine the operation content based on the state information (WEINSTEIN, [0109], instructions .. to control the operation of, data processing apparatus <whether to execute a recognized instruction immediately or later is a system design choice>; [0004], Input to a neural network used for speech recognition can include information about the acoustic features of a portion of an utterance <read on operation content, which can be broadly interpreted>, as well as context information about the utterance).”
As per claim 13 (dependent on claim 12), WEINSTEIN in view of MAHONEY further discloses “perform a feedback to the first user based on the state information, and modify or redetermine the operation content based on the utterance information obtained from a new utterance of the first user to the feedback (see Claims 2 and 6. MAHONEY, [col. 6, lines 11-14], user feedback features are included wherein both audio and visual feedback is given to a user in response to recognizable voice signals; [col. 7, lines 46-49], audio and/or video response unit(s) 9 provide feedback 11 to the user .. asking for a repeat or a restate the question <read on redetermination>; MAHONEY, [col. 3, lines 20-25], Sensors continually monitor the driving condition <read on ‘state’ - See Claim 2 ‘state information indicating a current state of the user or a surrounding state of the user’> of the motor vehicle in which the telephone is installed. When the operation of the steering wheel or brakes of the motor vehicle exceeds a predetermined threshold or the speed of the motor vehicle is excessive, the sensors generate safety signals that inhibit the operation of the telephone <read on feedback which can be broadly interpreted>).”
As per claim 14 (dependent on claim 11), WEINSTEIN in view of MAHONEY further discloses “wherein, based on determining to immediately execute the operation of the selection target .. control the operation of the selection target (WEINSTEIN, [0109], instructions .. to control the operation of, data processing apparatus; [0038-0040], the inputs correspond to a particular speech recognition model .. select the speech recognition model having the highest probability).”

As per claim 15 (dependent on claim 11), WEINSTEIN in view of MAHONEY further discloses “wherein the state information includes position information that indicates a current position of the user, and .. control the operation of the selection target based on the operation content and the position information (WEINSTEIN, [0004], Input to a neural network used for speech recognition can include information about the acoustic features of a portion of an utterance <read on operation content>, as well as context information about the utterance .. The context information may be information about .. the circumstances in which the utterance was made .. a geographic location of the source).”  
As per claim 16 (dependent on claim 1), WEINSTEIN in view of MAHONEY further discloses “calculate the selection score for each of the plurality of the selection target candidates based on the registration information registered by the first user (WEINSTEIN, [0038-0040], The classifications may be represented by probabilities or likelihoods <read on score> that the inputs correspond to a particular speech recognition model; [0004], Input to a neural network used for speech recognition can include information about the acoustic features of a portion of an utterance, as well as context information about the utterance .. The context information may be information about the speaker of the utterance <where ‘speaker’ reads on the user’s registration information>).”
As per claim 17 (dependent on claim 2), WEINSTEIN in view of MAHONEY further discloses “obtain sensor information from a plurality of sensors; and estimate the state information based on the sensor information (WEINSTEIN, [0004], The context information may be information about .. the circumstances <read on ‘state’ which can be broadly interpreted> in which the utterance was made .. a geographic location of the source. MAHONEY, [col. 3, lines 20-25], Sensors continually monitor the driving condition <read on state> of the motor vehicle in which the telephone is installed. When the operation of the steering wheel or brakes of the motor vehicle exceeds a predetermined threshold or the speed of the motor vehicle is excessive, the sensors generate safety signals that inhibit the operation of the telephone).”
As per claim 18 (dependent on claim 3), WEINSTEIN in view of MAHONEY further discloses “wherein the action history information further includes an action history of a second user different from the first user (WEINSTEIN, [0004], Input to a neural network used for speech recognition can include information about the acoustic features of a portion of an utterance, as well as context information about the utterance .. the context information may include .. a search history associated with the speaker <read on ‘action history’, which can be broadly interpreted. Also, given the taught ready mechanism, which user’s action history is used is a system design choice>).”
As per claim 19 (dependent on claim 4), WEINSTEIN in view of MAHONEY further discloses “wherein the information processing device is a home console, and the selection target includes a home appliance connected to the home console via a home network (WEINSTEIN, [Abstract], A processing system receives an audio signal encoding a portion of an utterance .. and generates a transcription for the utterance; [0109], instructions .. to control the operation of, data processing apparatus <read on any apparatus, including a home appliance>; [0110], A computer program may be deployed to be executed on one computer or on multiple computers).”
Claim 20 (similar in scope to claim 1) is rejected under the same rationale as applied above for claim 1.
Conclusion 
4.	THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).   
	A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 		
Any inquiry concerning this communication or earlier communications from the examiner should be directed to FENG-TZER TZENG, whose telephone number is (571) 272-4609. The examiner can normally be reached M-F (8:30-5:00). The fax phone number for the organization where this application or proceeding is assigned is 571-273-4609.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre-Louis Desir (SPE) can be reached on 571-272-7799. 
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications 

/FENG-TZER TZENG/		2/20/2022
Primary Examiner, Art Unit 2659