DETAILED ACTION
This Office Action is in response to the correspondence filed by the applicant on 4/21/2022.
The Amendment filed on 4/21/2022 has been entered.  
Claims 1, 19, and 20 have been amended by Applicant.
Claim 21 has been added by Applicant.
Claims 1-21 remain pending in the application of which Claims 1, 19, and 20 are independent.  
Applicant’s amendments and arguments have been considered but are either unpersuasive or moot in view of the new grounds of rejection necessitated by the amendments to the claims.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Priority
Receipt is acknowledged of certified copies of papers submitted under 35 U.S.C. 119(a)-(d), which papers have been placed of record in the file.

Response to Arguments
Regarding the 103 rejections, Applicant’s arguments (Remarks filed 4/21/2022, pages 10-19) have been fully considered but are moot in view of the new ground(s) of rejection made under AIA 35 U.S.C. 103 as being unpatentable over ROSARIO (US 2014/0244259 A1), and further in view of PARK (US 2016/0232894 A1) and KIM (US 2015/0081288 A1). Please see the rejections below for more details.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.

Claims 1-2, 5-9, 11-16, 18-21 are rejected under 35 U.S.C. 103 as being unpatentable over ROSARIO (US 2014/0244259 A1), and further in view of PARK (US 2016/0232894 A1) and KIM (US 2015/0081288 A1).

REGARDING CLAIM 1, ROSARIO discloses an information processing device comprising: 
an acquisition unit (ROSARIO Fig. 1 – “Microphone(s) 142”) configured to acquire voice information acquired through voice input (ROSARIO Fig. 5; Par 59 – “Once speech input recognition has been activated, speech input may be recorded by one or more audio capture devices (e.g., microphones, etc.) at block 504. Speech input data collected by the audio capture devices may then be received by a suitable speech recognition module 135 or speech recognition engine for processing at block 506.”); and 
a control unit (ROSARIO Fig. 1 – “Processor(s) 105”) configured to control change in one or more parts of correspondence relations between the voice information and processes (ROSARIO Par 42 – “In certain embodiments, the contextual information may be utilized to adjust and/or modify the list or set of grammar elements. For example, contextual information may be continuously received, periodically received, and/or received based upon one or more identified or detected events (e.g., application outputs, gestures, received inputs, etc.). The received contextual information may then be utilized to adjust the orderings and/or priorities of the grammar elements.”; Par 55 – “Additionally, as desired, the set of grammar elements may be dynamically adjusted based upon the identification of a wide variety of additional information, such as additional contextual information and/or changes in the executing applications.”; Fig. 5 – “Determine corresponding grammar in grammar list 522”) based on the voice information (ROSARIO Fig. 5 – “Determine corresponding grammar in grammar list 522”; Par 61 – “At block 522, a grammar element (or plurality of grammar elements) included in the set of grammar elements that corresponds to the received speech input may be determined. A wide variety of suitable methods or techniques may be utilized to determine a grammar element. For example, at block 524, an accessed list of grammar elements may be traversed (e.g., sequentially evaluated starting from the beginning or top, etc.) until a best match or correspondence between a grammar element and the speech input is identified.”) in a selected set of the correspondence relations to be used in a speech recognition process (ROSARIO Fig. 5 – “Determine corresponding grammar in grammar list 522”; Par 10 – “Embodiments of the disclosure may provide systems, methods, and apparatus for dynamically maintaining a set or plurality of grammar elements utilized in association with speech recognition.”), 
wherein the change in the one or more parts of the correspondence relations in the selected set is based on object information of operation using the voice input (ROSARIO Figs. 4-5; Par 48 –“At block 405, one or more executing applications may be identified. A wide variety of applications may be identified as desired in various embodiments. For example, at block 410, one or more vehicle applications, such as a navigation application, a stereo control application, a climate control application, and/or a mobile device communications application, may be identified. As another example, at block 415, one or more run time or network applications may be identified. The run time applications may include applications executed by one or more processors and/or computing devices associated with a vehicle and/or applications executed by devices in communication with the vehicle (e.g., mobile devices, tablet computers, nearby vehicles, cloud servers, etc.). In certain embodiments, the run time applications may include any number of suitable browser-based and/or hypertext markup language (“HTML”) applications, such as Internet and/or cloud-based applications. During the identification of language models, as described in greater detail below with reference to block 430, one or more speech recognition language models associated with each of the applications may be identified or determined. In this regard, application-specific grammar elements may be identified for speech recognition purposes. As desired, various priorities and/or weightings may be determined for the various applications, for example, based upon user profile information and/or default profile information. 
In this regard, different priorities may be applied to the application language models and/or their associated grammar elements.”; Par 62 – “For example, the command “up” may be associated with a plurality of different applications, such as a stereo system application and/or an application that controls window functions. In the event that the last input element selected by a user is associated with a stereo system, a received command of “up” may be identified as a stereo system command, and the volume of the stereo may be increased.”) and subject information of the operation (ROSARIO Fig. 5; Par 60 – “For example, at block 512, at least one user, such as a speaker of the speech input, may be identified based upon one or more suitable identification techniques (e.g. an evaluation of image data, processing of speech data, etc.).”),
wherein the operation using the voice input is configured to be recognized based on the one or more changed parts of the correspondence relations in the selected set (ROSARIO Fig. 5 – “Determine corresponding grammar in grammar list 522 -> Identify a received command 528”; Par 61 – “At block 522, a grammar element (or plurality of grammar elements) included in the set of grammar elements that corresponds to the received speech input may be determined. A wide variety of suitable methods or techniques may be utilized to determine a grammar element. For example, at block 524, an accessed list of grammar elements may be traversed (e.g., sequentially evaluated starting from the beginning or top, etc.) until a best match or correspondence between a grammar element and the speech input is identified.”; Par 63 – “Once a grammar element (or plurality of grammar elements) corresponding to the speech input has been determined, a received command associated with the grammar element may be identified at block 528.”; Par 48 – “In this regard, application-specific grammar elements may be identified for speech recognition purposes.”; Par 49 –“ Once the one or more users have been identified, respective language models associated with each of the users may be identified and/or obtained (e.g., accessed from memory, obtained from a data source or user device, etc.). In this regard, user-specific grammar elements (e.g., user-defined commands, etc.) may be identified.”),
wherein the operation using the voice input includes an [activation] control operation (ROSARIO Par 62 – “For example, the command “up” may be associated with a plurality of different applications, such as a stereo system application and/or an application that controls window functions. In the event that the last input element selected by a user is associated with a stereo system, a received command of “up” may be identified as a stereo system command, and the volume of the stereo may be increased.”),
wherein the object information of the operation includes information specifying an attribute of an operation target (ROSARIO Par 13 – “Additionally, a wide variety of contextual information or environmental information may be determined or identified, such as identification information for one or more users, the identification information for one or more executing applications, actions taken by one or more executing applications, vehicle parameters (e.g., speed, current location, etc.), gestures made by a user, and/or a wide variety of user input (e.g., button presses, etc.).”; Par 39 – “Examples of configuration information include, but are not limited to, an identification of one or more users (e.g., a driver, a passenger, etc.), user profile information, user preferences and/or parameters associated with identifying speech input and/or obtaining language models, identifications of one or more executing applications (e.g., vehicle applications, run time applications), priorities associated with the applications, information associated with actions taken by the applications, one or more vehicle parameters (e.g., location, speed, etc.), and/or information associated with received user inputs (e.g., input element selections, gestures, etc.).”; Par 45 – “For example, by ordering grammar elements associated with the most recently activated applications and/or components higher in a list of grammar elements, the speech recognition module may be biased towards those grammar elements. Such an approach may apply the heuristic that speech input is most likely to be directed towards components and/or applications that have most recently come to a user's attention. For example, if a message has recently been output by an application or component, speech recognition may be biased towards commands associated with the application or component. As another example, if a user indication associated with a particular component or application has recently been identified, then speech recognition may be biased towards commands associated with the application or component.”) to be [activated] controlled by the [activation] control operation (ROSARIO Par 62 – “For example, the command “up” may be associated with a plurality of different applications, such as a stereo system application and/or an application that controls window functions. In the event that the last input element selected by a user is associated with a stereo system, a received command of “up” may be identified as a stereo system command, and the volume of the stereo may be increased.”),
wherein the subject information of the operation includes information specifying an attribute of a subject of the operation (ROSARIO Par 13 – “Additionally, a wide variety of contextual information or environmental information may be determined or identified, such as identification information for one or more users, the identification information for one or more executing applications, actions taken by one or more executing applications, vehicle parameters (e.g., speed, current location, etc.), gestures made by a user, and/or a wide variety of user input (e.g., button presses, etc.).”; Par 22 –“Additionally, the speech recognition module 135 may evaluate a wide variety of contextual information, such as user preferences, application identifications, application priorities, application outputs and/or actions, vehicle parameters (e.g., speed, current location, etc.), gestures made by a user, and/or a wide variety of user input (e.g., button presses, etc.), in order to order and/or sort the grammar elements.”; Par 39 – “Examples of configuration information include, but are not limited to, an identification of one or more users (e.g., a driver, a passenger, etc.), user profile information, user preferences and/or parameters associated with identifying speech input and/or obtaining language models, identifications of one or more executing applications (e.g., vehicle applications, run time applications), priorities associated with the applications, information associated with actions taken by the applications, one or more vehicle parameters (e.g., location, speed, etc.), and/or information associated with received user inputs (e.g., input element selections, gestures, etc.).”; Par 60 – “. For example, at block 512, at least one user, such as a speaker of the speech input, may be identified based upon one or more suitable identification techniques (e.g. an evaluation of image data, processing of speech data, etc.) 
… For example, a speaker of the speech input may be identified, and grammar elements may be accessed, sorted, and/or prioritized based upon the identity of the speaker.”), 
wherein the information specifying the attribute of the subject of the operation includes an <age> image of the subject of the operation (ROSARIO Par 27 – “For example, image data may be evaluated in order to identify users, detect user indications, and/or to detect user gestures.”; Par 34 – “The image sensors 210 may facilitate the collection of image data that may be evaluated for a wide variety of suitable purposes, such as user identification and/or the identification of user gestures.”; Par 49 – “At block 420, one or more users associated with the vehicle (or another speech recognition environment) may be identified. A wide variety of suitable methods and/or techniques may be utilized to identify a user. For example, a voice sample of a user may be collected and compared to a stored voice sample. As another example, image data for the user may be collected and evaluated utilizing suitable facial recognition techniques. As another example, other biometric inputs (e.g., fingerprints, etc.) may be evaluated to identify a user. As yet another example, a user may be identified based upon determining a pairing between the vehicle and a user device (e.g., a mobile device, etc.) and/or based upon the receipt and evaluation of user identification information (e.g., a personal identification number, etc.) entered by the user.”), and
wherein the acquisition unit and the control unit are each implemented via at least one processor (ROSARIO Fig. 1; Par 19 – “With reference to FIG. 1, the system may include one or more processors 105 and memory devices 110 (generally referred to as memory 110).”).
ROSARIO does not explicitly teach the [square-bracketed] and <angle-bracketed> limitations, but teaches the underlined features instead.

PARK discloses the [square-bracketed] limitations. PARK discloses a method/system for controlling multiple devices using voice commands and contextual information comprising, wherein the operation using the voice input includes an [activation] operation (PARK Fig. 5 – “Obtain state information of controllable device S501 -> Obtain grammar model information of controllable device S503-> Obtain pronunciation table S505 ….Generate final grammar model to perform speech recognition, by incorporating pieces of obtained grammar model information S511”; Fig. 9; Pars 123-126 – “Position Information 1 and 2, respectively, denote Room 1 and Room 2. Room 1 and Room 2 may be respectively pronounced to be “my room” and “living room” in the user's verbal command.  The speech recognition apparatus 910 may obtain grammar model information about controllable home appliances as shown in Table according to the verbal command. … TV1 @Room TV Power on …. Audio1 @Room Audio Power on …. TV 2 … @Room TV Power on ….”; Par 18 – “The information about the state of the at least one device may include at least one of information about an operation state of each device, information about whether each device is controllable, information about a position where each device is installed or connected, and an operation that is performable in each device.”; Par 84 – “For example, the speech recognition apparatus 110 may detect whether at least one device is installed in or detached from a slot. Alternatively, the speech recognition apparatus 110 may detect whether an application is installed at or removed from at least one device.”),
wherein the object information of the operation includes information specifying an attribute of an operation target (PARK Par 62 – “For example, when a text string corresponding to a position of the device is to be inserted in the pattern information, the speech recognition apparatus 110 may determine the text string to be inserted in the pattern information according to the information about the installation or connection of each device. In other words, the speech recognition apparatus 110 may determine a text string indicating a position where the device is installed or connected, as the text string to be inserted in the pattern information.”; Par 69 – “The grammar model information may be configured of at least one of command models, as shown in Table 2. The command model of each device may be formed of text strings divided by “I”. Also, “@Pat1” and “@Pat2” included in some command models are pattern information and a text string determined according to the state information may be inserted in the pattern information. “word1_1”, “word1_2”, etc. denote text strings signifying commands. For example, “word1_1”, “word1_2”, etc. may include commands such as “make screen brighter”, “turn off power”, etc.”; Par 81 – “In Operation S511, the speech recognition apparatus 110 may incorporate pieces of the grammar model information of a device generated in Operation S509, and may generate a final grammar model to perform speech recognition from the incorporated grammar model information. For example, a final grammar model to perform speech recognition may be generated from the final grammar model information as shown in Table 5.”) to be [activated] by the [activation] operation (PARK Fig. 9; Pars 123-126 – “Position Information 1 and 2, respectively, denote Room 1 and Room 2. Room 1 and Room 2 may be respectively pronounced to be “my room” and “living room” in the user's verbal command.  
The speech recognition apparatus 910 may obtain grammar model information about controllable home appliances as shown in Table according to the verbal command. … TV1 @Room TV Power on …. Audio1 @Room Audio Power on …. TV 2 … @Room TV Power on ….”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method/system of ROSARIO to utilize device state information to modify a grammar to include an activation command of the device, as taught by PARK.
One of ordinary skill would have been motivated to make this modification in order to reduce the possibility of misrecognition during speech recognition (PARK Par 6).

KIM discloses the <angle-bracketed> limitations. KIM discloses a method/system for recognizing speech of a user using different models based on contextual information comprising: wherein the information specifying the attribute of the subject of the operation includes an <age> of the subject of the operation (KIM Par 41 – “The age data may estimate the age of a speaker using a facial recognition technology. The facial recognition technology has been widely developed in various fields such as image processing, pattern recognition, computer vision and neural network and has been variously applied to identify a person, track the same person or restore a face, etc. using a still picture or video.”; Par 44 – “Accordingly, the second estimation unit 122 b can select an acoustic and language model representing a proper gender and age range of the speaker from the pre-classified acoustic and language models by transmitting information as side information for speech recognition, which estimates gender and age of the speaker, along with speech as performance to estimate gender and age is improved using a camera in facial recognition.”; Par 56 – “Referring to FIG. 3, the speech recognition device, when a language model is selected in S120, estimates location and position of the speech recognition terminal 10 on the basis of the location data (S210), estimates age of the speaker on the basis of the image data (S220), and selects a language model on the basis of the location and position and the age (S230).”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method/system of ROSARIO in view of PARK to include age information of a user, as taught by KIM.
One of ordinary skill would have been motivated to include age information of a user in order to recognize the speech of the user accurately (KIM Par 21).
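
For illustration only, the context-dependent behavior cited from ROSARIO Pars 45, 61, and 62 (ordering grammar elements by recent context, then traversing the list from the top until a match is found) can be sketched as follows. This is a minimal sketch and not ROSARIO's implementation: the names GrammarElement, prioritize, and identify_command are hypothetical, and exact string matching is assumed in place of the "best match" determination that ROSARIO describes.

```python
from dataclasses import dataclass


@dataclass
class GrammarElement:
    phrase: str        # spoken command text, e.g. "up"
    application: str   # application the command is associated with
    command: str       # command identified when the phrase matches


def prioritize(elements, recent_application):
    """Place elements of the most recently used application first.

    sorted() is stable, so the original relative order is preserved
    within each of the two groups.
    """
    return sorted(elements, key=lambda e: e.application != recent_application)


def identify_command(elements, speech_text):
    """Traverse the list from the top and return the first matching command.

    Assumption: exact string equality stands in for a best-match search.
    """
    for element in elements:
        if element.phrase == speech_text:
            return element.command
    return None


grammar = [
    GrammarElement("up", "window", "raise_window"),
    GrammarElement("up", "stereo", "increase_volume"),
    GrammarElement("down", "stereo", "decrease_volume"),
]

# The last input element selected by the user is assumed to be the stereo.
ordered = prioritize(grammar, recent_application="stereo")
result = identify_command(ordered, "up")
```

In this sketch the same utterance "up" resolves to the stereo command only after the list has been reordered by the contextual information; with the original ordering it resolves to the window command.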



REGARDING CLAIM 2, ROSARIO in view of PARK and KIM discloses the information processing device according to claim 1, 
wherein the correspondence relations related to change (ROSARIO Par 42 – “In certain embodiments, the contextual information may be utilized to adjust and/or modify the list or set of grammar elements. For example, contextual information may be continuously received, periodically received, and/or received based upon one or more identified or detected events (e.g., application outputs, gestures, received inputs, etc.). The received contextual information may then be utilized to adjust the orderings and/or priorities of the grammar elements.”; Par 55 – “Additionally, as desired, the set of grammar elements may be dynamically adjusted based upon the identification of a wide variety of additional information, such as additional contextual information and/or changes in the executing applications.”; Fig. 5 – “Determine corresponding grammar in grammar list 522”) include at least one correspondence relation decided based on usage information regarding the correspondence relation in the speech recognition process (ROSARIO Par 62 – “For example, the command “up” may be associated with a plurality of different applications, such as a stereo system application and/or an application that controls window functions. In the event that the last input element selected by a user is associated with a stereo system, a received command of “up” may be identified as a stereo system command, and the volume of the stereo may be increased.”; Par 35 – “For example, a last selected input element or an input element selected during the receipt of a speech input (or relatively close in time following the receipt of a speech input) may be evaluated in order to identify a grammar element or command associated with the speech input.”) regarding the operation estimated from the object information of the operation (ROSARIO Fig. 5; Par 60 – “As another example, at block 514, any number of application operations and/or parameters may be identified, such as a message or warning generated by an application or a request for input generated by an application. As another example, at block 516, a wide variety of vehicle parameters (e.g., a location, a speed, an amount of remaining fuel, etc.) may be identified.”) or the subject information of the operation (ROSARIO Fig. 5; Par 60 – “For example, at block 512, at least one user, such as a speaker of the speech input, may be identified based upon one or more suitable identification techniques (e.g. an evaluation of image data, processing of speech data, etc.).”).


REGARDING CLAIM 5, ROSARIO in view of PARK and KIM discloses the information processing device according to claim 1, wherein the control unit (ROSARIO Fig. 1 – “Processor(s) 105”) further controls change in the selected set of the correspondence relations (ROSARIO Par 42 – “In certain embodiments, the contextual information may be utilized to adjust and/or modify the list or set of grammar elements. For example, contextual information may be continuously received, periodically received, and/or received based upon one or more identified or detected events (e.g., application outputs, gestures, received inputs, etc.). The received contextual information may then be utilized to adjust the orderings and/or priorities of the grammar elements.”; Par 55 – “Additionally, as desired, the set of grammar elements may be dynamically adjusted based upon the identification of a wide variety of additional information, such as additional contextual information and/or changes in the executing applications.”; Fig. 5 – “Determine corresponding grammar in grammar list 522”) based on the object information of the operation (ROSARIO Fig. 5; Par 60 – “As another example, at block 514, any number of application operations and/or parameters may be identified, such as a message or warning generated by an application or a request for input generated by an application. As another example, at block 516, a wide variety of vehicle parameters (e.g., a location, a speed, an amount of remaining fuel, etc.) may be identified.”; Par 62 – “For example, the command “up” may be associated with a plurality of different applications, such as a stereo system application and/or an application that controls window functions. In the event that the last input element selected by a user is associated with a stereo system, a received command of “up” may be identified as a stereo system command, and the volume of the stereo may be increased.”) or the subject information of the operation (ROSARIO Fig. 5; Par 60 – “For example, at block 512, at least one user, such as a speaker of the speech input, may be identified based upon one or more suitable identification techniques (e.g. an evaluation of image data, processing of speech data, etc.).”).


REGARDING CLAIM 6, ROSARIO in view of PARK and KIM discloses the information processing device according to claim 5, wherein the change in the selected set of the correspondence relations includes change into a different-sized set of the correspondence relations (ROSARIO Par 42 – “As another example, if an application is closed or terminated, grammar elements associated with the application may be removed from the set of grammar elements.”; Par 51 – “For example, if the location information indicates that the vehicle is situated at or near San Francisco, one or more language models relevant to traveling in San Francisco may be identified, such as language models that include grammar elements associated with landmarks, points of interest, and/or features of interest in San Francisco. Example grammar elements for San Francisco may include, but are not limited to, “golden gate park,” “north beach,” “pacific height,” and/or any other suitable grammar elements associated with various points of interest. In certain embodiments, one or more user preferences may be taken into consideration during the identification of language models. For example, a user may specify that language models associated with tourist attractions should be obtained in the event that the vehicle travels outside of a designated home area. Additionally, once language models associated with a particular location are no longer relevant (i.e., the vehicle location has changed, etc.), the language models may be discarded.”).
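
For illustration only, the change into a different-sized set described in the cited passages of ROSARIO Pars 42 and 51 (grammar elements removed when an application closes, location-specific elements obtained and later discarded) can be sketched as follows. This is a minimal sketch under stated assumptions: the helper names add_language_model and remove_language_model and the representation of a grammar element as a (phrase, source) pair are hypothetical and do not appear in ROSARIO.

```python
def add_language_model(grammar_set, phrases, source):
    """Return a new set that also holds the given phrases, tagged by source."""
    return grammar_set | {(phrase, source) for phrase in phrases}


def remove_language_model(grammar_set, source):
    """Return a new set without any element tagged with the given source."""
    return {(phrase, src) for (phrase, src) in grammar_set if src != source}


# Initial selected set: elements of two executing applications.
selected = {
    ("up", "stereo"),
    ("down", "stereo"),
    ("navigate home", "navigation"),
}

# The vehicle is assumed to be near San Francisco, so location elements are added.
with_location = add_language_model(
    selected, ["golden gate park", "north beach"], "san_francisco"
)

# The stereo application is assumed to have been closed, so its elements are removed.
after_close = remove_language_model(with_location, "stereo")

sizes = (len(selected), len(with_location), len(after_close))
```

Each step yields a set whose size differs from that of the set before it, which is the property recited in claim 6.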


REGARDING CLAIM 7, ROSARIO in view of PARK and KIM discloses the information processing device according to claim 1, wherein the correspondence relation is changed  (ROSARIO Par 42 – “In certain embodiments, the contextual information may be utilized to adjust and/or modify the list or set of grammar elements. For example, contextual information may be continuously received, periodically received, and/or received based upon one or more identified or detected events (e.g., application outputs, gestures, received inputs, etc.). The received contextual information may then be utilized to adjust the orderings and/or priorities of the grammar elements.”; Par 55 – “Additionally, as desired, the set of grammar elements may be dynamically adjusted based upon the identification of a wide variety of additional information, such as additional contextual information and/or changes in the executing applications.”; Fig. 5 – “Determine corresponding grammar in grammar list 522”) through communication (ROSARIO Fig. 5 – “Determine corresponding grammar in grammar list 522”; Par 61 – “At block 522, a grammar element (or plurality of grammar elements) included in the set of grammar elements that corresponds to the received speech input may be determined. A wide variety of suitable methods or techniques may be utilized to determine a grammar element. For example, at block 524, an accessed list of grammar elements may be traversed (e.g., sequentially evaluated starting from the beginning or top, etc.) until a best match or correspondence between a grammar element and the speech input is identified.”).


REGARDING CLAIM 8, ROSARIO in view of PARK and KIM discloses the information processing device according to claim 1, wherein the object information of the operation further includes information for specifying an identity of the operation target (ROSARIO Fig. 5; Par 45 – “For example, by ordering grammar elements associated with the most recently activated applications and/or components higher in a list of grammar elements, the speech recognition module may be biased towards those grammar elements. Such an approach may apply the heuristic that speech input is most likely to be directed towards components and/or applications that have most recently come to a user's attention. For example, if a message has recently been output by an application or component, speech recognition may be biased towards commands associated with the application or component. As another example, if a user indication associated with a particular component or application has recently been identified, then speech recognition may be biased towards commands associated with the application or component.”).


REGARDING CLAIM 9, ROSARIO in view of PARK and KIM discloses the information processing device according to claim 8, wherein the operation target includes an application or an apparatus (ROSARIO Fig. 5; Par 60 – “As another example, at block 514, any number of application operations and/or parameters may be identified, such as a message or warning generated by an application or a request for input generated by an application. As another example, at block 516, a wide variety of vehicle parameters (e.g., a location, a speed, an amount of remaining fuel, etc.) may be identified.”; Par 62 – “For example, the command “up” may be associated with a plurality of different applications, such as a stereo system application and/or an application that controls window functions. In the event that the last input element selected by a user is associated with a stereo system, a received command of “up” may be identified as a stereo system command, and the volume of the stereo may be increased.”).


REGARDING CLAIM 11, ROSARIO in view of PARK and KIM discloses the information processing device according to claim 1, wherein the subject information of the operation includes information for specifying a state of a subject of the operation (ROSARIO Par 56 – “As another example, a most recently identified user gesture or user input may be evaluated in order to provide grammar elements associated with the gesture or input with a higher priority. For example, if a user gestures (e.g., gazes, points at, etc.) towards a stereo system, grammar elements associated with a stereo application may be provided with higher priorities.”; Par 12 – “The grammar elements may be associated with a wide variety of different language models identified by the speech recognition system, such as language models associated with one or more users, language models associated with any number of executing applications, and/or language models associated with a current location (e.g. a location of a vehicle, etc.).”).


REGARDING CLAIM 12, ROSARIO in view of PARK and KIM discloses the information processing device according to claim 11, wherein the state of the subject of the operation includes an action, an attitude, or a position of the subject of the operation (ROSARIO Par 34 – “In certain embodiments, a user gesture may indicate when speech input recognition should begin and/or terminate. In other embodiments, a user gesture may provide contextual information associated with the processing of speech inputs. For example, a user may gesture towards a sound system (or a designated area associated with the sound system) to indicate that a speech input is associated with the sound system.”; Par 35 – “In certain embodiments, a gesture towards a display (e.g., pointing at a display, gazing towards the display, etc.) may be identified and evaluated as suitable contextual information.”; Par 12 – “The grammar elements may be associated with a wide variety of different language models identified by the speech recognition system, such as language models associated with one or more users, language models associated with any number of executing applications, and/or language models associated with a current location (e.g. a location of a vehicle, etc.).”).




REGARDING CLAIM 13, ROSARIO in view of PARK and KIM discloses the information processing device according to claim 1, wherein the subject information of the operation includes information for specifying an environment around a subject of the operation (ROSARIO Par 13 – “Additionally, a wide variety of contextual information or environmental information may be determined or identified, such as identification information for one or more users, the identification information for one or more executing applications, actions taken by one or more executing applications, vehicle parameters (e.g., speed, current location, etc.), gestures made by a user, and/or a wide variety of user input (e.g., button presses, etc.).”; Par 42 – “A wide variety of contextual information may be collected as desired in various embodiments of the invention, such as an identification of one or more users (e.g., an identification of a speaker), information associated with status changes of applications (e.g. newly executed applications, terminated applications, etc.), information associated with actions taken by the applications, one or more vehicle parameters, (e.g., location, speed, etc.), and/or information associated with received user inputs (e.g., input element selections, gestures, etc.).”).


REGARDING CLAIM 14, ROSARIO in view of PARK and KIM discloses the information processing device according to claim 1, wherein the subject information of the operation further includes information for specifying an identity of the subject of the operation (ROSARIO Par 60 – “For example, at block 512, at least one user, such as a speaker of the speech input, may be identified based upon one or more suitable identification techniques (e.g. an evaluation of image data, processing of speech data, etc.).”; Par 42 – “A wide variety of contextual information may be collected as desired in various embodiments of the invention, such as an identification of one or more users (e.g., an identification of a speaker), information associated with status changes of applications (e.g. newly executed applications, terminated applications, etc.), information associated with actions taken by the applications, one or more vehicle parameters, (e.g., location, speed, etc.), and/or information associated with received user inputs (e.g., input element selections, gestures, etc.).”).


REGARDING CLAIM 15, ROSARIO in view of PARK and KIM discloses the information processing device according to claim 1, wherein the object information of the operation (ROSARIO Fig. 5; Par 60 – “As another example, at block 514, any number of application operations and/or parameters may be identified, such as a message or warning generated by an application or a request for input generated by an application. As another example, at block 516, a wide variety of vehicle parameters (e.g., a location, a speed, an amount of remaining fuel, etc.) may be identified.”; Par 62 – “For example, the command “up” may be associated with a plurality of different applications, such as a stereo system application and/or an application that controls window functions. In the event that the last input element selected by a user is associated with a stereo system, a received command of “up” may be identified as a stereo system command, and the volume of the stereo may be increased.”) or the subject information of the operation (ROSARIO Fig. 5; Par 60 – “For example, at block 512, at least one user, such as a speaker of the speech input, may be identified based upon one or more suitable identification techniques (e.g. an evaluation of image data, processing of speech data, etc.).”) includes information estimated based on information acquired with regard to an object or a subject of the operation (ROSARIO Par 49 – “At block 420, one or more users associated with the vehicle (or another speech recognition environment) may be identified. A wide variety of suitable methods and/or techniques may be utilized to identify a user. For example, a voice sample of a user may be collected and compared to a stored voice sample. As another example, image data for the user may be collected and evaluated utilizing suitable facial recognition techniques. As another example, other biometric inputs (e.g., fingerprints, etc.) may be evaluated to identify a user.”).


REGARDING CLAIM 16, ROSARIO in view of PARK and KIM discloses the information processing device according to claim 1, wherein the object information of the operation or the subject information of the operation includes information acquired through the speech recognition process (ROSARIO Par 60 – “For example, at block 512, at least one user, such as a speaker of the speech input, may be identified based upon one or more suitable identification techniques (e.g. an evaluation of image data, processing of speech data, etc.).”; Par 62 – “Accordingly, when a command of “tune up” is received, it may be determined that the command is associated with an application that schedules maintenance at a dealership and/or that maps a route to a service provider as opposed to a command that alters the tuning of a stereo system.”).

REGARDING CLAIM 18, ROSARIO in view of PARK and KIM discloses the information processing device according to claim 1, wherein the voice information related to the correspondence relation includes voice information indicating a start of the operation or voice information indicating a content of the operation (ROSARIO Par 46 – “For example, an identified grammar element or command may be translated into an input that is provided to an executing application. In this regard, voice commands may be identified and dispatched to relevant applications. Additionally, in certain embodiments, a recognized speech input may be processed in order to generate output information (e.g., audio output information, display information, messages for communication, etc.) for presentation to a user. For example, an audio output associated with the recognition and/or processing of a voice command may be generated and output. As another example, a visual display may be updated based upon the processing of a voice command. The method 300 may end following block 330.”).

REGARDING CLAIM 19, ROSARIO in view of PARK and KIM discloses an information processing method comprising, by using a processor (ROSARIO Fig. 1 – “Processor(s) 105”): performing the functions of Claim 1; thus, it is rejected under the same rationale.

REGARDING CLAIM 20, ROSARIO in view of PARK and KIM discloses a non-transitory computer-readable storage medium having embodied thereon a program, which when executed by a computer causes the computer to execute a method, the method comprising: the functions of Claim 1; thus, it is rejected under the same rationale.

REGARDING CLAIM 21, ROSARIO in view of PARK and KIM discloses the information processing device according to claim 1, wherein information specifying the attribute of the subject of the operation is determined by face recognition (ROSARIO Par 27 – “For example, image data may be evaluated in order to identify users, detect user indications, and/or to detect user gestures.”; Par 34 – “The image sensors 210 may facilitate the collection of image data that may be evaluated for a wide variety of suitable purposes, such as user identification and/or the identification of user gestures.”; Par 49 – “At block 420, one or more users associated with the vehicle (or another speech recognition environment) may be identified. A wide variety of suitable methods and/or techniques may be utilized to identify a user. For example, a voice sample of a user may be collected and compared to a stored voice sample. As another example, image data for the user may be collected and evaluated utilizing suitable facial recognition techniques. As another example, other biometric inputs (e.g., fingerprints, etc.) may be evaluated to identify a user. As yet another example, a user may be identified based upon determining a pairing between the vehicle and a user device (e.g., a mobile device, etc.) and/or based upon the receipt and evaluation of user identification information (e.g., a personal identification number, etc.) entered by the user.”).



Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over ROSARIO in view of PARK and KIM, and further in view of TOMSA (US 2017/0133015 A1).

REGARDING CLAIM 3, ROSARIO in view of PARK and KIM discloses the information processing device according to claim 2. 
ROSARIO in view of PARK and KIM does not explicitly teach specifying frequency of usage.
TOMSA discloses a method/system for context-augmented speech recognition, wherein the usage information includes information for specifying frequency of usage (TOMSA Par 40 – “Based on an initial context associated with the characteristics, among other things, a location specific vocabulary can be obtained. Other contextual clues, e.g., without limitation, time of day, day of week, date, month, year, weather, proximity of other users, social media profiles, and user personal information may also be used to augment the vocabulary set or change which vocabularies are selected. Over time, crowd-sourcing can be used to determine any words or context that may be frequent to the location, but not yet identified. This crowd-sourced data can also be used to refine vocabularies for both the specific location and for characteristics of the location.”; Par 64 – “For example, if a “dinosaur” vocabulary were affiliated with paleontology wings of museums, it might be observed that the word “brontosaurus” was infrequently or never used (since it has been determined that a brontosaurus is actually a combination of two dinosaurs). At a large museum, where a number of words are commonly used (such that a meaningful distinction can be drawn between infrequently used words and frequently used words), it may be that “brontosaurus” decays sufficiently based on usage to be removed from the dinosaur vocabulary.”; Par 66 – “If a new artist becomes popular, frequent enough usage of the artist name or the name of a work may cause the addition of that word to a vocabulary. But there may also be individual vocabularies associated with specific artists.”; Par 78 – “Selection of any of the contextual identifiers may result in inclusion of the affiliated word. Inclusion thresholds may also be included with the context identifiers when the context group is defined, and frequency values may be associated with each word-context pair, such that only words of a certain frequency used within a certain context are selected.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method/system of ROSARIO in view of PARK and KIM to include specifying frequency of usage, as taught by TOMSA.
One of ordinary skill would have been motivated to include specifying frequency of usage in order to increase speech recognition accuracy and speed (TOMSA Par 90).




Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over ROSARIO in view of PARK and KIM, and further in view of FAABORG (US 8,938,394 B1).

REGARDING CLAIM 4, ROSARIO in view of PARK and KIM discloses the information processing device according to claim 2.
ROSARIO in view of PARK and KIM does not explicitly teach specifying whether usage is permitted.
FAABORG discloses a method/system for speech recognition based on context, wherein the usage information includes information for specifying whether usage is permitted (FAABORG Figs. 1-4 – “Enabled?”; Col 7:47-58 – “Contextual audio trigger 22N, however, remains disabled. Contextual audio trigger 22N may remain disabled as a result of audio trigger module 12 determining that it is unlikely that a user of computing device 2 would use audio trigger 22N in the current context.”; Col 21:50-63 – “Contextual audio triggers 104A and 104B remain disabled. This may be because contextual audio triggers 104A and 104B were not associated with the correct contextual category values. For instance, it may be unlikely that, when travelling by automobile, the user would desire to cause an alarm clock application to postpone output of an alarm notification. As another example, it may be unlikely that the user would desire to pause a navigation route when computing device 2 is currently inactive (e.g., and is not currently executing a navigation application).”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method/system of ROSARIO in view of PARK and KIM to include specifying whether usage is permitted, as taught by FAABORG.
One of ordinary skill would have been motivated to include specifying whether usage is permitted in order to avoid performing unintended actions (FAABORG Col 1:5-17).



Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over ROSARIO in view of PARK and KIM, and further in view of JUNEJA (US 2012/0215539 A1).

REGARDING CLAIM 10, ROSARIO in view of PARK and KIM discloses the information processing device according to claim 1.
ROSARIO in view of PARK and KIM does not explicitly teach a capability of communication.
JUNEJA discloses a method/system for switching between a local-side recognizer and a server-side recognizer, wherein the control unit further controls change in the correspondence relation (JUNEJA Par 26 – “Depending upon the nature of an application that invokes a speech recognizer or other speech recognition functionality consistent with the current subject matter, a dialogue script designer, developer, or development team can decide or otherwise define, for example during design time (e.g. prior to run time), at what point within a received utterance a switch should be made between processing of speech recognition-related tasks at the thin client (e.g. the mobile device) and the server or servers.”; Par 27 – “Also possible within one or more implementations of the current subject matter is the ability to switch between languages mid-sentence or elsewhere within a single speech utterance, and/or to use multiple speech recognizers in the same language in parallel to boost the accuracy rate and/or determine an appropriate acoustic profile and fundamental frequency for the speaker who has created the speech utterance.”) based on whether the information processing device is capable of communication (JUNEJA Par 15 – “A server-based speech recognizer can provide higher accuracy rates and a larger vocabulary, but may not always be available to respond to user demands due to data network availability or reliability issues.”; Par 25 – “In some implementations, tasks that involve using multiple speech recognizers at once in different languages can be processed on either or both of the server side and the thin client computing device, depending on, for example, the processing power of the thin client computing device, the quality of the available network connection, network bandwidth limitations, and the like.”; Par 39 – “Similarly, the type of verbal inputs expected to be received and processed by the speech recognition functionality can be used in determining appropriate thresholds, as can the availability of necessary data for use in adapted language models, availability and cost of network access and bandwidth, and the like. …. can be identified as having low quality necessitating processing at the one or more servers 204, where greater processing power is available. A third speech utterance segment having a signal to noise ratio of greater than the next threshold (e.g. approximately 60 db) can be identified as having high quality permitting processing at the thin client computing device or terminal 202 as less processing power is expected to be necessary.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method/system of ROSARIO in view of PARK and KIM to include a capability of communication, as taught by JUNEJA.
One of ordinary skill would have been motivated to include a capability of communication in order to take advantage of processing power both at a thin client computing terminal and in one or more servers (JUNEJA Par 15).



Claim 17 is rejected under 35 U.S.C. 103 as being unpatentable over ROSARIO in view of PARK and KIM, and further in view of WOLFF (US 2015/0046157 A1).

REGARDING CLAIM 17, ROSARIO in view of PARK and KIM discloses the information processing device according to claim 1.
ROSARIO in view of PARK and KIM does not explicitly teach a notification.
WOLFF discloses a method/system for switching between two modes of speech recognition associated with different sizes of recognition vocabulary, further comprising a notification control unit configured to control notification to a subject of the operation regarding the change in the correspondence relation (WOLFF Par 17 – “The interface system can switch modes based on different switching cues: dialog-state, certain activation words, or visual gestures. The different listening modes may also use different recognition vocabularies, for example, a limited vocabulary in broad listening mode and a larger recognition vocabulary in selective listening mode. To limit the speech inputs to a specific speaker, the system may use acoustic speaker localization and/or video processing means to determine speaker position.”; Par 26 – “It may be useful for the interface to communicate to the specific speaker when the device is in selective listening mode and listening only to him. There are several different ways in which this can be done. For example, a visual display may show a schematic image of the room scene with user highlighting to identify the location of the selected specific speaker. Or more simply, a light bar display can be intensity coded to indicate that spatial direction of the selected specific speaker. Or an avatar may be used to deliver listening mode feedback as part of a dialog with the user(s).”), wherein the notification control unit is implemented via at least one processor (WOLFF Par 29 – “Such implementation may include a series of computer instructions fixed either on a tangible medium, such as a computer readable medium (e.g., a diskette, CD-ROM, ROM, or fixed disk) or transmittable to a computer system, via a modem or other interface device, such as a communications adapter connected to a network over a medium. The medium may be either a tangible medium (e.g., optical or analog communications lines) or a medium implemented with wireless techniques (e.g., microwave, infrared or other transmission techniques). … ”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method/system of ROSARIO in view of PARK and KIM to include a notification, as taught by WOLFF.
One of ordinary skill would have been motivated to include a notification in order to provide a user with useful information as part of a dialog with the user (WOLFF Par 26).


Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JONATHAN C KIM, whose telephone number is (571)272-3327. The examiner can normally be reached Monday through Friday, 8:00 AM to 4:00 PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew C Flanders, can be reached at 571-272-7516. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/JONATHAN C KIM/
Primary Examiner, Art Unit 2655