DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
All objections/rejections not mentioned in this Office Action have been withdrawn by the Examiner.

Information Disclosure Statement
The information disclosure statement(s) (IDS) submitted on August 3, 2022 is/are being considered by the examiner.

Response to Amendments 
Applicant’s amendment filed on August 3, 2022 has been entered. 
After entry of the amendment, claims 1-18 remain pending.
The amendments to claims 1, 12, 17, and 18 have been acknowledged and entered.
In view of the amendment to claim 12, the rejection of claim 12 under 35 U.S.C. §112 is withdrawn.
In view of the amendments to claims 1, 17, and 18, the rejection of claims 1-18 under 35 U.S.C. §103 is withdrawn.
In light of the amended claims, new grounds for rejection under 35 U.S.C. §103 are provided in the response below. 

Response to Arguments
Prior to entry of this response, claims 1, 3, 5-13, and 16-18 stand rejected under 35 U.S.C. § 103 as being unpatentable over McDuff (U.S. Pat. App. Pub. No. 2020/0279553, hereinafter McDuff) in view of Zhang (U.S. Pat. App. Pub. No. 2018/0054523, hereinafter Zhang), claims 2 and 4 stand rejected under 35 U.S.C. § 103 as being unpatentable over McDuff in view of Zhang, and further in view of Motomura (U.S. Pat. App. Pub. No. 2016/0343372, hereinafter Motomura), claim 14 stands rejected under 35 U.S.C. § 103 as allegedly being unpatentable over McDuff in view of Zhang, and further in view of Herold (U.S. Pat. App. Pub. No. 2018/0233132, hereinafter Herold), and claim 15 stands rejected under 35 U.S.C. § 103 as allegedly being unpatentable over McDuff in view of Zhang, and further in view of Kennewick (U.S. Pat. App. Pub. No. 2016/0148610, hereinafter Kennewick).
Applicant’s arguments regarding the prior art rejections under 35 U.S.C. §103, see pages 10-12 of the Response to Non-Final Office Action dated June 3, 2022, which was received on August 3, 2022 (hereinafter Response and Office Action, respectively), have been fully considered.
With respect to the rejection of claim 1, and mutatis mutandis claims 17 and 18, under 35 U.S.C. §103 in light of McDuff in view of Zhang, Applicant argues that McDuff fails to teach or suggest “generate a familiarity measure for the at least one intent,” “generating the familiarity measure for the at least one intent comprises: processing, using a machine learning model, a plurality of parameters to generate the familiarity measure,” or “wherein the plurality of parameters processed using the machine learning model to generate the familiarity measure include one or more intent specific parameters that are based on historical interactions, of the user with the automated assistant, for the at least one intent,” as recited in claim 1. However, this argument is not persuasive.
The rejection, as originally presented in the Office Action, has been further clarified in the response below. The mappings of the references to the above claim limitations are explained in greater detail below, as amended in light of the claim amendments submitted in this response. Applicant is invited to review said mappings and explanations, such that any further confusion regarding the mapping of the references to the instant application can be specifically discussed, as appropriate. Therefore, the rejection of said limitations in light of McDuff is maintained. 
Further with respect to the rejection of claim 1, and mutatis mutandis claims 17 and 18, under 35 U.S.C. §103 in light of McDuff in view of Zhang, Applicant argues that McDuff and Zhang fail to teach or suggest “processing the user input to determine at least one intent associated with the user input, wherein the at least one intent corresponds to controlling one or more connected smart devices,” as recited in claim 1. Applicant’s arguments in light of the amended claims are persuasive. As such, the rejections of claims 1, 17, and 18 under 35 U.S.C. §103 are withdrawn.
Applicant further argues that the rejection of dependent claims 2-16 should be withdrawn for at least the same reasons as independent claim 1. Applicant’s arguments in light of the amended claims are persuasive. As such, the rejections of claims 2-16 under 35 U.S.C. §103 are withdrawn.
However, upon further consideration, new ground(s) of rejection under 35 U.S.C. §103 are made in light of combinations of McDuff, Zhang, Motomura, Herold, Kennewick, and newly cited reference Li (U.S. Pat. App. Pub. No. 2019/0206411, hereinafter Li).
Applicant has not provided any further arguments; the Examiner therefore directs Applicant to the rationale below.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1, 3, 5-13, and 16-18 are rejected under 35 U.S.C. 103 as being unpatentable over McDuff in view of Zhang and Li.

Regarding claim 1, McDuff discloses A method implemented by one or more processors, the method comprising (the systems and methods described with reference to the "conversational agent"; McDuff, ¶¶ [0019], [0055]): receiving user input, wherein the user input is provided by a user at an automated assistant interface of a client device ("conversational agent system... includes an audio pipeline" where "The audio pipeline begins with audio input representing speech 104 of the user 102 that is produced by a microphone 110, 308 in response to sound waves contacting a sensing element on the microphone 110, 308."; McDuff, ¶¶ [0057]-[0058]), and wherein the automated assistant interface is an interface for interacting with an automated assistant executing on the client device and/or one or more remote computing devices ("In conversational agent system 300, the user 102 interacts with a local computing device 304"; McDuff, ¶¶ [0047]); processing the user input to determine at least one intent associated with the user input ("The text sentiment recognizer 404 recognizes sentiments in the content of an input by the user 102" and "An intent recognition module 718 recognizes intents in the conversational input such as speech identified by the speech recognition module 712," where both sentiment and intent are the at least one intent associated with the user input.; McDuff, ¶¶ [0065], [0128])… generating a familiarity measure for the at least one intent ("the conversational agent can evaluate a user’s visual and verbal behavior in view of a larger conversational context {generating a familiarity measure for the...}" where "the sentiment {...at least one intent} as identified by the text sentiment recognizer 404 may be a part of the conversational context"; McDuff, ¶¶ [0018], [0065]), wherein generating the familiarity measure for the at least one intent comprises: processing… a plurality of parameters to generate the familiarity measure, ("The conversational context can include the audio, text, and/or video inputs as well as other factors sensed or available to the conversational agent system."; McDuff, ¶¶ [0020], [0065]) wherein the plurality of parameters processed using the machine learning model to generate the familiarity measure include one or more intent specific parameters that are based on historical interactions, of the user with the automated assistant (where "The conversational context can include the audio, text, and/or video inputs as well as other factors sensed or available to the conversational agent system... includ[ing] usage behavior of the user associated with the system" where usage behavior can include "total usage time, usage frequency, time of day of usage, identity of applications launched, powered on time, standby time," which are intent specific parameters that are derived from "communication history {thus, based on historical interaction}"; McDuff, ¶¶ [0020]), for the at least one intent specified by the user input (The system detects "sentiments in the content of an input by the user 102" and "intents in the conversational input such as speech identified by the speech recognition module 712," thus each of the sentiment and intent {the at least one intent associated with the user input} are also specified by the user input.; McDuff, ¶¶ [0018], [0065]); determining a response, of the automated assistant to the user input, based on the familiarity measure and based on the at least one intent ("A dialogue generation module 720 captures input from the linguistic style detection module 714 and the intent recognition module 718 to generate for dialogue that will be produced by the conversational agent" and "the speech synthesizer 722 may generate the response dialogue {determining a response of the automated assistant…} based the conversational context {...to the user input}," where the conversational context includes the sentiment; McDuff, ¶¶ [0130], [0065], [0135]); and causing the client device to render the determined response. ("the speech synthesizer… [is] used to cause a computing device {and causing the client device...} to generate the sounds of synthetic speech {...to render the determined response}."; McDuff, ¶¶ [0135]). However, McDuff fails to expressly recite wherein the at least one intent corresponds to controlling one or more connected smart devices, [and] processing, using a machine learning model, a plurality of parameters to generate the familiarity measure.
Zhang teaches “methods, systems, and programming for virtual agents” including context sensitivity. (Zhang, ¶ [0002], [0007]). Regarding claim 1, Zhang teaches processing, using a machine learning model, a plurality of parameters to generate the familiarity measure, (Discloses “semi-supervised [machine learning] approaches... for learning from past and present conversations” to derive “different types of dialog models” which are “adaptive to the dynamic conversation contexts,” thus disclosing determining “past and present” conversation context {a familiarity measure} by processing using a machine learning model, and where the past and present conversation includes the plurality of parameters; Zhang, ¶¶ [0045]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the linguistic style matching agent of McDuff to incorporate the teachings of Zhang to include processing, using a machine learning model, a plurality of parameters to generate the familiarity measure. The systems and methods described in Zhang overcome the requirement for “hand written rules and manually labelled training data for the systems to learn the communication rules,” thus “provid[ing] an improved solution for the development and application of a virtual agent,” as recognized by Zhang. (Zhang, ¶¶ [0004]-[0005]). However, McDuff and Zhang fail to expressly recite wherein the at least one intent corresponds to controlling one or more connected smart devices.
Li teaches systems and methods for controlling multiple home devices based on the intent of the user. (Li, ¶¶ [0002], [0006]). Regarding claim 1, Li teaches wherein the at least one intent corresponds to controlling one or more connected smart devices (“the home assistant system 300… [identifies] a user's intent expressed in a natural language input received from the user… and executing the task flow to fulfill the deduced intent” where “the memory includes a home control module 360 that utilizes the APIs of the home control services to control different home appliances that are registered with the digital assistant system”; Li, ¶ [0055]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the linguistic style matching agent of McDuff, as modified by the context-sensitive virtual agents of Zhang, to incorporate the teachings of Li to include wherein the at least one intent corresponds to controlling one or more connected smart devices. Controlling smart devices in light of the user intent can make device interactions less “time-consuming and cumbersome,” which will “improve the way that multiple devices are controlled by the digital assistant using voice-based commands,” as recognized by Li. (Li, ¶¶ [0004]-[0006]).

Regarding claim 3, the rejection of claim 1 is incorporated. McDuff further discloses wherein determining the response based on the familiarity measure and based on the determined at least one intent comprises: determining an initial response based on the determined intent, (The dialogue manager adjusts the content of a preexisting utterance, thus indicating an initial response.; McDuff, ¶¶ [0039]) wherein the initial response comprises a first quantity of bytes (An initial response necessarily comprises a first quantity of bytes, as measured based on the number of letters in a message or length of a message.; McDuff, ¶¶ [0039]); responsive to determining the familiarity measure satisfies a threshold: modifying the initial response to generate an abridged response, (“The conversational context can include the audio, text, and/or video inputs,” which is the user content, based on what is “available to the conversational agent system,” where level of availability can be viewed as a threshold, and “the dialogue manager 216 attempts to adjust the content of an utterance or select an utterance in order to more closely match the conversational style of the user 102.” which is responsive to the availability of said information {determining familiarity satisfies a threshold}; McDuff, ¶¶ [0020], [0039]) wherein the abridged response comprises a second quantity of bytes that is less than the first quantity of bytes (“the dialogue manager 216 may... Add... personal pronouns... and abbreviate... the utterance to better match the conversational style of the user 102.”; McDuff, ¶¶ [0039]).

Regarding claim 5, McDuff discloses wherein determining the response occurs at one or more servers of the automated assistant that are remote from the client device ("The conversational agent system 100 may also include one or more remote computing device(s) 120 implemented as a cloud-based computing system, a server, or other computing device that is physically remote from the local computing device 106."; McDuff, ¶¶ [0024]), and wherein causing the client device to render the response comprises transmitting, by one or more of the servers over one or more networks, the response to the client device ("The local computing device 106 may communicate with the remote computing device(s) 120 using the communication interface(s) 116 via a direct connection or via a network such as the Internet..." where "some or all of the instructions...may be performed by the remote computing device(s) 120. For example, more computationally intensive operations such as speech recognition may be offloaded to the remote computing device(s) 120."; McDuff, ¶¶ [0024]).

Regarding claim 6, McDuff discloses wherein the user input is a spoken user input (The input is described as "microphone input 202 that corresponds to voice activity [which] is passed to the speech recognizer 206," thus the input is spoken user input.; McDuff, ¶¶ [0027]), and further comprising: processing the spoken user input, or a spoken invocation that precedes the spoken user input (Discloses "detect[ing] speech in the audio signal; [and] recognize a content of the speech {processing spoken input}"; McDuff, ¶¶ [0159]), to determine that the spoken user input corresponds to a user profile that is accessible to the automated assistant (The system then can, based on the recognized speech, "determine a conversational context {...corresponds to a user profile…} associated with the speech {...that the spoken user input...}"; McDuff, ¶¶ [0159]); and responsive to determining that the spoken user input corresponds to the user profile: determining one or more parameters in the plurality of parameters based on data that is stored in association with the user profile ("and generate a response dialogue having response content based on the content of the speech and prosodic qualities {determining one or more parameters in the plurality of parameters...} based on the conversational context {...based on data that is stored in association with the user profile} associated with the speech {responsive to determining that the spoken user input corresponds to the user profile...}."; McDuff, ¶¶ [0159]).

Regarding claim 7, McDuff discloses further comprising: determining one or more intent agnostic parameters that are each determined without regard to the determined intent, ("The linguistic style extractor 212 identifies non-prosodic components {determining one or more intent agnostic parameters...} of the user’s conversational style that may be referred to as ‘content variables.’ The content variables may include, but are not limited to, pronoun use, repetition, and utterance length, " where determination of number of use or word length are agnostic to the intent of the user.; McDuff, ¶¶ [0032]) wherein the parameters that are processed using the machine learning model in generating the familiarity measure further comprise the one or more intent agnostic parameters (The system discloses that, in one embodiment, "the conversational context {parameters that are processed using the machine learning model in generating the familiarity measure…} comprises a linguistic style of the speech {further comprise one or more intent agnostic parameters}"; McDuff, ¶¶ [0159]).

Regarding claim 8, McDuff discloses wherein determining the one or more intent agnostic parameters comprises: determining at least one of the intent agnostic parameters based on additional historical interactions, of the user with the automated assistant ("In order to measure the second content variable {...of the user with the automated assistant...}, repetition, the linguistic style extractor 212 uses two variables that both relate to repetition of terms {determining at least one of the intent agnostic parameters...}" where "repetition can be seen as a measure of persistence in introducing a specific topic. The first of the variables measures the occurrence rate of repeated terms on an utterance level. The second measures the rate of utterances which contain one or more repeated terms {based on additional historical interactions...}."; McDuff, ¶¶ [0033]), including historical interactions that do not correspond to the determined at least one intent (The historical interactions do not correspond with any event, including the determined at least one intent, other than the repetition of non-stop-words.; McDuff, ¶¶ [0033]).

Regarding claim 9, McDuff discloses wherein determining the one or more intent agnostic parameters comprises determining at least one of the intent agnostic parameters ("The linguistic style extractor 212 identifies non-prosodic components of the user’s conversational style that may be referred to as ‘content variables’, {one or more intent agnostic parameters}"; McDuff, ¶¶ [0032]) based on an amount of the additional historical interactions between the user and the automated assistant (where "The first of the variables measures the occurrence rate of repeated terms on an utterance level," where each utterance is a historical interaction and an occurrence rate in each of said utterances is the frequency of events within the interaction and is based on the amount of the interactions between the user and the automated assistant.; McDuff, ¶¶ [0033]).

Regarding claim 10, McDuff discloses wherein the one or more intent agnostic parameters determined based on the amount of historical interactions between the user and the automated assistant include one or more intent agnostic parameters ("The linguistic style extractor 212 identifies non-prosodic components of the user’s conversational style that may be referred to as ‘content variables’, {one or more intent agnostic parameters}"; McDuff, ¶¶ [0032]) that are based on a total number of the additional historical interactions and/or a frequency of the additional historical interactions (where "The first of the variables measures the occurrence rate of repeated terms on an utterance level," where each utterance is a historical interaction and an occurrence rate in each of said utterances is the frequency of events within the interaction as well as being based on total number of the interactions.; McDuff, ¶¶ [0033]).

Regarding claim 11, McDuff discloses wherein determining the one or more intent specific parameters includes determining at least one of the intent specific parameters ("the conversational context… includ[ing] usage behavior of the user {intent specific parameters} associated with the system (e.g., the user of an active account on a smartphone or computer). Usage behavior may include total usage time, usage frequency, [and/or] time of day of usage"; McDuff, ¶¶ [0020]) based on an amount of the historical interactions between the user and the automated assistant for the at least one intent and/or a length of time since a most recent interaction between the user and the automated assistant for the at least one intent (based on "total usage time" {total number of additional historical interactions} and "usage frequency" {and/or a frequency of the additional historical interactions}; McDuff, ¶¶ [0020]).

Regarding claim 12, McDuff discloses wherein the one or more intent specific parameters are a plurality of intent specific parameters ("the conversational context… includ[ing] usage behavior of the user associated with the system (e.g., the user of an active account on a smartphone or computer). Usage behavior may include total usage time, usage frequency, [and/or] time of day of usage {a plurality of intent specific parameters} "; McDuff, ¶¶ [0020]), including at least one that is based on the amount of the historical interactions between the user and the automated assistant for the intent (where the conversational context {intent specific parameters} include "total usage time" {which are based on the amount of historical interactions} of the conversational agent by the user {between the user and the automated assistant} for the intent.; McDuff, ¶¶ [0020]), and optionally including at least an additional one that is based on the length of time since the most recent interaction between the user and the automated assistant for the intent (where the conversational context {intent specific parameters} further include "usage frequency" {based on the length of time since the most recent interaction} of the conversational agent by the user {between the user and the automated assistant} for the intent.; McDuff, ¶¶ [0020]).

Regarding claim 13, McDuff discloses wherein the at least one intent includes only one or more intents that are referenced by the user input ("The custom intent recognizer 214 recognizes intents in the speech identified by the speech recognizer 206," where "An intent may be the “goal” of the user 102 such as booking a flight or finding out when a package will be delivered," thus including only intents that are referenced by the user.; McDuff, ¶¶ [0035]).

Regarding claim 16, McDuff discloses wherein the plurality of parameters processed using the machine learning model to generate the familiarity measure includes a current modality of the client device (Conversational context further incorporates communication history where "Communication history may also include the modality of communications (e.g., email, text, phone, specific messaging app, etc.)."; McDuff, ¶¶ [0020]).

Regarding claim 17, McDuff discloses A system comprising (the systems and methods described with reference to the "conversational agent" as implemented in the “computing device 700”; McDuff, ¶¶ [0019], [0055], [0113]): one or more processors (“The computing device 700 includes one or more processor(s) 702”; McDuff, ¶¶ [0019], [0055]); and memory storing instructions that, when executed, cause the one or more processors to (“The computing device 700 includes …one or more memory 704” which “may be implemented as computer-readable media” where the conversational agent may comprise “multiple modules that may be implemented as instructions stored in the memory 704 for execution by processor(s) 702”; McDuff, ¶¶ [0114], [0116], [0121]): receive user input, wherein the user input is provided by a user at an automated assistant interface of a client device ("conversational agent system... includes an audio pipeline" where "The audio pipeline begins with audio input representing speech 104 of the user 102 that is produced by a microphone 110, 308 in response to sound waves contacting a sensing element on the microphone 110, 308."; McDuff, ¶¶ [0057]-[0058]), and wherein the automated assistant interface is an interface for interacting with an automated assistant executing on the client device and/or one or more remote computing devices ("In conversational agent system 300, the user 102 interacts with a local computing device 304"; McDuff, ¶¶ [0047]); process the user input to determine at least one intent associated with the user input ("The text sentiment recognizer 404 recognizes sentiments in the content of an input by the user 102" and "An intent recognition module 718 recognizes intents in the conversational input such as speech identified by the speech recognition module 712," where both sentiment and intent are the at least one intent associated with the user input.; McDuff, ¶¶ [0065], [0128])… generate a familiarity measure for the at least one intent ("the conversational agent can evaluate a user’s visual and verbal behavior in view of a larger conversational context {generating a familiarity measure for the...}" where "the sentiment {...at least one intent} as identified by the text sentiment recognizer 404 may be a part of the conversational context"; McDuff, ¶¶ [0018], [0065]), wherein the instructions to generate the familiarity measure for the at least one intent cause the one or more processors to: process… a plurality of parameters to generate the familiarity measure, ("The conversational context can include the audio, text, and/or video inputs as well as other factors sensed or available to the conversational agent system."; McDuff, ¶¶ [0020], [0065]) wherein the plurality of parameters processed using the machine learning model to generate the familiarity measure include one or more intent specific parameters that are based on historical interactions, of the user with the automated assistant (where "The conversational context can include the audio, text, and/or video inputs as well as other factors sensed or available to the conversational agent system... includ[ing] usage behavior of the user associated with the system" where usage behavior can include "total usage time, usage frequency, time of day of usage, identity of applications launched, powered on time, standby time," which are intent specific parameters that are derived from "communication history {thus, based on historical interaction}"; McDuff, ¶¶ [0020]), for the at least one intent specified by the user input (The system detects "sentiments in the content of an input by the user 102" and "intents in the conversational input such as speech identified by the speech recognition module 712," thus each of the sentiment and intent {the at least one intent associated with the user input} are also specified by the user input.; McDuff, ¶¶ [0018], [0065]); determine a response, of the automated assistant to the user input, based on the familiarity measure and based on the at least one intent ("A dialogue generation module 720 captures input from the linguistic style detection module 714 and the intent recognition module 718 to generate for dialogue that will be produced by the conversational agent" and "the speech synthesizer 722 may generate the response dialogue {determining a response of the automated assistant…} based the conversational context {...to the user input}," where the conversational context includes the sentiment; McDuff, ¶¶ [0130], [0065], [0135]); and cause the client device to render the determined response. ("the speech synthesizer… [is] used to cause a computing device {and causing the client device...} to generate the sounds of synthetic speech {...to render the determined response}."; McDuff, ¶¶ [0135]). However, McDuff fails to expressly recite wherein the at least one intent corresponds to controlling one or more connected smart devices, [and] processing, using a machine learning model, a plurality of parameters to generate the familiarity measure.
The relevance of Zhang is described above with relation to claim 1.  Regarding claim 17, Zhang teaches process, using a machine learning model, a plurality of parameters to generate the familiarity measure, (Discloses “semi-supervised [machine learning] approaches... for learning from past and present conversations” to derive “different types of dialog models” which are “adaptive to the dynamic conversation contexts,” thus disclosing determining “past and present” conversation context {a familiarity measure} by processing using a machine learning model, and where the past and present conversation includes the plurality of parameters; Zhang, ¶¶ [0045]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the linguistic style matching agent of McDuff to incorporate the teachings of Zhang to include processing, using a machine learning model, a plurality of parameters to generate the familiarity measure. The systems and methods described in Zhang overcome the requirement for “hand written rules and manually labelled training data for the systems to learn the communication rules,” thus “provid[ing] an improved solution for the development and application of a virtual agent,” as recognized by Zhang. (Zhang, ¶¶ [0004]-[0005]). However, McDuff and Zhang fail to expressly recite wherein the at least one intent corresponds to controlling one or more connected smart devices.
The relevance of Li is described above with relation to claim 1. Regarding claim 17, Li teaches wherein the at least one intent corresponds to controlling one or more connected smart devices (“the home assistant system 300… [identifies] a user's intent expressed in a natural language input received from the user… and executing the task flow to fulfill the deduced intent” where “the memory includes a home control module 360 that utilizes the APIs of the home control services to control different home appliances that are registered with the digital assistant system”; Li, ¶ [0055]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the linguistic style matching agent of McDuff, as modified by the context-sensitive virtual agents of Zhang, to incorporate the teachings of Li to include wherein the at least one intent corresponds to controlling one or more connected smart devices. Controlling smart devices in light of the user intent can make device interactions less “time-consuming and cumbersome,” which will “improve the way that multiple devices are controlled by the digital assistant using voice-based commands,” as recognized by Li. (Li, ¶¶ [0004]-[0006]).

Regarding claim 18, McDuff discloses A computer-readable storage medium comprising instructions that, when executed by one or more processors, cause the one or more processors to (the systems and methods described with reference to the "conversational agent" which includes “multiple modules that may be implemented as instructions stored in the memory 704 for execution by processor(s) 702”; McDuff, ¶¶ [0019], [0055], [0121]): receive user input, wherein the user input is provided by a user at an automated assistant interface of a client device ("conversational agent system... includes an audio pipeline" where "The audio pipeline begins with audio input representing speech 104 of the user 102 that is produced by a microphone 110, 308 in response to sound waves contacting a sensing element on the microphone 110, 308."; McDuff, ¶¶ [0057]-[0058]), and wherein the automated assistant interface is an interface for interacting with an automated assistant executing on the client device and/or one or more remote computing devices ("In conversational agent system 300, the user 102 interacts with a local computing device 304"; McDuff, ¶¶ [0047]); process the user input to determine at least one intent associated with the user input ("The text sentiment recognizer 404 recognizes sentiments in the content of an input by the user 102" and "An intent recognition module 718 recognizes intents in the conversational input such as speech identified by the speech recognition module 712," where both sentiment and intent are the at least one intent associated with the user input.; McDuff, ¶¶ [0065], [0128])… generate a familiarity measure for the at least one intent ("the conversational agent can evaluate a user’s visual and verbal behavior in view of a larger conversational context {generating a familiarity measure for the...}" where "the sentiment {...at least one intent} as identified by the text sentiment recognizer 404 may be a part of the conversational context"; McDuff, ¶¶ [0018], [0065]), wherein the instructions to generate the familiarity measure for the at least one intent cause the one or more processors to: process… a plurality of parameters to generate the familiarity measure, ("The conversational context can include the audio, text, and/or video inputs as well as other factors sensed or available to the conversational agent system."; McDuff, ¶¶ [0020], [0065]) wherein the plurality of parameters processed using the machine learning model to generate the familiarity measure include one or more intent specific parameters that are based on historical interactions, of the user with the automated assistant (where "The conversational context can include the audio, text, and/or video inputs as well as other factors sensed or available to the conversational agent system... includ[ing] usage behavior of the user associated with the system" where usage behavior can include "total usage time, usage frequency, time of day of usage, identity of applications launched, powered on time, standby time," which are intent specific parameters that are derived from "communication history {thus, based on historical interaction}"; McDuff, ¶¶ [0020]), for the at least one intent specified by the user input (The system detects "sentiments in the content of an input by the user 102" and "intents in the conversational input such as speech identified by the speech recognition module 712," thus each of the sentiment and intent {the at least one intent associated with the user input} are also specified by the user input.; McDuff, ¶¶ [0018], [0065]); determine a response, of the automated assistant to the user input, based on the familiarity measure and based on the at least one intent ("A dialogue generation module 720 captures input from the linguistic style detection module 714 and the intent recognition module 718 to generate for dialogue that will be produced by the conversational agent" and "the speech synthesizer 722 may generate the response dialogue {determining a response of the automated assistant…} based the conversational context {...to the user input}," where the conversational context includes the sentiment; McDuff, ¶¶ [0130], [0065], [0135]); and cause the client device to render the determined response ("the speech synthesizer… [is] used to cause a computing device {and causing the client device...} to generate the sounds of synthetic speech {...to render the determined response}."; McDuff, ¶¶ [0135]). However, McDuff fails to expressly recite wherein the at least one intent corresponds to controlling one or more connected smart devices, [and] processing, using a machine learning model, a plurality of parameters to generate the familiarity measure.
The relevance of Zhang is described above with relation to claim 1.  Regarding claim 18, Zhang teaches process, using a machine learning model, a plurality of parameters to generate the familiarity measure, (Discloses “semi-supervised [machine learning] approaches... for learning from past and present conversations” to derive “different types of dialog models” which are “adaptive to the dynamic conversation contexts,” thus disclosing determining “past and present” conversation context {a familiarity measure} by processing using a machine learning model, and where the past and present conversation includes the plurality of parameters; Zhang, ¶¶ [0045]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the linguistic style matching agent of McDuff to incorporate the teachings of Zhang to include processing, using a machine learning model, a plurality of parameters to generate the familiarity measure. The systems and methods described in Zhang overcome the requirement for “hand written rules and manually labelled training data for the systems to learn the communication rules,” thus “provid[ing] an improved solution for the development and application of a virtual agent,” as recognized by Zhang. (Zhang, ¶ [0004]-[0005]). However, McDuff and Zhang fail to expressly recite wherein the at least one intent corresponds to controlling one or more connected smart devices.
The relevance of Li is described above with relation to claim 1. Regarding claim 18, Li teaches wherein the at least one intent corresponds to controlling one or more connected smart devices (“the home assistant system 300… [identifies] a user's intent expressed in a natural language input received from the user… and executing the task flow to fulfill the deduced intent” where “the memory includes a home control module 360 that utilizes the APIs of the home control services to control different home appliances that are registered with the digital assistant system”; Li, ¶ [0055]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the linguistic style matching agent of McDuff, as modified by the context-sensitive virtual agents of Zhang, to incorporate the teachings of Li to include wherein the at least one intent corresponds to controlling one or more connected smart devices. Controlling smart devices in light of the user intent can make device interactions less “time-consuming and cumbersome,” which will “improve the way that multiple devices are controlled by the digital assistant using voice-based commands,” as recognized by Li. (Li, ¶ [0004]-[0006]). 

Claims 2 and 4 are rejected under 35 U.S.C. 103 as being unpatentable over McDuff, Zhang, and Li as applied to claims 1 and 3 above, and further in view of Motomura.

Regarding claim 2, the rejection of claim 1 is incorporated. McDuff, Zhang, and Li disclose all of the elements of the current invention as stated above. However, McDuff fails to expressly recite wherein determining the response based on the familiarity measure and based on the determined at least one intent comprises: determining whether the familiarity measure satisfies a threshold; when the familiarity measure fails to satisfy the threshold: including, in the response: computer generated speech that is responsive to the intent, or text that is converted to computer generated speech when the client device renders the determined response; when the familiarity measure satisfies the threshold: omitting, from the response, any computer generated speech and any text.
Motomura teaches “an information processing device and the like that presents a given phrase to a speaker in response to a voice uttered by the speaker.” (Motomura, ¶ [0001]). Regarding claim 2, Motomura teaches wherein determining the response based on the familiarity measure and based on the determined at least one intent comprises: determining whether the familiarity measure satisfies a threshold (“The output necessity determination section 22 compares the threshold 41b {determining whether… satisfies a threshold} with the relational value Rc,” where the relational value Rc is a degree of intimacy {familiarity measure}; Motomura, ¶¶ [0094]); when the familiarity measure fails to satisfy the threshold: including, in the response: computer generated speech that is responsive to the intent, or text that is converted to computer generated speech (“In a case where the relational value Rc (degree of intimacy) exceeds the threshold 41d (NO in S413), the output necessity determination section 22 determines that a phrase received in the step S409 needs to be outputted (S414),” where exceeding a threshold is not satisfying a threshold and “the phrase received in the step S409” is responsive to the intent.; Motomura, ¶¶ [0094]) when the client device renders the determined response (“...determines that a phrase received in the step S409 needs to be outputted (S414),” thus when the client renders the determined response; Motomura, ¶¶ [0094]); when the familiarity measure satisfies the threshold: omitting, from the response, any computer generated speech and any text (“in a case where the relational value Rc does not exceed the threshold 41d (YES in S413), the output necessity determination section 22 determines that the phrase does not need to be outputted (S415).”; Motomura, ¶¶ [0094]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the linguistic style matching agent of McDuff, as modified by the context-sensitive virtual agents of Zhang, and as modified by the home device control systems of Li, to incorporate the teachings of Motomura to include wherein determining the response based on the familiarity measure and based on the determined at least one intent comprises: determining whether the familiarity measure satisfies a threshold; when the familiarity measure fails to satisfy the threshold: including, in the response: computer generated speech that is responsive to the intent, or text that is converted to computer generated speech when the client device renders the determined response; when the familiarity measure satisfies the threshold: omitting, from the response, any computer generated speech and any text. The “relational value allows a relationship between the interactive robot 100 and a speaker to be objectively quantified” which can be applied to differentiate between speakers and “realize a natural interaction with a speaker even in a case where a plurality of voices are successively inputted,” as recognized by Motomura. (Motomura, ¶ [0089], [0012]).

Regarding claim 4, the rejection of claim 3 is incorporated. McDuff, Zhang, and Li disclose all of the elements of the current invention as stated above. McDuff further discloses wherein modifying the initial response to generate the abridged response comprises: replacing a noun in the text with a pronoun that has less characters than the noun (“the dialogue manager 216 may... Add... personal pronouns...” where the personal pronoun “abbreviates... the utterance”; McDuff, ¶¶ [0039]), and/or performing text summarization to convert the text to a shortened version of the text (“and abbreviate... the utterance to better match the conversational style of the user 102.”; McDuff, ¶¶ [0039]). However, McDuff, Zhang, and Li fail to expressly recite removing, from the initial response, any computer generated speech or any text that is converted to computer generated speech when the client device renders the determined response.
The relevance of Motomura is described above with relation to claim 2. Regarding claim 4, Motomura further teaches removing, from the initial response, any computer generated speech or any text that is converted to computer generated speech when the client device renders the determined response (“in a case where the relational value Rc does not exceed the threshold 41d (YES in S413), the output necessity determination section 22 determines that the phrase does not need to be outputted (S415).”; Motomura, ¶¶ [0094]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the linguistic style matching agent of McDuff, as modified by the context-sensitive virtual agents of Zhang, and as modified by the home device control systems of Li, to incorporate the teachings of Motomura to include removing, from the initial response, any computer generated speech or any text that is converted to computer generated speech when the client device renders the determined response. The “relational value allows a relationship between the interactive robot 100 and a speaker to be objectively quantified” which can be applied to differentiate between speakers and “realize a natural interaction with a speaker even in a case where a plurality of voices are successively inputted,” as recognized by Motomura. (Motomura, ¶ [0089], [0012]).

Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over McDuff, Zhang, and Li as applied to claim 1 above, and further in view of Herold.

Regarding claim 14, the rejection of claim 1 is incorporated. McDuff, Zhang, and Li disclose all of the elements of the current invention as stated above. McDuff further discloses wherein the at least one intent includes one or more intents that are referenced by the user input ("The custom intent recognizer 214 recognizes intents in the speech identified by the speech recognizer 206," where "An intent may be the “goal” of the user 102 such as booking a flight or finding out when a package will be delivered," thus including intents that are referenced by the user.; McDuff, ¶¶ [0035]). However, McDuff fails to expressly recite as well as one or more additional intents that are defined, in a stored taxonomy of intents, as related to the intent.
Herold teaches systems and methods to “intelligently determine the content and/or timing of messages communicated to users and/or the performance of actions.” (Herold, ¶ [0018]). Regarding claim 14, Herold teaches as well as one or more additional intents that are defined, in a stored taxonomy of intents, as related to the intent ("the method may analyze both utterances as a single or combined utterance, and/or may use one or more elements from a prior utterance to generate one or more slots in an intent template for a current utterance."; Herold, ¶¶ [0087]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the linguistic style matching agent of McDuff, as modified by the context-sensitive virtual agents of Zhang, and as modified by the home device control systems of Li, to incorporate the teachings of Herold to include as well as one or more additional intents that are defined, in a stored taxonomy of intents, as related to the intent. Control of timing based on detected intent can provide “numerous benefits to the user” and allow the user to “interact with a smart assistant device in a way that feels natural and conversational,” as recognized by Herold. (Herold, ¶ [0018]).

Claim 15 is rejected under 35 U.S.C. 103 as being unpatentable over McDuff, Zhang, and Li as applied to claim 1 above, and further in view of Kennewick.

Regarding claim 15, the rejection of claim 1 is incorporated. McDuff, Zhang, and Li disclose all of the elements of the current invention as stated above. McDuff further discloses wherein the at least one intent includes a given intent, specified in the user input, to control at least one smart device ("The custom intent recognizer 214 recognizes intents in the speech identified by the speech recognizer 206," where "An intent may be the “goal” of the user 102 such as booking a flight or finding out when a package will be delivered," thus including a goal {a given intent} that is referenced by the user {specified in the user input}, where booking a flight or finding out when a package will be delivered through voice commands to the conversational agent is controlling at least one smart device.; McDuff, ¶¶ [0035]), and where determining the response based on the familiarity measure and based on the at least one intent comprises: responsive to determining the familiarity measure satisfies a threshold ("The conversational context can include the audio, text, and/or video inputs," which is the user content, based on what is "available to the conversational agent system," where level of availability can be viewed as a threshold, and "In various implementations, the machine assistant 200 includes a privacy subsystem 215 that includes one or more privacy setting filters associated with user information, such as user information included in the user interaction input(s) {the familiarity measure}"; McDuff, ¶¶ [0020], [0057]): [altering]... any prompt requesting a value for the control of the at least one smart device ("the dialogue manager 216 may... Add... personal pronouns... and abbreviate... the utterance to better match the conversational style of the user 102."; McDuff, ¶¶ [0039]).
However, McDuff fails to expressly recite omitting, from the response, any prompt requesting a value for the control of the at least one smart device; and wherein the method further comprises: automatically generating the value for the control of the at least one smart device based on one or more prior user inputs, of the user, to prior responses that were responsive to the given intent and that included the prompt; and transmitting one or more commands that are based on the automatically generated value, wherein transmitting the one or more commands causes control of the at least one smart device based on the automatically generated value.
Kennewick teaches “systems and methods of providing improved natural language processing… by providing intent predictions for an utterance.” (Kennewick, ¶ [0002]). Regarding claim 15, Kennewick teaches omitting, from the response, any prompt requesting a value for the control of the at least one smart device ("system 100 may predict a subsequent utterance of a user based on a prior utterance of the user" where "multiple predictions regarding what a user intended when speaking a natural language utterance may be performed, and content (or results) related to the predicted intents may be obtained" without confirmation by the user; Kennewick, ¶¶ [0028], [0073]-[0074]); and wherein the method further comprises: automatically generating the value for the control of the at least one smart device ("results... are obtained based on predicted intents (or requests generated from these or other predictions) {automatically generating the value for the control of the at least one smart device}"; Kennewick, ¶¶ [0075]) based on one or more prior user inputs, of the user, to prior responses that were responsive to the given intent and that included the prompt ("results that are obtained based on predicted intents (or requests generated from these or other predictions) [are] associated with a natural language utterance of a user {based on one or more prior user inputs, of the user, to prior responses}" where the "one or more first results related to the utterance (or the first request) may be obtained based on the first request" as part of a chain {first, second, etc.} of requests and utterances.; Kennewick, ¶¶ [0074]-[0075]); and transmitting one or more commands that are based on the automatically generated value, wherein transmitting the one or more commands causes control of the at least one smart device based on the automatically generated value (Results obtained based on the predicted intent {…based on the automatically generated value} "… may be provided {transmitting one or more commands…} for presentation to the user, {causes control of the at least one smart device}"; Kennewick, ¶¶ [0074]-[0075]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the linguistic style matching agent of McDuff, as modified by the context-sensitive virtual agents of Zhang, and as modified by the home device control systems of Li, to incorporate the teachings of Kennewick to include omitting, from the response, any prompt requesting a value for the control of the at least one smart device; and wherein the method further comprises: automatically generating the value for the control of the at least one smart device based on one or more prior user inputs, of the user, to prior responses that were responsive to the given intent and that included the prompt; and transmitting one or more commands that are based on the automatically generated value, wherein transmitting the one or more commands causes control of the at least one smart device based on the automatically generated value. Proactively predicting intent for a user utterance can allow for “predicting “full” utterances based on partial utterances and/or previous full utterances” thus avoiding “unnecessary delay before a response to the utterance can be provided to a user,” as recognized by Kennewick. (Kennewick, ¶ [0004], [0003]).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Park et al. (U.S. Pat. App. Pub. No. 2020/0258514) discloses systems and methods for a dialogue system and processing method capable of recognizing user intent based on a plurality of input types.

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 



Any inquiry concerning this communication or earlier communications from the examiner should be directed to Sean E. Serraguard whose telephone number is (313)446-6627. The examiner can normally be reached 07:00-17:00 M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel C. Washburn, can be reached at (571) 272-5551. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/Sean E Serraguard/Patent Examiner, Art Unit 2657      

/LAMONT M SPOONER/Primary Examiner, Art Unit 2657                                                                                                                                                                                                        
10/21/2022