DETAILED ACTION
Introduction
This office action is in response to Applicant’s submission filed on March 3rd, 2021. Claims 1-18 are pending in the application. As such, claims 1-18 have been examined.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Acknowledgment is made of applicant’s claim for foreign priority under 35 U.S.C. 119 (a)-(d). 
Receipt is acknowledged of certified copies of papers required by 37 CFR 1.55.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on March 3rd, 2021 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.
Specification
The disclosure is objected to because of the following informalities:
Paragraph 32, lines 1-3 read: “…the chatbot device may be configured to receive the multilingual audio signal… and covert the multilingual audio signal into the text component.”
It appears to the examiner that “covert” should instead read “convert”.
Appropriate correction is required.
The lengthy specification has not been checked to the extent necessary to determine the presence of all possible minor errors. Applicant’s cooperation is requested in correcting any errors of which applicant may become aware in the specification.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 7-12 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Des Jardins et al. (U.S. Patent Application Publication 2018/0018959 A1, hereinafter “Des Jardins”).
In regards to claim 7, Des Jardins teaches:
A chatbot (Abstract: the system processes and interprets audible commands) device, comprising: 
a natural language processor (NLP) (Paragraph 26: the example method may be performed by any computing device, for example the computing device 200; the computing device 200 is construed as a natural language processor, because the method further described by Des Jardins as being performed by this computing device 200 is a natural language process (see e.g. Fig. 3a, which includes natural language tasks such as processing and analyzing command data 309 and identifying/extracting key phrases 311)) configured to: 
generate a multilingual audio signal based on utterance by a user to initiate an operation, wherein the utterance is associated with a plurality of languages (Paragraph 26, lines 12-22: a user’s audio or voice command (i.e. utterance to initiate an operation) is converted into an electronic voice signal (i.e. audio signal); Paragraph 26, lines 1-3: the spoken command has multiple languages comingled within (that is, the utterance is associated with a plurality of languages));  
convert, for each of a plurality of language transcripts that corresponds to the plurality of languages, the multilingual audio signal into a text component (Paragraph 39: at least two (i.e. a plurality) language databases and corresponding acoustic models (i.e. language transcripts corresponding to the plurality of languages) are used to process a command (i.e. convert the multilingual audio signal into a text component); see also Paragraphs 57 and 58: the NLP 420 (note that, despite the fact that Des Jardins only refers to this particular module as a natural language processor, the system 200 as a whole, including e.g. the speech recognition module may be construed as a natural language processor) may utilize an acoustic model for native-English and native-Spanish speakers to generate a transcript (i.e. text component) in English and Spanish of the user’s command (i.e. multilingual audio signal)); 
generate, for the text component of each of the plurality of language transcripts, a plurality of tokens (Paragraph 28, lines 4-7: the system generates a resulting acoustic transcript and parses the transcript to identify a plurality of phrases (i.e. generates a plurality of tokens) comprising the user’s spoken command (i.e. for the text component of each of the plurality of language transcripts); also Paragraph 37: the speech module determines the particular words/phrases (i.e. tokens) being spoken by the user); 
validate the plurality of tokens that corresponds to each of the plurality of language transcripts by use of a language transcript dictionary associated with a respective language transcript, wherein the plurality of tokens is validated to obtain a set of validated tokens (Paragraphs 57 and 58: the NLP 420 (note that, despite the fact that Des Jardins only refers to this particular module as a natural language processor, the system 200 as a whole, including e.g. the speech recognition module may be construed as a natural language processor) may utilize an acoustic model for native-English and native-Spanish speakers to generate a transcript in English and Spanish of the user’s command; in addition, the NLP may utilize an English language database and Spanish language database (i.e. language transcript dictionary associated with a respective language transcript) to analyze the spoken command data and identify the various words or phrases (i.e. tokens) spoken by the user; by performing this second task in addition to the first, the second task may be considered a validation of the first task, as they both aim for the same result); 
determine at least entity (Paragraph 65: the system may use an entity database to interpret acoustic transcripts and/or audible commands representing spoken user commands; Des Jardins provides “harry potter” and “spiderman” as two example entities), keyword (Paragraph 50, lines 14-18: the NLP may attempt to recognize and parse keywords from each received transcription), and action (Paragraph 50, lines 18-20: the detected keywords may comprise e.g. action entities or other types of action classifiers) features based on at least the set of validated tokens; and 
detect one or more intents based on at least the determined entity, keyword, and action features (Paragraph 94: the system determines a match phrase (i.e. intent) that best represents the operational command desired by the user, based on e.g. action classifiers as well as words/phrases from one or more different acoustic models and/or acoustic transcripts (i.e. determined entity, keyword, and action features); Paragraph 95: multiple match phrases (i.e. one or more intents) may be determined and subsequently considered by the system), wherein the operation is automatically executed based on an intent from the one or more intents (Paragraph 97, lines 17-25: the match phrase (i.e. intent) may be added to a response array; the system may execute one or more responses in the array; see also Paragraph 83, lines 19-26: the system may automatically execute actions based on the user’s intent).
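For illustration only, the sequence of limitations recited in claim 7 (tokenizing each language transcript, validating the tokens against a per-language dictionary, determining entity, keyword, and action features, and detecting intents) can be sketched as follows. This is a minimal sketch under stated assumptions: the dictionaries, feature table, stored intents, sample transcripts, and function names are all hypothetical and appear neither in the application nor in Des Jardins.

```python
from typing import Dict, List, Set

# Hypothetical per-language dictionaries (illustrative data only).
LANGUAGE_DICTIONARIES: Dict[str, Set[str]] = {
    "en": {"play", "comedy", "movie"},
    "es": {"ver", "comedia", "pelicula"},
}

# Hypothetical table assigning a feature type to each known token.
FEATURE_TYPE: Dict[str, str] = {
    "movie": "entity",
    "pelicula": "entity",
    "comedy": "keyword",
    "comedia": "keyword",
    "play": "action",
    "ver": "action",
}

# Hypothetical stored intents: the tokens each intent accepts per feature type.
STORED_INTENTS: Dict[str, Dict[str, Set[str]]] = {
    "play_content": {"action": {"play", "ver"}, "entity": {"movie", "pelicula"}},
    "change_channel": {"action": {"switch"}, "entity": {"channel"}},
}


def tokenize(text: str) -> List[str]:
    """Split one transcript's text component into lowercase tokens."""
    return text.lower().split()


def validate_tokens(transcripts: Dict[str, str]) -> List[str]:
    """Keep each token found in the dictionary of its own transcript's language."""
    validated: List[str] = []
    for language, text in transcripts.items():
        for token in tokenize(text):
            if token in LANGUAGE_DICTIONARIES[language] and token not in validated:
                validated.append(token)
    return validated


def determine_features(tokens: List[str]) -> Dict[str, List[str]]:
    """Group validated tokens into entity, keyword, and action features."""
    features: Dict[str, List[str]] = {"entity": [], "keyword": [], "action": []}
    for token in tokens:
        kind = FEATURE_TYPE.get(token)
        if kind is not None:
            features[kind].append(token)
    return features


def detect_intents(features: Dict[str, List[str]]) -> List[str]:
    """Return every stored intent whose required feature types are all matched."""
    detected: List[str] = []
    for name, required in STORED_INTENTS.items():
        if all(accepted & set(features[kind]) for kind, accepted in required.items()):
            detected.append(name)
    return detected


# One mixed-language utterance, transcribed once per language (assumed data).
transcripts = {"en": "bear comedy movie", "es": "ver comedi muvi"}
validated = validate_tokens(transcripts)
features = determine_features(validated)
intents = detect_intents(features)
```

In this sketch each transcript contributes only the tokens its own dictionary recognizes, so the English transcript supplies "comedy" and "movie" while the Spanish transcript supplies "ver".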
In regards to claim 8, Des Jardins further teaches:
The system of claim 7, wherein the NLP is further configured to generate a set of valid multilingual sentences based on the set of validated tokens (Paragraph 57: the NLP may compare the command data to one or more voice templates to identify the various words or phrases (i.e. validated tokens) being spoken by the user. The NLP may then determine that the transcript (i.e. set of valid multilingual sentences) comprises the determined phrases/words).
In regards to claim 9, Des Jardins further teaches:
The system of claim 8, wherein the NLP is further configured to determine the entity feature based on the set of valid multilingual sentences (Paragraph 65: the system may use an entity database to interpret acoustic transcripts and/or audible commands representing spoken user commands; Des Jardins provides “harry potter” and “spiderman” as two example entities).
In regards to claim 10, Des Jardins further teaches:
The system of claim 7, wherein the NLP is further configured to determine the keyword and action features based on the set of validated tokens by using a filtration database that includes at least a set of validated entity, keyword, and action features for each stored intent (Paragraph 50: the NLP may use a variety of heuristic rules and/or other information to select top entities and/or phrases (i.e. entity, keyword, and action features); also Paragraph 74: Des Jardins provides an example of determining an action feature (“action = Channel Name”) based on a heuristic rules database (i.e. filtration database) based on detecting the name of a programming network (i.e. a validated entity feature included in the database); Paragraph 75: Des Jardins notes how the identification of the one or more phrases/words may provide the system guidance as to how the phrases/words should be interpreted; Paragraph 77: the system may determine that the word/phrase “Ver” may best correspond to an intended “action entity” (i.e. the system may determine a keyword/action feature) based on the system’s previous interpretation of the content entity phrase “Brad Pitt” (i.e. a validated entity included in the filtration database), and in view of the resulting phrase fitting more appropriately within the context of an intended operation command (i.e. stored intent); see also Paragraph 65: the NLP may utilize information from e.g. an entity database (i.e. filtration database) to analyze and parse the phrases comprising the command data).
In regards to claim 11, Des Jardins further teaches:
The system of claim 7, wherein the NLP is further configured to determine an intent score for each intent based on at least the determined entity, keyword, and action features (Paragraph 96: the system may determine a response score (i.e. intent score) for one or more of the match phrases (i.e. based on at least the determined entity, keyword, and action features); see also Paragraph 94: the system determines a match phrase (i.e. intent) that best represents the operational command desired by the user, based on e.g. action classifiers as well as words/phrases from one or more different acoustic models and/or acoustic transcripts (i.e. determined entity, keyword, and action features)).
In regards to claim 12, Des Jardins further teaches:
The system of claim 11, wherein the NLP is further configured to select the intent from the one or more intents based on the intent score of each of the one or more intents, and wherein the intent score of the selected intent is greater than the intent score of each of remaining intents of the one or more intents (Paragraph 102, lines 13-15: the system may determine (i.e. select) an action response for a match phrase in the response array (i.e. intent from the one or more intents) having the highest response score (i.e. the intent score is greater than the intent score of each of the remaining intents)).
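For illustration only, the intent scoring of claim 11 and the highest-score selection of claim 12 can be sketched as follows. This is a minimal sketch under stated assumptions: the scoring rule (a count of determined features that a stored intent accepts), the stored intents, and the function names are hypothetical and are not taken from the application or from Des Jardins.

```python
from typing import Dict, List, Set


def score_intents(
    features: Dict[str, List[str]],
    stored_intents: Dict[str, Set[str]],
) -> Dict[str, int]:
    """Score each intent by how many determined features it accepts (assumed rule)."""
    determined = {token for tokens in features.values() for token in tokens}
    return {name: len(determined & accepted) for name, accepted in stored_intents.items()}


def select_intent(scores: Dict[str, int]) -> str:
    """Return the intent with the highest score.

    Ties are not handled here: max() returns the first maximal key, whereas
    claim 12 recites a score strictly greater than every remaining score.
    """
    if not scores:
        raise ValueError("no intents to select from")
    return max(scores, key=scores.get)


# Hypothetical determined features and stored intents (illustrative data only).
example_features = {"entity": ["movie"], "keyword": ["comedy"], "action": ["ver"]}
example_intents = {
    "play_content": {"ver", "play", "movie"},
    "search_content": {"find", "movie"},
}
example_scores = score_intents(example_features, example_intents)
selected = select_intent(example_scores)
```

Here "play_content" matches two determined features and "search_content" matches one, so the former is selected.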
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1-6 and 13-18 are rejected under 35 U.S.C. 103 as being unpatentable over Des Jardins in view of Sukumar (U.S. Patent Application Publication 2021/0358496 A1).
In regards to claim 1, Des Jardins teaches:
A method, comprising: 
generating, by a natural language processor (NLP) (Paragraph 26: the example method may be performed by any computing device, for example the computing device 200; the computing device 200 is construed as a natural language processor, because the method further described by Des Jardins as being performed by this computing device 200 is a natural language process (see e.g. Fig. 3a, which includes natural language tasks such as processing and analyzing command data 309 and identifying/extracting key phrases 311)), a multilingual audio signal based on utterance by a user to initiate an operation, wherein the utterance is associated with a plurality of languages (Paragraph 26, lines 12-22: a user’s audio or voice command (i.e. utterance to initiate an operation) is converted into an electronic voice signal (i.e. audio signal); Paragraph 26, lines 1-3: the spoken command has multiple languages comingled within (that is, the utterance is associated with a plurality of languages)); 
converting, by the NLP, for each of a plurality of language transcripts corresponding to the plurality of languages, the multilingual audio signal into a text component (Paragraph 39: at least two (i.e. a plurality) language databases and corresponding acoustic models (i.e. language transcripts corresponding to the plurality of languages) are used to process a command (i.e. convert the multilingual audio signal into a text component); see also Paragraphs 57 and 58: the NLP 420 (note that, despite the fact that Des Jardins only refers to this particular module as a natural language processor, the system 200 as a whole, including e.g. the speech recognition module may be construed as a natural language processor) may utilize an acoustic model for native-English and native-Spanish speakers to generate a transcript (i.e. text component) in English and Spanish of the user’s command (i.e. multilingual audio signal)); 
generating, by the NLP, for the text component of each of the plurality of language transcripts, a plurality of tokens (Paragraph 28, lines 4-7: the system generates a resulting acoustic transcript and parses the transcript to identify a plurality of phrases (i.e. generates a plurality of tokens) comprising the user’s spoken command (i.e. for the text component of each of the plurality of language transcripts); also Paragraph 37: the speech module determines the particular words/phrases (i.e. tokens) being spoken by the user); 
validating, by the NLP, the plurality of tokens corresponding to each of the plurality of language transcripts using a language transcript dictionary associated with a respective language transcript, wherein the plurality of tokens is validated to obtain a set of validated tokens (Paragraphs 57 and 58: the NLP 420 (note that, despite the fact that Des Jardins only refers to this particular module as a natural language processor, the system 200 as a whole, including e.g. the speech recognition module may be construed as a natural language processor) may utilize an acoustic model for native-English and native-Spanish speakers to generate a transcript in English and Spanish of the user’s command; in addition, the NLP may utilize an English language database and Spanish language database (i.e. language transcript dictionary associated with a respective language transcript) to analyze the spoken command data and identify the various words or phrases (i.e. tokens) spoken by the user; by performing this second task in addition to the first, the second task may be considered a validation of the first task, as they both aim for the same result); 
determining, by the NLP, at least entity (Paragraph 65: the system may use an entity database to interpret acoustic transcripts and/or audible commands representing spoken user commands; Des Jardins provides “harry potter” and “spiderman” as two example entities), keyword (Paragraph 50, lines 14-18: the NLP may attempt to recognize and parse keywords from each received transcription), and action (Paragraph 50, lines 18-20: the detected keywords may comprise e.g. action entities or other types of action classifiers) features based on at least the set of validated tokens; and 
detecting, by the NLP, one or more intents based on at least the determined entity, keyword, and action features, (Paragraph 94: the system determines a match phrase (i.e. intent) that best represents the operational command desired by the user, based on e.g. action classifiers as well as words/phrases from one or more different acoustic models and/or acoustic transcripts (i.e. determined entity, keyword, and action features); Paragraph 95: multiple match phrases (i.e. one or more intents) may be determined and subsequently considered by the system), wherein the operation is automatically executed based on an intent from the one or more intents (Paragraph 97, lines 17-25: the match phrase (i.e. intent) may be added to a response array; the system may execute one or more responses in the array; see also Paragraph 83, lines 19-26: the system may automatically execute actions based on the user’s intent).
	However, Des Jardins fails to explicitly teach that the user is in a vehicle, initiating an in-vehicle operation.
	In a related art, Sukumar teaches a voice assistant system for a vehicle (Abstract). Notably, Sukumar teaches inputting a voice input, converting the voice input into a text data file, analyzing the text data file to determine a requested action, and executing the requested action (Abstract). Furthermore, Sukumar directs their teachings specifically to a user in a vehicle initiating an in-vehicle operation (Paragraph 52: the vehicle occupant may use natural language commands to control a vehicle cockpit system).
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Des Jardins to incorporate the teachings of Sukumar to perform its functions in a vehicle. Doing so would have been a simple substitution of one known element for another to obtain predictable results.
Sukumar teaches a voice assistant system for a vehicle; however, the voice assistant system taught by Sukumar lacks the multi-lingual functionality taught by the instant application.
Des Jardins teaches a multi-lingual voice assistant system that is capable of performing the functions of the instant application. 
One of ordinary skill in the art could have substituted the voice assistant system in the vehicle taught by Sukumar with the multi-lingual voice assistant system taught by Des Jardins; the resulting combination would have been predictable because the voice assistant system taught by Sukumar was already performing functions similar to those of the system taught by Des Jardins.
Thus, the combination of Des Jardins and Sukumar teaches:
A method, comprising: 
generating, by a natural language processor (NLP) (Des Jardins, Paragraph 26: the example method may be performed by any computing device, for example the computing device 200; the computing device 200 is construed as a natural language processor, because the method further described by Des Jardins as being performed by this computing device 200 is a natural language process (see e.g. Des Jardins, Fig. 3a, which includes natural language tasks such as processing and analyzing command data 309 and identifying/extracting key phrases 311)), a multilingual audio signal based on utterance by a user in a vehicle to initiate an in-vehicle operation (Sukumar, Paragraph 52: a vehicle occupant may use natural language commands to control a vehicle cockpit system), wherein the utterance is associated with a plurality of languages (Des Jardins, Paragraph 26, lines 12-22: a user’s audio or voice command (i.e. utterance to initiate an operation) is converted into an electronic voice signal (i.e. audio signal); Des Jardins, Paragraph 26, lines 1-3: the spoken command has multiple languages comingled within (that is, the utterance is associated with a plurality of languages)); 
converting, by the NLP, for each of a plurality of language transcripts corresponding to the plurality of languages, the multilingual audio signal into a text component (Des Jardins, Paragraph 39: at least two (i.e. a plurality) language databases and corresponding acoustic models (i.e. language transcripts corresponding to the plurality of languages) are used to process a command (i.e. convert the multilingual audio signal into a text component); see also Des Jardins, Paragraphs 57 and 58: the NLP 420 (note that, despite the fact that Des Jardins only refers to this particular module as a natural language processor, the system 200 as a whole, including e.g. the speech recognition module may be construed as a natural language processor) may utilize an acoustic model for native-English and native-Spanish speakers to generate a transcript (i.e. text component) in English and Spanish of the user’s command (i.e. multilingual audio signal)); 
generating, by the NLP, for the text component of each of the plurality of language transcripts, a plurality of tokens (Des Jardins, Paragraph 28, lines 4-7: the system generates a resulting acoustic transcript and parses the transcript to identify a plurality of phrases (i.e. generates a plurality of tokens) comprising the user’s spoken command (i.e. for the text component of each of the plurality of language transcripts); also Des Jardins, Paragraph 37: the speech module determines the particular words/phrases (i.e. tokens) being spoken by the user); 
validating, by the NLP, the plurality of tokens corresponding to each of the plurality of language transcripts using a language transcript dictionary associated with a respective language transcript, wherein the plurality of tokens is validated to obtain a set of validated tokens (Des Jardins, Paragraphs 57 and 58: the NLP 420 (note that, despite the fact that Des Jardins only refers to this particular module as a natural language processor, the system 200 as a whole, including e.g. the speech recognition module may be construed as a natural language processor) may utilize an acoustic model for native-English and native-Spanish speakers to generate a transcript in English and Spanish of the user’s command; in addition, the NLP may utilize an English language database and Spanish language database (i.e. language transcript dictionary associated with a respective language transcript) to analyze the spoken command data and identify the various words or phrases (i.e. tokens) spoken by the user; by performing this second task in addition to the first, the second task may be considered a validation of the first task, as they both aim for the same result); 
determining, by the NLP, at least entity (Des Jardins, Paragraph 65: the system may use an entity database to interpret acoustic transcripts and/or audible commands representing spoken user commands; Des Jardins provides “harry potter” and “spiderman” as two example entities), keyword (Des Jardins, Paragraph 50, lines 14-18: the NLP may attempt to recognize and parse keywords from each received transcription), and action (Des Jardins, Paragraph 50, lines 18-20: the detected keywords may comprise e.g. action entities or other types of action classifiers) features based on at least the set of validated tokens; and 
detecting, by the NLP, one or more intents based on at least the determined entity, keyword, and action features, (Des Jardins, Paragraph 94: the system determines a match phrase (i.e. intent) that best represents the operational command desired by the user, based on e.g. action classifiers as well as words/phrases from one or more different acoustic models and/or acoustic transcripts (i.e. determined entity, keyword, and action features); Des Jardins, Paragraph 95: multiple match phrases (i.e. one or more intents) may be determined and subsequently considered by the system), wherein the in-vehicle operation is automatically executed based on an intent from the one or more intents (Des Jardins, Paragraph 97, lines 17-25: the match phrase (i.e. intent) may be added to a response array; the system may execute one or more responses in the array; see also Des Jardins, Paragraph 83, lines 19-26: the system may automatically execute actions based on the user’s intent).
In regards to claim 2, Des Jardins further teaches:
The method of claim 1, further comprising generating, by the NLP, a set of valid multilingual sentences based on the set of validated tokens (Paragraph 57: the NLP may compare the command data to one or more voice templates to identify the various words or phrases (i.e. validated tokens) being spoken by the user. The NLP may then determine that the transcript (i.e. set of valid multilingual sentences) comprises the determined phrases/words).
In regards to claim 3, Des Jardins further teaches:
The method of claim 2, wherein the entity feature is further determined based on the set of valid multilingual sentences (Paragraph 65: the system may use an entity database to interpret acoustic transcripts and/or audible commands representing spoken user commands; Des Jardins provides “harry potter” and “spiderman” as two example entities).
In regards to claim 4, Des Jardins further teaches:
The method of claim 1, wherein the keyword and action features are further determined based on the set of validated tokens by using a filtration database including at least a set of validated entity, keyword, and action features for each stored intent (Paragraph 50: the NLP may use a variety of heuristic rules and/or other information to select top entities and/or phrases (i.e. entity, keyword, and action features); also Paragraph 74: Des Jardins provides an example of determining an action feature (“action = Channel Name”) based on a heuristic rules database (i.e. filtration database) based on detecting the name of a programming network (i.e. a validated entity feature included in the database); Paragraph 75: Des Jardins notes how the identification of the one or more phrases/words may provide the system guidance as to how the phrases/words should be interpreted; Paragraph 77: the system may determine that the word/phrase “Ver” may best correspond to an intended “action entity” (i.e. the system may determine a keyword/action feature) based on the system’s previous interpretation of the content entity phrase “Brad Pitt” (i.e. a validated entity included in the filtration database), and in view of the resulting phrase fitting more appropriately within the context of an intended operation command (i.e. stored intent); see also Paragraph 65: the NLP may utilize information from e.g. an entity database (i.e. filtration database) to analyze and parse the phrases comprising the command data).
In regards to claim 5, Des Jardins further teaches:
The method of claim 1, further comprising determining, by the NLP, an intent score for each intent based on at least the determined entity, keyword, and action features (Paragraph 96: the system may determine a response score (i.e. intent score) for one or more of the match phrases (i.e. based on at least the determined entity, keyword, and action features); see also Paragraph 94: the system determines a match phrase (i.e. intent) that best represents the operational command desired by the user, based on e.g. action classifiers as well as words/phrases from one or more different acoustic models and/or acoustic transcripts (i.e. determined entity, keyword, and action features)).
In regards to claim 6, Des Jardins further teaches:
The method of claim 5, further comprising selecting, by the NLP, the intent from the one or more intents based on the intent score of each of the one or more intents, wherein the intent score of the selected intent is greater than the intent score of each of remaining intents of the one or more intents (Paragraph 102, lines 13-15: the system may determine (i.e. select) an action response for a match phrase in the response array (i.e. intent from the one or more intents) having the highest response score (i.e. the intent score is greater than the intent score of each of the remaining intents)).
In regards to claim 13, Des Jardins teaches:
A chatbot (Abstract: the system processes and interprets audible commands) device, comprising: 
a natural language processor (NLP) (Paragraph 26: the example method may be performed by any computing device, for example the computing device 200; the computing device 200 is construed as a natural language processor, because the method further described by Des Jardins as being performed by this computing device 200 is a natural language process (see e.g. Fig. 3a, which includes natural language tasks such as processing and analyzing command data 309 and identifying/extracting key phrases 311)) configured to: 
generate a multilingual audio signal based on utterance by a user to initiate an operation, wherein the utterance is associated with a plurality of languages (Paragraph 26, lines 12-22: a user’s audio or voice command (i.e. utterance to initiate an operation) is converted into an electronic voice signal (i.e. audio signal); Paragraph 26, lines 1-3: the spoken command has multiple languages comingled within (that is, the utterance is associated with a plurality of languages));  
convert, for each of a plurality of language transcripts that corresponds to the plurality of languages, the multilingual audio signal into a text component (Paragraph 39: at least two (i.e. a plurality) language databases and corresponding acoustic models (i.e. language transcripts corresponding to the plurality of languages) are used to process a command (i.e. convert the multilingual audio signal into a text component); see also Paragraphs 57 and 58: the NLP 420 (note that, despite the fact that Des Jardins only refers to this particular module as a natural language processor, the system 200 as a whole, including e.g. the speech recognition module may be construed as a natural language processor) may utilize an acoustic model for native-English and native-Spanish speakers to generate a transcript (i.e. text component) in English and Spanish of the user’s command (i.e. multilingual audio signal)); 
generate, for the text component of each of the plurality of language transcripts, a plurality of tokens (Paragraph 28, lines 4-7: the system generates a resulting acoustic transcript and parses the transcript to identify a plurality of phrases (i.e. generates a plurality of tokens) comprising the user’s spoken command (i.e. for the text component of each of the plurality of language transcripts); also Paragraph 37: the speech module determines the particular words/phrases (i.e. tokens) being spoken by the user); 
validate the plurality of tokens that corresponds to each of the plurality of language transcripts by use of a language transcript dictionary associated with a respective language transcript, wherein the plurality of tokens is validated to obtain a set of validated tokens (Paragraphs 57 and 58: the NLP 420 (note that, despite the fact that Des Jardins only refers to this particular module as a natural language processor, the system 200 as a whole, including, e.g., the speech recognition module, may be construed as a natural language processor) may utilize an acoustic model for native-English and native-Spanish speakers to generate a transcript in English and Spanish of the user’s command; in addition, the NLP may utilize an English language database and Spanish language database (i.e. language transcript dictionary associated with a respective language transcript) to analyze the spoken command data and identify the various words or phrases (i.e. tokens) spoken by the user; by performing this second task in addition to the first, the second task may be considered a validation of the first task, as they both aim for the same result); 
determine at least entity (Paragraph 65: the system may use an entity database to interpret acoustic transcripts and/or audible commands representing spoken user commands; Des Jardins provides “harry potter” and “spiderman” as two example entities), keyword (Paragraph 50, lines 14-18: the NLP may attempt to recognize and parse keywords from each received transcription), and action (Paragraph 50, lines 18-20: the detected keywords may comprise e.g. action entities or other types of action classifiers) features based on at least the set of validated tokens; and 
detect one or more intents based on at least the determined entity, keyword, and action features (Paragraph 94: the system determines a match phrase (i.e. intent) that best represents the operational command desired by the user, based on e.g. action classifiers as well as words/phrases from one or more different acoustic models and/or acoustic transcripts (i.e. determined entity, keyword, and action features); Paragraph 95: multiple match phrases (i.e. one or more intents) may be determined and subsequently considered by the system), wherein the operation is automatically executed based on an intent from the one or more intents (Paragraph 97, lines 17-25: the match phrase (i.e. intent) may be added to a response array; the system may execute one or more responses in the array; see also Paragraph 83, lines 19-26: the system may automatically execute actions based on the user’s intent).
	However, Des Jardins fails to explicitly teach that the user is in a vehicle, initiating an in-vehicle operation.
	In a related art, Sukumar teaches a voice assistant system for a vehicle (Abstract). Notably, Sukumar teaches inputting a voice input, converting the voice input into a text data file, analyzing the text data file to determine a requested action, and executing the requested action (Abstract). Furthermore, Sukumar directs their teachings specifically to a user in a vehicle initiating an in-vehicle operation (Paragraph 52: the vehicle occupant may use natural language commands to control a vehicle cockpit system).
	It would have been obvious to one of ordinary skill in the art at the time of filing to modify Des Jardins to incorporate the teachings of Sukumar to perform its functions in a vehicle. Doing so would have been a simple substitution of one known element for another to obtain predictable results.
Sukumar teaches a voice assistant system for a vehicle; however, the voice assistant system taught by Sukumar lacks the multi-lingual functionality taught by the instant application.
Des Jardins teaches a multi-lingual voice assistant system that is capable of performing the functions of the instant application. 
One of ordinary skill in the art could have substituted the voice assistant system in the vehicle taught by Sukumar with the multi-lingual voice assistant system taught by Des Jardins; the resulting combination would have been predictable because the virtual assistant taught by Sukumar was already performing functions similar to those of the one taught by Des Jardins.
Thus, the combination of Des Jardins and Sukumar teaches:
A vehicle chatbot (Sukumar, Paragraph 52) device, comprising: 
a natural language processor (NLP) (Des Jardins, Paragraph 26: the example method may be performed by any computing device, for example the computing device 200; the computing device 200 is construed as a natural language processor, because the method further described by Des Jardins as being performed by this computing device 200 is a natural language process (see e.g. Fig. 3a, which includes natural language tasks such as processing and analyzing command data 309 and identifying/extracting key phrases 311)) configured to: 
generate a multilingual audio signal based on utterance by a user in a vehicle to initiate an in-vehicle operation (Sukumar, Paragraph 52: a vehicle occupant may use natural language commands to control a vehicle cockpit system), wherein the utterance is associated with a plurality of languages (Des Jardins, Paragraph 26, lines 12-22: a user’s audio or voice command (i.e. utterance to initiate an operation) is converted into an electronic voice signal (i.e. audio signal); Des Jardins, Paragraph 26, lines 1-3: the spoken command has multiple languages commingled within (that is, the utterance is associated with a plurality of languages));  
convert, for each of a plurality of language transcripts that corresponds to the plurality of languages, the multilingual audio signal into a text component (Des Jardins, Paragraph 39: at least two (i.e. a plurality) language databases and corresponding acoustic models (i.e. language transcripts corresponding to the plurality of languages) are used to process a command (i.e. convert the multilingual audio signal into a text component); see also Des Jardins, Paragraphs 57 and 58: the NLP 420 (note that, despite the fact that Des Jardins only refers to this particular module as a natural language processor, the system 200 as a whole, including, e.g., the speech recognition module, may be construed as a natural language processor) may utilize an acoustic model for native-English and native-Spanish speakers to generate a transcript (i.e. text component) in English and Spanish of the user’s command (i.e. multilingual audio signal)); 
generate, for the text component of each of the plurality of language transcripts, a plurality of tokens (Des Jardins, Paragraph 28, lines 4-7: the system generates a resulting acoustic transcript and parses the transcript to identify a plurality of phrases (i.e. generates a plurality of tokens) comprising the user’s spoken command (i.e. for the text component of each of the plurality of language transcripts); also Des Jardins, Paragraph 37: the speech module determines the particular words/phrases (i.e. tokens) being spoken by the user); 
validate the plurality of tokens that corresponds to each of the plurality of language transcripts by use of a language transcript dictionary associated with a respective language transcript, wherein the plurality of tokens is validated to obtain a set of validated tokens (Des Jardins, Paragraphs 57 and 58: the NLP 420 (note that, despite the fact that Des Jardins only refers to this particular module as a natural language processor, the system 200 as a whole, including, e.g., the speech recognition module, may be construed as a natural language processor) may utilize an acoustic model for native-English and native-Spanish speakers to generate a transcript in English and Spanish of the user’s command; in addition, the NLP may utilize an English language database and Spanish language database (i.e. language transcript dictionary associated with a respective language transcript) to analyze the spoken command data and identify the various words or phrases (i.e. tokens) spoken by the user; by performing this second task in addition to the first, the second task may be considered a validation of the first task, as they both aim for the same result); 
determine at least entity (Des Jardins, Paragraph 65: the system may use an entity database to interpret acoustic transcripts and/or audible commands representing spoken user commands; Des Jardins provides “harry potter” and “spiderman” as two example entities), keyword (Des Jardins, Paragraph 50, lines 14-18: the NLP may attempt to recognize and parse keywords from each received transcription), and action (Des Jardins, Paragraph 50, lines 18-20: the detected keywords may comprise e.g. action entities or other types of action classifiers) features based on at least the set of validated tokens; and 
detect one or more intents based on at least the determined entity, keyword, and action features (Des Jardins, Paragraph 94: the system determines a match phrase (i.e. intent) that best represents the operational command desired by the user, based on e.g. action classifiers as well as words/phrases from one or more different acoustic models and/or acoustic transcripts (i.e. determined entity, keyword, and action features); Des Jardins, Paragraph 95: multiple match phrases (i.e. one or more intents) may be determined and subsequently considered by the system), wherein the operation is automatically executed based on an intent from the one or more intents (Des Jardins, Paragraph 97, lines 17-25: the match phrase (i.e. intent) may be added to a response array; the system may execute one or more responses in the array; see also Des Jardins, Paragraph 83, lines 19-26: the system may automatically execute actions based on the user’s intent).
In regards to claim 14, claim 14 is a device claim corresponding to the method of claim 2. Thus, it is rejected on similar grounds.
In regards to claim 15, claim 15 is a device claim corresponding to the method of claim 3. Thus, it is rejected on similar grounds.
In regards to claim 16, claim 16 is a device claim corresponding to the method of claim 4. Thus, it is rejected on similar grounds.
In regards to claim 17, claim 17 is a device claim corresponding to the method of claim 5. Thus, it is rejected on similar grounds.
In regards to claim 18, claim 18 is a device claim corresponding to the method of claim 6. Thus, it is rejected on similar grounds.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Bellegarda et al. (U.S. Patent 10,067,938 B2) teaches systems and processes for multilingual word prediction (Abstract) directed particularly towards communications with more than one language interleaved in a particular script (Col. 1, lines 21-25).
Li et al. (U.S. Patent Application Publication 2021/0027784 A1) teaches a speech recognition method (Abstract) for performing speech recognition on multilingual speech (Paragraph 4), notably utilizing a dictionary including words in different languages (Paragraph 98).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ALEXANDER J KIM whose telephone number is (571)272-4442. The examiner can normally be reached M-F 7:30 AM - 5:30 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew Flanders, can be reached at (571) 272-7516. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/ALEXANDER JOONGIE KIM/Examiner, Art Unit 2655     

/ANDREW C FLANDERS/Supervisory Patent Examiner, Art Unit 2655