DETAILED ACTION
Introduction
1.	This office action is in response to Applicant’s submission filed on 9/13/2022.   Claims 1-20 are pending in the application and have been examined.

Notice of Pre-AIA or AIA Status
2.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
3.	The information disclosure statement (IDS) submitted on November 10, 2022 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.

Response to Arguments
4.	Applicant’s arguments with respect to Claims 1, 5, and 14 have been considered but are moot in view of the new ground of rejection relying on Kim.

Claim Rejections - 35 USC § 103
5.	In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

6.	Claims 5-8, 10-17, 19, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over US Pat. App. Pub. No. 20130021459 (Vasilieff et al., hereinafter “Vasilieff”) (Cited in IDS of March 8, 2022) in view of US Pat. App. Pub. No. 20210110821 (Lim et al., hereinafter “Lim”) and US Pat. App. Pub. No. 20160210116 (Kim et al., hereinafter “Kim”).
With regard to Claim 5, Vasilieff describes:
“A computer-implemented method comprising:
receiving input audio data representing an utterance captured by a first device; (Paragraph 45 describes that audio input 1002 is received (Figure 10))
receiving input image data corresponding to the input audio data; (Paragraph 45 describes that an image on which face detection is performed is received.)
determining the utterance is directed from a first user to a second user;” (Paragraph 36 describes that the device determines which audio is directed to the device and which audio is directed elsewhere (which is ignored).)
Vasilieff does not explicitly describe:
“in response to determining the utterance is directed from the first user to the second user, using a first component to determine a system response to the utterance is to be generated;
processing the input audio data to determining output data; and
causing the first device to present the output data.”
However, Lim describes:
“processing the input audio data to determining output data; (Paragraphs 51 and 54 describe converting a text response to a user utterance into speech.) and
causing the first device to present the output data.”  (Paragraph 52 describes outputting a speech response.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the response to the user as described by Lim into the system of Vasilieff to provide responses tailored to a user, as described at paragraph 50 of Lim.
Vasilieff in view of Lim does not explicitly describe “in response to determining the utterance is directed from the first user to the second user, using a first component to determine a system response to the utterance is to be generated.”
However, paragraphs 150 and 151 of Kim describe a device that reviews past inputs by a first and a second user, and then suggests responses (stickers) based on new inputs.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the dialogue history and suggested responses as described by Kim into the system of Vasilieff in view of Lim to provide a response tailored based on previous dialogues, as described at paragraph 146 of Kim.
With regard to Claim 6, Vasilieff does not explicitly describe this subject matter.
However, Lim describes:
determining user profile data corresponding to at least one of the first user and the second user; (Paragraph 39 describes that user data can be obtained based on the user being recognized in an image.) and
processing at least a first portion of the user profile data using the first component to determine the system response to the utterance is to be generated.  (Paragraph 49 describes that the response to the user may depend on the user profile information.  The user age (paragraph 47) is cited as “a first portion.”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the response to the user as described by Lim into the system of Vasilieff to provide responses tailored to a user, as described at paragraph 50 of Lim.
With regard to Claim 7, Vasilieff does not explicitly describe this subject matter.
However, Lim describes “processing at least a second portion of the user profile data to determine the output data.”  (Paragraph 49 describes that the response to the user may depend on the user profile information.  The user gender (paragraph 47) is cited as “a second portion.”)
With regard to Claim 8, Vasilieff does not explicitly describe this subject matter.
However, Lim describes:
“performing speech processing using the input audio data to determine speech processing result data; (Paragraph 47 describes that speech recognition is performed on user speech 30.) and
determining that the speech processing result data corresponds to an actionable command performable by a system, (Paragraph 41 describes that the detected speech may be a wake word (cited as “an actionable command”).)
wherein the first component uses data representing that the speech processing result data corresponds to the actionable command in determining the system response to the utterance is to be generated.”  (Paragraph 41 describes that the user response to the wake word command may depend on the user profile information (age).)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the response to the user as described by Lim into the system of Vasilieff to provide responses tailored to a user, as described at paragraph 50 of Lim.
With respect to Claim 10, Vasilieff in view of Lim does not explicitly describe these features.
However, Kim describes:
determining dialog data representing at least one previous exchange between the first user and the second user; (Paragraph 139 describes determining dialog data representing at least one previous exchange.)
processing the dialog data and the input audio data to determine the utterance refers to an entity represented in the dialog data; (Paragraph 138 describes that an utterance refers to an entity (“That player” is an entity).) and
determining the output data based at least in part on the entity.  (Paragraph 138 describes a recommended sticker (“output data”) based on the entity.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the dialogue history and entity detection as described by Kim into the system of Vasilieff in view of Lim to provide a response tailored based on previous dialogues, as described at paragraph 146 of Kim.
With respect to Claim 11, Vasilieff in view of Lim does not explicitly describe these features.
However, Kim describes:
determining dialog data representing at least one previous exchange between the first user and the second user; (Paragraph 139 describes determining dialog data representing at least one previous exchange.) and
storing updated dialog data representing the output data. (Paragraph 140 describes updating dialog data based on new utterances.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the updated dialogue data as described by Kim into the system of Vasilieff in view of Lim to provide a response tailored based on previous dialogues, as described at paragraph 146 of Kim.
With regard to Claim 12, Vasilieff describes:
“determining the utterance is directed from the first user to the second user comprises:
using the input audio data, the input image data, and a second component to determine second output data; (Paragraph 36 describes that both users are included in the image so that facial feature detection can determine when each is talking.  The facial feature detector is “a second component,” and the determinations of whose lips are moving and where each user is looking are “second output data.”) and
processing the second output data to determine the utterance is directed from the first user to the second user.” (Paragraph 36 describes that both users are included in the image so that facial feature detection can determine when each is talking.  The determinations of whose lips are moving and where each user is looking are “second output data.”)
With regard to Claim 13, Vasilieff describes “the input image data includes a representation of at least one of the first user or the second user.”  (Paragraph 36 describes that both users are included in the image so that facial feature detection can determine when each is talking.)
With respect to Claims 14-17, 19, and 20, system Claims 14-17, 19, and 20 and method Claims 5-8, 10, and 11 are related as system and the method of using same, with each claimed element's function corresponding to the claimed method step. Further, Vasilieff describes system memory 130 (paragraph 30) and processor 120 (paragraph 30).  Accordingly, Claims 14-17, 19, and 20 are similarly rejected under the same rationale as applied above with respect to Claims 5-8, 10, and 11.

7.	Claims 9 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Vasilieff in view of Lim and Kim further in view of US Pat. App. Pub. No. 20180040046 (Gotoh et al., hereinafter “Gotoh”).
With respect to Claim 9, Vasilieff in view of Lim and Kim does not explicitly describe these features.
However, Gotoh describes:
“receiving time data corresponding to the input audio data; (Paragraph 131 describes that a user (second) utterance time is received.)
processing the time data using the first component to determine the system response to the utterance is to be generated; (Paragraph 132 describes that the utterance time is compared to the device (first) utterance time to determine a response to the user (success or failure to sell an additional item).) and
processing the time data to determine timing of presentation of the output data.”  (Paragraphs 104-106 describe that the cash register settlement is output to the user after success or failure is determined.   The cash register settlement can only occur after success or failure is determined, which is based on the time data.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the timing data as described by Gotoh into the system of Vasilieff in view of Lim and Kim to provide correct and timely responses to a user, as described at paragraph 104 of Gotoh.
With respect to Claim 18, system Claim 18 and method Claim 9 are related as system and the method of using same, with each claimed element's function corresponding to the claimed method step. Further, Vasilieff describes system memory 130 (paragraph 30) and processor 120 (paragraph 30).  Accordingly, Claim 18 is similarly rejected under the same rationale as applied above with respect to Claim 9.

8.	Claims 1-4 are rejected under 35 U.S.C. 103 as being unpatentable over Vasilieff in view of Lim, Kim, and US Pat. App. Pub. No. 20210104242 (Hashimoto et al., hereinafter “Hashimoto”).
With respect to Claim 1, Vasilieff describes:
“A computer-implemented method comprising:
receiving second input audio data representing a second utterance spoken by a first user; (Paragraph 45 describes that audio input 1002 is received (Figure 10))
receiving first input image data representing the first user speaking the second utterance; (Paragraph 45 describes that an image on which face detection is performed is received.)
processing the second input audio data and the first input image data to determine the first user is speaking the second utterance to a second user;” (Paragraph 36 describes that the device determines which audio is directed to the device and which audio is directed elsewhere (which is ignored).)
Vasilieff does not explicitly describe:
“receiving, by a user device operating in a first mode, first input audio data representing a first utterance initiated by a wakeword;
processing the first utterance to determine a command to operate in a second mode corresponding to system participation in a conversation between at least two users;
beginning operation in the second mode; 
in response to determining the first user is speaking the second utterance to the second user and to the operation in the second mode, using a first component to determine a system response to the second utterance is to be generated;
in response to determining the system response to the second utterance is to be generated, processing the second input audio data to determine first output data responsive to the second utterance; and
causing the user device to present the first output data.”
However, Lim describes:
“receiving, by a user device operating in a first mode, first input audio data representing a first utterance initiated by a wakeword; (Paragraph 43 describes receiving and identifying a wakeword.)
in response to determining the system response to the second utterance is to be generated, processing the second input audio data to determine first output data responsive to the second utterance; (Paragraphs 51 and 54 describe converting a text response to a user utterance into speech.) and
causing the user device to present the first output data.”  (Paragraph 52 describes outputting a speech response.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the wakeword and response to the user as described by Lim into the system of Vasilieff to provide responses tailored to a user, as described at paragraph 50 of Lim.
Vasilieff in view of Lim does not explicitly describe:
“in response to determining the first user is speaking the second utterance to the second user and to the operation in the second mode, using a first component to determine a system response to the second utterance is to be generated; 
processing the first utterance to determine a command to operate in a second mode corresponding to system participation in a conversation between at least two users;
beginning operation in the second mode.”
However, Kim describes “in response to determining the first user is speaking the second utterance to the second user and to the operation in the second mode, using a first component to determine a system response to the second utterance is to be generated.”  (Paragraphs 150 and 151 of Kim describe a device that reviews past inputs by a first and a second user, and then suggests responses (stickers) based on new inputs.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the dialogue history and suggested responses as described by Kim into the system of Vasilieff in view of Lim to provide a response tailored based on previous dialogues, as described at paragraph 146 of Kim.
Vasilieff in view of Lim and Kim does not explicitly describe:
“processing the first utterance to determine a command to operate in a second mode corresponding to system participation in a conversation between at least two users;
beginning operation in the second mode.”
However, Hashimoto describes:
“processing the first utterance to determine a command to operate in a second mode corresponding to system participation in a conversation between at least two users; (Paragraph 50 describes that a hot word (“a command”) received by the device will place it in conversation mode.) 
beginning operation in the second mode.”  (Paragraph 51 of Hashimoto describes beginning a conversation mode between multiple users using devices 1A to 1D.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the conversation mode as described by Hashimoto into the system of Vasilieff in view of Lim and Kim to provide the ability for multiple users to use the system, as described at paragraph 51 of Hashimoto.
With regard to Claim 2, Vasilieff does not explicitly describe these features.
However, Lim describes:
“processing the second input audio data to determine speech processing result data; (Paragraphs 51 and 54 describe turning a text response to a user utterance to speech.) and
determining user profile data corresponding to the first user.”  (Paragraph 39 describes that user data can be obtained based on the user being recognized in an image.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the response to the user as described by Lim into the system of Vasilieff to provide responses tailored to a user, as described at paragraph 50 of Lim.
Vasilieff in view of Lim does not explicitly describe:
“determining dialog data representing at least one previous exchange between the first user and the second user,
wherein using the first component to determine a system response to the second utterance is to be generated comprises using the speech processing result data, the user profile data, the dialog data and the first component.”
However, Kim describes:
“determining dialog data representing at least one previous exchange between the first user and the second user, (Paragraph 139 describes determining dialog data representing at least one previous exchange.)
wherein using the first component to determine a system response to the second utterance is to be generated comprises using the speech processing result data, the user profile data, the dialog data and the first component.” (Paragraph 138 describes a recommended sticker (“system response”) based on the dialog data.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the dialogue history as described by Kim into the system of Vasilieff in view of Lim to provide a response tailored based on previous dialogues, as described at paragraph 146 of Kim.
With regard to Claim 3, Vasilieff in view of Lim does not explicitly describe these features.
However, Kim describes:
“determining dialog data representing at least one previous exchange between the first user and the second user; (Paragraph 139 describes determining dialog data representing at least one previous exchange.)
processing the dialog data and the second input audio data to determine the second utterance refers to an entity represented in the dialog data; (Paragraph 138 describes that an utterance refers to an entity (“That player” is an entity).)
determining the first output data based at least in part on the entity; (Paragraph 138 describes a recommended sticker (“output data”) based on the entity.) and
storing data representing the first output data as part of second dialog data.” (Paragraph 140 describes updating dialog data based on new utterances.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the entity detection and updated dialogue data as described by Kim into the system of Vasilieff in view of Lim to provide a response tailored based on previous dialogues, as described at paragraph 146 of Kim.
With regard to Claim 4, Vasilieff describes:
receiving second input image data to determine the second user performed a gesture directed at the first user; (Paragraph 47 describes that the device determines if the first user is looking at the device or elsewhere, such as at a second user.  Looking at a second user rather than the device is a “gesture.”)
processing encoded data corresponding to the second input image data using the first component to determine a system response to the gesture is to be generated; (Paragraphs 36 and 37 describe that the device will process speech when the first user is looking at the camera, and may ignore speech otherwise.)
in response to determining the system response to the gesture is to be generated, processing the second input image data to determine second output data; (Paragraph 57 describes that the speech processor operates or not based on the image, and output to the user is based on the output from the speech processor.)
causing the user device to present the second output data.  (Paragraph 57 describes that output 1218 is provided to the user.)

Conclusion
9.	The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
U.S. Patent App. Pub. No. 20210312938 (Yun et al.) describes a device that generates a response to a first user utterance to a second user, as it translates the utterance.
10.	Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to EDWARD TRACY, whose telephone number is (571) 272-8332. The examiner can normally be reached Monday-Friday, 9 AM-5 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta, can be reached at 571-272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/EDWARD TRACY JR./Examiner, Art Unit 2656
/BHAVESH M MEHTA/Supervisory Patent Examiner, Art Unit 2656