DETAILED ACTION

Notice of Pre-AIA or AIA Status
1.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Priority Acknowledgment
2.	Acknowledgment is made of applicant’s claim for foreign priority under 35 U.S.C. 119(a)-(d). The certified copy has been filed for Application No. 10-2019-0125169, filed on 10/10/2019 in the Republic of Korea.

Information Disclosure Statement
3.	The information disclosure statements (IDS) submitted on 10/07/2020, 04/01/2021, 01/03/2022, and 09/30/2022 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.

Claim Rejections - 35 USC § 101
4.	35 U.S.C. 101 reads as follows: 
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title. 

5.	Claims 1-22 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (i.e., a law of nature, a natural phenomenon, or an abstract idea) without significantly more.
	All of the claims are directed to the statutory categories of a machine/apparatus or a process.
Claim 1 recites
 	“1. An electronic apparatus comprising: 
 	a microphone; 
 	a camera; a memory configured to store at least one command; and at least one processor configured to be connected to the microphone, the camera, and the memory and control the electronic apparatus, wherein the at least one processor is further configured, by executing the at least one command, to: 
 	based on a first user voice being input by a user, obtain and provide a response to the first user voice, 
 	based on an audio signal including a voice being input while the response to the first user voice is being provided, analyze an image captured by the camera and determine whether there is a second user voice uttered by the user in the audio signal, and 
 	based on determining that the second user voice uttered by the user is in the audio signal, stop the providing of the response to the first user voice and obtain and provide a response to the second user voice.”
	Claim 11 recites 
	“11. A controlling method of an electronic apparatus, the method comprising: 
 	based on a first user voice being input by a user, providing a response to the first user voice; 
 	based on an audio signal including a voice being input while the response to the first user voice is being provided, analyzing an image captured by a camera and determining whether a second user voice uttered by the user is in the audio signal; and 
 	based on determining that the second user voice uttered by the user is in the audio signal, stopping the providing of the response to the first user voice and obtaining and providing a response to the second user voice.”
 	The independent claims 1 and 11 recite substantially the same concept but do so in the context of an apparatus and a method. 
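	For illustration only, the sequence of steps common to claims 1 and 11 can be sketched as follows. This is a minimal sketch under stated assumptions, not the applicant's implementation: the class `Assistant`, its method names, and the `user_is_speaking` callback (a stand-in for the recited image analysis) are hypothetical and do not appear in the application.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Assistant:
    # Hypothetical stand-in for the recited image analysis: returns True when
    # the captured image indicates that the original user uttered the voice.
    user_is_speaking: Callable[[bytes], bool]
    log: List[str] = field(default_factory=list)
    responding: bool = False

    def on_first_voice(self, text: str) -> None:
        # "based on a first user voice being input ... provide a response"
        self.responding = True
        self.log.append(f"respond:{text}")

    def on_audio_during_response(self, text: str, image: bytes) -> None:
        # Only audio arriving while the first response is being provided
        # triggers the image analysis.
        if not self.responding:
            return
        if self.user_is_speaking(image):
            # Second user voice found: stop the first response, answer the second.
            self.log.append("stop")
            self.log.append(f"respond:{text}")
        else:
            # Claims 1 and 11 recite no action for this branch; the sketch
            # simply lets the first response continue.
            self.log.append("continue")


assistant = Assistant(user_is_speaking=lambda image: image == b"same-user")
assistant.on_first_voice("weather")
assistant.on_audio_during_response("news", b"other-person")
assistant.on_audio_during_response("news", b"same-user")
print(assistant.log)
```

	The image analysis is reduced to a single boolean callback so that the sketch shows only the order of the recited determinations.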
	The limitation of “providing a response to the first user voice based on a first user voice being input by a user ... analyzing an image captured by the camera and determining whether there is a second user voice uttered by the user in the audio signal based on an audio signal including a voice being input while the response to the first user voice is being provided, and stopping the providing of the response to the first user voice and obtaining and providing a response to the second user voice based on determining that the second user voice uttered by the user is in the audio signal,” as drafted, covers mental activities. More specifically, while a human provides a response to a first user voice, the human could hear a second user voice. By looking at the person who utters the second user voice, the human could determine whether that person is the same as the person who uttered the first user voice. If so, the human could stop providing the response to the first user voice and provide a response to the second user voice.
	The judicial exception is not integrated into a practical application. In particular, claim 1 recites the additional elements of “a microphone,” “a camera,” “a memory,” and “a processor.” These additional elements, individually and in combination, amount to no more than (i) mere instructions to implement the idea on a computer, and/or (ii) a recitation of generic computer structure that serves to perform generic computer functions that are well-understood, routine, and conventional activities previously known to the pertinent industry. Viewed as a whole, these additional claim elements do not provide meaningful limitations that transform the abstract idea into a patent-eligible application of the abstract idea such that the claims amount to significantly more than the abstract idea itself. Therefore, the claims are rejected under 35 U.S.C. 101 as being directed to non-statutory subject matter. Further, the claims recite no improvement to the functioning of the computing device itself. The mere recitation of “a microphone, a camera, a memory, and a processor” and/or the like is akin to adding the words “apply it” and/or “use it” with a computer in conjunction with the abstract idea. Paragraph [0010] of the specification discloses “In accordance with an aspect of the disclosure, an electronic apparatus is provided. The electronic apparatus includes a microphone, a camera, a memory configured to store at least one command, and at least one processor configured to be connected to the microphone, the camera, and the memory and control the electronic apparatus.” As filed, the specification describes these components as those of a general-purpose computer, used mainly as a tool to apply the idea.
Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea.
The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to the integration of the abstract idea into a practical application, the additional element of using a computer amounts to no more than a generic computer. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The claims are not patent eligible.
 	The dependent claims do not remedy the issues noted above. More specifically, claims 2 and 12 recite determining whether the person who utters the second user voice is the same as the person who utters the first user voice by extracting an area including the registered user in the image. This reads on a human looking at the person who utters the second user voice to determine whether that person is the same as the person who uttered the first user voice. No additional limitations are presented. Claims 3 and 13 recite determining whether the second user voice is uttered by the registered user by looking at the lip area of the registered user. This reads on a human determining whether a person is speaking by looking at the lip movement of that person. No additional limitations are presented. Claims 4 and 14 recite storing the audio signal in a buffer without regard to whether the second user voice uttered by the user is in the audio signal. This reads on a human listening to and memorizing the audio signal. No additional limitations are presented. Claims 5 and 15 recite providing a response to the second user voice if the person who utters the second user voice is the same as the person who utters the first user voice and, if not, ignoring the second user voice. This reads on a human continuing to listen to and talk with the person with whom the human is conversing while ignoring questions from other people. No additional limitations are presented. Claims 6 and 16 recite understanding the second user voice based on identifying category information in the first user voice. This reads on a human understanding what the user is saying based on the category of what the user said previously. No additional limitations are presented.
Claims 7 and 17 recite determining the relationship between the category information in the first user voice and the category information in the second user voice in order to determine whether to transmit information regarding the second user voice to the server. Transmitting information to the server is well-known, routine, and conventional. No additional limitations are presented. Claims 8 and 18 recite performing natural language understanding regarding the second user voice based on the identified sentence type of the second user voice. This reads on a human listening to and understanding the semantics of a sentence based on its type, e.g., a declarative, interrogative, imperative, or exclamative sentence. No additional limitations are presented. Claims 9 and 19 recite providing the response to the second voice based on the second voice and the conversation history information. This reads on a human in a conversation responding to the current question or command based not only on that question or command but also on previous information. No additional limitations are presented. Claims 10 and 20 recite inquiring whether to stop providing the response to the first user voice. This reads on a human confirming with the user whether the user has changed the request. No additional limitations are presented. Claim 21 recites transmitting information regarding the second user voice to a server if the identified sentence is an interrogative sentence. This reads on a human consulting another person in order to respond to the interrogative sentence. No additional limitations are presented. Claim 22 recites a priority of the second user voice for transmitting the information regarding the second user voice to the server. Considering the priority of the voice information in the conversation is a mental process. Transmitting information to the server is well-known, routine, and conventional.
Merely appending well-understood, routine, conventional activities previously known to the industry, specified at a high level of generality, is not considered as providing significantly more. No additional limitations are presented.
For at least the reasons provided above, claims 1-22 are rejected under 35 U.S.C. 101 as being directed to non-statutory subject matter.

Claim Rejections - 35 USC § 102
6.	The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

7.	Claims 1, 4-6, 8, 11, 14-16, and 18 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Alameh et al. (US 2020/0160857 A1).

	With respect to Claim 1, Alameh et al. disclose 
 	An electronic apparatus comprising: 
 	a microphone (Alameh et al. [0050] the one or more microphones 309); 
 	a camera (Alameh et al. [0062] The imaging system 312 can include an imager. In one embodiment, the imager comprises a two-dimensional imager configured to receive at least one image of an environment of the electronic device 300. In one embodiment, the imager comprises a two-dimensional Red-Green-Blue (RGB) imager. In another embodiment, the imager comprises an infrared imager. Other types of imagers will be obvious to those of ordinary skill in the art having the benefit of this disclosure, see paragraphs [0040, 0063-0065]); 
 	a memory configured to store at least one command; and at least one processor configured to be connected to the microphone, the camera, and the memory and control the electronic apparatus, wherein the at least one processor is further configured, by executing the at least one command (Alameh et al. Fig. 3 elements 314, 304, 312, see paragraph [0045]), to: 
 	based on a first user voice being input by a user, obtain and provide a response to the first user voice (Alameh et al. Fig. 2 elements 201-204 Receive process initiation command as voice input, Extract identifying characteristics from received voice input, Store voice print reference in memory as “initiation voice” and Initiate process in response to process initiation command),  
 	based on an audio signal including a voice being input while the response to the first user voice is being provided, analyze an image captured by the camera and determine whether there is a second user voice uttered by the user in the audio signal (Alameh et al. [0072] In one embodiment, the identification system 313 determines from whom audio input 318,319 is received, and whether the first audio input 318 is received from the same person as the second audio input 319, by capturing one or more of images with the imager or depth scans with the depth scanner to detect lip movements as the audio input 318 is received. Illustrating by example, when the electronic device 300 receives the audio input 318, the imager system 312 can monitor persons within the environment of the electronic device 300 to determine who is speaking. When later voice input 319 is received, the imager system 312 can perform a similar operation to determine whether the person delivering audio input 319 is the same person that delivered audio input 318, see paragraph [0076]), and 
 	based on determining that the second user voice uttered by the user is in the audio signal, stop the providing of the response to the first user voice and obtain and provide a response to the second user voice (Alameh et al. Abstract The one or more sensors receive a first audio input defining a process initiation command and initiate, in response to the process initiation command, a process. Thereafter, the one or more sensors receive a second audio input defining a process cessation command. The one or more processors determine whether one or more substantially matching audio characteristics are present in both the first audio input and the second audio input...Where they are present, the one or more processors cease the process in response to the process cessation command, [0023] when a user starts an action with the electronic device by way of a voice command, the same user—and only the same user—can stop the action or change the action while it is continuing, [0026] only the person delivering the process initiation command could, successfully, deliver a process cessation command causing the process to stop. If one or more audio characteristics from the process initiation command fail to sufficiently match audio characteristics from the process cessation command, in one or more embodiments one or more processors of the electronic device will ignore the process cessation command. 
Advantageously, while anyone can use an electronic device to start a process in one or more embodiments, only the person who started the process can stop it, [0028] only that person who started the process or action can stop or change it, [0038] Where decision 208 determines one or more substantially matching audio characteristics (or voice prints) are present in both the first audio input received at step 201 and the second audio input received at step 205, the method moves to step 209 which can comprise executing, with the one or more processors of the electronic device, the process control command identified at step 205. Where the process control command comprises a process cessation command, step 209 can comprise ceasing the process in response to the process cessation command. Similarly, where the process control command comprises a process modification command, step 209 can comprise modifying the process, e.g., adjusting volume, brightness, content selection, and so forth, in response to the process cessation command.)
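	For illustration only, the decision flow of method 200 as quoted above (steps 201-204 to initiate, steps 205-206 to receive and extract, decision 208 to compare, then step 209 or step 210) can be sketched as follows. This is a rough sketch under stated assumptions and is not taken from Alameh et al.: representing a voice print as a numeric vector, comparing by cosine similarity, the 0.9 threshold, and all class and function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Similarity of two voice-print vectors (an assumed representation)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ProcessController:
    def __init__(self, threshold: float = 0.9) -> None:
        self.threshold = threshold  # assumed meaning of "substantially matching"
        self.initiation_voice: Optional[List[float]] = None
        self.running = False

    def initiate(self, voice_print: Sequence[float]) -> None:
        # Steps 201-204: receive the command, extract characteristics,
        # store them as the "initiation voice", and start the process.
        self.initiation_voice = list(voice_print)
        self.running = True

    def control(self, voice_print: Sequence[float], command: str) -> str:
        # Steps 205-206: a process control command arrives with its own
        # extracted characteristics.
        if self.initiation_voice is None or not self.running:
            return "ignored"
        # Decision 208: are substantially matching characteristics present?
        if cosine_similarity(self.initiation_voice, voice_print) >= self.threshold:
            if command == "stop":
                self.running = False  # step 209: cease the process
            return "executed"
        return "ignored"  # step 210: the process continues


controller = ProcessController()
controller.initiate([1.0, 0.0, 0.2])
print(controller.control([0.0, 1.0, 0.0], "stop"))   # different speaker
print(controller.control([1.0, 0.0, 0.25], "stop"))  # same speaker
```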

 	With respect to Claim 4, Alameh et al. disclose 
 	wherein the at least one processor is further configured to, based on the audio signal being input while the response to the first user voice is being provided, store the audio signal in a buffer without regard to whether the second user voice uttered by the user is in the audio signal (Alameh et al. [0036] At step 206, the method 200 extracts one or more audio characteristics from the audio input received at step 205. As before, these audio characteristics can include identifying characteristics that distinguish the audio input received at step 205 from other audio input received from another person. The audio characteristics can also include the audio input itself, saved as a digital file.)

 	With respect to Claim 5, Alameh et al. disclose 
wherein the at least one processor is further configured to: 
based on determining that the second user voice uttered by the user is in the audio signal, obtain and provide the response to the second user voice based on the audio signal stored in the buffer (Alameh et al. [0038] Where decision 208 determines one or more substantially matching audio characteristics (or voice prints) are present in both the first audio input received at step 201 and the second audio input received at step 205, the method moves to step 209 which can comprise executing, with the one or more processors of the electronic device, the process control command identified at step 205. Where the process control command comprises a process cessation command, step 209 can comprise ceasing the process in response to the process cessation command. Similarly, where the process control command comprises a process modification command, step 209 can comprise modifying the process, e.g., adjusting volume, brightness, content selection, and so forth, in response to the process cessation command); and 
 based on determining that the second user voice uttered by the user is not in the audio signal, ignore the audio signal stored in the buffer (Alameh et al. [0039] By contrast, where decision 208 determines the one or more substantially matching audio characteristics are absent from one of the first audio input received at step 201 or the second audio input received at step 205, the method 200 can move to step 210, which can comprise ignoring, by the one or more processors, the process control command. Accordingly, the process can continue.)
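	For illustration only, the buffering behavior recited in claims 4 and 5 (store the audio signal regardless of speaker, then either use or ignore the stored signal once the speaker determination is made) can be sketched as follows. This is a minimal sketch of the claim language, not of either the application's or Alameh et al.'s implementation; the class `AudioBuffer`, its methods, and the fixed buffer size are hypothetical.

```python
from collections import deque
from typing import Deque, List, Optional


class AudioBuffer:
    def __init__(self, max_frames: int = 4) -> None:
        # Bounded buffer: the oldest frames are dropped when it is full.
        self.frames: Deque[bytes] = deque(maxlen=max_frames)

    def store(self, frame: bytes) -> None:
        # Claim 4: store the audio without regard to who is speaking.
        self.frames.append(frame)

    def resolve(self, same_user: bool) -> Optional[List[bytes]]:
        # Claim 5: once the determination is made, hand the buffered audio
        # on for a response, or ignore it.
        frames = list(self.frames)
        self.frames.clear()
        return frames if same_user else None


buffer = AudioBuffer()
buffer.store(b"frame-1")
buffer.store(b"frame-2")
print(buffer.resolve(same_user=False))
buffer.store(b"frame-3")
print(buffer.resolve(same_user=True))
```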

 	With respect to Claim 6, Alameh et al. disclose
 	wherein the at least one processor is further configured to:
identify category information regarding the first user voice and store the information in the memory (Alameh et al. [0024] following the receipt of a voice command, an electronic device configured in accordance with one or more embodiments of the disclosure captures one or more voice characteristics corresponding to the voice command. The electronic device then initiates a process requested by the voice command. Examples of such processes include playing music, presenting images or videos, making voice calls, sending text messages or multimedia messages, interacting with remote computer systems across a network, storing data in memory, and so forth, [0036] At step 206, the method 200 extracts one or more audio characteristics from the audio input received at step 205. As before, these audio characteristics can include identifying characteristics that distinguish the audio input received at step 205 from other audio input received from another person. The audio characteristics can also include the audio input itself, saved as a digital file); 
 based on determining that the second user voice uttered by the user is in the audio signal, identify category information regarding the second user voice (Alameh et al. [0038] Where decision 208 determines one or more substantially matching audio characteristics (or voice prints) are present in both the first audio input received at step 201 and the second audio input received at step 205, the method moves to step 209 which can comprise executing, with the one or more processors of the electronic device, the process control command identified at step 205. Where the process control command comprises a process cessation command, step 209 can comprise ceasing the process in response to the process cessation command. Similarly, where the process control command comprises a process modification command, step 209 can comprise modifying the process, e.g., adjusting volume, brightness, content selection, and so forth, in response to the process cessation command); and 
 perform natural language understanding regarding the second user voice based on the category information regarding the first user voice and the category information regarding the second user voice (Alameh et al. [0023] provide electronic devices and systems where when a user starts an action with the electronic device by way of a voice command, the same user—and only the same user—can stop the action or change the action while it is continuing.)

 	With respect to Claim 8, Alameh et al. disclose
 	wherein the at least one processor is further configured to:
 	based on determining that the second user voice uttered by the user is in the audio signal, identify a sentence type of the second user voice (Alameh et al. [0038] Where decision 208 determines one or more substantially matching audio characteristics (or voice prints) are present in both the first audio input received at step 201 and the second audio input received at step 205, the method moves to step 209 which can comprise executing, with the one or more processors of the electronic device, the process control command identified at step 205. Where the process control command comprises a process cessation command, step 209 can comprise ceasing the process in response to the process cessation command. Similarly, where the process control command comprises a process modification command, step 209 can comprise modifying the process, e.g., adjusting volume, brightness, content selection, and so forth, in response to the process cessation command. The Examiner notes that the sentence type of the second user voice is an imperative sentence); and 
 	perform natural language understanding regarding the second user voice, based on the identified sentence type (Alameh et al. [0038] Where decision 208 determines one or more substantially matching audio characteristics (or voice prints) are present in both the first audio input received at step 201 and the second audio input received at step 205, the method moves to step 209 which can comprise executing, with the one or more processors of the electronic device, the process control command identified at step 205. Where the process control command comprises a process cessation command, step 209 can comprise ceasing the process in response to the process cessation command. Similarly, where the process control command comprises a process modification command, step 209 can comprise modifying the process, e.g., adjusting volume, brightness, content selection, and so forth, in response to the process cessation command.)
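	For illustration only, identifying a sentence type such as the imperative type noted above can be sketched with a simple keyword heuristic. This is a rough sketch under stated assumptions: the word lists and the function `sentence_type` are hypothetical, are not drawn from the application or from Alameh et al., and stand in for whatever classifier an actual system would use.

```python
# Hypothetical starter-word lists; a real system would use a trained model.
INTERROGATIVE_STARTERS = {"what", "when", "where", "who", "why", "how", "is", "are", "can", "do"}
IMPERATIVE_STARTERS = {"stop", "play", "pause", "turn", "set", "call", "open", "cancel"}


def sentence_type(utterance: str) -> str:
    """Classify a transcribed utterance by its first word or final mark."""
    text = utterance.strip().lower()
    if not text:
        return "unknown"
    first_word = text.split()[0].strip(".,!?")
    if text.endswith("?") or first_word in INTERROGATIVE_STARTERS:
        return "interrogative"
    if first_word in IMPERATIVE_STARTERS:
        return "imperative"
    return "declarative"


print(sentence_type("Stop the music"))
print(sentence_type("What time is it"))
print(sentence_type("The music is loud"))
```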

 	With respect to Claim 11, Alameh et al. disclose 
 	A controlling method of an electronic apparatus, the method comprising: 
 	based on a first user voice being input by a user, providing a response to the first user voice (Alameh et al. Fig. 2 elements 201-204 Receive process initiation command as voice input, Extract identifying characteristics from received voice input, Store voice print reference in memory as “initiation voice” and Initiate process in response to process initiation command); 
 	based on an audio signal including a voice being input while the response to the first user voice is being provided, analyzing an image captured by a camera and determining whether a second user voice uttered by the user is in the audio signal (Alameh et al. [0072] In one embodiment, the identification system 313 determines from whom audio input 318,319 is received, and whether the first audio input 318 is received from the same person as the second audio input 319, by capturing one or more of images with the imager or depth scans with the depth scanner to detect lip movements as the audio input 318 is received. Illustrating by example, when the electronic device 300 receives the audio input 318, the imager system 312 can monitor persons within the environment of the electronic device 300 to determine who is speaking. When later voice input 319 is received, the imager system 312 can perform a similar operation to determine whether the person delivering audio input 319 is the same person that delivered audio input 318, see paragraph [0076]); and 
 	based on determining that the second user voice uttered by the user is in the audio signal, stopping the providing of the response to the first user voice and obtaining and providing a response to the second user voice (Alameh et al. Abstract The one or more sensors receive a first audio input defining a process initiation command and initiate, in response to the process initiation command, a process. Thereafter, the one or more sensors receive a second audio input defining a process cessation command. The one or more processors determine whether one or more substantially matching audio characteristics are present in both the first audio input and the second audio input...Where they are present, the one or more processors cease the process in response to the process cessation command, [0023] when a user starts an action with the electronic device by way of a voice command, the same user—and only the same user—can stop the action or change the action while it is continuing, [0026] only the person delivering the process initiation command could, successfully, deliver a process cessation command causing the process to stop. If one or more audio characteristics from the process initiation command fail to sufficiently match audio characteristics from the process cessation command, in one or more embodiments one or more processors of the electronic device will ignore the process cessation command. 
Advantageously, while anyone can use an electronic device to start a process in one or more embodiments, only the person who started the process can stop it, [0028] only that person who started the process or action can stop or change it, [0038] Where decision 208 determines one or more substantially matching audio characteristics (or voice prints) are present in both the first audio input received at step 201 and the second audio input received at step 205, the method moves to step 209 which can comprise executing, with the one or more processors of the electronic device, the process control command identified at step 205. Where the process control command comprises a process cessation command, step 209 can comprise ceasing the process in response to the process cessation command. Similarly, where the process control command comprises a process modification command, step 209 can comprise modifying the process, e.g., adjusting volume, brightness, content selection, and so forth, in response to the process cessation command.)

 	With respect to Claim 14, Alameh et al. disclose 
 	further comprising: 
 	based on the audio signal being input while the response to the first user voice is being provided, storing the audio signal in a buffer without regard to whether the second user voice uttered by the user is in the audio signal (Alameh et al. [0036] At step 206, the method 200 extracts one or more audio characteristics from the audio input received at step 205. As before, these audio characteristics can include identifying characteristics that distinguish the audio input received at step 205 from other audio input received from another person. The audio characteristics can also include the audio input itself, saved as a digital file.)

 	With respect to Claim 15, Alameh et al. disclose 
wherein the obtaining and providing of the response to the second user voice comprises: 
based on determining that the second user voice uttered by the user is in the audio signal, obtaining a response to the second user voice based on the audio signal stored in the buffer (Alameh et al. [0038] Where decision 208 determines one or more substantially matching audio characteristics (or voice prints) are present in both the first audio input received at step 201 and the second audio input received at step 205, the method moves to step 209 which can comprise executing, with the one or more processors of the electronic device, the process control command identified at step 205. Where the process control command comprises a process cessation command, step 209 can comprise ceasing the process in response to the process cessation command. Similarly, where the process control command comprises a process modification command, step 209 can comprise modifying the process, e.g., adjusting volume, brightness, content selection, and so forth, in response to the process cessation command), and 
 based on determining that the second user voice uttered by the user is not in the audio signal, ignoring the audio signal stored in the buffer (Alameh et al. [0039] By contrast, where decision 208 determines the one or more substantially matching audio characteristics are absent from one of the first audio input received at step 201 or the second audio input received at step 205, the method 200 can move to step 210, which can comprise ignoring, by the one or more processors, the process control command. Accordingly, the process can continue.)

 	With respect to Claim 16, Alameh et al. disclose
 	further comprising: 
identifying category information regarding the first user voice and storing the category information (Alameh et al. [0024] following the receipt of a voice command, an electronic device configured in accordance with one or more embodiments of the disclosure captures one or more voice characteristics corresponding to the voice command. The electronic device then initiates a process requested by the voice command. Examples of such processes include playing music, presenting images or videos, making voice calls, sending text messages or multimedia messages, interacting with remote computer systems across a network, storing data in memory, and so forth, [0036] At step 206, the method 200 extracts one or more audio characteristics from the audio input received at step 205. As before, these audio characteristics can include identifying characteristics that distinguish the audio input received at step 205 from other audio input received from another person. The audio characteristics can also include the audio input itself, saved as a digital file), 
wherein the obtaining and providing of the response to the second user voice comprises: 
 	 	based on determining that the second user voice uttered by the user is in the audio signal, identifying category information regarding the second user voice (Alameh et al. [0038] Where decision 208 determines one or more substantially matching audio characteristics (or voice prints) are present in both the first audio input received at step 201 and the second audio input received at step 205, the method moves to step 209 which can comprise executing, with the one or more processors of the electronic device, the process control command identified at step 205. Where the process control command comprises a process cessation command, step 209 can comprise ceasing the process in response to the process cessation command. Similarly, where the process control command comprises a process modification command, step 209 can comprise modifying the process, e.g., adjusting volume, brightness, content selection, and so forth, in response to the process cessation command); and 
 	 performing natural language understanding regarding the second user voice based on the category information regarding the first user voice and the category information regarding the second user voice (Alameh et al. [0023] provide electronic devices and systems where when a user starts an action with the electronic device by way of a voice command, the same user—and only the same user—can stop the action or change the action while it is continuing.)

 	With respect to Claim 18, Alameh et al. disclose
 	wherein the obtaining and providing of the response to the second user voice comprises: 
 	based on determining that the second user voice uttered by the user is in the audio signal, identifying a sentence type of the second user voice (Alameh et al. [0038] Where decision 208 determines one or more substantially matching audio characteristics (or voice prints) are present in both the first audio input received at step 201 and the second audio input received at step 205, the method moves to step 209 which can comprise executing, with the one or more processors of the electronic device, the process control command identified at step 205. Where the process control command comprises a process cessation command, step 209 can comprise ceasing the process in response to the process cessation command. Similarly, where the process control command comprises a process modification command, step 209 can comprise modifying the process, e.g., adjusting volume, brightness, content selection, and so forth, in response to the process cessation command. The Examiner notes that the sentence type of the second user voice is imperative sentence); and 
 	performing natural language understanding regarding the second user voice based on the identified sentence type (Alameh et al. [0038] Where decision 208 determines one or more substantially matching audio characteristics (or voice prints) are present in both the first audio input received at step 201 and the second audio input received at step 205, the method moves to step 209 which can comprise executing, with the one or more processors of the electronic device, the process control command identified at step 205. Where the process control command comprises a process cessation command, step 209 can comprise ceasing the process in response to the process cessation command. Similarly, where the process control command comprises a process modification command, step 209 can comprise modifying the process, e.g., adjusting volume, brightness, content selection, and so forth, in response to the process cessation command.)

Claim Rejections - 35 USC § 103
8.	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.

9.	Claims 2, 3, 12, and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Alameh et al. (US 2020/0160857 A1) in view of Kashiwagi (US 2011/0135152 A1).

	With respect to Claim 2, Alameh et al. disclose 
 	wherein the at least one processor is further configured to: 
 	extract an area including the registered user in the image captured by the camera while the response to the first user voice is being provided (Alameh et al. [0026] the person starting the process and ceasing the process must not only be the same (a primary embodiment of the disclosure), but must also be an authorized user of the electronic device, [0040] Thus, as set forth in FIG. 2, a method 200 allows a person to start an action by delivering a voice command via audio input to an electronic device. As set forth in this method 200, in one embodiment the person and only the same person, or, as will be described below, a person of a predefined group previously authorized to deliver commands to the electronic device, can stop the action or change the action while it is continuing. Also, as will be described below the audio processing engine can be assisted/supported/supplemented by employing other sensors such as camera of location to confirm same person starting altering and/or stopping process); and 
 	determine whether the second user voice uttered by the registered user is in the audio signal by analyzing the extracted area including the registered user (Alameh et al. see paragraphs [0040] and [0062]-[0065].)  
	Alameh et al. fail to explicitly teach 
 	based on a third user voice including a wake-up word being input by the user, recognize the wake-up word and register a user included in an image captured by the camera; 
	However, Kashiwagi teaches
 	based on a third user voice including a wake-up word being input by the user, recognize the wake-up word and register a user included in an image captured by the camera (Kashiwagi [0037] The person specifying unit 16 allows voice information (which is supplied from the voice analysis unit 20) acquired upon detecting a face to correspond to the person with the face detected by the face detection unit 13 and identified by the face identifying unit 14, and registers the voice information in the person-voice database 17. Moreover, the person specifying unit 16 also allows keywords extracted by the character information extraction unit 21 to correspond to the person with the face identified by the face identifying unit 14 and registers the keywords in the person-voice database 17); 
  	Alameh et al. and Kashiwagi are analogous art because they are from a similar field of endeavor in the Signal Processing techniques and applications. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the steps of detecting lip movement to determine whether the person delivering the second audio input is the same person that delivered the first audio input as taught by Alameh et al., using teaching of identifying the person by the face detection unit as taught by Kashiwagi for the benefit of registering the keywords in the person-voice database (Kashiwagi [0037] The person specifying unit 16 allows voice information (which is supplied from the voice analysis unit 20) acquired upon detecting a face to correspond to the person with the face detected by the face detection unit 13 and identified by the face identifying unit 14, and registers the voice information in the person-voice database 17. Moreover, the person specifying unit 16 also allows keywords extracted by the character information extraction unit 21 to correspond to the person with the face identified by the face identifying unit 14 and registers the keywords in the person-voice database 17). 

	With respect to Claim 3, Alameh et al. in view of Kashiwagi teach 
 	wherein the at least one processor is further configured to: 
 	extract a lip area of the registered user in the image captured by the camera (Alameh et al. [0026] the person starting the process and ceasing the process must not only be the same (a primary embodiment of the disclosure), but must also be an authorized user of the electronic device, [0072] In one embodiment, the identification system 313 determines from whom audio input 318,319 is received, and whether the first audio input 318 is received from the same person as the second audio input 319, by capturing one or more of images with the imager or depth scans with the depth scanner to detect lip movements as the audio input 318 is received. Illustrating by example, when the electronic device 300 receives the audio input 318, the imager system 312 can monitor persons within the environment of the electronic device 300 to determine who is speaking. When later voice input 319 is received, the imager system 312 can perform a similar operation to determine whether the person delivering audio input 319 is the same person that delivered audio input 318); and
 	determine whether the second user voice uttered by the registered user is in the audio signal by analyzing whether there is a movement of the extracted lip area of the registered user (Alameh et al. [0072] In one embodiment, the identification system 313 determines from whom audio input 318,319 is received, and whether the first audio input 318 is received from the same person as the second audio input 319, by capturing one or more of images with the imager or depth scans with the depth scanner to detect lip movements as the audio input 318 is received. Illustrating by example, when the electronic device 300 receives the audio input 318, the imager system 312 can monitor persons within the environment of the electronic device 300 to determine who is speaking. When later voice input 319 is received, the imager system 312 can perform a similar operation to determine whether the person delivering audio input 319 is the same person that delivered audio input 318.)  

 	With respect to Claim 12, Alameh et al. disclose 
 	further comprising: 
 	wherein the determining that the second user voice uttered by the user is in the audio signal comprises: 
 	extracting an area including the registered user in the image captured by the camera while the response to the first user voice is being provided (Alameh et al. [0026] the person starting the process and ceasing the process must not only be the same (a primary embodiment of the disclosure), but must also be an authorized user of the electronic device, [0040] Thus, as set forth in FIG. 2, a method 200 allows a person to start an action by delivering a voice command via audio input to an electronic device. As set forth in this method 200, in one embodiment the person and only the same person, or, as will be described below, a person of a predefined group previously authorized to deliver commands to the electronic device, can stop the action or change the action while it is continuing. Also, as will be described below the audio processing engine can be assisted/supported/supplemented by employing other sensors such as camera of location to confirm same person starting altering and/or stopping process); and 
 	determining whether the second user voice uttered by the registered user is in the audio signal by analyzing the extracted area including the registered user (Alameh et al. see paragraphs [0040] and [0062]-[0065].)  
	Alameh et al. fail to explicitly teach 
 	based on a third user voice including a wake-up word being input by the user, recognizing the wake-up word and registering a user included in an image captured by the camera, 
 	However, Kashiwagi teaches
 	based on a third user voice including a wake-up word being input by the user, recognizing the wake-up word and registering a user included in an image captured by the camera, (Kashiwagi [0037] The person specifying unit 16 allows voice information (which is supplied from the voice analysis unit 20) acquired upon detecting a face to correspond to the person with the face detected by the face detection unit 13 and identified by the face identifying unit 14, and registers the voice information in the person-voice database 17. Moreover, the person specifying unit 16 also allows keywords extracted by the character information extraction unit 21 to correspond to the person with the face identified by the face identifying unit 14 and registers the keywords in the person-voice database 17); 
  	Alameh et al. and Kashiwagi are analogous art because they are from a similar field of endeavor in the Signal Processing techniques and applications. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the steps of detecting lip movement to determine whether the person delivering the second audio input is the same person that delivered the first audio input as taught by Alameh et al., using teaching of identifying the person by the face detection unit as taught by Kashiwagi for the benefit of registering the keywords in the person-voice database (Kashiwagi [0037] The person specifying unit 16 allows voice information (which is supplied from the voice analysis unit 20) acquired upon detecting a face to correspond to the person with the face detected by the face detection unit 13 and identified by the face identifying unit 14, and registers the voice information in the person-voice database 17. Moreover, the person specifying unit 16 also allows keywords extracted by the character information extraction unit 21 to correspond to the person with the face identified by the face identifying unit 14 and registers the keywords in the person-voice database 17). 

 	With respect to Claim 13, Alameh et al. in view of Kashiwagi teach 
 	wherein the determining that the second user voice uttered by the user is in the audio signal further comprises: 
 	extracting a lip area of the registered user in the image captured by the camera (Alameh et al. [0026] the person starting the process and ceasing the process must not only be the same (a primary embodiment of the disclosure), but must also be an authorized user of the electronic device, [0072] In one embodiment, the identification system 313 determines from whom audio input 318,319 is received, and whether the first audio input 318 is received from the same person as the second audio input 319, by capturing one or more of images with the imager or depth scans with the depth scanner to detect lip movements as the audio input 318 is received. Illustrating by example, when the electronic device 300 receives the audio input 318, the imager system 312 can monitor persons within the environment of the electronic device 300 to determine who is speaking. When later voice input 319 is received, the imager system 312 can perform a similar operation to determine whether the person delivering audio input 319 is the same person that delivered audio input 318); and
 	determining whether the second user voice uttered by the registered user is in the audio signal by analyzing whether there is a movement of the extracted lip area of the registered user (Alameh et al. [0072] In one embodiment, the identification system 313 determines from whom audio input 318,319 is received, and whether the first audio input 318 is received from the same person as the second audio input 319, by capturing one or more of images with the imager or depth scans with the depth scanner to detect lip movements as the audio input 318 is received. Illustrating by example, when the electronic device 300 receives the audio input 318, the imager system 312 can monitor persons within the environment of the electronic device 300 to determine who is speaking. When later voice input 319 is received, the imager system 312 can perform a similar operation to determine whether the person delivering audio input 319 is the same person that delivered audio input 318.)  

10.	Claims 9 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Alameh et al. (US 2020/0160857 A1) in view of Hart et al. (US 2014/0249817 A1).

 	With respect to Claim 9, Alameh et al. disclose all the limitations of Claim 1 upon which Claim 9 depends. Alameh et al. fail to explicitly teach 
 	wherein the at least one processor is further configured to:
 	based on the first user voice being input, store the first user voice and information regarding the response to the first user voice in the memory as conversation history information of the user ; and 
 	based on determining that the second user voice uttered by the user is in the audio signal, obtain and provide the response to the second user voice based on the second user voice and the conversation history information.  
	However, Hart et al. teach 
 	wherein the at least one processor is further configured to:
 	based on the first user voice being input, store the first user voice and information regarding the response to the first user voice in the memory as conversation history information of the user  (Hart et al. [0021] the device 106 may communicate with the companion application to surface information to the user 104, such as previous voice commands provided to the device 106 by the user (and how the device interpreted these commands), content that is supplementary to a voice command issued by the user (e.g., cover art for a song playing on the device 106 as requested by the user 104), and the like, [0027] For instance, if the response engine 128 attempts to identify the user, the engine 128 may compare the audio to the user profile(s) 130, each of which is associated with a respective user. Each user profile may store an indication of the voice signature 134 associated with the respective user based on previous voice interactions between the respective user and the voice-controlled device 106, other voice-controlled devices, other voice-enabled devices or applications, or the respective user and services accessible to the device (e.g., third-party websites, etc.). In addition, each of the profiles 130 may indicate one or more other characteristics 136 learned from previous interactions between the respective user and the voice-controlled device 106); and 
 	based on determining that the second user voice uttered by the user is in the audio signal, obtain and provide the response to the second user voice based on the second user voice and the conversation history information (Hart et al. [0014] In addition, the device(s) may utilize one or more characteristics other than voice signatures to determine whether or not the first user provided the speech and, hence, whether or not to interpret the speech as a valid voice command. For instance, the device may utilize a sequence or choice of words, grammar, time of day, a location within the environment from which speech is uttered, and/or other context information to determine whether the first user uttered the speech "stop . . . " In the instant example, the device(s) may determine, from the speaker-identification information and the additional characteristics, that the first user did not utter the word "stop" and, hence, may refrain from stopping playback of the audio. In addition, the device within the environment may query the first user to ensure the device has made the proper determination. For instance, the device may output the following query: "Did you say that you would like to stop the music?" In response to receiving an answer via speech, the device(s) may again utilize the techniques described above to determine whether or not the first user actually provided the answer and, hence, whether to comply with the user's answer.)
 	Alameh et al. and Hart et al. are analogous art because they are from a similar field of endeavor in the Signal Processing techniques and applications. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the steps of detecting lip movement to determine whether the person delivering the second audio input is the same person that delivered the first audio input as taught by Alameh et al., using teaching of the user’s profile as taught by Hart et al. for the benefit of storing an indication of the voice signature associated with a respective user and commands previously issued by the respective user (Hart et al. [0027] each of the profiles 130 may indicate one or more other characteristics 136 learned from previous interactions between the respective user and the voice-controlled device 106, other voice-controlled devices, or other voice-enabled devices or applications. For instance, these characteristics may include: [0028] commands often or previously issued by the respective user, [0029] command sequences often or previously issued by the respective user.)

 	With respect to Claim 19, Alameh et al. disclose all the limitations of Claim 11 upon which Claim 19 depends. Alameh et al. fail to explicitly teach 
 	further comprising: 
 	based on the first user voice being input, storing the first user voice and information regarding the response to the first user voice in a memory as conversation history information of the user, 
 	wherein the obtaining and providing of the response to the second user voice comprises, based on determining that the second user voice uttered by the user is in the audio signal, obtaining and providing the response to the second user voice based on the second user voice and the conversation history information.  
 	However, Hart et al. teach
 	further comprising: 
 	based on the first user voice being input, storing the first user voice and information regarding the response to the first user voice in a memory as conversation history information of the user (Hart et al. [0021] the device 106 may communicate with the companion application to surface information to the user 104, such as previous voice commands provided to the device 106 by the user (and how the device interpreted these commands), content that is supplementary to a voice command issued by the user (e.g., cover art for a song playing on the device 106 as requested by the user 104), and the like, [0027] For instance, if the response engine 128 attempts to identify the user, the engine 128 may compare the audio to the user profile(s) 130, each of which is associated with a respective user. Each user profile may store an indication of the voice signature 134 associated with the respective user based on previous voice interactions between the respective user and the voice-controlled device 106, other voice-controlled devices, other voice-enabled devices or applications, or the respective user and services accessible to the device (e.g., third-party websites, etc.). In addition, each of the profiles 130 may indicate one or more other characteristics 136 learned from previous interactions between the respective user and the voice-controlled device 106), 
 	wherein the obtaining and providing of the response to the second user voice comprises, based on determining that the second user voice uttered by the user is in the audio signal, obtaining and providing the response to the second user voice based on the second user voice and the conversation history information (Hart et al. [0014] In addition, the device(s) may utilize one or more characteristics other than voice signatures to determine whether or not the first user provided the speech and, hence, whether or not to interpret the speech as a valid voice command. For instance, the device may utilize a sequence or choice of words, grammar, time of day, a location within the environment from which speech is uttered, and/or other context information to determine whether the first user uttered the speech "stop . . . " In the instant example, the device(s) may determine, from the speaker-identification information and the additional characteristics, that the first user did not utter the word "stop" and, hence, may refrain from stopping playback of the audio. In addition, the device within the environment may query the first user to ensure the device has made the proper determination. For instance, the device may output the following query: "Did you say that you would like to stop the music?" In response to receiving an answer via speech, the device(s) may again utilize the techniques described above to determine whether or not the first user actually provided the answer and, hence, whether to comply with the user's answer.)
 	Alameh et al. and Hart et al. are analogous art because they are from a similar field of endeavor in the Signal Processing techniques and applications. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the steps of detecting lip movement to determine whether the person delivering the second audio input is the same person that delivered the first audio input as taught by Alameh et al., using teaching of the user’s profile as taught by Hart et al. for the benefit of storing an indication of the voice signature associated with a respective user and commands previously issued by the respective user (Hart et al. [0027] each of the profiles 130 may indicate one or more other characteristics 136 learned from previous interactions between the respective user and the voice-controlled device 106, other voice-controlled devices, or other voice-enabled devices or applications. For instance, these characteristics may include: [0028] commands often or previously issued by the respective user, [0029] command sequences often or previously issued by the respective user.)

11.	Claims 10 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Alameh et al. (US 2020/0160857 A1) in view of Hart et al. (US 2014/0249817 A1) and Winter et al. (US 2015/0019074 A1).

	With respect to Claim 10, Alameh et al. disclose all the limitations of Claim 1 upon which Claim 10 depends. Alameh et al. fail to explicitly teach 
 	further comprising: 
 	a display, wherein the at least one processor is further configured to, based on determining that the second user voice uttered by the user is in the audio signal, control the display to display a user interface (UI) inquiring whether to stop providing the response to the first user voice.  
	However, Hart et al. teach
 	a display, wherein the at least one processor is further configured to, based on determining that the second user voice uttered by the user is in the audio signal, control the display to display a user interface (UI) inquiring whether to stop providing the response to the first user voice (Hart et al. Fig. 2 elements 206, 208, 208(1), 208(2), 210 Receive a second voice command requesting performance of a second operation, Same user issue 1st and 2nd command? Yes, Cause performance of the second operation, [0010] the first user may request, via a voice command, to begin playing music on the device or on another device. After the device begins playing the music, the first user may continue to provide voice commands to the device, such as "stop", "next song", "please turn up the volume", and the like, [0014] In addition, the device(s) may utilize one or more characteristics other than voice signatures to determine whether or not the first user provided the speech and, hence, whether or not to interpret the speech as a valid voice command. For instance, the device may utilize a sequence or choice of words, grammar, time of day, a location within the environment from which speech is uttered, and/or other context information to determine whether the first user uttered the speech "stop . . . " In the instant example, the device(s) may determine, from the speaker-identification information and the additional characteristics, that the first user did not utter the word "stop" and, hence, may refrain from stopping playback of the audio. In addition, the device within the environment may query the first user to ensure the device has made the proper determination. For instance, the device may output the following query: "Did you say that you would like to stop the music?" In response to receiving an answer via speech, the device(s) may again utilize the techniques described above to determine whether or not the first user actually provided the answer and, hence, whether to comply with the user's answer. Examiner notes that Hart et al. output a confirmation request audibly.)
 	Alameh et al. and Hart et al. are analogous art because they are from a similar field of endeavor in the Signal Processing techniques and applications. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the steps of detecting lip movement to determine whether the person delivering the second audio input is the same person that delivered the first audio input as taught by Alameh et al., using the teaching of a query as taught by Hart et al., in order to inquire whether to stop providing the response to the first user voice (Hart et al. [0014] In addition, the device(s) may utilize one or more characteristics other than voice signatures to determine whether or not the first user provided the speech and, hence, whether or not to interpret the speech as a valid voice command. For instance, the device may utilize a sequence or choice of words, grammar, time of day, a location within the environment from which speech is uttered, and/or other context information to determine whether the first user uttered the speech "stop . . . " In the instant example, the device(s) may determine, from the speaker-identification information and the additional characteristics, that the first user did not utter the word "stop" and, hence, may refrain from stopping playback of the audio. In addition, the device within the environment may query the first user to ensure the device has made the proper determination. For instance, the device may output the following query: "Did you say that you would like to stop the music?" In response to receiving an answer via speech, the device(s) may again utilize the techniques described above to determine whether or not the first user actually provided the answer and, hence, whether to comply with the user's answer.)
	Alameh et al. in view of Hart et al. fail to explicitly teach a display and control a display to display a user interface inquiring (i.e., the bold limitation). However, Winter et al. teach
 	a display and control a display to display a user interface inquiring (Winter et al. [0016] Upon receipt of a voice command from the user, the speech recognition system may output a confirmation request, such as an utterance or text display, to confirm the voice command uttered by the user. In this disclosure, the term "confirmation request" means a request to confirm that the voice command recognized by the speech recognition system is correct. The user may then confirm the voice command using, for example, her spoken words.)
 	Alameh et al., Hart et al. and Winter et al. are analogous art because they are from a similar field of endeavor in the Signal Processing techniques and applications. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the steps of detecting lip movement to determine whether the person delivering the second audio input is the same person that delivered the first audio input as taught by Alameh et al., using the teaching of a query as taught by Hart et al. to inquire whether to stop providing the response to the first user voice, and using the teaching of a text display as taught by Winter et al., for the benefit of displaying a confirmation request (Winter et al. [0016] Upon receipt of a voice command from the user, the speech recognition system may output a confirmation request, such as an utterance or text display, to confirm the voice command uttered by the user. In this disclosure, the term "confirmation request" means a request to confirm that the voice command recognized by the speech recognition system is correct. The user may then confirm the voice command using, for example, her spoken words.)

 	With respect to Claim 20, Alameh et al. disclose all the limitations of Claim 11 upon which Claim 20 depends. Alameh et al. fail to explicitly teach 
	further comprising: 
 	based on determining that the second user voice uttered by the user is in the audio signal, displaying a user interface (UI) inquiring whether to stop providing the response to the first user voice.  
 	However, Hart et al. teach
 	further comprising: 
 	based on determining that the second user voice uttered by the user is in the audio signal, displaying a user interface (UI) inquiring whether to stop providing the response to the first user voice (Hart et al. Fig. 2 elements 206, 208, 208(1), 208(2), 210 Receive a second voice command requesting performance of a second operation, Same user issue 1st and 2nd command? Yes, Cause performance of the second operation, [0010] the first user may request, via a voice command, to begin playing music on the device or on another device. After the device begins playing the music, the first user may continue to provide voice commands to the device, such as "stop", "next song", "please turn up the volume", and the like, [0014] In addition, the device(s) may utilize one or more characteristics other than voice signatures to determine whether or not the first user provided the speech and, hence, whether or not to interpret the speech as a valid voice command. For instance, the device may utilize a sequence or choice of words, grammar, time of day, a location within the environment from which speech is uttered, and/or other context information to determine whether the first user uttered the speech "stop . . . " In the instant example, the device(s) may determine, from the speaker-identification information and the additional characteristics, that the first user did not utter the word "stop" and, hence, may refrain from stopping playback of the audio. In addition, the device within the environment may query the first user to ensure the device has made the proper determination. For instance, the device may output the following query: "Did you say that you would like to stop the music?" In response to receiving an answer via speech, the device(s) may again utilize the techniques described above to determine whether or not the first user actually provided the answer and, hence, whether to comply with the user's answer. Examiner notes that Hart et al. output a confirmation request audibly.)
 	Alameh et al. and Hart et al. are analogous art because they are from a similar field of endeavor, namely signal processing techniques and applications. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the steps of detecting lip movement to determine whether the person delivering the second audio input is the same person that delivered the first audio input as taught by Alameh et al., using the teaching of a query as taught by Hart et al. to inquire whether to stop providing the response to the first user voice (Hart et al. [0014] In addition, the device(s) may utilize one or more characteristics other than voice signatures to determine whether or not the first user provided the speech and, hence, whether or not to interpret the speech as a valid voice command. For instance, the device may utilize a sequence or choice of words, grammar, time of day, a location within the environment from which speech is uttered, and/or other context information to determine whether the first user uttered the speech "stop . . . " In the instant example, the device(s) may determine, from the speaker-identification information and the additional characteristics, that the first user did not utter the word "stop" and, hence, may refrain from stopping playback of the audio. In addition, the device within the environment may query the first user to ensure the device has made the proper determination. For instance, the device may output the following query: "Did you say that you would like to stop the music?" In response to receiving an answer via speech, the device(s) may again utilize the techniques described above to determine whether or not the first user actually provided the answer and, hence, whether to comply with the user's answer.)
 	Alameh et al. in view of Hart et al. fail to explicitly teach displaying a user interface (e.g., the bold limitation). However, Winter et al. teach
	displaying a user interface (UI) inquiring (Winter et al. [0016] Upon receipt of a voice command from the user, the speech recognition system may output a confirmation request, such as an utterance or text display, to confirm the voice command uttered by the user. In this disclosure, the term "confirmation request" means a request to confirm that the voice command recognized by the speech recognition system is correct. The user may then confirm the voice command using, for example, her spoken words.)
 	Alameh et al., Hart et al. and Winter et al. are analogous art because they are from a similar field of endeavor, namely signal processing techniques and applications. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the steps of detecting lip movement to determine whether the person delivering the second audio input is the same person that delivered the first audio input as taught by Alameh et al., using the teaching of a query as taught by Hart et al. to inquire whether to stop providing the response to the first user voice, and using the teaching of a text display as taught by Winter et al. for the benefit of displaying a confirmation request (Winter et al. [0016] Upon receipt of a voice command from the user, the speech recognition system may output a confirmation request, such as an utterance or text display, to confirm the voice command uttered by the user. In this disclosure, the term "confirmation request" means a request to confirm that the voice command recognized by the speech recognition system is correct. The user may then confirm the voice command using, for example, her spoken words.)

12.	Claim 21 is rejected under 35 U.S.C. 103 as being unpatentable over Alameh et al. (US 2020/0160857 A1) in view of Xu et al. (US 2021/0125600 A1).

	With respect to Claim 21, Alameh et al. disclose all the limitations of Claim 8 upon which Claim 21 depends. Alameh et al. fail to explicitly teach 
 	wherein the at least one processor is further configured to, based on the identified sentence type being interrogative, control to transmit information regarding the second user voice to a server providing a second natural language understanding model.  
	However, Xu et al. teach 
 	wherein the at least one processor is further configured to, based on the identified sentence type being interrogative, control to transmit information regarding the second user voice to a server providing a second natural language understanding model (Xu et al. [0110] If the user sends the voice question “please introduce the current work” to the electronic photo frame, the electronic photo frame sends the question voice to the voice server, and the voice server recognizes the question voice and returns the recognized question text to the electronic photo frame, and then the electronic photo frame sends the question text and the painting ID to the picture service server. The picture service server sends the question text to the general semantic parsing server (here, for example, the name of the painting such as “Mona Lisa” can also be sent to the general semantic parsing server), at the same time the question text and the painting ID are sent to the dedicated semantic parsing server. The general semantic parsing server will parse the question text and obtain the first parsing result. The dedicated semantic parsing server pre-models an LSTM model, which is trained for words in the art field, and is specifically used for parsing the question text in the art field to obtain the second parsing result.)
 	Alameh et al. and Xu et al. are analogous art because they are from a similar field of endeavor, namely signal processing techniques and applications. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the steps of detecting lip movement to determine whether the person delivering the second audio input is the same person that delivered the first audio input as taught by Alameh et al., using the teaching of transmitting information regarding the user’s question to the semantic parsing server as taught by Xu et al. for the benefit of parsing the question text and obtaining the parsing result (Xu et al. [0110] If the user sends the voice question “please introduce the current work” to the electronic photo frame, the electronic photo frame sends the question voice to the voice server, and the voice server recognizes the question voice and returns the recognized question text to the electronic photo frame, and then the electronic photo frame sends the question text and the painting ID to the picture service server. The picture service server sends the question text to the general semantic parsing server (here, for example, the name of the painting such as “Mona Lisa” can also be sent to the general semantic parsing server), at the same time the question text and the painting ID are sent to the dedicated semantic parsing server. The general semantic parsing server will parse the question text and obtain the first parsing result. The dedicated semantic parsing server pre-models an LSTM model, which is trained for words in the art field, and is specifically used for parsing the question text in the art field to obtain the second parsing result.)

13.	Claim 22 is rejected under 35 U.S.C. 103 as being unpatentable over Alameh et al. (US 2020/0160857 A1) in view of Inui (US 2021/0005203 A1).

	With respect to Claim 22, Alameh et al. disclose all the limitations of Claim 1 upon which Claim 22 depends. Alameh et al. fail to explicitly teach 
 	wherein the at least one processor is further configured to: 
 	determine a priority of the second user voice; and 
 	based on a determination that the priority of the second user voice is higher than a threshold value, control to transmit information regarding the second user voice to a server providing a second natural language understanding model.
	However, Inui teaches 
 	wherein the at least one processor is further configured to: 
 	determine a priority of the second user voice (Inui [0083] When the driver and the passenger in the front seat are the users and the voice pattern information of the driver and the passenger in the front seat is registered, only the voice emitted in the state where at least one of the driver and the passenger in the front seat opens his/her mouth is transmitted to the server 25. The camera 18 takes an image of only the driver and the passenger in the front seat. When the driver and the passenger in the front seat emit the voice at the same time, it is applicable that only the voice having predetermined higher priority is transmitted to the server 25); and 
 	based on a determination that the priority of the second user voice is higher than a threshold value, control to transmit information regarding the second user voice to a server providing a second natural language understanding model (Inui [0083] When the driver and the passenger in the front seat are the users and the voice pattern information of the driver and the passenger in the front seat is registered, only the voice emitted in the state where at least one of the driver and the passenger in the front seat opens his/her mouth is transmitted to the server 25. The camera 18 takes an image of only the driver and the passenger in the front seat. When the driver and the passenger in the front seat emit the voice at the same time, it is applicable that only the voice having predetermined higher priority is transmitted to the server 25.)
 	Alameh et al. and Inui are analogous art because they are from a similar field of endeavor, namely signal processing techniques and applications. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the steps of detecting lip movement to determine whether the person delivering the second audio input is the same person that delivered the first audio input as taught by Alameh et al., using the teaching of the priority level as taught by Inui for the benefit of determining whether to transmit the voice to the server (Inui [0083] When the driver and the passenger in the front seat are the users and the voice pattern information of the driver and the passenger in the front seat is registered, only the voice emitted in the state where at least one of the driver and the passenger in the front seat opens his/her mouth is transmitted to the server 25. The camera 18 takes an image of only the driver and the passenger in the front seat. When the driver and the passenger in the front seat emit the voice at the same time, it is applicable that only the voice having predetermined higher priority is transmitted to the server 25.)

Allowable Subject Matter
14. 	Claims 7 and 17 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims, and if the 101 rejection noted above is overcome.
 	The claims stand rejected under 35 U.S.C. 101, and for the application to pass to allowance this rejection needs to be overcome. Any amendment made to overcome the 101 rejection that results in a change in claim scope will require further search and/or consideration in order to determine its allowability. 

Conclusion
15. 	The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure. See PTO-892.
a.	VanBlon et al. (US 2017/0169817 A1). In this reference, VanBlon et al. disclose a method/system for identifying an individual who issued a first command and accepting subsequent commands from the identified user, e.g., for a predetermined period of time. 
b.  	Dusane (US 2019/0049942 A1). In this reference, Dusane discloses a method/system for accepting a command only if it is issued by the same user using the same mobile device.
c.	Ittycheriah et al. (US 5,924,070). In this reference, Ittycheriah et al. disclose a method/system for checking whether the speaker of a subsequent utterance is the same user who was identified at the beginning.  

16. 	Any inquiry concerning this communication or earlier communications from the examiner should be directed to THUYKHANH LE whose telephone number is (571)272-6429. The examiner can normally be reached Mon-Fri: 9am-5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew C. Flanders, can be reached at 571-272-7516. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/THUYKHANH LE/
Primary Examiner, Art Unit 2655