DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1-8, 10-17 and 19-20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Lemay (US PG Pub 20140040748).

As per claims 1, 10 and 20, Lemay discloses:
A method, apparatus and non-transitory computer-readable storage medium for speech assistant control, comprising:
a processor (Lemay; Fig 28, item 62; p. 0075); and
memory configured to store instructions executable by the processor (Lemay; Fig 28, item 65; p. 0079), wherein the processor is configured to:
displaying, according to a control instruction corresponding to received speech data, a target interface corresponding to the control instruction (Lemay; p. 0402 - the digital assistant object is displayed in an object region 1254 (FIGS. 12-15 and 33); Fig. 34, item 3402; p. 0408 - after the speech input is received, the digital assistant object is displayed (3402) in an object region of the video display screen) after waking up a speech assistant (Lemay; p. 0407 - “Hey Siri”);
displaying a speech reception identifier in the target interface and controlling to continuously receive speech data, in response to the target interface being different from an interface of the speech assistant (Lemay; p. 0408 - An example of the digital assistant object is the microphone icon 1252 shown in FIG. 12. An exemplary object region 1254 is shown in FIGS. 12-15 and described above. As described above, the digital assistant object may be used to invoke the digital assistant service and/or show its status; also see Fig 19-20 & p. 0186-0187 - In FIG. 19, the user has activated virtual assistant 1002 while viewing email message 1751 from within the email application. In one embodiment, the display of email message 1751 moves upward on the screen to make room for prompt 150 from virtual assistant 1002. This display reinforces the notion that virtual assistant 1002 is offering assistance in the context of the currently viewed email message 1751. Accordingly, the user's input to virtual assistant 1002 will be interpreted in the current context wherein email message 1751 is being viewed);
determining whether a target control instruction to be executed is included in received second speech data based on the second speech data received in a displaying process of the target interface (Lemay; p. 0187 - In FIG. 20, the user has provided a command 2050: "Reply let's get this to marketing right away". Context information, including information about email message 1751 and the email application in which it displayed, is used to interpret command 2050. This context can be used to determine the meaning of the words "reply" and "this" in command 2050, and to resolve how to set up an email composition transaction to a particular recipient on a particular message thread; also see p. 0409); and
displaying an interface corresponding to the target control instruction in response to the target control instruction being included in the second speech data (Lemay; Fig. 20; p. 0187 - In FIG. 20, the user has provided a command 2050: "Reply let's get this to marketing right away". Context information, including information about email message 1751 and the email application in which it displayed, is used to interpret command 2050. This context can be used to determine the meaning of the words "reply" and "this" in command 2050, and to resolve how to set up an email composition transaction to a particular recipient on a particular message thread. In this case, virtual assistant 1002 is able to access context information to determine that "marketing" refers to a recipient named John Applecore and is able to determine an email address to use for the recipient. Accordingly, virtual assistant 1002 composes email 2052 for the user to approve and send. In this manner, virtual assistant 1002 is able to operationalize a task (composing an email message) based on user input together with context information describing the state of the current application; Figs. 34, 13 and 14; p. 0410-0412 - Upon determining that the at least one information item can be displayed in its entirety in the display region of the video display screen (3410--Yes), the at least one information item is displayed (3416) in its entirety in the display region).

As per claims 2 and 11, Lemay discloses:
The method and apparatus according to claims 1 and 10, wherein the displaying an interface corresponding to the target control instruction comprises: displaying a window interface in the target interface in response to there is the window interface corresponding to the target control instruction (Lemay; Fig. 20; p. 0187 - In FIG. 20, the user has provided a command 2050: "Reply let's get this to marketing right away". Context information, including information about email message 1751 and the email application in which it displayed, is used to interpret command 2050. This context can be used to determine the meaning of the words "reply" and "this" in command 2050, and to resolve how to set up an email composition transaction to a particular recipient on a particular message thread. In this case, virtual assistant 1002 is able to access context information to determine that "marketing" refers to a recipient named John Applecore and is able to determine an email address to use for the recipient. Accordingly, virtual assistant 1002 composes email 2052 for the user to approve and send. In this manner, virtual assistant 1002 is able to operationalize a task (composing an email message) based on user input together with context information describing the state of the current application; Figs. 34, 13 and 14; p. 0410-0412 - Upon determining that the at least one information item can be displayed in its entirety in the display region of the video display screen (3410--Yes), the at least one information item is displayed (3416) in its entirety in the display region).

	As per claims 3 and 12, Lemay discloses:
	The method and apparatus according to claims 2 and 11, further comprising: closing the window interface in response to a display duration of the window interface reaching a target duration (Lemay; p. 0298 - different instances and/or embodiments of method 10 may be initiated at one or more different time intervals (e.g., during a specific time interval (target duration), at regular periodic intervals, at irregular periodic intervals, upon demand, and the like); also see p. 0311 - If, after viewing the response, the user is done 790, the method ends).

As per claims 4 and 13, Lemay discloses:
The method and apparatus according to claims 1 and 10, wherein the determining whether a target control instruction to be executed is included in received second speech data based on the second speech data comprises: performing speech recognition on the second speech data to obtain text information corresponding to the second speech data; matching the text information with instructions in an instruction library; and in response to a target instruction matched with the text information being determined and the text information meeting an instruction execution condition, determining that the target control instruction is included in the speech data (Lemay; p. 0187 - In FIG. 20, the user has provided a command 2050: "Reply let's get this to marketing right away". Context information, including information about email message 1751 and the email application in which it displayed, is used to interpret command 2050. This context can be used to determine the meaning of the words "reply" and "this" in command 2050, and to resolve how to set up an email composition transaction to a particular recipient on a particular message thread; also see p. 0409; also see p. 0314 - Referring now to FIG. 3, there is shown a flow diagram depicting a method for using context in speech elicitation and interpretation 100, so as to improve speech recognition according to one embodiment. Context 1000 can be used, for example, for disambiguation in speech recognition to guide the generation, ranking, and filtering of candidate hypotheses that match phonemes to words; also see p. 0337 - The method begins 200. Input text 202 is received. In one embodiment, input text 202 is matched 210 against words and phrases using pattern recognizers 2760, vocabulary databases 2758, ontologies and other models 1050, so as to identify associations between user input and concepts. 
Step 210 yields a set of candidate syntactic parses 212, which are matched for semantic relevance 220 producing candidate semantic parses 222. Candidate parses are then processed to remove ambiguous alternatives at 230, filtered and sorted by relevance 232, and returned; p. 0340).

As per claims 5 and 14, Lemay discloses:
	The method and apparatus according to claims 4 and 13, wherein the instruction execution condition comprises at least one of following conditions: voiceprint features corresponding to the text information are the same as voiceprint features of last speech data; voiceprint features corresponding to the text information are voiceprint features of a target user; and semantic features between the text information and text information corresponding to last speech data are continuous (Lemay; p. 0203-217 - Another source of context data is the user's dialog history 1052 with virtual assistant 1002. Such history may include, for example, references to domains, people, places, and so forth. Referring now to FIG. 15, there is shown an example in which virtual assistant 1002 uses dialog context to infer the location for a command, according to one embodiment. In screen 1551, the user first asks "What's the time in New York"; virtual assistant 1002 responds 1552 by providing the current time in New York City. The user then asks "What's the weather". Virtual assistant 1002 uses the previous dialog history to infer that the location intended for the weather query is the last location mentioned in the dialog history. Therefore its response 1553 provides weather information for New York City).

As per claims 6 and 15, Lemay discloses:
The method and apparatus according to claims 1 and 10, further comprising: in response to the target control instruction being included in the second speech data, displaying text information corresponding to the second speech data at a position corresponding to the speech reception identifier (Lemay; Fig. 20 - "Reply let's get this to marketing right away"; p. 0187 - In FIG. 20, the user has provided a command 2050: "Reply let's get this to marketing right away". Context information, including information about email message 1751 and the email application in which it displayed, is used to interpret command 2050. This context can be used to determine the meaning of the words "reply" and "this" in command 2050, and to resolve how to set up an email composition transaction to a particular recipient on a particular message thread. In this case, virtual assistant 1002 is able to access context information to determine that "marketing" refers to a recipient named John Applecore and is able to determine an email address to use for the recipient. Accordingly, virtual assistant 1002 composes email 2052 for the user to approve and send. In this manner, virtual assistant 1002 is able to operationalize a task (composing an email message) based on user input together with context information describing the state of the current application; Figs. 34, 13 and 14; p. 0410-0412 - Upon determining that the at least one information item can be displayed in its entirety in the display region of the video display screen (3410--Yes), the at least one information item is displayed (3416) in its entirety in the display region).

As per claims 7, 16 and 17, Lemay discloses:
The method and apparatus according to claims 1 and 10, further comprising: displaying a speech waiting identifier in the target interface and monitoring a wake-up word or a speech hot word in response to determining the speech assistant meeting a sleep state (Lemay; p. 0407 – “Hey Siri”); displaying the speech reception identifier in the target interface in response to detecting the wake-up word; and executing a control instruction corresponding to the speech hot word in response to detecting the speech hot word (Lemay; p. 0402 - the digital assistant object is displayed in an object region 1254 (FIGS. 12-15 and 33); Fig. 34, item 3402; p. 0408 - after the speech input is received, the digital assistant object is displayed (3402) in an object region of the video display screen), wherein the determining that the speech assistant meets the sleep state is based on at least one of following situations: the target control instruction is not included in speech data received in a first preset time period; and no speech data is received in a second preset time period, a duration of the second preset time period being longer than that of the first preset time period (Lemay; p. 0298 - different instances and/or embodiments of method 10 may be initiated at one or more different time intervals (e.g., during a specific time interval (target duration), at regular periodic intervals, at irregular periodic intervals, upon demand, and the like); also see p. 0311 - If, after viewing the response, the user is done 790, the method ends).

As per claim 8, Lemay discloses:
	The method according to claim 1, wherein prior to the determining whether a target control instruction to be executed is included in received second speech data based on the second speech data, the method further comprises: acquiring detection information of a terminal, the detection information being configured for determining whether a user sends speech to the terminal; determining whether the received second speech data is speech data sent by the user to the terminal based on the detection information (Lemay; p. 0401 - the digital assistant object is used to show the status of the digital assistant. For example, if the digital assistant is waiting to be invoked it may display a first icon (e.g., a microphone icon), when the digital assistant is "listening" to the user (i.e., recording user speech input), the digital assistant display a second icon (e.g., a colorized icon showing the fluctuations in recorded speech amplitude); and when the digital assistant is processing the user's input it may display a third icon (e.g., a microphone icon with a light source swirling around the perimeter of the microphone icon)); and in response to determining that the second speech data is speech data sent by the user to the terminal, determining whether the target control instruction to be executed is included in the second speech data based on the received second speech data (Lemay; p. 0187 - In FIG. 20, the user has provided a command 2050: "Reply let's get this to marketing right away". Context information, including information about email message 1751 and the email application in which it displayed, is used to interpret command 2050. This context can be used to determine the meaning of the words "reply" and "this" in command 2050, and to resolve how to set up an email composition transaction to a particular recipient on a particular message thread; also see p. 0409).

	As per claim 19, Lemay discloses:
	A mobile terminal comprising the apparatus of claim 10, further comprising a microphone, a speaker, and a display screen (Lemay; p. 0081 - Computing device 60 includes processor(s) 63 which run software for implementing virtual assistant 1002. Input device 1206 can be of any type suitable for receiving user input, including for example a keyboard, touchscreen, microphone (for example, for voice input), mouse, touchpad, trackball, five-way switch, joystick, and/or any combination thereof. Output device 1207 can be a screen, speaker, printer, and/or any combination thereof), wherein the display screen is configured to display interfaces of other applications during user interaction with the speech assistant, and the speech assistant is configured to continuously receive speech data while the display screen displaying the interfaces of the other applications, such that operations corresponding to the continuously received speech data are capable of being executed in the interfaces of the other applications through the speech assistant, without repeated waking-up operations from the user (Lemay; p. 0408 - An example of the digital assistant object is the microphone icon 1252 shown in FIG. 12. An exemplary object region 1254 is shown in FIGS. 12-15 and described above. As described above, the digital assistant object may be used to invoke the digital assistant service and/or show its status; also see Fig 19- 20 & p. 0186-0187 - In FIG. 19, the user has activated virtual assistant 1002 while viewing email message 1751 from within the email application. In one embodiment, the display of email message 1751 moves upward on the screen to make room for prompt 150 from virtual assistant 1002. This display reinforces the notion that virtual assistant 1002 is offering assistance in the context of the currently viewed email message 1751. Accordingly, the user's input to virtual assistant 1002 will be interpreted in the current context wherein email message 1751 is being viewed).

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 9 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Lemay (US PG Pub 20140040748) in view of Andersen (US PG Pub 20190138268).

As per claim 9, Lemay discloses:
The method according to claim 8, of determining whether the received second speech data is speech data sent by the user to the terminal based on the detection information.
Lemay, however, fails to disclose when the detection information is rotation angle information of the terminal, determining that the second speech data is speech data sent by the user to the terminal in response to determining that a distance between a microphone array of the terminal and a speech data source is reduced based on the rotation angle information of the terminal; and when the detection information is face image information, performing gaze estimation based on the face image information, and determining that the second speech data is speech data sent by the user to the terminal in response to determining that a gaze point corresponding to the face image information is at the terminal based on the gaze estimation.
Andersen does teach when the detection information is rotation angle information of the terminal, determining that the second speech data is speech data sent by the user to the terminal in response to determining that a distance between a microphone array of the terminal and a speech data source is reduced based on the rotation angle information of the terminal (Andersen; p. 0089 - In yet another example, the proximity sensor 148 may provide data as to the proximity of other objects within the monitored environment 150 to the smart speaker device 140, including the source of the speech component of the captured audio sample. If the human speaker is within a certain predetermined distance or proximity of the smart speaker device, then this is more indicative of that the human speaker is directing the speech towards the smart speaker device 140. If the human speaker is outside this predetermined distance or proximity, then it is more likely that the human speaker is not directing their speech towards the smart speaker device 140); and when the detection information is face image information, performing gaze estimation based on the face image information, and determining that the second speech data is speech data sent by the user to the terminal in response to determining that a gaze point corresponding to the face image information is at the terminal based on the gaze estimation (Andersen; p. 0088 - As another example, the image and video data from the sensor(s) 146 may be analyzed to identify gaze detection information (eye contact), head nod or other gesture detection indicative of the direction of attention towards the smart speaker device 140. The particular gaze detection and other video or image analysis may be directed to portions of images/video that correlate with the source of the speech component of the captured audio sample. As noted above, this may be done via correlation mechanisms that correlate a determined source location within the monitored environment 150 with elements of the image/video. The correlation may include tracking the eye position of a source of the captured audio sample and movement over several frames in the image/video data. This may include performing image recognition for facial recognition with checking of the eye location and determining the angle of the eye to understand that the eyes are looking to the camera or computer (a range of angle), when the camera or computer is part of the smart speaker device, for example. Facial recognition is not needed initially, but it may assist to lower the computation to find the eyes initially. After face recognition, eye position and time looking at the computer or camera may be used to classify the eye movement as part of gaze detected. If the human speaker is looking at the smart speaker device 140 at the time that the audio sample is captured and maintains such eye contact, or gaze, for a predetermined period of time, this is more indicative that the human speaker is directing the speech towards the smart speaker device 140; otherwise, it is more likely that the human speaker is not directing the speech towards the smart speaker device 140).
Therefore, it would have been obvious to one of ordinary skill in the art to modify the method of Lemay to include when the detection information is rotation angle information of the terminal, determining that the second speech data is speech data sent by the user to the terminal in response to determining that a distance between a microphone array of the terminal and a speech data source is reduced based on the rotation angle information of the terminal; and when the detection information is face image information, performing gaze estimation based on the face image information, and determining that the second speech data is speech data sent by the user to the terminal in response to determining that a gaze point corresponding to the face image information is at the terminal based on the gaze estimation, as taught by Andersen, in order to more accurately determine, by the fusion sensor service, whether the user input is specifically directed to the HCI device based on the captured sensor data (Andersen; p. 0004).

As per claim 18, the claim is directed to an apparatus that recites language similar to the combination of the limitations in claims 8 and 9. Thus, the claim is rejected similarly.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. The prior art made of record and not relied upon includes:
Kudurshian (US PG Pub 20170358305) discloses systems and processes for operating a digital assistant. In one example, a method includes receiving a first speech input from a user. The method further includes identifying context information and determining a user intent based on the first speech input and the context information. The method further includes determining whether the user intent is to perform a task using a searching process or an object managing process. The searching process is configured to search data, and the object managing process is configured to manage objects. The method further includes, in accordance with a determination the user intent is to perform the task using the searching process, performing the task using the searching process; and in accordance with the determination that the user intent is to perform the task using the object managing process, performing the task using the object managing process (Kudurshian; Abstract).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Rodrigo A Chavez whose telephone number is (571)270-0139. The examiner can normally be reached Monday - Friday 9-6 ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Richemond Dorvil, can be reached on (571)272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/RODRIGO A CHAVEZ/Examiner, Art Unit 2658

/RICHEMOND DORVIL/Supervisory Patent Examiner, Art Unit 2658