Notice of Pre-AIA or AIA Status
• The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
• This action is responsive to the following communication: an amendment filed on 04/22/2022.
• Claims 21-40 are currently pending; claims 1-20 have been canceled.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 21-40 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Holtman (US 20180232201).
Regarding claim 21, Holtman discloses a computer-implemented method, comprising: receiving first data corresponding to at least one image (face recognition using the front/back camera of the device, fig. 3, par. 173) representing a user;
receiving audio data (via audio input microphone, fig. 2) corresponding to an utterance spoken by the user;  
processing the first data corresponding to the at least one image (facial recognition where the user’s face is directed at the device, par. 170) to determine that the user is directing the utterance (figs. 3-5, 13-15, par. 173) at a device; and
in response to processing the first data to determine that the utterance is directed at the device, causing speech processing (speech processing, figs. 3-5, 13-15, abstract, pars. 3-5; see the entire reference for more details regarding speech recognition) to be performed using the audio data.
	
 
	Regarding claim 22, Holtman further discloses the computer-implemented method of claim 21, wherein processing the first data comprises: processing the first data to determine the user is facing (using face recognition, fig. 3) the device. 
 
	Regarding claim 23, Holtman further discloses the computer-implemented method of claim 21, further comprising: receiving image data representing the at least one image;  and processing the image data using a first component to determine feature data corresponding to the at least one image, wherein the first data includes the feature data, wherein processing the first data to determine that the utterance is directed at a device comprises processing the feature data using at least one classifier (face recognition, par. 63 and abstract). 
 
	Regarding claim 24, Holtman further discloses the computer-implemented method of claim 21, further comprising: processing the audio data to determine feature data corresponding to the utterance, wherein processing the first data to determine that the at least one image indicates that the user is directing the utterance at a device comprises processing the first data and the feature data using at least one classifier (face recognition, par. 63).
 
	Regarding claim 25, Holtman further discloses the computer-implemented method of claim 24, wherein processing the audio data to determine feature data comprises: performing automatic speech recognition (ASR) on the audio data to determine ASR result data (fig. 6); and processing the ASR result data to determine the feature data (speaker’s voice recognition, par. 63).
 
	Regarding claim 26, Holtman further discloses the computer-implemented method of claim 21, wherein causing speech processing to be performed using the audio data comprises sending the audio data to at least one remote device (fig. 11) for the speech processing. 
 
	Regarding claim 27, Holtman further discloses the computer-implemented method of claim 21, wherein audio data was received without detection of a wakeword (continue conversation without keyword, pars. 83-86) associated with the utterance. 
 
	Regarding claim 28, Holtman further discloses the computer-implemented method of claim 21, wherein audio data was received based at least in part on detection of a wakeword (keyword, fig. 3, par. 43) associated with the utterance. 
 
	Regarding claim 29, Holtman further discloses the computer-implemented method of claim 21, further comprising: processing the first data to determine the user is looking at a second device (second device includes tablet, laptop, figs. 13, 15);  and based at least in part on the user looking at the second device, causing output data to be sent to the second device (figs. 4-5). 
 
	Regarding claim 30, Holtman further discloses the computer-implemented method of claim 29, wherein determination that the user is looking at the second device (smartphone, laptop, tablet, figs. 13, 15, par. 165) occurs after receipt of the audio data. 

Claims 31-40 recite limitations that are similar to, and within the same scope of invention as, those in claims 21-30 above; therefore, claims 31-40 are rejected on the same rationale/basis as set forth for claims 21-30.

Response to Arguments
• Applicant's arguments filed 4/22/2022 have been fully considered but they are not persuasive. 
---Regarding claims 21-40, the applicant argued that the cited prior art of record, Holtman (US 20180232201), fails to teach and/or suggest “processing the first data corresponding to at least one image to determine that the user is directing the utterance at a device”.
In response, the examiner disagrees. Holtman discloses a system/method for speech processing that includes face/voice recognition. Face recognition requires that the user’s face be directed at the device. In some cases, both face and voice recognition are required to use the system (see par. 173). The cited paragraphs are reproduced below.
[0029] Computing system 200 includes at least one sensor 220, an entity tracker 210, a voice listener 230, a parser 240, an intent handler 250, a commitment engine 260, and at least one output device 270. In some examples the sensors 220 may include one or more microphones 222, visible light cameras 224, infrared cameras 226, and connectivity devices 228, such as Wi-Fi or Bluetooth modules. In some examples sensor(s) 220 may comprise stereoscopic and/or depth cameras, head trackers, eye trackers, accelerometers, gyroscopes, gaze detection devices, electric-field sensing componentry, GPS or other location tracking devices, temperature sensors, device state sensors, and/or any other suitable sensor.
[0033] In some examples the voice listener 230 and/or commitment engine 260 may receive context information including associated confidence values from entity tracker 210. As described in more detail below, entity tracker 210 may determine an identity, position, and/or current status of one or more entities within range of one or more sensors, and may output such information to one or more other modules, such as voice listener 230, commitment engine 260, etc. In some examples, entity tracker 210 may interpret and evaluate sensor data received from one or more sensors, and may output context information based on the sensor data. Context information may include the entity tracker's guesses/predictions as to the identity, position, and/or status of one or more detected entities based on received sensor data.
[0034] FIG. 3 is a flow diagram depicting an example method 300 for registering a person with an intelligent assistant computer. Method 300 may be performed by a computing system that includes the intelligent assistant computer, such as the previously described computing systems of FIGS. 1 and 2, for example.
[0035] At 310, visual capture of an initially unregistered person may be performed. As described in further detail below, the computing system may compare facial recognition data extracted from image frames of persons observed by the computing system via one or more cameras to a database of previously observed persons to determine if a person is registered or unregistered. If a person cannot be matched to a previously observed person, the computing system may establish a new person profile for that unrecognized person, and identify that person as being initially unregistered.
[0049] Directing the initially unregistered person for image or audio capture may include one or more of outputting an audible direction via an audio speaker and/or outputting a visual direction via a graphical display device. The initially unregistered person may be directed by the computing system responsive to or at later time after receiving the spoken command from the registered person to register the initially unregistered person. For example, referring again to FIG. 1, one or more image frames of person 122 (i.e., “Tom”) that are captured by a camera (e.g., of computing device 110 or camera 118) may be presented on graphical display device 112 to enable person 122 to position his face within the field of view of the camera. The computing system may provide feedback to the person by outputting a visual and/or audible prompt for the person move closer/further away, up/down, right/left, or to speak louder, to repeat a word or phrase, etc.
[0050] In at least some implementations, the spoken command to register the initially unregistered person may be received after some or all of the one or more image frames depicting the initially unregistered person are captured via the one or more cameras. In these implementations, the one or more image frames may be stored in a data storage device prior to receiving the spoken command to register the initially registered person. The one or more image frames may be retrieved from the data storage system, and presented via a graphical display device for review by the registered person. For example, referring again to FIG. 1, one or more image frames of person 122 (i.e., “Tom”) may be captured by a camera (e.g., of computing device 110 or camera 118) prior to person 120 providing the spoken command to register person 122.
[0170] The examples described above include recognition of an audible keyword to activate one or more functions of the intelligent assistant computing system. In some examples, functions of the system may be activated by recognition of one or more other signals. Such signals may include, for example, a user gesture captured by a camera, a user eye-gaze, and a face direction of the user.
[0173] In some examples, captured images of a user may be used to identify which device a user is facing when speaking. In some examples, indicators such as face detection may be used to identify a user. In some examples, captured video may indicate lip movement of a user that may be used to associate a spoken keyword with the user. In an environment with multiple users, such indicators also may identify the particular user who is addressing a device. As such, both voice and physical recognition may be used as parameters to distinguish a user from among the plurality of users.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 


Any inquiry concerning this communication or earlier communications from the examiner should be directed to THIERRY L PHAM whose telephone number is (571)272-7439. The examiner can normally be reached M-F, 11-6.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Benny Tieu, can be reached at (571)272-7490. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/THIERRY L PHAM/Primary Examiner, Art Unit 2674