DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.


Response to Amendment
This office action is responsive to applicant’s remarks received on November 16, 2022. Claims 1-20 remain pending.


Response to Arguments
Applicant’s arguments with respect to the amended claims filed on November 16, 2022 have been fully considered but they are not persuasive.

A:  Applicant’s Remarks
For applicant’s remarks, see “Applicant Arguments/Remarks Made in an Amendment” filed on November 16, 2022.

B:  Examiner’s Response
Applicant argues that the cited references either alone or in combination do not teach, disclose or suggest…
1.	Soema teaches away from processing voice data to determine that an utterance spoken by the particular person detected comprises a verbal command directed toward the computing device and adjusting a microphone beamform based on a determination that the utterance spoken by the particular user comprises the command directed toward the computing device.

2.	See Sarin at Abstract. However, Sarin also does not disclose or suggest determining that the user is gazing at the camera of the computing device based on the image data captured by the camera, as recited in claim 1 (and similarly in claim 11).

3.	The references fail to disclose at least adjusting a microphone beamform based on a determination that the utterance spoken by the particular user comprises the command directed toward the computing device, as recited in claim 1 (and similarly in claim 11).

Examiner understands Applicant’s arguments but respectfully disagrees.
1.	Soema ‘571 does not teach away from processing voice data to determine that an utterance spoken by the particular person detected comprises a verbal command directed toward the computing device and adjusting a microphone beamform based on a determination that the utterance spoken by the particular user comprises the command directed toward the computing device. For example, Soema ‘571 discloses at Paragraph 0117 wherein a user may enter commands and information into computer 2210 through input devices such as a keyboard 2262 and pointing device 2261, commonly referred to as a mouse, trackball, or touch pad. Other input devices (not shown) may include a microphone, joystick, game pad, satellite dish, scanner, or the like. Because a user may enter a command via a microphone, one of ordinary skill in the art would understand that this command would be a verbal command towards the computing device. Furthermore, Soema ‘571 discloses at Paragraph 0037 wherein the first audio recording and the second audio recording may each be generated using beamforming techniques. The beamforming techniques may be applied to sounds associated with a plurality of microphones such that the plurality of microphones acts as a single highly directional microphone (i.e., sounds originating within a particular range of directions are amplified while sounds originating outside the particular range of directions are attenuated). These techniques allow sounds originating from person 28 to be focused upon while sounds originating from person 29 are suppressed.

2.	NOTE: Determining that the user is gazing at the camera of the computing device based on the image data captured by the camera is claimed in Dependent Claims 4 & 14 and not in Independent Claims 1 & 11. 
With this said, Applicant is correct in assessing that Sarin ‘207 at the Abstract does not disclose or suggest determining that the user is gazing at the camera of the computing device based on the image data captured by the camera. As a result, Examiner cited Soema ‘571 at Paragraph 0028 to disclose this limitation. For example, Paragraph 0028 teaches that the target detection and tracking system 10 may further include a capture device 20. The capture device 20 may include one or more microphones for capturing or recording sounds and one or more cameras for capturing or recording images. In one embodiment, capture device 20 may include a camera that may be used to visually monitor one or more targets including one or more users, such as the user 18. Gestures (including poses) performed by the one or more users may be captured, analyzed, and tracked to perform one or more controls or actions for the user-interface of an operating system or application. One of ordinary skill in the art would understand that, because the capture device 20 may include a camera that may be used to visually monitor one or more targets including one or more users, and because gestures (including poses) performed by the one or more users may be captured, analyzed, and tracked, this could reasonably suggest that the user is gazing at the camera of the computing device based on the image data captured by the camera during a gesture.

3.	The cited references do not fail to disclose at least adjusting a microphone beamform based on a determination that the utterance spoken by the particular user comprises the command directed toward the computing device, as recited in claim 1 (and similarly in claim 11). Soema ‘571 discloses at Paragraph 0037 wherein the first audio recording and the second audio recording may each be generated using beamforming techniques. The beamforming techniques may be applied to sounds associated with a plurality of microphones such that the plurality of microphones acts as a single highly directional microphone (i.e., sounds originating within a particular range of directions are amplified while sounds originating outside the particular range of directions are attenuated). These techniques allow sounds originating from person 28 to be focused upon while sounds originating from person 29 are suppressed.
	As a result, the cited references are not far removed from the art of record. Accordingly, it is submitted that the present application is not in condition for allowance.


Claim Objections
(The previous claim objections are withdrawn in light of the applicant’s amendments.)


Claim Rejections - 35 USC § 103
1.	In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
2.	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

3.	Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Soemo et al. (US 20130060571 A1 hereinafter, Soema ‘571) in combination with Sarin et al. (US 20130124207 A1 hereinafter, Sarin ‘207).
Regarding claim 11; Soema ‘571 discloses a computing device (Fig. 2, Tracking System 10) 
associated with a user (i.e. Fig. 2 depicts one embodiment of a target detection and tracking system 10 including a computing environment 12 and a capture device 20. The target detection and tracking system 10 may be used to detect, recognize, analyze, and/or track human targets, such as user 18. Paragraph 0026)
the computing device comprising: (NOTE: Capture device 20 and computing environment 12 may be integrated within a single computing device. The single computing device may be a mobile device, such as mobile device 11 in Fig. 1. Paragraph 0041)
a plurality of microphones (Microphones – Not Shown i.e. The capture device 20 may include one or more microphones for capturing or recording sounds. Paragraph 0028)
a camera (Fig. 4, Capture Device 20); 
data processing hardware (Fig. 4, Processor 42) 
and memory hardware (Fig. 4, Memory 44) in communication with the data processing hardware and storing instructions that when executed on the data processing hardware cause the data processing hardware to perform operations (i.e. The capture device 20 may include a memory 44 that may store the instructions that may be executed by the processor 42, images or frames of images captured by the 3-D camera or RGB camera, filters or profiles, or any other suitable information, images, or the like. Paragraph 0049);
comprising: 
receiving an acoustic signal comprising voice data corresponding to an utterance spoken by the user of the computing device (i.e. Image and audio processing engine 194 may apply audio and voice recognition techniques to audio data. For example, audio recognition may be used to detect a particular sound. The particular faces, voices, sounds, and objects to be detected may be stored in one or more memories contained in memory unit 192. Paragraph 0051); 
processing the voice data to determine that the utterance spoken by the user comprises a verbal command directed toward the computing device (i.e. A user may enter commands and information into computer 2210 through input devices such as a keyboard 2262 and pointing device 2261, commonly referred to as a mouse, trackball, or touch pad. Other input devices (not shown) may include a microphone, joystick, game pad, satellite dish, scanner, or the like. Paragraph 0117)
receiving image data of a vicinity of the computing device, the image data captured by the camera of the computing device (i.e. Capture device 20 may capture image and audio data relating to one or more users and/or objects. For example, capture device 20 may be used to capture information relating to partial or full body movements, gestures, and speech of one or more users. The information captured by capture device 20 may be received by computing environment 12 and/or a processing element within capture device 20 and used to render, interact with, and control aspects of a gaming or other application. In one example, capture device 20 captures image and audio data relating to a particular user and computing environment 12 processes the captured information to identify the particular user by executing facial and voice recognition software. Paragraphs 0028-0030); 
determining a direction of the user relative to the computing device based on the image data captured by the camera (i.e. Target detection and tracking system 10 determines a first direction (e.g., via sound source localization) within the environment 300 associated with person 28 and a second direction within the environment 300 associated with person 29. The first direction and the second direction may each be identified by a particular direction (or angle) with respect to reference coordinates associated with the capture device 20. A particular direction may also be identified by a set of coordinates associated with the particular direction within the environment 300. Once the first direction and the second direction are determined, target detection and tracking system 10 may generate a first audio recording associated with the first direction and a second audio recording associated with the second direction.  Paragraphs 0036-0037);
and adjusting, based on the direction of the user relative to the computing device and the determination that the utterance spoken by the user comprises a verbal command directed toward the computing device, a microphone beamform (i.e. The first audio recording and the second audio recording may each be generated using beamforming techniques. The beamforming techniques may be applied to sounds associated with a plurality of microphones such that the plurality of microphones acts as a single highly directional microphone (i.e., sounds originating within a particular range of directions are amplified while sounds originating outside the particular range of directions are attenuated). These techniques allow sounds originating from person 28 to be focused upon while sounds originating from person 29 are suppressed. Paragraph 0037).
Soema ‘571 discloses a tracking system 10 with several computing devices, and Paragraph 0041 teaches wherein capture device 20 and computing environment 12 may be integrated within a single computing device. The single computing device may be a mobile device, such as mobile device 11 in Fig. 1. Although Examiner reasonably believes that a single computing device is specifically taught, Examiner cites Sarin ‘207 to clear up any deficiencies in Soema ‘571. For example, Sarin ‘207 teaches at the Abstract wherein a computing device (e.g., a smart phone, a tablet computer, digital camera, or other device with image capture functionality) causes an image capture device to capture one or more digital images based on audio input (e.g., a voice command) received by the computing device. For example, a user's voice (e.g., a word or phrase) is converted to audio input data by the computing device, which then compares (e.g., using an audio matching algorithm) the audio input data to an expected voice command associated with an image capture application. In another aspect, a computing device activates an image capture application and captures one or more digital images based on a received voice command. In another aspect, a computing device transitions from a low-power state to an active state, activates an image capture application, and causes a camera device to capture digital images based on a received voice command.
Soema ‘571 and Sarin ‘207 are combinable because they are from the same field of endeavor of speech systems (Sarin ‘207 at “Background”). 
	At the time the invention was effectively filed, it would have been obvious to a person of ordinary skill in the art to modify the speech system as taught by Soema ‘571 by adding a single computing device as taught by Sarin ‘207. The motivation for doing so would have been that, with the increasing popularity of computing devices (e.g., smart phones) having image capture functionality, there is a need for improving the user experience by allowing quick access to image-capture functionality. Therefore, it would have been obvious to combine Soema ‘571 with Sarin ‘207 to obtain the invention as specified.

Regarding claim 12; Soema ‘571 discloses wherein adjusting the microphone beamform comprises adjusting the microphone beamform to better isolate and capture the voice data in the acoustic signal (i.e. The first audio recording and the second audio recording may each be generated using beamforming techniques. The beamforming techniques may be applied to sounds associated with a plurality of microphones such that the plurality of microphones acts as a single highly directional microphone (i.e., sounds originating within a particular range of directions are amplified while sounds originating outside the particular range of directions are attenuated). These techniques allow sounds originating from person 28 to be focused upon while sounds originating from person 29 are suppressed. Paragraphs 0037 & 0072).

Regarding claim 13; Soema ‘571 discloses wherein adjusting the microphone beamform comprises adjusting the microphone beamform to reduce an amount of noise captured in the acoustic signal from acoustic sources other than the user (i.e. The beamforming techniques may be applied to sounds associated with a plurality of microphones such that the plurality of microphones acts as a single highly directional microphone (i.e., sounds originating within a particular range of directions are amplified while sounds originating outside the particular range of directions are attenuated). Paragraphs 0037 & 0072).

Regarding claim 14; Soema ‘571 discloses wherein the operations further comprise: determining that the user is gazing toward the computing device based on the image data captured by the camera (i.e. The target detection and tracking system 10 may further include a capture device 20. The capture device 20 may include one or more microphones for capturing or recording sounds and one or more cameras for capturing or recording images. In one embodiment, capture device 20 may include a camera that may be used to visually monitor one or more targets including one or more users, such as the user 18. Gestures (including poses) performed by the one or more users may be captured, analyzed, and tracked to perform one or more controls or actions for the user-interface of an operating system or application. Paragraph 0028);
wherein adjusting the microphone is further based on the determining that the user is gazing toward the computing device (i.e. Once the first direction and the second direction are determined, target detection and tracking system 10 may generate a first audio recording associated with the first direction and a second audio recording associated with the second direction. The first audio recording and the second audio recording may each be generated using beamforming techniques. The beamforming techniques may be applied to sounds associated with a plurality of microphones such that the plurality of microphones acts as a single highly directional microphone (i.e., sounds originating within a particular range of directions are amplified while sounds originating outside the particular range of directions are attenuated). These techniques allow sounds originating from person 28 to be focused upon while sounds originating from person 29 are suppressed. Paragraph 0037).

Regarding claim 15; Soema ‘571 discloses wherein the operations further comprise: determining that the image data captured by the camera includes a representation of a person, wherein determining the direction of the user is based on the received audio signal and the determination that the image data captured by the camera includes the representation of the person. (i.e. The environment 300 includes persons 28 and 29 and a non-human object (chair 16). As depicted, person 28 is closer to capture device 20 than person 29. Person 28 is also associated with a different direction (or angle) than person 29 from the point of view of the capture device 20. In some embodiments, target detection and tracking system 10 determines a first direction (e.g., via sound source localization) within the environment 300 associated with person 28 and a second direction within the environment 300 associated with person 29. Paragraph 0036).

Regarding claim 16; Soema ‘571 discloses wherein determining that the image data captured by the camera includes the representation of the person comprises determining that the image data includes a representation of both eyes, a nose, and a mouth of a person (i.e. Object recognition may be used to detect particular objects (e.g., soccer balls, cars, or landmarks) and facial recognition may be used to detect the face of a particular person. Paragraph 0052).

Regarding claim 17; Soema ‘571 discloses wherein determining that the image data captured by the camera includes the representation of the person comprises determining that the image data includes a representation of an authorized user of the one or more computing devices (i.e. Capture device 20 captures image and audio data relating to a particular user and computing environment 12 processes the captured information to identify the particular user by executing facial and voice recognition software. Paragraph 0030).

Regarding claim 18; Soema ‘571 discloses wherein the operations further comprise, after adjusting the microphone beamform, performing speech recognition on the voice data corresponding to the utterance spoken by the user (i.e. The beamforming techniques may be applied to sounds associated with a plurality of microphones such that the plurality of microphones acts as a single highly directional microphone (i.e., sounds originating within a particular range of directions are amplified while sounds originating outside the particular range of directions are attenuated). These techniques allow sounds originating from person 28 to be focused upon while sounds originating from person 29 are suppressed. Once a particular audio recording has been generated (e.g., the first audio recording), the target detection and tracking system 10 may perform local speech recognition on the particular audio recording. Paragraphs 0037-0038).

Regarding claim 19; Soema ‘571 discloses wherein receiving the image data of the vicinity of the computing device comprises receiving the image data of the vicinity of the computing device while receiving the audio signal comprising the voice data corresponding to the utterance spoken by the user (i.e. Image and audio processing engine 194 may apply audio and voice recognition techniques to audio data. For example, audio recognition may be used to detect a particular sound. The particular faces, voices, sounds, and objects to be detected may be stored in one or more memories contained in memory unit 192. Paragraph 0052).

Regarding claim 20; Soema ‘571 discloses wherein the audio signal further comprises noises captured from acoustic sources other than the user (i.e. Echo cancellation may be used to suppress speech sounds and/or other sounds originating outside the first environment that are projected into the first environment through the one or more audio speakers. Noise suppression techniques may also be applied to the one or more sounds in order to remove background noise. In one example, a band-pass filter may be applied to the one or more sounds in order to remove background noise. Paragraphs 0067 & 0083).

Regarding claim 1; Claim 1 contains substantially the same subject matter as claim 11. Therefore, claim 1 is rejected on the same grounds as claim 11.
Regarding claim 2; Claim 2 contains substantially the same subject matter as claim 12. Therefore, claim 2 is rejected on the same grounds as claim 12.
Regarding claim 3; Claim 3 contains substantially the same subject matter as claim 13. Therefore, claim 3 is rejected on the same grounds as claim 13.
Regarding claim 4; Claim 4 contains substantially the same subject matter as claim 14. Therefore, claim 4 is rejected on the same grounds as claim 14.
Regarding claim 5; Claim 5 contains substantially the same subject matter as claim 15. Therefore, claim 5 is rejected on the same grounds as claim 15.
Regarding claim 6; Claim 6 contains substantially the same subject matter as claim 16. Therefore, claim 6 is rejected on the same grounds as claim 16.
Regarding claim 7; Claim 7 contains substantially the same subject matter as claim 17. Therefore, claim 7 is rejected on the same grounds as claim 17.
Regarding claim 8; Claim 8 contains substantially the same subject matter as claim 18. Therefore, claim 8 is rejected on the same grounds as claim 18.
Regarding claim 9; Claim 9 contains substantially the same subject matter as claim 19. Therefore, claim 9 is rejected on the same grounds as claim 19.
Regarding claim 10; Claim 10 contains substantially the same subject matter as claim 20. Therefore, claim 10 is rejected on the same grounds as claim 20.


Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MARCUS T. RILEY, ESQ. whose telephone number is (571)270-1581. The examiner can normally be reached 9-5 M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Tammy P. Goddard, can be reached on 571-272-7773. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

MARCUS T. RILEY, ESQ.
Primary Examiner
Art Unit 2677



/MARCUS T RILEY/Primary Examiner, Art Unit 2677