DETAILED ACTION


Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-6, 11, 13-20, 24, 26, and 27 are rejected under 35 U.S.C. 103 as being unpatentable over Di Censo (US Patent Publication No. 2015/0193005) in view of Haley Jr. et al. (US Patent Publication No. 2015/0379770; hereinafter Haley).
With reference to claim 1, Di Censo discloses a method (see paragraphs 73-74; Fig. 8), comprising: 
determining, by computer hardware components (see paragraphs 73-74; Fig. 9), and based on a voice command (in teaching voice prompt (event trigger, steps 810-820); see paragraphs 57-58, 62; Figs. 6, 8) and a gesture (420) of an occupant of a vehicle (see paragraphs 57-58; Figs. 6, 8), a task to be carried out by at least: 
acquiring the voice command (see paragraph 34);
selecting the task (identifying) to be carried out based on the voice command (see paragraphs 31, 34; Fig. 8);
acquiring the gesture (see paragraph 34);
receiving object information about objects outside the vehicle (analyzing data received from the camera to identify and/or characterize the object; see paragraph 24) (see paragraph 34); 
identifying, by the computer hardware components, and based on the voice command and the gesture, a plurality of the objects that correspond to the task (in teaching wide angle image capture device which captures images of potential objects of interest within the angle of capture; see paragraphs 34, 51; Fig. 6);
determining, by the computer hardware components, a context of the voice command (see paragraphs 34, 36);
selecting, by the computer hardware components and based on the context of the voice command, the one of the plurality of objects (see paragraphs 38-39); and
performing, by the computer hardware components and based on the selected one of the plurality of objects, the task (see paragraphs 65-66; Fig. 8).
While Di Censo discloses selecting an object based on directional input (see paragraphs 38-39), Di Censo fails to disclose that the context of the voice command comprises information usable to differentiate one of the plurality of objects from the others, as recited.
Haley discloses that the context of the voice command comprises information usable to differentiate the selected one of the plurality of objects from others of the plurality of objects (see paragraph 38).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to allow a voice command similar to that taught by Haley to be used in a system similar to that taught by Di Censo in order to disambiguate between multiple objects (see Haley; paragraph 35).

With reference to claim 2, Di Censo and Haley disclose the method of claim 1, wherein Di Censo further discloses that the object information is received from at least one external sensor (120-1) or based on information communicated from another vehicle (see paragraph 31; Fig. 8).

With reference to claim 3, Di Censo and Haley disclose the method of claim 1, wherein Di Censo further discloses that the external sensor (120-1) comprises at least one of a radar sensor, a camera, a lidar sensor, or an ultrasonic sensor (see paragraphs 26, 49; Fig. 5).

With reference to claim 4, Di Censo and Haley disclose the method of claim 1, wherein Di Censo further discloses that the gesture is acquired based on at least one internal sensor (120-2) (see paragraphs 27, 49; Fig. 5).

With reference to claim 5, Di Censo and Haley disclose the method of claim 1, wherein Di Censo further discloses that the internal sensor (120-2) comprises at least one of a vision system, an infrared vision system, a near infrared vision system, a red-green-blue vision system, a red-green-blue infrared vision system, or a time-of-flight camera (see paragraph 49).

With reference to claim 6, Di Censo and Haley disclose the method of claim 1, wherein Di Censo further discloses that the gesture (see paragraph 27) comprises at least one of a pointing direction of a finger of the occupant, information indicating whether a hand of the occupant follows a pre-determined trajectory (420), or a viewing direction of the occupant (in teaching a pointing direction detected by an I/O device (410) located on the wrist, based on the position of the shoulder I/O device (415), for pointing in the direction of the trajectory (420); see paragraph 48; Figs. 4A-4B).

With reference to claim 11, Di Censo and Haley disclose the method of claim 1, wherein Di Censo further discloses that the task comprises at least one of a classification task, an identification task, or a validation task (see paragraph 34).

With reference to claim 13, Di Censo discloses a system (100) (see Fig. 1), comprising: 
a plurality of computer hardware components (see paragraphs 73-74; Fig. 9) configured to determine, based on a voice command (in teaching voice prompt (event trigger, steps 810-820); see paragraphs 57-58, 62; Figs. 6, 8) and a gesture (420) of an occupant of a vehicle (see paragraphs 57-58; Figs. 6, 8), a task to be carried out by at least:
acquiring the voice command (see paragraph 34);
selecting the task (identifying) to be carried out based on the voice command (see paragraphs 31, 34; Fig. 8);
acquiring the gesture (see paragraph 34);
receiving object information about objects outside the vehicle (analyzing data received from the camera to identify and/or characterize the object; see paragraph 24) (see paragraph 34); 
identifying, based on the voice command and the gesture, a plurality of the objects that correspond to the task (in teaching wide angle image capture device which captures images of potential objects of interest within the angle of capture; see paragraphs 34, 51; Fig. 6);
determining, by the computer hardware components, a context of the voice command (see paragraphs 34, 36);
selecting, based on the context of the voice command, the one of the plurality of objects (see paragraphs 38-39); and
performing, based on the selected one of the plurality of objects, the task (see paragraphs 65-66; Fig. 8).
While Di Censo discloses selecting an object based on directional input (see paragraphs 38-39), Di Censo fails to disclose that the context of the voice command comprises information usable to differentiate one of the plurality of objects from the others, as recited.
Haley discloses that the context of the voice command comprises information usable to differentiate the selected one of the plurality of objects from others of the plurality of objects (see paragraph 38).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to allow a voice command similar to that taught by Haley to be used in a system similar to that taught by Di Censo in order to disambiguate between multiple objects (see Haley; paragraph 35).

With reference to claim 14, Di Censo and Haley disclose the system of claim 13, wherein Di Censo further discloses that the plurality of computer hardware components are comprised by the vehicle (100) (see paragraph 22; Figs. 1, 5).

With reference to claim 15, Di Censo and Haley disclose the system of claim 13, wherein Di Censo further discloses that the object information is received from at least one external sensor (120-1) or based on information communicated from another vehicle (see paragraph 31; Fig. 8).

With reference to claim 16, Di Censo and Haley disclose the system of claim 13, wherein Di Censo further discloses that the external sensor (120-1) comprises at least one of a radar sensor, a camera, a lidar sensor, or an ultrasonic sensor (see paragraphs 26, 49; Fig. 5).

With reference to claim 17, Di Censo and Haley disclose the system of claim 13, wherein Di Censo further discloses that the gesture is acquired based on at least one internal sensor (120-2) (see paragraphs 27, 49; Fig. 5).

With reference to claim 18, Di Censo and Haley disclose the system of claim 13, wherein Di Censo further discloses that the internal sensor (120-2) comprises at least one of a vision system, an infrared vision system, a near infrared vision system, a red-green-blue vision system, a red-green-blue infrared vision system, or a time-of-flight camera (see paragraph 49).

With reference to claim 19, Di Censo and Haley disclose the system of claim 13, wherein Di Censo further discloses that the gesture comprises at least one of a pointing direction of a finger of the occupant, information indicating whether a hand of the occupant follows a pre-determined trajectory, or a viewing direction of the occupant (in teaching a pointing direction detected by an I/O device (410) located on the wrist, based on the position of the shoulder I/O device (415), for pointing in the direction of the trajectory (420); see paragraph 48; Figs. 4A-4B).

With reference to claim 20, Di Censo discloses a non-transitory computer readable medium comprising instructions, that when executed, configure a plurality of computer hardware components (see paragraphs 73-74; Fig. 9) to determine, based on a voice command (in teaching voice prompt (event trigger, steps 810-820); see paragraphs 57-58, 62; Figs. 6, 8) and a gesture (420) of an occupant of a vehicle (see paragraphs 57-58; Figs. 6, 8), a task to be carried out by at least: 
acquiring the voice command (see paragraph 34);
selecting the task (identifying) to be carried out based on the voice command (see paragraphs 31, 34; Fig. 8);
acquiring the gesture (see paragraph 34);
receiving object information about objects outside the vehicle (analyzing data received from the camera to identify and/or characterize the object; see paragraph 24) (see paragraph 34);
identifying, based on the voice command and the gesture, a plurality of the objects that correspond to the task (in teaching wide angle image capture device which captures images of potential objects of interest within the angle of capture; see paragraphs 34, 51; Fig. 6);
determining a context of the voice command (see paragraphs 34, 36);
selecting, based on the context of the voice command, the one of the plurality of objects (see paragraphs 38-39); and
performing, based on the selected one of the plurality of objects, the task (see paragraphs 65-66; Fig. 8).
While Di Censo discloses selecting an object based on directional input (see paragraphs 38-39), Di Censo fails to disclose that the context of the voice command comprises information usable to differentiate one of the plurality of objects from the others, as recited.
Haley discloses that the context of the voice command comprises information usable to differentiate the selected one of the plurality of objects from others of the plurality of objects (see paragraph 38).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to allow a voice command similar to that taught by Haley to be used in a system similar to that taught by Di Censo in order to disambiguate between multiple objects (see Haley; paragraph 35).

With reference to claim 22, Di Censo and Haley disclose the method of claim 1, wherein Haley further discloses a processor (4) capable of recognizing verbal commands that indicate selection of an object along with the user’s gaze information (see paragraphs 37-39; Figs. 1A-1D), wherein the context of the voice command comprises a color (see paragraphs 86-87, 90; Figs. 7-9).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to allow a voice command identifying the color of an object, similar to that taught by Haley, to be used in a system similar to that taught by Di Censo in order to disambiguate between multiple objects (see Haley; paragraph 35).

With reference to claim 24, Di Censo and Haley disclose the method of claim 23, wherein Haley further discloses that the context of the voice command comprises a color (see paragraphs 86-87, 90; Figs. 7-9).

With reference to claim 26, Di Censo and Haley disclose the non-transitory computer readable medium of claim 20, wherein Haley further discloses that the context of the voice command comprises a color (see paragraphs 86-87, 90; Figs. 7-9).

With reference to claim 27, Di Censo and Haley disclose the system of claim 18, wherein Haley further discloses that the gesture is acquired based on a time-of-flight camera (see paragraphs 63-64, 69).


Claim 28 is rejected under 35 U.S.C. 103 as being unpatentable over Di Censo in view of Haley as applied to claim 13 above, and further in view of Dai et al. (US Patent Publication No. 2014/0347263; hereinafter Dai).
With reference to claim 28, Di Censo and Haley disclose the features as explained above but fail to disclose a validation task as recited.
Dai discloses a system for recognizing gestures having a detection module and a tracking module, wherein a fusion module is capable of performing a task, the task comprising a validation task (see paragraphs 73-74; Fig. 9).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to allow a validation task similar to that taught by Dai to be carried out in a system similar to that taught by Di Censo and Haley in order to provide improved object tracking and performance (see Dai; paragraphs 76-77).


Response to Arguments
Applicant's arguments filed 08/12/2022 have been fully considered but they are not persuasive. The applicant argues that Di Censo and Haley fail to disclose “identifying…based on the voice command and the gesture, a plurality of objects that correspond to the task”; “determining…a context of the voice command, the context of the voice command comprising information usable to differentiate one of the plurality of objects from others of the plurality of objects”; and “selecting…the one of the plurality of objects” as recited. However, the examiner finds that Di Censo discloses a plurality of objects that are captured within the field of view of the camera and that correspond to the task of identification, wherein a gesture and/or voice command is used to further select a specific item of interest (210). Therefore, the examiner finds that Di Censo discloses identifying a plurality of objects that correspond to the task to be carried out as recited. Further, Haley discloses using verbal commands that indicate specific information usable to differentiate one of the plurality of objects from the others, in teaching a command specifying the location of an object for identification (see paragraph 38). In order for the command to be received to identify the object, Haley also discloses a speech recognition algorithm for processing verbal commands, thereby disclosing determining the context of the voice command. The remaining arguments are directed towards the amended subject matter, which the examiner finds Di Censo and Haley disclose as explained above. For these reasons, the examiner finds that Di Censo and Haley disclose the subject matter as recited.


Pertinent Prior Art
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
HAMPIHOLI (USPGPub 2016/0267335) discloses a mobile device having at least an outward-facing camera with a field of view that includes a vehicle environment and a user-facing camera with a field of view that includes the driver of the vehicle; the device includes instructions to receive position and motion data from the mobile device, determine the driver state, and transmit image data comprising video data of the driver and the vehicle environment, and to determine, based on an indication of driver gaze and objects of interest in a path of the vehicle, that the driver gaze is directed to one or more objects of interest (see abstract; paragraphs 77-90; Figs. 1-9).
RAMASWAMY (US Pat. 9,975,483) discloses a combination of location determining and tracking techniques to provide driver assistance, such as visual and/or audible assistance to alert a driver of potential danger (see abstract), wherein the device determines the gaze direction of the user as well as the relative directions and/or locations of nearby objects outside of the vehicle (see column 2, lines 5-53; column 4, line 1 to column 5, line 18; Figs. 1-4).
PAINE et al. (USPGPub 2018/0189354) discloses a computing device comprising a processor with instructions to determine a user’s gaze target and voice input to determine the object of user focus (see paragraphs 16-33; Figs. 1-5).
LEE et al. (USPGPub 2015/0339098) discloses a communicator which receives a pointing signal from a remote control apparatus, a recognizer which recognizes at least one of a voice command and a gesture, and a processor which, in response to receiving the voice command, selects one item among a plurality of items based on at least one of the pointing signal and the gesture (see abstract; paragraphs 25-34, 65-71, 131-197; Figs. 1-3, 7-8).
HERNANDEZ-ABREGO et al. (US2012/0295708) discloses a system for interfacing a user with a computer program utilizing gaze detection and voice recognition, by determining a gaze of a user for specifying an object in one mode and processing a voice command for a task in a second mode (see abstract; paragraphs 31-39; Figs. 1-6).


Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ALECIA DIANE ENGLISH whose telephone number is (571)270-1595. The examiner can normally be reached Mon.-Fri. 7:00am-3:00pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, William Boddie, can be reached at 571-272-0666. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/ADE/Examiner, Art Unit 2625
/WILLIAM BODDIE/Supervisory Patent Examiner, Art Unit 2625