EXAMINER’S AMENDMENT
1.	Authorization for this examiner’s amendment was given in an interview with Jason M.S. Goodman on 3/24/2021.
2.	The application has been amended as follows: 

1.	(Cancelled)

2.	(Currently Amended)	A method for facial and gesture recognition, the method comprising:
capturing an input image with one or more sensors of a facial and gesture recognition system, the sensors disposed on a motor vehicle, or a remote sensor in communication with the motor vehicle;
passing the input image through a convolutional neural network (CNN) of the facial and gesture recognition system, the CNN having at least a first sub-network and a second sub-network distinct from the first sub-network, 
utilizing a controller of the facial and gesture recognition system in electronic communication with the sensors, the controller having a processor, a memory, and input/output ports, the memory storing the CNN, and the processor configured to execute the CNN;
performing, with the CNN, a first task of facial recognition with the first sub-network;
performing, with the CNN, a second task of gesture recognition with the second sub-network; 
assigning a first confidence level to the facial recognition performed by the first sub-network, and assigning a second confidence level to the gesture recognition performed by the second sub-network;
engaging an authentication mode including:

cropping the image of the registered user and feeding the cropped image of the registered user into a face representation module of the controller and a gesture representation module of the controller;
performing face extraction and feature extraction within the face representation module before feeding the input image into the CNN;
performing frame detection and feature extraction within the gesture representation module before feeding the input image into the CNN; 
calculating, within the CNN, facial confidence levels that the image of the registered user is associated with a particular registered user profile;
calculating, within the CNN, gesture confidence levels that the image of the registered user is associated with a particular gesture;
outputting the facial confidence level as an output of the first sub-network;
outputting the gesture confidence level as an output of the second sub-network;
receiving the first sub-network output and the second sub-network output within a decision module;
performing, within the decision module, face decision criterion calculations comparing the facial confidence level to a first threshold to determine if the facial confidence level exceeds the threshold; 
performing, within the decision module, gesture decision criterion calculations comparing the gesture confidence level to a second threshold to determine if the gesture confidence level exceeds the second threshold; 
performing, within the decision module, a decision criterion calculation to determine whether the facial and gesture system will perform an action, wherein when the ; and 
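
By way of illustration only, the two threshold comparisons recited for the decision module in claim 2 can be sketched as follows. The function names, the 0-to-1 confidence scale, and the example threshold values are assumptions introduced for this sketch and do not appear in the claims. The sketch stops at the two comparisons; it does not combine them into a final decision criterion, because that criterion is not fully recited above.

```python
from typing import NamedTuple


class CriterionResults(NamedTuple):
    """Outcome of the two comparisons (hypothetical container type)."""
    face_exceeds: bool
    gesture_exceeds: bool


def exceeds(confidence: float, threshold: float) -> bool:
    """Return True only when the confidence level strictly exceeds the threshold."""
    return confidence > threshold


def decision_criteria(facial_confidence: float,
                      gesture_confidence: float,
                      first_threshold: float,
                      second_threshold: float) -> CriterionResults:
    """Compare each sub-network output to its own threshold.

    The facial confidence level is compared to the first threshold and the
    gesture confidence level to the second threshold. How the two results
    are combined into an action is deliberately left out of this sketch.
    """
    return CriterionResults(
        face_exceeds=exceeds(facial_confidence, first_threshold),
        gesture_exceeds=exceeds(gesture_confidence, second_threshold),
    )


if __name__ == "__main__":
    # Example values are assumptions chosen only to exercise the comparisons.
    print(decision_criteria(0.92, 0.40, first_threshold=0.80, second_threshold=0.60))
```

A confidence level equal to its threshold is treated here as not exceeding it, which is one reading of "exceeds" and is likewise an assumption.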


3.	(Previously Presented) The method of claim 2 further comprising:
directly receiving the input image in a feature extraction layer (FEL) portion of the CNN, the CNN having multiple convolution, pooling (CPL), and activation layers stacked together with each other;
conducting, within the FEL portion, a learning operation to learn to represent at least a first stage of data of the input image in a form including horizontal and vertical lines, and blobs of color; and
outputting the first stage of data to at least each of the first sub-network and the second sub-network, wherein a first CPL portion directly receives the first stage of data.

4. 	(Previously Presented)  The method of claim 3 further comprising: 
conducting, within the FEL portion, a learning operation to represent at least a second stage of data of the input image in a form including shapes including circles, rectangles, and triangles the input image, wherein a second CPL portion receives the second stage of data.
	
5.	(Previously Presented) The method of claim 4 further comprising:
detecting, within the FEL portion, complex combinations of features from a previous FEL layer or layers and forming representations in a form including wheels, faces, and grids; and


6.	(Previously Presented) The method of claim 5 further comprising:
conducting, within the FEL portion, a learning operation to learn to represent at least a third stage of data defining complex combinations of features from the FEL portion and the first CPL portion to form representations of faces for use in facial recognition and assignment of confidence levels, and
merging data from each of the FEL portion, the first CPL portion, the second CPL portion to generate a fully connected layer, wherein data in the fully connected layer is used to generate an output from the first sub-network.
	
7.	(Currently Amended)	The method of claim 2 further comprising:
engaging a registration mode including:
capturing an image of a new user with the one or more sensors on the motor vehicle or a remote sensor in communication with the motor vehicle, wherein the new user is a user which has not previously been authenticated, or which has been previously authenticated but deleted from memory; 
utilizing the image of the new user for one or more of facial recognition and gesture recognition[[,]]; 
wherein for facial recognition, the image includes a series of images of a face of the new user taken at different angles, and the image is fed into the face representation module of the controller[[,]]; and 

		 
8.	(Previously Presented) The method of claim 7 further comprising:
performing face detection and feature extraction on the image of the new user, including:
	identifying whether the image includes a face;
	identifying and defining points of interest on the face; and
	determining a spatial relationship between the points of interest.
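
By way of illustration only, the step of determining a spatial relationship between points of interest, as recited in claim 8, can be sketched as a table of pairwise distances between named landmarks. The landmark names, the two-dimensional pixel coordinates, and the choice of Euclidean distance are assumptions introduced for this sketch; the claim does not specify how the spatial relationship is expressed.

```python
from itertools import combinations
from math import dist
from typing import Dict, Tuple

Point = Tuple[float, float]


def pairwise_distances(points: Dict[str, Point]) -> Dict[Tuple[str, str], float]:
    """Return the Euclidean distance between every pair of named points.

    Keys are (name_a, name_b) pairs with the names in alphabetical order,
    so each unordered pair appears exactly once.
    """
    names = sorted(points)
    return {(a, b): dist(points[a], points[b]) for a, b in combinations(names, 2)}


if __name__ == "__main__":
    # Hypothetical points of interest, in pixel coordinates.
    landmarks = {
        "left_eye": (0.0, 0.0),
        "right_eye": (6.0, 0.0),
        "nose": (3.0, 4.0),
    }
    for pair, distance in pairwise_distances(landmarks).items():
        print(pair, distance)
```

Other representations, such as angles between landmarks or distances normalized by face size, would serve the same illustrative purpose.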

9.	(Previously Presented) The method of claim 8 further comprising:
performing frame difference analysis and feature extraction on the image of the new user for gesture extraction, the frame difference analysis and feature extraction including:
	analyzing differences between multiple images to determine if a gesture is being deployed; and
	identifying features of the gesture that characterize the gesture.
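
By way of illustration only, the frame difference analysis recited in claim 9 can be sketched as a comparison of consecutive grayscale frames. The frame representation (nested lists of pixel intensities), the mean-absolute-difference measure, and the `motion_threshold` parameter are assumptions introduced for this sketch and are not taken from the claims.

```python
from typing import List, Sequence

Frame = List[List[int]]  # rows of grayscale intensities, assumed 0-255


def mean_abs_difference(frame_a: Frame, frame_b: Frame) -> float:
    """Average absolute per-pixel difference between two equally sized frames."""
    if len(frame_a) != len(frame_b) or any(
        len(row_a) != len(row_b) for row_a, row_b in zip(frame_a, frame_b)
    ):
        raise ValueError("frames must have identical dimensions")
    total = 0
    count = 0
    for row_a, row_b in zip(frame_a, frame_b):
        for pixel_a, pixel_b in zip(row_a, row_b):
            total += abs(pixel_a - pixel_b)
            count += 1
    if count == 0:
        raise ValueError("frames must contain at least one pixel")
    return total / count


def motion_detected(frames: Sequence[Frame], motion_threshold: float = 10.0) -> bool:
    """Return True if any pair of consecutive frames differs by more than the threshold.

    A True result only flags that something changed between frames; deciding
    which gesture (if any) is being made is outside this sketch.
    """
    return any(
        mean_abs_difference(previous, current) > motion_threshold
        for previous, current in zip(frames, frames[1:])
    )


if __name__ == "__main__":
    still = [[0, 0], [0, 0]]
    moved = [[0, 255], [0, 0]]
    print(motion_detected([still, still]))  # no change between frames
    print(motion_detected([still, moved]))  # one pixel changed sharply
```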

10.	(Previously Presented) The method of claim 7 further comprising:
feeding output of the face representation module and the gesture representation module into the CNN as the input image for the CNN, wherein during registration mode the input image is annotated and associated with the new user;
training the CNN with the input image that is annotated and associated with the new user; and
generating a registered user profile based on the input image that is annotated and associated with the new user, wherein the user profile includes data associated with the user’s face and gestures.

11.	(Cancelled)	

12.	(Cancelled)	

13.	(Currently Amended)	A system for facial and gesture recognition, the system comprising:
one or more sensors, the one or more sensors disposed on a motor vehicle or a remote sensor in communication with the motor vehicle, the one or more sensors capturing an input image;
a controller having a processor, a memory, and input/output ports, the input/output ports in electronic communication with the one or more sensors and receiving the input image, the memory storing a convolutional neural network (CNN), the controller passing the input image through the CNN, the CNN having at least a first sub-network and a second sub-network distinct from the first sub-network;
the first sub-network of the CNN performing a first task of facial recognition;
the second sub-network of the CNN performing a second task of gesture recognition; 
the CNN assigning a first confidence level to the facial recognition performed by the first sub-network, and the CNN assigning a second confidence level to the gesture recognition performed by the second sub-network, 
the CNN further including a feature extraction layer (FEL) portion, and multiple convolution, pooling (CPL) and activation layers stacked together with each other, wherein
the FEL portion conducts a learning operation to learn to represent at least a first stage of data of the input image in a form including horizontal and vertical lines, and blobs of color and outputs the first stage of data to at least each of the first sub-network and the second sub-network, wherein a first CPL portion directly receives the first stage of data;

the system crops the image of the registered user and feeds the cropped image of the registered user into a face representation module of the controller and a gesture representation module of the controller; 
the system performs face extraction and feature extraction within the face representation module before feeding the input image into the CNN; 
the system performs frame detection and feature extraction within the gesture representation module before feeding the input image into the CNN;
wherein the CNN calculates facial confidence levels that the image of the registered user is associated with a particular registered user profile;
the CNN calculates gesture confidence levels that the image of the registered user is associated with a particular gesture;
wherein the facial confidence level is an output of the first sub-network, and the gesture confidence level is an output of the second sub-network;
a decision module receives the first sub-network output and the second sub-network output; 
the decision module performs face decision criterion calculations comparing the facial confidence level to a first threshold to determine if the facial confidence level exceeds the first threshold; gesture decision criterion calculations comparing the gesture confidence level to a second threshold to determine if the gesture confidence level exceeds the second threshold; ; and 


14. 	(Currently Amended)	The system of claim 13, wherein the FEL portion conducts a learning operation to learn to:
represent at least a second stage of data of the input image in a form including shapes including circles, rectangles, and triangles the input image, wherein a second CPL portion receives the second stage of data the FEL portion detects complex combinations of features from a previous FEL layer or layers and forms representations including wheels, faces, and grids; 
distribute the first stage of data to each of the first sub-network, the second sub-network, and additional sub-networks, wherein the second stage of data captures the circles, rectangles, and triangles, and wherein the second stage of data is used for object detection, classification, and localization[[,]] ; and 


15.	(Previously Presented) The system of claim 14 wherein the CNN merges data from each of the FEL portion, the first CPL portion, the second CPL portion to generate a fully connected layer, wherein data in the fully connected layer is used to generate an output from the first sub-network.
	
16.	(Currently Amended)	The system of claim 13 further including a registration mode wherein:

wherein for facial recognition, the image includes a series of images of a face of the new user taken at different angles, and the image is fed into the face representation module of the controller, and 
		 
17.	(Previously Presented) The system of claim 16 wherein:
the CNN performs face detection and feature extraction on the image of the new user, including:
identifying whether the image includes a face;
identifying and defining points of interest on the face; and
determining a spatial relationship between the points of interest.

18.	(Previously Presented) The system of claim 17 wherein:
the CNN performs frame difference analysis and feature extraction on the image of the new user for gesture extraction, the frame difference analysis and feature extraction including:
	analyzing differences between multiple images to determine if a gesture is being deployed; and
	identifying features of the gesture that characterize the gesture.

19.	(Previously Presented) The system of claim 16 wherein:

the CNN is trained with the input image that is annotated and associated with the new user; and
a registered user profile is generated by the CNN based on the input image that is annotated and associated with the new user, wherein the user profile includes data associated with the user’s face and gestures.

20.	(Cancelled)	

21.	(Cancelled)	

Allowable Subject Matter
3.	Pending claims 2-10 and 13-19 are allowed. The following is an examiner’s statement of reasons for allowance: The prior art of record, as set forth in the Non-Final Rejection issued on 12/13/2020, teaches some of the previously claimed limitations, except for those of claims 12 and 21. The independent claims now incorporate the allowable language of claims 12 and 21. Moreover, the examiner did not find any reference(s) that teach all of the language as arranged in independent claims 2 and 13.
Further prior art searches failed to produce any relevant results. Thus, the above pending claims are allowed.
4.	Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”
CONCLUSION

5.	The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
US 20200026996 A1 is directed to a method, apparatus and computer program for generating robust automatic learning systems and testing trained automatic learning systems.
[0002] This disclosure relates generally to the field of neural network classifiers and, more specifically, to systems and methods for training deep neural network classifiers with provable robustness to inputs that include norm-bounded adversarial perturbations.
US 20180189581 A1 is directed to vehicle manipulation using convolutional image processing. [0086] FIG. 8 is an example illustrating a convolution neural network (CNN). A convolutional neural network such as this network 800 can be used for deep learning, where the deep learning can be applied to vehicle manipulation using convolutional image processing. A computer is initialized, and images of an occupant of a first vehicle are obtained. A multilayered analysis engine on the computer is trained using the images. Further images, where the further images include facial image data from persons in a second vehicle, are evaluated using the multilayered analysis engine. Manipulation data is provided to the second vehicle. The convolutional neural network can be applied to such tasks as cognitive state analysis, mental state analysis, mood analysis, emotional state analysis, and so on. Cognitive state data can include mental processes, where the mental processes can include attention, creativity, memory, perception, problem solving, thinking, use of language, and the like.
US 20180330178 A1 is directed to cognitive state evaluation for vehicle navigation. [0034] in embodiments, a computing device is used for analyzing images obtained using imaging devices. The analyzing includes cognitive state evaluation of occupants of vehicles. The imaging devices obtain images of one or more occupants of one or more vehicles. The results of the image analysis and processing is vehicle navigation. An imaging device within a first vehicle is used to obtain one or more images of an occupant of the first vehicle. A computing device is used to 
US 10850693 B1 is directed to determining comfort settings in vehicles using computer vision. Embodiments include providing comfort settings in vehicles using computer vision that may (i) utilize interior cameras of a vehicle, (ii) generate comfort profiles for individual users of a vehicle using facial recognition, (iii) determine a comfort profile for a user based on a body type, (iv) pre-adjust seats for a person entering a vehicle, (v) determine characteristics of occupants of a vehicle, (vi) implement fleet learning to train a convolutional neural network, (vii) utilize computer vision with sensor fusion and/or (viii) be implemented as one or more integrated circuits.
6.	Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, please see the pair-direct.uspto.gov web site. Should you have questions regarding access to the PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).
7.	Any inquiry concerning this communication or earlier communications from the Examiner should be directed to Tadesse Hailu, whose telephone number is (571) 272-4051, e-mail address is tadesse.hailu@uspto.gov, and fax number is (571) 273-4051. The Examiner can normally be reached M-F from 10:30 – 7:00 ET. If attempts to reach the Examiner by telephone are unsuccessful, the Examiner’s supervisor, Kieu Vu, Art Unit 2173, can be reached at (571) 272-4057.