DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claim(s) 1-3, 9, 10, 12-14, and 16-20 are rejected under 35 U.S.C. 103 as being unpatentable over Linden, US 2019/0303724 A1 (Linden), and further in view of Narasimha-Iyer et al., US 2014/0240675 A1 (N-I).
Regarding claim 1, Linden teaches a method for training a gaze tracking model (method for generating 3D gaze predictions based on training a neural network) (Abstract), comprising: 
obtaining a training sample set (obtaining training images) ([0080]), the training sample set comprising multiple training sample pairs (the training images per user eye) ([0080]) (wherein the first and second training images can be input as a pair) ([0093]), each training sample pair (wherein the first and second training images can be input as a pair) ([0093]) comprising an eye sample image and a labeled gaze vector corresponding to the eye sample image (comprising an image of the eye and a 2D gaze vector per eye) ([0053-0054]); 
processing the eye sample images in the training sample set by using an initial gaze tracking model to obtain a predicted gaze vector of each eye sample image (wherein the neural network is used to obtain the predicted 2D gaze vector per user eye based on the training images) ([0080]); 
determining a model loss (determining a loss function for the neural network) ([0086-0087]) according to a distance between the predicted gaze vector and the labeled gaze vector for each eye sample image (determining a distance according to a predicted gaze vector, i.e. angle and distance, and the ground truth measurements) ([0086-0087]); and 
iteratively adjusting (wherein the training is iterative) ([0080]) one or more reference parameters of the initial gaze tracking model (update the parameters of the neural network) ([0086]) until the model loss meets a convergence condition (updating the parameters of the neural network such that its loss function is minimized) ([0086]), to obtain a target gaze tracking model (continuing to update the neural network parameters where minimizing the loss function includes minimizing the angle term and the distance term such that the predicted angle is as close as possible to the ground truth angle and the predicted distance is as close as possible to the ground truth distance) ([0086-0087]).  
However, Linden does not explicitly state that the distance is a “cosine” distance.
N-I teaches methods for improving gaze tracking (Abstract); and wherein comparing the vectors can be based on a cosine similarity measure ([0038]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Linden to include, as a distance, a cosine similarity measure since it allows for the best match to be determined very quickly so it does not add significantly to the procedure time (N-I; [0038]).
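By way of illustration only, and not as a characterization of the implementation disclosed in Linden or N-I, the following minimal sketch shows one way a model loss based on a cosine distance between predicted and labeled gaze vectors could be computed. The function names `cosine_distance` and `model_loss`, and the choice to average over the training sample pairs, are assumptions made for this example.

```python
import math


def cosine_distance(predicted, labeled):
    """Return 1 - cos(angle) between two non-zero vectors of equal length."""
    dot = sum(p * q for p, q in zip(predicted, labeled))
    norm_p = math.sqrt(sum(p * p for p in predicted))
    norm_q = math.sqrt(sum(q * q for q in labeled))
    if norm_p == 0.0 or norm_q == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_p * norm_q)


def model_loss(predicted_vectors, labeled_vectors):
    """Mean cosine distance over all sample pairs (averaging is an assumption)."""
    distances = [
        cosine_distance(p, q) for p, q in zip(predicted_vectors, labeled_vectors)
    ]
    return sum(distances) / len(distances)
```

Under this sketch, a predicted vector that points in the same direction as its label contributes a distance of 0, and an orthogonal prediction contributes a distance of 1, so minimizing the loss drives the predicted direction toward the labeled direction.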

Regarding claim 2, Linden teaches wherein each training sample pair (the training images per user eye) ([0080]) (wherein the first and second training images can be input as a pair) ([0093]) further comprises labeled coordinates of an eyeball in the eye sample image (determining the known 2D coordinate system for the eyes which are projected into a 3D coordinate system) ([0048]); and 
the processing the eye sample images in the training sample set by using an initial gaze tracking model to obtain a predicted gaze vector of each eye sample image (wherein the neural network predicts a 2D gaze vector per eye) ([0025]) comprises: 
processing the eye sample image by using the initial gaze tracking model (wherein the neural network is used to obtain the predicted 2D gaze vector per user eye based on the training images) ([0080]), to obtain the predicted gaze vector of the eye sample image (wherein the neural network predicts a 2D gaze vector per eye) ([0025]) and predicted coordinates of the eyeball (determining the known 2D coordinate system for the eyes which are projected into a 3D coordinate system) ([0048]); and the method further comprises: 
determining the model loss (determining a loss function for the neural network) ([0086-0087]) according to a distance between the predicted coordinates of the eyeball and the labeled coordinates of the eyeball (determining a distance according to a predicted gaze vector, i.e. angle and distance, and the ground truth measurements) ([0086-0087]).
However, Linden does not explicitly state that the distance is a “Euclidean” distance.
N-I teaches methods for improving gaze tracking (Abstract); and wherein comparing the vectors can be based on a Euclidean distance function ([0038]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Linden to include, as a distance, a Euclidean distance function since it allows for the best match to be determined very quickly so it does not add significantly to the procedure time (N-I; [0038]).
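For illustration only, a Euclidean distance between predicted and labeled eyeball coordinates could be computed as sketched below. The function name `eyeball_coordinate_loss` and the averaging over samples are assumptions for this example and are not taken from Linden or N-I.

```python
import math


def eyeball_coordinate_loss(predicted_coords, labeled_coords):
    """Mean Euclidean distance between predicted and labeled eyeball coordinates.

    Each element is a coordinate tuple such as (x, y); averaging over the
    samples is an assumption made for this sketch.
    """
    distances = [math.dist(p, q) for p, q in zip(predicted_coords, labeled_coords)]
    return sum(distances) / len(distances)
```

For example, a prediction of (0, 0) against a label of (3, 4) contributes a distance of 5, while an exact prediction contributes 0.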

Regarding claim 3, Linden teaches wherein the labeled gaze vector is a unit circle-based direction vector (wherein the ground truth vector is based on angle and distance) ([0086]) (and is 3D gaze information) ([0007]), and the method further comprises: normalizing the predicted gaze vector to obtain a normalized gaze vector (the calibration images are normalized to generate a normalized gaze vector) ([0025-0026] and [0075]); and the determining a model loss (determining a loss function for the neural network) ([0086-0087]) according to a distance between the predicted gaze vector and the labeled gaze vector for each eye sample image (determining a distance according to a predicted gaze vector, i.e. angle and distance, and the ground truth measurements) ([0086-0087]) comprises: determining the model loss according to a distance between the normalized gaze vector and the labeled gaze vector (determining the loss function based on the normalized gaze and the ground truth gaze) ([0075]).  
However, Linden does not explicitly state that the distance is a “cosine” distance.
N-I teaches methods for improving gaze tracking (Abstract); and wherein comparing the vectors can be based on a cosine similarity measure ([0038]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Linden to include, as a distance, a cosine similarity measure since it allows for the best match to be determined very quickly so it does not add significantly to the procedure time (N-I; [0038]).
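As an illustrative sketch only, normalizing a predicted gaze vector to unit length before comparing it with a unit-length labeled vector could look like the following. The function names are hypothetical, and the sketch assumes the labeled vector already has unit length, in which case the cosine term reduces to a dot product.

```python
import math


def normalize(vector):
    """Scale a non-zero vector to unit length."""
    length = math.sqrt(sum(c * c for c in vector))
    if length == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [c / length for c in vector]


def cosine_distance_to_unit_label(predicted, labeled_unit):
    """Cosine distance between a normalized prediction and a unit-length label."""
    normalized = normalize(predicted)
    return 1.0 - sum(p * q for p, q in zip(normalized, labeled_unit))
```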

Regarding claim 9, Linden teaches wherein the method further comprises: obtaining a target eye image (a camera captures a 2D image of a user’s gaze, such as by using the user’s eye) ([0025]); processing the target eye image by using the target gaze tracking model to determine a predicted gaze vector of the target eye image; and performing gaze tracking according to the predicted gaze vector (wherein the predictions and the trained neural network can be used for eye tracking systems; also referred to as gaze tracking systems) ([0026] and [0030]).  

Regarding claim 10, Linden teaches wherein the method further comprises: determining coordinates of an eyeball in the target eye image (determining the 2D coordinate system for the eyes which are projected into a 3D coordinate system) ([0048]); and performing, by using the coordinates of the eyeball as a gaze starting point (a two dimensional gaze origin of the eye of the user in the image) ([0007] and [0041]), gaze tracking (gaze tracking) ([0040-0042]) according to a direction indicated by the predicted gaze vector (predicted gaze vector from the origin according to a gaze direction) ([0053-0054]).  

Regarding claim 12, see the rejection made to claim 1, as well as Linden for a computer device (computer system) ([0090]), comprising a processor (a processor) ([0090]) and a memory (a non-transitory computer-readable medium) ([0090]), the memory being configured to store a plurality of program codes (a non-transitory computer-readable medium that stores computer-readable instructions) ([0090]) that, when executed by the processor (the instructions are executable by a processor) ([0090]), cause the computer device to perform a plurality of operations (causing the computer system to perform the specific operations) ([0090]), for they teach all the limitations within this claim.
Regarding claim 13, see the rejection made to claim 2, as well as Linden for a computer device (computer system) ([0090]), comprising a processor (a processor) ([0090]) and a memory (a non-transitory computer-readable medium) ([0090]), the memory being configured to store a plurality of program codes (a non-transitory computer-readable medium that stores computer-readable instructions) ([0090]) that, when executed by the processor (the instructions are executable by a processor) ([0090]), cause the computer device to perform a plurality of operations (causing the computer system to perform the specific operations) ([0090]), for they teach all the limitations within this claim.
Regarding claim 14, see the rejection made to claim 3, as well as Linden for a computer device (computer system) ([0090]), comprising a processor (a processor) ([0090]) and a memory (a non-transitory computer-readable medium) ([0090]), the memory being configured to store a plurality of program codes (a non-transitory computer-readable medium that stores computer-readable instructions) ([0090]) that, when executed by the processor (the instructions are executable by a processor) ([0090]), cause the computer device to perform a plurality of operations (causing the computer system to perform the specific operations) ([0090]), for they teach all the limitations within this claim.
Regarding claim 16, see the rejection made to claim 9, as well as Linden for a computer device (computer system) ([0090]), comprising a processor (a processor) ([0090]) and a memory (a non-transitory computer-readable medium) ([0090]), the memory being configured to store a plurality of program codes (a non-transitory computer-readable medium that stores computer-readable instructions) ([0090]) that, when executed by the processor (the instructions are executable by a processor) ([0090]), cause the computer device to perform a plurality of operations (causing the computer system to perform the specific operations) ([0090]), for they teach all the limitations within this claim.

Regarding claim 17, see the rejection made to claim 1, as well as Linden for a non-transitory computer-readable storage medium (a non-transitory computer-readable medium) ([0090]), storing a plurality of program codes (a non-transitory computer-readable medium that stores computer-readable instructions) ([0090]) that, when executed by a processor (the instructions are executable by a processor) ([0090]) of a computer device (computer system) ([0090]), cause the computer device to perform a plurality of operations (causing the computer system to perform the specific operations) ([0090]), for they teach all the limitations within this claim.
Regarding claim 18, see the rejection made to claim 2, as well as Linden for a non-transitory computer-readable storage medium (a non-transitory computer-readable medium) ([0090]), storing a plurality of program codes (a non-transitory computer-readable medium that stores computer-readable instructions) ([0090]) that, when executed by a processor (the instructions are executable by a processor) ([0090]) of a computer device (computer system) ([0090]), cause the computer device to perform a plurality of operations (causing the computer system to perform the specific operations) ([0090]), for they teach all the limitations within this claim.
Regarding claim 19, see the rejection made to claim 3, as well as Linden for a non-transitory computer-readable storage medium (a non-transitory computer-readable medium) ([0090]), storing a plurality of program codes (a non-transitory computer-readable medium that stores computer-readable instructions) ([0090]) that, when executed by a processor (the instructions are executable by a processor) ([0090]) of a computer device (computer system) ([0090]), cause the computer device to perform a plurality of operations (causing the computer system to perform the specific operations) ([0090]), for they teach all the limitations within this claim.
Regarding claim 20, see the rejection made to claim 9, as well as Linden for a non-transitory computer-readable storage medium (a non-transitory computer-readable medium) ([0090]), storing a plurality of program codes (a non-transitory computer-readable medium that stores computer-readable instructions) ([0090]) that, when executed by a processor (the instructions are executable by a processor) ([0090]) of a computer device (computer system) ([0090]), cause the computer device to perform a plurality of operations (causing the computer system to perform the specific operations) ([0090]), for they teach all the limitations within this claim.

Claim(s) 4-8 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Linden, US 2019/0303724 A1 (Linden), Narasimha-Iyer et al., US 2014/0240675 A1 (N-I), and further in view of Liu et al., US 2020/0202128 A1 (Liu).
Regarding claim 4, Linden teaches wherein the processing the eye sample images in the training sample set by using an initial gaze tracking model to obtain a predicted gaze vector of each eye sample image (wherein the neural network is used to obtain the predicted 2D gaze vector per user eye based on the training images) ([0080]) comprises: 
flipping a first eye sample image in the training sample set into a second eye sample image (mirroring either the left eye or the right eye) ([0052] and [0066]), and correspondingly flipping a labeled gaze vector corresponding to the first eye sample image (wherein when the image is mirrored so is the gaze vector) ([0052-0053] and [0066]), the second eye sample image being an image of an eye in a target direction (the mirrored image being in a specific direction/alignment) ([0052] and [0066]), the initial gaze tracking model being configured to process the image of the eye in the target direction (the neural network using the orientation of the eye in the target alignment) ([0052] and [0066]), the second eye sample image being a left eye sample image when the first eye sample image is a right eye sample image, and the second eye sample image being a right eye sample image when the first eye sample image is a left eye sample image (if the first eye is the left eye then the mirrored image is the right eye and vice versa) ([0052] and [0066]); and performing warping on each eye sample image, to obtain a standard image (the 2D image is normalized to generate warped images centered around the user’s eye at a high resolution) ([0025]).
N-I teaches methods for improving gaze tracking (Abstract); and wherein mapping the standard image in the initial gaze tracking model to obtain a predicted gaze vector of the standard image (the subject’s eye can be compared to the reference eyes, and the mapping function for one or more reference matches can be used to estimate the subject’s gaze direction; using a vector) (Abstract and [0005]).
However, neither explicitly teaches “performing at least one type of processing on the sample image, the at least one type of processing comprising: affine transformation, white balance, auto contrast, or Gaussian blur”; and “using inverted residual blocks”.
Liu teaches a method for computing a dominant class of a scene (Abstract); wherein performing at least one type of processing on the sample image, the at least one type of processing comprising: affine transformation, white balance, auto contrast, or Gaussian blur (wherein the digital camera used to take the image can include processing the white balance based on the adjustments of the capture settings) ([0049-0050]); and wherein inverted residual blocks can be used within the neural network ([0060] and [0101]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of prior art to include the processing and the use of inverted residual blocks, since the blocks extract features from the input image while operating efficiently on the constrained computational resources that may be available, such as on a mobile device (Liu; [0060]).

Regarding claim 5, Linden teaches wherein the method further comprises: flipping, when the standard image is obtained from the first eye sample image (wherein mirroring when the image is obtained from a first unmirrored eye warped image) ([0052]), the predicted gaze vector of the standard image back to a space corresponding to the first eye sample image (wherein by mirroring the image the predicted gaze is then overlaid) ([0066-0067]).  
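For illustration only, the mirroring of an eye image together with its gaze vector, as discussed for claims 4 and 5, can be sketched as follows. The sketch assumes the image is a list of pixel rows and that the first component of the gaze vector is the horizontal component, so that mirroring negates it; this sign convention and the function name `flip_sample` are assumptions, not details taken from Linden.

```python
def flip_sample(image, gaze_vector):
    """Mirror an image left-to-right and negate the horizontal gaze component.

    image: list of rows, each row a list of pixel values.
    gaze_vector: tuple whose first component is assumed to be horizontal.
    """
    flipped_image = [list(reversed(row)) for row in image]
    flipped_gaze = (-gaze_vector[0],) + tuple(gaze_vector[1:])
    return flipped_image, flipped_gaze
```

Because the operation is its own inverse, applying `flip_sample` a second time returns the original image and vector, which corresponds to flipping a predicted gaze vector back to the space of the first eye sample image.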

Regarding claim 6, Linden teaches wherein the eye in the target direction is a left eye (detecting the gaze direction for the left eye) ([0033] and [0066-0067]), and the method further comprises: obtaining a first horizontal coordinate value in a predicted gaze vector of the left eye and a second horizontal coordinate value in a predicted gaze vector of a right eye (detecting the gaze direction and angle generating a vector for each eye’s gaze) ([0033] and [0065-0067]), the left eye and the right eye belonging to the same user object (wherein the left and right eyes are of the user) ([0047-0049]); and correcting the first horizontal coordinate value and the second horizontal coordinate value when the first horizontal coordinate value represents that the left eye looks to the left and the second horizontal coordinate value represents that the right eye looks to the right (wherein the neural network can correctly find the angles and distances of the gaze; and correcting based on minimizing the loss function) ([0027]).  

Regarding claim 7, Linden teaches wherein the correcting the first horizontal coordinate value and the second horizontal coordinate value (updating the calibration parameters that minimize the loss based on distance and angle of the gaze) ([0075]) comprises: determining an average value of a horizontal coordinate of the left eye and a horizontal coordinate of the right eye according to the first horizontal coordinate value and the second horizontal coordinate value (wherein the gaze directions from the left eye and from the right eye may then be combined to form a combined estimated direction) ([0033]); adjusting the predicted gaze vector of the right eye and the predicted gaze vector of the left eye to be parallel to each other (making sure the image around the x-axis is in the horizontal position; such that the vector between the user’s eyes is horizontal) ([0050-0051]), the horizontal coordinate of the right eye after the adjustment being a third horizontal coordinate value (adjusting the calibration parameters that have to do with the gaze; such as distance and angle) ([0074-0075]); and determining a fourth horizontal coordinate value of the horizontal coordinate of the right eye according to the average value and the third horizontal coordinate value (updating the angle of the gaze for the right eye based on the new calibration parameters; while the loss function is minimized) ([0075]).

Regarding claim 8, Liu teaches wherein a number of the inverted residual blocks is less than 19 (wherein the number of blocks is less than 19) (Fig. 8, item 822; [0101]).

Regarding claim 15, see the rejection made to claim 4, as well as Linden for a computer device (computer system) ([0090]), comprising a processor (a processor) ([0090]) and a memory (a non-transitory computer-readable medium) ([0090]), the memory being configured to store a plurality of program codes (a non-transitory computer-readable medium that stores computer-readable instructions) ([0090]) that, when executed by the processor (the instructions are executable by a processor) ([0090]), cause the computer device to perform a plurality of operations (causing the computer system to perform the specific operations) ([0090]), for they teach all the limitations within this claim.

Claim(s) 11 is rejected under 35 U.S.C. 103 as being unpatentable over Linden, US 2019/0303724 A1 (Linden), Narasimha-Iyer et al., US 2014/0240675 A1 (N-I), and further in view of Young et al., US 2019/0354174 A1 (Young).
Regarding claim 11, Linden teaches wherein the method further comprises: after processing the target eye image by using the target gaze tracking model to determine a predicted gaze vector of the target eye image (wherein the neural network predicts a 2D gaze vector per eye) ([0025]). N-I teaches methods for improving gaze tracking (Abstract); and wherein mapping the standard image in the initial gaze tracking model to obtain a predicted gaze vector of the standard image (the subject’s eye can be compared to the reference eyes, and the mapping function for one or more reference matches can be used to estimate the subject’s gaze direction; using a vector) (Abstract and [0005]).
However, neither explicitly teaches “determining, when the target eye image belongs to a video frame in a video stream, a reference eye image corresponding to the target eye image, the reference eye image and the target eye image being images in consecutive video frames in the video stream; and performing smoothing on the predicted gaze vector corresponding to the target eye image according to a predicted gaze vector corresponding to the reference eye image”.
Young teaches obtaining gaze tracking information (Abstract); wherein determining, when the target eye image belongs to a video frame in a video stream (detecting that the gaze is within frames of a video) ([0185]), a reference eye image corresponding to the target eye image, the reference eye image and the target eye image being images in consecutive video frames in the video stream (the gaze tracking can determine that the eyes of the person were in the previous frame and the first frame) ([0185]); and performing smoothing on the predicted gaze vector corresponding to the target eye image according to a predicted gaze vector corresponding to the reference eye image (it is possible to delay detection and use previous position and the next position to obtain a smoother estimate of velocity; smooth pursuit for tracking) ([0076-0077]) (gaze tracking) ([0170]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of prior art to include detecting eyes in video for more accuracy (Young; [0167]).
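For illustration only, smoothing the predicted gaze vector of a target frame using the predicted gaze vector of the preceding (reference) frame could be sketched as a weighted blend. The blending weight `alpha` and the function name `smooth_gaze` are hypothetical; neither the claim language nor Young specifies this particular formula.

```python
def smooth_gaze(target_vector, reference_vector, alpha=0.5):
    """Blend the current prediction with the previous frame's prediction.

    alpha is an assumed weight in [0, 1]; alpha = 1 keeps the current
    prediction unchanged, and smaller values rely more on the reference frame.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return tuple(
        alpha * t + (1.0 - alpha) * r
        for t, r in zip(target_vector, reference_vector)
    )
```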

Contact
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL J VANCHY JR whose telephone number is (571)270-1193. The examiner can normally be reached Monday - Friday 9am - 5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Emily Terrell, can be reached at (571) 270-3717. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/MICHAEL J VANCHY JR/
Primary Examiner, Art Unit 2666
Michael.Vanchy@uspto.gov