DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

In the response to this Office Action, the Examiner respectfully requests that support be shown for language added to any original claims on amendment and any new claims. That is, indicate support for newly added claim language by specifically pointing to page(s) and line numbers in the specification and/or drawing figure(s). This will assist the Examiner in prosecuting this application.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.

Claims 1-19 and 26 are rejected under 35 U.S.C. 103 as being unpatentable over the non-patent literature publication “Eye Tracking for Everyone” by Krafka et al. (hereinafter "Krafka") in view of U.S. Patent Application Publication 2014/0002349 A1 to Hansen (hereinafter "Hansen").
Regarding Claim 1, Krafka teaches a method of camera-based gaze tracking, the method comprising: at a device with one or more processors and a computer-readable storage medium:
receiving a stream of pixel events output by a camera (Abstract; Fig. 5; Sections 3.1-4.2 of Krafka; eye tracking software that works on commodity hardware such as mobile phones and tablets), the camera comprising a plurality of pixel sensors positioned to receive light from a surface of an eye, each respective pixel event generated in response to a respective pixel sensor detecting a change in light intensity of the light at a respective camera pixel that exceeds a comparator threshold (Abstract; Fig. 5; Sections 3.1-4.2 of Krafka; eye tracking software that works on commodity hardware such as mobile phones and tablets… GazeCapture application could involve showing workers dots on a screen at random locations and recording their gaze using the front-facing camera); deriving an image from the stream of pixel events, wherein deriving the image comprises accumulating pixel events of the stream of pixel events for multiple event camera pixels; generating a gaze characteristic using a neural network, wherein generating the gaze characteristic comprises providing the image as input to the neural network, the neural network trained to determine the gaze characteristic using a training dataset of training images that identify the gaze characteristic; and tracking a gaze of the eye based on the gaze characteristic generated using the neural network (Abstract; Sections 3.1-4.2 of Krafka; estimate head pose, h, and gaze direction, g… eye tracking CNN… Inputs include left eye, right eye, and face images detected and cropped from the original frame (all of size 224×224)… (even though the face already contains them) to provide the network with a higher resolution image of the eye to allow it to identify subtle changes).
Krafka does not explicitly disclose an event camera.
Hansen teaches an event camera (Para. 12, 19 of Hansen; Dynamic Vision Sensor is used, a sequence of (2D) events representing changes in intensity is output).
Therefore, before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to include an event camera using the teachings of Hansen in order to modify the device taught by Krafka. The motivation to combine these analogous arts would have been to advantageously provide an eye tracking system that can keep track of the gaze even when the image is distorted with reflections from inferior light sources (Para. 11-14 of Hansen).
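
For illustration only, the limitation of Claim 1 reciting "accumulating pixel events of the stream of pixel events for multiple event camera pixels" can be pictured with the minimal Python sketch below. It is not taken from Krafka, Hansen, or the application. The event layout (x, y, timestamp, polarity), the name accumulate_events, and the signed-count accumulation rule are all assumptions chosen for the example.

```python
from typing import Iterable, List, Tuple

# Hypothetical event layout (an assumption, not from the cited art):
# (x, y, timestamp_us, polarity), where polarity is +1 or -1.
Event = Tuple[int, int, int, int]


def accumulate_events(events: Iterable[Event], width: int, height: int) -> List[List[int]]:
    """Sum event polarities per pixel to form a 2D image (rows indexed by y)."""
    image = [[0] * width for _ in range(height)]
    for x, y, _timestamp, polarity in events:
        # Ignore events whose coordinates fall outside the sensor area.
        if 0 <= x < width and 0 <= y < height:
            image[y][x] += polarity
    return image


# Three events on a 3x2 sensor: two positive events at pixel (0, 0)
# and one negative event at pixel (2, 1).
events = [(0, 0, 10, +1), (0, 0, 25, +1), (2, 1, 30, -1)]
image = accumulate_events(events, width=3, height=2)
```

Under these assumptions, the resulting image is an ordinary 2D array that could be supplied to a frame-based network in the same way as a conventional camera frame.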

Regarding Claim 2, the combination of Krafka and Hansen teaches that the neural network is a convolutional neural network (CNN) (Fig. 5; Section 4.1 of Krafka).

Regarding Claim 3, the combination of Krafka and Hansen teaches that generating the gaze characteristic using the neural network comprises: down-sampling the image to produce a lower resolution image, the lower resolution image having a lower resolution than the image; generating an initial pupil characteristic using a first neural network, wherein generating the initial pupil characteristic comprises providing the lower resolution image as input to the first neural network; determining a portion of the image based on the initial pupil characteristic, the portion of the image and the image having a same resolution and the portion of the image having fewer pixels than the image; generating a correction of the pupil characteristic using a second neural network, wherein generating the correction comprises providing the portion of the image as input to the second neural network; and determining the gaze characteristic by (Abstract; Sections 4.1-4.2 of Krafka; Inputs include left eye, right eye, and face images detected and cropped from the original frame (all of size 224×224)… (even though the face already contains them) to provide the network with a higher resolution image of the eye to allow it to identify subtle changes).

Regarding Claim 4, the combination of Krafka and Hansen teaches that the gaze characteristic is indicative of a center of a pupil of the eye, a contour of the pupil of the eye, one or more glints generated using a light emitting diode (LED), a probability that each of the one or more glints is visible to the event camera, or a gaze direction (Figs. 1-2; Para. 5-6, 67-77 of Hansen).

Regarding Claim 5, the combination of Krafka and Hansen teaches that the neural network is a recurrent neural network trained to remember data from previously inputted images in identifying gaze characteristics for subsequently inputted images (Fig. 5; Section 4.1 of Krafka).
NOTE – Utilizing a recurrent neural network to remember data from previously inputted images in identifying gaze characteristics for subsequently inputted images would only require routine skill for a person of ordinary skill in the art based on the combination of Krafka and Hansen. Therefore, one of ordinary skill in the art would have pursued a recurrent neural network that remembers data from previously inputted images in identifying gaze characteristics for subsequently inputted images, with a reasonable expectation of success, yielding predictable results without undue experimentation, in order to determine eye gaze direction with high accuracy and robustness to lighting conditions of the working environment.

Regarding Claim 6, the combination of Krafka and Hansen teaches that deriving the image from the stream of pixel events comprises deriving an intensity reconstruction image based on tracking pixel events of the stream of pixel events over time (Para. 19 of Hansen; Dynamic Vision Sensor is used, a sequence of (2D) events representing changes in intensity is output).

Regarding Claim 7, the combination of Krafka and Hansen teaches that deriving the image from the stream of pixel events comprises deriving a timestamp image, the timestamp image encoding an amount of time since an event occurred at each event camera pixel (Para. 19 of Hansen; Dynamic Vision Sensor is used, a sequence of (2D) events representing changes in intensity is output).
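
For illustration only, the "timestamp image" of Claim 7, which encodes an amount of time since an event occurred at each event camera pixel, can be pictured with the minimal Python sketch below. It is not taken from Hansen or the application. The event layout, the name timestamp_image, the reference time now_us, and the use of None for pixels with no event are all assumptions chosen for the example.

```python
from typing import Iterable, List, Optional, Tuple

# Hypothetical event layout (an assumption, not from the cited art):
# (x, y, timestamp_us, polarity).
Event = Tuple[int, int, int, int]


def timestamp_image(
    events: Iterable[Event], width: int, height: int, now_us: int
) -> List[List[Optional[int]]]:
    """Per pixel, the time elapsed since its most recent event (None if no event)."""
    last_seen: List[List[Optional[int]]] = [[None] * width for _ in range(height)]
    for x, y, timestamp, _polarity in events:
        if 0 <= x < width and 0 <= y < height:
            previous = last_seen[y][x]
            # Keep only the most recent event time at each pixel.
            if previous is None or timestamp > previous:
                last_seen[y][x] = timestamp
    return [
        [None if t is None else now_us - t for t in row]
        for row in last_seen
    ]


events = [(0, 0, 10, +1), (0, 0, 25, +1), (2, 1, 30, -1)]
ts_image = timestamp_image(events, width=3, height=2, now_us=100)
```

In this sketch, pixels that fired recently hold small values and pixels that never fired hold None; how such pixels are encoded in practice is a design choice the sketch does not settle.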

Regarding Claim 8, the combination of Krafka and Hansen teaches that deriving the image from the stream of pixel events comprises deriving a frequency response image indicative of glint events generated using a plurality of frequency-modulated light emitting diodes (LEDs) (Para. 19 of Hansen; Dynamic Vision Sensor is used, a sequence of (2D) events representing changes in intensity is output).
NOTE – Utilizing a plurality of frequency-modulated light emitting diodes (LEDs) would only require routine skill for a person of ordinary skill in the art based on the combination of Krafka and Hansen. Therefore, one of ordinary skill in the art would have pursued deriving, from the stream of pixel events, a frequency response image indicative of glint events generated using a plurality of frequency-modulated light emitting diodes (LEDs), with a reasonable expectation of success, yielding predictable results without undue experimentation, in order to determine eye gaze direction with high accuracy and robustness to lighting conditions of the working environment.

Regarding Claim 9, the combination of Krafka and Hansen teaches that deriving the image from the stream of pixel events comprises: distinguishing pupil events and glint events in the event stream based on a frequency associated with the pixel events in the stream of pixel events; deriving an intensity reconstruction image or a timestamp image based on the pupil events; and deriving a frequency response image based on the glint events, the frequency response image indicative of glint events generated using a plurality of frequency-modulated light emitting diodes (LEDs) (Para. 19 of Hansen; Dynamic Vision Sensor is used, a sequence of (2D) events representing changes in intensity is output).
NOTE – Utilizing a plurality of frequency-modulated light emitting diodes (LEDs) would only require routine skill for a person of ordinary skill in the art based on the combination of Krafka and Hansen. Therefore, one of ordinary skill in the art would have pursued deriving a frequency response image based on the glint events, the frequency response image indicative of glint events generated using a plurality of frequency-modulated light emitting diodes (LEDs), with a reasonable expectation of success, yielding predictable results without undue experimentation, in order to determine eye gaze direction with high accuracy and robustness to lighting conditions of the working environment.

Regarding Claim 10, the combination of Krafka and Hansen teaches that generating the gaze characteristic using the neural network comprises providing as input to the neural network: an intensity reconstruction image; a timestamp image; or a frequency response image derived from the stream of pixel events (Para. 19 of Hansen; Dynamic Vision Sensor is used, a sequence of (2D) events representing changes in intensity is output).

Regarding Claim 11, the combination of Krafka and Hansen teaches that tracking the gaze of the eye comprises updating the gaze characteristic in real time as subsequent pixel events in the event stream are used to derive additional images and the additional images are used as input to the neural network to generate updated gaze characteristics (Abstract; Sections 4.1-4.2 of Krafka; iTracker, a convolutional neural network for eye tracking, which achieves a significant reduction in error over previous approaches while running in real time (10–15fps) on a modern mobile device).

Regarding Claim 12, the combination of Krafka and Hansen teaches identifying an item displayed on a display based on the gaze characteristic or the updating of the gaze characteristic (Fig. 2; Section 3.1 of Krafka).

Regarding Claim 13, the combination of Krafka and Hansen teaches displaying movement of a graphical indicator on a display based on the gaze characteristic or the updating of the gaze characteristic (Fig. 2; Section 3.1 of Krafka).

Regarding Claim 14, the combination of Krafka and Hansen teaches selecting an item displayed on a display based on the gaze characteristic or the updating of the gaze characteristic (Fig. 2; Section 3.1 of Krafka).

Regarding Claim 15, Krafka teaches a system comprising: a non-transitory computer-readable storage medium; and one or more processors coupled to the non-transitory computer-readable storage medium, wherein the non-transitory computer-readable storage medium comprises program instructions that, when executed on the one or more processors, cause the system to perform operations comprising: generating a stream of pixel events at a camera (Abstract; Fig. 5; Sections 3.1-4.2 of Krafka; eye tracking software that works on commodity hardware such as mobile phones and tablets), the camera comprising a plurality of pixel sensors positioned to receive light from a surface of an eye, each respective pixel event generated in response to a respective pixel sensor detecting a change in light intensity of the light at a respective camera pixel that exceeds a comparator threshold (Abstract; Fig. 5; Sections 3.1-4.2 of Krafka; eye tracking software that works on commodity hardware such as mobile phones and tablets… GazeCapture application could involve showing workers dots on a screen at random locations and recording their gaze using the front-facing camera); deriving an image from the stream of pixel events, the image derived by accumulating pixel events of the stream of pixel events for multiple camera pixels; down-sampling the image to produce a lower resolution image, the lower resolution image having a lower resolution than the image;
identifying an initial pupil center using the lower resolution image as input to a first neural network; determining a portion of the image based on the initial pupil center, the portion of the image and the image having a same resolution and the portion of the image having fewer pixels than the image; generating a correction to the pupil center using the portion of the image as input to a second neural network; and determining a final pupil center by adjusting the initial pupil characteristic using the correction to the pupil center (Abstract; Sections 3.1-4.2 of Krafka; estimate head pose, h, and gaze direction, g… eye tracking CNN… Inputs include left eye, right eye, and face images detected and cropped from the original frame (all of size 224×224)… (even though the face already contains them) to provide the network with a higher resolution image of the eye to allow it to identify subtle changes).
Krafka does not explicitly disclose an event camera.
However, Hansen teaches an event camera (Para. 12, 19 of Hansen; Dynamic Vision Sensor is used, a sequence of (2D) events representing changes in intensity is output).
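Therefore, before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to include an event camera using the teachings of Hansen in order to modify the device taught by Krafka. The motivation to combine these analogous arts would have been to advantageously provide an eye tracking system that can keep track of the gaze even when the image is distorted with reflections from inferior light sources (Para. 11-14 of Hansen).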

Regarding Claim 16, the combination of Krafka and Hansen teaches that the first neural network and the second neural network are convolutional neural networks (CNNs) (Fig. 5; Section 4.1 of Krafka).

Regarding Claim 17, the combination of Krafka and Hansen teaches that the first neural network is configured to: solve a first regression problem to identify the initial pupil center; solve a second regression problem to identify one or more glints; and perform a classification to determine whether each of the one or more glints is likely to be visible or not visible (Fig. 5; Sections 4.1-4.2, 5.3-5.4 of Krafka; Figs. 1-2; Para. 5-6, 67-77 of Hansen).

Regarding Claim 18, the combination of Krafka and Hansen teaches that the first neural network and the second neural network are recurrent neural networks trained to remember data from previously inputted images (Fig. 5; Section 4.1 of Krafka).
NOTE – Utilizing a recurrent neural network to remember data from previously inputted images in identifying gaze characteristics for subsequently inputted images would only require routine skill for a person of ordinary skill in the art based on the combination of Krafka and Hansen. Therefore, one of ordinary skill in the art would have pursued making the first neural network and the second neural network recurrent neural networks trained to remember data from previously inputted images, with a reasonable expectation of success, yielding predictable results without undue experimentation, in order to determine eye gaze direction with high accuracy and robustness to lighting conditions of the working environment.

Regarding Claim 19, the combination of Krafka and Hansen teaches that deriving the image from the stream of pixel events comprises: deriving an intensity reconstruction image based on tracking pixel events of the stream of pixel events over time; and deriving a timestamp image, the timestamp image encoding an amount of time since an event occurred at each event camera pixel (Para. 19 of Hansen; Dynamic Vision Sensor is used, a sequence of (2D) events representing changes in intensity is output).

Regarding Claim 26, Krafka teaches a non-transitory computer-readable storage medium, storing program instructions computer-executable on a computer to perform operations comprising: receiving a stream of pixel events output by a camera (Abstract; Fig. 5; Sections 3.1-4.2 of Krafka; eye tracking software that works on commodity hardware such as mobile phones and tablets), the camera comprising a plurality of pixel sensors positioned to receive light from a surface of an eye, each respective pixel event generated in response to a respective pixel sensor detecting a change in light intensity of the light at a respective camera pixel that exceeds a comparator threshold (Abstract; Fig. 5; Sections 3.1-4.2 of Krafka; eye tracking software that works on commodity hardware such as mobile phones and tablets… GazeCapture application could involve showing workers dots on a screen at random locations and recording their gaze using the front-facing camera); deriving an image from the stream of pixel events, wherein deriving the image comprises accumulating pixel events of the stream of pixel events for multiple camera pixels; generating a gaze characteristic using a neural network, wherein generating the gaze characteristic comprises providing the image as input to the neural network, the neural network trained to determine the gaze characteristic using a training dataset of training images that identify the gaze characteristic; and tracking a gaze of the eye based on the gaze characteristic generated using the neural network (Abstract; Sections 3.1-4.2 of Krafka; estimate head pose, h, and gaze direction, g… eye tracking CNN… Inputs include left eye, right eye, and face images detected and cropped from the original frame (all of size 224×224)… (even though the face already contains them) to provide the network with a higher resolution image of the eye to allow it to identify subtle changes).
Krafka does not explicitly disclose an event camera; distinguishing pupil events and glint events in the event stream based on a frequency associated with the pixel events in the stream of pixel events; deriving an image based on the pupil events, wherein the image is an intensity reconstruction image or a timestamp image.
However, Hansen teaches an event camera (Para. 12, 19 of Hansen; Dynamic Vision Sensor is used, a sequence of (2D) events representing changes in intensity is output); distinguishing pupil events and glint events in an event stream based on a frequency associated with pixel events in the stream of pixel events; deriving an image based on the pupil events, wherein the image is an intensity reconstruction image or a timestamp image (Figs. 1-2; Para. 12, 19, 39, 53, 67-85 of Hansen; Dynamic Vision Sensor is used, a sequence of (2D) events representing changes in intensity is output… processing of the image comprises identifying in the image a field where at least one of the user's eye pupils is present; where the field is defined to have a smaller size than the image; and where processing to compute an image feature such as a position or circumference of the user's pupil in the image is confined to the field… image extract 203 shows a section of the eyeball where glints from various light sources are shown. The glints are shown as circles, but are of course of any shape depending on the light sources. The glints are designated g1, g2, . . . g9. Eye features, such as the eye pupil, are designated reference numeral 201 and the fovea reference numeral 202).
Therefore, before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to include an event camera; distinguishing pupil events and glint events in the event stream based on a frequency associated with the pixel events in the stream of pixel events; and deriving an image based on the pupil events, wherein the image is an intensity reconstruction image or a timestamp image, using the teachings of Hansen in order to modify the device taught by Krafka. The motivation to combine these analogous arts would have been to advantageously provide an eye tracking system that can keep track of the gaze even when the image is distorted with reflections from inferior light sources (Para. 11-14 of Hansen).

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ABHISHEK SARMA whose telephone number is (571)272-9887.  The examiner can normally be reached on Mon - Fri 8:00-5:00.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Alexander Eisen, can be reached on 571-272-7687.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished 



/ABHISHEK SARMA/

Primary Examiner, Art Unit 2622