Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION

Status of Claims
Claims 1-4, 6-13 and 15-20 are currently pending in this application.
Claims 5 and 14 have been canceled.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 5/31/2022 is hereby acknowledged.  All references have been considered by the examiner. Initialed copies of the PTO-1449 are included in this correspondence.

Response to Amendments
The applicant amended independent claims 1, 9 and 18 to incorporate features of canceled claims 5 and 14.  The claims now recite features similar to: 
“capturing a video stream comprising images of multiple users of the computing system”; 
“receive the images of the multiple users from the camera”; 
“detect face regions of each of the multiple users within the images”; 
“detect facial feature regions of each of the multiple users within the images based on the detected face regions”;
“analyze the detected facial feature regions of the multiple users to determine which one of the multiple users is a current presenter”; 
“determine whether the images represent a complete disengagement of the current presenter from the computing system based on the detected facial feature regions of the current presenter”; 
“if the images do not represent the complete disengagement of the current presenter user from the computing system, detect an eye region of the current presenter user within the images based on the detected facial feature regions”; 
“compute a desired eye gaze direction of the current presenter based on the detected eye region”; 
“generate gaze-adjusted images based on the desired eye gaze direction of the current presenter, wherein …”

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.


Claims 1, 3-4, 6-13, 15-18 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Nilsson et al. (2016/0323541; IDS) in view of Novelli et al. (2021/0201021).

Regarding claim 1, Nilsson teaches a computing system (e.g., FIG. 1 shows a communication system 100, which comprises a network 116, a user device 104 accessible to a user 102 (near-end user), and another user device 120 accessible to another user 118 (far-end user). Nilsson: [0019] L.1-4. The user device 104 comprises a processor 108, Nilsson: [0020] L.1), comprising:
a camera for capturing a video stream comprising images of multiple users of the computing system (e.g., the user device 104 is connected to the network 116 - computer storage in the form of a memory 110, a display 106 in the form of a screen, a camera 124 and (in some embodiments) a depth sensor 126. Nilsson: [0020] L.4-8. The camera 124 has an image capture component that faces outwardly of the display. That is, the camera 124 is located relative to the display so that when the user 102 is in front of and looking at the display, the camera 124 captures a frontal view of the user's face. Nilsson: [0021] L.5-10. See 1_1 below); and 
a processor for executing computer-executable instructions (e.g., the user devices 104, 120 may also include an entity (e.g. software) that causes hardware of the devices to perform operations, e.g., processors functional blocks, and so on. Nilsson: [0118] L.1-4) that cause the processor to: 
receive the images of the multiple users from the camera (e.g., Video to be transmitted to the far-end device 102 (near-end video) is received (locally) by the gaze correction system 201 from the camera 124; Nilsson: [0027] L.1-3.  See 1_1 below); 
detect face regions of each of the multiple users within the images (e.g., The facial tracker 208 tracks the user's face, and the modification of the received video by the eye replacement module 202 is based on the tracking of the user's face by the facial tracker 208. Nilsson: [0027] L.13-16.  See 1_1 below); 
detect facial feature regions of each of the multiple users within the images based on the detected face regions (e.g., the tracking of the user's face by the facial tracker 208 indicates a location(s) corresponding to the user's eyes in a to-be modified frame and a replacement eye image(s) is inserted at a matching location(s). Nilsson: [0027] L.16-20.  See 1_1 below); 
analyze the detected facial feature regions of the multiple users to determine which one of the multiple users is a current presenter (e.g., a user device 104 accessible to a user 102 (near-end user), and another user device 120 accessible to another user 118 (far-end user). Nilsson: [0019] L.2-4. See 1_2 below);
determine whether the images represent a complete disengagement of the current presenter from the computing system based on the detected facial feature regions (e.g., The modification is selective i.e. frames of the received video are modified when and only when eye gaze correction is considered appropriate. Nilsson: [0028] L.1-3. The controller 246 also temporarily halts the eye gaze correction module 202 if the eye locations indicated by the model M differ too much from the currently tracked eye locations indicated by the eye tracker 248. Nilsson: [0050] L.15-19. More particularly, the tracking module 208 is only able to function properly when each of one or more of the user's pose coordinates (r, α)=(x, y, z, P, R, Y) has a respective current value that is within a respective range of possible values. Should any of those coordinate(s) move out of its respective range of possible values, the tracker fails and the model M therefore becomes unavailable to the other functional modules. It can only re-enter the active tracking mode, so that the model once again becomes available to the other functional modules, when every one of those coordinate(s) has returned to a value within its respective range of possible values. Nilsson: [0053].  When the user’s pose coordinates move out of their respective ranges of possible values and the model M becomes unavailable, this out-of-range condition (unavailability of model M) is interpreted as disengagement of the user(s) 102/118 from the system.  As users 102/118 are the persons communicating (call participants) through the devices 104 and 120 respectively, each of the user(s) 102/118 is taken as a current presenter to the devices 104 and 120 respectively); 
if the images do not represent the complete disengagement of the current presenter from the computing system (e.g., The modification is selective i.e. frames of the received video are modified when and only when eye gaze correction is considered appropriate. Nilsson: [0028] L.1-3.  Therefore, when the eye gaze correction is considered appropriate, the received frames of the video are modified for users 102, 118), detect an eye region of the current presenter within the images based on the detected facial feature regions (e.g., the tracking of the user's (of 102, 118) face by the facial tracker 208 indicates a location(s) corresponding to the user's eyes in a to-be modified frame and a replacement eye image(s) is inserted at a matching location(s). Nilsson: [0027] L.16-20);
compute a desired eye gaze direction of the current presenter based on the detected eye region (e.g., The eye gaze correction module 202 modifies the (locally) received video to replace the eyes of the user 102 with an image of eyes looking at the camera. The replacement eye images come from “templates” Ts, which are stored in the memory 110. The facial tracker 208 tracks the user's face, and the modification of the received video by the eye replacement module 202 is based on the tracking of the user's face by the facial tracker 208. Nilsson: [0027] L.8-16. The gaze corrector 202 receives a pair of templates (template pair) T selected for the current frame by the template selection module 204. A template pair T in the context of the described embodiments means a set of left and right templates {tl, tr} which can be used to replace the user's (102, 118) left and right eyes respectively, and which in this example comprise images of the user's left and right eyes respectively looking directly at the camera. Nilsson: [0043] L.1-8); 
generate gaze-adjusted images based on the desired eye gaze direction of the current presenter (e.g., The selectively modified video is outputted by the gaze correction system 201 as an outgoing video feed. Nilsson: [0029] L.1-2. If and only if the pose of the user's (102, 118) head is within a particular region of 3D space, and oriented towards the camera, then eye gaze correction is performed. Nilsson: [0051] L.3-6), wherein the gaze-adjusted images comprise at least one of a saccadic eye movement, a micro-saccadic eye movement, or a vergence eye movement (e.g., The capture process can be a “manual” process i.e. in which the user is asked to look directly at the camera, or automatic using a gaze estimation system. In the embodiments described herein, the templates Ts are parts of individual frames (template frames) of a template video that was captured with the camera 124 when the user was looking directly at it, and each template comprises an image of only a single eye (left or right). That is, the templates Ts are from temporally consecutive frames of the template video. The template video is short, e.g. having a duration of about 1 to 2 seconds. During this time, the user's eyes may exhibit one or more saccades. A saccade in this context is a very rapid, simultaneous movement between two (temporal) phases of fixation, in which the eyes are fixated on the camera 124. That is, a saccade is a very rapid movement away from then back to the camera 124. Note that the user is considered to be looking directly at the camera both during such phases of fixation and throughout any intervening saccades.  Nilsson: [0031] L.7-24); and
replace the images within the video stream with the gaze-adjusted images (e.g., The selectively modified video is outputted by the gaze correction system 201 as an outgoing video feed.  Nilsson: [0029] L.1-2).
While Nilsson does not explicitly teach the following features, Novelli teaches:
(1_1). multiple users (e.g., FIG. 6 shows a gaze detection system 500 that is tracking multiple users and objects simultaneously, according to certain embodiments. The gaze detection system shows both a user 504 and bystander 604 scanning over various visual elements on display 520, according to certain embodiments. User 504 is shown holding an input device 610 (e.g., a remote control) with one or more input elements. The visual elements 522 may include, for example, active or passive windows, icons, media interfaces, and the like. In some instances, the field-of-view 530 of camera 512 may be wide enough to allow system 500 to perform gaze tracking on multiple people. Referring to FIG. 6, system 600 uses camera 512 to perform gaze tracking on both user 504 and bystander 604. In these instances, system 500 is tasked with determining which visual element(s) a user may want to interact with (e.g., execute, open a file, select an icon) and which tracked person is providing the confirmation input.  Novelli: [0082] and Fig. 6; reproduced below for reference.

[Image: Novelli, FIG. 6]

Therefore, the camera and gaze tracking of Nilsson are modified to capture images of, and perform gaze tracking on, multiple people);
(1_2).  analyze the detected facial feature regions of the multiple users to determine which one of the multiple users is a current presenter (e.g., User 504 is shown holding an input device 610 (e.g., a remote control) with one or more input elements. The visual elements 522 may include, for example, active or passive windows, icons, media interfaces, and the like. Novelli: [0082] L.6-9.  In certain embodiments, the observational data can include facial data corresponding to a relative distance that a face of the user (or multiple faces) is to the image sensor(s) (e.g., camera 512). In such cases, method 900 can further comprise confirming that the detected user is the user and not the bystander based on a detected face of a plurality of detected faces that is closest to the display or facing the display. In some embodiments, tracking a location of the user's face can be based on the observational data of the detected user and calculating and periodically updating a confidence score that indicates a likelihood that the detected user is the user and not the bystander based on the tracked location of the user's face. Novelli: [0104].  Therefore, besides the users 102 and 118, who are the persons in front of the devices 104 and 120 respectively, the user 504 detected holding the remote control 610 and the user closest to the display are taken as the person to communicate (a presenter), and the gaze correction module 202 is applied to correct the gaze of that person);
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Novelli with the teaching of Nilsson so that the gaze of the presenter (call participant) is corrected to be perceived as looking directly at the camera.
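
For illustration only, the presenter determination relied upon from Novelli [0104] (the detected face that is closest to the display or facing the display) may be sketched as follows. This is a minimal sketch of one possible reading and is not taken from either reference: the names `DetectedFace` and `pick_presenter`, the distance field, and the rule of filtering by "facing the display" before choosing the closest face are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class DetectedFace:
    """Hypothetical per-face record; the field names are assumptions."""
    person_id: str
    distance_m: float      # estimated distance from the face to the display
    facing_display: bool   # whether the face is oriented toward the display


def pick_presenter(faces: Sequence[DetectedFace]) -> Optional[DetectedFace]:
    """Return the face treated as the presenter, or None if nobody qualifies.

    Assumed rule: keep only faces that are facing the display, then take
    the one closest to the display.
    """
    candidates = [f for f in faces if f.facing_display]
    if not candidates:
        return None
    return min(candidates, key=lambda f: f.distance_m)
```

Under this sketch, a bystander who is nearer to the display but looking away is not selected; gaze correction would then be applied only to the returned face.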

Regarding claim 3, the combined teaching of Nilsson and Novelli teaches the computing system of claim 1, wherein the computer-executable instructions further cause the processor to determine whether the images represent the complete disengagement of the current presenter from the computing system based on the detected facial feature regions by: 
estimating a head pose of the current presenter based on the detected facial feature regions (e.g., The facial tracker 208 is a 3D mesh based face tracker, which gives 6-degree of freedom (DOF) output in 3D space: x, y, z, pitch (P), roll (R), and yaw (Y), which are six independent variables. These six degrees of freedom constitute what is referred to herein as a “pose space”. As illustrated in FIG. 3B, the x, y and z coordinates are (Cartesian) spatial coordinates, whereas pitch, roll and yaw are angular coordinates representing rotation about the x, z and y axes respectively. An angular coordinate means a coordinate defining an orientation of the user's (102 / 118) face.  Nilsson: [0034] L.1-10); 
estimating an orientation of the camera based on the detected facial feature regions (e.g., When operating in an active tracking mode, the tracker 208 uses the RGB (i.e. camera output only) or RGB and depth input (i.e. camera and depth sensor outputs) to generate a model M of the user's face. The model M indicates a current orientation and a current location of the user's face, and facial features of the user 102. Nilsson: [0035]); and 
determining whether the images represent the complete disengagement of the current presenter from the computing system based on the detected facial feature regions, the estimated head pose of the current presenter, and the estimated orientation of the camera (e.g., the user's (102 / 118) face has angular coordinates α=(P, R, Y) in this coordinate system (bold typeface denoting vectors), and the model M comprises current values of the angular coordinates α. The current values of the angular coordinates α represent the current orientation of the user's face relative to the camera 124.  Nilsson: [0036] L.1-6. The user's (102 / 118) face also has spatial coordinates r=(x, y, z), and the model M also comprises current values of the spatial coordinates in this example. These represent the current location in three-dimensional space of the user's (102 / 118) face relative to the camera 124. They can for example represent the location of a particular known reference point on or near the user's (102 / 118) face, such as a central point of their face or head, or point at or near which a particular facial, cranial or other head feature is located.  Nilsson: [0037].  The spatial and angular coordinates of the user's (102 / 118) face (r, α)=(x, y, z, P, R, Y) constitute what is referred to herein as a pose of the user, the user's (102 / 118) current pose being represented by the current values of (r, α).  Nilsson: [0038]. As the user's (102 / 118) current pose (r, α) is computed relative to the camera 124 by the tracker 208, it is possible to place limits—denoted Δ herein and in the figures—on these values within which accurate gaze correction can be performed. As long as the tracked pose remains within these limits Δ, the gaze correction module 202 remains active and outputs its result as the new RGB video formed of the modified frames F′ (subject to any internal activation/deactivation within the eye gaze correction module 202, e.g. as triggered by blink detection). 
Conversely, if the tracked pose is not within the defined limits Δ then original video is supplied for compression and transmission unmodified.  Nilsson: [0057]).
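
For illustration only, the pose-limit gating cited above from Nilsson [0053] and [0057] (gaze correction remains active while the tracked pose (r, α) stays within the limits Δ; otherwise the original video is supplied unmodified) may be sketched as follows. This is a minimal sketch under stated assumptions: the names `Pose`, `LIMITS`, `pose_within_limits` and `select_output_frame`, and all numeric limit values, are hypothetical and do not appear in Nilsson.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Pose:
    """Six-degree-of-freedom pose (x, y, z, P, R, Y) relative to the camera."""
    x: float
    y: float
    z: float
    pitch: float
    roll: float
    yaw: float


# Hypothetical (min, max) limits per coordinate; the values are illustrative only.
LIMITS: Dict[str, Tuple[float, float]] = {
    "x": (-0.5, 0.5),
    "y": (-0.5, 0.5),
    "z": (0.3, 1.5),
    "pitch": (-20.0, 20.0),
    "roll": (-20.0, 20.0),
    "yaw": (-25.0, 25.0),
}


def pose_within_limits(pose: Pose, limits: Dict[str, Tuple[float, float]] = LIMITS) -> bool:
    """True only if every pose coordinate lies inside its own range."""
    return all(lo <= getattr(pose, name) <= hi for name, (lo, hi) in limits.items())


def select_output_frame(original, corrected, pose: Optional[Pose]):
    """Pass the corrected frame only while the pose is tracked and within limits.

    A pose of None stands for tracker failure (model M unavailable).
    """
    if pose is None or not pose_within_limits(pose):
        return original
    return corrected
```

In this sketch a single out-of-range coordinate is enough to fall back to the unmodified frame, which mirrors the "any of those coordinate(s)" language quoted from Nilsson [0053].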

Regarding claim 4, the combined teaching of Nilsson and Novelli teaches the computing system of claim 3, wherein the computer-executable instructions further cause the processor to compute the desired eye gaze direction of the current presenter based on the detected eye region, the estimated head pose of the current presenter, and the estimated orientation of the camera (e.g., The model M generated by the facial tracker 208 is used upon initialisation of the eye gaze correction module 202, and in particular by the eye tracker 248 to determine (at least approximate) current locations of the user's (102 / 118) eyes. Thereafter, the model co-ordinates are not used to locate the eyes, until a re-initialisation occurs, as using the model co-ordinates would alone led to obvious jittering of the eye over time. Rather, after initialisation, the eyes are tracked separately in the live video over scale, location and rotation by the eye tracker 248 for example based on image recognition. The templates are transformed based on this tracking by the eye tracker 248, to match the current tracked orientation and scale of the user's (102 / 118) eyes. The mixing function is also computed based on this tracking by the eye tracker 248 so that the correct part of the frame F, i.e. in which the applicable eye is present, is replaced.  Nilsson: [0047].  The model M indicates a current orientation and a current location of the user's face, and facial features of the user 102 (118).  Nilsson: [0035] L.4-6. The user's (102 / 118) face has angular coordinates α=(P, R, Y) in this coordinate system (bold typeface denoting vectors), and the model M comprises current values of the angular coordinates α. The current values of the angular coordinates α represent the current orientation of the user's (102 / 118) face relative to the camera 124. Nilsson: [0036] L.1-6. 
The user's (102 / 118) face also has spatial coordinates r=(x, y, z), and the model M also comprises current values of the spatial coordinates in this example. These represent the current location in three-dimensional space of the user's (102 / 118) face relative to the camera 124. Nilsson: [0037] L.1-5. The eye gaze correction module 202 modifies the (locally) received video to replace the eyes of the user 102 (118) with an image of eyes looking at the camera. The replacement eye images come from “templates” Ts, which are stored in the memory 110. The facial tracker 208 tracks the user's face, and the modification of the received video by the eye replacement module 202 is based on the tracking of the user's face by the facial tracker 208. Nilsson: [0027] L.8-16.  The replacement eye images are taken as the desired eye image (eye gaze)). 

Regarding claim 6, the combined teaching of Nilsson and Novelli teaches the computing system of claim 1, wherein the computer-executable instructions further cause the processor to:
automatically monitor whether a user-selectable on/off mechanism is moved to an "on" position or an "off" position (e.g., Should any of those coordinate(s) move out of its respective range of possible values, the tracker fails and the model M therefore becomes unavailable to the other functional modules. Nilsson: [0053] L.5-8.  It would have been obvious that the availability of model M is analogous to turning model M on or off, and that monitoring the coordinates is analogous to monitoring whether a manual switch is enabled or disabled, so that the user can conveniently control the feature); and 
prevent the replacement of the images within the video stream with the gaze-adjusted images when the user-selectable on/off mechanism is moved to the "off" position (e.g., when model M is unavailable to the other functional modules, that is, when the tracked pose is not within the defined limits Δ, the original video is supplied for compression and transmission unmodified.  Nilsson: [0057] L.10-12.  Therefore, it would have been obvious that the manual switch controls the modification (gaze correction, Nilsson: [0057] L.4; eye replacement; Nilsson: [0061] L.7) of the video for compression and transmission).

Regarding claim 7, the combined teaching of Nilsson and Novelli teaches the computing system of claim 1, wherein the computer-executable instructions further cause the processor to compute the desired eye gaze direction of the current presenter by:
computing the desired eye gaze direction of the current presenter such that an eye gaze of the current presenter is directed towards the camera (e.g., The eye gaze correction module is configured to modify every frame of at least one continuous interval of the video to replace each of the user's (102 / 118) eyes with that of a respective template selected for that frame, whereby the user (102 / 118) is perceived to be looking directly at the camera in the modified frames.  Nilsson: [0005] L.9-14. The video is modified to replace the user's (102 / 118) eyes as they appear therein with those of a pre-recorded image of their eyes that have the desired eye gaze. Another person viewing the modified video will thus perceive the user (102 / 118) to be making eye contact with them. In the context of a video call, the perceived eye contact encourages the call participants to better engage with one another.  Nilsson: [0017] L.4-11. The eye gaze correction module 202 modifies the (locally) received video to replace the eyes of the user 102 with an image of eyes looking at the camera. The replacement eye images come from “templates” Ts, which are stored in the memory 110. The facial tracker 208 tracks the user's face, and the modification of the received video by the eye replacement module 202 is based on the tracking of the user's face by the facial tracker 208. Nilsson: [0027] L.8-16. The facial tracker 208 is a 3D mesh based face tracker, which gives 6-degree of freedom (DOF) output in 3D space: x, y, z, pitch (P), roll (R), and yaw (Y), which are six independent variables. These six degrees of freedom constitute what is referred to herein as a “pose space”. Nilsson: [0034] L.1-5); or 
computing the desired eye gaze direction of the current presenter such that the eye gaze of the current presenter is directed towards a focal point of interest that is located on a display device of the computing system.

Regarding claim 8, the combined teaching of Nilsson and Novelli teaches the computing system of claim 1, wherein the computer-executable instructions further cause the processor to generate the gaze-adjusted images based on the desired eye gaze direction of the current presenter by:
analyzing the images to determine at least one of a saccadic eye movement, a micro-saccadic eye movement, or a vergence eye movement of the current presenter within the images (e.g., The eye gaze correction module 202 modifies the (locally) received video to replace the eyes of the user 102 (118) with an image of eyes looking at the camera. The replacement eye images come from “templates” Ts, which are stored in the memory 110.  Nilsson: [0027] L.8-12. In the embodiments described herein, the templates Ts are parts of individual frames (template frames) of a template video that was captured with the camera 124 when the user was looking directly at it, and each template comprises an image of only a single eye (left or right). That is, the templates Ts are from temporally consecutive frames of the template video. The template video is short, e.g. having a duration of about 1 to 2 seconds. During this time, the user's eyes may exhibit one or more saccades. A saccade in this context is a very rapid, simultaneous movement between two (temporal) phases of fixation, in which the eyes are fixated on the camera 124. That is, a saccade is a very rapid movement away from then back to the camera 124. Note that the user (102 / 118) is considered to be looking directly at the camera both during such phases of fixation and throughout any intervening saccades. Nilsson: [0031] L.9-24); 
comparing an eye gaze of the current presenter within the images with the desired eye gaze direction of the current presenter (e.g., The computer storage holds a plurality of templates (which may, for example, be from temporally consecutive frames of a template video in some embodiments), each comprising a different image of an eye of the user (102 / 118) looking directly at the camera. The eye gaze correction module is configured to modify every frame of at least one continuous interval of the video to replace each of the user's (102 / 118) eyes with that of a respective template selected for that frame, whereby the user is perceived to be looking directly at the camera in the modified frames. The template selection module is configured to select the templates for the continuous interval. Different templates are selected for different frames of the continuous interval so that the user's (102 / 118) eyes exhibit animation throughout the continuous interval.  Nilsson: [0005] L.4-18. The gaze corrector 202 receives a pair of templates (template pair) T selected for the current frame by the template selection module 204. A template pair T in the context of the described embodiments means a set of left and right templates {tl, tr} which can be used to replace the user's (102 / 118) left and right eyes respectively, and which in this example comprise images of the user's (102 / 118) left and right eyes respectively looking directly at the camera. Nilsson: [0043] L.1-8.  A number (some or all) of the templates Ts are compared with one or more current and/or recent live frames of the video as received from the camera 124 to find a template pair that matches the current frame, and the matching template pair is selected (S606) by the template selection module 204 to be used for correction of the current frame by the eye gaze correction module 202. Recent frames means within a small number of frames from the current video—e.g. of order 1 or 10. 
A template pair matching the current frame means a left and a right template that exhibits a high level of visual similarity with their respective corresponding parts of the current and/or recent frame(s) relative to any other template frames that were compared with the current and/or recent frame(s). This ensures a smooth transition back to active gaze correction.  Nilsson: [0075] L.9-23); and 
adjusting the at least one of the saccadic eye movement, the micro-saccadic eye movement, or the vergence eye movement of the current presenter within the images to produce the gaze-adjusted images (e.g., In particular, when replacing with only a single static direct gaze patches the user (102 / 118) can occasionally appear “uncanny” i.e. having a glazed look about them as, in particular, the eyes lack the high frequency saccading present in real eyes. As indicated above, a saccade is a quick, simultaneous movement of both eyes away and back again.  Nilsson: [0068] L.4-9. In embodiments the eyes are replaced instead with a temporal sequence of templates gathered during training time, so that the eyes exhibit animation. That is, a sequence of direct gaze patches, blended temporally to appear life-like. The template selection module 201 selects different ones of the templates Ts for different frames of at least one continuous interval of the video received from the camera 124, a continuous interval being formed of an unbroken (sub)series of successive frames. For example, the continuous interval may be between two successive blinks or other re-initialization triggering events. In turn, the eye gaze correction module 202 modifies every frame of the continuous interval of the video to replace the user's eyes with those of whichever template has been selected for that frame. Because of the selections intentionally differ throughout the continuous interval, the user's (102 / 118) eyes exhibit animation throughout the continuous interval due to the visual variations exhibited between the sorted templates Ts. When the user's (102 / 118) eyes are animated in this manner, they appear more natural in the modified video.  Nilsson: [0069]).
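
For illustration only, the template matching step cited above from Nilsson [0075] (comparing templates with the current frame and selecting the one exhibiting the highest visual similarity) may be sketched as follows. This is a minimal sketch: Nilsson as quoted does not specify the similarity measure, so the sum of absolute differences used here, the flat-list patch representation, and the names `sad` and `best_matching_template` are assumptions made for this example.

```python
from typing import List, Sequence


def sad(a: Sequence[int], b: Sequence[int]) -> int:
    """Sum of absolute differences between two equal-length grayscale patches."""
    if len(a) != len(b):
        raise ValueError("patches must have the same number of pixels")
    return sum(abs(p - q) for p, q in zip(a, b))


def best_matching_template(eye_patch: Sequence[int],
                           templates: List[Sequence[int]]) -> int:
    """Return the index of the template most similar to the live eye patch.

    Lower SAD is treated as higher visual similarity (an assumed measure).
    """
    if not templates:
        raise ValueError("at least one template is required")
    return min(range(len(templates)), key=lambda i: sad(eye_patch, templates[i]))
```

The returned index would identify the template used to resume correction of the current frame; selecting different templates for later frames, as quoted from Nilsson [0069], is outside the scope of this sketch.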

Regarding claims 9, 11-12 and 16-17, the claims are method claims of system claims 1, 3-4 and 7-8 respectively. The claims are similar in scope to claims 1, 3-4 and 7-8 respectively and they are rejected under similar rationale as claims 1, 3-4 and 7-8 respectively. 
Nilsson further teaches that “A method of correcting an eye gaze of a user comprising: receiving from a camera video of the user's (102 / 118) face; accessing a plurality of stored templates, each comprising a different image of an eye of the user looking directly at the camera; and modifying every frame of at least one continuous interval of the video to replace each of the user's (102 / 118) eyes with that of a respective template selected for that frame, whereby the user (102 / 118) is perceived to be looking directly at the camera in the modified frames, wherein different templates are selected for different frames of the continuous interval so that the user's eyes exhibit animation throughout the continuous interval.” (Nilsson: Claim 19).

Regarding claim 10, the combined teaching of Nilsson and Novelli teaches the method of claim 9, further comprising:
analyzing the detected eye region to determine whether eye movements of the current presenter represent shifting eye movements associated with reading (e.g., the user's (102 / 118) face has angular coordinates α=(P, R, Y) in this coordinate system (bold typeface denoting vectors), and the model M comprises current values of the angular coordinates α. The current values of the angular coordinates α represent the current orientation of the user's (102 / 118) face relative to the camera 124.  Nilsson: [0036] L.1-6. The user's (102 / 118) face also has spatial coordinates r=(x, y, z), and the model M also comprises current values of the spatial coordinates in this example. These represent the current location in three-dimensional space of the user's (102 / 118) face relative to the camera 124. They can for example represent the location of a particular known reference point on or near the user's (102 / 118) face, such as a central point of their face or head, or point at or near which a particular facial, cranial or other head feature is located.  Nilsson: [0037].  The spatial and angular coordinates of the user's face (r, α)=(x, y, z, P, R, Y) constitute what is referred to herein as a pose of the user (102 / 118), the user's (102 / 118) current pose being represented by the current values of (r, α).  Nilsson: [0038]. As the user's (102 / 118) current pose (r, α) is computed relative to the camera 124 by the tracker 208, it is possible to place limits—denoted Δ herein and in the figures—on these values within which accurate gaze correction can be performed. As long as the tracked pose remains within these limits Δ, the gaze correction module 202 remains active and outputs its result as the new RGB video formed of the modified frames F′ (subject to any internal activation/deactivation within the eye gaze correction module 202, e.g. as triggered by blink detection). 
Conversely, if the tracked pose is not within the defined limits Δ then original video is supplied for compression and transmission unmodified.  Nilsson: [0057].  During reading, the user (102 / 118) moves by changing the yaw angle.  When the yaw angle is within the limits, the eye modification is performed); and 
computing the desired eye gaze direction of the current presenter if the eye movements of the current presenter represent the shifting eye movements associated with reading; or terminating the method if the eye movements of the current presenter do not represent the shifting eye movements associated with reading (e.g., when the yaw angle changes, the angular coordinates change and therefore the pose changes.  If the pose is within the limits Δ, that is, the eye is moving left and right within the width of a document, gaze correction is performed.  If the eye moves beyond the left and right limits, the angular coordinates and the pose are out of limits and gaze correction is not performed).

Regarding claim 13, the combined teaching of Nilsson and Novelli teaches the method of claim 9, comprising:
generating the gaze-adjusted images using a trained image generator; and training the image generator prior to executing the method of claim 9 (e.g., In embodiments the eyes are replaced instead with a temporal sequence of templates gathered during training time, so that the eyes exhibit animation. That is, a sequence of direct gaze patches, blended temporally to appear life-like. Nilsson: [0069] L.1-5), wherein training the image generator comprises: 
inputting a plurality of target images and a plurality of gaze-adjusted images generated by the image generator into an image discriminator (e.g., The template selection module 201 selects different ones of the templates Ts for different frames of at least one continuous interval of the video received from the camera 124, a continuous interval being formed of an unbroken (sub)series of successive frames. For example, the continuous interval may be between two successive blinks or other re-initialization triggering events.  Nilsson: [0069] L.5-11. The eye gaze correction module comprises a controller 247. In this example, the controller 247 comprises a blink detector 246 which detects when the user 102 blinks.  Nilsson: [0050] L.3-6.  The controller, which determines whether a threshold is exceeded, is taken as the image discriminator);
comparing the plurality of target images and the plurality of gaze-adjusted images using the image discriminator (e.g., When a difference between at least one of the replacement patches and its corresponding input patch is large enough, i.e. exceeds a threshold, this triggers a blink detection. Nilsson: [0050] L.6-9.  Therefore, the replacement patches and input patch are compared); 
assigning an authenticity value of real or fake to each of the plurality of gaze-adjusted images (e.g., This temporarily halts modification of the frames F until the difference drops below the threshold again. In this manner, when a blink by the user 102 is detected in certain frames, these frames are left unmodified so that the blink remains visible in the outgoing video feed. Nilsson: [0050] L.9-13.  When the difference exceeds the threshold, a blink is detected and the replacement patches do not replace the input patches; this is taken as an authenticity value of fake); and 
updating the image generator in response to assigning the authenticity value of fake to any of the plurality of gaze-adjusted images (e.g., when a blink by the user 102 is detected in certain frames, these frames are left unmodified so that the blink remains visible in the outgoing video feed. Modification resumes when the end of the blink is detected and the user's eyes are open once more. Nilsson: [0050] L.11-15.  Therefore, the frames are left unmodified so that the blink (fake authenticity) remains visible in the outgoing video feed).
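
For illustration only, the threshold-based blink gating that the cited passage of Nilsson [0050] describes can be sketched as follows. This is a minimal sketch and not code from Nilsson or from the application: the function names, the flat-list patch representation, and the use of mean absolute difference as the "difference" measure are all assumptions made for the example.

```python
from typing import Sequence

# All names below are hypothetical. Nilsson [0050] does not specify the
# difference measure; mean absolute pixel difference is assumed here.

def patch_difference(input_patch: Sequence[float],
                     replacement_patch: Sequence[float]) -> float:
    """Mean absolute difference between two equally sized patches."""
    if len(input_patch) != len(replacement_patch) or not input_patch:
        raise ValueError("patches must be non-empty and of equal size")
    total = sum(abs(p - q) for p, q in zip(input_patch, replacement_patch))
    return total / len(input_patch)

def blink_detected(input_patches, replacement_patches, threshold: float) -> bool:
    """A blink is triggered when at least one patch pair differs by more
    than the threshold."""
    return any(patch_difference(i, r) > threshold
               for i, r in zip(input_patches, replacement_patches))

def output_frame(original_frame, modified_frame,
                 input_patches, replacement_patches, threshold: float):
    """Leave the frame unmodified while a blink is detected; otherwise
    output the gaze-corrected frame."""
    if blink_detected(input_patches, replacement_patches, threshold):
        return original_frame
    return modified_frame
```

In this sketch the comparison only selects between the unmodified and the modified frame for output; nothing in it updates the templates or any generator.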

Regarding claim 15, the combined teaching of Nilsson and Novelli teaches the method of claim 9, further comprising:
analyzing the generated gaze-adjusted images to assign a confidence value to the gaze-adjusted images (e.g., As the user's (102 / 118) current pose (r, α) is computed relative to the camera 124 by the tracker 208, it is possible to place limits—denoted Δ herein and in the figures—on these values within which accurate gaze correction can be performed.  Nilsson: [0057] L.1-4. In the embodiments described herein the limits Δ are in the form of a set of subranges—a respective subrange of values for each of the six coordinates. A user's (102 / 118) pose (r, α) is within Δ if and only if every one of the individual coordinates x, y, z, P, R, Y is within its respective subrange. In other embodiments, limits may only be placed on one or some of the coordinates—for example, in some scenarios imposing limits on just one angular coordinate is sufficient. Nilsson: [0058] L.1-9. Therefore, gaze correction (eye replacement) is performed when the user’s (102 / 118) pose is within the limit and a confidence value of correcting the frame is defined to carry out the correction); 
if the confidence value is above a specified threshold value, replacing the images within the video stream with the gaze-adjusted images (e.g., when it is confident that the user’s pose is within the limits, gaze correction (eye replacement) is performed); and 
if the confidence value is below the specified threshold value, preventing the replacement of the images within the video stream with the gaze-adjusted images (e.g., when the user’s (102 / 118) pose is out of limit, gaze correction (eye replacement) is not performed.  Generally, the specific thresholds may be determined by the performance of the gaze correction algorithm in the specific camera/display setup.  Nilsson: [0067].  Therefore, a threshold can be defined as a value to enable/disable the gaze correction).
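
For illustration only, the pose-limit test quoted from Nilsson [0057]-[0058] (a pose is within Δ if and only if every limited coordinate lies within its respective subrange) can be sketched as follows. This is a minimal sketch and not code from Nilsson or from the application: the names `Pose`, `within_limits` and `select_frame` are hypothetical, and the numeric subranges in the usage lines are placeholders rather than values taken from the reference.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Hypothetical types and names; coordinate labels follow the quoted
# notation (r, α) = (x, y, z, P, R, Y).

@dataclass(frozen=True)
class Pose:
    x: float
    y: float
    z: float
    P: float
    R: float
    Y: float

# Limits Δ: one (low, high) subrange per limited coordinate. Coordinates
# absent from the mapping are unconstrained, matching the quoted statement
# that limits may be placed on only one or some coordinates.
Limits = Dict[str, Tuple[float, float]]

def within_limits(pose: Pose, limits: Limits) -> bool:
    """True if and only if every limited coordinate is inside its subrange."""
    return all(low <= getattr(pose, name) <= high
               for name, (low, high) in limits.items())

def select_frame(original_frame, modified_frame, pose: Pose, limits: Limits):
    """Output the gaze-corrected frame only while the pose is within Δ."""
    return modified_frame if within_limits(pose, limits) else original_frame

# Usage with placeholder limits on a single angular coordinate.
limits = {"Y": (-20.0, 20.0)}
print(select_frame("F", "F_mod", Pose(0, 0, 1, 0, 0, 10.0), limits))
print(select_frame("F", "F_mod", Pose(0, 0, 1, 0, 0, 35.0), limits))
```

The sketch yields a binary in/out decision per frame; whether such a decision amounts to a confidence value compared against a threshold is the interpretive question addressed in the rejection above, not something the code resolves.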

Regarding claim 18, the claim is a computer-readable storage medium claim of system claim 1.  The claim is similar in scope to claim 1 and it is rejected under similar rationale as claim 1.
Nilsson further teaches that “A computer program product for correcting an eye gaze of a user comprising code stored on a computer readable storage medium and configured when run on a computer to: receive from a camera video of the user's face; access a plurality of stored templates, each comprising a different image of an eye of the user looking directly at the camera; and modify every frame of at least one continuous interval of the video to replace each of the user's eyes with that of a respective template selected for that frame, whereby the user is perceived to be looking directly at the camera in the modified frames, wherein different templates are selected for different frames of the continuous interval so that the user's eyes exhibit animation throughout the continuous interval.” (Nilsson: Claim 20).

Regarding claim 20, the claim is a computer-readable storage medium claim corresponding to the combination of system claims 3 and 4. The claim is similar in scope to the combination of claims 3 and 4 and it is rejected under similar rationale as the combination of claims 3 and 4.

Claims 2 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Nilsson in view of Novelli as applied to claims 1 and 18, and further in view of Isikdogan et al. (2019/0266701).

Regarding claim 2, the combined teaching of Nilsson and Novelli teaches the computing system of claim 1, wherein the computer-executable instructions further cause the processor to generate the gaze-adjusted images using a trained image generator, wherein the image generator is trained using an image discriminator within a generative adversarial network (GAN) (see 2_1 below).
While the combined teaching of Nilsson and Novelli does not explicitly teach feature (2_1), Isikdogan teaches:
(2_1). to generate the gaze-adjusted images using a trained image generator, wherein the image generator is trained using an image discriminator within a generative adversarial network (GAN) (e.g., realistic images of eyes looking into different directions are programmatically generated. For example, a synthetic data generator can use the UnityEyes platform, first released by Wood et al. in 2016, to render and rasterize images of eyes, which are later refined by a generative adversarial network. In some examples, the sets of eye images can be created by programmatically moving the cursor to move the gaze towards random directions. For example, the cursor movements can be modeled as a zero mean Gaussian random variable, where zero means a centered gaze, looking right into the camera.  Isikdogan: [0033] L.1-11.  In some examples, to enhance photorealism, a generative adversarial network can be used. For example, the generative adversarial network can learn a mapping between synthetic and real samples and bring the distribution of the synthetically generated data closer to the ones captured by cameras. Using the trained generative adversarial network, all images in the synthetic dataset can be refined to create a large dataset that consists of photorealistic images having virtually perfect labels.  Isikdogan: [0038]. In various examples, the natural dataset can be used both to evaluate the model and to make the synthetic dataset more photorealistic. For example, a generative adversarial network can be used to convert synthetic images into natural looking ones. Being able to generate a photorealistic synthetic dataset allows for generating immense amount of data with pixel-perfect labels with a minimal cost.  Isikdogan: [0041].  Therefore, the generative adversarial network can be applied to the selected templates to convert the patches into natural looking images with a minimal cost).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Isikdogan with the combined teaching of Nilsson and Novelli so that patches of the selected templates of Nilsson can be converted into more natural looking images using a generative adversarial network.

Regarding claim 19, the claim is a computer-readable storage medium claim of system claim 2.  The claim is similar in scope to claim 2 and it is rejected under similar rationale as claim 2.

Response to Arguments
Applicant’s arguments filed on May 31, 2022 have been fully considered and are persuasive.  Therefore, the rejection has been withdrawn.  However, upon further consideration, a new ground(s) of rejection is made in view of Novelli (2021/0201021).

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SING-WAI WU whose telephone number is (571)270-5850. The examiner can normally be reached 9:00am - 5:30pm (Central Time).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kee Tung, can be reached on 571-272-7794. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/SING-WAI WU/Primary Examiner, Art Unit 2611