Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION


Allowable Subject Matter
	Claims 17, 19-20, and 22 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.


Claims 1-2 and 9-10 are rejected under 35 U.S.C. 103 as being unpatentable over Stoyles et al. (Pub No. US 2019/0251728 A1) in view of Roche et al. (Patent No. US 10,467,792 B1) in further view of Morin et al. (Pub No. US 2009/0202114 A1).

As per claim 1, Stoyles teaches the claimed:
1. A system for remote visualization of real-time three-dimensional (3D) facial animation with synchronized voice, the system comprising:
a sensor device (This is shown in figure 2B as the “AV API 240” and modules 211, 212 and 230) that (i) captures one or more frames of a face of a person, each frame comprising one or more color images of the person’s face (In figure 2B where the “Image sensor 211” captures color images of the person’s face.  Also please see [0042] “Face tracking API 230 can receive image information and depth information from, e.g., image sensor 211 and depth sensor 212” and [0049] “… Image sensor 211 can generate image data, e.g. RGB color image data at 1280×720 pixels and 60 frames/second”), one or more depth maps of the person’s face (In figure 2B where the “Depth sensor 212” receives depth maps of the person’s face.  Also please see [0042] “Face tracking API 230 can receive image information and depth information from, e.g., image sensor 211 and depth sensor 212” and in [0007] “The method can include receiving a plurality of frames of depth information representing an object, such as a human head and face, using a depth sensor”), voice stream data associated with the person ([0078] “In operation 530, it can be determined whether to begin recording image, depth, and audio data … recording can begin in response to detecting the user's voice” and [0079] “… A/V API 240 can receive a plurality of frames of audio data from microphone 21”), -- and (ii) generates a 3D face model of the person using the one or more depth maps (In figure 2B where the “AV API 240” contains the “Face Tracking API 230”.  Also please see in the 2nd half of [0053] “… Face tracking API 230 can generate a base mesh representing a face and/or head of user, from the image and depth sensor data.”);

a computing device coupled to the sensor device, the computing device comprising a memory that stores computer-executable instructions and a processor that executes the instructions (In figure 2A where it shows the computing device “Client Device 110/115” comprising “CPU(s) 215” and “Memory 216” and “Storage 217”) to: receive the one or more frames of the person’s face and the 3D face model from the sensor device (In figure 2A where the computing device (“Client Device 110/115”) includes modules 250, 260, and 270 and in figure 2B where frames of the person’s face (RGB and depth) and the 3D face model (base mesh) are received at module 250 from the sensor device 240 and 211-212);
preprocess the 3D face model (In [0054] “… Face tracking technique 252 can analyze metadata at a plurality of vertices of the mesh of the user's face to determine one or more activation points on the mesh.”); 
for each received frame: detect facial landmarks using the one or more color images and match the 3D face model to the one or more depth maps using non-rigid registration ([0063] “… As additional frames of image and depth information are received, differences can be determined between the base mesh and the image and depth frames received. Differences can be tracked in groups termed landmarks. For example, a group of vertices around the flexible portion of the ear 365 can be tracked as a group for movement. Similarly, smile lines 340, mouth 335, cheek line 360, eyebrow 320, eyelids 325, and forehead 355 can each be tracked as a group of mesh vertices or landmarks” and [0082] “In operation 547, face tracking API 230 can generate differences between the base mesh of the user's face and/or head and received frames of image and depth data. In an embodiment, the differences can be expressed as a change magnitude value, e.g. 0 . . . 255, per-vertice of the base mesh. In an embodiment, differences at landmarks (groups of vertices) can be determined for the vertices in each landmark, in aggregate, and a value can be expressed for a blend shape for the landmark that represents the change in the landmark vertices.”
In this instance, by finding the differences between the depth data (depth map) and the base mesh, a non-rigid registration is occurring because the system is determining how the base mesh registers or maps with the changed depth data (depth map data)); 


Stoyles alone does not explicitly teach the remaining claim limitations.
However, Stoyles in combination with Roche teaches the claimed:
a sensor device that (i) captures -- a timestamp and synchronize the 3D face model with a segment of the voice stream data using the timestamp (Please see Roche in the upper portion of col 11 where they refer to “Markup tags may include a timestamp that corresponds to a time occurrence of a communication expression in speech data. The timestamp may be used to synchronize the animation of a virtual object 324 with audio or textual output to simulate a communication expression. A simulation client 334 may use the timestamp to determine when to animate a virtual object 324 to simulate a particular communication expression. As a specific example, a communication mark time may a timestamp for a simulation client 334 to animate a virtual object 324 to simulate a smile that coincides with audio output that includes expressions of happiness”); and
transmit the synchronized 3D face model and voice stream data to a remote device for display (Stoyles performs this function in figure 5C in step 580 where their rendered 3D emoji is transmitted with voice stream data to a remote device.  Also please see Stoyles in [0089] “In operation 580, for each recipient message application 280 can transmit to the message system 130 the message and the version of the puppeted emoji video indicated by the message system 130 as being optimal for the receiving client device 115 of the one or more message recipients” and Stoyles in [0066] “… then message system 130 can indicate in communication 410 only those versions of the puppeted emoji which the sending client device 110 is capable of rendering. A version may include rendering a video with audio”.  In this instance, the “audio” corresponds to the claimed “voice stream data”.  The claimed feature is taught when the emoji animation in Stoyles is synchronized with Stoyles’ audio using the timestamp technique as taught by Roche).
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use the timestamps as taught by Roche with the system of Stoyles in order to better synchronize specific face expressions in the animation with specific portions of the speech audio stream (col 11, lines 14-30 in Roche).
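For illustration only, the timestamp-based pairing of an animation frame with a segment of the voice stream can be sketched as follows. This is a minimal sketch and is not taken from Roche or Stoyles: the `AudioSegment` type, the `segment_for_timestamp` function, the millisecond units, and the assumption that segments are sorted and non-overlapping are all hypothetical.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class AudioSegment:
    """Hypothetical voice-stream segment covering [start_ms, end_ms)."""
    start_ms: int
    end_ms: int
    samples: bytes


def segment_for_timestamp(segments, timestamp_ms):
    """Return the segment whose interval contains timestamp_ms, or None.

    Assumes `segments` is sorted by start_ms and segments do not overlap.
    """
    starts = [s.start_ms for s in segments]
    # Index of the last segment that starts at or before the timestamp.
    i = bisect_right(starts, timestamp_ms) - 1
    if i >= 0 and timestamp_ms < segments[i].end_ms:
        return segments[i]
    return None
```

Under these assumptions, a frame stamped at 25 ms would be paired with a segment covering 20 ms to 40 ms, and a frame whose timestamp falls outside every segment would be left unpaired.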

Morin in combination with Stoyles and Roche teaches the claimed:
update a texture on a portion of the 3D face model corresponding to the person’s face using the one or more color images, wherein the texture comprises a cropped portion of one or more of the color images (Morin teaches this feature at the end of paragraph [0019] “… The method can further comprise cropping from the video image areas outside an area proximate to the face”, Morin in paragraph [0069]-[0070] “The figures generally show processes by which aspects associated with a person's face in a moving image may be identified and then tracked as the user's head moves … to generate the mask for display on a computer system, such as for a face of an avatar that reflects a player's facial motions and expressions in real time. Also, a user's facial image is extracted via reverse rendering into a texture that may then be laid over a frame of the mask … the morphed face may be displayed in real time later as the user moves his or her head and changes his or her facial expressions”, and Morin in [0166] “The gaming device 560 may include, for example, a web cam 574 … for capturing video at a user's location, such as video that includes an image of the user's face”.
According to these passages from Morin, non-facial portions of the images captured by its camera for a plurality of frames are cropped out in order to extract the facial image of the user.  This allows the real-time tracking of the user’s face in the video frames (the color images) and this allows the real-time updating of the avatar’s texture based on these video frames (the color images)).
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention for the texture to comprise a cropped portion of one or more of the color images as taught by Morin with the system of Stoyles as modified by Roche in order to allow an avatar to be updated in real-time with relevant portions of the user’s face (paragraph [0070] in Morin).  The claimed feature is taught when this texturing of the face area from Morin is incorporated into the avatar animation system of Stoyles, e.g. in order to incorporate texture data from the user’s face into the avatar’s appearance in Stoyles.


As per claim 2, Stoyles teaches the claimed:
2. The system of claim 1, wherein the 3D face model comprises one or more of: a face of the person, a chest of the person, one or more shoulders of the person, and a back of a head of the person (Please see in figure 2B where the output from the “AV API 240” includes “Tracked face with mesh”.  Thus, the “3D face model” comprises at least a face of the person.  Also please see in [0010] “Face tracking API can receive image sensor data and depth sensor data and generate a base mesh of a head and/or face of the user.”).


As per claims 9-10, these claims are similar in scope to limitations recited in claims 1-2 respectively, and thus are rejected under the same rationale.


Claims 3 and 11 are rejected under 35 U.S.C. 103 as being unpatentable over Stoyles in view of Roche in further view of Morin, Riesen et al. (Pub No. US 2021/0166461 A1) and Farahbakhshian et al. (Pub No. 2019/0304181 A1).

As per claim 3, Stoyles alone does not explicitly teach the claimed limitations.
Riesen in combination with Stoyles and Roche teaches the claimed:
3. The system of claim 2, wherein preprocessing the 3D face model comprises: loading the 3D face model into memory (Please see Riesen in [0136] “In step 12, a character or an avatar, for example in the form of a head, can therefore be initialized. In this case, the avatar is defined by a virtual model in the form of a three-dimensional skeleton comprising a set of hierarchically connected bones, for example a number of 250, and a mesh of vertices which is coupled thereto, and is loaded into a memory area which can be addressed by a graphics unit of the program”.  The claimed feature is taught when this loading into memory step is applied as part of the preprocessing of the user’s face mesh (3D face model) in Stoyles).
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to load the 3D face model into memory as part of the preprocessing as taught by Riesen with the system of Stoyles as modified by Roche and Morin in order to allow the system to obtain fast read and write access to the vertex and mesh data that makes up the 3D face model.  This helps allow the software to manipulate and make comparisons of face data with the 3D face model that is being used to represent the user’s shape graphically and spatially.

Farahbakhshian in combination with Stoyles, Roche, and Riesen teaches the claimed:
separating a part of the 3D face model corresponding to the face of the person from one or more other parts of the 3D face model (Farahbakhshian in [0036] “Additionally, the method also segments the body mesh into meaningful body regions such as head, neck, chest, etc.”).
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to separate a part of the 3D face model corresponding to the face of the person from one or more other parts of the 3D face model as taught by Farahbakhshian with the system of Stoyles as modified by Roche, Morin, and Riesen in order to allow the system to model and focus on modifications of specific portions of the user (including both the user’s body and face).  Thus, this enables a more complete avatar or 3D model of the user to be created and displayed to remotely located recipient devices in a chat or conversation.

As per claim 11, this claim is similar in scope to limitations recited in claim 3, and thus is rejected under the same rationale.


Claims 4 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Stoyles in view of Roche in further view of Morin and Jiao et al. (Pub No. 2017/0206694 A1).

As per claim 4, Stoyles alone does not explicitly teach the claimed limitations.
However, Jiao in combination with Stoyles and Roche teaches the claimed:
4. The system of claim 1, wherein the computing device preprocesses the 3D face model once at a beginning of a streaming session (Jiao teaches this feature in figure 4 as step 206 “Initial Face Mesh Fitting 206” and in paragraph [0049].  This initial face mesh fitting is performed once at the beginning before any messages or communication is generated using the 3D face model.  The claimed feature is taught when Jiao’s initial face mesh fitting is used with the face mesh model used in Stoyles as modified by Roche).
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use the 3D mesh face fitting as taught by Jiao with the system of Stoyles as modified by Roche and Morin in order to allow the system to check the mesh to see how well it represents the user’s appearance before using the mesh in a conversation or in communication with a remotely located user.

As per claim 12, this claim is similar in scope to limitations recited in claim 4, and thus is rejected under the same rationale.


Claims 5 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Stoyles in view of Roche in further view of Morin and Mishra et al. (Pub No. 2019/0172458 A1).

As per claim 5, Stoyles alone does not explicitly teach the claimed limitations.
However, Mishra in combination with Stoyles and Roche teaches the claimed:
5. The system of claim 1, wherein detecting facial landmarks using the one or more color images comprises executing a pre-trained neural network model on the one or more color images to detect the facial landmarks (Mishra in [0089] “… The system deep learning can be accomplished using a convolution neural network or other techniques. The deep learning can accomplish facial recognition and analysis tasks. The network includes an input layer 1210. The input layer 1210 receives image data …  The input layer 1210 can then perform processing such as identifying boundaries of the face, identifying landmarks of the face, extracting features of the face, and/or rotating a face within the plurality of images”).
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use the neural network-based facial landmark detection as taught by Mishra with the system of Stoyles as modified by Roche and Morin in order to utilize technology that is known to effectively identify prominent features on the user’s face when provided with an input image.  Thus, this helps make implementation of the system easier.

As per claim 13, this claim is similar in scope to limitations recited in claim 5, and thus is rejected under the same rationale. 



Claims 18 and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Stoyles in view of Roche in further view of Morin and Braga et al. (Pub No. US 2013/0097194 A1).

As per claim 18, Stoyles alone does not explicitly teach the claimed limitations.
However, Braga in combination with Stoyles, Roche, and Morin teaches the claimed:
18. The system of claim 1, wherein synchronizing the 3D face model with a segment of the voice stream data using the timestamp comprises: determining that the timestamp associated with the frame is delayed based upon a current time; and discarding one or more of (i) the segment of the voice segment data or (ii) the frame (Braga in [0046] “Exemplary Frame Discarding: The exemplary system can be organized around two threads: the image database matching thread, which can produce the best matching frame based on the controller's real time skeleton, and the rendering thread, which can display the matched frames on the screen. The matching thread can add frames to a queue, annotated with a timestamp of the query, and the rendering thread consumes frames from the queue. In order to avoid occasional long lags between the controller's movement and the video that is displayed back to him/her, maintaining the feel of real-time control, the rendering thread can discard the frames that are too old when dequeing a new frame for display”.  In this instance, a frame that is too old corresponds to the claimed “the frame is delayed based upon a current time”.  The claimed feature is taught when the matching frame feature from Braga is incorporated into Stoyles in order to depict an avatar that has an outer appearance based upon the matching frames used in Braga).
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to discard the frame when the frame is delayed based upon a current time and the timestamp as taught by Braga with the system of Stoyles as modified by Roche and Morin in order to help maintain the feel of real-time control on the interactive video display (Braga in [0046]).
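For illustration only, the discarding of frames that are too old relative to the current time can be sketched as follows. This is a minimal sketch and is not Braga's implementation: the `next_fresh_frame` function, the `max_age_ms` threshold, and the `(timestamp_ms, frame)` queue layout are hypothetical.

```python
from collections import deque


def next_fresh_frame(queue, now_ms, max_age_ms):
    """Dequeue the next frame that is not too old, discarding stale ones.

    `queue` is a deque of (timestamp_ms, frame) pairs in arrival order.
    Returns (frame, dropped_count); frame is None if every queued frame
    was older than max_age_ms.
    """
    dropped = 0
    while queue:
        timestamp_ms, frame = queue.popleft()
        if now_ms - timestamp_ms <= max_age_ms:
            return frame, dropped
        dropped += 1  # frame is delayed relative to the current time
    return None, dropped
```

Under these assumptions, a rendering loop would call `next_fresh_frame` once per display refresh, so a backlog of delayed frames is skipped rather than displayed late.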

As per claim 21, this claim is similar in scope to limitations recited in claim 18, and thus is rejected under the same rationale. 



Response to Arguments
Applicant’s arguments, filed May 20, 2022, with respect to how the newly amended claim features differ from the prior art cited in the last Office action have been fully considered. These arguments are found to be persuasive.  Therefore, the rejection has been withdrawn.  However, upon further consideration, a new ground(s) of rejection is made in this Office action.




Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DANIEL HAJNIK whose telephone number is (571) 272-7642.  The examiner can normally be reached on Mon-Fri (8:30A-5:00P).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.  
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jennifer Mehmood, can be reached on (571) 272-2976.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/DANIEL F HAJNIK/Primary Examiner, Art Unit 2612