Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION

Response to Amendment
The amendment filed on January 27, 2022 has been entered.
In view of the amendment to the specification, paragraphs [0003] and [0066] of the specification have been acknowledged.
In view of the amendment to the claims, the amendment of claims 1, 3, 12, 14 and 20 has been acknowledged. Claims 4 and 15 have been canceled. New claims 21-22 have been added.
In view of the amendment of claims 1 and 12, in which Applicant amended each claim to specify “a facial area”, the 35 U.S.C. 112 rejections of claims 1-3, 5-14 and 16-19 have been withdrawn.

Response to Arguments
Applicant’s arguments, see pages 8-9 of Remarks, filed January 27, 2022, have been fully considered. Applicant’s arguments are directed to the amended limitations of the claims and are addressed in the claim rejections below.

	 Claim Rejections - 35 USC § 112
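The following is a quotation of 35 U.S.C. 112(b):
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph: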
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claim 20 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor, or for pre-AIA the applicant regards as the invention.

Claim 20 recites the limitation of “generating a target image by combining the region of interest of the puppet object and the supplementary data”. However, Applicant amended the claim and changed “a region of interest” to “the facial area”. Thus, “the region of interest” lacks antecedent basis and is undefined. Therefore, the claim is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.


Claims 1, 3, 5-9, 12, 14 and 16-22 are rejected under 35 U.S.C. 103 as being unpatentable over SHIN et al (U.S. Patent Application Publication 2020/0410739 A1) in view of Astarabadi et al (U.S. Patent Application Publication 2020/0358983 A1).

Regarding claim 1, SHIN discloses a method for voice driven animation of an object in an image, the method comprising: 
	sampling an input video (FIGS. 2 and 3; paragraph [0117], the robot 100 can include an image acquisition part 120 capable of photographing surroundings of the main body 101 and 102 within a predetermined range on the basis of the front side of the main body 101 and 102; paragraph [0118], the image acquisition part 120 may include a camera module. The camera module may include a digital camera. The digital camera may include … The digital signal processor can generate a video composed of frames configured as still images), depicting a puppet object (Paragraph [0021], the data related to the user may be video data in which the user has been photographed or real-time video data in which the user is photographed), to obtain an image (Paragraphs [0346]-[0347], FIG. 17 is a flowchart showing a method for operating a robot according to an embodiment of the present disclosure and shows a method for operating a robot which recognizes emotions of a videotelephony partner during video telephony; the robot 100 according to an embodiment of the present disclosure can receive image from a robot of a videotelephony partner (S1710); paragraph [0378], step S1720 of recognizing emotional information of the videotelephony user may include a step in which the robot 100 transmits data received from the robot of the videotelephony user to the emotion recognition server 70; FIG. 5 shows the emotion recognition server 70 including the emotion recognizer 74a; paragraph [0197], the input data 590 may be video data including captured images of a user, and the video data may include video data including a captured image of the face of the user; paragraph [0200], the modal divider 530 can separate image unimodal input data 533 including one or more pieces of face image data from the video data included in the input data 590); 
receiving audio data (Paragraph [0347], the robot 100 according to an embodiment of the present disclosure can receive audio data from a robot of a videotelephony partner (S1710)); 
extracting voice related features from the audio data (Paragraph [0378], step S1720 of recognizing emotional information of the videotelephony user may include a step in which the robot 100 transmits data received from the robot of the videotelephony user to the emotion recognition server 70 including an artificial neural network trained to recognize emotional information on the basis of the audio data and a step in which the robot 100 receives; paragraph [0198], the modal divider 530 can divide the input data 590 into text unimodal input data 531 obtained by converting the audio data included in the input data 590 into text data, and sound unimodal input data of the audio data, such as a speech tone, magnitude and height; paragraphs [0207], the speech emotion recognizer 522 extracts feature points of input speech data.  Here, the speech feature points may include the tone, volume, waveform and the like of speech); 
producing an expression representation based on the voice related features (Paragraphs [0348]-[0349], an emotion recognition result obtained from emotional information recognition may include a probability value for each emotional class; paragraph [0350], the controller 140 of the robot 100 can generate an avatar by mapping the recognized emotional information of the videotelephony partner to face information of the videotelephony partner included in the data received from the robot of the videotelephony partner (S1730)), wherein the expression representation is related to an appearance of a facial area (Paragraphs [0210]-[0211], FIG. 6 is a diagram referred to in description of emotion recognition according to an embodiment of the present disclosure and illustrates components of an expression … eyebrows 61, eyes 62, cheeks 63, a forehead 64, a nose 65, a mouth 66 and a chin 67 may correspond to expression landmark points; paragraph [0247], FIGS. 7 to 10 are diagrams referred to in description of expression of characters according to an embodiment of the present disclosure; paragraph [0385], the robot 100 can understand emotional feature points of a user and reproduce recognized emotional feature points through an avatar. For example, the robot 100 can recognize a unique feature point of a user (a specific emotional expression of a speaker) such as raising of the corners of the mouth when the user smiles and map the feature point to an avatar. Thus, the expression representation is related to an appearance of a face area) while producing a sound (Paragraph [0155], referring to FIG. 2, the audio output part 181 can be disposed on the left and right sides of the head 110 and output predetermined information as audio; paragraph [0207], the speech emotion recognizer 522 extracts feature points of input speech data. Here, the speech feature points may include the tone, volume, waveform and the like of speech. The speech emotion recognizer 522 can identify a user's emotion by detecting the tone of the speech; paragraph [0295], the controller 140 of the robot 100 can map the emotional information of the user to the image data of the user and synchronize the audio data of the user therewith to generate a video of the avatar; paragraph [0368], the controller 140 can map the recognized emotional information of the videotelephony partner to the audio data of the videotelephony partner to generate converted audio data. The audio output part 181 can utter the converted audio data under the control of the controller 140); 
obtaining, from the image, auxiliary data related to content of the image (Paragraph [0204], the image unimodal input data 533 including one or more pieces of face image data can be input to the face emotion recognizer 523 which performs deep learning using image learning data; paragraph [0340], according to an embodiment, for a user having resistance to exposure of the face and surrounding environments, the face and surrounding environments of the user can be recognized and a character and a background image can be generated on the basis of the recognized information and used. Thus, the background information is recognized from the received image); and 
generating a target image based on the expression representation and the auxiliary data (Paragraph [0373], according to an embodiment, for users having resistance to exposure of surrounding environments, a background image can be generated and the generated avatar can be displayed on the generated background image. Thus, a target image is generated based on the expression representation of the generated avatar and the extracted background image from the received image).
SHIN discloses videotelephony functions in which a camera is used to capture video images and the emotion recognizer recognizes emotional information based on the input data.
However, SHIN does not specifically disclose sampling an input video to obtain an image.
In the similar field of endeavor, Astarabadi discloses (Paragraph [0100], FIG. 4 shows a similar implementation of face model calculation with multiple images) sampling an input video to obtain an image (Paragraph [0102], the device (or the remote computer system) can extract a set of frames from the video clip and then execute the foregoing methods and techniques to converge on a set of coefficients for each frame in this set.  For example, the device can: implement methods and techniques described above to detect the user's face in each frame in the video clip; implement the facial landmark extractor to generate a facial landmark container for each frame in the video clip; and select a subset of frames (e.g., ten frames, 32 frames, 64 frames)--from the video clip--that correspond to facial landmark containers exhibiting least similarity and/or greatest ranges of facial landmark values within this set of facial landmark containers.  More specifically, the device can compare facial landmark containers extracted from frames in the video clip to identify a subset of frames that represent a greatest range of face poses and facial expressions within the video clip; paragraph [0103], the device can then: select a first frame--from this subset of frames--associated with a first facial landmark container; extract a first authentic face image from a region of the first frame depicting the user's face. Thus, an image is obtained by sampling the input video).
SHIN and Astarabadi are analogous art because both pertain to methods for video calling. It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the processing of the emotion recognizer taught by SHIN to incorporate the teachings of Astarabadi, applying the frame selection taught by Astarabadi to process the input video, identify a subset of frames that represent a greatest range of face poses and facial expressions within the video clip, and extract a first authentic face image from a region of the selected first frame depicting the user's face, as this would allow the emotion recognizer to extract an image from the input video for recognizing the emotional information. Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify SHIN according to the relied-upon teachings of Astarabadi to obtain the invention as specified in the claim.

	Regarding claim 3, the combination of SHIN in view of Astarabadi discloses everything claimed as applied above (see claim 1), and SHIN further discloses wherein the facial area comprises an area including a mouth of the puppet object (FIGS. 2 and 3; paragraph [0152], the display 182 may be disposed on the front side of the head 110; paragraph [0247], FIGS. 7 to 10 are diagrams referred to in description of expression of characters according to an embodiment of the present disclosure. Thus, the face area includes a mouth).
	However, SHIN does not specifically disclose an area including a neck area of the puppet object.
	In the similar field of endeavor, Astarabadi discloses an area including a neck area of the puppet object (Paragraph [0127], in particular, like the facial deconstruction model and the facial landmark extractor described above, the device and/or the remote computer system can implement the body landmark extractor: to detect spatial characteristics of a body--such as including positions of a neck, shoulders, a chest, arms, hands, an abdomen, a waist--depicted in a 2D image; paragraph [0129], for each image in this set, the device can: detect a body (e.g., neck, shoulders, chest, arms, hands, abdomen, waist) in a region of the image; extract an authentic body image from this region of the image; implement the body landmark extractor to extract a body landmark container from the image; and calculate a set of coefficients that--when injected into the synthetic body generator with the body landmark container--produces a synthetic body image that approximates the authentic body image, such as to a degree that a human may recognize the user's body in the synthetic body image and/or such that a human may discern limited visual differences between the authentic body image and the synthetic body image).
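	SHIN and Astarabadi are analogous art because both pertain to methods for video calling. It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the processing of the emotion recognizer taught by SHIN to incorporate the teachings of Astarabadi, applying the body landmark extraction taught by Astarabadi to process the image, detect spatial characteristics of a body in a region of the image, and extract the body image from the region of the image. Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify SHIN according to the relied-upon teachings of Astarabadi to obtain the invention as specified in the claim.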

	Regarding claim 5, the combination of SHIN in view of Astarabadi discloses everything claimed as applied above (see claim 1), and SHIN further discloses wherein producing the expression representation further comprises matching the expression representation to an identity of the puppet object (Paragraphs [0210]-[0211], FIG. 6 is a diagram referred to in description of emotion recognition according to an embodiment of the present disclosure and illustrates components of an expression … eyebrows 61, eyes 62, cheeks 63, a forehead 64, a nose 65, a mouth 66 and a chin 67 may correspond to expression landmark points; FIGS. 2 and 3; paragraph [0358], the robot 100 can understand emotional feature points of a user and reproduce recognized emotional feature points through an avatar. For example, the robot 100 can recognize a unique feature point of a user (a specific emotional expression of a speaker) such as raising of the corners of the mouth when the user smiles and map the feature point to an avatar).

	Regarding claim 6, the combination of SHIN in view of Astarabadi discloses everything claimed as applied above (see claim 1), and SHIN further discloses wherein FIG. 5; paragraph [0207], the speech emotion recognizer 522 extracts feature points of input speech data. Here, the speech feature points may include the tone, volume, waveform and the like of speech; FIG. 17; paragraphs [0348]-[0349], an emotion recognition result obtained from emotional information recognition may include a probability value for each emotional class; paragraph [0350], the controller 140 of the robot 100 can generate an avatar by mapping the recognized emotional information of the videotelephony partner to face information of the videotelephony partner included in the data received from the robot of the videotelephony partner (S1730)).

	Regarding claim 7, the combination of SHIN in view of Astarabadi discloses everything claimed as applied above (see claim 1), and SHIN further discloses wherein the voice related features correspond to a sentiment of a puppet object depicted in the received audio data (FIG. 5; paragraph [0198], the modal divider 530 can divide the input data 590 into text unimodal input data 531 obtained by converting the audio data included in the input data 590 into text data, and sound unimodal input data of the audio data, such as a speech tone, magnitude and height; paragraph [0207], the speech emotion recognizer 522 extracts feature points of input speech data. Here, the speech feature points may include the tone, volume, waveform and the like of speech).

	Regarding claim 8, the combination of SHIN in view of Astarabadi discloses everything claimed as applied above (see claim 1), and SHIN further discloses wherein the auxiliary data comprises data related to the puppet object (FIG. 5; paragraph [0204], the image unimodal input data 533 including one or more pieces of face image data can be input to the face emotion recognizer 523 which performs deep learning using image learning data) and supplementary data related to a scene represented in the image (Paragraph [0340], according to an embodiment, for a user having resistance to exposure of the face and surrounding environments, the face and surrounding environments of the user can be recognized and a character and a background image can be generated on the basis of the recognized information and used. Thus, the background information is related to the user and surrounding environments of the user recognized in the received image).

	Regarding claim 9, the combination of SHIN in view of Astarabadi discloses everything claimed as applied above (see claim 1), and SHIN further discloses performing a training stage (FIG. 17; paragraphs [0374]-[0375], Recognition (S1720) of emotions of the videotelephony user can be performed by the robot 100 … the robot 100 can include the emotion recognizer 74a which includes an artificial neural network trained to recognize emotional information on the basis of image data and audio data, and when data received from the robot of the videotelephony user is input) by repeating the steps (Paragraph [0346], FIG. 17 is a flowchart showing a method for operating a robot according to an embodiment of the present disclosure and shows a method for operating a robot which recognizes emotions of a videotelephony partner during video telephony) of sampling (FIG. 5; paragraph [0200], the modal divider 530 can separate image unimodal input data 533 including one or more pieces of face image data from the video data included in the input data 590), extracting (Paragraph [0207], the speech emotion recognizer 522 extracts feature points of input speech data. Here, the speech feature points may include the tone, volume, waveform and the like of speech), Paragraph [0204], the image unimodal input data 533 including one or more pieces of face image data can be input to the face emotion recognizer 523 which performs deep learning using image learning data) Paragraph [0198], the modal divider 530 can divide the input data 590 into text unimodal input data 531 obtained by converting the audio data included in the input data 590 into text data, and sound unimodal input data of the audio data, such as a speech tone, magnitude and height).
	SHIN discloses that an artificial neural network is trained for emotional information recognition. However, SHIN does not disclose a training stage including producing and generating a target image.
	In the similar field of endeavor, Astarabadi discloses a training stage including producing and generating a target image (Paragraph [0014], the first and second devices can also implement facial deconstruction and facial reconstruction models--such as trained on a population of users or the first user specifically based on deep learning or artificial intelligence techniques--to rapidly decompose a first video feed recorded at the first device into a first facial landmark feed and to reconstruct this first facial landmark feed into a first synthetic--but photorealistic--video depicting highest-import content from the first video feed (i.e., the first user's face); paragraph [0161], upon receipt of a facial landmark container and a corresponding audio packet from the first device, the second device can: extract audio data from the audio packet; insert the facial landmark container and the first face model of the first user into a local copy of the synthetic face generator--stored in local memory on the second device--to generate a synthetic face image; and render the synthetic face image over the first background within the video call portal (e.g., to form a "first synthetic video feed") while playing back the audio data via an integrated or connected audio driver; paragraph [0085], FIG. 3; in Block S202, the remote computer system can train the conditional generative adversarial network to output a synthetic face image based on a set of input conditions, including: a facial landmark container, which captures relative locations (and/or sizes, orientations) of facial landmarks that represent a facial expression; and a face model, which contains a (pseudo-) unique set of coefficients characterizing a unique human face and secondary physiognomic features (e.g., face shape, skin tone, facial hair, makeup, freckles, wrinkles, eye color, hair color, hair style, and/or jewelry)).
SHIN and Astarabadi are analogous art because both pertain to methods for video calling. It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the processing of the 

	Regarding claim 12, SHIN discloses a system for voice driven animation of an object in an image, the system comprising: 
a memory (FIG. 1 shows a configuration of a robot system; FIG. 3 shows an exemplary internal block diagram of the robot; paragraph [0130], the storage 130 which stores various types of data); and 
a processor (Paragraph [0130], the robot 100 may include the controller 140 which controls overall operation) configured to: 
sample an input video (FIGS. 2 and 3; paragraph [0117], the robot 100 can include an image acquisition part 120 capable of photographing surroundings of the main body 101 and 102 within a predetermined range on the basis of the front side of the main body 101 and 102; paragraph [0118], the image acquisition part 120 may include a camera module. The camera module may include a digital camera. The digital camera may include … The digital signal processor can generate a video composed of frames configured as still images), depicting a puppet object (Paragraph [0021], the data related to the user may be video data in which the user has been photographed or real-time video data in which the user is photographed), to obtain an image (Paragraphs [0346]-[0347], FIG. 17 is a flowchart showing a method for operating a robot according to an embodiment of the present disclosure and shows a method for operating a robot which recognizes emotions of a videotelephony partner during video telephony; the robot 100 according to an embodiment of the present disclosure can receive image from a robot of a videotelephony partner (S1710); paragraph [0378], step S1720 of recognizing emotional information of the videotelephony user may include a step in which the robot 100 transmits data received from the robot of the videotelephony user to the emotion recognition server 70; FIG. 5 shows the emotion recognition server 70 including the emotion recognizer 74a; paragraph [0197], the input data 590 may be video data including captured images of a user, and the video data may include video data including a captured image of the face of the user; paragraph [0200], the modal divider 530 can separate image unimodal input data 533 including one or more pieces of face image data from the video data included in the input data 590); 
receive audio data (Paragraph [0347], the robot 100 according to an embodiment of the present disclosure can receive audio data from a robot of a videotelephony partner (S1710)); 
extract voice related features from the audio data (Paragraph [0378], step S1720 of recognizing emotional information of the videotelephony user may include a step in which the robot 100 transmits data received from the robot of the videotelephony user to the emotion recognition server 70 including an artificial neural network trained to recognize emotional information on the basis of the audio data and a step in which the robot 100 receives; paragraph [0198], the modal divider 530 can divide the input data 590 into text unimodal input data 531 obtained by converting the audio data included in the input data 590 into text data, and sound unimodal input data of the audio data, such as a speech tone, magnitude and height; paragraphs [0207], the speech emotion recognizer 522 extracts feature points of input speech data.  Here, the speech feature points may include the tone, volume, waveform and the like of speech); 
produce an expression representation based on the voice related features (Paragraphs [0348]-[0349], an emotion recognition result obtained from emotional information recognition may include a probability value for each emotional class; paragraph [0350], the controller 140 of the robot 100 can generate an avatar by mapping the recognized emotional information of the videotelephony partner to face information of the videotelephony partner included in the data received from the robot of the videotelephony partner (S1730)), wherein the expression representation is related to an appearance of a facial area (Paragraphs [0210]-[0211], FIG. 6 is a diagram referred to in description of emotion recognition according to an embodiment of the present disclosure and illustrates components of an expression … eyebrows 61, eyes 62, cheeks 63, a forehead 64, a nose 65, a mouth 66 and a chin 67 may correspond to expression landmark points; paragraph [0247], FIGS. 7 to 10 are diagrams referred to in description of expression of characters according to an embodiment of the present disclosure; paragraph [0385], the robot 100 can understand emotional feature points of a user and reproduce recognized emotional feature points through an avatar. For example, the robot 100 can recognize a unique feature point of a user (a specific emotional expression of a speaker) such as raising of the corners of the mouth when the user smiles and map the feature point to an avatar. Thus, the expression representation is related to an appearance of a face area) while producing a sound (Paragraph [0155], referring to FIG. 2, the audio output part 181 can be disposed on the left and right sides of the head 110 and output predetermined information as audio; paragraph [0207], the speech emotion recognizer 522 extracts feature points of input speech data. Here, the speech feature points may include the tone, volume, waveform and the like of speech. The speech emotion recognizer 522 can identify a user's emotion by detecting the tone of the speech; paragraph [0295], the controller 140 of the robot 100 can map the emotional information of the user to the image data of the user and synchronize the audio data of the user therewith to generate a video of the avatar; paragraph [0368], the controller 140 can map the recognized emotional information of the videotelephony partner to the audio data of the videotelephony partner to generate converted audio data. The audio output part 181 can utter the converted audio data under the control of the controller 140); 
obtain, from the image, auxiliary data related to content of the image (Paragraph [0204], the image unimodal input data 533 including one or more pieces of face image data can be input to the face emotion recognizer 523 which performs deep learning using image learning data; paragraph [0340], according to an embodiment, for a user having resistance to exposure of the face and surrounding environments, the face and surrounding environments of the user can be recognized and a character and a background image can be generated on the basis of the recognized information and used. Thus, the background information is recognized from the received image); and 
generate a target image based on the expression representation and the auxiliary data (Paragraph [0373], according to an embodiment, for users having resistance to exposure of surrounding environments, a background image can be generated and the generated avatar can be displayed on the generated background image. Thus, a target image is generated based on the expression representation of the generated avatar and the extracted background image from the received image).
SHIN discloses videotelephony functions in which a camera is used to capture video images and the emotion recognizer recognizes emotional information based on the input data.
However, SHIN does not specifically disclose sample an input video to obtain an image.
In the similar field of endeavor, Astarabadi discloses (Paragraph [0100], FIG. 4 shows a similar implementation of face model calculation with multiple images) sample an input video to obtain an image (Paragraph [0102], the device (or the remote computer system) can extract a set of frames from the video clip and then execute the foregoing methods and techniques to converge on a set of coefficients for each frame in this set.  For example, the device can: implement methods and techniques described above to detect the user's face in each frame in the video clip; implement the facial landmark extractor to generate a facial landmark container for each frame in the video clip; and select a subset of frames (e.g., ten frames, 32 frames, 64 frames)--from the video clip--that correspond to facial landmark containers exhibiting least similarity and/or greatest ranges of facial landmark values within this set of facial landmark containers.  More specifically, the device can compare facial landmark containers extracted from frames in the video clip to identify a subset of frames that represent a greatest range of face poses and facial expressions within the video clip; paragraph [0103], the device can then: select a first frame--from this subset of frames--associated with a first facial landmark container; extract a first authentic face image from a region of the first frame depicting the user's face. Thus, an image is obtained by sampling the input video).
SHIN and Astarabadi are analogous art because both pertain to methods for video calling. It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the processing of the emotion recognizer taught by SHIN to incorporate the teachings of Astarabadi, applying the frame selection taught by Astarabadi to process the input video, identify a subset of frames that represent a greatest range of face poses and facial expressions within the video clip, and extract a first authentic face image from a region of the selected first frame depicting the user's face, as this would allow the emotion recognizer to extract an image from the input video for recognizing the emotional information. Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify SHIN according to the relied-upon teachings of Astarabadi to obtain the invention as specified in the claim.

	Regarding claim 14, the combination of SHIN in view of Astarabadi discloses everything claimed as applied above (see claim 12), and SHIN further discloses wherein the facial area comprises an area including a mouth of the puppet object (FIGS. 2 and 3; paragraph [0152], the display 182 may be disposed on the front side of the head 110; paragraph [0247], FIGS. 7 to 10 are diagrams referred to in description of expression of characters according to an embodiment of the present disclosure. Thus, the face area includes a mouth).
	However, SHIN does not specifically disclose an area including a neck area of the puppet object.
In the similar field of endeavor, Astarabadi discloses an area including a neck area of the puppet object (Paragraph [0127], in particular, like the facial deconstruction model and the facial landmark extractor described above, the device and/or the remote computer system can implement the body landmark extractor: to detect spatial characteristics of a body--such as including positions of a neck, shoulders, a chest, arms, hands, an abdomen, a waist--depicted in a 2D image; paragraph [0129], for each image in this set, the device can: detect a body (e.g., neck, shoulders, chest, arms, hands, abdomen, waist) in a region of the image; extract an authentic body image from this region of the image; implement the body landmark extractor to extract a body landmark container from the image; and calculate a set of coefficients that--when injected into the synthetic body generator with the body landmark container--produces a synthetic body image that approximates the authentic body image, such as to a degree that a human may recognize the user's body in the synthetic body image and/or such that a human may discern limited visual differences between the authentic body image and the synthetic body image).
	SHIN and Astarabadi are analogous art because both pertain to methods for video calling. It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the processing of the emotion recognizer taught by SHIN to incorporate the teachings of Astarabadi, applying the frame selection taught by Astarabadi to process the image, detect spatial characteristics of a body in a region of the image, and extract the body image from the region of the image. Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify SHIN according to the relied-upon teachings of Astarabadi to obtain the invention as specified in the claim.
	Regarding claim 16, the combination of SHIN in view of Astarabadi discloses everything claimed as applied above (see claim 12), and SHIN further disclose wherein the processor is further configured to produce the expression representation by matching the expression representation to an identity of the puppet object (Paragraphs [0210]-[0211], FIG. 6 is a diagram referred to in description of emotion recognition according to an embodiment of the present disclosure and illustrates components of an expression … eyebrows 61, eyes 62, cheeks 63, a forehead 64, a nose 65, a mouth 66 and a chin 67 may correspond to expression landmark points; FIGS. 2 and 3; paragraph [0358], the robot 100 can understand emotional feature points of a user and reproduce recognized emotional feature points through an avatar.  For example, the robot 100 can recognize a unique feature point of a user (a specific emotional expression of a speaker) such as raising of the corners of the mouth when the user smiles and map the feature point to an avatar).    

	Regarding claim 17, the combination of SHIN in view of Astarabadi discloses everything claimed as applied above (see claim 12), and SHIN further discloses wherein the voice related features correspond to a sentiment of a puppet object depicted in the received audio data (FIG. 5; paragraph [0198], the modal divider 530 can divide the input data 590 into text unimodal input data 531 obtained by converting the audio data included in the input data 590 into text data, and sound unimodal input data of the audio data, such as a speech tone, magnitude and height; paragraph [0207], the speech emotion recognizer 522 extracts feature points of input speech data.  Here, the speech feature points may include the tone, volume, waveform and the like of speech).

	Regarding claim 18, the combination of SHIN in view of Astarabadi discloses everything claimed as applied above (see claim 12), and SHIN further disclose wherein the auxiliary data comprises data related to the puppet object (FIG. 5; paragraph [0204], the image unimodal input data 533 including one or more pieces of face image data can be input to the face emotion recognizer 523 which performs deep learning using image learning data) and supplementary data related to a scene represented in the image (Paragraph [0340], according to an embodiment, for a user having resistance to exposure of the face and surrounding environments, the face and surrounding environments of the user can be recognized and a character and a background image can be generated on the basis of the recognized information and used. Thus, the background information is related to the user and surrounding environments of the user recognized in the received image).

	Regarding claim 19, the combination of SHIN in view of Astarabadi discloses everything claimed as applied above (see claim 12), and SHIN further discloses wherein the processor is further configured to perform a training stage (FIG. 17; paragraphs [0374]-[0375], Recognition (S1720) of emotions of the videotelephony user can be performed by the robot 100 … the robot 100 can include the emotion recognizer 74a which includes an artificial neural network trained to recognize emotional information on the basis of image data and audio data, and when data received from the robot of the videotelephony user is input) by repeating the steps (Paragraph [0346], FIG. 17 is a flowchart showing a method for operating a robot according to an embodiment of the present disclosure and shows a method for operating a robot which recognizes emotions of a videotelephony partner during video telephony) of sample (FIG. 5; paragraph [0200], the modal divider 530 can separate image unimodal input data 533 including one or more pieces of face image data from the video data included in the input data 590), extract (Paragraph [0207], the speech emotion recognizer 522 extracts feature points of input speech data.  Here, the speech feature points may include the tone, volume, waveform and the like of speech; paragraph [0204], the image unimodal input data 533 including one or more pieces of face image data can be input to the face emotion recognizer 523 which performs deep learning using image learning data; paragraph [0198], the modal divider 530 can divide the input data 590 into text unimodal input data 531 obtained by converting the audio data included in the input data 590 into text data, and sound unimodal input data of the audio data, such as a speech tone, magnitude and height).
	SHIN discloses that an artificial neural network is trained for emotional information recognition. However, SHIN does not disclose a training stage that includes producing and generating a target image.
In the similar field of endeavor, Astarabadi discloses a training stage including produce and generate a target image (Paragraph [0014], the first and second devices can also implement facial deconstruction and facial reconstruction models--such as trained on a population of users or the first user specifically based on deep learning or artificial intelligence techniques--to rapidly decompose a first video feed recorded at the first device into a first facial landmark feed and to reconstruct this first facial landmark feed into a first synthetic--but photorealistic--video depicting highest-import content from the first video feed (i.e., the first user's face); paragraph [0161], upon receipt of a facial landmark container and a corresponding audio packet from the first device, the second device can: extract audio data from the audio packet; insert the facial landmark container and the first face model of the first user into a local copy of the synthetic face generator--stored in local memory on the second device--to generate a synthetic face image; and render the synthetic face image over the first background within the video call portal (e.g., to form a "first synthetic video feed") while playing back the audio data via an integrated or connected audio driver; paragraph [0085], FIG. 3; in Block S202, the remote computer system can train the conditional generative adversarial network to output a synthetic face image based on a set of input conditions, including: a facial landmark container, which captures relative locations (and/or sizes, orientations) of facial landmarks that represent a facial expression; and a face model, which contains a (pseudo-) unique set of coefficients characterizing a unique human face and secondary physiognomic features (e.g., face shape, skin tone, facial hair, makeup, freckles, wrinkles, eye color, hair color, hair style, and/or jewelry)).
SHIN and Astarabadi are analogous art because both pertain to methods for video calling. It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the processing of the emotion recognizer taught by SHIN to incorporate the teachings of Astarabadi, applying the synthetic face image generation taught by Astarabadi to implement the training stage within the operation steps of FIG. 17 taught by SHIN in order to train the artificial neural network for producing an expression representation based on the voice related features and for obtaining and generating a target image. Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify SHIN according to the relied-upon teachings of Astarabadi to obtain the invention as specified in the claim.

	Regarding claim 20, SHIN discloses a method for voice driven animation of an object in an image, the method comprising: 
analyzing a received audio data (FIGS. 2 and 3; paragraph [0122], the robot 100 may include an audio input part 125 for receiving audio input of a user; paragraphs [0346]-[0347], FIG. 17 is a flowchart showing a method for operating a robot according to an embodiment of the present disclosure and shows a method for operating a robot which recognizes emotions of a videotelephony partner during video telephony; the robot 100 according to an embodiment of the present disclosure can receive audio data from a robot of a videotelephony partner (S1710)) to extract voice related features from the audio data (Paragraph [0378], step S1720 of recognizing emotional information of the videotelephony user may include a step in which the robot 100 transmits data received from the robot of the videotelephony user to the emotion recognition server 70 including an artificial neural network trained to recognize emotional information on the basis of the audio data and a step in which the robot 100 receives; FIG. 5 shows the emotion recognition server 70 including the emotion recognizer 74a; paragraph [0198], the modal divider 530 can divide the input data 590 into text unimodal input data 531 obtained by converting the audio data included in the input data 590 into text data, and sound unimodal input data of the audio data, such as a speech tone, magnitude and height; paragraphs [0207], the speech emotion recognizer 522 extracts feature points of input speech data.  Here, the speech feature points may include the tone, volume, waveform and the like of speech); 
generating, based on the voice related features, an expression representation (Paragraphs [0348]-[0349], an emotion recognition result obtained from emotional information recognition may include a probability value for each emotional class; paragraph [0350], the controller 140 of the robot 100 can generate an avatar by mapping the recognized emotional information of the videotelephony partner to face information of the videotelephony partner included in the data received from the robot of the videotelephony partner (S1730)), wherein the expression representation is related to an appearance of a facial area (Paragraphs [0210]-[0211], FIG. 6 is a diagram referred to in description of emotion recognition according to an embodiment of the present disclosure and illustrates components of an expression … eyebrows 61, eyes 62, cheeks 63, a forehead 64, a nose 65, a mouth 66 and a chin 67 may correspond to expression landmark points; paragraph [0247], FIGS. 7 to 10 are diagrams referred to in description of expression of characters according to an embodiment of the present disclosure; paragraph [0385], the robot 100 can understand emotional feature points of a user and reproduce recognized emotional feature points through an avatar.  For example, the robot 100 can recognize a unique feature point of a user (a specific emotional expression of a speaker) such as raising of the corners of the mouth when the user smiles and map the feature point to an avatar. Thus, the expression representation is related to an appearance of a face area) while producing a sound (Paragraph [0155], referring to FIG. 2, the audio output part 181 can be disposed on the left and right sides of the head 110 and output predetermined information as audio; paragraph [0207], the speech emotion recognizer 522 extracts feature points of input speech data.  Here, the speech feature points may include the tone, volume, waveform and the like of speech.  The speech emotion recognizer 522 can identify a user's emotion by detecting the tone of the speech; paragraph [0295], the controller 140 of the robot 100 can map the emotional information of the user to the image data of the user and synchronize the audio data of the user therewith to generate a video of the avatar; paragraph [0368], the controller 140 can map the recognized emotional information of the videotelephony partner to the audio data of the videotelephony partner to generate converted audio data.  The audio output part 181 can utter the converted audio data under the control of the controller 140);
sampling an input video (Paragraph [0117], the robot 100 can include an image acquisition part 120 capable of photographing surroundings of the main body 101 and 102 within a predetermined range on the basis of the front side of the main body 101 and 102; paragraph [0118], the image acquisition part 120 may include a camera module. The camera module may include a digital camera. The digital camera may include … The digital signal processor can generate a video composed of frames configured as still images), depicting a puppet object (Paragraph [0021], the data related to the user may be video data in which the user has been photographed or real-time video data in which the user is photographed), to obtain an image (Paragraphs [0346]-[0347], FIG. 17 is a flowchart showing a method for operating a robot according to an embodiment of the present disclosure and shows a method for operating a robot which recognizes emotions of a videotelephony partner during video telephony; the robot 100 according to an embodiment of the present disclosure can receive image from a robot of a videotelephony partner (S1710); paragraph [0378], step S1720 of recognizing emotional information of the videotelephony user may include a step in which the robot 100 transmits data received from the robot of the videotelephony user to the emotion recognition server 70; FIG. 5 shows the emotion recognition server 70 including the emotion recognizer 74a; paragraph [0197], the input data 590 may be video data including captured images of a user, and the video data may include video data including a captured image of the face of the user; paragraph [0200], the modal divider 530 can separate image unimodal input data 533 including one or more pieces of face image data from the video data included in the input data 590); 
predicting, from the image, auxiliary data related to the puppet object (Paragraph [0204], the image unimodal input data 533 including one or more pieces of face image data can be input to the face emotion recognizer 523 which performs deep learning using image learning data) and supplementary data related to a scene represented in the image (Paragraph [0340], according to an embodiment, for a user having resistance to exposure of the face and surrounding environments, the face and surrounding environments of the user can be recognized and a character and a background image can be generated on the basis of the recognized information and used. Thus, the background information is recognized from the received image); 
generating the facial area of the puppet object based on the expression representation and the auxiliary data (Paragraph [0358], the robot 100 can understand emotional feature points of a user and reproduce recognized emotional feature points through an avatar.  For example, the robot 100 can recognize a unique feature point of a user (a specific emotional expression of a speaker) such as raising of the corners of the mouth when the user smiles and map the feature point to an avatar); and
generating a target image by combining the region of interest of the puppet object and the supplementary data (Paragraph [0373], according to an embodiment, for users having resistance to exposure of surrounding environments, a background image can be generated and the generated avatar can be displayed on the generated background image. Thus, a target image is generated based on the expression representation of the generated avatar and the extracted background image from the received image).
SHIN discloses videotelephony functions in which a camera is used to capture video images and the emotion recognizer recognizes emotion information based on the input data.
However, SHIN does not specifically disclose sampling an input video to obtain an image.
In the similar field of endeavor, Astarabadi discloses (Paragraph [0100], FIG. 4 shows a similar implementation of face model calculation with multiple images) sampling an input video to obtain an image (Paragraph [0102], the device (or the remote computer system) can extract a set of frames from the video clip and then execute the foregoing methods and techniques to converge on a set of coefficients for each frame in this set.  For example, the device can: implement methods and techniques described above to detect the user's face in each frame in the video clip; implement the facial landmark extractor to generate a facial landmark container for each frame in the video clip; and select a subset of frames (e.g., ten frames, 32 frames, 64 frames)--from the video clip--that correspond to facial landmark containers exhibiting least similarity and/or greatest ranges of facial landmark values within this set of facial landmark containers.  More specifically, the device can compare facial landmark containers extracted from frames in the video clip to identify a subset of frames that represent a greatest range of face poses and facial expressions within the video clip; paragraph [0103], the device can then: select a first frame--from this subset of frames--associated with a first facial landmark container; extract a first authentic face image from a region of the first frame depicting the user's face. Thus, an image is obtained by sampling the input video).
SHIN and Astarabadi are analogous art because both pertain to methods for video calling. It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the processing of the emotion recognizer taught by SHIN to incorporate the teachings of Astarabadi, applying the frame selection taught by Astarabadi to process the input video, identify a subset of frames that represent a greatest range of face poses and facial expressions 

Regarding claim 21, the combination of SHIN in view of Astarabadi discloses everything claimed as applied above (see claim 1), and SHIN further disclose wherein producing the expression representation is performed by a neural network that is trained to produce the expression representation based on the voice related features (FIGS. 1, 4 and 5; paragraph [0182], the emotion recognizer 74a according to an embodiment of the present disclosure may include a unimodal preprocessor 520 including a plurality of modal recognizers 521, 522 and 523 trained to recognize emotional information of a user included in unimodal input data, and a multimodal recognizer 510 which combines output data of the plurality of modal recognizers 521, 522 and 523 and is trained to recognize emotional information of a user included in the combined data; paragraph [0207], the speech emotion recognizer 522 extracts feature points of input speech data.  Here, the speech feature points may include the tone, volume, waveform and the like of speech.  The speech emotion recognizer 522 can identify a user's emotion by detecting the tone of the speech).

Regarding claim 22, the combination of SHIN in view of Astarabadi discloses everything claimed as applied above (see claim 12), and SHIN further disclose wherein the processor is further configured to produce the expression representation using a neural network that is trained to produce the expression representation based on the voice related features (FIGS. 1, 4 and 5; paragraph [0182], the emotion recognizer 74a according to an embodiment of the present disclosure may include a unimodal preprocessor 520 including a plurality of modal recognizers 521, 522 and 523 trained to recognize emotional information of a user included in unimodal input data, and a multimodal recognizer 510 which combines output data of the plurality of modal recognizers 521, 522 and 523 and is trained to recognize emotional information of a user included in the combined data; paragraph [0207], the speech emotion recognizer 522 extracts feature points of input speech data.  Here, the speech feature points may include the tone, volume, waveform and the like of speech.  The speech emotion recognizer 522 can identify a user's emotion by detecting the tone of the speech).

Claims 2 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over SHIN et al (U.S. Patent Application Publication 2020/0410739 A1) in view of Astarabadi et al (U.S. Patent Application Publication 2020/0358983 A1) in view of Carmel et al (U.S. Patent Application Publication 2014/0355668 A1).

	Regarding claim 2, the combination of SHIN in view of Astarabadi discloses everything claimed as applied above (see claim 1), and SHIN further discloses (FIG. 3; paragraphs [0252]-[0253], an avatar can be generated by selecting one of basic animated characters that are stored in the storage 130 of the robot 100. Further, an avatar expressing a specific emotion of a user can be generated by changing expression landmark points of a generated animated character such that they correspond to recognized emotional information; FIG. 17; paragraph [0351], the controller 140 of the robot 100 can control the display 182 such that the generated avatar is displayed thereon (S1740)).
	However, SHIN does not specifically disclose appending the target image to a previously generated target image to produce an output video.      
In the similar field of endeavor, Carmel discloses (Abstract, a method of enabling iterative encoding of a video frame by a video encoder, comprising obtaining a video-encoder-state resulting from encoding of a previous input video frame and previous to encoding of a current input video frame, copying the video-encoder-state giving rise to a reserved state and obtaining a candidate current encoded video frame from the video encoder, and in case the candidate current encoded video frame does not meet an encoding criterion, copying the reserved state back to the video encoder to enable the video encoder to re-encode the current input video frame) appending the target image to a previously generated target image to produce an output video (Paragraph [0085], according to examples of the presently disclosed subject matter, the selected current encoded video frames can be appended to the previously encoded frame(s), and is thus placed in the output bitstream).  

  
	Regarding claim 13, the combination of SHIN in view of Astarabadi discloses everything claimed as applied above (see claim 12), and SHIN further disclose wherein the processor is further configured to produce an output video depicting animation of the puppet object featuring the audio data (FIG.3; paragraphs [0252]-[0253], an avatar can be generated by selecting one of basic animated characters that are stored in the storage 130 of the robot 100. Further, an avatar expressing a specific emotion of a user can be generated by changing expression landmark points of a generated animated character such that they correspond to recognized emotional information; FIG. 17; paragraph [0351], the controller 140 of the robot 100 can control the display 182 such that the generated avatar is displayed thereon (S1740)).
However, SHIN does not specifically disclose append the target image to a previously generated target image to produce an output video.
In the similar field of endeavor, Carmel discloses (Abstract, a method of enabling iterative encoding of a video frame by a video encoder, comprising obtaining a video-encoder-state resulting from encoding of a previous input video frame and previous to encoding of a current input video frame, copying the video-encoder-state giving rise to a reserved state and obtaining a candidate current encoded video frame from the video encoder, and in case the candidate current encoded video frame does not meet an encoding criterion, copying the reserved state back to the video encoder to enable the video encoder to re-encode the current input video frame) append the target image to a previously generated target image to produce an output video (Paragraph [0085], according to examples of the presently disclosed subject matter, the selected current encoded video frames can be appended to the previously encoded frame(s), and is thus placed in the output bitstream).  
SHIN and Carmel are analogous art because both pertain to methods for producing video output. It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the animated avatar generation taught by SHIN to incorporate the teachings of Carmel, applying the method of enabling iterative encoding of a video frame by a video encoder taught by Carmel to implement the avatar generation within the operation steps of FIG. 17 taught .

Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over SHIN et al (U.S. Patent Application Publication 2020/0410739 A1) in view of Astarabadi et al (U.S. Patent Application Publication 2020/0358983 A1) in view of Ingel et al (U.S. Patent Application Publication 2021/0224319 A1).

	Regarding claim 10, the combination of SHIN in view of Astarabadi discloses everything claimed as applied above (see claim 1).
However, SHIN does not specifically disclose wherein processing the received audio data comprises representing the received audio data by a spectrogram representation.
In the similar field of endeavor, Ingel discloses (Paragraph [0112], reference is now made to FIG. 1B, which shows an example of an artificial dubbing system 100 that receives a media stream in a first language, determines one or more voice profiles associated with speakers in the media stream, and outputs a media stream in a second language; paragraph [0296], consistent with the present disclosure, the received media stream may be a real-time conversation (e.g., a phone call, a video conference, or a recorded physical conversation) between the at least one individual and the particular user) wherein processing the received audio data comprises representing the received audio data by a spectrogram representation (FIG. 8A is a flowchart of an example method for selectively selecting the language to dub in a media stream, in accordance with some embodiments of the disclosure; paragraph [0230], for example, step 802 may use step 432 and/or step 462 to receive the media stream; step 804, the processing device may obtain a transcript of the received media stream associated with utterances in the first language and utterances in the second language; paragraph [0231], step 806 may use the artificial neural network to analyze the transcript and determining whether dubbing is needed in the primary language and/or in the secondary language; paragraph [0232], step 808, analyze the received media stream to determine a set of voice parameters for each of the plurality of first individuals. As described above, voice profile determination module 406 may determine a voice profile for each one or more individuals speaking in the received media stream.  According to step 810, the processing device may determine a voice profile for each of the plurality of first individuals based on an associated set of voice parameters, or obtaining the voice profiles for the individuals in a different way; FIG. 4A; paragraph [0141], specifically, voice profile determination module 406 may determine the voice profile for each one or more individuals speaking in the received media stream by extracting spectral features, also referred to as spectral attributes, spectral envelope, or spectrogram from an audio sample of a single individual).
SHIN and Ingel are analogous art because both pertain to methods for video calling. It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the processing of the emotion recognizer taught by SHIN to incorporate the teachings of Ingel, applying the audio data processing taught by Ingel to analyze the received media stream in order to determine a set of voice parameters for each of the one or more individuals speaking in the received media stream by extracting spectral features (i.e., a “spectrogram”). Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify SHIN according to the relied-upon teachings of Ingel to obtain the invention as specified in the claim.

Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over SHIN et al (U.S. Patent Application Publication 2020/0410739 A1) in view of Astarabadi et al (U.S. Patent Application Publication 2020/0358983 A1) in view of Palaskar et al (Learned in Speech Recognition: Contextual Acoustic Word Embeddings; Date of Conference: 12-17 May 2019; https://ieeexplore.ieee.org/abstract/document/8683868).

	Regarding claim 11, the combination of SHIN in view of Astarabadi discloses everything claimed as applied above (see claim 1).

In the similar field of endeavor, Palaskar discloses wherein processing the received audio data comprises representing the received audio data by an acoustic word embeddings representation (Page 6530; Section 1, Introduction, the task of learning fixed-size representations for variable length data like words or sentences, either text or speech-based, is an interesting problem and a focus of much current research … Prior work towards the problem of learning word representations from variable length acoustic frames involved either providing word boundaries to align speech and text [5], or chunking (“chopping” or “padding”) input speech into fixed-length segments that usually span only one word [6, 7, 8, 9]. Since these techniques learn acoustic word embeddings from audio fragment and word pairs obtained via a given segmentation of the audio data, they ignore the specific audio context associated with a particular word).
SHIN and Palaskar are analogous art because both pertain to methods for audio analysis. It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the processing of the emotion recognizer taught by SHIN to incorporate the teachings of Palaskar, applying the speech recognition learning from audio data taught by Palaskar to analyze the received media stream in order to learn acoustic word embeddings from the audio data. Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify SHIN according to the relied-upon teachings of Palaskar to obtain the invention as specified in the claim.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Xilin Guo whose telephone number is (571)272-5786. The examiner can normally be reached Monday - Friday 9:00 AM-5:30 PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/XILIN GUO/
Primary Examiner, Art Unit 2616