Notice of Pre-AIA  or AIA  Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
Claim Objections
Claims 2-3 are objected to because of the following informalities: Claims 2-3 recite “the electronic device” in line 2. There is insufficient antecedent basis for this limitation in the claim. For purposes of examination, the Examiner interprets “the electronic device” as “the first electronic device”. Appropriate correction is required.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 9 and 10-18 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim 9, which depends from claim 8, recites “wherein the neural network further comprises…”, but claim 8 does not recite “neural network”. It is unclear whether claim 9 depends from claim 8 or from a different claim. For purposes of examination, the Examiner interprets claim 9 as depending from claim 1.

Claim 17, which depends from claim 8, recites “wherein the one or more programs further comprise instructions, which when executed by one or more processors of a first electronic device….”, but claim 8 does not recite “wherein the one or more programs further comprise instructions, which when executed by one or more processors of a first electronic device”. It is unclear whether claim 17 depends from claim 8 or from a different claim. For purposes of examination, the Examiner interprets claim 17 as depending from claim 1.
Claims 11-16 are rejected based on the rejection of claim 10, since claims 11-16 depend directly or indirectly from claim 10.
Claim 18 is rejected based on rejection of claim 17 since claim 18 depends from claim 17.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.




1.	Claims 1-4, 6-13, and 17-23 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Shin et al., U.S. Patent Application Publication No. 2020/0090393 (“Shin”).
Regarding independent claim 1, Shin teaches a non-transitory computer-readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by one or more processors of a first electronic device (¶0389 “The method of operating the robot and the robot system according to an example embodiment can be implemented as a code readable by a processor on a recording medium readable by the processor. The processor-readable recording medium includes all kinds of recording apparatuses in which data that can be read by the processor is stored. Examples of the recording medium that can be read by the processor include a ROM, a RAM, a CD-ROM, a magnetic tape, a floppy disk, an optical data storage apparatus, and/or the like, and may also be implemented in the form of a carrier wave such as transmission over the Internet. In addition, the processor-readable recording medium may be distributed over network-connected computer systems so that code readable by the processor in a distributed fashion can be stored and executed”), cause the first electronic device to:
receive an audio input (¶0082 “The robot 100 may include a voice input unit 125 for receiving a speech input of a user. The voice input unit may also be called a speech input unit or a voice/speech input device”);
receive a video input including at least a portion of a user's face, wherein the video input is separate from the audio input (¶0080 “The image acquisition unit 120 may photograph the front direction of the robot 100, and may photograph an image for user recognition”; ¶0158 “For example, the input data 590 may be moving image data photographed by the user, and the moving image data may include image data in which the user's face or the like is photographed and audio data including a speech uttered by a user.”);
determine, one or more movements of the user's face based on the received audio input and received video input (¶0231-0234 “Referring to FIG. 11, the robot 100 may acquire data related to a user (S1110). [0232] The data related to the user may be moving image data that photographed a user or real-time moving image data that is photographing the user. The robot 100 may use both the stored data and the data inputted in real time. [0233] The data related to the user may include image data (including the face of the user) and voice data (uttered by the user). The image data including the face of the user may be acquired through a camera of the image acquisition unit 120, and the voice data uttered by the user may be acquired through a microphone of the voice input unit 125. [0234] The emotion recognizer 74a may recognize the emotion information of the user based on the data related to the user (S1120).”, where the emotion information is based on the data related to the user, which includes image data and voice data); and
 generate, using a neural network separately trained with a set of audio training data and a set of video training data (¶0178-0179 “The plurality of recognizers (or plurality of recognition processors) for each modal may include an artificial neural network corresponding to input characteristics of the unimodal input data that are inputted respectively. A multimodal emotion recognizer 511 may include an artificial neural network corresponding to characteristics of the input data. [0179] For example, the facial emotion recognizer 523 for performing image-based learning and recognition may include a Convolutional Neural Network (CNN), the other emotion recognizers 521 and 522 include a deep-network neural network (DNN), and the multimodal emotion recognizer 511 may include an artificial neural network of a Recurrent Neural Network (RNN).”), a set of characteristics for controlling an avatar representing the one or more movements of the user's face (¶0190-0191 “The emotion recognizer 74a may output the plurality of unimodal emotion recognition results and one multimodal emotion recognition result as a level (probability) for each emotion class. [0191] For example, the emotion recognizer 74a may output the probability value for emotional classes of surprise, happiness, neutral, sadness, displeasure, anger, and fear, and there may be a higher probability of recognized emotional class as the probability value is higher. The sum of the probability values of seven emotion classes may be 100%.”; ¶0241 “The robot 100 may generate an avatar character by mapping emotion information of the recognized user to the face information of the user included in the data related to the user (S1130).”).
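For illustration only, the output format quoted from Shin ¶0190-0191 (a probability value for each of seven emotion classes, summing to 100%) can be sketched as follows. The softmax normalization, the raw score values, and the function name `emotion_probabilities` are assumptions made for this sketch; Shin does not state how the probability values are computed.

```python
import math

# The seven emotion classes named in Shin ¶0191.
EMOTION_CLASSES = ["surprise", "happiness", "neutral", "sadness",
                   "displeasure", "anger", "fear"]


def emotion_probabilities(scores):
    """Map raw per-class scores to percentages that sum to 100.

    Softmax is used here as an assumed normalization, not Shin's stated method.
    """
    if len(scores) != len(EMOTION_CLASSES):
        raise ValueError("expected one score per emotion class")
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return {name: 100.0 * e / total for name, e in zip(EMOTION_CLASSES, exps)}


# Hypothetical raw scores, one per emotion class.
probs = emotion_probabilities([0.2, 2.5, 1.0, -0.5, -1.0, -0.3, -2.0])
# The class with the highest probability value is treated as the recognized emotion.
recognized = max(probs, key=probs.get)
```

In this sketch the recognized emotion class is simply the key with the largest percentage, which mirrors the quoted statement that a higher probability value indicates a more likely emotion class.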
Regarding claim 2, Shin teaches the non-transitory computer-readable storage medium of claim 1, wherein the audio input is received by a microphone of the electronic device (¶0233 “The data related to the user may include image data (including the face of the user) and voice data (uttered by the user). The image data including the face of the user may be acquired through a camera of the image acquisition unit 120, and the voice data uttered by the user may be acquired through a microphone of the voice input unit 125.”)
Regarding claim 3, Shin teaches the non-transitory computer-readable storage medium of claim 1, wherein the video input is received by a camera of the electronic device (¶0233 “The data related to the user may include image data (including the face of the user) and voice data (uttered by the user). The image data including the face of the user may be acquired through a camera of the image acquisition unit 120, and the voice data uttered by the user may be acquired through a microphone of the voice input unit 125.”)
Regarding claim 4, Shin teaches the non-transitory computer-readable storage medium of claim 1, wherein the audio input and the video input are received from a second electronic device (¶0346-0347 “For example, the second robot 100b may receive, from the first robot 100a, image data photographed by the user of the first robot 100a, voice data uttered by the user of the first robot 100a, etc. (S1810). After that, the first robot 100a and the second robot 100b may transmit and receive data necessary for video call while continuously performing a video call. [0347] The second robot 100b, which received the image data and the voice data from the first robot 100a, may recognize the emotion of the user of the first robot 100a (i.e., the video call counterpart) based on the received image data and voice data (S1820).”)
Regarding claim 6, Shin teaches the non-transitory computer-readable storage medium of claim 1, wherein the one or more programs further comprise instructions, which when executed by one or more processors of a first electronic device, cause the first electronic device to:
 provide a set of audio training data to the neural network; provide a set of video training data to the neural network (¶0179 “For example, the facial emotion recognizer 523 for performing image-based learning and recognition may include a Convolutional Neural Network (CNN), the other emotion recognizers 521 and 522 include a deep-network neural network (DNN), and the multimodal emotion recognizer 511 may include an artificial neural network of a Recurrent Neural Network (RNN).”; ¶0184 “The emotion recognizer 74a may use a total of four deep learning models including the deep learning model of three emotion recognizers for each modal 521, 522, 523 and the deep learning model of one multimodal recognizer 510.”); and
92113896318Attorney Docket No.: P42882US1/77870000335101 train the neural network using both the audio training data and the video training data (¶0184 “The emotion recognizer 74a may use a total of four deep learning models including the deep learning model of three emotion recognizers for each modal 521, 522, 523 and the deep learning model of one multimodal recognizer 510.” ¶0185 “The multimodal recognizer 510 may include a merger 512 (or hidden state merger) for combining the feature point vectors outputted from the plurality of recognizers for each modal 521, 522, and 523, and a multimodal emotion recognizer 511 that is learned to recognize emotion information of the user included in the output data of the merger 512.”)
Regarding claim 7, Shin teaches the non-transitory computer-readable storage medium of claim 6, wherein training the neural network using both the audio training data and the video training data includes at least one of: training the neural network with the audio training data and the video training data concurrently; training the neural network with the audio training data and without the video training data; and training the neural network with the video training data and without the audio training data (¶0179 “For example, the facial emotion recognizer 523 for performing image-based learning and recognition may include a Convolutional Neural Network (CNN), the other emotion recognizers 521 and 522 include a deep-network neural network (DNN), and the multimodal emotion recognizer 511 may include an artificial neural network of a Recurrent Neural Network (RNN).”; ¶0184 “The emotion recognizer 74a may use a total of four deep learning models including the deep learning model of three emotion recognizers for each modal 521, 522, 523 and the deep learning model of one multimodal recognizer 510.”; ¶0185 “The multimodal recognizer 510 may include a merger 512 (or hidden state merger) for combining the feature point vectors outputted from the plurality of recognizers for each modal 521, 522, and 523, and a multimodal emotion recognizer 511 that is learned to recognize emotion information of the user included in the output data of the merger 512.”)
Regarding claim 8, Shin teaches the non-transitory computer-readable storage medium of claim 1, wherein determining, a set of data representing one or more movements of the user's face based on the received audio input and received video input further comprises: 
determining a first set of data representing a first movement of the user's face (¶0210 “According to the embodiment, the robot 100 may generate an avatar character by synthesizing a facial expression landmark point image generated in correspondence with recognized emotion information on the face image data of the user as augmented reality. For example, the frowning eye, eyebrow, and forehead may cover the eye, eyebrow, and forehead of the user's face image in their own positions with augmented reality. Thus, an avatar character expressing the user's displeasure emotion may be generated.”; ¶0220 “Referring to FIG. 8, when the emotion of the user is recognized as neutrality (or neutral), the avatar character may be generated as a smiling neutral expression 810”); and determining a second set of data representing a second movement of the user's face (¶0221 “When the emotion of the user is recognized as a surprise, the avatar character may be generated showing a surprise expression 820 of raising eyebrows and opening the mouth”; ¶0242 “The avatar character may express individuality of the user by a character reflecting at least one of the features extracted from the face information of the user. For example, the avatar character may be generated by reflecting at least one of the facial expression landmark point extracted from the face information of the user. If the facial expression landmark point of a specific user is an eye, various emotions can be expressed by keeping the eye as a feature point. Alternatively, if eyes and mouth are considered as landmark point, eyes and mouth to a plurality of sample characters, or to characterize only eyes and mouth shapes like a caricature.”; ¶0217 “If the recognized emotion level of the user is larger, the expression degree of specific emotion can be greatly changed in the default expression. For example, if the level of happiness is large, the degree of opening of the mouth, which is the landmark point included in the expression of the happiness emotion class, can be changed more widely.”)
Regarding claim 9, Shin teaches the non-transitory computer-readable storage medium of claim 8, wherein the neural network further comprises a plurality of neural networks including a first neural network, a second neural network, and a third neural network (¶0239 “As described with reference to FIG. 5, the server 70 including the emotion recognizer 74a may include a plurality of artificial neural networks learned by the unimodal input, and may include an artificial neural network learned by the multi-modal input based on the plurality of unimodal inputs”; ¶0335 “As described with reference to FIG. 5, the emotion recognition server 70 may include a plurality of artificial neural networks 521, 522, and 523 learned by the unimodal input. The emotion recognition server 70 may include an artificial neural network 511 learned by the multimodal input based on the plurality of unimodal inputs. The neural networks 511, 521, 522, 523 included in the emotion recognition server 70 may be an artificial neural network suitable for respective input data”)
Regarding claim 10, Shin teaches the non-transitory computer-readable storage medium of claim 9, wherein generating, using a neural network separately trained with a set of audio training data and a set of video training data, a set of characteristics for controlling an avatar representing the one or more movements of the user's face further comprises: 
generating, with the first neural network, a first set of characteristics representing the first movement of the user's face (¶0335 “As described with reference to FIG. 5, the emotion recognition server 70 may include a plurality of artificial neural networks 521, 522, and 523 learned by the unimodal input. The emotion recognition server 70 may include an artificial neural network 511 learned by the multimodal input based on the plurality of unimodal inputs. The neural networks 511, 521, 522, 523 included in the emotion recognition server 70 may be an artificial neural network suitable for respective input data”; ¶0164 “The sound unimodal input data 532 …”);
generating, with the second neural network, a second set of characteristics representing the second movement of the user's face (¶0335 “As described with reference to FIG. 5, the emotion recognition server 70 may include a plurality of artificial neural networks 521, 522, and 523 learned by the unimodal input. The emotion recognition server 70 may include an artificial neural network 511 learned by the multimodal input based on the plurality of unimodal inputs. The neural networks 511, 521, 522, 523 included in the emotion recognition server 70 may be an artificial neural network suitable for respective input data”; ¶0165 “The image unimodal input data 533 (including one or more face image data) may be inputted, while being used as the image learning data, to a face emotion recognizer 523 (or face emotion recognition processor) that performs deep learning.”; ¶0170 “The face emotion recognizer 523 may recognize the facial expression of the user by detecting the facial area of the user in the input image data and recognizing facial expression landmark point information which is the feature points constituting the facial expression. The face emotion recognizer 523 may output the emotion class corresponding to the recognized facial expression or the probability value for each emotion class, and also output the facial feature point (facial expression landmark point) vector.”); and
generating, with the third neural network, a combined set of characteristics representing the first movement and the second movement of the user's face (¶0185 “The multimodal recognizer 510 may include a merger 512 (or hidden state merger) for combining the feature point vectors outputted from the plurality of recognizers for each modal 521, 522, and 523, and a multimodal emotion recognizer 511 that is learned to recognize emotion information of the user included in the output data of the merger 512”; ¶0186 “The merger 512 may synchronize the output data of the plurality of recognizers for each modal 521, 522, and 523, and may combine (vector concatenation) the feature point vectors to output to the multimodal emotion recognizer 511”; ¶0188 “For example, the multimodal emotion recognizer 511 may output the emotion class having the highest probability among a certain number of preset emotion classes as the emotion recognition result, and/or may output a probability value for each emotion class as the emotion recognition result.”)
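For illustration only, the arrangement quoted from Shin ¶0185-0188 (a merger that concatenates the feature point vectors from the per-modal recognizers, followed by a multimodal recognizer that outputs the emotion class having the highest probability) can be sketched as follows. The vector values, the weights, the single linear layer with softmax, and the names `merge` and `classify` are all assumptions for this sketch; Shin describes the multimodal emotion recognizer 511 as possibly including an RNN, which is not modeled here.

```python
import math

# The seven emotion classes named in Shin ¶0191.
EMOTION_CLASSES = ["surprise", "happiness", "neutral", "sadness",
                   "displeasure", "anger", "fear"]


def merge(feature_vectors):
    """Combine per-modal feature vectors by vector concatenation (cf. merger 512)."""
    merged = []
    for vector in feature_vectors:
        merged.extend(vector)
    return merged


def classify(merged, weights, biases):
    """Stand-in for the multimodal recognizer: one linear layer plus softmax.

    This is an assumed simplification, not the network Shin describes.
    """
    logits = [sum(w * x for w, x in zip(row, merged)) + b
              for row, b in zip(weights, biases)]
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return EMOTION_CLASSES[best], probs


# Hypothetical feature point vectors from three per-modal recognizers.
vec_521, vec_522, vec_523 = [0.1, 0.4], [0.3], [0.9, 0.2, 0.5]
merged = merge([vec_521, vec_522, vec_523])

# Hypothetical weights: row i selects merged[i]; the last row is all zeros.
weights = [[1.0 if j == i else 0.0 for j in range(len(merged))]
           for i in range(len(EMOTION_CLASSES))]
biases = [0.0] * len(EMOTION_CLASSES)
label, probs = classify(merged, weights, biases)
```

The sketch returns both forms of result mentioned in the quoted ¶0188: the single highest-probability class and the probability value for each class.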
Regarding claim 11, Shin teaches the non-transitory computer-readable storage medium of claim 10, wherein the first neural network is trained with the audio training data (¶0335 “As described with reference to FIG. 5, the emotion recognition server 70 may include a plurality of artificial neural networks 521, 522, and 523 learned by the unimodal input. The emotion recognition server 70 may include an artificial neural network 511 learned by the multimodal input based on the plurality of unimodal inputs. The neural networks 511, 521, 522, 523 included in the emotion recognition server 70 may be an artificial neural network suitable for respective input data”) and the second neural network is trained with the video training data (¶0335 “As described with reference to FIG. 5, the emotion recognition server 70 may include a plurality of artificial neural networks 521, 522, and 523 learned by the unimodal input. The emotion recognition server 70 may include an artificial neural network 511 learned by the multimodal input based on the plurality of unimodal inputs. The neural networks 511, 521, 522, 523 included in the emotion recognition server 70 may be an artificial neural network suitable for respective input data”; ¶0165 “The image unimodal input data 533 (including one or more face image data) may be inputted, while being used as the image learning data, to a face emotion recognizer 523 (or face emotion recognition processor) that performs deep learning.”)
Regarding claim 12, Shin teaches the non-transitory computer-readable storage medium of claim 11, wherein the first set of data representing the first movement of the user's face and the first set of characteristics are determined based on the received audio data (¶0165 “The image unimodal input data 533 (including one or more face image data) may be inputted, while being used as the image learning data, to a face emotion recognizer 523 (or face emotion recognition processor) that performs deep learning.”; ¶0170 “The face emotion recognizer 523 may recognize the facial expression of the user by detecting the facial area of the user in the input image data and recognizing facial expression landmark point information which is the feature points constituting the facial expression. The face emotion recognizer 523 may output the emotion class corresponding to the recognized facial expression or the probability value for each emotion class, and also output the facial feature point (facial expression landmark point) vector.”)
Regarding claim 13, Shin teaches the non-transitory computer-readable storage medium of claim 11, wherein the second set of data representing the second movement of the user's face and the second set of characteristics is based on the received video data separate from the audio data (¶0165 “The image unimodal input data 533 (including one or more face image data) may be inputted, while being used as the image learning data, to a face emotion recognizer 523 (or face emotion recognition processor) that performs deep learning.”; ¶0170 “The face emotion recognizer 523 may recognize the facial expression of the user by detecting the facial area of the user in the input image data and recognizing facial expression landmark point information which is the feature points constituting the facial expression. The face emotion recognizer 523 may output the emotion class corresponding to the recognized facial expression or the probability value for each emotion class, and also output the facial feature point (facial expression landmark point) vector.”)
Regarding claim 17, Shin teaches the non-transitory computer-readable storage medium of claim 8, wherein the one or more programs further comprise instructions, which when executed by one or more processors of a first electronic device, cause the first electronic device to: 
generate an avatar representing the user (¶0211 “Alternatively, the robot 100 may first generate the animation character based on face information of the user. Such an animation character may also be generated by reflecting the detected facial …”); and
animate the avatar using the combined set of characteristics representing the first movement and the second movement of the user's face (¶0211 “Additionally, the robot 100 may change the facial expression landmark points of the generated animation character to correspond to the recognized emotion information, thereby generating an avatar character expressing the specific emotion of the user.”; ¶0220-0222 “Referring to FIG. 8, when the emotion of the user is recognized as neutrality (or neutral), the avatar character may be generated as a smiling neutral expression 810. The neutral expression 810 may be set to a default expression that is used when the robot 100 does not recognize a particular emotion. [0221] When the emotion of the user is recognized as a surprise, the avatar character may be generated showing a surprise expression 820 of raising eyebrows and opening the mouth. [0222] When the emotion of the user is recognized as a displeasure, the avatar character may be generated showing a displeasure expression 830 of dropping the corner of his mouth and frowning”).
Regarding claim 18, Shin teaches the non-transitory computer-readable storage medium of claim 17, wherein animating the avatar using the combined set of characteristics representing the first movement and the second movement of the user's face further comprises: 
animating a first portion of the avatar using the first set of characteristics representing the first movement of the user's face; and animating a second portion of the avatar using the second set of characteristics representing the second movement of the user's face (¶0220-0222 “ Referring to FIG. 8, when the emotion of the user is recognized as neutrality (or neutral), the avatar character may be generated as a smiling neutral expression 810. The neutral expression 810 may be set to a default expression that is used when the robot 100 does not recognize a particular emotion.  [0221] When the emotion of the user is recognized as a surprise, the avatar character may be generated showing a surprise expression 820 of raising eyebrows and opening the mouth.  [0222] When the emotion of the user is recognized as a displeasure, the avatar character may be generated showing a displeasure expression 830 of dropping the corner of his mouth and frowning”)
Regarding claim 19, Shin teaches the non-transitory computer-readable storage medium of claim 1, wherein the one or more programs further comprise instructions, which when executed by one or more processors of a first electronic device, cause the first electronic device to: 
generate an avatar representing the user (¶0262 “According to the embodiment, the robot 100 may generate an avatar character by synthesizing a facial expression landmark point image generated in correspondence with recognized emotion information on the face image data of the user, with augmented reality.”); and 
animate the avatar using the set of characteristics representing the one or more movements of the user's face (¶0263 “Alternatively, the robot 100 may first generate the animation character based on the face information of the user. Such an animation character may also be generated by reflecting the detected landmark points of the user. The robot 100 may change the facial expression landmark points of the …”).
	Regarding claim 20, Shin teaches the non-transitory computer-readable storage medium of claim 19, wherein animating the avatar using the set of characteristics representing the one or more movements of the user's face further comprises: 
animating a first portion of the avatar using a first portion of the set of characteristics representing a first movement of the user's face (¶0224 “FIG. 9 illustrates facial expressions of an avatar character expressing the emotion class of anger. Referring to FIGS. 9(a) and 9(b), a first anger expression 910 and a second anger expression 920 may express shapes of eyes and mouth differently.” where shape of eyes); and 
animating a second portion of the avatar using a second portion of the set of characteristics representing a second movement of the user's face (¶0225 “FIG. 10 illustrates facial expressions of an avatar character expressing the emotion class of happiness. Referring to FIGS. 10(a), 10(b), and 10(c), a first happiness expression 1010, a second happiness expression 1020, and a third happiness expression 1030 may express shapes of the eyes and the mouth differently.” where shapes of the mouth)
Regarding claim 21, Shin teaches the non-transitory computer-readable storage medium of claim 19, wherein the one or more programs further comprise instructions, which when executed by one or more processors of a first electronic device, cause the first electronic device to:
 display the animated avatar on a screen of the electronic device (¶0066 “The robot 100 may include a head 110 disposed in the upper side of the main body. A display 182 for displaying an image may be disposed on the front surface of the head 110”; ¶0284 “The robot may recognize emotion such as happiness, sadness, anger, surprise, fear, neutrality, and displeasure of at least one of the video call participants, map the recognized emotion to the character, and display this during a call.”)
Regarding claim 22, Shin teaches an electronic device comprising: one or more processors; a memory; and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for (¶0389 “The method of operating the robot and the robot system according to an example embodiment can be implemented as a code readable by a processor on a recording medium readable by the processor. The processor-readable recording medium includes all kinds of recording apparatuses in which data that can be read by the processor is stored. Examples of the recording medium that can be read by the processor include a ROM, a RAM, a CD-ROM, a magnetic tape, a floppy disk, an optical data storage apparatus, and/or the like, and may also be implemented in the form of a carrier wave such as transmission over the Internet. In addition, the processor-readable recording medium may be distributed over network-connected computer systems so that code readable by the processor in a distributed fashion can be stored and executed”): The remainder of claim 22 is similar in scope to claim 1 and is therefore rejected under the same rationale.
Regarding claim 23, Shin teaches a method, comprising: at an electronic device with one or more processors and memory (¶0389 “The method of operating the robot and the robot system according to an example embodiment can be implemented as a code readable by a processor on a recording medium readable by the processor. The processor-readable recording medium includes all kinds of recording apparatuses in which data that can be read by the processor is stored. Examples of the recording medium that can be read by the processor include a ROM, a RAM, a CD-ROM, a magnetic tape, a floppy disk, an optical data storage apparatus, and/or the like, and may also be implemented in the form of a carrier wave such as transmission over the Internet. In addition, the processor-readable recording medium may be distributed over network-connected computer systems so that code readable by the processor in a distributed fashion can be stored and executed”):
receiving an audio input (¶0082 “The robot 100 may include a voice input unit 125 for receiving a speech input of a user. The voice input unit may also be called a speech input unit or a voice/speech input device”);
 receiving a video input including at least a portion of a user's face, wherein the video input is separate from the audio input (¶0080 “The image acquisition unit 120 may photograph the front direction of the robot 100, and may photograph an image for user recognition”; ¶0158 “For example, the input data 590 may be moving image data photographed by the user, and the moving image data may include image data in which the user's face or the like is photographed and audio data including a speech uttered by a user.”);
determining a set of data representing one or more movements of the user's face based on the received audio input and received video input (¶0231-0234 “Referring to FIG. 11, the robot 100 may acquire data related to a user (S1110). [0232] The data related to the user may be moving image data that photographed a user or real-time moving image data that is photographing the user. The robot 100 may use both the stored data and the data inputted in real time. [0233] The data related to the user may include image data (including the face of the user) and voice data (uttered by the user). The image data including the face of the user may be acquired through a camera of the image acquisition unit 120, and the voice data uttered by the user may be acquired through a microphone of the voice input unit 125. [0234] The emotion recognizer 74a may recognize the emotion information of the user based on the data related to the user (S1120).” where emotion information based on the data related to the user which include image data and voice data); and 
generating, using a neural network separately trained with a set of audio training data and a set of video training data (¶0178-0179 “The plurality of recognizers (or plurality of recognition processors) for each modal may include an artificial neural network corresponding to input characteristics of the unimodal input data that are inputted respectively. A multimodal emotion recognizer 511 may include an artificial neural network corresponding to characteristics of the input data. [0179] For example, the facial emotion recognizer 523 for performing image-based learning and recognition may include a Convolutional Neural Network (CNN), the other emotion recognizers 521 and 522 include a deep-network neural network (DNN), and the multimodal emotion recognizer 511 may include an artificial neural network of a ...”), a set of characteristics for controlling an avatar representing the one or more movements of the user's face (¶0190-0191 “The emotion recognizer 74a may output the plurality of unimodal emotion recognition results and one multimodal emotion recognition result as a level (probability) for each emotion class. [0191] For example, the emotion recognizer 74a may output the probability value for emotional classes of surprise, happiness, neutral, sadness, displeasure, anger, and fear, and there may be a higher probability of recognized emotional class as the probability value is higher. The sum of the probability values of seven emotion classes may be 100%.”; ¶0241 “The robot 100 may generate an avatar character by mapping emotion information of the recognized user to the face information of the user included in the data related to the user (S1130).”).
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
1.  Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Shin et al., U.S. Patent Application Publication No. 2020/0090393 (“Shin”) in view of Pike et al., U.S. Patent Application Publication No. 2018/0232688 (“Pike”).
Regarding claim 5, Shin teaches the non-transitory computer-readable storage medium of claim 1, wherein the video input includes at least a portion of a first user's face and wherein the audio input includes speech (¶0233 “The data related to the user may include image data (including the face of the user) and voice data (uttered by the user). The image data including the face of the user may be acquired through a camera of the image acquisition unit 120, and the voice data uttered by the user may be acquired through a microphone of the voice input unit 125.”). Shin is understood to be silent on the remaining limitations of claim 5.
In the same field of endeavor, Pike teaches wherein the audio input includes speech of a second user (¶0034] In an embodiment, the one or more utterances from the user can be received contemporaneously with the one or more images from the imaging device. Alternatively, the one or more utterances from the user can be received 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Shin's generation of a character expressing the emotion of a video call counterpart to include receiving utterances from a first user and receiving images from a second user, as taught by Pike, because this modification would allow utterances to be received separately in time from images (¶0034 of Pike).
Thus, the combination of Shin and Pike teaches wherein the video input includes at least a portion of a first user's face and wherein the audio input includes speech of a second user.
2.  Claims 14-16 are rejected under 35 U.S.C. 103 as being unpatentable over Shin et al., U.S. Patent Application Publication No. 2020/0090393 (“Shin”) in view of el Kaliouby et al., U.S. Patent Application Publication No. 2019/0012599 (“el Kaliouby”).
Regarding claim 14, Shin teaches the non-transitory computer-readable storage medium of claim 10, wherein the second neural network is trained with the video training data(¶0335 “ As described with reference to FIG. 5, the emotion recognition server 70 may include a plurality of artificial neural networks 521, 522, and 523 learned by the unimodal input. The emotion recognition server 70 may include an artificial neural network 511 learned by the multimodal input based on the plurality of unimodal inputs. The neural networks 511, 521, 522, 523 included in the emotion recognition server 70 may be an artificial neural network suitable for respective input 
In the same field of endeavor, el Kaliouby teaches wherein the first neural network is trained with the audio training data and the video training data (¶0057 “FIG. 3 illustrates a high-level diagram for deep learning. Multimodal machine learning can be based on deep learning. A plurality of information channels is captured into a computing device such as a smartphone, personal digital assistant (PDA), tablet, laptop computer, and so on. The plurality of information channels includes contemporaneous audio information and video information from an individual. Trained weights are learned on a multilayered convolutional computing system. The trained weights are learned using the audio information and the video information from the plurality of information channels. The trained weights cover both the audio information and the video information and are trained simultaneously. The learning facilitates emotional analysis of the audio information and the video information. Further information is captured into a second computing device. The second computing device and the first computing device may be the same computing device. The further information can include physiological information, contextual information, and so on. The further information is analyzed using the trained weights to provide an emotion metric based on the further information.”; ¶0059 “Deep learning is a branch of machine learning which seeks to imitate in software the activity which takes place in layers of neurons in the neocortex of the human brain. Deep learning applications include processing of image data, audio data, and so on. 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the first neural network of Shin, which is trained with the audio training data, to be trained using both audio information and video information as taught by el Kaliouby, because this modification would facilitate emotional analysis of the audio information and the video information (¶0057 of el Kaliouby).
Thus, the combination of Shin and el Kaliouby teaches wherein the first neural network is trained with the audio training data and the video training data and the second neural network is trained with the video training data.
Regarding claim 15, Shin and el Kaliouby teach the non-transitory computer-readable storage medium of claim 14, wherein the first set of data representing the first movement of the user's face and the first set of characteristics is based on the received audio data and the received video data (¶0102 of el Kaliouby “Image analysis can include detection of facial expressions and can be performed for 
Regarding claim 16, Shin and el Kaliouby teach the non-transitory computer-readable storage medium of claim 14, wherein the second set of data representing the second movement of the user's face and the second set of characteristics is based on the received video data (¶0165 of Shin “The image unimodal input data 533 (including one or more face image data) may be inputted, while being used as the image learning data, to a face emotion recognizer 523 (or face emotion recognition processor) that performs deep learning.”; ¶0170 “The face emotion recognizer 523 may recognize the facial expression of the user by detecting the facial area of the user in the input image data and recognizing facial expression landmark point information which is the feature points constituting the facial expression. The face emotion recognizer 523 may output the emotion class corresponding to the recognized facial expression or the probability value for each emotion class, and also output the facial feature point (facial expression landmark point) vector.”).



Contact

Any inquiry concerning this communication or earlier communications from the examiner should be directed to SARAH LE whose telephone number is (571)270-7842.  The examiner can normally be reached on Monday: 8AM-4:30PM EST, Tuesday: 8 AM-3:30PM EST, Wednesday: 8AM-2:30PM EST, Thursday and Friday off.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Mark Zimmerman, can be reached at 571-272-7653.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access 






/SARAH LE/Primary Examiner, Art Unit 2619