DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statements (IDS) submitted on 9/6/19 and 9/9/19 are in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statements are being considered by the examiner.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 3-11 and 13-20 are rejected under 35 U.S.C. 103 as being unpatentable over US Patent Application Publication No. 20130142426 (Kaneda et al.) in view of US Patent Application Publication No. 20200074240 (Desai et al.), US Patent Application Publication No. 20150324632 (Whitehill et al.) and US Patent Application Publication No. 20170185827 (Yamaya et al.).
	Regarding claim 1, Kaneda et al. discloses: “obtaining a base facial image (FIG. 10: 210, 220; [0074]: “the image obtainment unit 210 obtains image data”; “the face detection unit 220 executes a face detection process on the image obtained by the image obtainment unit 210”); obtaining a first set of base facial features (FIG. 10: 240; [0074]: “The feature point detection unit 240 detects more detailed feature points, such as inner and outer corners of the eyes”) within the base facial image, the first set of base facial features selected (FIG. 1: 130; [0040]: “The region setting unit 130 sets a plurality of local regions”; FIG. 10: 250; [0075]: “The region setting unit 250 sets a feature extraction region”); obtaining a second set of base facial features within the base facial image, at least one facial feature in the second set of base facial features being different from those in the first set of base facial features (FIG. 10: 240; [0074]: “The feature point detection unit 240 detects more detailed feature points, such as inner and outer corners of the eyes”), the second set of facial features selected” (FIG. 1: 130; [0040]: “The region setting unit 130 sets a plurality of local regions”; FIG. 10: 250; [0075]: “The region setting unit 250 sets a feature extraction region”). 
	However, Kaneda et al. does not clearly disclose the remaining limitations of the claim.  To that end, Desai et al. discloses: “as associated with a first facial action unit (AU) to be detected in an analysis facial image ([0009]: “a first estimate that represents a first set of characteristics of the input scene”); and as associated with a second facial AU to be detected in the analysis facial image ([0009]: “the second estimate representing a second set of characteristics of the input scene”; [0010]: “the first estimate and the second estimate may include estimated facial action units of the user's face, wherein a facial action unit represents an action of one or more muscles of the user's face and identifies a facial expression of the user”); and obtaining the analysis facial image” (ABSTRACT: “receiving rich sensor data, captured by a high-resolution sensor, of an input scene; receiving limited sensor data, captured by a low-resolution sensor, of the input scene”).  It is respectfully submitted that it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Kaneda et al. with the invention of Desai et al. in order to provide first and second facial action units in an analysis facial image (e.g., see Desai et al. @ [0010]).
	However, the combination of Kaneda et al. and Desai et al. does not clearly disclose the remaining limitations of the claim.  To that end, Whitehill et al. discloses: “to facilitate prediction of a probability of the first facial AU in the analysis facial image; and to facilitate prediction of a probability of the second facial AU in the analysis facial image” ([0030]: “predefined facial expressions may include action units from the Facial Action Coding System (FACS)”; “the facial expression metric 260 may include a probability that the facial image expresses a predefined facial expression.  As an example, a facial expression metric of 0.3 may indicate that there is a 30% chance that the person in the facial image is smiling (i.e., a smile probability value)”).  It is respectfully submitted that it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to further modify the combination of Kaneda et al. and Desai et al. with the invention of Whitehill et al. in order to provide a probability/confidence level in correctness of the facial action units of the analysis facial images (e.g., see Whitehill et al. @ [0030]).
	However, the combination of Kaneda et al., Desai et al. and Whitehill et al. does not clearly disclose the remaining limitations of the claim.  To that end, Yamaya et al. discloses: “applying a first image normalization to the analysis facial image using the first set of base facial features (FIG. 3: S101, S102; FIG. 4A; [0041]: “Specifically, when the emotion recognition model generation apparatus generates the first emotion recognition model 25, by normalizing each of the facial images for learning in step S102 on the basis of the positions of the eye area and the positions of the mouth area detected in step S101, each of the facial images for learning is normalized so as to include the mouth area, as shown in FIG. 4A”); and applying a second image normalization to the analysis facial image using the second set of base facial features” (FIG. 3: S101, S102; FIG. 4B; [0056]: “Specifically, when generating the second emotion recognition model 26, the emotion recognition model generation apparatus normalizes each of the facial images for learning so as not to include the mouth area, as shown in FIG. 4B, by normalizing in step S102 each of the facial images for learning on the basis of the positions of the eye area and the positions of the nose area detected in step S101”).  It is respectfully submitted that it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to further modify the combination of Kaneda et al., Desai et al. and Whitehill et al. with the invention of Yamaya et al. in order to provide image normalization using different sets of base facial features (e.g., see Yamaya et al. @ FIG. 4A & FIG. 4B).
	Regarding claim 11, Kaneda et al. discloses: “a non-transitory computer-readable medium containing instructions that, when executed by one or more processors ([Claim 9]: “A non-transitory computer-readable storage medium storing a computer program for causing a computer to execute” ), are configured to perform operations, the operations comprising: obtain a base facial image with a frontal face (FIG. 10: 210, 220; [0074]: “the image obtainment unit 210 obtains image data”; “the face detection unit 220 executes a face detection process on the image obtained by the image obtainment unit 210”); obtain a first set of base facial features (FIG. 10: 240; [0074]: “The feature point detection unit 240 detects more detailed feature points, such as inner and outer corners of the eyes”) within the base facial image, the first set of base facial features selected (FIG. 1: 130; [0040]: “The region setting unit 130 sets a plurality of local regions”; FIG. 10: 250; [0075]: “The region setting unit 250 sets a feature extraction region”); obtain a second set of base facial features within the base facial image, at least one facial feature in the second set of base facial features being different from those in the first set of base facial features (FIG. 10: 240; [0074]: “The feature point detection unit 240 detects more detailed feature points, such as inner and outer corners of the eyes”), the second set of facial features selected” (FIG. 1: 130; [0040]: “The region setting unit 130 sets a plurality of local regions”; FIG. 10: 250; [0075]: “The region setting unit 250 sets a feature extraction region”).
	In addition, Desai et al. discloses: “as associated with a first facial action unit (AU) to be detected in an analysis facial image ([0009]: “a first estimate that represents a first set of characteristics of the input scene”); and as associated with a second facial AU to be detected in the analysis facial image ([0009]: “the second estimate representing a second set of characteristics of the input scene”; [0010]: “the first estimate and the second estimate may include estimated facial action units of the user's face, wherein a facial action unit represents an action of one or more muscles of the user's face and identifies a facial expression of the user”); and obtain the analysis facial image” (ABSTRACT: “receiving rich sensor data, captured by a high-resolution sensor, of an input scene; receiving limited sensor data, captured by a low-resolution sensor, of the input scene”).
	Further, Whitehill et al. discloses: “to facilitate prediction of a probability of the first facial AU in the analysis facial image; and to facilitate prediction of a probability of the second facial AU in the analysis facial image” ([0030]: “predefined facial expressions may include action units from the Facial Action Coding System (FACS)”; “the facial expression metric 260 may include a probability that the facial image expresses a predefined facial expression.  As an example, a facial expression metric of 0.3 may indicate that there is a 30% chance that the person in the facial image is smiling (i.e., a smile probability value)”).
	Furthermore, Yamaya et al. discloses: “apply a first image normalization to the analysis facial image using the first set of base facial features (FIG. 3: S101, S102; FIG. 4A; [0041]: “Specifically, when the emotion recognition model generation apparatus generates the first emotion recognition model 25, by normalizing each of the facial images for learning in step S102 on the basis of the positions of the eye area and the positions of the mouth area detected in step S101, each of the facial images for learning is normalized so as to include the mouth area, as shown in FIG. 4A”); and apply a second image normalization to the analysis facial image using the second set of base facial features” (FIG. 3: S101, S102; FIG. 4B; [0056]: “Specifically, when generating the second emotion recognition model 26, the emotion recognition model generation apparatus normalizes each of the facial images for learning so as not to include the mouth area, as shown in FIG. 4B, by normalizing in step S102 each of the facial images for learning on the basis of the positions of the eye area and the positions of the nose area detected in step S101”).
	Claims 3-10 and 13-19 are ultimately dependent upon claims 1 and 11, respectively.  As discussed above, claims 1 and 11 are disclosed by the combination of Kaneda et al., Desai et al., Whitehill et al. and Yamaya et al.  Thus, the limitations of claims 1 and 11 that are recited in claims 3-10 and 13-19, respectively, are also disclosed by the combination of Kaneda et al., Desai et al., Whitehill et al. and Yamaya et al.
	Further regarding claims 3 and 13, Desai et al. discloses: “the first set of base facial features are selected to be positioned at regions of a face that move during the first facial AU” (FIG. 12: AU 1 – AU 7; [0056]: “Action units (as referenced above) are fundamental actions of individual muscles or groups of muscles”; “Out of these there are Action Unit estimates for approximately 35 Action Units with increased granularity.  The state-of-the-art AU prediction technology developed by OpenFace records about 17 of them (See FIG. 12, for example). The reason why the Facial Action Coding System (FACS) is powerful is that using FACS, human coders can manually code nearly any anatomically possible facial expression, deconstructing it into specific Action Units and their temporal segments that produced the expression”).
	Further with respect to claims 4 and 14, Whitehill et al. discloses: “a density of the first set of base facial features are selected to be more dense at the regions of the face that move during the first facial AU than regions of the face that do not move during the first facial AU” ([0030]: “the facial expression metric 260 may include an intensity indicator of a predefined facial expression found in the facial image.  For example, the intensity indicator may range from 0 to 10 for the predefined facial expression of smile.  An intensity indicator of 10 specifies a full smile, while an intensity indicator of 2 specifies a subtle smile”).
	Further regarding claims 5 and 15, Desai et al. discloses: “the first set of base facial features are weighted ([0050]: “identifying one or more Action Unit cases at step 865 that are trickier for the RGB data to properly estimate.  Sample weights and class weights may be assigned based upon these cases, the Neural Network may be retrained using these new data distributions in addition to simply using the estimate RGB Action Unit estimates at step 870, and processing continues back at steps 820 and 860”) such that facial features that move during the first facial AU (FIG. 12: AU 1 – AU 7) are more heavily weighted than facial features that do not move (FIG. 12: “NEUTRAL”) during the first facial AU”.
	Further with respect to claims 6 and 16, Desai et al. discloses: “the weights of the first set of base facial features are determined based on at least one of an occlusion sensitivity map or a set of muscle groups used in movement associated with the first facial AU” ([0010]: “the first estimate and the second estimate may include estimated facial action units of the user's face, wherein a facial action unit represents an action of one or more muscles of the user's face and identifies a facial expression of the user”).
	Further regarding claims 7 and 17, Desai et al. discloses: “training a facial analysis engine by performing operations on a plurality of training facial images ([0067]: “several processes are employed to train an estimator which is a deep neural network to estimate facial action units using action unit estimates”), the operations including: to train the facial analysis engine to identify the first facial AU in the first training facial image; and to train the facial analysis engine to identify the second facial AU in the second training facial image” (FIG. 8: 810, 815, 829, 850, 855; [0050]: “As is shown, at step 810, RGB-D image data including RGB images and corresponding depth maps is received”; “… in step 815, this data is used to train models for Action Unit estimation, and at step 820 Action Units are estimated from the RGB-D data.  In parallel, the RGB data and the depth maps are separated at step 830, and RGB data map is generated at step 835, is applied to a cropped portion of the acquired image responsible for Activation Unit activation at step 840.  Processing then continues at step 850 where these two sets of output data are then used to train a Convolutional Neural Network to mimic the Action Unit estimation using the RGB data as an input.  The Convolutional Neural Network includes one or more convolutional neural network layers.  Thereafter, Action Units are estimated from the RGB data at step 855”).
	In addition, Yamaya et al. discloses: “applying the first image normalization to a first training facial image using the first set of base facial features (FIG. 3: S101, S102; FIG. 4A; [0041]: “Specifically, when the emotion recognition model generation apparatus generates the first emotion recognition model 25, by normalizing each of the facial images for learning in step S102 on the basis of the positions of the eye area and the positions of the mouth area detected in step S101, each of the facial images for learning is normalized so as to include the mouth area, as shown in FIG. 4A”); and applying the second image normalization to a second training facial image using the second set of base facial features” (FIG. 3: S101, S102; FIG. 4B; [0056]: “Specifically, when generating the second emotion recognition model 26, the emotion recognition model generation apparatus normalizes each of the facial images for learning so as not to include the mouth area, as shown in FIG. 4B, by normalizing in step S102 each of the facial images for learning on the basis of the positions of the eye area and the positions of the nose area detected in step S101”).
	Further with respect to claims 8 and 18, Yamaya et al. discloses: “the first image normalization and the second image normalization are the same except for using the first set of base facial features and the second set of base facial features in the first image normalization and the second image normalization, respectively” (FIG. 3: S101, S102; FIG. 4A; [0041]: “Specifically, when the emotion recognition model generation apparatus generates the first emotion recognition model 25, by normalizing each of the facial images for learning in step S102 on the basis of the positions of the eye area and the positions of the mouth area detected in step S101, each of the facial images for learning is normalized so as to include the mouth area, as shown in FIG. 4A”; FIG. 3: S101, S102; FIG. 4B; [0056]: “Specifically, when generating the second emotion recognition model 26, the emotion recognition model generation apparatus normalizes each of the facial images for learning so as not to include the mouth area, as shown in FIG. 4B, by normalizing in step S102 each of the facial images for learning on the basis of the positions of the eye area and the positions of the nose area detected in step S101”).
	Further regarding claims 9 and 19, Whitehill et al. discloses: “estimating an intensity of at least one of the first facial AU and the second facial AU in the analysis image” ([0030]: “the facial expression metric 260 may include an intensity indicator of a predefined facial expression found in the facial image.  For example, the intensity indicator may range from 0 to 10 for the predefined facial expression of smile.  An intensity indicator of 10 specifies a full smile, while an intensity indicator of 2 specifies a subtle smile”).
	Further with respect to claim 10, Desai et al. discloses: “the base facial image includes a forward-facing neutral-expression facial image” (FIG. 12: “NEUTRAL”).
	Further with respect to claim 20, Kaneda et al. discloses: “a system comprising: one or more processors (FIG. 10: 210, 220, 230, 240; [0122]: “Aspects of the present invention can also be realized by a computer of a system or apparatus (or devices such as a CPU or MPU) that reads out and executes a program recorded on a memory device to perform the functions of the above-described embodiment(s)”); one or more non-transitory computer-readable media containing instructions, which when executed by the one or more processors ([Claim 9]: “A non-transitory computer-readable storage medium storing a computer program for causing a computer to execute” ), cause the system to perform operations comprising: obtain a base facial image with a frontal face (FIG. 10: 210, 220; [0074]: “the image obtainment unit 210 obtains image data”; “the face detection unit 220 executes a face detection process on the image obtained by the image obtainment unit 210”); obtain a first set of base facial features (FIG. 10: 240; [0074]: “The feature point detection unit 240 detects more detailed feature points, such as inner and outer corners of the eyes”) within the base facial image, the first set of base facial features selected (FIG. 1: 130; [0040]: “The region setting unit 130 sets a plurality of local regions”; FIG. 10: 250; [0075]: “The region setting unit 250 sets a feature extraction region”); obtain a second set of base facial features within the base facial image, at least one facial feature in the second set of base facial features being different from those in the first set of base facial features (FIG. 10: 240; [0074]: “The feature point detection unit 240 detects more detailed feature points, such as inner and outer corners of the eyes”), the second set of facial features selected” (FIG. 1: 130; [0040]: “The region setting unit 130 sets a plurality of local regions”; FIG. 10: 250; [0075]: “The region setting unit 250 sets a feature extraction region”).
	In addition, Desai et al. discloses: “as associated with a first facial action unit (AU) to be detected in an analysis facial image ([0009]: “a first estimate that represents a first set of characteristics of the input scene”); as associated with a second facial AU to be detected in the analysis facial image ([0009]: “the second estimate representing a second set of characteristics of the input scene”; [0010]: “the first estimate and the second estimate may include estimated facial action units of the user's face, wherein a facial action unit represents an action of one or more muscles of the user's face and identifies a facial expression of the user”); and obtain the analysis facial image” (ABSTRACT: “receiving rich sensor data, captured by a high-resolution sensor, of an input scene; receiving limited sensor data, captured by a low-resolution sensor, of the input scene”).
	Further, Whitehill et al. discloses: “to facilitate prediction of a probability of the first facial AU in the analysis facial image; and to facilitate prediction of a probability of the second facial AU in the analysis facial image” ([0030]: “predefined facial expressions may include action units from the Facial Action Coding System (FACS)”; “the facial expression metric 260 may include a probability that the facial image expresses a predefined facial expression.  As an example, a facial expression metric of 0.3 may indicate that there is a 30% chance that the person in the facial image is smiling (i.e., a smile probability value)”).
	Furthermore, Yamaya et al. discloses: “apply a first image normalization to the analysis facial image using the first set of base facial features (FIG. 3: S101, S102; FIG. 4A; [0041]: “Specifically, when the emotion recognition model generation apparatus generates the first emotion recognition model 25, by normalizing each of the facial images for learning in step S102 on the basis of the positions of the eye area and the positions of the mouth area detected in step S101, each of the facial images for learning is normalized so as to include the mouth area, as shown in FIG. 4A”); and apply a second image normalization to the analysis facial image using the second set of base facial features” (FIG. 3: S101, S102; FIG. 4B; [0056]: “Specifically, when generating the second emotion recognition model 26, the emotion recognition model generation apparatus normalizes each of the facial images for learning so as not to include the mouth area, as shown in FIG. 4B, by normalizing in step S102 each of the facial images for learning on the basis of the positions of the eye area and the positions of the nose area detected in step S101”).

Claims 2 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Kaneda et al. in view of Desai et al., Whitehill et al., Yamaya et al. and US Patent Application Publication No. 20190357797 (Nestor et al.).
	Claims 2 and 12 are dependent upon claims 1 and 11, respectively.  As discussed above, claims 1 and 11 are disclosed by the combination of Kaneda et al., Desai et al., Whitehill et al. and Yamaya et al.  Thus, the limitations of claims 1 and 11 that are recited in claims 2 and 12, respectively, are also disclosed by the combination of Kaneda et al., Desai et al., Whitehill et al. and Yamaya et al.  
	However, the combination of Kaneda et al., Desai et al., Whitehill et al. and Yamaya et al. does not clearly disclose the remaining limitations of the claims.  To that end with respect to claims 2 and 12, Nestor et al. discloses: “applying the first image normalization includes applying a Procrustes analysis transformation using the first set of base facial features” ([0011]: “aligning at least one expressive face space to a neutral face space using Procrustes transformation, and projecting a remaining expressive face space into the neutral face space using parameters of a Procrustes mapping function”).  It is respectfully submitted that it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to further modify the combination of Kaneda et al., Desai et al., Whitehill et al. and Yamaya et al. with the invention of Nestor et al. in order to provide a normalization/alignment with the advantages of face space invariance by using a Procrustes transformation (e.g., see Nestor et al. @ [0074]).

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MYRON K WYCHE whose telephone number is (571) 272-3390.  The examiner can normally be reached from 7:30 am to 3:30 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kathy Wang-Hurst, can be reached at (571) 270-5371.  The fax phone number for the organization where this application or proceeding is assigned is (571) 273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications 




/Myron Wyche/                            6/5/2021
Primary Examiner                       AU2644