DETAILED ACTION

	Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale or otherwise available to the public before the effective filing date of the claimed invention.

Claims 1-15 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by He et al. (“Wasserstein CNN: Learning Invariant Features for NIR-VIS Face Recognition”).
Regarding claim 1, He discloses a method for training a cross-modal face recognition model, comprising: acquiring a first modal face recognition model trained using a first modality image of a face (See He, p. 1764, Fig. 1, where the two channels of the CNN share the same architecture and the same parameters. See further He, p. 1763, left column, last paragraph: “To make our learned invariant features be robust to heterogeneous intra-class variations of individuals, we first train our convolutional network on large-scale VIS data.”)
and having a predetermined recognition precision; (See He, p. 1767, Table 1, where “WCNN + low-rank” has an accuracy of 98.7 ± 0.3.)
acquiring the first modality image of the face and a second modality image of the face; inputting the first modality image of the face and the second modality image of the face into the first modal face recognition model to obtain a feature value of the first modality image of the face and a feature value of the second modality image of the face; (See He, p. 1764, Fig. 1, where the NIR and VIS images are input to the CNN and their output features are X_N and X_V, respectively.)
and constructing a loss function based on a difference between the feature value of the first modality image of the face and the feature value of the second modality image of the face, (See He, p. 1765, Equation (13), where the loss L_dist is calculated from the Wasserstein distance between the feature distributions of the NIR and VIS images.)
and tuning a parameter of the first modal face recognition model based on the loss function until the loss function converges, to obtain a trained cross-modal face recognition model. (See He, p. 1765, right column, Section 3.2: “Under the WCNN training scheme, we employ mini-batch stochastic gradient descent to optimize the objective function, and use the statistics of each mini-batch to represent the means and standard deviations instead. … If the gradient descent method is used to minimize Eq. (14), we should update parameters W, P_i, F_i and Q. We follow the back-propagation method to update the vector of convolutional parameters Θ.”)
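For illustration only, the kind of distribution-difference loss relied on above for claim 1 can be sketched as a squared 2-Wasserstein distance between two mini-batches of feature vectors. The sketch assumes, as a simplification, that each feature dimension of each modality is modeled as an independent Gaussian, so the distance reduces to the squared difference of the per-dimension means plus the squared difference of the per-dimension standard deviations. The helper names and toy values are hypothetical; this is not a reproduction of He's Equation (13).

```python
from statistics import fmean, pstdev


def batch_stats(features):
    """Per-dimension mean and population standard deviation of a mini-batch.

    `features` is a list of equal-length feature vectors, one per face image.
    """
    dims = list(zip(*features))  # transpose: one tuple per feature dimension
    return [fmean(d) for d in dims], [pstdev(d) for d in dims]


def wasserstein2_sq(feats_nir, feats_vis):
    """Squared 2-Wasserstein distance between diagonal Gaussians fitted to
    the NIR and VIS mini-batch features (hypothetical helper)."""
    m_n, s_n = batch_stats(feats_nir)
    m_v, s_v = batch_stats(feats_vis)
    mean_term = sum((a - b) ** 2 for a, b in zip(m_n, m_v))
    std_term = sum((a - b) ** 2 for a, b in zip(s_n, s_v))
    return mean_term + std_term


# Toy features: the NIR batch is shifted by +1 in the first dimension only.
nir = [[1.0, 0.0], [3.0, 2.0]]
vis = [[0.0, 0.0], [2.0, 2.0]]
print(wasserstein2_sq(nir, vis))  # 1.0
```

Minimizing such a term pulls the two feature distributions together; under mini-batch gradient descent of the kind quoted above, the statistics would be recomputed from each new mini-batch.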

Regarding claim 2, He discloses the method according to claim 1, wherein the acquiring the first modality image of the face and a second modality image of the face comprises: acquiring a number of first modality images of the face and an equal number of second modality images of the face. (See He, p. 1767, left column, 2nd paragraph: “Following the protocols in [39], we select a subset of this database for our experiments, which includes 10 subjects from Oulu University and 30 subjects from CASIA. Eight face images for each expression are randomly selected from each of the NIR and VIS datasets. As a result, a total of 96 images (48 NIR images and 48 VIS images) are available for each subject.”)

Regarding claim 3, He discloses the method according to claim 2, wherein the tuning a parameter of the first modal face recognition model based on the loss function until the loss function converges, comprises: tuning the parameter of the first modal face recognition model until at least one of following conditions is satisfied: a mean value of feature values of the first modality images of the face and a mean value of feature values of the second modality images of the face reach a predetermined similarity; and a variance of the feature values of the first modality images of the face and a variance of the feature values of the second modality images of the face reach the predetermined similarity. (See He, p. 1765, Equation (13), where the loss L_dist is calculated from the Wasserstein distance, which depends on the differences between the means and between the variances of the NIR and VIS feature distributions.
See further He, p. 1765, right column, 3rd paragraph: “Under the WCNN training scheme, we employ mini-batch stochastic gradient descent to optimize the objective function, and use the statistics of each mini-batch to represent the means and standard deviations instead.”)
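The stopping condition recited in claim 3 can be illustrated with a short sketch. Here “reach a predetermined similarity” is modeled, purely as an assumption, as every per-dimension absolute difference being at most a tolerance `tol`; the helper names, the tolerance, and the toy features are hypothetical and are not taken from He or from the application.

```python
from statistics import fmean, pvariance


def modality_stats(features):
    """Per-dimension mean and population variance of one modality's features."""
    dims = list(zip(*features))  # one tuple per feature dimension
    return [fmean(d) for d in dims], [pvariance(d) for d in dims]


def similarity_conditions(feats_first, feats_second, tol):
    """Return (means_similar, variances_similar) for two modality batches."""
    m1, v1 = modality_stats(feats_first)
    m2, v2 = modality_stats(feats_second)
    means_similar = all(abs(a - b) <= tol for a, b in zip(m1, m2))
    variances_similar = all(abs(a - b) <= tol for a, b in zip(v1, v2))
    return means_similar, variances_similar


def should_stop_tuning(feats_first, feats_second, tol):
    # Claim 3 recites "at least one of" the two conditions, hence any().
    return any(similarity_conditions(feats_first, feats_second, tol))


rgb = [[0.0, 0.0], [2.0, 2.0]]  # first modality features (toy values)
nir = [[1.0, 0.0], [3.0, 2.0]]  # second modality features (toy values)
print(similarity_conditions(rgb, nir, tol=0.5))  # (False, True)
print(should_stop_tuning(rgb, nir, tol=0.5))     # True
```

Because either condition suffices under the claim language, the sketch stops as soon as the means alone or the variances alone fall within the tolerance.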

Regarding claim 4, He discloses the method according to claim 1, further comprising: fine-tuning the trained cross-modal face recognition model by using a first modality image and a second modality image of a given face, to obtain an optimized cross-modal face recognition model. (See He, p. 1765, right column, Section 3.2: “Under the WCNN training scheme, we employ mini-batch stochastic gradient descent to optimize the objective function, and use the statistics of each mini-batch to represent the means and standard deviations instead.” See further He, p. 1766, right column, 2nd paragraph: “Finally, the Softmax loss functions are separately used for the NIR and VIS representations as the supervisory signals.”)

Regarding claim 5, He discloses the method according to claim 1, wherein the first modality image is an RGB image, and the second modality image is at least one of an NIR image or a Depth image. (See He, p. 1764, Fig. 1, where a VIS (RGB) image and an NIR image are used.)

Regarding claim 6, He discloses an electronic device, comprising: at least one processor; and a memory, communicated with the at least one processor, wherein the memory stores an instruction executable by the at least one processor, and the instruction is executed by the at least one processor, to enable the at least one processor to perform operations, comprising: (The image processing disclosed by He is inherently performed by a computer having a processor and a memory that stores a program.)
acquiring a first modal face recognition model trained using a first modality image of a face and having a predetermined recognition precision; acquiring the first modality image of the face and a second modality image of the face; inputting the first modality image of the face and the second modality image of the face into the first modal face recognition model to obtain a feature value of the first modality image of the face and a feature value of the second modality image of the face; and constructing a loss function based on a difference between the feature value of the first modality image of the face and the feature value of the second modality image of the face, and tuning a parameter of the first modal face recognition model based on the loss function until the loss function converges, to obtain a trained cross-modal face recognition model. (See the rejection of claim 1, which applies equally to claim 6.)

Regarding claim 7, He discloses the electronic device according to claim 6, wherein the acquiring the first modality image of the face and a second modality image of the face comprises: acquiring a number of first modality images of the face and an equal number of second modality images of the face. (See the rejection of claim 2, which applies equally to claim 7.)

Regarding claim 8, He discloses the electronic device according to claim 7, wherein the tuning a parameter of the first modal face recognition model based on the loss function until the loss function converges, comprises: tuning the parameter of the first modal face recognition model until at least one of following conditions is satisfied: a mean value of feature values of the first modality images of the face and a mean value of feature values of the second modality images of the face reach a predetermined similarity; and a variance of the feature values of the first modality images of the face and a variance of the feature values of the second modality images of the face reach the predetermined similarity. (See the rejection of claim 3, which applies equally to claim 8.)

Regarding claim 9, He discloses the electronic device according to claim 6, wherein the operations further comprise: fine-tuning the trained cross-modal face recognition model by using a first modality image and a second modality image of a given face, to obtain an optimized cross-modal face recognition model. (See the rejection of claim 4, which applies equally to claim 9.)

Regarding claim 10, He discloses the electronic device according to claim 6, wherein the first modality image is an RGB image, and the second modality image is at least one of an NIR image or a Depth image. (See the rejection of claim 5, which applies equally to claim 10.)

Regarding claim 11, He discloses a non-transitory computer readable storage medium, storing a computer instruction, wherein the computer instruction is used to cause a computer to perform operations, comprising: (The image processing disclosed by He is inherently performed by a computer having a computer readable storage medium that stores program instructions.)
acquiring a first modal face recognition model trained using a first modality image of a face and having a predetermined recognition precision; acquiring the first modality image of the face and a second modality image of the face; inputting the first modality image of the face and the second modality image of the face into the first modal face recognition model to obtain a feature value of the first modality image of the face and a feature value of the second modality image of the face; and constructing a loss function based on a difference between the feature value of the first modality image of the face and the feature value of the second modality image of the face, and tuning a parameter of the first modal face recognition model based on the loss function until the loss function converges, to obtain a trained cross-modal face recognition model. (See the rejection of claim 1, which applies equally to claim 11.)

Regarding claim 12, He discloses the non-transitory computer readable storage medium according to claim 11, wherein the acquiring the first modality image of the face and a second modality image of the face comprises: acquiring a number of first modality images of the face and an equal number of second modality images of the face. (See the rejection of claim 2, which applies equally to claim 12.)

Regarding claim 13, He discloses the non-transitory computer readable storage medium according to claim 12, wherein the tuning a parameter of the first modal face recognition model based on the loss function until the loss function converges, comprises: tuning the parameter of the first modal face recognition model until at least one of following conditions is satisfied: a mean value of feature values of the first modality images of the face and a mean value of feature values of the second modality images of the face reach a predetermined similarity; and a variance of the feature values of the first modality images of the face and a variance of the feature values of the second modality images of the face reach the predetermined similarity. (See the rejection of claim 3, which applies equally to claim 13.)

Regarding claim 14, He discloses the non-transitory computer readable storage medium according to claim 11, wherein the operations further comprise: fine-tuning the trained cross-modal face recognition model by using a first modality image and a second modality image of a given face, to obtain an optimized cross-modal face recognition model. (See the rejection of claim 4, which applies equally to claim 14.)

Regarding claim 15, He discloses the non-transitory computer readable storage medium according to claim 11, wherein the first modality image is an RGB image, and the second modality image is at least one of an NIR image or a Depth image. (See the rejection of claim 5, which applies equally to claim 15.)

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Wu et al. (“RGB-Infrared Cross Modality Person Re-Identification”)

Any inquiry concerning this communication or earlier communications from the examiner should be directed to DAVID PERLMAN, whose telephone number is (571) 270-1417. The examiner can normally be reached Monday through Friday, 10:00 am to 6:30 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.  
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Sumati Lefkowitz, can be reached at (571) 272-3638. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/DAVID PERLMAN/Primary Examiner, Art Unit 2662