DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 1-16, 18-20 and 39 are pending in the application.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1-2, 4-7, 14-16, 18, 20 and 39 are rejected under 35 U.S.C. 102(a)(1)/102(a)(2) as being anticipated by Yoo et al. (US Publication 2016/0148080 A1, hereafter Yoo).
As per claim 1, Yoo teaches the invention as claimed including a method (Abstract), comprising:
obtaining a neural network comprising a first sub-neural network and a second sub-neural network; 
generating a plurality of preliminary feature vectors based on an image associated with a human face, the plurality of preliminary feature vectors comprising a color-based feature vector; 
obtaining at least one input feature vector based on the plurality of preliminary feature vectors; 
generating one or more deep feature vectors based on the at least one input feature vector using the first sub-neural network; and 
recognizing the human face based on the one or more deep feature vectors 
(Yoo discloses a face recognition method using a single recognizer pre-trained to recognize a plurality of elements simultaneously (Abstract). The plurality of elements includes an identity (ID) of the face in the input image, and at least one attribute associated with the input image, such as gender, age, ethnic group, etc. (para. [0008]; FIG. 4). The recognizer may include a neural network, and Yoo further teaches training such a recognizer by a multi-channel training process (para. [0009]-[0010]; FIG. 4; FIG. 6). As shown in FIG. 6, a plurality of feature images is generated from a face image. The feature images may be images expressing individual features of the face image, for example, a red, green, and blue (RGB) image associated with a color, a skin image associated with a skin probability, and an edge image associated with an outline (para. [0070]). Referring to FIG. 7, the trainer 120 may generate a plurality of feature images 720 from a face image 710. The trainer 120 may extract features 730 (corresponding to the recited “preliminary feature vectors”; see FIG. 4 and para. [0054] for “feature vector”) for each image from the plurality of feature images 720. The trainer 120 may input the features 730 for each image into a recognizer 740. The recognizer 740 may predict a plurality of elements based on input values. The recognizer 740 may include a DCNN. The DCNN may predict the plurality of elements using convolution layers, fully connected layers, and a loss layer (para. [0074]-[0075]). As shown in FIG. 8, the DCNN includes a plurality of convolution layers 810, a plurality of fully connected layers 820, and loss layers 830 (also referring to FIG. 5). The convolution layers 810 extract deep features from input features, and the separately extracted deep features may be jointly connected in the fully connected layers 820. Therefore, the convolution layers 810 and the fully connected layers 820 may correspond to the recited “first sub-neural network”.
Referring to FIG. 5 and the text in para. [0068], loss layer 530 may correspond to a linear classification module of the recognizer, the linear classification module configured to recognize a plurality of elements based on an output of the element feature output module of the recognizer. Therefore, the loss layer 530/830 may correspond to the recited “second sub-neural network” (para. [0077]))
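For illustration only, the mapping applied to claim 1 can be pictured as a short pipeline. The Python sketch below is not taken from Yoo: the function names, layer sizes, and weight values are hypothetical, and a single dense layer with ReLU stands in for each convolutional branch. It shows per-channel features (RGB, skin, edge) passing through separate branches, being joined in a fully connected stage (together playing the role of the “first sub-neural network”), and then being scored by a final classification stage (playing the role of the “second sub-neural network”).

```python
import math


def dense(vec, weights, bias):
    """One fully connected layer; weights holds one row per output unit."""
    return [sum(w * x for w, x in zip(row, vec)) + b for row, b in zip(weights, bias)]


def relu(vec):
    return [max(0.0, v) for v in vec]


def softmax(vec):
    peak = max(vec)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in vec]
    total = sum(exps)
    return [e / total for e in exps]


def first_sub_network(channel_features, branch_params, joint_params):
    """Per-channel branches followed by a joint fully connected stage."""
    deep = []
    for name, feats in channel_features.items():
        w, b = branch_params[name]
        deep.extend(relu(dense(feats, w, b)))  # stand-in for a convolutional branch
    w, b = joint_params
    return relu(dense(deep, w, b))  # jointly connected deep feature vector


def second_sub_network(deep_vector, classifier_params):
    """Final classification stage producing one score per candidate identity."""
    w, b = classifier_params
    return softmax(dense(deep_vector, w, b))


# Hypothetical preliminary feature vectors and weights, chosen only for the demo.
channel_features = {"rgb": [0.2, 0.4], "skin": [0.9, 0.1], "edge": [0.0, 1.0]}
branch_params = {name: ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]) for name in channel_features}
joint_params = ([[1.0] * 6, [0.5] * 6], [0.0, 0.0])
classifier_params = ([[1.0, -1.0], [-1.0, 1.0], [0.0, 0.0]], [0.0, 0.0, 0.0])

deep_vector = first_sub_network(channel_features, branch_params, joint_params)
scores = second_sub_network(deep_vector, classifier_params)
predicted_id = scores.index(max(scores))
```

The split between the two functions mirrors the claim language rather than any particular implementation; where the boundary between the two sub-networks falls is a matter of claim interpretation.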

As per claim 2, dependent upon claim 1, Yoo teaches:
generating an output using the second sub-neural network based on the one or more deep feature vectors; and 
recognizing the human face based on the output (See rejections applied to claim 1 above).

As per claim 4, dependent upon claim 1, Yoo teaches that the first sub-neural network includes one or more secondary sub-neural networks with convolutional network architecture (FIG. 8 shows three secondary sub-neural networks 810 corresponding to three input channels respectively), and wherein the one or more secondary sub-neural networks include a feature layer configured to generate the one or more deep feature vectors (FIG. 8 #820; FIG. 9 “FLC Layer 1”).

As per claim 5, dependent upon claim 4, Yoo teaches that the feature layer is fully connected to a layer within at least one of the one or more secondary sub-neural networks (FIG. 8 #820).

As per claim 6, dependent upon claim 1, Yoo teaches that the obtaining at least one input feature vector based on the plurality of preliminary feature vectors comprises:
using at least one of the plurality of preliminary feature vectors as the at least one input feature vector (FIG. 7 #730 “RGB feature”, “Skin feature”, and/or “Edge feature”).

As per claim 7, dependent upon claim 6, Yoo teaches the plurality of preliminary feature vectors includes at least one of a texture-based feature vector or a gradient-based feature vector (FIG. 7 #730 “Edge feature”; para. [0071]).

As per claim 14, dependent upon claim 1, Yoo teaches the generating a plurality of preliminary feature vectors includes: 
generating a plurality of sub-images based on the image, wherein the plurality of sub-images corresponds to a plurality of parts of the image; and 
generating the plurality of preliminary feature vectors based on at least one of the plurality of the sub-images (FIG. 10 “Eye part image”, “Nose part image”; FIG. 11).

As per claim 15, dependent upon claim 2, Yoo teaches that the image includes a first image and a second image (FIG. 12 Face image, Eye part image/Nose part image), and the generating a plurality of preliminary feature vectors further includes:
generating a plurality of first sub-images based on the first image (FIG. 11; FIG. 12 Face image: RGB image, Skin image, Edge image); 
generating a plurality of first preliminary feature vectors based on at least one of the plurality of the first sub-images (FIG. 12 the features corresponding to the plurality of first sub-images being a plurality of first preliminary feature vectors; FIG. 11 “128X128” corresponding to Face image); 
generating a plurality of second sub-images based on the second image (FIG. 11; FIG. 12 Eye/Nose part image: RGB image, Skin image, Edge image); 
generating a plurality of second preliminary feature vectors based on at least one of the plurality of second sub-images (FIG. 12 the features corresponding to the plurality of second sub-images being a plurality of second preliminary feature vectors; FIG. 11 “128X40”/“40X80” corresponding to eye/nose image); and 
the obtaining at least one input feature vector based on the plurality of preliminary feature vectors includes: 
obtaining at least one first input feature vector based on the plurality of first preliminary feature vectors (FIG. 11 Face 128X128); 
obtaining at least one second input feature vector based on the plurality of the second preliminary feature vectors (FIG. 11 Eye/Nose 128X40/40X80); 
the generating one or more deep feature vectors based on the at least one input feature vector using the first sub-neural network includes: 
generating a first deep feature vector based on the at least one first input feature vector using the first sub-neural network (FIG. 11 showing a plurality of secondary sub-neural networks for extracting deep feature vectors corresponding to input feature vectors of respective part images, including a first deep feature vector corresponding to face); 
generating a second deep feature vector based on the at least one second input feature vector through the first sub-neural network (FIG. 11 showing a plurality of secondary sub-neural networks for extracting deep feature vectors corresponding to input feature vectors of respective part images, including a second deep feature vector corresponding to eye/nose); and 
the generating an output using the second sub-neural network based on the one or more deep feature vectors includes: 
generating the output using the second sub-neural network based on the first deep feature vector and the second deep feature vector (FIG. 15D).

As per claim 16, dependent upon claim 15, Yoo teaches that the generating the output using the second sub-neural network based on the first deep feature vector and the second deep feature vector further comprises: 
generating a first intermediate associated with at least one of the plurality of second sub-images based on the first deep feature vector and the second deep feature vector; generating a second intermediate based on the first intermediates associated with the at least one of the second sub-images (From the analysis presented above in rejections applied to claim 15, the first deep feature vector corresponds to face deep feature, and the second deep feature vector corresponds to eye/nose deep feature. FIG. 11 shows a combined feature vector including the first deep feature vector and the second deep feature vector, and therefore including the first intermediate and the second intermediate.); and 
generating the output based on the second intermediate (FIG. 15D).

As per claim 18, dependent upon claim 1, Yoo further teaches: 
training at least part of the neural network comprising the first sub-neural network and the second sub-neural network; and 
tuning the at least part of the neural network (para. [0009]-[0010]; FIG. 4; FIG. 7; FIG. 4 showing training by using backward propagation. For example, the trainer 120 may propagate the losses 430 in a backward direction from the output layer through the hidden layer toward the input layer of the artificial neural network in the recognizer 420. While the losses 430 are propagated in the backward direction, the connection weights between the nodes may be updated to reduce the losses 430. As described above, the trainer 120 may train the recognizer 420 in view of the losses 430 corresponding to the plurality of elements. An updated recognizer 440 may be used for a subsequent training epoch, and the multi-task training operation described above may be performed iteratively until the losses 430 are less than a predetermined or, alternatively, desired threshold value. See para. [0058]).
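The iterative training described in para. [0058] can be sketched as follows. This is a minimal Python illustration under stated assumptions, not Yoo's implementation: the function name train_multitask, the toy network (one shared weight and one output head per element), the squared-error losses, the learning rate, and the threshold are all hypothetical. It shows losses for several elements being summed, propagated backward to update the connection weights, and the loop repeating until the total loss falls below a threshold.

```python
def train_multitask(x, targets, lr=0.1, threshold=1e-4, max_epochs=5000):
    """Train a toy shared-weight network on several element losses at once."""
    w = 0.5                             # shared weight (stand-in for hidden layers)
    heads = [0.5 for _ in targets]      # one output weight per element (ID, gender, ...)
    total_loss = float("inf")
    for epoch in range(1, max_epochs + 1):
        # Forward pass.
        hidden = w * x
        outputs = [u * hidden for u in heads]
        losses = [(y - t) ** 2 for y, t in zip(outputs, targets)]
        total_loss = sum(losses)
        if total_loss < threshold:      # stop once the losses are small enough
            return w, heads, total_loss, epoch
        # Backward pass: propagate the losses from the outputs toward the input.
        grad_outputs = [2.0 * (y - t) for y, t in zip(outputs, targets)]
        grad_hidden = sum(g * u for g, u in zip(grad_outputs, heads))
        grad_heads = [g * hidden for g in grad_outputs]
        grad_w = grad_hidden * x
        # Update the connection weights to reduce the losses.
        heads = [u - lr * g for u, g in zip(heads, grad_heads)]
        w -= lr * grad_w
    return w, heads, total_loss, max_epochs


w, heads, final_loss, epochs_run = train_multitask(x=1.0, targets=[1.0, 0.5])
```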

Claim 20, a system claim, is rejected as applied to method claim 1 above. Yoo teaches a system corresponding to method claim 1 (para. [0098]). 

Claim 39, a medium claim, is rejected as applied to method claim 1 above. Yoo teaches a medium corresponding to method claim 1 (para. [0125]). 

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 3 and 12-13 are rejected under 35 U.S.C. 103 as being unpatentable over Yoo et al. (US Publication 2016/0148080 A1, hereafter Yoo), as applied above to claim 2, in view of Wang et al. (US Publication 2019/0026538 A1, hereafter Wang).
As per claim 3, Yoo does not teach determining a pose of the human face based on the output.
Wang discloses a joint face-detection and head-pose-angle-estimation system using a small-scale hardware CNN module (Abstract; FIG. 2B). As shown in FIG. 12, face-detection prediction output 1216 can include binary face classifiers, bounding box coordinates, and head-pose estimations. For a specific detected face from an input video image, face-detection predictions 1216 can include a binary face classifier, 4 bounding box coordinates, and 3 head-pose angles (i.e., yaw, pitch and roll) (para. [0151]). 
Taking the combined teachings of Yoo and Wang as a whole, it would have been obvious for a person with ordinary skill in the art before the effective filing date of the claimed invention to consider estimating face pose in order to generate more information regarding a recognized face. 

As per claim 12, dependent upon claim 2, Yoo in view of Wang teaches generating an output using the second sub-neural network based on the one or more deep feature vectors comprises: 
fusing the one or more deep feature vectors to form an ultimate feature vector (Wang FIG. 2B #222; para. [0081]; para. [0082] “… merging module 222 is configured to merge the array of feature maps 206 by concatenating the set of 3D output matrices based on the corresponding indices to form a merged 3D feature-map matrix, while preserving the spatial relationships of the set of subimages 204”); and 
generating the output using the second sub-neural network based on the ultimate feature vector (See rejections applied to claim 2).

As per claim 13, dependent upon claim 3, Yoo in view of Wang teaches the output comprises at least one posing parameter, and wherein the posing parameter comprises at least one of a yaw parameter or a pitch parameter (Wang para. [0151]).
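For illustration, the per-face prediction described in Wang para. [0151] (one binary face classifier, four bounding box coordinates, and three head-pose angles) can be represented as a small record. The Python sketch below is hypothetical: the class name, the field order within the eight-value vector, the corner-style box layout, and the 0.5 decision threshold are assumptions and are not taken from Wang.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class FacePrediction:
    is_face: bool
    box: Tuple[float, float, float, float]
    yaw: float
    pitch: float
    roll: float


def unpack_prediction(values: Sequence[float], face_threshold: float = 0.5) -> FacePrediction:
    """Split an 8-value output into classifier, box, and pose parts (assumed order)."""
    if len(values) != 8:
        raise ValueError("expected 1 classifier score, 4 box coordinates, and 3 angles")
    score, x1, y1, x2, y2, yaw, pitch, roll = values
    return FacePrediction(score >= face_threshold, (x1, y1, x2, y2), yaw, pitch, roll)


# Hypothetical output values for one detected face.
prediction = unpack_prediction([0.97, 10.0, 20.0, 110.0, 140.0, 15.0, -5.0, 2.0])
```

The yaw and pitch fields are the "posing parameters" to which claim 13 is mapped.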

Claims 8-9 are rejected under 35 U.S.C. 103 as being unpatentable over Yoo et al. (US Publication 2016/0148080 A1, hereafter Yoo), as applied above to claim 1, in view of Maaninen et al. (US Publication 2015/0199963 A1).
As per claim 8, Yoo does not teach the recited limitations. 
Maaninen teaches extracting a plurality of feature vectors and stacking the feature vectors to generate a combined feature vector. Maaninen further teaches inputting the combined feature vector into a neural network for speech recognition (FIG. 2 #222; para. [0033]).
Taking the combined teachings of Yoo and Maaninen as a whole, it would have been obvious for a person with ordinary skill in the art before the effective filing date of the claimed invention to consider generating a combined feature vector by stacking a plurality of feature vectors in order for the neural network to operate on input data in matrix format.
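The stacking operation relied upon can be sketched as follows. The Python example is illustrative only and is not drawn from Maaninen or Yoo: the function names and the sample color, texture, and gradient values are hypothetical. It shows equal-length feature vectors stacked as rows of a matrix, and the same rows flattened into a single combined input vector.

```python
from typing import List, Sequence


def stack_as_matrix(vectors: Sequence[Sequence[float]]) -> List[List[float]]:
    """Stack equal-length feature vectors as the rows of a matrix."""
    if not vectors:
        raise ValueError("at least one feature vector is required")
    width = len(vectors[0])
    if any(len(v) != width for v in vectors):
        raise ValueError("all feature vectors must have the same length")
    return [list(v) for v in vectors]


def flatten(matrix: Sequence[Sequence[float]]) -> List[float]:
    """Concatenate the rows into one combined feature vector."""
    return [value for row in matrix for value in row]


# Hypothetical preliminary feature vectors for one face image.
color_features = [0.1, 0.2, 0.3]
texture_features = [0.4, 0.5, 0.6]
gradient_features = [0.7, 0.8, 0.9]

stacked = stack_as_matrix([color_features, texture_features, gradient_features])
combined = flatten(stacked)
```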

As per claim 9, dependent upon claim 8, Yoo in view of Maaninen teaches that the plurality of preliminary feature vectors includes at least one of a first texture-based feature vector or a second texture-based feature vector (Yoo para. [0008]: “a local binary pattern channel image”).

Claims 10-11 are rejected under 35 U.S.C. 103 as being unpatentable over Yoo et al. (US Publication 2016/0148080 A1, hereafter Yoo), as applied above to claim 5, in view of LAKSHMANAN (US Publication 2017/0357029 A1).
As per claim 10, Yoo teaches training the neural network by performing a backpropagation operation (FIG. 4; FIG. 7; para. [0058]). Yoo further teaches determining an error at the feature layer of the one or more secondary sub-neural networks, the error including a plurality of losses corresponding to a plurality of recognized elements (FIG. 5 #530 “Loss Layer”; FIG. 4 & 7 ID loss, gender loss, etc.). Yoo, however, does not teach the remaining limitations.
LAKSHMANAN discloses a rain rate prediction neural network (Abstract). The neural network comprises a plurality of secondary sub-neural networks, each performing rain rate prediction associated with a time point, and an integration model unit which integrates the predicted rates over the time period (FIG. 7; FIG. 9B; para. [0140]-[0144]). The predicted rainfall amount is compared with an actual rainfall amount, determined based on received rainfall measurements, to determine an error. If the error does not satisfy certain criteria, then the error is apportioned to each of the time points (i.e., corresponding to each of the secondary sub-neural networks), the apportioned errors are back propagated via the hidden layers, and weights associated with nodes in the hidden layers are updated (Abstract; FIG. 8-10; para. [0159]-[0164]).
Taking the combined teachings of Yoo and LAKSHMANAN as a whole, it would have been obvious for a person with ordinary skill in the art before the effective filing date of the claimed invention to consider apportioning error among a plurality of secondary sub-neural networks as performed by LAKSHMANAN in order to improve training efficiency. A more dispersed error can make the neural network converge faster.

As per claim 11, dependent upon claim 10, Yoo in view of LAKSHMANAN further teaches:
dividing the error into the plurality of error portions based on the number of neural units of the feature layer of the one or more secondary sub-neural networks (LAKSHMANAN FIG. 8 shows a neural network with a plurality of layers. The last layer is an output node 840 producing an output Y 850. The previous hidden layer includes two nodes 830 and 832, which represent a feature layer, corresponding to the one or more secondary sub-neural networks. LAKSHMANAN apportions error value δ 860 according to a count of the nodes in a previous hidden layer (para. [0149], [0167])).
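Dividing an error according to a node count can be sketched in a few lines. The Python example below is a simplified illustration, not LAKSHMANAN's procedure: the function name apportion_error is hypothetical, and an equal split across the nodes is assumed because the Office action states only that the error is apportioned according to the count of nodes.

```python
from typing import List


def apportion_error(total_error: float, node_count: int) -> List[float]:
    """Split an output error into one portion per node (equal split assumed)."""
    if node_count <= 0:
        raise ValueError("node_count must be positive")
    portion = total_error / node_count
    return [portion] * node_count


# Two nodes in the preceding hidden layer, as with nodes 830 and 832 of FIG. 8.
portions = apportion_error(0.8, 2)
```

Each portion would then be back-propagated through its own node, so the portions sum to the original error.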

Claim 19 is rejected under 35 U.S.C. 103 as being unpatentable over Yoo et al. (US Publication 2016/0148080 A1, hereafter Yoo), as applied above to claim 18, in view of Hsu et al. (Hsu, C-C., et al., “CNN-Based Joint Clustering and Representation Learning with Feature Drift Compensation for Large-Scale Image Data”, IEEE TRANSACTIONS ON MULTIMEDIA, 2017, published Aug., 2017, hereafter Hsu), and Shen et al. (US Publication 2017/0294010 A1, hereafter Shen).
As per claim 19, Yoo teaches generating a plurality of second features at a first feature layer of the first sub-neural network or a layer connecting to the feature layer (FIG. 8; FIG. 11), but does not teach the remaining limitations.
Hsu discloses a convolutional neural network (CNN) to jointly solve clustering and representation learning in an iterative manner (Abstract). Specifically, as shown in Fig. 2, features extracted by a CCNN are clustered using mini-batch k-means to assign cluster labels to individual input samples for a mini-batch of images randomly sampled from the input image set until all images are processed. Subsequently, the method iteratively updates the CNN parameters and the centroids of the image clusters simultaneously, based on stochastic gradient descent (Abstract; Fig. 2; pages 4-5, subsections C, D and E).
It is noted that Yoo in view of Hsu does not further teach normalizing feature vectors before clustering the feature vectors.
Shen evidences that normalizing a plurality of feature vectors for clustering the plurality of feature vectors is well known and commonly practiced (para. [0060]).
Taking the combined teachings of Yoo, Hsu and Shen as a whole, it would have been obvious for a person with ordinary skill in the art before the effective filing date of the claimed invention to consider tuning a neural network as performed by Hsu in view of Shen in order to improve training efficiency and accuracy. 
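The combination relied upon (normalizing feature vectors, then clustering them in mini-batches) can be sketched as follows. The Python example is illustrative only and does not reproduce Hsu's or Shen's methods: L2 normalization is assumed as the normalization, the centroid update uses a per-centroid learning rate of 1/count (a common mini-batch k-means formulation), and all names and sample values are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a feature vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return list(vec) if norm == 0.0 else [v / norm for v in vec]


def nearest(vec, centroids):
    """Index of the centroid closest to vec in squared Euclidean distance."""
    dists = [sum((a - b) ** 2 for a, b in zip(vec, c)) for c in centroids]
    return dists.index(min(dists))


def minibatch_kmeans_step(batch, centroids, counts):
    """Assign cluster labels for one mini-batch, then update centroids in place."""
    labels = [nearest(v, centroids) for v in batch]  # labels fixed before any update
    for v, k in zip(batch, labels):
        counts[k] += 1
        rate = 1.0 / counts[k]                       # per-centroid learning rate
        centroids[k] = [(1.0 - rate) * c + rate * x for c, x in zip(centroids[k], v)]
    return labels


# Hypothetical feature vectors taken from a feature layer for one mini-batch.
features = [[4.0, 3.0], [8.0, 6.0], [0.0, 2.0], [0.1, 5.0]]
normalized = [l2_normalize(f) for f in features]
centroids = [[1.0, 0.0], [0.0, 1.0]]
counts = [0, 0]
labels = minibatch_kmeans_step(normalized, centroids, counts)
```

After normalization the first two vectors coincide even though their magnitudes differ, which is the practical reason normalization is commonly applied before distance-based clustering.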

Contact
Any inquiry concerning this communication or earlier communications from the examiner should be directed to XUEMEI G CHEN whose telephone number is (571)270-3480.  The examiner can normally be reached on Monday-Friday 9am-6pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Nay Maung, can be reached at 571-272-7882. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/XUEMEI G CHEN/Primary Examiner, Art Unit 2664