DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
Acknowledgement is made of Applicant’s claim amendments filed on 07/23/2021. The claim amendments are entered. Claims 1-3, 5-6, and 8-21 remain pending. Claims 1, 2, 6, 12, 13, and 17-19 have been amended.
Response to Arguments
Applicant’s arguments with respect to claim(s) 1, 12, and 18 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 2, 3, 5, 6, 8, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Moon et al. (US-9317785-B1) in view of Kwon (US-20170068887-A1) and Kang et al. (US-20170083829-A1).
Regarding Claim 1,
Moon teaches an apparatus to demographically classify an individual, the apparatus comprising: 
a neural network structured to have an input layer, a first output layer, a second output layer subsequent to the first output layer, …the neural network structured to: 
…the inputs based on demographic information for the individual (Fig. 1; Col. 6 lines 37-42; The next multi-category decomposition architecture setup 842 step prepares the learning machines corresponding to all of the (gender, age) classes. The multi-category decomposition architecture training 844 step trains all of the learning machines so that they generate the desired outputs for the set of input facial images.),…
…the first outputs representing a plurality of first possible classifications of the individual according to a demographic classification system at a first hierarchical level (Fig. 4; Col. 7 lines 36-39; For example, the specialized classifier 1 817 is tuned to (male, "ethnicity class 1", child) class. The next classifier is tuned to (male, "ethnicity class 1", young adult), and so on.), and 
process the first outputs to form second outputs at the second output layer (Fig. 15; Col. 11 lines 12-23; The ethnicity output vectors are weighted by the auxiliary class likelihood output 848, and are added together to compute the final ethnicity vector 861: ("ethnicity class 1" sum, "ethnicity class 2" sum, "ethnicity class 3" sum). The final decision chooses the ethnicity label that has the highest score. For example, if "ethnicity class 2" score>"ethnicity class 1" score and "ethnicity class 2" score>"ethnicity class 3" score, then "ethnicity class 2" is chosen as the ethnicity of the input facial image.), the second outputs representing possible combined classifications of the individual, the possible combined classification corresponding to combinations of the plurality of first possible classifications and a plurality of second possible classifications of the individual -2-U.S. Application. No. 15/447,909Attorney Docket No. 20004/81154116US01 Response to the Office Action dated February 6, 2020 according to the demographic classification system at a second hierarchical level different from the first hierarchical level (Col. 7 lines 62-67; The final decision is made based on the output from all of the hybrid classifiers, by the classifier fusion 819 step.); and 
a processor to execute computer readable instructions to: 
select one of the second outputs at the second output layer (Col. 11 lines 24-31; A voting scheme 262 can be applied to help in the final decision process.); 
associate with the individual a respective one of the first possible classifications and a respective one of the second possible classifications corresponding to a respective one of the possible combined classifications represented by the selected second output (Col. 11 lines 24-31; A voting scheme 262 can be applied to help in the final decision process. For example, a voting scheme is applied to a series of multiple classification instances, i.e. producing a series of multiple final ethnicity vectors 1 through N, that are estimated by the learning machines for multiple input facial images of the same person rather than selecting the ethnicity label that has the highest score from a single instance of a ethnicity vector for an input facial image.).
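For illustration only, the classifier fusion and voting relied on in Moon (Col. 11 lines 12-31) can be sketched in Python. The sketch assumes that each auxiliary-class learning machine yields a plain list of ethnicity scores and that the auxiliary class likelihoods are supplied as weights; the function names and scores are hypothetical and are not taken from Moon.

```python
from collections import Counter


def fuse_ethnicity(aux_likelihoods, ethnicity_vectors):
    """Weight each auxiliary-class ethnicity vector by its class likelihood and sum them."""
    n_classes = len(ethnicity_vectors[0])
    final_vector = [0.0] * n_classes
    for weight, vector in zip(aux_likelihoods, ethnicity_vectors):
        for k in range(n_classes):
            final_vector[k] += weight * vector[k]
    return final_vector


def decide(final_vector):
    """Return the index of the ethnicity class with the highest fused score."""
    return max(range(len(final_vector)), key=lambda k: final_vector[k])


def vote(decisions):
    """Majority vote over per-image decisions for the same person."""
    return Counter(decisions).most_common(1)[0][0]


# Two auxiliary classes, three ethnicity classes (hypothetical scores).
fused = fuse_ethnicity([0.7, 0.3], [[0.2, 0.5, 0.3], [0.6, 0.1, 0.3]])
print(decide(fused))  # 1, i.e. "ethnicity class 2"
```

Here the single-image decision is the highest fused score; `vote` corresponds to Moon's alternative of deciding over a series of final ethnicity vectors estimated for several images of the same person.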
Moon does not explicitly disclose
process inputs presented at the input layer to form first outputs at the first output layer… 
…and a plurality of neural network modules interposed between the input layer and the first output layer, respective ones of the neural network modules including corresponding groups of interconnected neural layers, each group of interconnected neural layers having at least one connection between a corresponding input layer of the group of interconnected neural layers  and corresponding output layer of the group of interconnected neural layers,…
compute a loss value based on a weighted combination of a first contribution determined from the first outputs of the first output layer of the neural network, a second contribution determined from the second outputs of the second output layer of the neural network, and a third contribution determined from coefficients of the neural network, the first contribution adjusted based on a first weight, the second contribution adjusted based on a second weight, and the third contribution adjusted based on a third weight; and 
update one or more of the coefficients of the neural network based on the loss value.
However, Kwon teaches process inputs presented at the input layer to form first outputs at the first output layer… (para [0046] The input layer 210 includes k (k ≥ 1) input nodes 210a, and vector input data whose length is k is input to the input nodes 210a so that each element of the vector input data is input to a respective one of the input nodes 210a. The hidden layers 220 and 230 each include one or more hidden nodes 220a and 230a. The output layer 240 includes output nodes 241, 242, and 243, one for each of the classes C1, C2, and C3, respectively, and outputs the output value of the input data for each of the classes C1, C2, and C3.).
a neural network structured to have an input layer, a first output layer (para [0055] Referring to FIGS. 3A and 3B, the neural network 300 include an input layer 310, hidden layers 320 and 330, and an output layer 340, the same as in the general neural network 200 of FIG. 2.), a second output layer subsequent to the first output layer (para [0056] Also, the neural network 300 further includes a boost pooling layer 350 subsequent to the output layer 340.), and a plurality of neural network modules (para [0055] Although two hidden layers 320 and 330 are illustrated in FIGS. 3A and 3B, the number of hidden layers is not limited to two. Examiner interprets neural network modules as a group of hidden layers) interposed between the input layer and the first output layer (para [0014] The neural network further may include one or more hidden layers between the input layer and the output layer.), respective ones of the neural network modules including corresponding groups of interconnected neural layers (para [0047] The structure of the neural network 200 is represented by information on the connections between nodes illustrated as arrows, and a weight value assigned to each connection, which is not illustrated.), each group of interconnected neural layers having at least one connection between a corresponding input layer (para [0055] the neural network 300 include an input layer 310) of the group of interconnected neural layers and corresponding output layer of the group of interconnected neural layers (para [0015] one hidden layer of the one or more hidden layers may be positioned before the output layer, and may include two or more sets of hidden nodes, each set of which is connected to a different one of the class groups of the output layer).
It would have been obvious to a person having ordinary skill in the art before the effective filing date to combine the multi-classifier architecture of Moon et al. with the method of classifying data using a neural network of Kwon.
Doing so would allow for boost pooling (para [0006] The boosting method obtains a result by combining various classifier models trained through these processes, and the classification accuracy increases as the number of models increases, which is known to be more effective than the bagging method.).
Kang (US 20170083829 A1) teaches 
compute a loss value (para [0074] $\mathrm{Loss}(\theta) = \alpha f(P_{\mathrm{Teacher}(i)}, P_{\mathrm{Student}}) + \beta g(P_{\mathrm{Teacher}(j)}, P_{\mathrm{Student}}) + \gamma h(T_r, P_{\mathrm{Student}})$ [Equation 3]) based on a weighted combination of a first contribution determined from the first outputs of the first output layer of the neural network (para [0075] The model training apparatus may determine effects of the output data $P_{\mathrm{Teacher}(i)}$ of the selected i-th teacher model. $P_{\mathrm{Teacher}(i)}$ is the first contribution), a second contribution determined from the second outputs of the second output layer of the neural network (para [0075] the output data $P_{\mathrm{Teacher}(j)}$ of the selected j-th teacher model. $P_{\mathrm{Teacher}(j)}$ is the second contribution), and a third contribution determined from coefficients of the neural network (para [0075] and $\gamma$ is a constant which denotes a weight applied to the correct answer data $T_r$. $T_r$ is the third contribution determined from coefficients as it is multiplied by $\gamma$), the first contribution adjusted based on a first weight, the second contribution adjusted based on a second weight, and the third contribution adjusted based on a third weight (para [0075] h denotes a cross entropy, a softmax function, or a Euclidian distance between correct answer data $T_r$ and the output data $P_{\mathrm{Student}}$ of the student model 320. $\beta$ is a constant which denotes a weight applied to the output data $P_{\mathrm{Teacher}(j)}$ of the selected j-th teacher model, and $\gamma$ is a constant which denotes a weight applied to the correct answer data $T_r$); and 
update one or more of the coefficients of the neural network based on the loss value (para [0044] Error back-propagation learning refers to a method of estimating a loss with respect to input data provided through forward computation, and updating connection weights to reduce a loss in a process of propagating the estimated loss in a backward direction from an output layer toward a hidden layer and an input layer. Teacher and student models are updated by the loss function by adjusting the weights of the network. Para [0075] The model training apparatus may determine effects of the output data $P_{\mathrm{Teacher}(i)}$ of the selected i-th teacher model, the output data $P_{\mathrm{Teacher}(j)}$ of the selected j-th teacher model, and the correct answer data $T_r$, by adjusting values of $\alpha$, $\beta$, and $\gamma$.).
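To make the structure of the claimed loss concrete, the following Python sketch computes a weighted sum of three contributions, mirroring the three weighted terms of Kang's Equation 3 with w1, w2, and w3 in the roles of alpha, beta, and gamma. The use of cross-entropy for the first and second contributions and of a squared-L2 penalty on the coefficients for the third is an assumption made for illustration, as are all names; the sketch is not presented as Kang's implementation or the Applicant's.

```python
import math


def cross_entropy(probs, target_index):
    """Negative log of the probability assigned to the correct class."""
    return -math.log(probs[target_index])


def weighted_loss(first_probs, first_target, second_probs, second_target,
                  coefficients, w1, w2, w3):
    """Weighted combination of three contributions (illustrative choices)."""
    first = cross_entropy(first_probs, first_target)     # from the first outputs
    second = cross_entropy(second_probs, second_target)  # from the second outputs
    third = sum(c * c for c in coefficients)              # from the coefficients (L2, assumed)
    return w1 * first + w2 * second + w3 * third


loss = weighted_loss([0.5, 0.5], 0, [0.9, 0.1], 0, [1.0, 2.0], w1=1.0, w2=0.5, w3=0.01)
```

Setting any weight to zero removes the corresponding contribution, which is one way to read each contribution being "adjusted based on" its weight.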
It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the neural network of Moon, as modified by Kwon, with Kang’s method of training a neural network. 
Doing so would allow for improved accuracy of the neural network (para [0080] For example, when the accuracy of the student model 420 is higher than a highest accuracy among the accuracies of the plurality of teacher models, the model training apparatus may determine the output data of the student model 420 to be used for training.).
Regarding Claim 2,
Moon, Kwon, and Kang teach the apparatus as defined in claim 1. Kwon further teaches wherein the neural network is also structured to include: 
a sorting output layer structured to sort the first outputs into groups according to the second possible classifications with which the first outputs are hierarchically associated (para [0068] Also, the output layer 440 is configured to include output nodes 441a and 441b, output nodes 442a and 442b, and output nodes 443a and 443b, respectively, with regard to the classes C1, C2, and C3. As a result, the output nodes of the output layer 440 are divided into two class groups CG1 and CG2 as illustrated in FIG. 4.); and 
a combining output layer structured to select from each of the groups a first one of the first outputs of the group having a greatest value to form third outputs (Fig. 5A element 550; para [0025] Each boost pooling node may be configured to output one output value for a corresponding class by applying any one or any combination of any two or more of a maximum, a mean, an average, and a probabilistic selection to all output values of all output nodes of the output layer for the corresponding class.), 
wherein the second output layer is structured to convert the third outputs into probabilities to form the second outputs (Fig. 5A; para [0074] The softmax layer 560 performs normalizing on the output value of the previous layer so that each of the output values is not greater than one and so that a sum of the all of the output values is one, thus making the output value of each of the classes represent a probability that the input data belongs to the class.).
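The sorting, combining, and probability layers, as mapped to Kwon's boost pooling layer 550 and softmax layer 560, can be sketched in Python as follows. The sketch assumes the maximum (one of the options listed in Kwon para [0025]) as the pooling operation and represents each group by a label; the function name and labels are hypothetical.

```python
import math


def group_max_softmax(first_outputs, group_labels):
    """Group first outputs by label, keep each group's maximum, then apply softmax.

    first_outputs: scores at the first output layer.
    group_labels: for each score, the second-level class it is associated with.
    Returns a dict mapping each group label to a probability.
    """
    groups = {}
    for score, label in zip(first_outputs, group_labels):
        groups.setdefault(label, []).append(score)  # sorting layer
    labels = sorted(groups)
    third_outputs = [max(groups[label]) for label in labels]  # combining layer
    m = max(third_outputs)  # subtract the maximum for numerical stability
    exps = [math.exp(t - m) for t in third_outputs]
    total = sum(exps)
    return {label: e / total for label, e in zip(labels, exps)}  # second output layer


probs = group_max_softmax([1.0, 3.0, 2.0, 2.0], ["C1", "C1", "C2", "C2"])
```

With these inputs, group C1 pools to 3.0 and group C2 to 2.0, so C1 receives the larger probability (about 0.73) and the two probabilities sum to one.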
Regarding Claim 3,
Moon, Kwon, and Kang teach the apparatus as defined in claim 2. Kwon further teaches the plurality of neural network modules structured to process the inputs presented at the input layer to form the first outputs at the first output layer (Fig. 5A; para [0072] A neural network 500 includes an input layer 510 including an input node 510a, hidden layers 520 and 530 including hidden nodes 520a and 531a, an output layer 540, a boost pooling layer 550, and a softmax layer 560.).
Regarding Claim 5,
para [0067] In Equation 2, g denotes a cross entropy, a softmax function, or a Euclidean distance between the correct answer data T.sub.r and the output data P.sub.Student of the student model 220.).
Regarding Claim 6,
Moon, Kwon, and Kang teach the apparatus as defined in claim 5. Kang further teaches wherein the processor is to update the one or more of the coefficients using a stochastic descent algorithm (para [0060] The model training apparatus may calculate the loss, and train the student model 220 to reduce the loss based on stochastic gradient descent (SGD).).
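As an illustration of the kind of update described in Kang para [0060], the following Python sketch fits the coefficients of a small linear softmax classifier by stochastic gradient descent on a cross-entropy loss with an optional L2 penalty. The model, learning rate, epoch count, and names are assumptions chosen for brevity; the sketch is not Kang's training procedure.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sgd_train(samples, n_features, n_classes, lr=0.1, l2=0.0, epochs=50, seed=0):
    """Fit coefficients W[c][f] of a linear softmax classifier by SGD.

    samples: list of (feature_list, class_index) pairs.
    Per-sample loss: cross-entropy plus l2 * sum of squared coefficients.
    """
    rng = random.Random(seed)
    W = [[0.0] * n_features for _ in range(n_classes)]
    order = list(range(len(samples)))
    for _ in range(epochs):
        rng.shuffle(order)  # stochastic: visit samples in a random order each epoch
        for i in order:
            x, y = samples[i]
            logits = [sum(w * xf for w, xf in zip(W[c], x)) for c in range(n_classes)]
            probs = softmax(logits)
            for c in range(n_classes):
                err = probs[c] - (1.0 if c == y else 0.0)  # dLoss/dlogit_c
                for f in range(n_features):
                    grad = err * x[f] + 2.0 * l2 * W[c][f]
                    W[c][f] -= lr * grad  # step against the gradient
    return W


samples = [([1.0, 0.0], 0), ([0.0, 1.0], 1)]
W = sgd_train(samples, n_features=2, n_classes=2)
```

Each step uses the gradient of a single sample's loss rather than the full-batch loss, which is what distinguishes stochastic gradient descent from ordinary gradient descent.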
Regarding Claim 8,
Moon, Kwon, and Kang teach the apparatus as defined in claim 1. Moon et al. further teaches wherein the processor is to: 
query a database to obtain the demographic information for the individual (Col. 8 lines 33-34; The annotated facial image database 632); and 
form the inputs based on contents of the demographic information (Col. 8 lines 33-36; The annotated facial image database 632 is converted to a set of auxiliary class training data 882, where each training data corresponds to one of the auxiliary classes.).
Regarding Claim 10,
Col. 2 lines 19-25; The present invention handles the ethnicity classification problem by introducing a multi-category decomposition architecture, which is an exemplary embodiment of the hybrid multi-classifier architecture, where the learning machines are structured and trained to represent the face ensembles grouped by appearance-based demographics categories).
Regarding Claim 11,
Moon, Kwon, and Kang teach the apparatus as defined in claim 10. Moon et al. further teaches wherein the first possible classifications represent demographic segments of the demographic categories (Col. 8 lines 1-13; FIG. 6 shows the groundtruth demographics labeling 654 scheme for the facial image annotation 650 in an exemplary embodiment of the present invention. First, the auxiliary demographics categories 665 should be determined. In the figure, gender and age are determined as auxiliary categories. In an exemplary embodiment, the gender category has (male, female) labels and the age category has (child, young adult, adult, senior) labels. The age category can also be more finely divided into a larger number of classes, such as [-9, 10-19, 20-29, 30-39, 40-49, 50-59, 60-69, 70-]. The ethnicity classes can be determined according to a given application; exemplary divisions can be ["ethnicity class 1", "ethnicity class 2", "ethnicity class 3", . . . ].).
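For illustration, the label scheme quoted from Moon can be enumerated in Python, with each demographic category (gender, age, ethnicity) divided into segments (its labels): the auxiliary (gender, age) classes of Col. 6 and Col. 8, and the (gender, ethnicity, age) classes to which the specialized classifiers of Col. 7 lines 36-39 are tuned. The sketch assumes three ethnicity classes, since Moon leaves that list open-ended; the constant names are hypothetical.

```python
from itertools import product

GENDER = ["male", "female"]
AGE = ["child", "young adult", "adult", "senior"]
# Moon's ethnicity list is open-ended; three classes are assumed here.
ETHNICITY = ["ethnicity class 1", "ethnicity class 2", "ethnicity class 3"]

# Auxiliary classes: every (gender, age) combination.
auxiliary_classes = list(product(GENDER, AGE))

# Specialized-classifier classes: every (gender, ethnicity, age) combination,
# ordered so that the last element varies fastest.
specialized_classes = list(product(GENDER, ETHNICITY, AGE))

print(len(auxiliary_classes), len(specialized_classes))  # 8 24
```

The first two specialized classes come out as (male, "ethnicity class 1", child) and (male, "ethnicity class 1", young adult), matching the order in which Moon describes specialized classifier 1 and the next classifier.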
Regarding Claim 12,

obtaining data representative of demographic characteristics of an individual (Col. 6 lines 29-39; A preferred embodiment of the present invention is illustrated in FIG. 1. It shows the overall view of the system; the facial image annotation 650 step manually assigns labels (gender, age, and ethnicity) to each of the facial images in the face database. The granularity of the labeling should be determined beforehand. In an exemplary embodiment, age can be labeled as child, adult, or senior. Age can also be labeled as 18 and below, from 18 to 30, from 30 to 44, from 45 to 60, or 60 and above. The next multi-category decomposition architecture setup 842 step prepares the learning machines corresponding to all of the (gender, age) classes.); 
processing the data with a neural network to form first outputs at a first output layer of the neural network, the first outputs representing a plurality of first possible demographic classifications of the individual at a first hierarchical classification level (Fig. 4; Col. 7 lines 36-39; For example, the specialized classifier 1 817 is tuned to (male, "ethnicity class 1", child) class. The next classifier is tuned to (male, "ethnicity class 1", young adult), and so on.), …and 
processing the first outputs with the neural network to form second outputs at the second output layer of the neural network (Fig. 15; Col. 11 lines 12-23; The ethnicity output vectors are weighted by the auxiliary class likelihood output 848, and are added together to compute the final ethnicity vector 861: ("ethnicity class 1" sum, "ethnicity class 2" sum, "ethnicity class 3" sum). The final decision chooses the ethnicity label that has the highest score. For example, if "ethnicity class 2" score>"ethnicity class 1" score and "ethnicity class 2" score>"ethnicity class 3" score, then "ethnicity class 2" is chosen as the ethnicity of the input facial image.), the second outputs representing possible combined demographic classifications of the individual corresponding to combinations of the plurality of first possible demographic classifications and a plurality of second possible demographic classifications, the second possible demographic classifications at a second hierarchical classification level different from the first hierarchical classification level (Col. 7 lines 62-67; The final decision is made based on the output from all of the hybrid classifiers, by the classifier fusion 819 step.).
Moon does not explicitly disclose
the neural network including an input layer, the first output layer, a second output layer subsequent to the first output layer, and a plurality of neural network modules interposed between the input layer and the first output layer, respective ones of the neural network modules including corresponding groups of interconnected neural layers, each group of interconnected neural layers having at least one connection between a corresponding input layer of the group of interconnected neural layers and corresponding output layer of the group of interconnected neural layers; and 
computing a loss value based on a weighted combination of a first contribution determined from the first outputs of the first output layer of the neural network, a second contribution determined from the second outputs of the second output layer of the neural network, and a third contribution determined from coefficients of the neural network, the first contribution adjusted based on a first weight, the second contribution adjusted based on a second weight, and the third contribution adjusted based on a third weight; and 
updating one or more of the coefficients of the neural network based on the loss value.
However, Kwon teaches 
the neural network including an input layer, the first output layer (para [0055] Referring to FIGS. 3A and 3B, the neural network 300 include an input layer 310, hidden layers 320 and 330, and an output layer 340, the same as in the general neural network 200 of FIG. 2.), a second output layer subsequent to the first output layer (para [0056] Also, the neural network 300 further includes a boost pooling layer 350 subsequent to the output layer 340.), and a plurality of neural network modules (para [0055] Although two hidden layers 320 and 330 are illustrated in FIGS. 3A and 3B, the number of hidden layers is not limited to two. Examiner interprets neural network modules as a group of hidden layers) interposed between the input layer and the first output layer (para [0014] The neural network further may include one or more hidden layers between the input layer and the output layer.), respective ones of the neural network modules including corresponding groups of interconnected neural layers (para [0047] The structure of the neural network 200 is represented by information on the connections between nodes illustrated as arrows, and a weight value assigned to each connection, which is not illustrated.), each group of interconnected neural layers having at least one connection between a corresponding input layer (para [0055] the neural network 300 include an input layer 310) of the group of interconnected neural layers and corresponding output layer of the group of interconnected neural layers (para [0015] one hidden layer of the one or more hidden layers may be positioned before the output layer, and may include two or more sets of hidden nodes, each set of which is connected to a different one of the class groups of the output layer); and 
It would have been obvious to a person having ordinary skill in the art before the effective filing date to combine the multi-classifier architecture of Moon et al. with the method of classifying data using a neural network of Kwon.
Doing so would allow for boost pooling (para [0006] The boosting method obtains a result by combining various classifier models trained through these processes, and the classification accuracy increases as the number of models increases, which is known to be more effective than the bagging method.).
Kang further teaches
computing a loss value (para [0074] $\mathrm{Loss}(\theta) = \alpha f(P_{\mathrm{Teacher}(i)}, P_{\mathrm{Student}}) + \beta g(P_{\mathrm{Teacher}(j)}, P_{\mathrm{Student}}) + \gamma h(T_r, P_{\mathrm{Student}})$ [Equation 3]) based on a weighted combination of a first contribution determined from the first outputs of the first output layer of the neural network (para [0075] The model training apparatus may determine effects of the output data $P_{\mathrm{Teacher}(i)}$ of the selected i-th teacher model. $P_{\mathrm{Teacher}(i)}$ is the first contribution), a second contribution determined from the second outputs of the second output layer of the neural network (para [0075] the output data $P_{\mathrm{Teacher}(j)}$ of the selected j-th teacher model. $P_{\mathrm{Teacher}(j)}$ is the second contribution), and a third contribution determined from coefficients of the neural network (para [0075] and $\gamma$ is a constant which denotes a weight applied to the correct answer data $T_r$. $T_r$ is the third contribution determined from coefficients as it is multiplied by $\gamma$), the first contribution adjusted based on a first weight, the second contribution adjusted based on a second weight, and the third contribution adjusted based on a third weight; and 
updating one or more of the coefficients of the neural network based on the loss value (para [0044] Error back-propagation learning refers to a method of estimating a loss with respect to input data provided through forward computation, and updating connection weights to reduce a loss in a process of propagating the estimated loss in a backward direction from an output layer toward a hidden layer and an input layer. Teacher and student models are updated by the loss function by adjusting the weights of the network. Para [0075] The model training apparatus may determine effects of the output data $P_{\mathrm{Teacher}(i)}$ of the selected i-th teacher model, the output data $P_{\mathrm{Teacher}(j)}$ of the selected j-th teacher model, and the correct answer data $T_r$, by adjusting values of $\alpha$, $\beta$, and $\gamma$.).
It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the neural network of Moon, as modified by Kwon, with Kang’s method of training a neural network. 
Doing so would allow for improved accuracy of the neural network (para [0080] For example, when the accuracy of the student model 420 is higher than a highest accuracy among the accuracies of the plurality of teacher models, the model training apparatus may determine the output data of the student model 420 to be used for training.).
Regarding Claim 13,
Moon, Kwon, and Kang teach the method as defined in claim 12. 
Moon et al. does not explicitly disclose wherein processing the first outputs to form the second outputs includes: 
forming the first outputs into groups according to the second possible demographic classifications with which they are hierarchically associated; 
identifying for each group a first one of the first outputs of the group having a greatest value to form third outputs; and 
converting the third outputs into probabilities to form the second outputs.
However, Kwon teaches
forming the first outputs into groups according to the second possible demographic classifications with which they are hierarchically associated (para [0068] Also, the output layer 440 is configured to include output nodes 441a and 441b, output nodes 442a and 442b, and output nodes 443a and 443b, respectively, with regard to the classes C1, C2, and C3. As a result, the output nodes of the output layer 440 are divided into two class groups CG1 and CG2 as illustrated in FIG. 4.); 
identifying for each group a first one of the first outputs of the group having a greatest value to form third outputs (Fig. 5A element 550; para [0025] Each boost pooling node may be configured to output one output value for a corresponding class by applying any one or any combination of any two or more of a maximum, a mean, an average, and a probabilistic selection to all output values of all output nodes of the output layer for the corresponding class.); and 
converting the third outputs into probabilities to form the second outputs (Fig. 5A; para [0074] The softmax layer 560 performs normalizing on the output value of the previous layer so that each of the output values is not greater than one and so that a sum of the all of the output values is one, thus making the output value of each of the classes represent a probability that the input data belongs to the class.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date to combine the multi-classifier architecture of Moon et al. with the method of classifying data using a neural network of Kwon.
Doing so would allow for boost pooling (para [0006] The boosting method obtains a result by combining various classifier models trained through these processes, and the classification accuracy increases as the number of models increases, which is known to be more effective than the bagging method.).
Regarding Claim 14,
Moon, Kwon, and Kang teach the method as defined in claim 12. Moon further teaches wherein the second possible demographic classifications represent demographic categories of the individual (Col. 2 lines 19-25; The present invention handles the ethnicity classification problem by introducing a multi-category decomposition architecture, which is an exemplary embodiment of the hybrid multi-classifier architecture, where the learning machines are structured and trained to represent the face ensembles grouped by appearance-based demographics categories).
Regarding Claim 15,
Col. 8 lines 1-13; FIG. 6 shows the groundtruth demographics labeling 654 scheme for the facial image annotation 650 in an exemplary embodiment of the present invention. First, the auxiliary demographics categories 665 should be determined. In the figure, gender and age are determined as auxiliary categories. In an exemplary embodiment, the gender category has (male, female) labels and the age category has (child, young adult, adult, senior) labels. The age category can also be more finely divided into a larger number of classes, such as [-9, 10-19, 20-29, 30-39, 40-49, 50-59, 60-69, 70-]. The ethnicity classes can be determined according to a given application; exemplary divisions can be ["ethnicity class 1", "ethnicity class 2", "ethnicity class 3", . . . ].).
Regarding Claim 16,
Moon, Kwon, and Kang teach the method as defined in claim 12. Kang further teaches wherein the first contribution is based on a first cross-entropy of the first outputs and the second contribution is based on a second cross-entropy of the second outputs (para [0067] In Equation 2, g denotes a cross entropy, a softmax function, or a Euclidean distance between the correct answer data T.sub.r and the output data P.sub.Student of the student model 220.).
Regarding Claim 17,
Moon, Kwon, and Kang teach the method as defined in claim 16. Kang further teaches wherein the updating of the one or more of the coefficients includes using a stochastic descent algorithm (para [0060] The model training apparatus may calculate the loss, and train the student model 220 to reduce the loss based on stochastic gradient descent (SGD).).
Regarding Claim 18,
Moon teaches a tangible computer-readable storage medium comprising instructions that, when executed, cause a machine to at least: 
obtain data representative of demographic characteristics of an individual (Col. 6 lines 29-39; A preferred embodiment of the present invention is illustrated in FIG. 1. It shows the overall view of the system; the facial image annotation 650 step manually assigns labels (gender, age, and ethnicity) to each of the facial images in the face database. The granularity of the labeling should be determined beforehand. In an exemplary embodiment, age can be labeled as child, adult, or senior. Age can also be labeled as 18 and below, from 18 to 30, from 30 to 44, from 45 to 60, or 60 and above. The next multi-category decomposition architecture setup 842 step prepares the learning machines corresponding to all of the (gender, age) classes.); 
process the data with a neural network to form first outputs at a first output layer of the neural network, the first outputs representing a plurality of first possible demographic classifications of the individual at a first hierarchical classification level (Fig. 4; Col. 7 lines 36-39; For example, the specialized classifier 1 817 is tuned to (male, "ethnicity class 1", child) class. The next classifier is tuned to (male, "ethnicity class 1", young adult), and so on.),… and 
process the first outputs with the neural network to form second outputs at the second output layer of the neural network (Fig. 15; Col. 11 lines 12-23; The ethnicity output vectors are weighted by the auxiliary class likelihood output 848, and are added together to compute the final ethnicity vector 861: ("ethnicity class 1" sum, "ethnicity class 2" sum, "ethnicity class 3" sum). The final decision chooses the ethnicity label that has the highest score. For example, if "ethnicity class 2" score>"ethnicity class 1" score and "ethnicity class 2" score>"ethnicity class 3" score, then "ethnicity class 2" is chosen as the ethnicity of the input facial image.), the second outputs representing possible combined demographic classifications of the individual corresponding to combinations of the plurality of first possible demographic classifications and a plurality of second possible demographic classifications, the second possible demographic classifications at a second hierarchical classification level different from the first hierarchical classification level (Col. 7 lines 62-67; The final decision is made based on the output from all of the hybrid classifiers, by the classifier fusion 819 step.).
Moon does not explicitly disclose
the neural network including an input layer, the first output layer, a second output layer subsequent to the first output layer, and a plurality of neural network modules interposed between the input layer and the first output layer, respective ones of the neural network modules including corresponding groups of interconnected neural layers, each group of interconnected neural layers having at least one connection between a corresponding input layer of the group of interconnected neural layers and corresponding output layer of the group of interconnected neural layers;
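compute a loss value based on a weighted combination of a first contribution determined from the first outputs of the first output layer of the neural network, a second contribution determined from the second outputs of the second output layer of the neural network, and a third contribution determined from coefficients of the neural network, the first contribution adjusted based on a first weight, the second contribution adjusted based on a second weight, and the third contribution adjusted based on a third weight; and 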
update one or more of the coefficients of the neural network based on the loss value.
However, Kwon teaches
the neural network including an input layer, the first output layer (para [0055] Referring to FIGS. 3A and 3B, the neural network 300 include an input layer 310, hidden layers 320 and 330, and an output layer 340, the same as in the general neural network 200 of FIG. 2.), a second output layer subsequent to the first output layer (para [0056] Also, the neural network 300 further includes a boost pooling layer 350 subsequent to the output layer 340.), and a plurality of neural network modules (para [0055] Although two hidden layers 320 and 330 are illustrated in FIGS. 3A and 3B, the number of hidden layers is not limited to two. The examiner interprets the neural network modules as groups of hidden layers) interposed between the input layer and the first output layer (para [0014] The neural network further may include one or more hidden layers between the input layer and the output layer.), respective ones of the neural network modules including corresponding groups of interconnected neural layers (para [0047] The structure of the neural network 200 is represented by information on the connections between nodes illustrated as arrows, and a weight value assigned to each connection, which is not illustrated.), each group of interconnected neural layers having at least one connection between a corresponding input layer (para [0055] the neural network 300 include an input layer 310) of the group of interconnected neural layers and corresponding output layer of the group of interconnected neural layers (para [0015] one hidden layer of the one or more hidden layers may be positioned before the output layer, and may include two or more sets of hidden nodes, each set of which is connected to a different one of the class groups of the output layer); and 
It would have been obvious to a person having ordinary skill in the art before the effective filing date to combine the multi-classifier architecture of Moon et al. with the method of classifying data using a neural network of Kwon.
Doing so would allow for boost pooling (para [0006] The boosting method obtains a result by combining various classifier models trained through these processes, and the classification accuracy increases as the number of models increases, which is known to be more effective than the bagging method.).
Kang (US 20170083829 A1) teaches 
compute a loss value (para [0074] $\mathrm{Loss}(\theta) = \alpha f(P_{\mathrm{Teacher}(i)}, P_{\mathrm{Student}}) + \beta g(P_{\mathrm{Teacher}(j)}, P_{\mathrm{Student}}) + \gamma h(T_r, P_{\mathrm{Student}})$ [Equation 3]) based on a weighted combination of a first contribution determined from the first outputs of the first output layer of the neural network (para [0075] The model training apparatus may determine effects of the output data $P_{\mathrm{Teacher}(i)}$ of the selected i-th teacher model. $P_{\mathrm{Teacher}(i)}$ is the first contribution), a second contribution determined from the second outputs of the second output layer of the neural network (para [0075] the output data $P_{\mathrm{Teacher}(j)}$ of the selected j-th teacher model. $P_{\mathrm{Teacher}(j)}$ is the second contribution), and a third contribution determined from coefficients of the neural network (para [0075] and $\gamma$ is a constant which denotes a weight applied to the correct answer data $T_r$. $T_r$ is the third contribution determined from coefficients as it is multiplied by $\gamma$), the first contribution adjusted based on a first weight, the second contribution adjusted based on a second weight, and the third contribution adjusted based on a third weight (para [0075] h denotes a cross entropy, a softmax function, or a Euclidian distance between correct answer data $T_r$ and the output data $P_{\mathrm{Student}}$ of the student model 320. $\beta$ is a constant which denotes a weight applied to the output data $P_{\mathrm{Teacher}(j)}$ of the selected j-th teacher model, and $\gamma$ is a constant which denotes a weight applied to the correct answer data $T_r$); and 
update one or more of the coefficients of the neural network based on the loss value (para [0044] Error back-propagation learning refers to a method of estimating a loss with respect to input data provided through forward computation, and updating connection weights to reduce a loss in a process of propagating the estimated loss in a backward direction from an output layer toward a hidden layer and an input layer. Teacher and student models are updated by the loss function by adjusting the weights of the network. Para [0075] The model training apparatus may determine effects of the output data $P_{\mathrm{Teacher}(i)}$ of the selected i-th teacher model, the output data $P_{\mathrm{Teacher}(j)}$ of the selected j-th teacher model, and the correct answer data $T_r$, by adjusting values of $\alpha$, $\beta$, and $\gamma$.).
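As an illustration of the quoted Equation 3 and the back-propagation update described in para [0044], the following minimal sketch assumes that f, g, and h are all cross-entropy (one of the options Kang lists) and that the student is a single linear layer followed by softmax. The function names, the linear student, and the learning rate are hypothetical and are not taken from Kang. For softmax with cross-entropy against a target distribution q, the gradient with respect to the logits is p - q, so the weighted loss has the gradient alpha(p - q_i) + beta(p - q_j) + gamma(p - t_r).

```python
import math
from typing import List, Sequence, Tuple


def softmax(z: Sequence[float]) -> List[float]:
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target: Sequence[float], pred: Sequence[float]) -> float:
    # H(target, pred) = -sum_k target_k * log(pred_k); zero-target terms contribute 0.
    return -sum(t * math.log(p) for t, p in zip(target, pred) if t > 0)


def kang_style_loss(p_teacher_i, p_teacher_j, t_r, p_student,
                    alpha: float, beta: float, gamma: float) -> float:
    # Loss = alpha*f(P_Teacher(i), P_Student) + beta*g(P_Teacher(j), P_Student)
    #        + gamma*h(T_r, P_Student), with f = g = h = cross-entropy (assumption).
    return (alpha * cross_entropy(p_teacher_i, p_student)
            + beta * cross_entropy(p_teacher_j, p_student)
            + gamma * cross_entropy(t_r, p_student))


def train_step(W: List[List[float]], x: Sequence[float], p_teacher_i, p_teacher_j,
               t_r, alpha: float, beta: float, gamma: float,
               lr: float = 0.1) -> Tuple[List[List[float]], float]:
    """One gradient-descent update of a hypothetical linear softmax student z = W x.
    Returns the updated weights and the loss computed before the update."""
    z = [sum(w * xi for w, xi in zip(row, x)) for row in W]
    p = softmax(z)
    loss = kang_style_loss(p_teacher_i, p_teacher_j, t_r, p, alpha, beta, gamma)
    # dLoss/dz_k for softmax followed by the weighted cross-entropy terms.
    dz = [alpha * (pk - qi) + beta * (pk - qj) + gamma * (pk - tk)
          for pk, qi, qj, tk in zip(p, p_teacher_i, p_teacher_j, t_r)]
    # Chain rule: dLoss/dW[k][n] = dz_k * x_n; step against the gradient.
    new_W = [[w - lr * dzk * xi for w, xi in zip(row, x)]
             for row, dzk in zip(W, dz)]
    return new_W, loss
```

Here alpha, beta, and gamma play the roles of the first, second, and third weights. Repeating train_step on the same example should lower the weighted loss, which is the sense in which the coefficients are "updated based on the loss value" in this sketch.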
It would have been obvious to one of ordinary skill in the art before the effective filing date to combine Moon’s method of training a neural network with Kang’s method of training a neural network. 
Doing so would allow for improved accuracy of the neural network (para [0080] For example, when the accuracy of the student model 420 is higher than a highest accuracy among the accuracies of the plurality of teacher models, the model training apparatus may determine the output data of the student model 420 to be used for training.).
Regarding Claim 19,
Moon et al. teaches the tangible computer-readable storage medium as defined in claim 18.
	Moon et al. does not explicitly disclose wherein the instructions, when executed, cause the machine to process the first outputs to form the second outputs by: 
forming the first outputs into groups according to the second possible demographic classifications with which they are hierarchically associated; 
identifying for each group a first one of the first outputs of the group having a greatest value to form third outputs; and 
converting the third outputs into probabilities to form the second outputs.
However, Kwon teaches 
forming the first outputs into groups according to the second possible demographic classifications with which they are hierarchically associated (para [0068] Also, the output layer 440 is configured to include output nodes 441a and 441b, output nodes 442a and 442b, and output nodes 443a and 443b, respectively, with regard to the classes C1, C2, and C3. As a result, the output nodes of the output layer 440 are divided into two class groups CG1 and CG2 as illustrated in FIG. 4.); 
identifying for each group a first one of the first outputs of the group having a greatest value to form third outputs (Fig. 5A element 550; para [0025] Each boost pooling node may be configured to output one output value for a corresponding class by applying any one or any combination of any two or more of a maximum, a mean, an average, and a probabilistic selection to all output values of all output nodes of the output layer for the corresponding class.); and 
converting the third outputs into probabilities to form the second outputs (Fig. 5A; para [0074] The softmax layer 560 performs normalizing on the output value of the previous layer so that each of the output values is not greater than one and so that a sum of the all of the output values is one, thus making the output value of each of the classes represent a probability that the input data belongs to the class.).
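Kwon's boost pooling followed by the softmax layer (paras [0025], [0068], [0074]) can be sketched as below, assuming the maximum is the pooling rule (Kwon also lists mean, average, and probabilistic selection). The function name and the index-list representation of the class groups are hypothetical. The first outputs are grouped per class, the greatest value in each group forms the third outputs, and softmax normalizes those into probabilities summing to one.

```python
import math
from typing import List, Sequence, Tuple


def boost_pool_softmax(
    first_outputs: Sequence[float],
    groups: Sequence[Sequence[int]],
) -> Tuple[List[float], List[float]]:
    """Hypothetical sketch: groups[c] lists the indices of the first outputs
    associated with class c. Returns (third_outputs, second_outputs)."""
    # Max pooling within each group forms the third outputs.
    third = [max(first_outputs[i] for i in idx) for idx in groups]
    # Softmax converts the third outputs into probabilities (second outputs).
    m = max(third)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in third]
    total = sum(exps)
    return third, [e / total for e in exps]
```

For example, with six output nodes arranged as two nodes per class, as in Kwon's FIG. 4, the groups would be [[0, 1], [2, 3], [4, 5]].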
It would have been obvious to a person having ordinary skill in the art before the effective filing date to combine the multi-classifier architecture of Moon et al. with the method of classifying data using a neural network of Kwon.
Doing so would allow for boost pooling (para [0006] The boosting method obtains a result by combining various classifier models trained through these processes, and the classification accuracy increases as the number of models increases, which is known to be more effective than the bagging method.).
Regarding Claim 20,
Moon, Kwon, and Kang teach the tangible computer-readable storage medium as defined in claim 18. Kang further teaches wherein the first contribution is based on a first cross-entropy of the first outputs and the second contribution is based on a second cross-entropy of the second outputs (para [0067] In Equation 2, g denotes a cross entropy, a softmax function, or a Euclidean distance between the correct answer data $T_r$ and the output data $P_{\mathrm{Student}}$ of the student model 220.).

Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Moon et al. (US-9317785-B1) in view of Kwon (US-20170068887-A1), Kang et al. (US-20170083829-A1), and Bitran et al. (US-20180144101-A1).
Regarding Claim 9,
Moon, Kwon, and Kang teach the apparatus as defined in claim 8.
Moon, Kwon, and Kang do not explicitly disclose wherein the querier is to record the one of the first possible classifications and the one of the second possible classifications in the database in conjunction with the individual.
However, Bitran et al. teaches
wherein the querier is to record the one of the first possible classifications and the one of the second possible classifications in the database in conjunction with the individual (para [0032] In this way, the signals at 202 may provide the data used by the user classification system 204 to determine user demographic and contextual information, which is then stored in the user knowledge repository 206 for retrieval.).

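It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the demographic classification apparatus of Moon, Kwon, and Kang with the storage of user demographic information in a user knowledge repository of Bitran et al.
Doing so would allow for an analysis to be performed on the stored data (para [0027] A physical-health service may additionally be configured to perform one or more processing or analysis functions on the stored physical-health data, for the purpose of providing each user of a health platform with insights relating to their physical-health relative to the physical-health of demographically similar users.).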
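For illustration only, recording the selected first and second classifications in a database in conjunction with the individual might look like the following sketch using Python's standard sqlite3 module. The table name, column names, and labels are hypothetical; neither Bitran nor the claim specifies a schema.

```python
import sqlite3


def record_classifications(conn: sqlite3.Connection, individual_id: str,
                           first_class: str, second_class: str) -> None:
    """Store the selected first-level and second-level classifications for
    one individual (hypothetical schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS demographics ("
        "individual_id TEXT PRIMARY KEY, "
        "first_class TEXT NOT NULL, "
        "second_class TEXT NOT NULL)"
    )
    # INSERT OR REPLACE keeps one row per individual, overwriting older results.
    conn.execute(
        "INSERT OR REPLACE INTO demographics "
        "(individual_id, first_class, second_class) VALUES (?, ?, ?)",
        (individual_id, first_class, second_class),
    )
    conn.commit()
```

Parameter substitution with "?" placeholders is used so that the stored labels are never interpolated into the SQL text.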
Claim 21 is rejected under 35 U.S.C. 103 as being unpatentable over Moon et al. (US-9317785-B1) in view of Kwon (US-20170068887-A1), Kang et al. (US-20170083829-A1), and Osypka et al. (US 20170053077 A1).
Regarding Claim 21,
Moon, Kwon, and Kang teach the apparatus as defined in claim 8. 
	Moon, Kwon, and Kang do not explicitly disclose 
wherein the demographic information includes a first value of a demographic characteristic associated with the individual, and the querier is to: 
convert the first value to a second value representative of a range of values of the demographic characteristic, the range of values including the first value; and 
form a first one of the inputs based on the second value.
However, Osypka (US 20170053077 A1) teaches 
wherein the demographic information includes a first value of a demographic characteristic associated with the individual, and the querier is to: 
convert the first value to a second value representative of a range of values of the demographic characteristic, the range of values including the first value (para [0019] In one aspect, a group of selected vital parameters are monitored or determined along with patient demographic data and the input value of each parameter is compared to the normal range for that parameter based on input or stored patient demographic data (e.g. age, gender, body mass or weight, height).); and 
form a first one of the inputs based on the second value (para [0022] In one embodiment, the current clinical parameter values are input to algorithms employing fuzzy logic in order to determine the likelihood for each shock type. In an alternative embodiment, a neural network method can be implemented in place of the fuzzy logic method.).
It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the method of retrieving demographic data for input into a machine learning model of Moon with the method of retrieving demographic data for input into a machine learning model of Osypka.
Doing so would allow for normalizing input data (para [0192] In step 212, normal ranges of selected clinical parameters are retrieved from a local or remote data base, and the ranges are normalized based on the patient's demographic data.).
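The claimed conversion of a raw demographic value into a value representing a range, followed by forming a network input from that range, can be sketched as follows. This is a hypothetical illustration, not Osypka's method: the bin edges loosely follow the age groupings quoted from Moon, the ranges are assumed to be half-open, and one-hot encoding is an assumed choice of input representation.

```python
from bisect import bisect_right
from typing import List, Sequence


def value_to_range_one_hot(value: float, edges: Sequence[float]) -> List[int]:
    """Map a raw value to the index of the half-open range containing it and
    encode that index as a one-hot input vector (hypothetical sketch).

    edges must be sorted ascending; range k covers [edges[k-1], edges[k]),
    with the first range open below and the last range open above."""
    k = bisect_right(edges, value)  # index of the range that includes value
    one_hot = [0] * (len(edges) + 1)
    one_hot[k] = 1
    return one_hot
```

With the hypothetical edges [18, 30, 45, 60], an age of 37 falls in the range [30, 45) and yields [0, 0, 1, 0, 0]; an age of exactly 18 falls in [18, 30) because bisect_right places boundary values in the upper range.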
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Choi et al. (US 20170124415 A1) – discloses a convolutional neural network with a loss function.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to HENRY K NGUYEN whose telephone number is (571)272-0217.  The examiner can normally be reached on Mon - Fri 7:00am-4:30pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/HENRY NGUYEN/Examiner, Art Unit 2121


/Li B. Zhen/Supervisory Patent Examiner, Art Unit 2121