DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant’s arguments with respect to claim(s) 1, 12, and 18 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 2, 3, 7, 8, 10, 11, 12, 13, 14, 15, 18, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Moon et al. (US-9317785-B1) in view of Kwon (US-20170068887-A1) and Girshick et al. ("Fast R-CNN").
Regarding Claim 1,
Moon teaches an apparatus to demographically classify an individual, the apparatus comprising: 
a neural network structured to have an input layer, a first output layer, a second output layer subsequent to the first output layer, …the neural network structured to: 
…the inputs based on demographic information for the individual (Fig. 1; Col. 6 lines 37-42; The next multi-category decomposition architecture setup 842 step prepares the learning machines corresponding to all of the (gender, age) classes. The multi-category decomposition architecture training 844 step trains all of the learning machines so that they generate the desired outputs for the set of input facial images.),…
…the first outputs representing a plurality of first possible classifications of the individual according to a demographic classification system at a first hierarchical level (Fig. 4; Col. 7 lines 36-39; For example, the specialized classifier 1 817 is tuned to (male, "ethnicity class 1", child) class. The next classifier is tuned to (male, "ethnicity class 1", young adult), and so on.), and 
process the first outputs to form second outputs at the second output layer (Fig. 15; Col. 11 lines 12-23; The ethnicity output vectors are weighted by the auxiliary class likelihood output 848, and are added together to compute the final ethnicity vector 861: ("ethnicity class 1" sum, "ethnicity class 2" sum, "ethnicity class 3" sum). The final decision chooses the ethnicity label that has the highest score. For example, if "ethnicity class 2" score>"ethnicity class 1" score and "ethnicity class 2" score>"ethnicity class 3" score, then "ethnicity class 2" is chosen as the ethnicity of the input facial image.), the second outputs representing possible combined classifications of the individual, the possible combined classification corresponding to combinations of the plurality of first possible classifications and a plurality of second possible classifications of the individual -2-U.S. Application. No. 15/447,909Attorney Docket No. 20004/81154116US01 Response to the Office Action dated February 6, 2020 according to the demographic classification system at a second hierarchical level different from the first hierarchical level (Col. 7 lines 62-67; The final decision is made based on the output from all of the hybrid classifiers, by the classifier fusion 819 step.); and 
a processor to execute computer readable instructions to: 
select one of the second outputs at the second output layer (Col. 11 lines 24-31; A voting scheme 262 can be applied to help in the final decision process.); 
associate with the individual a respective one of the first possible classifications and a respective one of the second possible classifications corresponding to a respective one of the possible combined classifications represented by the selected second output (Col. 11 lines 24-31; A voting scheme 262 can be applied to help in the final decision process. For example, a voting scheme is applied to a series of multiple classification instances, i.e. producing a series of multiple final ethnicity vectors 1 through N, that are estimated by the learning machines for multiple input facial images of the same person rather than selecting the ethnicity label that has the highest score from a single instance of a ethnicity vector for an input facial image.).
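For illustration only, the weighted fusion and voting described in the cited passages of Moon (Col. 11 lines 12-31) can be sketched as follows. The function names and the numeric values are hypothetical and are not taken from the reference; the sketch assumes each learning machine emits one score per ethnicity class and that one auxiliary class likelihood is available per learning machine.

```python
from collections import Counter

def fuse_ethnicity_vectors(output_vectors, likelihoods):
    """Weight each ethnicity output vector by its auxiliary class
    likelihood and add the weighted vectors together."""
    fused = [0.0] * len(output_vectors[0])
    for vector, weight in zip(output_vectors, likelihoods):
        for i, score in enumerate(vector):
            fused[i] += weight * score
    return fused

def choose_label(fused):
    """Return the index of the ethnicity class with the highest score."""
    return max(range(len(fused)), key=lambda i: fused[i])

def vote(final_vectors):
    """Majority vote over the labels chosen for several images of the
    same person (one assumed reading of the 'voting scheme')."""
    labels = [choose_label(v) for v in final_vectors]
    return Counter(labels).most_common(1)[0][0]

# Hypothetical outputs of two learning machines for one facial image.
fused = fuse_ethnicity_vectors([[0.2, 0.7, 0.1], [0.6, 0.3, 0.1]], [0.8, 0.2])
label = choose_label(fused)
```

In this sketch the single-image decision is the highest-scoring entry of the fused vector, while `vote` aggregates decisions across multiple images of the same person.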
Moon does not explicitly disclose
process inputs presented at the input layer to form first outputs at the first output layer… 
…and a plurality of neural network modules interposed between the input layer and the first output layer, respective ones of the neural network modules including corresponding groups of interconnected neural layers, each group of interconnected neural layers having at least one connection between a corresponding input layer of the group of interconnected neural layers  and corresponding output layer of the group of interconnected neural layers,…
compute a loss value based on a weighted combination of a first contribution determined from the first outputs of the first output layer of the neural network and a second contribution determined from the second outputs of the second output layer of the neural network; and 
update one or more coefficients of the neural network based on the loss value.
However, Kwon et al. teaches process inputs presented at the input layer to form first outputs at the first output layer… (para [0046] The input layer 210 includes k (k ≥ 1) input nodes 210a, and vector input data whose length is k is input to the input nodes 210a so that each element of the vector input data is input to a respective one of the input nodes 210a. The hidden layers 220 and 230 each include one or more hidden nodes 220a and 230a. The output layer 240 includes output nodes 241, 242, and 243, one for each of the classes C1, C2, and C3, respectively, and outputs the output value of the input data for each of the classes C1, C2, and C3.).
a neural network structured to have an input layer, a first output layer (para [0055] Referring to FIGS. 3A and 3B, the neural network 300 include an input layer 310, hidden layers 320 and 330, and an output layer 340, the same as in the general neural network 200 of FIG. 2.), a second output layer subsequent to the first output layer, (para [0056] Also, the neural network 300 further includes a boost pooling layer 350 subsequent to the output layer 340.) and a plurality of neural network modules (para [0055] Although two hidden layers 320 and 330 are illustrated in FIGS. 3A and 3B, the number of hidden layers is not limited to two. Examiner interprets neural network modules as a group of hidden layers) interposed between the input layer and the first output layer (para [0014] The neural network further may include one or more hidden layers between the input layer and the output layer.), respective ones of the neural network modules including corresponding groups of interconnected neural layers (para [0047] The structure of the neural network 200 is represented by information on the connections between nodes illustrated as arrows, and a weight value assigned to each connection, which is not illustrated.), each group of interconnected neural layers having at least one connection between a corresponding input layer (para [0055] the neural network 300 include an input layer 310) of the group of interconnected neural layers  and corresponding output layer of the group of interconnected neural layers (para [0015] one hidden layer of the one or more hidden layers may be positioned before the output layer, and may include two or more sets of hidden nodes, each set of which is connected to a different one of the class groups of the output layer)
It would have been obvious to a person having ordinary skill in the art before the effective filing date to combine the multi-classifier architecture of Moon et al. with the method of classifying data using a neural network of Kwon.
Doing so would allow for boost pooling (para [0006] The boosting method obtains a result by combining various classifier models trained through these processes, and the classification accuracy increases as the number of models increases, which is known to be more effective than the bagging method.).
Girshick further teaches

compute a loss value based on a weighted combination of a first contribution determined from the first outputs of the first output layer of the neural network and a second contribution determined from the second outputs of the second output layer of the neural network (pg. 1442; A Fast R-CNN network has two sibling output layers. The first outputs a discrete probability distribution (per RoI), $p = (p_0, \ldots, p_K)$, over K + 1 categories. As usual, p is computed by a softmax over the K+1 outputs of a fully connected layer. The second sibling layer outputs bounding-box regression offsets, $t^k = (t^k_x, t^k_y, t^k_w, t^k_h)$, for each of the K object classes, indexed by k. And pg. 1442; We use a multi-task loss L on each labeled RoI to jointly train for classification and bounding-box regression: $L(p, u, t^u, v) = L_{cls}(p, u) + \lambda [u \geq 1] L_{loc}(t^u, v)$ (1), in which $L_{cls}(p, u) = -\log p_u$ is log loss for true class u); and 
update one or more coefficients of the neural network based on the loss value (pg. 1441; “Training all network weights with back-propagation is an important capability of Fast R-CNN.” And pg. 1441; The architecture is trained end-to-end with a multi-task loss. And pg. 1442; When the regression targets are unbounded, training with L2 loss can require careful tuning of learning rates in order to prevent exploding gradient.).
It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the method of training a classifier of Moon with the method of training a classifier of Girshick.
Doing so would allow for efficient object classification (pg. 1440; Recently, deep ConvNets [14, 16] have significantly improved image classification [14] and object detection [9, 19] accuracy. Compared to image classification, object detection is a more challenging task that requires more complex methods to solve.).
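As a non-authoritative illustration of the weighted loss combination relied upon above, the multi-task loss of Girshick, equation (1), can be sketched as follows. The smooth L1 form of the localization term is taken from the Girshick paper itself rather than from the passages quoted in this action, and the function names and sample values are hypothetical.

```python
import math

def smooth_l1(x):
    """Smooth L1 penalty: quadratic near zero, linear elsewhere."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5

def multi_task_loss(p, u, t_u, v, lam=1.0):
    """L(p, u, t^u, v) = L_cls(p, u) + lam * [u >= 1] * L_loc(t^u, v).

    p   : class probabilities from the first sibling output layer
    u   : true class index (0 is assumed to be the background class)
    t_u : predicted box offsets for class u from the second sibling layer
    v   : regression targets
    lam : weight balancing the two contributions
    """
    l_cls = -math.log(p[u])  # log loss for the true class
    if u >= 1:
        l_loc = sum(smooth_l1(t - target) for t, target in zip(t_u, v))
    else:
        l_loc = 0.0  # the indicator [u >= 1] removes the term for background
    return l_cls + lam * l_loc

# Hypothetical values for one labeled region of interest.
loss = multi_task_loss([0.1, 0.7, 0.2], 1, [0.5, 0.0, 0.0, 2.0], [0.0, 0.0, 0.0, 0.0])
```

The parameter `lam` plays the role of the weighting factor between the classification contribution and the regression contribution; network coefficients would then be updated by back-propagating this single scalar.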
Regarding Claim 2,
Moon, Kwon, and Girshick teach the apparatus as defined in claim 1. Kwon further teaches wherein the neural network is also structured to include: 
a sorting output layer structured to sort the first outputs into groups according to the second possible classifications with which they are hierarchically associated (para [0068] Also, the output layer 440 is configured to include output nodes 441a and 441b, output nodes 442a and 442b, and output nodes 443a and 443b, respectively, with regard to the classes C1, C2, and C3. As a result, the output nodes of the output layer 440 are divided into two class groups CG1 and CG2 as illustrated in FIG. 4.); and 
a combining output layer structured to select from each of the groups a first one of the first outputs of the group having the greatest value to form third outputs (Fig. 5A element 550; para [0025] Each boost pooling node may be configured to output one output value for a corresponding class by applying any one or any combination of any two or more of a maximum, a mean, an average, and a probabilistic selection to all output values of all output nodes of the output layer for the corresponding class.), 
wherein the second output layer is structured to convert the third outputs into probabilities to form the second outputs (Fig. 5A; para [0074] The softmax layer 560 performs normalizing on the output value of the previous layer so that each of the output values is not greater than one and so that a sum of the all of the output values is one, thus making the output value of each of the classes represent a probability that the input data belongs to the class.).
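By way of a minimal sketch, the grouping, maximum pooling, and softmax normalization cited from Kwon (paras [0025], [0068], and [0074]) might be expressed as follows. The data layout (one list of output-node values per class) and the function names are assumptions made for illustration, and only the maximum option of the boost pooling layer is shown.

```python
import math

def boost_pool_max(outputs_by_class):
    """For each class, keep the greatest value among all output nodes
    that correspond to that class (one node per class group)."""
    return [max(values) for values in outputs_by_class]

def softmax(values):
    """Normalize so that each value is at most one and all values sum to one."""
    shift = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical output-node values for classes C1, C2, C3,
# each with one node in class group CG1 and one in CG2.
pooled = boost_pool_max([[1.0, 2.0], [0.5, 0.1], [2.0, -1.0]])
probabilities = softmax(pooled)
```

Here `pooled` corresponds to the third outputs and `probabilities` to the second outputs in the claim language, under the assumptions stated above.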
Regarding Claim 3,
Moon, Kwon, and Girshick teach the apparatus as defined in claim 2. Kwon further teaches the plurality of neural network modules structured to process the inputs presented at the input layer to form the first outputs at the first output layer (Fig. 5A; para [0072] A neural network 500 includes an input layer 510 including an input node 510a, hidden layers 520 and 530 including hidden nodes 520a and 531a, an output layer 540, a boost pooling layer 550, and a softmax layer 560.).
Regarding Claim 7,

a sorting output layer structured to sort the first outputs into groups according to the second possible classifications with which the first outputs are hierarchically associated (para [0068] Also, the output layer 440 is configured to include output nodes 441a and 441b, output nodes 442a and 442b, and output nodes 443a and 443b, respectively, with regard to the classes C1, C2, and C3. As a result, the output nodes of the output layer 440 are divided into two class groups CG1 and CG2 as illustrated in FIG. 4.); and 
a combining output layer structured to select from each of the groups a first one of the first outputs of the group having the greatest value to form third outputs (Fig. 5A element 550; para [0025] Each boost pooling node may be configured to output one output value for a corresponding class by applying any one or any combination of any two or more of a maximum, a mean, an average, and a probabilistic selection to all output values of all output nodes of the output layer for the corresponding class.); 
wherein the second output layer is structured to convert the third outputs into probabilities to form the second outputs (Fig. 5A; para [0074] The softmax layer 560 performs normalizing on the output value of the previous layer so that each of the output values is not greater than one and so that a sum of the all of the output values is one, thus making the output value of each of the classes represent a probability that the input data belongs to the class.), and 

Regarding Claim 8,
Moon, Kwon, and Girshick teach the apparatus as defined in claim 1. Moon et al. further teaches wherein the processor is to: 
query a database to obtain the demographic information for the individual (Col. 8 lines 33-34; The annotated facial image database 632); and 
form the inputs based on contents of the demographic information (Col. 8 lines 33-36; The annotated facial image database 632 is converted to a set of auxiliary class training data 882, where each training data corresponds to one of the auxiliary classes.).
Regarding Claim 10,
Moon, Kwon, and Girshick teach the apparatus as defined in claim 1. Moon et al. further teaches wherein the second possible classifications represent demographic categories of the individual (Col. 2 lines 19-25; The present invention handles the ethnicity classification problem by introducing a multi-category decomposition architecture, which is an exemplary embodiment of the hybrid multi-classifier architecture, where the learning machines are structured and trained to represent the face ensembles grouped by appearance-based demographics categories).
Regarding Claim 11,
Moon, Kwon, and Girshick teach the apparatus as defined in claim 10. Moon et al. further teaches wherein the first possible classifications represent demographic segments of the demographic categories (Col. 8 lines 1-13; FIG. 6 shows the groundtruth demographics labeling 654 scheme for the facial image annotation 650 in an exemplary embodiment of the present invention. First, the auxiliary demographics categories 665 should be determined. In the figure, gender and age are determined as auxiliary categories. In an exemplary embodiment, the gender category has (male, female) labels and the age category has (child, young adult, adult, senior) labels. The age category can also be more finely divided into a larger number of classes, such as [-9, 10-19, 20-29, 30-39, 40-49, 50-59, 60-69, 70-]. The ethnicity classes can be determined according to a given application; exemplary divisions can be ["ethnicity class 1", "ethnicity class 2", "ethnicity class 3", . . . ].).
Regarding Claim 12,
Moon teaches a method of performing demographic classification of an individual, the method comprising: 
obtaining data representative of demographic characteristics of an individual (Col. 6 lines 29-39; A preferred embodiment of the present invention is illustrated in FIG. 1. It shows the overall view of the system; the facial image annotation 650 step manually assigns labels (gender, age, and ethnicity) to each of the facial images in the face database. The granularity of the labeling should be determined beforehand. In an exemplary embodiment, age can be labeled as child, adult, or senior. Age can also be labeled as 18 and below, from 18 to 30, from 30 to 44, from 45 to 60, or 60 and above. The next multi-category decomposition architecture setup 842 step prepares the learning machines corresponding to all of the (gender, age) classes.); 
processing the data with a neural network to form first outputs at a first output layer of the neural network, the first outputs representing a plurality of first possible … (Fig. 4; Col. 7 lines 36-39; For example, the specialized classifier 1 817 is tuned to (male, "ethnicity class 1", child) class. The next classifier is tuned to (male, "ethnicity class 1", young adult), and so on.), …and 
processing the first outputs with the neural network to form second outputs at  the second output layer of the neural network (Fig. 15; Col. 11 lines 12-23; The ethnicity output vectors are weighted by the auxiliary class likelihood output 848, and are added together to compute the final ethnicity vector 861: ("ethnicity class 1" sum, "ethnicity class 2" sum, "ethnicity class 3" sum). The final decision chooses the ethnicity label that has the highest score. For example, if "ethnicity class 2" score>"ethnicity class 1" score and "ethnicity class 2" score>"ethnicity class 3" score, then "ethnicity class 2" is chosen as the ethnicity of the input facial image.), the second outputs representing possible combined demographic classifications of the individual corresponding to combinations of the plurality of first possible demographic classifications and a plurality of second possible demographic classifications, the second possible demographic classifications at a second hierarchical classification level different from the first hierarchical classification level (Col. 7 lines 62-67; The final decision is made based on the output from all of the hybrid classifiers, by the classifier fusion 819 step.).
Moon does not explicitly disclose
the neural network including an input layer, the first output layer, a second output layer subsequent to the first output layer, and a plurality of neural network modules interposed between the input layer and the first output layer, respective ones of the 
computing a loss value based on a weighted combination of a first contribution determined from the first outputs of the first output layer of the neural network and a second contribution determined from the second outputs of the second output layer of the neural network; and 
updating one or more coefficients of the neural network based on the loss value.
However, Kwon teaches 
the neural network including an input layer, the first output layer (para [0055] Referring to FIGS. 3A and 3B, the neural network 300 include an input layer 310, hidden layers 320 and 330, and an output layer 340, the same as in the general neural network 200 of FIG. 2.), a second output layer subsequent to the first output layer (para [0056] Also, the neural network 300 further includes a boost pooling layer 350 subsequent to the output layer 340.), and a plurality of neural network modules (para [0055] Although two hidden layers 320 and 330 are illustrated in FIGS. 3A and 3B, the number of hidden layers is not limited to two. Examiner interprets neural network modules as a group of hidden layers) interposed between the input layer and the first output layer (para [0014] The neural network further may include one or more hidden layers between the input layer and the output layer.), respective ones of the neural network modules including corresponding groups of interconnected neural layers (para [0047] The structure of the neural network 200 is represented by information on the connections between nodes illustrated as arrows, and a weight value assigned to each connection, which is not illustrated.), each group of interconnected neural layers having at least one connection between a corresponding input layer (para [0055] the neural network 300 include an input layer 310) of the group of interconnected neural layers and corresponding output layer of the group of interconnected neural layers (para [0015] one hidden layer of the one or more hidden layers may be positioned before the output layer, and may include two or more sets of hidden nodes, each set of which is connected to a different one of the class groups of the output layer); and 
It would have been obvious to a person having ordinary skill in the art before the effective filing date to combine the multi-classifier architecture of Moon et al. with the method of classifying data using a neural network of Kwon.
Doing so would allow for boost pooling (para [0006] The boosting method obtains a result by combining various classifier models trained through these processes, and the classification accuracy increases as the number of models increases, which is known to be more effective than the bagging method.).
Girshick further teaches
computing a loss value based on a weighted combination of a first contribution determined from the first outputs of the first output layer of the neural network and a second contribution determined from the second outputs of the second output layer of the neural network (pg. 1442; A Fast R-CNN network has two sibling output layers. The first outputs a discrete probability distribution (per RoI), $p = (p_0, \ldots, p_K)$, over K + 1 categories. As usual, p is computed by a softmax over the K+1 outputs of a fully connected layer. The second sibling layer outputs bounding-box regression offsets, $t^k = (t^k_x, t^k_y, t^k_w, t^k_h)$, for each of the K object classes, indexed by k. And pg. 1442; We use a multi-task loss L on each labeled RoI to jointly train for classification and bounding-box regression: $L(p, u, t^u, v) = L_{cls}(p, u) + \lambda [u \geq 1] L_{loc}(t^u, v)$ (1), in which $L_{cls}(p, u) = -\log p_u$ is log loss for true class u); and 
updating one or more coefficients of the neural network based on the loss value (pg. 1441; “Training all network weights with back-propagation is an important capability of Fast R-CNN.” And pg. 1441; The architecture is trained end-to-end with a multi-task loss. And pg. 1442; When the regression targets are unbounded, training with L2 loss can require careful tuning of learning rates in order to prevent exploding gradient.).
It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the method of training a classifier of Moon with the method of training a classifier of Girshick.
Doing so would allow for efficient object classification (pg. 1440; Recently, deep ConvNets [14, 16] have significantly improved image classification [14] and object detection [9, 19] accuracy. Compared to image classification, object detection is a more challenging task that requires more complex methods to solve.).
Regarding Claim 13,
Moon, Kwon, and Girshick teach the method as defined in claim 12. 
Moon et al. does not explicitly disclose wherein processing the first outputs to form the second outputs includes: 

identifying for each group a first one of the first outputs of the group having the greatest value to form third outputs; and 
converting the third outputs into probabilities to form the second outputs.
However, Kwon et al. teaches
forming the first outputs into groups according to the second possible demographic classifications with which they are hierarchically associated (para [0068] Also, the output layer 440 is configured to include output nodes 441a and 441b, output nodes 442a and 442b, and output nodes 443a and 443b, respectively, with regard to the classes C1, C2, and C3. As a result, the output nodes of the output layer 440 are divided into two class groups CG1 and CG2 as illustrated in FIG. 4.); 
identifying for each group a first one of the first outputs of the group having the greatest value to form third outputs (Fig. 5A element 550; para [0025] Each boost pooling node may be configured to output one output value for a corresponding class by applying any one or any combination of any two or more of a maximum, a mean, an average, and a probabilistic selection to all output values of all output nodes of the output layer for the corresponding class.); and 
converting the third outputs into probabilities to form the second outputs (Fig. 5A; para [0074] The softmax layer 560 performs normalizing on the output value of the previous layer so that each of the output values is not greater than one and so that a sum of the all of the output values is one, thus making the output value of each of the classes represent a probability that the input data belongs to the class.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date to combine the multi-classifier architecture of Moon et al. with the method of classifying data using a neural network of Kwon.
Doing so would allow for boost pooling (para [0006] The boosting method obtains a result by combining various classifier models trained through these processes, and the classification accuracy increases as the number of models increases, which is known to be more effective than the bagging method.).
Regarding Claim 14,
Moon, Kwon, and Girshick teach the method as defined in claim 12. Moon et al. further teaches wherein the second possible demographic classifications represent demographic categories of the individual (Col. 2 lines 19-25; The present invention handles the ethnicity classification problem by introducing a multi-category decomposition architecture, which is an exemplary embodiment of the hybrid multi-classifier architecture, where the learning machines are structured and trained to represent the face ensembles grouped by appearance-based demographics categories).
Regarding Claim 15,
Moon, Kwon, and Girshick teach the method as defined in claim 14. Moon et al. further teaches wherein the first possible demographic classifications represent demographic segments of the demographic categories (Col. 8 lines 1-13; FIG. 6 shows the groundtruth demographics labeling 654 scheme for the facial image annotation 650 in an exemplary embodiment of the present invention. First, the auxiliary demographics categories 665 should be determined. In the figure, gender and age are determined as auxiliary categories. In an exemplary embodiment, the gender category has (male, female) labels and the age category has (child, young adult, adult, senior) labels. The age category can also be more finely divided into a larger number of classes, such as [-9, 10-19, 20-29, 30-39, 40-49, 50-59, 60-69, 70-]. The ethnicity classes can be determined according to a given application; exemplary divisions can be ["ethnicity class 1", "ethnicity class 2", "ethnicity class 3", . . . ].).
Regarding Claim 18,
Moon teaches a tangible computer-readable storage medium comprising instructions that, when executed, cause a machine to at least: 
obtain data representative of demographic characteristics of an individual (Col. 6 lines 29-39; A preferred embodiment of the present invention is illustrated in FIG. 1. It shows the overall view of the system; the facial image annotation 650 step manually assigns labels (gender, age, and ethnicity) to each of the facial images in the face database. The granularity of the labeling should be determined beforehand. In an exemplary embodiment, age can be labeled as child, adult, or senior. Age can also be labeled as 18 and below, from 18 to 30, from 30 to 44, from 45 to 60, or 60 and above. The next multi-category decomposition architecture setup 842 step prepares the learning machines corresponding to all of the (gender, age) classes.); 
… (Fig. 4; Col. 7 lines 36-39; For example, the specialized classifier 1 817 is tuned to (male, "ethnicity class 1", child) class. The next classifier is tuned to (male, "ethnicity class 1", young adult), and so on.),… and 
process the first outputs with the neural network to form second outputs at the second output layer of the neural network (Fig. 15; Col. 11 lines 12-23; The ethnicity output vectors are weighted by the auxiliary class likelihood output 848, and are added together to compute the final ethnicity vector 861: ("ethnicity class 1" sum, "ethnicity class 2" sum, "ethnicity class 3" sum). The final decision chooses the ethnicity label that has the highest score. For example, if "ethnicity class 2" score>"ethnicity class 1" score and "ethnicity class 2" score>"ethnicity class 3" score, then "ethnicity class 2" is chosen as the ethnicity of the input facial image.), the second outputs representing possible combined demographic classifications of the individual corresponding to combinations of the plurality of first possible demographic classifications and a plurality of second possible demographic classifications, the second possible demographic classifications at a second hierarchical classification level different from the first hierarchical classification level (Col. 7 lines 62-67; The final decision is made based on the output from all of the hybrid classifiers, by the classifier fusion 819 step.).
Moon does not explicitly disclose

compute a loss value based on a weighted combination of a first contribution determined from the first outputs of the first output layer of the neural network and a second contribution determined from the second outputs of the second output layer of the neural network; and 
update one or more coefficients of the neural network based on the loss value.
However, Kwon teaches
the neural network including an input layer, the first output layer (para [0055] Referring to FIGS. 3A and 3B, the neural network 300 include an input layer 310, hidden layers 320 and 330, and an output layer 340, the same as in the general neural network 200 of FIG. 2.), a second output layer subsequent to the first output layer (para [0056] Also, the neural network 300 further includes a boost pooling layer 350 subsequent to the output layer 340.), and a plurality of neural network modules (para [0055] Although two hidden layers 320 and 330 are illustrated in FIGS. 3A and 3B, the number of hidden layers is not limited to two. Examiner interprets neural network modules as a group of hidden layers) interposed between the input layer and the first output layer (para [0014] The neural network further may include one or more hidden layers between the input layer and the output layer.), respective ones of the neural network modules including corresponding groups of interconnected neural layers (para [0047] The structure of the neural network 200 is represented by information on the connections between nodes illustrated as arrows, and a weight value assigned to each connection, which is not illustrated.), each group of interconnected neural layers having at least one connection between a corresponding input layer (para [0055] the neural network 300 include an input layer 310) of the group of interconnected neural layers and corresponding output layer of the group of interconnected neural layers (para [0015] one hidden layer of the one or more hidden layers may be positioned before the output layer, and may include two or more sets of hidden nodes, each set of which is connected to a different one of the class groups of the output layer); and 
It would have been obvious to a person having ordinary skill in the art before the effective filing date to combine the multi-classifier architecture of Moon et al. with the method of classifying data using a neural network of Kwon.
Doing so would allow for boost pooling (para [0006] The boosting method obtains a result by combining various classifier models trained through these processes, and the classification accuracy increases as the number of models increases, which is known to be more effective than the bagging method.).
Girshick further teaches
compute a loss value based on a weighted combination of a first contribution determined from the first outputs of the first output layer of the neural network and a second contribution determined from the second outputs of the second output layer of the neural network (pg. 1442; A Fast R-CNN network has two sibling output layers. The first outputs a discrete probability distribution (per RoI), $p = (p_0, \ldots, p_K)$, over $K + 1$ categories. As usual, $p$ is computed by a softmax over the $K + 1$ outputs of a fully connected layer. The second sibling layer outputs bounding-box regression offsets, $t^k = (t^k_x, t^k_y, t^k_w, t^k_h)$, for each of the $K$ object classes, indexed by $k$. And pg. 1442; We use a multi-task loss $L$ on each labeled RoI to jointly train for classification and bounding-box regression: $L(p, u, t^u, v) = L_{cls}(p, u) + \lambda [u \geq 1] L_{loc}(t^u, v)$ (1), in which $L_{cls}(p, u) = -\log p_u$ is log loss for true class $u$); and 
update one or more coefficients of the neural network based on the loss value (pg. 1441; “Training all network weights with back-propagation is an important capability of Fast R-CNN.” And pg. 1441; The architecture is trained end-to-end with a multi-task loss. And pg. 1442; When the regression targets are unbounded, training with L2 loss can require careful tuning of learning rates in order to prevent exploding gradient.).
It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the method of training a classifier of Moon with the method of training a classifier of Girshick.
Doing so would allow for efficient object classification (pg. 1440; Recently, deep ConvNets [14, 16] have significantly improved image classification [14] and object detection [9, 19] accuracy. Compared to image classification, object detection is a more challenging task that requires more complex methods to solve.).
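For illustration only, the multi-task loss quoted from Girshick at pg. 1442 can be sketched as follows. This is a minimal sketch, not the reference's implementation: the function names, the default value of lam, and the choice of a smooth L1 term for the localization loss are assumptions made for this example.

```python
import math

def softmax(logits):
    # Numerically stable softmax over the K+1 class scores.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def log_loss(probs, true_class):
    # L_cls(p, u) = -log p_u
    return -math.log(probs[true_class])

def smooth_l1(pred, target):
    # Assumed localization loss, summed over the four box offsets.
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d if d < 1.0 else d - 0.5
    return total

def multi_task_loss(class_logits, true_class, box_pred, box_target, lam=1.0):
    # L = L_cls + lam * [u >= 1] * L_loc; class 0 is treated as background,
    # so the localization term is dropped when true_class == 0.
    p = softmax(class_logits)
    l_cls = log_loss(p, true_class)
    l_loc = smooth_l1(box_pred, box_target) if true_class >= 1 else 0.0
    return l_cls + lam * l_loc
```

Here lam plays the role of the weight λ that balances the two contributions in equation (1).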
Regarding Claim 19,

	Moon et al. does not explicitly disclose wherein the instructions, when executed, cause the machine to process the first outputs to form the second outputs by: 
forming the first outputs into groups according to the second possible demographic classifications with which they are hierarchically associated; 
identifying for each group a first one of the first outputs of the group having the greatest value to form third outputs; and 
converting the third outputs into probabilities to form the second outputs.
However, Kwon teaches 
forming the first outputs into groups according to the second possible demographic classifications with which they are hierarchically associated (para [0068] Also, the output layer 440 is configured to include output nodes 441a and 441b, output nodes 442a and 442b, and output nodes 443a and 443b, respectively, with regard to the classes C1, C2, and C3. As a result, the output nodes of the output layer 440 are divided into two class groups CG1 and CG2 as illustrated in FIG. 4.); 
identifying for each group a first one of the first outputs of the group having the greatest value to form third outputs (Fig. 5A element 550; para [0025] Each boost pooling node may be configured to output one output value for a corresponding class by applying any one or any combination of any two or more of a maximum, a mean, an average, and a probabilistic selection to all output values of all output nodes of the output layer for the corresponding class.); and 
converting the third outputs into probabilities to form the second outputs (Fig. 5A; para [0074] The softmax layer 560 performs normalizing on the output value of the previous layer so that each of the output values is not greater than one and so that a sum of the all of the output values is one, thus making the output value of each of the classes represent a probability that the input data belongs to the class.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date to combine the multi-classifier architecture of Moon et al. with the method of classifying data using a neural network of Kwon.
Doing so would allow for boost pooling (para [0006] The boosting method obtains a result by combining various classifier models trained through these processes, and the classification accuracy increases as the number of models increases, which is known to be more effective than the bagging method.).
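For illustration only, the three steps mapped above (grouping the first outputs, taking the greatest value per group, and normalizing into probabilities) can be sketched as follows. This is a minimal sketch under stated assumptions: the function names and the dictionary-based grouping are hypothetical, and only the maximum option of the pooling described in Kwon para [0025] is shown.

```python
import math

def pool_max(first_outputs, groups):
    # groups maps each class label to the indices of the first outputs
    # associated with that class; keep the greatest value in each group.
    return {label: max(first_outputs[i] for i in idxs)
            for label, idxs in groups.items()}

def softmax(values):
    # Normalize so each value is at most one and the values sum to one.
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def second_outputs(first_outputs, groups):
    # Third outputs: one pooled value per group.
    pooled = pool_max(first_outputs, groups)
    labels = list(pooled)
    # Second outputs: pooled values converted into probabilities.
    probs = softmax([pooled[label] for label in labels])
    return dict(zip(labels, probs))

# Hypothetical example: four first outputs split into two groups.
example = second_outputs([0.2, 1.5, 0.7, 0.1], {"A": [0, 1], "B": [2, 3]})
```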
Claims 5, 6, 16, 17, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Moon et al. (US-9317785-B1) in view of Kwon (US-20170068887-A1), Girshick et al. ("Fast r-cnn."), and Ros Sanchez et al. (US-20170262735-A1).
Regarding Claim 5,
Moon, Kwon, and Girshick teach the apparatus as defined in claim 1. 
	Moon, Kwon, and Girshick do not explicitly disclose
wherein the first contribution is based on a first cross-entropy of the first outputs and the second contribution is based on a second cross-entropy of the second outputs. 
However, Ros Sanchez et al. teaches
wherein the first contribution is based on a first cross-entropy of the first outputs and the second contribution is based on a second cross-entropy of the second outputs (para [0078] To achieve reasonable per-class accuracy, weighted cross-entropy (WCE) was employed in the definition of the loss function.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date to combine the loss function of Kwon with the loss function of Ros Sanchez et al.
Doing so would allow for taking into account class imbalances (para [0079] In this way, WCE helped the networks to account for class frequency imbalances, a common phenomenon exposed in Table 1).
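For illustration only, a loss built from two cross-entropy contributions, each with per-class weights, can be sketched as follows. This is a minimal sketch, not the formulation of any cited reference: the function names, the mixing weight alpha, and the class weight values are assumptions made for this example.

```python
import math

def weighted_cross_entropy(probs, true_index, class_weights):
    # Cross-entropy against a one-hot label, scaled by the weight of the
    # true class so that rare classes can count for more.
    return -class_weights[true_index] * math.log(probs[true_index])

def combined_loss(first_probs, first_label, first_weights,
                  second_probs, second_label, second_weights, alpha=0.5):
    # First contribution: cross-entropy of the first outputs.
    first = weighted_cross_entropy(first_probs, first_label, first_weights)
    # Second contribution: cross-entropy of the second outputs.
    second = weighted_cross_entropy(second_probs, second_label, second_weights)
    # Weighted combination of the two contributions.
    return alpha * first + (1.0 - alpha) * second
```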
Regarding Claim 6,
Moon et al., Kwon, Girshick, and Ros Sanchez et al. teach the apparatus as defined in claim 5. Ros Sanchez et al. further teaches wherein the processor is to update the coefficients using a stochastic descent algorithm (para [0075] Optimisation was performed via standard backpropagation using Stochastic Conjugate Gradient Descent (S-CGD), endowed with a bounded line-search strategy and backtracking with Armijo's rule [15]. To avoid overfitting, the number of line-search iterations was bounded to 3. This proved to converge faster to good solutions than stochastic gradient descent without manual tweaking of learning rates.).
Regarding Claim 16,
Moon, Kwon, and Girshick teach the method as defined in claim 12. 
	Moon, Kwon, and Girshick do not explicitly disclose
wherein the first contribution is based on a first cross-entropy of the first outputs and the second contribution is based on a second cross-entropy of the second outputs.
However, Ros Sanchez et al. teaches
wherein the first contribution is based on a first cross-entropy of the first outputs and the second contribution is based on a second cross-entropy of the second outputs (para [0078] To achieve reasonable per-class accuracy, weighted cross-entropy (WCE) was employed in the definition of the loss function.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date to combine the loss function of Kwon with the loss function of Ros Sanchez et al.
Doing so would allow for taking into account class imbalances (para [0079] In this way, WCE helped the networks to account for class frequency imbalances, a common phenomenon exposed in Table 1).
Regarding Claim 17,
Moon et al., Kwon, Girshick, and Ros Sanchez et al. teach the method as defined in claim 16. Ros Sanchez et al. further teaches wherein the updating of the coefficients includes using a stochastic descent algorithm (para [0075] Optimisation was performed via standard backpropagation using Stochastic Conjugate Gradient Descent (S-CGD), endowed with a bounded line-search strategy and backtracking with Armijo's rule [15]. To avoid overfitting, the number of line-search iterations was bounded to 3. This proved to converge faster to good solutions than stochastic gradient descent without manual tweaking of learning rates.).
Regarding Claim 20,

	Moon, Kwon, and Girshick do not explicitly disclose
wherein the first contribution is based on a first cross-entropy of the first outputs and the second contribution is based on a second cross-entropy of the second outputs. 
However, Ros Sanchez et al. teaches
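wherein the first contribution is based on a first cross-entropy of the first outputs and the second contribution is based on a second cross-entropy of the second outputs (para [0078] To achieve reasonable per-class accuracy, weighted cross-entropy (WCE) was employed in the definition of the loss function.).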
It would have been obvious to a person having ordinary skill in the art before the effective filing date to combine the loss function of Kwon with the loss function of Ros Sanchez et al.
Doing so would allow for taking into account class imbalances (para [0079] In this way, WCE helped the networks to account for class frequency imbalances, a common phenomenon exposed in Table 1).
Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Moon et al. (US-9317785-B1) in view of Kwon (US-20170068887-A1), Girshick et al. ("Fast r-cnn."), and Bitran et al. (US-20180144101-A1).
Regarding Claim 9,
Moon, Kwon, and Girshick teach the apparatus as defined in claim 8.
Moon, Kwon, and Girshick do not explicitly disclose wherein the querier is to record the one of the first possible classifications and the one of the second possible classifications in the database in conjunction with the individual.
However, Bitran et al. teaches
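wherein the querier is to record the one of the first possible classifications and the one of the second possible classifications in the database in conjunction with the individual (para [0032] In this way, the signals at 202 may provide the data used by the user classification system 204 to determine user demographic and contextual information, which is then stored in the user knowledge repository 206 for retrieval.).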
	It would have been obvious to a person having ordinary skill in the art before the effective filing date to combine the querier of Moon et al. with the method of storing demographic classifications of Bitran et al.
	Doing so would allow for an analysis to be performed on the stored data (para [0027] A physical-health service may additionally be configured to perform one or more processing or analysis functions on the stored physical-health data, for the purpose of providing each user of a health platform with insights relating to their physical-health relative to the physical-health of demographically similar users.).
Claim 21 is rejected under 35 U.S.C. 103 as being unpatentable over Moon et al. (US-9317785-B1) in view of Kwon (US-20170068887-A1), Girshick et al. ("Fast r-cnn."), and Osypka et al. (US 20170053077 A1).
Regarding Claim 21,
Moon, Kwon, and Girshick teach the apparatus as defined in claim 8. 
	Moon, Kwon, and Girshick do not explicitly disclose 
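wherein the demographic information includes a first value of a demographic characteristic associated with the individual, and the querier is to: 
convert the first value to a second value representative of a range of values of the demographic characteristic, the range of values including the first value; and 
form a first one of the inputs based on the second value.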
However, Osypka (US 20170053077 A1) teaches 
wherein the demographic information includes a first value of a demographic characteristic associated with the individual, and the querier is to: 
convert the first value to a second value representative of a range of values of the demographic characteristic, the range of values including the first value (para [0019] In one aspect, a group of selected vital parameters are monitored or determined along with patient demographic data and the input value of each parameter is compared to the normal range for that parameter based on input or stored patient demographic data (e.g. age, gender, body mass or weight, height).); and 
form a first one of the inputs based on the second value (para [0022] In one embodiment, the current clinical parameter values are input to algorithms employing fuzzy logic in order to determine the likelihood for each shock type. In an alternative embodiment, a neural network method can be implemented in place of the fuzzy logic method.).
It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the method of retrieving demographic data for input into a machine learning model of Moon with the method of retrieving demographic data for input into a machine learning model of Osypka.
Doing so would allow for normalizing input data (para [0192] In step 212, normal ranges of selected clinical parameters are retrieved from a local or remote data base, and the ranges are normalized based on the patient's demographic data.).
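For illustration only, the limitation as recited (converting a first value into a second value representing a range that includes it, then forming a network input from the second value) can be sketched as follows. This is a minimal sketch of the claim language, not of Osypka's method: the function names, the age bucket edges, and the one-hot encoding are assumptions made for this example.

```python
def to_bucket(value, edges):
    # Return the index i of the range [edges[i], edges[i + 1]) that
    # contains value; this index is the "second value".
    for i in range(len(edges) - 1):
        if edges[i] <= value < edges[i + 1]:
            return i
    raise ValueError("value falls outside every range")

def one_hot(index, size):
    # Form a network input vector from the bucket index.
    return [1.0 if i == index else 0.0 for i in range(size)]

# Hypothetical age ranges: 0-17, 18-24, 25-34, 35-49, 50-64, 65-119.
AGE_EDGES = [0, 18, 25, 35, 50, 65, 120]

bucket = to_bucket(34, AGE_EDGES)                 # first value: age 34
input_vector = one_hot(bucket, len(AGE_EDGES) - 1)
```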
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to HENRY K NGUYEN whose telephone number is (571)272-0217.  The examiner can normally be reached on Mon - Fri 7:00am-4:30pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li B. Zhen, can be reached at (571) 272-3768.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/HENRY NGUYEN/Examiner, Art Unit 2121


/Li B. Zhen/Supervisory Patent Examiner, Art Unit 2121