Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Priority
Acknowledgment is made of applicant’s claim for foreign priority under 35 U.S.C. 119 (a)-(d). The certified copy has been filed in parent Application No. JP2019-129414, filed on 07/11/2019.

Specification
The disclosure is objected to because of the following informalities:
Paragraph 003, line 4: “and a” should read “and”.
Paragraph 003, line 5: “avoiding the” should read “avoiding”.
Paragraph 005, line 4: “when first” should read “when the first”.
Paragraph 0024, line 1: “of model” should read “of a model”.
Paragraph 0030, line 3: “each of nodes” should read “each node”.
Paragraph 0032, line 1: “in training” should read “in the training”.
Paragraph 0050, line 1: “10 according” should read “10, according”.
Paragraph 0050, line 1: “embodiment” should read “embodiment,”.
Paragraph 0062, line 2: “embodiment” should read “embodiment,”.
Paragraph 0062, line 7: “sets are” should read “sets is”.
Paragraph 0062, line 7: “number of” should read “amount of”.
Appropriate correction is required.

Claim Objections
Claim 3, line 8, is objected to because of the following informalities:  “network” should read “network;”.  Appropriate correction is required.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Step 1: 
Claims 1-2 and 4-5 are directed to a method. Claim 6 is directed to a non-transitory computer-readable recording medium, and claim 7 is directed to an information processing apparatus. Thus, each of the claims falls within one of the four statutory categories (i.e., process, machine, manufacture, or composition of matter).
Regarding claim 1:
Step 2A, prong 1: 
	Under broadest reasonable interpretation, these limitations are process steps that cover mental processes including an observation, evaluation, judgment or opinion that could be performed in the human mind or with the aid of pencil and paper. This claim falls within the “Mental Process” grouping of abstract ideas.
Claim 1 recites in part:
‘calculating, by a computer, a first loss function based on a first distribution and a previously set second distribution, the first distribution being a distribution of a feature amount output…’, as drafted, is a process that, under broadest reasonable interpretation, covers performing a mathematical process on a computer or a mental process using at most pen and paper. A user could calculate a loss function based on a given input, in this case a distribution. The claim is silent on how the distribution is determined or how the loss function is calculated.
‘calculating a second loss function based on second data and correct data corresponding to the first data, the second data being output from the output layer when the first data is input to the input layer of the model’, as drafted, is a process that, under broadest reasonable interpretation, covers performing a mathematical process on a computer or a mental process using at most pen and paper. A user could calculate a loss function based on a given output; for example, a layer could be represented as values within parentheses.
‘training the model based on both the first loss function and the second loss function’, as drafted, under broadest reasonable interpretation, covers performing a mental and/or mathematical process. A user could calculate a function given the output parameters of a previous function.
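For illustration only, the three limitations recited above can be sketched as plain arithmetic. This is a minimal sketch under stated assumptions, not the applicant's implementation: the claim does not say how either loss is computed, so the moment-matching first loss, the mean-squared-error second loss, the summation in `total_loss`, and all function and parameter names are hypothetical.

```python
def mean_and_variance(values):
    """Summarise a batch of scalar feature values by its mean and variance."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return mean, var


def first_loss(features, ref_mean, ref_var):
    """Hypothetical first loss: squared gap between the statistics of the
    intermediate-layer features and a previously set reference distribution."""
    mean, var = mean_and_variance(features)
    return (mean - ref_mean) ** 2 + (var - ref_var) ** 2


def second_loss(outputs, targets):
    """Hypothetical second loss: mean squared error between the model
    outputs (second data) and the correct data."""
    return sum((o - t) ** 2 for o, t in zip(outputs, targets)) / len(outputs)


def total_loss(features, outputs, targets, ref_mean=0.0, ref_var=1.0):
    """Objective that depends on both losses (assumed here to be their sum)."""
    return first_loss(features, ref_mean, ref_var) + second_loss(outputs, targets)
```

With features `[-1.0, 1.0]` (mean 0, variance 1) the first loss is zero against the assumed reference, so `total_loss([-1.0, 1.0], [1.0, 2.0], [1.0, 4.0])` reduces to the mean squared error of the outputs.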

Step 2A, prong 2: 
	The judicial exception is not integrated into a practical application. In particular, claim 1 recites:
A “computer” to perform abstract concepts. The “computer” in the limitations is recited at a high level of generality (i.e., a computer performing generic computer functions) such that it amounts to no more than mere instructions to apply the exception using a generic computer component. Accordingly, at Step 2A, prong 2, the additional elements, individually or in combination, do not integrate the judicial exception into a practical application.

Step 2B: 
In accordance with Step 2B, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above, the element of using a ‘computer’ to perform the steps amounts to insignificant extra-solution activity because it is a mere nominal or tangential addition to the claim. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. See MPEP 2106.05(f).
	
Regarding claim 2:
Step 2A, Prong 1:
Under broadest reasonable interpretation, these limitations are process steps that cover mental processes including an observation, evaluation, judgment or opinion that could be performed in the human mind or with the aid of pencil and paper. This claim falls within the “Mental Process” grouping of abstract ideas.
Claim 2 recites in part:
	“wherein the first loss function is a distance between the first distribution and the second distribution.”
	This provides a further description of the abstract ideas, as discussed with regards to claim 1.
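As one illustration of what “a distance between the first distribution and the second distribution” could look like, the sketch below assumes that both distributions are one-dimensional Gaussians and uses the 2-Wasserstein distance between them, which has a closed form in that case. Neither the claim nor the cited passages name a particular distance; the choice of measure and the name `gaussian_w2_distance` are assumptions made for this example.

```python
import math


def gaussian_w2_distance(mean_a, std_a, mean_b, std_b):
    """2-Wasserstein distance between two 1-D Gaussians N(mean, std^2).

    For one-dimensional Gaussians the distance reduces to the Euclidean
    distance between the (mean, std) pairs. The Gaussian assumption is
    ours, not the claim's.
    """
    return math.sqrt((mean_a - mean_b) ** 2 + (std_a - std_b) ** 2)
```

The value is zero when the two distributions coincide and grows as either the means or the standard deviations move apart.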
Step 2A, Prong 2:
	The claim does not recite any additional element that integrates the exception into a practical application or that amounts to significantly more than the judicial exception.
Step 2B:
	In accordance with Step 2B, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above, the distance between the first and second distributions is recited at a high level of generality and amounts to no more than adding insignificant extra-solution activity to the judicial exception (see MPEP 2106.05(g)). Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea (see MPEP 2106.05(f)).

Regarding claim 4: 
Step 2A, prong 1:
Under broadest reasonable interpretation, these limitations are process steps that cover mental processes including an observation, evaluation, judgment or opinion that could be performed in the human mind or with the aid of pencil and paper. This claim falls within the “Mental Process” grouping of abstract ideas.
Claim 4 recites in part:
	“further comprising: calculating the first loss function based on the first distribution and a second distribution set for correct data corresponding to the first data among distributions previously set for respective multiple correct data”
	This provides a further description of the abstract ideas, as discussed with regards to claim 1.
Step 2A, prong 2:
The claim does not recite any additional element that integrates the exception into a practical application or that amounts to significantly more than the judicial exception.
Step 2B:
	In accordance with Step 2B, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above, distributions set with previous correct data corresponding to the first data are recited at a high level of generality and amount to no more than adding insignificant extra-solution activity to the judicial exception (see MPEP 2106.05(g)). Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea (see MPEP 2106.05(f)).

Regarding claim 5:
Step 2A, prong 1:
Under broadest reasonable interpretation, these limitations are process steps that cover mental processes including an observation, evaluation, judgment or opinion that could be performed in the human mind or with the aid of pencil and paper. This claim falls within the “Mental Process” grouping of abstract ideas.
Claim 5 recites in part:
“further comprising: training the model based on only the first loss function when the correct data corresponding to the first data does not exist.”
This provides a further description of the abstract ideas, as discussed with regards to claim 1.
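The conditional recited in claim 5 can be pictured as a simple branch on whether correct data is available. The sketch below is a hypothetical illustration only: the function name, the use of `None` to signal missing correct data, and the summation of the two losses when both exist are assumptions, not features taken from the claim or the specification.

```python
def training_loss(first_loss_value, second_loss_value=None):
    """Select the loss used to train on one example.

    second_loss_value is None when no correct data exists for the first
    data, in which case only the first loss is used (assumed convention).
    """
    if second_loss_value is None:
        return first_loss_value
    return first_loss_value + second_loss_value
```

For example, `training_loss(0.5)` returns the first loss unchanged, while `training_loss(0.5, 0.25)` combines both losses.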
Step 2A, prong 2:
The claim does not recite any additional element that integrates the exception into a practical application or that amounts to significantly more than the judicial exception.
Step 2B:
	In accordance with Step 2B, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above, distributions set with previous correct data are recited at a high level of generality and amount to no more than adding insignificant extra-solution activity to the judicial exception (see MPEP 2106.05(g)). Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea (see MPEP 2106.05(f)).

Regarding Claim 6:
Step 2A, prong 1:
Under broadest reasonable interpretation, these limitations are process steps that cover mental processes including an observation, evaluation, judgment or opinion that could be performed in the human mind or with the aid of pencil and paper. This claim falls within the “Mental Process” grouping of abstract ideas.
	Claim 6 recites in part:
‘calculating a first loss function based on a first distribution and a previously set second distribution, the first distribution being a distribution of a feature amount output from an intermediate layer when first data is input to an input layer of a model that has the input layer, the intermediate layer, and an output layer’, as drafted, is a process that, under broadest reasonable interpretation, covers performing mathematical calculations. A user could calculate the loss function by hand from a closed set.
‘calculating a second loss function based on second data and correct data corresponding to the first data, the second data being output from the output layer when the first data is input to the input layer of the model’, as drafted is a process that, under broadest reasonable interpretation, covers performing mathematical calculations. A user could calculate the loss function given the input data.
‘training the model based on both the first loss function and the second loss function’, as drafted is a process that, under broadest reasonable interpretation, covers performing a mental process. 

Step 2A, prong 2:
The claim does not recite any additional element that integrates the exception into a practical application or that amounts to significantly more than the judicial exception.
	The claim recites the additional element of a “non-transitory computer-readable recording medium…that causes a computer” to perform the abstract concepts. This element is recited at a high level of generality (i.e., a computer performing generic computer functions) such that it amounts to no more than mere instructions to apply the exception using a generic computer component.
Step 2B:
	In accordance with Step 2B, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using a ‘computer’ to perform the steps amounts to no more than adding insignificant extra-solution activity to the judicial exception (see MPEP 2106.05(g)). Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea (see MPEP 2106.05(f)).

Regarding Claim 7:
Step 2A, prong 1:
Under broadest reasonable interpretation, these limitations are process steps that cover mental processes including an observation, evaluation, judgment or opinion that could be performed in the human mind or with the aid of pencil and paper. This claim falls within the “Mental Process” grouping of abstract ideas.
	Claim 7 recites in part:
‘calculate a first loss function based on a first distribution and a previously set second distribution, the first distribution being a distribution of a feature amount output from an intermediate layer when first data is input to an input layer of a model that has the input layer, the intermediate layer, and an output layer’, as drafted, is a process that, under broadest reasonable interpretation, covers performing mathematical calculations.
‘calculate a second loss function…’, as drafted, is a process that, under broadest reasonable interpretation, covers performing mathematical calculations.
‘calculate a second loss function based on second data and correct data corresponding to the first data, the second data being output from the output layer when the first data is input to the input layer of the model’, as drafted, is a process that, under broadest reasonable interpretation, covers performing a mental process on a set of data. A user could manipulate the data based on a closed set.
Step 2A, prong 2:
The claim does not recite any additional element that integrates the exception into a practical application or that amounts to significantly more than the judicial exception.
Step 2B:
In accordance with Step 2B, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using a ‘computer’ to perform the steps amounts to no more than mere instructions to apply the exception using a generic computer component. The element is recited at a high level of generality and amounts to no more than adding insignificant extra-solution activity to the judicial exception (see MPEP 2106.05(g)). Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea (see MPEP 2106.05(f)).

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1 and 3-6 are rejected under 35 U.S.C. 103 as being unpatentable over Chunyan Xu (“Multi-loss Regularized Deep Neural Network”, hereinafter referred to as Chunyan Xu) in view of Bosch (United States Patent Application Publication US 20190199743, hereinafter referred to as Bosch).
Regarding claim 1, Chunyan Xu discloses:
a machine learning method, comprising: calculating, by a computer, a first loss function based on a first distribution and a previously set second distribution, (Chunyan Xu, page 2274, left column, para [2], “During the ML-DNN training, the training images are first fed into the shared NIN and several parallel fc layers, each of which corresponds to some different loss layers…We pretrain the model with the single-loss function, and then warm start the whole ML-DNN with the convolutional parameters transferred from the pretrained model”. “fc” denotes “fully connected”. The results produced from the images initially fed into the shared NIN correspond to the first distribution. The output produced from the initial distribution is then fed into the subsequent fc layers corresponding to the second loss function. It is inherent that the NIN is implemented on a computer.)
the first distribution being a distribution of a feature amount output from an intermediate layer when first data is input to an input layer of a model that has the input layer, the intermediate layer, and an output layer; (Chunyan Xu, page 2273, left column, para [3] to page 2274, right column, para [2], “During the ML-DNN training, the training images are first fed into the shared NIN and several parallel fc layers, each of which corresponds to some different loss layers. We feed the convolutional feature maps of the shared NIN into multiple branches of the loss functions.” “fc” denotes “fully connected”. The “fc” layers correspond to the input/output/intermediate layers. Fig. 1, page 2274, shows a shared NIN taking the input from one layer and using it as output to another. The data processing between each network corresponds to the intermediate layers, also represented as “fc” layers. The resulting distribution is used to calculate the subsequent parameters of the loss functions across multiple layers, including input, output and intermediate layers. The results from the first layer processing the images correspond to the first distribution.)
the second data being output from the output layer when the first data is input to the input layer of the model; (Chunyan Xu, Fig. 1, page 2274, shows images being fed to the input layer of the shared NIN, with the resulting output passed into the subsequent layers. Each layer has an associated loss function. The images correspond to the input to the input layer. The layers correspond to the second data being output from the output layer.)
Chunyan Xu does not disclose:
calculating a second loss function based on second data and correct data corresponding to the first data, the second data being output from the output layer when the first data is input to the input layer of the model;
Bosch discloses:
calculating a second loss function based on second data and correct data corresponding to the first data, (Bosch, para [42], “During the training of variational autoencoder 10, this autoencoder is trained, for example using a back-propagation method, in such a way that on the one hand the reconstruction error between input quantity vector x and output quantity vector x′ becomes as small as possible. On the other hand, the training is carried out in such a way that the distribution of the latent quantities z in the latent space corresponds as closely as possible to a specified reference distribution.” The back-propagation method corresponds to the second data and the correct data corresponding to the first data. The loss minimized by backpropagation is the sum of the input/output reconstruction error and the distribution deviation.)
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to have modified the step of training using more than one loss function, as taught by Chunyan Xu, to calculate a second loss function based on correct data from the first loss function as well as the second distribution. The motivation for doing so would have been to enable recognition of dynamic changes in the network behavior without erroneously classifying these anomalies. (Bosch, para [0011])
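The two training goals quoted from Bosch, para [42], a small reconstruction error and a latent distribution close to a specified reference distribution, are commonly combined in a variational autoencoder as a sum of two terms. The sketch below is a hedged illustration only: Bosch does not name the deviation measure or the reference distribution in the quoted passage, so the squared-error reconstruction term, the Kullback-Leibler divergence to a standard normal reference, and all function names are assumptions.

```python
import math


def reconstruction_error(x, x_reconstructed):
    """Squared error between input vector x and output vector x' (assumed form)."""
    return sum((a - b) ** 2 for a, b in zip(x, x_reconstructed))


def kl_to_standard_normal(means, log_variances):
    """KL divergence from a diagonal Gaussian over the latent quantities z
    to a standard normal reference distribution (assumed reference)."""
    return 0.5 * sum(
        m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(means, log_variances)
    )


def vae_objective(x, x_reconstructed, means, log_variances):
    """Sum of the reconstruction term and the distribution-deviation term."""
    return reconstruction_error(x, x_reconstructed) + kl_to_standard_normal(
        means, log_variances
    )
```

When the latent distribution already matches the assumed reference (zero mean, unit variance), the deviation term vanishes and the objective equals the reconstruction error alone.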
Regarding claim 3, Chunyan Xu in view of Bosch discloses the method of claim 1. Chunyan Xu additionally discloses:
wherein the model is a neural network that has the input layer, multiple intermediate layers, and the output layer, (Chunyan Xu, page 2277, left column, para [2], “…fc layers are summed together for parameter updating… by the ML-DNN”. “fc” denotes “fully connected”. An “fc” layer represents the result obtained by the associated neural network. The result is then used as parameter input/output for a loss function. In Fig. 1, page 2274, the shared NIN corresponds to the intermediate layers used to calculate the output/input for the parameter update for the loss functions. This is done multiple times across multiple layers. “ML-DNN” is defined as “multi-loss deep neural network” and corresponds to the neural network.)
the output layer serves as a predetermined activating function (Chunyan Xu, page 2274, right column, para [2], “The convolutional layer is to extract feature maps by linear convolutional filters followed by nonlinear activation functions”, Feature maps correspond to the output layer which is then used by an activation function. It is inherent that activation functions are data-driven corresponding to predetermined.)
and the machine learning method further comprises: training the model by an error back propagation method based on a loss function obtained by adding the first loss function and the second loss function. (Chunyan Xu, page 2274, left column, para [2, 4] -  page 2276, right column, para [8] “train an ML-DNN model with the multiple-loss functions…Based on these multiple-loss functions, the network is trained by simultaneously optimizing all loss functions with backpropagation”, “ML-DNN” is defined as “multi-loss deep neural network”. It is inherently known that backpropagation is the sum of the output/input error and the distribution deviation. Thus, the multiple-loss functions correspond to the first and second loss function. The backpropagation corresponds to the addition of the first and second loss function.)
Chunyan Xu does not disclose:
the first distribution is a distribution of a feature amount output from an intermediate layer that is closest to the output layer among the multiple intermediate layers when the first data is input to the input layer of the neural network
Bosch additionally discloses:
the first distribution is a distribution of a feature amount output from an intermediate layer that is closest to the output layer among the multiple intermediate layers when the first data is input to the input layer of the neural network (Bosch, para[0040 - 0042], “Encoder part 11 maps an input quantity vector x onto a representation z (latent quantities) in a latent space. The latent space has a lower dimensionality than does input quantity vector x. Encoder part 11 has an input layer 11E, one or more intermediate layers 11Z, and an output layer 11A that correspond to, or represent, the latent space…On the other hand, the training is carried out in such a way that the distribution of the latent quantities z in the latent space corresponds as closely as possible to a specified reference distribution.”, Representation corresponds to a feature amount. The latent space corresponds to the closest output layer among the intermediate layers. The latent space distribution is then trained in a way to correspond to the reference distribution which corresponds to the first distribution.)
Regarding claim 4, Chunyan Xu in view of Bosch discloses the method of claim 1. Chunyan Xu additionally discloses: 
calculating the first loss function based on the first distribution and a second distribution set for correct data corresponding to the first data among distributions previously set for respective multiple correct data. (Chunyan Xu, Fig. 1, page 2274; page 2274, left column, para [2], “Based on these multiple-loss functions, the network is trained by simultaneously optimizing all loss functions with backpropagation…the outputs by the ML-DNN from different loss functions are fused”. “ML-DNN” is defined as “multi-loss deep neural network”. It is inherent that the loss minimized by backpropagation is the sum of the input/output error and the distribution deviation.)
	Regarding claim 5, Chunyan Xu in view of Bosch discloses the method of claim 1. Bosch additionally discloses: 
training the model based on only the first loss function when the correct data corresponding to the first data does not exist. (Bosch, para [0042], “The reference distribution is specified by reference distribution parameters that indicate the reference distribution in a coded manner. The distribution of the latent quantities z is specified by distribution parameters that indicate the distribution in a coded manner. The fact that the distribution of the latent quantities z in the latent space corresponds as closely as possible to a prespecified reference distribution is achieved in a known manner during the training of variational autoencoder 10 by specifying a constraint indicating that a degree of deviation between the achieved distribution and the specified reference distribution is to be made as small as possible.” It is inherent that, without the correct data to be fed as parameters into the second loss function, the model training would be based solely on the first loss function, whose data is available.)
Regarding claim 6, Chunyan Xu discloses:
A non-transitory computer-readable recording medium having stored therein a program that causes a computer to execute a process, the process comprising: (It is inherent that calculations performed on neural networks are done so via computers. It is also inherent that computers consist of non-transitory computer-readable medium commonly known as memory.)
calculating a first loss function based on a first distribution and a previously set second distribution, (Chunyan Xu, page 2274, left column, para [2], “During the ML-DNN training, the training images are first fed into the shared NIN and several parallel fc layers, each of which corresponds to some different loss layers…We pretrain the model with the single-loss function, and then warm start the whole ML-DNN with the convolutional parameters transferred from the pretrained model”. “fc” denotes “fully connected”. The results produced from the images initially fed into the shared NIN correspond to the first distribution. The output produced from the initial distribution is then fed into the subsequent fc layers corresponding to the second loss function. It is inherent that the NIN is implemented on a computer.)
the first distribution being a distribution of a feature amount output from an intermediate layer when first data is input to an input layer of a model that has the input layer, the intermediate layer, and an output layer (Chunyan Xu, page 2273, left column, para [3] to page 2274, right column, para [2], “During the ML-DNN training, the training images are first fed into the shared NIN and several parallel fc layers, each of which corresponds to some different loss layers. We feed the convolutional feature maps of the shared NIN into multiple branches of the loss functions.” “fc” denotes “fully connected”. The “fc” layers correspond to the input/output/intermediate layers. Fig. 1, page 2274, shows a shared NIN taking the input from one layer and using it as output to another. The data processing between each network corresponds to the intermediate layers, also represented as “fc” layers. It is inherent that the resulting distribution is used to calculate the subsequent parameters of the loss functions across multiple layers, including input, output and intermediate layers. The results from the first layer processing the images correspond to the first distribution.)
calculating a second loss function based on second data and correct data corresponding to the first data, the second data being output from the output layer when the first data is input to the input layer of the model; (Chunyan Xu, page 2273, left column, para [3] to page 2274, right column, para [2], “During the ML-DNN training, the training images are first fed into the shared NIN and several parallel fc layers, each of which corresponds to some different loss layers. We feed the convolutional feature maps of the shared NIN into multiple branches of the loss functions.” “fc” denotes “fully connected”. The “fc” layers correspond to the input/output/intermediate layers. Fig. 1, page 2274, shows a shared NIN taking the input from one layer and using it as output to another. The data processing between each network corresponds to the intermediate layers, also represented as “fc” layers. It is inherent that the resulting distribution is used to calculate the subsequent parameters of the loss functions across multiple layers, including input, output and intermediate layers. The results from the first layer processing the images correspond to the first distribution.)
and training the model based on both the first loss function and the second loss function. (Chunyan Xu, pg. 2275, right column, para [2], “During the network training…Based on these multiple-loss functions, the network is trained”. The multiple-loss functions correspond to the first and second loss functions that the trained network is based on.)	
Claim 2 is rejected under 35 U.S.C. 103 as being unpatentable over Chunyan Xu (“Multi-loss Regularized Deep Neural Network”) in view of Bosch (United States Patent Application Publication 2019/0199743), and further in view of Yang (“Model Loss and Distribution Analysis of Regression Problems in Machine Learning”).
Regarding claim 2, Chunyan Xu in view of Bosch discloses the method of claim 1. Chunyan Xu in view of Bosch does not disclose: 
wherein the first loss function is a distance between the first distribution and the second distribution.
Yang discloses:
wherein the first loss function is a distance between the first distribution and the second distribution. (Yang, pg. 2, left column, para [1-2], “Two loss functions… Both of these loss functions increase with the distance between…”)
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to have modified the method of Chunyan Xu in view of Bosch to use a distance between the first and second distributions as the first loss function. The motivation for doing so would have been to ensure the distribution model has good tolerance to the loss function. (Yang, page 1, left column, para [1])
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to use the distance between distributions as the representation. The motivation is that machine learning models are based on the assumption of a distribution. (Yang, page 1, left column, para [1], “The machine learning regression model is based on the assumption of normal distribution”)
Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Chunyan Xu (“Multi-loss Regularized Deep Neural Network”) in view of Bosch (United States Patent Application Publication 2019/0199743).
Regarding claim 7, Chunyan Xu discloses:
	An information processing apparatus, comprising: a memory; and a processor coupled to the memory and the processor configured to: (It is inherent that neural networks and machine learning methods are performed using computers comprising: memory and processors. These are generic computer components.)
	calculate a first loss function based on a first distribution and a previously set second distribution (Chunyan Xu, page 2274, left column, para [2], “During the ML-DNN training, the training images are first fed into the shared NIN and several parallel fc layers, each of which corresponds to some different loss layers…We pretrain the model with the single-loss function, and then warm start the whole ML-DNN with the convolutional parameters transferred from the pretrained model”, “fc” is defined as “feature connected”. The results produced from the images initially fed into the shared NIN correspond to the first distribution. The output produced from the initial distribution is then fed into the subsequent fc layers corresponding to the second loss function. It is inherent that the NIN is computer generated.) 
the first distribution being a distribution of a feature amount output from an intermediate layer when first data is input to an input layer of a model that has the input layer, the intermediate layer, and an output layer; (Chunyan Xu, page 2273, left column, para [3] to page 2274, right column, para [2], “During the ML-DNN training, the training images are first fed into the shared NIN and several parallel fc layers, each of which corresponds to some different loss layers. We feed the convolutional feature maps of the shared NIN into multiple branches of the loss functions.”, “fc” is defined as “feature connected”. “fc” corresponds to the input/output/intermediate layer. Fig 1, page 2274 shows a shared NIN taking the input from one layer using it as output to another. The data processing between each network corresponds to the intermediate layers also represented as “fc” layers. It is inherent that the resulting distribution is used to calculate the subsequent parameters to the loss functions across multiple layers including input, output and intermediate layers. The results from the first layer processing the images corresponds to the first distribution.)
the second data being output from the output layer when the first data is input to the input layer of the model; (Chunyan Xu, Fig. 1, page 2274. Fig. 1 shows images being fed into the input layer of the shared NIN, whose resulting output is passed into the subsequent layers. Each layer has an associated loss function. The images correspond to the input to the input layer. The layers correspond to the second data being output from the output layer.)
Chunyan Xu does not disclose:
calculate a second loss function based on second data and correct data corresponding to the first data
Bosch in view of Chunyan Xu discloses:
calculate a second loss function based on second data and correct data corresponding to the first data (Bosch, para [42], “During the training of variational autoencoder 10, this autoencoder is trained, for example using a back-propagation method, in such a way that on the one hand the reconstruction error between input quantity vector x and output quantity vector x′ becomes as small as possible. On the other hand, the training is carried out in such a way that the distribution of the latent quantities z in the latent space corresponds as closely as possible to a specified reference distribution.” The back-propagation method corresponds to the second data and the correct data corresponding to the first data. It is inherently known that backpropagation is the sum of the input/output and distribution deviation.)
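For illustration of the two training objectives described in the quoted passage of Bosch, an error term between model output and correct data can be combined with a distribution deviation term as sketched below. This is a sketch under stated assumptions, not the method of Bosch or of the application: the mean squared error, the weighted sum, the weight parameter, and the function names are all hypothetical choices made for the example.

```python
def second_loss(outputs, targets):
    """Mean squared error between model outputs and the correct data."""
    if len(outputs) != len(targets):
        raise ValueError("outputs and targets must have the same length")
    return sum((o - t) ** 2 for o, t in zip(outputs, targets)) / len(outputs)

def total_loss(distribution_loss, output_loss, weight=1.0):
    """Weighted sum of the distribution term and the output error term.

    Assumption: the two terms are combined additively; `weight` scales the
    distribution term and is a hypothetical hyperparameter.
    """
    return output_loss + weight * distribution_loss

# Example: one of two outputs is off by 2, so the mean squared error is 2.0.
err = second_loss([1.0, 2.0], [1.0, 4.0])
combined = total_loss(0.5, err, weight=2.0)
```

Minimizing such a combined value during training drives both the output error and the distribution deviation toward small values at the same time.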
	Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure: 
Nan Yang et al., Model Loss and Distribution Analysis of Regression Problems in Machine Learning, discloses the use of two loss functions based on distributions that are normalized.
	Any inquiry concerning this communication or earlier communications from the examiner should be directed to JAMIE RYAN BROBERG whose telephone number is (571)270-7583. The examiner can normally be reached Monday - Friday (8:30 am - 5 pm) PST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Omar Fernandez, can be reached on 571-272-2589. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/J.R.B./Examiner, Art Unit 4184        

/OMAR F FERNANDEZ RIVAS/Supervisory Patent Examiner, Art Unit 2128