Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Detailed Action

1.	The Examiner acknowledges the applicant’s amendment filed 7/29/2022.  At this point claims 1-13 are pending in the instant application and ready for examination by the Examiner.

Response to Arguments
2.	Applicant’s arguments filed on 7/29/2022 for claims 1-13 have been fully considered but are not persuasive.

3.	Applicant’s argument:
Rejections Under 35 U.S.C. § 103

Claims 1-3, 5-11, and 13 are rejected under 35 U.S.C. 103 Maitra (U.S. Publication No. 2018/0341871), Nishimura (U.S. Publication No. 2018/0032865), Puri (U.S. Publication No. 2007/0047802), and Haruki (U.S. Publication No. 2018/0121806). Claims 4 and 12 are rejected under 35 U.S.C. 103 over Maitra, Nishimura, Puri, Haruki, and Madabhushi (U.S. Publication No. 2018/0129911).

Without conceding the merits of the rejections and for the sole purpose of expediting prosecution of this application, Applicant has amended independent claim 1 in accordance with the discussion of the July 18 interview. As discussed during the interview, the cited portions of art used in the rejection do not teach or suggest at least “wherein the front-end network includes at least one of a convolutional layer or pooling layer and wherein the first GPU and the other GPUs each stores a complete set of parameters of each layer of the front-end network” and “wherein the back-end network includes a convolutional-softmax (Conv-softmax) combination and wherein the parameters of each layer of the back-end network are distributed into subsets and stored among the first GPU and the other GPUs” (emphasis added), in the context of amended claim 1. Accordingly, Applicant respectfully requests withdrawal of the Section 103 rejection of claim 1, as well as the Section 103 rejections of claims 2-5, which depend from claim 1.

Examiner’s answer:
Newly cited art, Kluckner, is applied to address the amended claim limitations.

Claim Rejections - 35 USC § 103
4.	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

Claims 1-3, 5-11 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Kluckner in view of Haruki in view of Nishimura and further in view of Puri. (U. S. Patent Publication 20200265263, referred to as Kluckner; U. S. Patent Publication 20180121806, referred to as Haruki; U. S. Patent Publication 20180032865, referred to as Nishimura; U. S. Patent Publication 20070047802, referred to as Puri)

Claim 1
Kluckner discloses a method for updating a convolutional neural network by using a graphics processing unit (GPU) cluster, the GPU cluster including a first GPU and a plurality of other GPUs, wherein the method is performed by the first GPU and comprises: obtaining a sample with a classification label (Kluckner, 0082; For each of the CNNs (e.g., CNN.sub.A, CNN.sub.B, CNN.sub.C), multiple sets of training examples are used to train the individual CNNs. The CNNs (e.g., CNN.sub.A, CNN.sub.B, CNN.sub.C) may be trained by providing ground truth labels 507 for each together with training images as training input.); performing a first operation corresponding to each layer of a front-end network of the convolutional neural network on the sample based on the parameters of each layer of the front-end network, to obtain a first operation result of the sample, wherein the front-end network includes at least one of a convolutional layer or pooling layer and wherein the first GPU and the other GPUs each stores a complete set of parameters of each layer of the front-end network (Kluckner, fig 5c; There are three locations, each of which has three convolutional layers and pooling layers. The SG ground truths, SBP ground truths and SPP ground truths are examples of stored parameters.); performing a corresponding second operation on the sample based on the first operation result and a subset of parameters of each layer of a back-end network of the convolutional neural network that the first GPU stores, to obtain a second operation result, wherein the back-end network includes a convolutional-softmax (Conv-softmax) combination and wherein the parameters of each layer of the back-end network are distributed into subsets and stored among the first GPU and the other GPUs. (Kluckner, fig 5c; Each item 535 has a softmax portion before outputting the result.)
Kluckner does not disclose expressly separately sending the first operation result to the other GPUs, so that each other GPU performs a corresponding third operation on the sample based on their respective subset of the parameters of each layer of the back-end network and the first operation result; receiving a third operation result obtained after each other GPU performs the corresponding third operation.
Haruki discloses separately sending the first operation result to the other GPUs, so that each other GPU performs a corresponding third operation on the sample based on their respective subset of the parameters of each layer of the back-end network and the first operation result (Haruki, fig 3; GPU1 (item 312B) sends results to GPU0 (item 312A). The result is the set up to the third operation.); receiving a third operation result obtained after each other GPU performs the corresponding third operation. (Haruki, fig 3 item 318, 0038; The gradients are collected and summed up by a parameter server such as GPU0 and the server calculates new model parameters to update the model. In this timeline diagram 300, parameter updating is divided into accumulation 318 and updating 320.) It would have been obvious to one having ordinary skill in the art, having the teachings of Kluckner and Haruki before him before the effective filing date of the claimed invention, to modify Kluckner to incorporate a front end and back end of a CNN; Updating a CNN; Employing the CNN with new input data; Generating an error while processing data; Updating parameters and sending out updated parameters; Having a distributed network; Producing results of Haruki. Given the advantage of different ends of a CNN having different functions allows the CNN to have a broader scope of uses; having a model which flexes with new input data; an error can be employed as a function to update the model; updating the model for refined results; allowing the invention to have the workload distributed to lower individual computational costs; and obtaining answers, one having ordinary skill in the art would have been motivated to make this obvious modification. 
Kluckner and Haruki do not disclose expressly combining the second operation result and the third operation result to obtain a classification result of the sample.
Nishimura discloses combining the second operation result and the third operation result to obtain a classification result of the sample. (Nishimura, 0040; The multilayer neural network structure 23 outputs the result of recognition of the input image I by the CNN.) It would have been obvious to one having ordinary skill in the art, having the teachings of Kluckner, Haruki and Nishimura before him before the effective filing date of the claimed invention, to modify Kluckner and Haruki to incorporate the basic structure design of a convolutional neural network, stepwise progression through the model to a result of a classification, and a reusable property of Nishimura. Given the advantages that one skilled in the art knows the result of a convolutional neural network, of obtaining a usable result for classification, and of employing the invention in a real world environment, one having ordinary skill in the art would have been motivated to make this obvious modification.
Kluckner, Haruki and Nishimura do not disclose expressly determining a prediction error based on the classification result and the classification label of the sample; and updating the convolutional neural network based on the prediction error.
Puri discloses determining a prediction error based on the classification result and the classification label of the sample (Puri, 0029; Next, at action 550, an error function is used to compute how far off of the expected output the neural network was.); and updating the convolutional neural network based on the prediction error. (Puri, 0029; By computing a gradient function, which comprises partial derivatives for each entry of each neural network matrix with respect to the error, the GPU can compute how much to adjust each matrix according to the gradient descent method.) It would have been obvious to one having ordinary skill in the art, having the teachings of Kluckner, Haruki, Nishimura and Puri before him before the effective filing date of the claimed invention, to modify Kluckner, Haruki and Nishimura to incorporate inputting data, implementing a function for a refined result, sending the refined result to the next stage of processing, adjusting the matrix, employing multiple GPUs, and employing a threshold as a decision engine, of Puri. Given the advantages of processing inputted data for the next stage of the CNN, that a result can be further processed and/or refined, the completion of the next stage, a refined CNN model, faster results, and obtaining a confidence rating with the prediction, one having ordinary skill in the art would have been motivated to make this obvious modification.
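For illustration only, the partitioning recited in claim 1 can be sketched as follows. This is a minimal sketch under stated assumptions, not the method of the application or of any cited reference: the function names, the toy weights, the weighted sum used as a stand-in for convolution and pooling, and the cross-entropy error are all hypothetical. Each device holds the complete front-end parameters but only a subset (shard) of the back-end parameters, and the per-shard results are combined before the softmax yields the classification result.

```python
import math

def front_end(sample, weights):
    """First operation: every device holds the full front-end weights."""
    # Hypothetical stand-in for convolution/pooling: one weighted sum per feature.
    return [sum(w * x for w, x in zip(row, sample)) for row in weights]

def back_end_shard(features, shard):
    """Second/third operation: scores for the classes owned by one device."""
    return [sum(w * f for w, f in zip(row, features)) for row in shard]

def softmax(logits):
    """Numerically stable softmax over the combined scores."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(sample, front_weights, shards):
    """Run the front end once, then combine every shard's partial result."""
    features = front_end(sample, front_weights)
    logits = []
    for shard in shards:  # shards[0] plays the first GPU, the rest the other GPUs
        logits.extend(back_end_shard(features, shard))
    return softmax(logits)

def prediction_error(probs, label):
    """Cross-entropy between the classification result and the label (assumed)."""
    return -math.log(probs[label])

# Toy data: device 0 owns class 0, device 1 owns class 1.
front_weights = [[1.0, 0.0], [0.0, 1.0]]
shards = [[[2.0, 0.0]], [[0.0, 2.0]]]
probs = classify([1.0, 0.0], front_weights, shards)
loss = prediction_error(probs, 0)
```

In this toy run the first shard contributes the score for class 0 and the second shard the score for class 1; neither device alone could produce the softmax output.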

Claim 2
Kluckner does not disclose expressly receiving a third operation result obtained after each other GPU performs the corresponding third operation on the another sample.
Haruki discloses receiving a third operation result obtained after each other GPU performs the corresponding third operation on the another sample. (Haruki, fig 3 item 318, 0038; The gradients are collected and summed up by a parameter server such as GPU0 and the server calculates new model parameters to update the model. In this timeline diagram 300, parameter updating is divided into accumulation 318 and updating 320.) It would have been obvious to one having ordinary skill in the art, having the teachings of Kluckner and Haruki before him before the effective filing date of the claimed invention, to modify Kluckner to incorporate a front end and back end of a CNN; Updating a CNN; Employing the CNN with new input data; Generating an error while processing data; Updating parameters and sending out updated parameters; Having a distributed network; Producing results of Haruki. Given the advantage of different ends of a CNN having different functions allows the CNN to have a broader scope of uses; having a model which flexes with new input data; an error can be employed as a function to update the model; updating the model for refined results; allowing the invention to have the workload distributed to lower individual computational costs; and obtaining answers, one having ordinary skill in the art would have been motivated to make this obvious modification. 
Kluckner and Haruki do not disclose expressly combining the second operation result of the another sample and the third operation result of the another sample to obtain a classification result of the another sample. 
Nishimura discloses combining the second operation result of the another sample and the third operation result of the another sample to obtain a classification result of the another sample. (Nishimura, 0040; The multilayer neural network structure 23 outputs the result of recognition of the input image I by the CNN.) It would have been obvious to one having ordinary skill in the art, having the teachings of Kluckner, Haruki and Nishimura before him before the effective filing date of the claimed invention, to modify Kluckner and Haruki to incorporate the basic structure design of a convolutional neural network, stepwise progression through the model to a result of a classification, and a reusable property of Nishimura. Given the advantages that one skilled in the art knows the result of a convolutional neural network, of obtaining a usable result for classification, and of employing the invention in a real world environment, one having ordinary skill in the art would have been motivated to make this obvious modification.
Kluckner, Haruki and Nishimura do not disclose expressly receiving a first operation result of another sample from a second GPU, wherein the second GPU is any one of the other GPUs, and the another sample has a classification label; performing the corresponding second operation on the another sample based on the first operation result of the another sample and the subset of the parameters of each layer of the back-end network that the first GPU stores, to obtain a second operation result;….determining the prediction error includes: determining the prediction error based on (a) the classification result and the classification label of the sample and (b) the classification result and the classification label of the another sample.
Puri discloses receiving a first operation result of another sample from a second GPU, wherein the second GPU is any one of the other GPUs, and the another sample has a classification label (Puri, 0027; Training samples typically involve many (on the order of tens of thousands) samples of handwritten characters, along with an indication of the correct character each should be interpreted as.); performing the corresponding second operation on the another sample based on the first operation result of the another sample and the subset of the parameters of each layer of the back-end network that the first GPU stores, to obtain a second operation result (Puri, 0005, fig 2; As FIG. 2 illustrates, one method of implementing neural networks is to treat each level as a matrix of neuron values, as is illustrated by layer 0 matrix 210. Connection strengths can then be implemented as a transformation matrix 220, which is multiplied by the layer 0 matrix 210. This multiplication allows each value in the previous layer to be scaled according to connection strengths, and then summed, all through normal matrix multiplication. After the multiplication is performed, a bias matrix 230 is then added to the product matrix to account for the threshold of each neuron in the next level. Then a sigmoid function (in one implementation, tan h( ))is applied to each resultant value to determine if the threshold was met, and the resulting values are placed in the matrix for the next layer. EC: Puri shows computation of multiplication and addition occurs at every level past the first layer.);…. determining the prediction error includes: determining the prediction error based on (a) the classification result and the classification label of the sample and (b) the classification result and the classification label of the another sample. (Puri, 0029; Next, at action 550, an error function is used to compute how far off of the expected output the neural network was. …. 
Examples of one implementation of the equations used in these forward and backward passes are described in Section 4. Finally, at action 580, the forward-pass/backward-pass steps of process 500 are repeated as long as there are more sample inputs. At the end of the sample inputs, the network has been trained over those inputs and the process ends.) It would have been obvious to one having ordinary skill in the art, having the teachings of Kluckner, Haruki, Nishimura and Puri before him before the effective filing date of the claimed invention, to modify Kluckner, Haruki and Nishimura to incorporate inputting data, implementing a function for a refined result, sending the refined result to the next stage of processing, adjusting the matrix, employing multiple GPUs, and employing a threshold as a decision engine, of Puri. Given the advantages of processing inputted data for the next stage of the CNN, that a result can be further processed and/or refined, the completion of the next stage, a refined CNN model, faster results, and obtaining a confidence rating with the prediction, one having ordinary skill in the art would have been motivated to make this obvious modification.
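For illustration only, the layer computation quoted from Puri (0005, fig 2) and the error computation quoted from Puri (0029) can be sketched as follows. This is a minimal sketch under stated assumptions: the function names, the toy values and the choice of a summed squared error are hypothetical and are not taken from Puri. Each layer multiplies the previous layer's values by a transformation matrix, adds a bias, and applies tanh; the error is then accumulated over more than one labeled sample.

```python
import math

def layer_forward(values, weights, bias):
    """One layer: matrix multiply, add bias, then apply tanh to each value."""
    out = []
    for row, b in zip(weights, bias):
        pre_activation = sum(w * v for w, v in zip(row, values)) + b
        out.append(math.tanh(pre_activation))
    return out

def batch_error(outputs, targets):
    """Summed squared error over every sample (assumed error function)."""
    total = 0.0
    for out, target in zip(outputs, targets):
        total += sum((o - t) ** 2 for o, t in zip(out, target))
    return total

# Toy transformation matrix and bias shared by both samples.
weights = [[0.5, 0.5], [1.0, 0.0]]
bias = [0.0, 0.0]

sample_a = [1.0, -1.0]   # "the sample"
sample_b = [0.0, 0.0]    # "the another sample"
out_a = layer_forward(sample_a, weights, bias)
out_b = layer_forward(sample_b, weights, bias)

# Error determined from both samples and their expected outputs.
error = batch_error([out_a, out_b], [[0.0, 1.0], [0.0, 0.0]])
```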

Claim 3
Kluckner does not disclose expressly determining first update parameters of each layer of the front-end network and second update parameters of each layer of the back-end network based on the prediction error; updating the parameters of each layer of the front-end network that the first GPU stores based on the first update parameters; updating the subset of the parameters of each layer of the back-end network that the first GPU stores based on a subset of the second update parameters that correspond to the subset of the parameters of each layer of the back-end network that the first GPU stores; and sending the first update parameters and a corresponding subset of second update parameters to each other GPU, so that each other GPU updates its respective subset of the parameters of each layer of back-end network based on the corresponding subset of the second update parameters.
Haruki discloses determining first update parameters of each layer of the front-end network and second update parameters of each layer of the back-end network based on the prediction error (Haruki, fig 3 item 318, 0038; The gradients are collected and summed up by a parameter server such as GPU0 and the server calculates new model parameters to update the model. In this timeline diagram 300, parameter updating is divided into accumulation 318 and updating 320.); updating the parameters of each layer of the front-end network that the first GPU stores based on the first update parameters (Haruki, 0018; In the update parameter phase, all the parameters are updated using the gradients.); updating the subset of the parameters of each layer of the back-end network that the first GPU stores based on a subset of the second update parameters that correspond to the subset of the parameters of each layer of the back-end network that the first GPU stores (Haruki, 0018; In the update parameter phase, all the parameters are updated using the gradients.); and sending the first update parameters and a corresponding subset of second update parameters to each other GPU, so that each other GPU updates its respective subset of the parameters of each layer of back-end network based on the corresponding subset of the second update parameters. (Haruki, 0019; Once they have all finished the backward phase, they exchange gradients and a server GPU updates the parameters. The updated parameters are synchronized among the GPUs at the beginning of the next training iteration to ensure that all GPUs use the same parameters for training.) 
It would have been obvious to one having ordinary skill in the art, having the teachings of Kluckner and Haruki before him before the effective filing date of the claimed invention, to modify Kluckner to incorporate a front end and back end of a CNN; Updating a CNN; Employing the CNN with new input data; Generating an error while processing data; Updating parameters and sending out updated parameters; Having a distributed network; Producing results of Haruki. Given the advantage of different ends of a CNN having different functions allows the CNN to have a broader scope of uses; having a model which flexes with new input data; an error can be employed as a function to update the model; updating the model for refined results; allowing the invention to have the workload distributed to lower individual computational costs; and obtaining answers, one having ordinary skill in the art would have been motivated to make this obvious modification. 
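For illustration only, the accumulation, update and synchronization phases quoted from Haruki (0018, 0019, 0038) can be sketched as follows. This is a minimal sketch under stated assumptions: the function names, the learning rate and the toy gradients are hypothetical, and plain gradient descent is assumed for the update rule. The sketch replicates all parameters on every GPU, as in the quoted passages; it does not model the per-GPU subsets of back-end parameters recited in claim 3.

```python
def accumulate(gradients_per_gpu):
    """Server GPU sums the gradients collected from every GPU."""
    return [sum(parts) for parts in zip(*gradients_per_gpu)]

def update(params, summed_gradients, learning_rate):
    """Server GPU computes new parameters (plain gradient descent, assumed)."""
    return [p - learning_rate * g for p, g in zip(params, summed_gradients)]

def synchronize(new_params, num_gpus):
    """Each GPU receives its own copy of the updated parameters."""
    return [list(new_params) for _ in range(num_gpus)]

# Toy gradients from two GPUs for a two-parameter model.
gradients = [[0.5, -1.0], [1.5, 1.0]]
params = [1.0, 1.0]

summed = accumulate(gradients)
new_params = update(params, summed, learning_rate=0.5)
replicas = synchronize(new_params, num_gpus=2)
```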

Claim 5
Kluckner does not disclose expressly wherein the first GPU communicates with the other GPUs based on an application programming interface function library supporting distributed communication and computing.
Haruki discloses wherein the first GPU communicates with the other GPUs based on an application programming interface function library supporting distributed communication and computing. (Haruki, 0037; Alternatively, the GPUs 210 may reside elsewhere and be coupled to the host 100 with a high speed communication link as known in the art.) It would have been obvious to one having ordinary skill in the art, having the teachings of Kluckner and Haruki before him before the effective filing date of the claimed invention, to modify Kluckner to incorporate a front end and back end of a CNN; Updating a CNN; Employing the CNN with new input data; Generating an error while processing data; Updating parameters and sending out updated parameters; Having a distributed network; Producing results of Haruki. Given the advantage of different ends of a CNN having different functions allows the CNN to have a broader scope of uses; having a model which flexes with new input data; an error can be employed as a function to update the model; updating the model for refined results; allowing the invention to have the workload distributed to lower individual computational costs; and obtaining answers, one having ordinary skill in the art would have been motivated to make this obvious modification. 

Claim 6
Kluckner discloses a method for updating a convolutional neural network by using a graphics processing unit (GPU) cluster, the GPU cluster including a first GPU and a plurality of other GPUs, wherein the method is performed by a second GPU of the other GPUs and comprises: (Kluckner, 0082; For each of the CNNs (e.g., CNN.sub.A, CNN.sub.B, CNN.sub.C), multiple sets of training examples are used to train the individual CNNs. The CNNs (e.g., CNN.sub.A, CNN.sub.B, CNN.sub.C) may be trained by providing ground truth labels 507 for each together with training images as training input.); receiving a first operation result, sent by the first GPU, of a sample with a classification label, wherein a front-end network of the convolutional neural network includes at least one of a convolutional layer or pooling layer and wherein the first GPU and the other GPUs each stores a complete set of parameters of each layer of the front-end network (Kluckner, fig 5c; There are three locations, each of which has three convolutional layers and pooling layers. The SG ground truths, SBP ground truths and SPP ground truths are examples of stored parameters.); performing a corresponding third operation on the sample based on the first operation result and a subset of parameters of each layer of a back-end network of the convolutional neural network that the second GPU stores, to obtain a third operation result, wherein the back-end network includes a convolutional-softmax (Conv-softmax) combination and wherein the parameters of each layer of the back-end network are distributed into subsets and stored among the first GPU and the other GPUs. (Kluckner, fig 5c; Each item 535 has a softmax portion before outputting the result.)
Kluckner does not disclose expressly sending the third operation result to the first GPU, so that after obtaining a third operation result sent by each other GPU, the first GPU combines a second operation result and the third operation result of the sample.
Haruki discloses sending the third operation result to the first GPU, so that after obtaining a third operation result sent by each other GPU, the first GPU combines a second operation result and the third operation result of the sample. (Haruki, fig 3 item 318, 0038; The gradients are collected and summed up by a parameter server such as GPU0 and the server calculates new model parameters to update the model. In this timeline diagram 300, parameter updating is divided into accumulation 318 and updating 320.) It would have been obvious to one having ordinary skill in the art, having the teachings of Kluckner and Haruki before him before the effective filing date of the claimed invention, to modify Kluckner to incorporate a front end and back end of a CNN; Updating a CNN; Employing the CNN with new input data; Generating an error while processing data; Updating parameters and sending out updated parameters; Having a distributed network; Producing results of Haruki. Given the advantage of different ends of a CNN having different functions allows the CNN to have a broader scope of uses; having a model which flexes with new input data; an error can be employed as a function to update the model; updating the model for refined results; allowing the invention to have the workload distributed to lower individual computational costs; and obtaining answers, one having ordinary skill in the art would have been motivated to make this obvious modification. 
Kluckner and Haruki do not disclose expressly to obtain a classification result of the sample, wherein the second operation result of the sample is obtained after the first GPU performs a corresponding second operation on the sample based on the first operation result and a subset of the parameters of each layer of the back-end network that the first GPU stores.
Nishimura discloses to obtain a classification result of the sample (Nishimura, 0040; The multilayer neural network structure 23 outputs the result of recognition of the input image I by the CNN.), wherein the second operation result of the sample is obtained after the first GPU performs a corresponding second operation on the sample based on the first operation result and a subset of the parameters of each layer of the back-end network that the first GPU stores. (Nishimura, 0042; Each of the filters 21a has a predetermined pixel size lower than the pixel size of an input image; each pixel of the corresponding filter 21a has a weight, i.e. weight value. The weight of each pixel of each of the filters 21a can be biased. EC: Nishimura discloses that the original input is segmented into different (and smaller) portions for evaluation.) It would have been obvious to one having ordinary skill in the art, having the teachings of Kluckner, Haruki and Nishimura before him before the effective filing date of the claimed invention, to modify Kluckner and Haruki to incorporate the basic structure design of a convolutional neural network, stepwise progression through the model to a result of a classification, and a reusable property of Nishimura. Given the advantages that one skilled in the art knows the result of a convolutional neural network, of obtaining a usable result for classification, and of employing the invention in a real world environment, one having ordinary skill in the art would have been motivated to make this obvious modification.
Kluckner, Haruki and Nishimura do not disclose expressly and the first GPU further determines a prediction error based on the classification result and the classification label of the sample, and updates the convolutional neural network based on the prediction error.
Puri discloses and the first GPU further determines a prediction error based on the classification result and the classification label of the sample (Puri, 0029; Next, at action 550, an error function is used to compute how far off of the expected output the neural network was.), and updates the convolutional neural network based on the prediction error. (Puri, 0029; By computing a gradient function, which comprises partial derivatives for each entry of each neural network matrix with respect to the error, the GPU can compute how much to adjust each matrix according to the gradient descent method.) It would have been obvious to one having ordinary skill in the art, having the teachings of Kluckner, Haruki, Nishimura and Puri before him before the effective filing date of the claimed invention, to modify Kluckner, Haruki and Nishimura to incorporate inputting data, implementing a function for a refined result, sending the refined result to the next stage of processing, adjusting the matrix, employing multiple GPUs, and employing a threshold as a decision engine, of Puri. Given the advantages of processing inputted data for the next stage of the CNN, that a result can be further processed and/or refined, the completion of the next stage, a refined CNN model, faster results, and obtaining a confidence rating with the prediction, one having ordinary skill in the art would have been motivated to make this obvious modification.

Claim 7
Kluckner does not disclose expressly, to obtain a first operation result of the another sample, and performing the corresponding third operation on the another sample based on the first operation result of the another sample and the subset of the parameters of each layer of the back-end network that the second GPU stores, to obtain a third operation result. 
Haruki discloses, to obtain a first operation result of the another sample, and performing the corresponding third operation on the another sample based on the first operation result of the another sample and the subset of the parameters of each layer of the back-end network that the second GPU stores, to obtain a third operation result. (Haruki, fig 3 item 318, 0038; The gradients are collected and summed up by a parameter server such as GPU0 and the server calculates new model parameters to update the model. In this timeline diagram 300, parameter updating is divided into accumulation 318 and updating 320. EC: Puri discloses multiple training and testing cycles.) It would have been obvious to one having ordinary skill in the art, having the teachings of Kluckner and Haruki before him before the effective filing date of the claimed invention, to modify Kluckner to incorporate a front end and back end of a CNN; Updating a CNN; Employing the CNN with new input data; Generating an error while processing data; Updating parameters and sending out updated parameters; Having a distributed network; Producing results of Haruki. Given the advantage of different ends of a CNN having different functions allows the CNN to have a broader scope of uses; having a model which flexes with new input data; an error can be employed as a function to update the model; updating the model for refined results; allowing the invention to have the workload distributed to lower individual computational costs; and obtaining answers, one having ordinary skill in the art would have been motivated to make this obvious modification. 
Kluckner and Haruki do not disclose expressly to obtain a classification result of the another sample.
Nishimura discloses to obtain a classification result of the another sample. (Nishimura, 0040; The multilayer neural network structure 23 outputs the result of recognition of the input image I by the CNN.) It would have been obvious to one having ordinary skill in the art, having the teachings of Kluckner, Haruki and Nishimura before him before the effective filing date of the claimed invention, to modify Kluckner and Haruki to incorporate the basic structure design of a convolutional neural network, stepwise progression through the model to a result of a classification, and a reusable property of Nishimura. Given the advantages that one skilled in the art knows the result of a convolutional neural network, of obtaining a usable result for classification, and of employing the invention in a real world environment, one having ordinary skill in the art would have been motivated to make this obvious modification.
Kluckner, Haruki and Nishimura do not disclose expressly obtaining another sample with a classification label; performing a first operation corresponding to each layer of the front-end network on the another sample based on the parameters of each layer of front-end network ….sending the first operation result of the another sample to the first GPU and a third GPU of the other GPUs, so that the first GPU performs the corresponding second operation on the another sample based on the subset of the parameters of each layer of the back-end network that the first GPU stores and the first operation result of the another sample, to obtain a second operation result, and the third GPU performs the corresponding third operation on the another sample based on a subset of the parameters of each layer of the back-end network that the third GPU stores and the first operation result of the another sample, to obtain a third operation result; and sending the third operation result of the another sample to the first GPU, so that the first GPU combines the second operation result and third operation results sent by the other GPUs,….and determines the prediction error based on the classification result and the classification label of the sample and the classification result and the classification label of the another sample.
Puri discloses obtaining another sample with a classification label (Puri, 0027; Training samples typically involve many (on the order of tens of thousands) samples of handwritten characters, along with an indication of the correct character each should be interpreted as.); performing a first operation corresponding to each layer of the front-end network on the another sample based on the parameters of each layer of front-end network (Puri, 0027, 0005, fig 2; ‘Training samples typically involve many (on the order of tens of thousands) samples of handwritten characters, along with an indication of the correct character each should be interpreted as.’ and ‘As FIG. 2 illustrates, one method of implementing neural networks is to treat each level as a matrix of neuron values, as is illustrated by layer 0 matrix 210. Connection strengths can then be implemented as a transformation matrix 220, which is multiplied by the layer 0 matrix 210. This multiplication allows each value in the previous layer to be scaled according to connection strengths, and then summed, all through normal matrix multiplication. After the multiplication is performed, a bias matrix 230 is then added to the product matrix to account for the threshold of each neuron in the next level. Then a sigmoid function (in one implementation, tanh( )) is applied to each resultant value to determine if the threshold was met, and the resulting values are placed in the matrix for the next layer.’)….sending the first operation result of the another sample to the first GPU and a third GPU of the other GPUs, so that the first GPU performs the corresponding second operation on the another sample based on the subset of the parameters of each layer of the back-end network that the first GPU stores and the first operation result of the another sample, to obtain a second operation result, and the third GPU performs the corresponding third operation on the another sample based on a subset of the parameters of each layer of the back-end network that the third GPU stores and the first operation result of the another sample, to obtain a third operation result (Puri, 0005, fig 2; As FIG. 2 illustrates, one method of implementing neural networks is to treat each level as a matrix of neuron values, as is illustrated by layer 0 matrix 210. Connection strengths can then be implemented as a transformation matrix 220, which is multiplied by the layer 0 matrix 210. This multiplication allows each value in the previous layer to be scaled according to connection strengths, and then summed, all through normal matrix multiplication. After the multiplication is performed, a bias matrix 230 is then added to the product matrix to account for the threshold of each neuron in the next level. Then a sigmoid function (in one implementation, tanh( )) is applied to each resultant value to determine if the threshold was met, and the resulting values are placed in the matrix for the next layer. EC: Puri shows that the computation of multiplication and addition occurs at every level past the first layer.); and sending the third operation result of the another sample to the first GPU, so that the first GPU combines the second operation result and third operation results sent by the other GPUs (Puri, 0005, fig 2),….and determines the prediction error based on the classification result and the classification label of the sample and the classification result and the classification label of the another sample. (Puri, 0029; ‘Next, at action 550, an error function is used to compute how far off of the expected output the neural network was.’ and ‘By computing a gradient function, which comprises partial derivatives for each entry of each neural network matrix with respect to the error, the GPU can compute how much to adjust each matrix according to the gradient descent method.’) It would have been obvious to one having ordinary skill in the art, having the teachings of Kluckner, Maitra, Nishimura and Puri before him before the effective filing date of the claimed invention, to modify Kluckner, Maitra and Nishimura to incorporate inputting data, implementing a function for a refined result, sending the refined result to the next stage of processing, adjusting the matrix, employing multiple GPUs, and employing a threshold as a decision engine, of Puri. Given the advantages of processing inputted data for the next stage of the CNN, further processing and/or refining a result, completing the next stage, refining the CNN model, obtaining results faster, and obtaining a confidence rating with the prediction, one having ordinary skill in the art would have been motivated to make this obvious modification.

Claim 8
Kluckner discloses a method for classifying images by using a graphics processing unit (GPU) cluster, the GPU cluster including a first GPU and a plurality of other GPUs, wherein the method is performed by the first GPU and comprises: obtaining a to-be-classified image (Kluckner, 0082; For each of the CNNs (e.g., CNN_A, CNN_B, CNN_C), multiple sets of training examples are used to train the individual CNNs. The CNNs (e.g., CNN_A, CNN_B, CNN_C) may be trained by providing ground truth labels 507 for each together with training images as training input.); performing a first operation corresponding to each layer of a front-end network on the to-be-classified image based on the parameters of each layer of the front-end network, to obtain a first operation result of the to-be-classified image wherein the front-end network is part of a convolutional neural network and includes at least one of a convolutional layer or pooling layer and wherein the first GPU and the other GPUs each stores a complete set of parameters of each layer of the front-end network (Kluckner, fig 5c, 0082; ‘There are three locations which each have three convolutional layers and pooling layers. The SG ground truths, SBP ground truths and SPP ground truths are examples of stored parameters.’ and ‘For each of the CNNs (e.g., CNN_A, CNN_B, CNN_C), multiple sets of training examples are used to train the individual CNNs. The CNNs (e.g., CNN_A, CNN_B, CNN_C) may be trained by providing ground truth labels 507 for each together with training images as training input.’); performing a corresponding second operation on the to-be-classified image based on the first operation result and a subset of parameters of each layer of a back-end network that the first GPU stores, to obtain a second operation result, wherein the back-end network is part of the convolutional neural network and includes a convolutional-softmax (Conv-softmax) combination and wherein the parameters of each layer of the back-end network are distributed into subsets and stored among the first GPU and the other GPUs. (Kluckner, fig 5c; Each item 535 has a softmax portion before outputting the result.)
Kluckner does not disclose expressly separately sending the first operation result to the other GPUs, so that each other GPU performs a corresponding third operation …. receiving a third operation result obtained after each other GPU performs the corresponding third operation.
Haruki discloses separately sending the first operation result to the other GPUs, so that each other GPU performs a corresponding third operation (Haruki, fig 3; GPU1 (item 312B) sends results to GPU0 (item 312A). The result is the setup for the third operation.) …. receiving a third operation result obtained after each other GPU performs the corresponding third operation. (Haruki, fig 3 item 318, 0038; The gradients are collected and summed up by a parameter server such as GPU0 and the server calculates new model parameters to update the model. In this timeline diagram 300, parameter updating is divided into accumulation 318 and updating 320.) It would have been obvious to one having ordinary skill in the art, having the teachings of Kluckner and Haruki before him before the effective filing date of the claimed invention, to modify Kluckner to incorporate the following features of Haruki: a front end and a back end of a CNN; updating a CNN; employing the CNN with new input data; generating an error while processing data; updating parameters and sending out the updated parameters; having a distributed network; and producing results. Given the advantages that different ends of a CNN having different functions give the CNN a broader scope of uses, that the model adapts to new input data, that an error can be employed as a function to update the model, that updating the model refines its results, that distributing the workload lowers individual computational costs, and that answers are obtained, one having ordinary skill in the art would have been motivated to make this obvious modification.
Kluckner and Haruki do not disclose expressly on the to-be-classified image based on their respective subset of the parameters of each layer of the back-end network and the first operation result; combining the second operation result and the third operation result to obtain a classification result of the to-be-classified image.
Nishimura discloses on the to-be-classified image (Nishimura, 0040; The multilayer neural network structure 23 outputs the result of recognition of the input image I by the CNN.) based on their respective subset of the parameters of each layer of the back-end network and the first operation result (Nishimura, 0042; Each of the filters 21a has a predetermined pixel size lower than the pixel size of an input image; each pixel of the corresponding filter 21a has a weight, i.e. weight value. The weight of each pixel of each of the filters 21a can be biased. EC: Nishimura is disclosing the original input is segmented into different (and smaller) portions for evaluation.); combining the second operation result and the third operation result to obtain a classification result of the to-be-classified image. (Nishimura, 0040; The multilayer neural network structure 23 outputs the result of recognition of the input image I by the CNN.)

Claim 9
Kluckner discloses an apparatus for updating a convolutional neural network by using a graphics processing unit (GPU) cluster, the GPU cluster including a first GPU and a plurality of second GPUs, where the apparatus comprises: an acquisition unit, configured to obtain a sample with a classification label; a first operation unit, configured to perform a first operation corresponding to each layer of a front-end network on the sample obtained by the acquisition unit based on parameters of each layer of the front-end network, to obtain a first operation result of the sample wherein the front-end network is part of the convolutional neural network and includes at least one of a convolutional layer or pooling layer and wherein the first GPU and the other GPUs each stores a complete set of parameters of each layer of the front-end network (Kluckner, fig 5c, 0082; ‘There are three locations which each have three convolutional layers and pooling layers. The SG ground truths, SBP ground truths and SPP ground truths are examples of stored parameters.’ and ‘For each of the CNNs (e.g., CNN_A, CNN_B, CNN_C), multiple sets of training examples are used to train the individual CNNs. The CNNs (e.g., CNN_A, CNN_B, CNN_C) may be trained by providing ground truth labels 507 for each together with training images as training input.’); a second operation unit, configured to perform a corresponding second operation on the sample based on the first operation result and a subset of parameters of each layer of a backend network that the first GPU stores, to obtain a second operation result wherein the back-end network is part of the convolutional neural network and includes a convolutional-softmax (Conv-softmax) combination and wherein the parameters of each layer of the back-end network are distributed into subsets and stored among the first GPU and the other GPUs. (Kluckner, fig 5c; Each item 535 has a softmax portion before outputting the result.)
Kluckner does not disclose expressly a sending unit, configured to separately send the first operation result to the second GPUs, so that each second GPU performs a corresponding third operation on the sample …. a receiving unit, configured to receive a third operation result obtained after each other GPU performs the corresponding third operation; a combining unit, configured to combine the second operation result and the third operation result.
Haruki discloses a sending unit, configured to separately send the first operation result to the second GPUs, so that each second GPU performs a corresponding third operation on the sample (Haruki, fig 3; GPU1 (item 312B) sends results to GPU0 (item 312A). The result is the setup for the third operation.)…. a receiving unit, configured to receive a third operation result obtained after each other GPU performs the corresponding third operation; a combining unit, configured to combine the second operation result and the third operation result. (Haruki, fig 3 item 318, 0038; The gradients are collected and summed up by a parameter server such as GPU0 and the server calculates new model parameters to update the model. In this timeline diagram 300, parameter updating is divided into accumulation 318 and updating 320.) It would have been obvious to one having ordinary skill in the art, having the teachings of Kluckner and Haruki before him before the effective filing date of the claimed invention, to modify Kluckner to incorporate the following features of Haruki: a front end and a back end of a CNN; updating a CNN; employing the CNN with new input data; generating an error while processing data; updating parameters and sending out the updated parameters; having a distributed network; and producing results. Given the advantages that different ends of a CNN having different functions give the CNN a broader scope of uses, that the model adapts to new input data, that an error can be employed as a function to update the model, that updating the model refines its results, that distributing the workload lowers individual computational costs, and that answers are obtained, one having ordinary skill in the art would have been motivated to make this obvious modification.
Kluckner and Haruki do not disclose expressly based on their respective subset of the parameters of each layer of the back-end network and the first operation result;…. to obtain a classification result of the sample.
Nishimura discloses based on their respective subset of the parameters of each layer of the back-end network and the first operation result (Nishimura, 0042; Each of the filters 21a has a predetermined pixel size lower than the pixel size of an input image; each pixel of the corresponding filter 21a has a weight, i.e. weight value. The weight of each pixel of each of the filters 21a can be biased. EC: Nishimura is disclosing the original input is segmented into different (and smaller) portions for evaluation.);…. to obtain a classification result of the sample. (Nishimura, 0040; The multilayer neural network structure 23 outputs the result of recognition of the input image I by the CNN.) It would have been obvious to one having ordinary skill in the art, having the teachings of Kluckner, Maitra and Nishimura before him before the effective filing date of the claimed invention, to modify Kluckner and Maitra to incorporate the basic structural design of a convolutional neural network, the stepwise progression through the model to a classification result, and the reusable property of Nishimura. Given the advantages that one skilled in the art knows the result of a convolutional neural network, that a usable classification result is obtained, and that the invention can be employed in a real-world environment, one having ordinary skill in the art would have been motivated to make this obvious modification.
Kluckner, Haruki and Nishimura do not disclose expressly a determining unit, configured to determine a prediction error based on the classification result and the classification label of the sample; and an updating unit, configured to update the convolutional neural network based on the prediction error determined by the determining unit.
Puri discloses a determining unit, configured to determine a prediction error based on the classification result and the classification label of the sample (Puri, 0029; Next, at action 550, an error function is used to compute how far off of the expected output the neural network was.); and an updating unit, configured to update the convolutional neural network based on the prediction error determined by the determining unit. (Puri, 0029; By computing a gradient function, which comprises partial derivatives for each entry of each neural network matrix with respect to the error, the GPU can compute how much to adjust each matrix according to the gradient descent method.) It would have been obvious to one having ordinary skill in the art, having the teachings of Kluckner, Maitra, Nishimura and Puri before him before the effective filing date of the claimed invention, to modify Kluckner, Maitra and Nishimura to incorporate inputting data, implementing a function for a refined result, sending the refined result to the next stage of processing, adjusting the matrix, employing multiple GPUs, and employing a threshold as a decision engine, of Puri. Given the advantages of processing inputted data for the next stage of the CNN, further processing and/or refining a result, completing the next stage, refining the CNN model, obtaining results faster, and obtaining a confidence rating with the prediction, one having ordinary skill in the art would have been motivated to make this obvious modification.

Claim 10
Kluckner does not disclose expressly the receiving unit is further configured to receive a third operation result obtained after each other GPU performs a corresponding third operation on the another sample.
Haruki discloses the receiving unit is further configured to receive a third operation result obtained after each other GPU performs a corresponding third operation on the another sample. (Haruki, fig 3 item 318, 0038; The gradients are collected and summed up by a parameter server such as GPU0 and the server calculates new model parameters to update the model. In this timeline diagram 300, parameter updating is divided into accumulation 318 and updating 320.) It would have been obvious to one having ordinary skill in the art, having the teachings of Kluckner and Haruki before him before the effective filing date of the claimed invention, to modify Kluckner to incorporate the following features of Haruki: a front end and a back end of a CNN; updating a CNN; employing the CNN with new input data; generating an error while processing data; updating parameters and sending out the updated parameters; having a distributed network; and producing results. Given the advantages that different ends of a CNN having different functions give the CNN a broader scope of uses, that the model adapts to new input data, that an error can be employed as a function to update the model, that updating the model refines its results, that distributing the workload lowers individual computational costs, and that answers are obtained, one having ordinary skill in the art would have been motivated to make this obvious modification.
Kluckner and Haruki do not disclose expressly the combining unit is further configured to combine the second operation result of the another sample and the third operation result of the another sample to obtain a classification result of the another sample. 
Nishimura discloses the combining unit is further configured to combine the second operation result of the another sample and the third operation result of the another sample to obtain a classification result of the another sample. (Nishimura, 0040; The multilayer neural network structure 23 outputs the result of recognition of the input image I by the CNN.) It would have been obvious to one having ordinary skill in the art, having the teachings of Kluckner, Maitra and Nishimura before him before the effective filing date of the claimed invention, to modify Kluckner and Maitra to incorporate the basic structural design of a convolutional neural network, the stepwise progression through the model to a classification result, and the reusable property of Nishimura. Given the advantages that one skilled in the art knows the result of a convolutional neural network, that a usable classification result is obtained, and that the invention can be employed in a real-world environment, one having ordinary skill in the art would have been motivated to make this obvious modification.
Kluckner, Haruki and Nishimura do not disclose expressly the receiving unit is further configured to receive a first operation result of another sample from a second GPU, wherein the another sample has a classification label; the second operation unit is further configured to perform a corresponding second operation on the another sample based on the first operation result of the another sample that is received by the receiving unit and the subset of the parameters of each layer of the back-end network that the first GPU stores, to obtain a second operation result;….the determining unit is specifically configured to determine the prediction error based on (a) the classification result and the classification label of the sample and (b) the classification result and the classification label of the another sample.
Puri discloses the receiving unit is further configured to receive a first operation result of another sample from a second GPU, wherein the another sample has a classification label (Puri, 0027; Training samples typically involve many (on the order of tens of thousands) samples of handwritten characters, along with an indication of the correct character each should be interpreted as.); the second operation unit is further configured to perform a corresponding second operation on the another sample based on the first operation result of the another sample that is received by the receiving unit and the subset of the parameters of each layer of the back-end network that the first GPU stores, to obtain a second operation result (Puri, 0005, fig 2; As FIG. 2 illustrates, one method of implementing neural networks is to treat each level as a matrix of neuron values, as is illustrated by layer 0 matrix 210. Connection strengths can then be implemented as a transformation matrix 220, which is multiplied by the layer 0 matrix 210. This multiplication allows each value in the previous layer to be scaled according to connection strengths, and then summed, all through normal matrix multiplication. After the multiplication is performed, a bias matrix 230 is then added to the product matrix to account for the threshold of each neuron in the next level. Then a sigmoid function (in one implementation, tanh( )) is applied to each resultant value to determine if the threshold was met, and the resulting values are placed in the matrix for the next layer. EC: Puri shows that the computation of multiplication and addition occurs at every level past the first layer.);….the determining unit is specifically configured to determine the prediction error based on (a) the classification result and the classification label of the sample and (b) the classification result and the classification label of the another sample. (Puri, 0029; Next, at action 550, an error function is used to compute how far off of the expected output the neural network was. …. Examples of one implementation of the equations used in these forward and backward passes are described in Section 4. Finally, at action 580, the forward-pass/backward-pass steps of process 500 are repeated as long as there are more sample inputs. At the end of the sample inputs, the network has been trained over those inputs and the process ends.) It would have been obvious to one having ordinary skill in the art, having the teachings of Kluckner, Maitra, Nishimura and Puri before him before the effective filing date of the claimed invention, to modify Kluckner, Maitra and Nishimura to incorporate inputting data, implementing a function for a refined result, sending the refined result to the next stage of processing, adjusting the matrix, employing multiple GPUs, and employing a threshold as a decision engine, of Puri. Given the advantages of processing inputted data for the next stage of the CNN, further processing and/or refining a result, completing the next stage, refining the CNN model, obtaining results faster, and obtaining a confidence rating with the prediction, one having ordinary skill in the art would have been motivated to make this obvious modification.

Claim 11
Kluckner does not disclose expressly determine first update parameters of each layer of front-end network and second update parameters of each layer of the back-end network based on the prediction error; update the parameters of each layer of the front-end network that the first GPU stores based on the first update parameters; update the subset of the parameters of each layer of the back-end network that the first GPU stores based on a subset of the second update parameters that correspond to the subset of the parameters of each layer of the back-end network that the first GPU stores; and send the first update parameters and a corresponding subset of the second update parameters to each second GPUs, so that each second GPU updates its respective subset of the parameters of each layer of the back-end network based on the corresponding subset of the second update parameters.
Haruki discloses determine first update parameters of each layer of front-end network and second update parameters of each layer of the back-end network based on the prediction error (Haruki, fig 3 item 318, 0038; The gradients are collected and summed up by a parameter server such as GPU0 and the server calculates new model parameters to update the model. In this timeline diagram 300, parameter updating is divided into accumulation 318 and updating 320.); update the parameters of each layer of the front-end network that the first GPU stores based on the first update parameters (Haruki, 0018; In the update parameter phase, all the parameters are updated using the gradients.); update the subset of the parameters of each layer of the back-end network that the first GPU stores based on a subset of the second update parameters that correspond to the subset of the parameters of each layer of the back-end network that the first GPU stores (Haruki, 0018; In the update parameter phase, all the parameters are updated using the gradients.); and send the first update parameters and a corresponding subset of the second update parameters to each second GPUs, so that each second GPU updates its respective subset of the parameters of each layer of the back-end network based on the corresponding subset of the second update parameters. (Haruki, 0019; Once they have all finished the backward phase, they exchange gradients and a server GPU updates the parameters. The updated parameters are synchronized among the GPUs at the beginning of the next training iteration to ensure that all GPUs use the same parameters for training.) 
It would have been obvious to one having ordinary skill in the art, having the teachings of Kluckner and Haruki before him before the effective filing date of the claimed invention, to modify Kluckner to incorporate the following features of Haruki: a front end and a back end of a CNN; updating a CNN; employing the CNN with new input data; generating an error while processing data; updating parameters and sending out the updated parameters; having a distributed network; and producing results. Given the advantages that different ends of a CNN having different functions give the CNN a broader scope of uses, that the model adapts to new input data, that an error can be employed as a function to update the model, that updating the model refines its results, that distributing the workload lowers individual computational costs, and that answers are obtained, one having ordinary skill in the art would have been motivated to make this obvious modification.

Claim 13
Kluckner does not disclose expressly wherein the first GPU communicates with the second GPUs based on an application programming interface function library supporting distributed communication and computing.
Haruki discloses wherein the first GPU communicates with the second GPUs based on an application programming interface function library supporting distributed communication and computing. (Haruki, 0037; Alternatively, the GPUs 210 may reside elsewhere and be coupled to the host 100 with a high speed communication link as known in the art.) It would have been obvious to one having ordinary skill in the art, having the teachings of Kluckner and Haruki before him before the effective filing date of the claimed invention, to modify Kluckner to incorporate the following features of Haruki: a front end and a back end of a CNN; updating a CNN; employing the CNN with new input data; generating an error while processing data; updating parameters and sending out the updated parameters; having a distributed network; and producing results. Given the advantages that different ends of a CNN having different functions give the CNN a broader scope of uses, that the model adapts to new input data, that an error can be employed as a function to update the model, that updating the model refines its results, that distributing the workload lowers individual computational costs, and that answers are obtained, one having ordinary skill in the art would have been motivated to make this obvious modification.

Claims 4 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Kluckner, Maitra, Nishimura, Puri and Haruki as applied to claims 1-3, 5-11 and 13 above, and further in view of Madabhushi (U.S. Patent Publication 2018/0129911, referred to as Madabhushi).

Claim 4
Kluckner discloses wherein the front-end network includes a convolutional layer and another layer; and performing the first operation corresponding to each layer of front-end network on the sample based on the parameters of each layer of the front-end network, to obtain the first operation result of the sample includes. (Kluckner, fig 5c; There are three locations which each have three convolutional layers and pooling layers. The SG ground truths, SBP ground truths and SPP ground truths are examples of stored parameters.)
Kluckner, Haruki and Nishimura do not disclose expressly performing a convolution operation on the sample based on parameters of the convolutional layer, to obtain a first intermediate result; obtaining second intermediate results that are produced by the other GPUs performing a convolution operation on other samples based on parameters of the convolutional layer that the other GPUs store.
Puri discloses performing a convolution operation on the sample based on parameters of the convolutional layer, to obtain a first intermediate result (Puri, 0024; Similarly, in FIG. 4b, the handwriting sample 450 is combined with a convolutional kernel 460 representing a diagonal line going up and to the right. This results in a patch of pixels 460 which contains the two diagonal lines of the input character.); obtaining second intermediate results that are produced by the other GPUs performing a convolution operation on other samples based on parameters of the convolutional layer that the other GPUs store. (Puri, 0025; Additionally, rather than computing simple matrix multiplication, computations in convolutional neural networks involve more complex mathematics, with increased parallel computation required. EC: Parallel computation means parallel GPUs.) It would have been obvious to one having ordinary skill in the art, having the teachings of Kluckner, Maitra, Nishimura and Puri before him before the effective filing date of the claimed invention, to modify Kluckner, Maitra and Nishimura to incorporate the following of Puri: inputting data; implementing a function for a refined result; sending the refined result to the next stage of processing; adjusting the matrix; employing multiple GPUs; and employing a threshold as a decision engine. Given the advantages of processing inputted data for the next stage of the CNN, further processing and/or refining a result, completing the next stage, obtaining a refined CNN model, obtaining results faster, and obtaining a confidence rating with the prediction, one having ordinary skill in the art would have been motivated to make this obvious modification.
Kluckner, Haruki, Nishimura and Puri do not disclose expressly performing normalization processing on the first intermediate result based on the second intermediate results; and performing another operation on the sample based on parameters of the another layer and the first intermediate result obtained after the normalization processing, to obtain the first operation result of the sample.
Madabhushi discloses performing normalization processing on the first intermediate result based on the second intermediate results; and performing another operation on the sample based on parameters of the another layer and the first intermediate result obtained after the normalization processing, to obtain the first operation result of the sample. (Madabhushi, 0069; ‘wherein each layer of front-end network includes a convolutional layer and another layer; and…. performing normalization processing on the first intermediate result based on the second intermediate results; and performing another operation on the sample based on parameters of the another layer and the first intermediate result obtained after the normalization processing, to obtain the first operation result of the sample.’ of applicant maps to ‘In this embodiment, the CNN includes a first layer comprising a convolutional layer, a batch normalization layer, and an activation layer. In this embodiment, the convolutional layer has 16 kernels of size 3 and uses a stride of 1.’) It would have been obvious to one having ordinary skill in the art, having the teachings of Kluckner, Maitra, Nishimura, Puri, Haruki and Madabhushi before him before the effective filing date of the claimed invention, to modify Kluckner, Maitra, Nishimura, Puri and Haruki to incorporate a plurality of different layers of Madabhushi. Given the advantage that each layer has a specific purpose, and thus the ease of modifying or editing a layer, one having ordinary skill in the art would have been motivated to make this obvious modification.
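For illustration only, the sequence of operations recited in claim 4 (convolution on the local sample, collection of the other devices' convolution results, normalization of the local result against the collected results, then a further layer) can be sketched as follows. This is a simplified sketch and is not taken from the application or from any cited reference: the devices are simulated by plain Python lists rather than GPUs, the function names (conv1d, normalize_with_peers, relu) are hypothetical, pooled mean/variance normalization is assumed as the form of normalization, and a ReLU activation is assumed as the "another layer".

```python
import math

def conv1d(sample, kernel):
    """Valid 1-D cross-correlation with stride 1 (the usual CNN 'convolution')."""
    k = len(kernel)
    return [sum(sample[i + j] * kernel[j] for j in range(k))
            for i in range(len(sample) - k + 1)]

def normalize_with_peers(first, others, eps=1e-5):
    """Normalize `first` using mean/variance pooled over all devices' results."""
    pooled = list(first)
    for result in others:
        pooled.extend(result)
    mean = sum(pooled) / len(pooled)
    var = sum((x - mean) ** 2 for x in pooled) / len(pooled)
    return [(x - mean) / math.sqrt(var + eps) for x in first]

def relu(values):
    """Assumed stand-in for the 'another layer' applied after normalization."""
    return [max(0.0, v) for v in values]

# Every simulated device holds the same complete kernel (front-end parameters).
kernel = [1.0, -1.0]
# One sample per simulated device; index 0 plays the role of the first GPU.
samples = [[1.0, 2.0, 4.0], [0.0, 0.0, 3.0]]

# Each device convolves its own sample: first and second intermediate results.
intermediates = [conv1d(s, kernel) for s in samples]

# The first device normalizes its result against the others, then applies the next layer.
first_operation_result = relu(normalize_with_peers(intermediates[0], intermediates[1:]))
```

In this sketch the normalization statistics depend on every device's intermediate result, which is the point of the "based on the second intermediate results" language; whether any cited reference performs the normalization this way is a separate question that the sketch does not answer.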

Claim 12
Kluckner, Haruki and Nishimura do not disclose expressly the first operation unit is specifically configured to: perform a convolution operation on the sample based on parameters of the convolutional layer, to obtain a first intermediate result; obtain second intermediate results that are produced by the second GPUs performing a convolution operation on other samples based on parameters of the convolutional layer that the other GPUs store. 
Puri discloses the first operation unit is specifically configured to: perform a convolution operation on the sample based on parameters of the convolutional layer, to obtain a first intermediate result (Puri, 0024; Similarly, in FIG. 4b, the handwriting sample 450 is combined with a convolutional kernel 460 representing a diagonal line going up and to the right. This results in a patch of pixels 460 which contains the two diagonal lines of the input character.); obtain second intermediate results that are produced by the second GPUs performing a convolution operation on other samples based on parameters of the convolutional layer that the other GPUs store. (Puri, 0025; Additionally, rather than computing simple matrix multiplication, computations in convolutional neural networks involve more complex mathematics, with increased parallel computation required. EC: Parallel computation means parallel GPUs.) It would have been obvious to one having ordinary skill in the art, having the teachings of Kluckner, Maitra, Nishimura and Puri before him before the effective filing date of the claimed invention, to modify Kluckner, Maitra and Nishimura to incorporate the following of Puri: inputting data; implementing a function for a refined result; sending the refined result to the next stage of processing; adjusting the matrix; employing multiple GPUs; and employing a threshold as a decision engine. Given the advantages of processing inputted data for the next stage of the CNN, further processing and/or refining a result, completing the next stage, obtaining a refined CNN model, obtaining results faster, and obtaining a confidence rating with the prediction, one having ordinary skill in the art would have been motivated to make this obvious modification.
Kluckner, Haruki, Nishimura and Puri do not disclose expressly wherein the front-end network includes a convolutional layer and another layer; and….perform normalization processing on the first intermediate result based on the second intermediate results; and perform another operation on the sample based on parameters of the another layer and the first intermediate result obtained after the normalization processing, to obtain the first operation result of the sample.
Madabhushi discloses wherein the front-end network includes a convolutional layer and another layer; and…. perform normalization processing on the first intermediate result based on the second intermediate results; and perform another operation on the sample based on parameters of the another layer and the first intermediate result obtained after the normalization processing, to obtain the first operation result of the sample. (Madabhushi, 0069; ‘wherein each layer of front-end network includes a convolutional layer and another layer; and…. perform normalization processing on the first intermediate result based on the second intermediate results; and perform another operation on the sample based on parameters of the another layer and the first intermediate result obtained after the normalization processing, to obtain the first operation result of the sample.’ of applicant maps to ‘In this embodiment, the CNN includes a first layer comprising a convolutional layer, a batch normalization layer, and an activation layer. In this embodiment, the convolutional layer has 16 kernels of size 3 and uses a stride of 1.’) It would have been obvious to one having ordinary skill in the art, having the teachings of Kluckner, Maitra, Nishimura, Puri, Haruki and Madabhushi before him before the effective filing date of the claimed invention, to modify Kluckner, Maitra, Nishimura, Puri and Haruki to incorporate a plurality of different layers of Madabhushi. Given the advantage that each layer has a specific purpose, and thus the ease of modifying or editing a layer, one having ordinary skill in the art would have been motivated to make this obvious modification.

5.	Claims 1-13 are rejected.

Conclusion – Final
6.	THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
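For illustration only, the timing rule stated in the preceding paragraph can be sketched as a short calculation. This is a simplified sketch, not an official calculator: the function names (add_months, shortened_period_end) and the example dates are hypothetical, month arithmetic is assumed to clamp to the last day of a shorter month, and adjustments for weekends, federal holidays, and extensions of time under 37 CFR 1.136(a) are not modeled.

```python
import calendar
from datetime import date

def add_months(d, months):
    """Same day-of-month `months` later, clamped to that month's last day (assumption)."""
    idx = d.month - 1 + months
    year, month = d.year + idx // 12, idx % 12 + 1
    return date(year, month, min(d.day, calendar.monthrange(year, month)[1]))

def shortened_period_end(mailed, first_reply=None, advisory_mailed=None):
    """End of the shortened statutory period under the rule quoted above."""
    three_months = add_months(mailed, 3)
    six_months = add_months(mailed, 6)
    replied_within_two = (first_reply is not None
                          and first_reply <= add_months(mailed, 2))
    advisory_is_late = (advisory_mailed is not None
                        and advisory_mailed > three_months)
    if replied_within_two and advisory_is_late:
        # Period runs to the advisory action's mailing date, capped at six months.
        return min(advisory_mailed, six_months)
    return three_months

# Hypothetical dates, chosen only to exercise both branches.
mailing_date = date(2022, 10, 3)
default_end = shortened_period_end(mailing_date)
shifted_end = shortened_period_end(mailing_date,
                                   first_reply=date(2022, 11, 20),
                                   advisory_mailed=date(2023, 1, 17))
```

With these hypothetical dates the default period ends three months after mailing, while the second call returns the advisory action's mailing date because the first reply fell within two months and the advisory action came after the three-month date.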

Correspondence Information
7.	Any inquiry concerning this information or related to the subject disclosure should be directed to the Examiner Mr. Peter Coughlan, whose telephone number is (571) 272-5990 (Fax 571-273-5990).  The Examiner can be reached on Monday through Friday from 7:15 a.m. to 3:45 p.m.
	If attempts to reach the Examiner by telephone are unsuccessful, the Examiner’s supervisor Mr. Michael Huntley can be reached at (303) 297-4307.  Any response to this office action should be mailed to:
	Commissioner of Patents and Trademarks, 
	Washington, D. C. 20231;
Hand delivered to:
	Receptionist, 
	Customer Service Window, 
	Randolph Building, 
	401 Dulany Street,
	Alexandria, Virginia 22313,
	(located on the first floor of the south side of the Randolph Building);
or faxed to:
	(571) 272-3150 (for formal communications intended for entry.)
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/MICHAEL J HUNTLEY/
Supervisory Patent Examiner, Art Unit 2129