Detailed Action
This action is in response to Applicant's communications filed 25 November 2020.
Claims 1 and 19 were amended.  No claims were cancelled, withdrawn, or added.  Therefore, claims 1-20 are pending in this Application.

Response to Amendments/Arguments
Applicant's amendments, filed 25 November 2020, with respect to the objection to claim 19 have been fully considered and are sufficient to overcome the objection.  Accordingly, the objection to claim 19 has been withdrawn.  Examiner notes that the objection to claim 7 was not addressed and thus is maintained.
Applicant's arguments/amendments, filed 25 November 2020, regarding the rejections of claims 1-20 under 35 USC 112(b) have been fully considered and are sufficient to overcome the rejections.  Accordingly, the rejections to the claims under 35 USC 112(b) have been withdrawn.
Applicant's arguments, filed 25 November 2020, regarding the rejections of claims 1-20 under 35 USC 103 have been fully considered but are moot because the arguments do not apply to any of the references being used in the current rejection.
Applicant's arguments, filed 25 November 2020, regarding the rejections of claims 6-7 under 35 USC 103 have been fully considered but are not persuasive.
Regarding claim 6, Applicant argues that Ooi does not teach speculatively storing updated parameters that will be used in the future.  However, as discussed in the Office Action mailed 26 June 2020, the claim limitations are taught by Ooi, as servers and workers continually send parameters and parameter updates to each other, wherein the servers have memory and the workers work off local memory.
Regarding claim 7, Applicant argues that the prior art does not teach or suggest the features in paragraph [0087] of the Specification.  Although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims.  See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993).
For the aforementioned reasons, the claims are rejected under 35 USC 103.

Claim Objections
Claim 7 is objected to because of the following informalities:
Claim 7 should recite "first global parameters" instead of "first global" in line 5.
Appropriate correction is required.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
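A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.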

The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.


Claims 1-3, 5-10, 12-17, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Ooi et al. ("SINGA: A Distributed Deep Learning Platform"; hereinafter "Ooi") in view of Parkhi et al. ("Cats and Dogs"; hereinafter "Parkhi") and Han et al. ("Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding"; hereinafter "Han").

Regarding Claim 1,
	Ooi teaches a computer-implemented method comprising:
receiving, by a first machine (Figure 4, worker group), a first set of global parameters (Algorithm 1: BPTrainOneBatch, Collect (layer.params()) // receive parameters; "In each iteration, workers collect fresh parameters from servers" p. 687, sec. 4.1 System Architecture, par. 1) from a global parameter server (Figure 4, server group), 
wherein the first set of global parameters are weights of one or more operands used in an algorithm ("If a layer has parameters, these parameters are declared using type Param which consists of a data and a gradient blob for parameter values and gradients respectively." p. 686, sec. 3.1 NeuralNet, par. 2) that models an entity type ("deep learning has been shown to achieve (and even surpass) the accuracy of state-of-the-art algorithms in a variety of tasks, e.g., image classification [4] and multi-modal data analysis" p. 685, sec. Introduction, par. 1; this deep learning image classification teaches a model for identifying types of entities);
executing, by multiple learner processors ("a number of workers/servers are logically grouped as a worker/server group" p. 687, sec. 4.1 System Architecture, par. 1) in the first machine (Figure 4, worker group), the algorithm using the first set of global parameters and a first mini-batch of data known to describe the entity type ("In each iteration, workers collect fresh parameters from servers and issue update requests to servers after the computation… A worker group loads a subset of the training data and computes the parameter gradients for a complete model replica, known as ParamShard" p. 687, sec. 4.1 System Architecture, par. 1);
generating, by the first machine (Figure 4, worker group), a first consolidated set of gradients ("The resultant gradients are sent to the local stub that aggregates the requests and forwards them to corresponding servers for updating." p. 686, sec. 2 Overview, par. 1) that describes a direction for the first set of global parameters in order to improve an accuracy of the algorithm in modeling the entity type ("the parameter gradients for a complete model replica, known as ParamShard" p. 687, sec. 4.1 System Architecture, par. 1) when using the first set of global parameters ("parameters" p. 687, sec. 4.1 System Architecture, par. 1) and the first mini-batch of data known to describe the entity type ("a subset of the training data" p. 687, sec. 4.1 System Architecture, par. 1);
	transmitting, from the first machine, the first consolidated set of gradients to the global parameter server ("A server group maintains one replica of the full model parameters (i.e., a ParamShard), handling requests from multiple worker groups" p. 687, sec. 4.1 System Architecture, par. 1); and
	receiving, by the first machine, a second set of global parameters from the global parameter server, wherein the second set of global parameters is a modification of the first set of global parameters based on the first consolidated set of gradients ("The resultant gradients are sent to the local stub that aggregates the requests and forwards them to corresponding servers for updating.  Servers reply to workers with the updated parameters for the next iteration." p. 686, sec. 2 Overview, par. 1).
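
For illustration only, the collect/compute/aggregate/update cycle that the cited passages of Ooi describe can be sketched as follows. This is a minimal sketch under stated assumptions: the one-parameter model y = w * x, the class and function names, the learning rate, and the choice to consolidate by averaging are all hypothetical and are not taken from Ooi or from the claims.

```python
# Illustrative sketch only; every name below is hypothetical (not from Ooi/SINGA).

def gradient(w, batch):
    """Gradient of mean squared error for the toy model y = w * x."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


class ParameterServer:
    """Holds the global parameter and applies gradients sent by machines."""

    def __init__(self, w, lr):
        self.w = w
        self.lr = lr

    def collect(self):
        # A machine receives the current set of global parameters.
        return self.w

    def update(self, consolidated_gradient):
        # The server modifies the parameters and replies with the new set.
        self.w -= self.lr * consolidated_gradient
        return self.w


def machine_round(server, shards):
    """One iteration on one machine; each learner handles one data shard."""
    w = server.collect()                       # first set of parameters
    grads = [gradient(w, s) for s in shards]   # one gradient per learner
    consolidated = sum(grads) / len(grads)     # assumed: consolidate by averaging
    return server.update(consolidated)         # second set of parameters


server = ParameterServer(w=0.0, lr=0.1)
shards = [[(1.0, 2.0)], [(2.0, 4.0)]]  # toy data consistent with y = 2x
w_after_one_round = machine_round(server, shards)
```

Repeating `machine_round` moves the toy parameter toward 2, the value that fits the toy data.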

Although Ooi teaches an algorithm that models an entity type ("deep learning has been shown to achieve (and even surpass) the accuracy of state-of-the-art algorithms in a variety of tasks, e.g., image classification [4] and multi-modal data analysis" p. 685, sec. Introduction, par. 1), Ooi does not explicitly teach a first algorithm that models a first entity type, wherein each of the weights used in the first algorithm has a same value as other weights used in the first algorithm, and wherein the same value of the weights used in the first algorithm is used for parameter weights in a second algorithm that is based on the first algorithm to model a second entity type, wherein the second set of global parameters is used in the second algorithm to model the second entity type; and executing, by the first machine, the second algorithm using the second set of global parameters to describe the second entity type.

Parkhi teaches a first algorithm that models a first entity type ("The maximum response of the cat face detector (Sect. 3.1) on an image is used as an image-level score for the class cat." sec. 4.1, p. 3503), wherein the same value of the weights used in the first algorithm is used for parameter weights in a second algorithm that is based on the first algorithm to model a second entity type ("The same is done to obtain a score for the class dog. Then a linear SVM is learned to discriminate between cats and dogs based on these two scores. The classification accuracy of this model on the Oxford-IIIT PET test data is 94.21%." sec. 4.1, p. 3503; this cat/dog image classifier uses a support vector machine wherein the output score determines whether the image displays a cat or a dog, wherein the support vector machine has the same weights for its values for both the first algorithm for cats and the second algorithm for dogs), wherein the second set of global parameters is used in the second algorithm to model the second entity type ("Then a linear SVM is learned to discriminate between cats and dogs" sec. 4.1, p. 3503; the learning of the SVM teaches updated weights that teaches the second set of global parameters, the SVM that classifies cats and dogs teaches the first and second algorithms that model the first and second entity types); and executing the second algorithm using the second set of global parameters to describe the second entity type ("The classification accuracy of this model on the Oxford-IIIT PET test data is 94.21%." sec. 4.1, p. 3503; Table 4 shows the accuracy of executing the classifier, with the family column indicating the accuracy of classification of cat or dog).
Ooi and Parkhi are analogous art because both are directed towards machine learning models related to image classification. It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the distributed learning techniques of Ooi with the image classifier of Parkhi.  The modification would have been obvious because one of ordinary skill in the art would be motivated to attempt challenging machine learning projects that have technical and practical interests, as suggested by Parkhi (Parkhi: sec. 1, p. 3498), and to accelerate training speed, as suggested by Ooi (sec. 4.3, p. 688).

The Ooi/Parkhi combination does not explicitly teach wherein each of the weights used in the first algorithm has a same value as other weights used in the first algorithm.
Han teaches wherein each of the weights used in the first algorithm has a same value as other weights used in the first algorithm ("the weights are quantized so that 
Han and Ooi are analogous art because both are directed to machine learning algorithms.  It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the distributed learning techniques of the Ooi/Parkhi combination with the machine learning compression techniques of Han.  The modification would have been obvious because one of ordinary skill in the art would be motivated to improve networks in speed and energy efficiency, as suggested by Han (Han: Abstract, p. 1).
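
For illustration only, the general idea of quantized weights sharing a same value can be sketched as follows. This is a minimal sketch, not Han's method: the function name and the hand-picked codebook are hypothetical, whereas a trained quantization scheme would learn the shared values from the network.

```python
# Illustrative sketch only; names and the fixed codebook are hypothetical.

def share_weights(weights, codebook):
    """Replace each weight with the nearest shared value in the codebook."""
    return [min(codebook, key=lambda c: abs(c - w)) for w in weights]


weights = [0.11, 0.09, -0.52, 0.48, -0.49, 0.10]
codebook = [-0.5, 0.1, 0.5]   # assumed shared values, chosen by hand
shared = share_weights(weights, codebook)
```

After quantization, several connections hold an identical value, so only the small codebook and a per-weight index would need to be stored.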

Regarding Claim 2,
	The Ooi/Parkhi/Han combination teaches the computer-implemented method of claim 1.  Ooi further teaches receiving, by a second machine (Figure 4, one of the multiple worker groups), the first set of global parameters (Algorithm 1: BPTrainOneBatch, Collect (layer.params()) // receive parameters; "In each iteration, workers collect fresh parameters from servers" p. 687, sec. 4.1 System Architecture, par. 1) from the global parameter server (Figure 4, server group);
	executing, by multiple learner processors ("a number of workers/servers are logically grouped as a worker/server group" p. 687, sec. 4.1 System Architecture, par. 1) in the second machine (Figure 4, worker group), the algorithm using the first set of global parameters and a second mini-batch of data known to describe the entity type ("In each iteration, workers collect fresh parameters from servers and issue update requests to servers after the computation… A worker group loads a subset of the training data and computes the parameter gradients for a complete model replica, known as ParamShard" p. 687, sec. 4.1 System Architecture, par. 1);
	generating, by the second machine (Figure 4, worker group), a second consolidated set of gradients ("The resultant gradients are sent to the local stub that aggregates the requests and forwards them to corresponding servers for updating." p. 686, sec. 2 Overview, par. 1) that describes a direction for the first set of global parameters in order to improve the accuracy of the algorithm in modeling the first entity type ("the parameter gradients for a complete model replica, known as ParamShard" p. 687, sec. 4.1 System Architecture, par. 1) when using the first set of global parameters ("parameters" p. 687, sec. 4.1 System Architecture, par. 1);
	transmitting, from the second machine, the second consolidated set of gradients to the global parameter server ("A server group maintains one replica of the full model parameters (i.e., a ParamShard), handling requests from multiple worker groups" p. 687, sec. 4.1 System Architecture, par. 1); and
	receiving, by the first machine and the second machine, a third set of global parameters from the global parameter server, wherein the third set of global parameters is a modification of the first set of global parameters based on the first consolidated set of gradients and the second consolidated set of gradients ("The resultant gradients are sent to the local stub that aggregates the requests and forwards them to corresponding servers for updating.  Servers reply to workers with the updated parameters for the next iteration." p. 686, sec. 2 Overview, par. 1; the third set of global parameters are an update of the first set of global parameters).

Parkhi teaches a first algorithm and first entity type ("The maximum response of the cat face detector (Sect. 3.1) on an image is used as an image-level score for the class cat." sec. 4.1, p. 3503).
Ooi and Parkhi are analogous art because both are directed towards machine learning models related to image classification. It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the distributed learning techniques of Ooi with the image classifier of Parkhi.  The modification would have been obvious because one of ordinary skill in the art would be motivated to attempt challenging machine learning projects that have technical and practical interests, as suggested by Parkhi (Parkhi: sec. 1, p. 3498), and to accelerate training speed, as suggested by Ooi (sec. 4.3, p. 688).

Regarding Claim 3,
	The Ooi/Parkhi/Han combination teaches the computer-implemented method of claim 2.  Ooi and Parkhi further teach testing, by a third machine, a set of unknown data (Parkhi: "The complete MSR ASIRRA system is based on a database of several millions images of pets, equally divided between cats and dogs. Our classifiers are tested on the 24,990 images that have been made available to the public for research and evaluation purposes." sec. 2.2, p. 3501) using the third set of global parameters (Ooi: "The resultant gradients are sent to the local stub that aggregates the requests and forwards them to corresponding servers for updating.  Servers reply to workers with the updated parameters for the next iteration." p. 686, sec. 2 Overview, par. 1; the third set of global parameters are an update of the first set of global parameters) in order to determine whether the set of unknown data matches the entity type (Parkhi: "Then a linear SVM is learned to discriminate between cats and dogs based on these two scores. The classification accuracy of this model on the Oxford-IIIT PET test data is 94.21%." sec. 4.1, p. 3503).
Ooi and Parkhi are analogous art because both are directed towards machine learning models related to image classification. It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the distributed learning techniques of Ooi with the image classifier of Parkhi.  The modification would have been obvious because one of ordinary skill in the art would be motivated to attempt challenging machine learning projects that have technical and practical interests, as suggested by Parkhi (Parkhi: sec. 1, p. 3498), and to accelerate training speed, as suggested by Ooi (sec. 4.3, p. 688).

Regarding Claim 5,
	The Ooi/Parkhi/Han combination teaches the computer-implemented method of claim 1.  Ooi further teaches reading, by all of the multiple learner processors in the first machine, the first set of global parameters and the second set of global parameters from a shared memory in the first machine ("In SINGA implementation, each execution unit (worker or server) is a thread… If two execution units manage the same parameter partition and they are in the same process, SINGA can leverage the shared memory to reduce communication cost." p. 687, sec. 4.2 System Implementation, par. 1).

Regarding Claim 6,
	The Ooi/Parkhi/Han combination teaches the computer-implemented method of claim 1.  Ooi further teaches storing, by one or more processors, the first set of global parameters currently in use by the first machine when executing the first algorithm in a first memory in the first machine ("In SINGA implementation, each execution unit (worker or server) is a thread. As a process may contain multiple threads, there could be multiple (worker/server) groups in one process… the main thread runs as a stub thread as shown in Figure 1, which aggregates local requests and sends them to remote stubs. Hence, each unit only sends and receives messages from its local stub." p. 687, sec. 4.2 System Implementation, par. 1; this teaches that each worker stores parameters that it is using in local memory); and
	storing, by one or more processors, the second set of global parameters being downloaded from the global parameter server for future use, in a second memory in the first machine ("In SINGA implementation, each execution unit (worker or server) is a thread. As a process may contain multiple threads, there could be multiple (worker/server) groups in one process… the main thread runs as a stub thread as shown in Figure 1, which aggregates local requests and sends them to remote stubs. Hence, each unit only sends and receives messages from its local stub." p. 687, sec. 4.2 System Implementation, par. 1; this teaches that a machine stores global parameters in a stub thread that can be sent to workers that request the parameters; "SINGA also supports different neural net partitioning schemes to parallelize the training of large models, namely partitioning on batch dimension, feature dimension or hybrid partitioning" sec. 1, p. 685)
wherein the second set of global parameters (Figure 4, ParamShard in Server Group) are speculatively generated global parameters (Algorithm 1 and Algorithm 2, Update(layer.params())// send gradients; "servers maintain up-to-date parameters... from workers" sec. 4.1, p. 687; "a server group maintains one replica of the full model parameters (i.e., a ParamShard), handling requests from multiple worker groups" sec. 4.1, p. 687; the global parameters are updated through the gradients determined by the workers), generated by the global parameter server ("a server group maintains one replica of the full model parameters (i.e., a ParamShard), handling requests from multiple worker groups" sec. 4.1, p. 687; the server group uses the gradients from the workers to update the parameters), to be used by the first machine in a next iteration of the algorithm ("In each iteration, workers collect fresh parameters from servers" sec. 4.1, p. 687).

Regarding Claim 7,
	The Ooi/Parkhi/Han combination teaches the computer-implemented method of claim 1.  Ooi further teaches wherein a first version number is assigned to the first set of global parameters, wherein a second version number is assigned to the second set of global parameters, and wherein the computer-implemented method further comprises:
identifying, by the first machine, all parameters from the first global according to the first version number ("SINGA comes with many popular protocols for updating parameter values based on gradients. If users want to implement their own updating protocols, they can extend the base Updater to override the Update function." sec. 3.3, p. 687; "For each SGD iteration, every worker calls the Train-OneBatch function to compute gradients of parameters associated with local layers" sec. 3.2, p. 687; the parameters are separated based on iteration, which teaches the first and second version numbers);
determining, by the first machine, that said all parameters from the first global parameters have been used by the first machine to generate the first consolidated set of gradients based on said identifying all parameters from the first global according to the first version number ("In each iteration, workers collect fresh parameters from servers and issue update requests to servers after the computation… A worker group loads a subset of the training data and computes the parameter gradients for a complete model replica, known as ParamShard" p. 687, sec. 4.1 System Architecture, par. 1); and 
in response to determining that all parameters from the first global parameters have been used by the first machine to generate the first consolidated set of gradients ("servers maintain up-to-date parameters and handle get/update requests from workers." sec. 4.1, p. 687), utilizing the second set of global parameters, as identified by the second version number, to generate, by the first machine, a second consolidated set of gradients that further describes the direction for the first set of global parameters ("In each iteration, workers collect fresh parameters from servers and issue update requests to servers after the computation… A worker group loads a subset of the training data and computes the parameter gradients for a complete model replica, known as ParamShard" p. 687, sec. 4.1 System Architecture, par. 1).
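
For illustration only, one hypothetical reading of the version-number limitations of claim 7 can be sketched as follows. This is a minimal sketch under stated assumptions: the class names, the per-parameter "used" tracking, and the switch-over rule are invented for the example and are not taken from Ooi or from the Specification.

```python
# Illustrative sketch only; all names and the switch-over rule are hypothetical.

class VersionedParams:
    """A set of global parameters tagged with a version number."""

    def __init__(self, version, values):
        self.version = version
        self.values = dict(values)
        self.used = set()          # names of parameters already read

    def read(self, name):
        self.used.add(name)
        return self.values[name]

    def all_used(self):
        return self.used == set(self.values)


class Machine:
    """Holds the parameter set in use and a downloaded set for future use."""

    def __init__(self, current, pending):
        self.current = current
        self.pending = pending

    def maybe_advance(self):
        # Switch to the next version only once every parameter of the
        # current version has been used.
        if self.pending is not None and self.current.all_used():
            self.current, self.pending = self.pending, None
            return True
        return False


machine = Machine(
    VersionedParams(1, {"w": 0.5, "b": 0.1}),
    VersionedParams(2, {"w": 0.4, "b": 0.2}),
)
machine.current.read("w")
advanced_early = machine.maybe_advance()   # "b" of version 1 not yet used
machine.current.read("b")
advanced_late = machine.maybe_advance()    # all of version 1 now used
```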

Regarding Claims 8-10 and 12-14,
Claims 8-10 and 12-14 recite a computer program product with executable instructions corresponding to the method steps recited in claims 1-3 and 5-7, respectively.  The Ooi/Parkhi/Han combination teaches the limitations of claims 8-10 and 12-14 as set forth above in connection with claims 1-3 and 5-7.  Therefore, claims 8-10 and 12-14 are rejected under the same rationale as respective claims 1-3 and 5-7.

Regarding Claim 15,
	The Ooi/Parkhi/Han combination teaches the computer program product of claim 8.  Ooi further teaches wherein the program instructions are provided as a service in a cloud environment (Figure 4, Server groups; "A server group maintains one replica of the full model parameters (i.e., a ParamShard), handling requests from multiple worker groups. Neighboring server groups synchronize their parameters periodically." p. 687, sec. 4.1 System Architecture, par. 1; the synchronizing server groups teach a cloud environment).

Regarding Claims 16-17,
Claim(s) 16-17 recite(s) a computer system including processor, memory, computer readable storage mediums, and executable instructions corresponding to the method steps recited in claim(s) 1-2, respectively.  The Ooi/Parkhi/Han combination teaches the limitations of claim(s) 16-17 as set forth above in connection with claim(s) 1-

Regarding Claim 19,
	The Ooi/Parkhi/Han combination teaches the computer-implemented method of claim 1.  Ooi and Parkhi further teach wherein the first set of global parameters is a vector (Ooi: "parameters associated with local layers" sec. 3.2, p. 687; "The srclayer vector records all source layers" sec. 3.1, p. 686), wherein the multiple learner processors are graph processing units (GPUs) (Ooi: "GPU support and integration with cluster management software like Mesos will be added in near future" sec. 1, p. 685), and wherein the computer-implemented method further comprises:
generating, by the GPUs, the first consolidated set of gradients (Ooi: "layer.ComputeGradient()" Algorithm 1, Algorithm 2, p. 687) from the first set of global parameters (Ooi: "Collect(layer.params())" Algorithm 1, Algorithm 2, p. 687) and the first mini-batch of data (Ooi: "TrainOneBatch function" sec. 3.2, p. 687) known to describe the first entity type (Parkhi: "Then a linear SVM is learned to discriminate between cats and dogs based on these two scores. The classification accuracy of this model on the Oxford-IIIT PET test data is 94.21%." sec. 4.1, p. 3503).
Ooi and Parkhi are analogous art because both are directed towards machine learning models related to image classification. It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the distributed learning techniques of Ooi with the image classifier of Parkhi.  The modification would have been obvious because one of ordinary skill in the art would be motivated to attempt challenging machine learning projects that have technical and practical interests, as suggested by Parkhi (Parkhi: sec. 1, p. 3498), and to accelerate training speed, as suggested by Ooi (sec. 4.3, p. 688).

Claims 4 and 11 are rejected under 35 U.S.C. 103 as being unpatentable over Ooi et al. ("SINGA: A Distributed Deep Learning Platform"; hereinafter "Ooi") in view of Parkhi et al. ("Cats and Dogs"; hereinafter "Parkhi") and Han et al. ("Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding"; hereinafter "Han"), and further in view of Forrest et al. ("Implementing Neural Network Models on Parallel Computers"; hereinafter "Forrest").

Regarding Claim 4,
	The Ooi/Parkhi/Han combination teaches the computer-implemented method of claim 1.  Ooi further teaches wherein the computer-implemented method further comprises: generating each gradient from the first consolidated set of gradients by a different learner processor in the first machine ("In each iteration, every worker calls TrainOneBatch function to compute parameter gradients" p. 685, sec. Overview, par. 1; Algorithm 1: BPTrainOneBatch… 5 layer.ComputeGradient()// backward prop);
	writing, by each of the multiple learner processors in the first machine, each gradient generated by each of the multiple learner processors to a memory in the first machine ("In each iteration, every worker calls TrainOneBatch function to compute parameter gradients" p. 685, sec. Overview, par. 1; Algorithm 1: BPTrainOneBatch… 5 layer.ComputeGradient()// backward prop; "servers maintain up-to-date parameters and handle get/update requests from workers" sec. 4.1, p. 687; Examiner notes that storing computations in memory is inherent in computer processing); and
	consolidating, by the first machine, gradients generated by all of the multiple learner processors in the first machine in order to create the first consolidated set of gradients ("The resultant gradients are sent to the local stub that aggregates the requests and forwards them to corresponding servers for updating." p. 686, sec. 2 Overview, par. 1; "servers maintain up-to-date parameters and handle get/update requests from workers" sec. 4.1, p. 687).

Ooi does not explicitly teach wherein the multiple learner processors are hardware processors.
Forrest teaches wherein the multiple learner processors are hardware processors ("The Edinburgh machine has 42 processors: a host, a graphics processor and 40 'workers' (Figure 2).  Each of these workers is a transputer with 256K of external memory." sec. 2.1.2, p. 414).
Ooi and Forrest are analogous art because both are directed to implementing neural networks using data parallelism architecture. It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the distributed training of the Ooi/Parkhi/Han combination with the parallel processor architectures of Forrest.  

Regarding Claim 11,
Claim 11 recites a computer program product with executable instructions corresponding to the method steps recited in claim 4.  The Ooi/Parkhi/Han/Forrest combination teaches the limitations of claim 11 as set forth above in connection with claim 4.  Therefore, claim 11 is rejected under the same rationale as claim 4.

Claim 18 is rejected under 35 U.S.C. 103 as being unpatentable over Ooi et al. ("SINGA: A Distributed Deep Learning Platform"; hereinafter "Ooi") in view of Parkhi et al. ("Cats and Dogs"; hereinafter "Parkhi") and Han et al. ("Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding"; hereinafter "Han"), and further in view of Cotter et al. ("Better Mini-Batch Algorithms via Accelerated Gradient Methods"; hereinafter "Cotter").

Regarding Claim 18,
	The Ooi/Parkhi/Han combination teaches the computer-implemented method of claim 1.  The Ooi/Parkhi/Han combination does not explicitly teach wherein the first consolidated set of gradients is an average of multiple gradients generated by the first machine.

	Cotter teaches wherein the first consolidated set of gradients is an average of multiple gradients generated by the first machine ("A popular way to speed-up these algorithms, especially in a parallel setting, is via mini-batching, where the incremental update is performed on an average of the subgradients with respect to several instances at a time, rather than a single instance." sec. 1, p. 1).
	Ooi and Cotter are analogous art because both are directed to machine learning using batches.  It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the batch training of the Ooi/Parkhi/Han combination with the mini-batch averaging of Cotter.  The modification would have been obvious because one of ordinary skill in the art would be motivated to speed up training, especially in a parallel setting, as suggested by Cotter (sec. 1, p. 1).
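
For illustration only, the mini-batch averaging described in the quoted passage of Cotter can be written as the following update. The symbols are assumptions introduced for this sketch and do not appear in the cited references: $w_t$ is the parameter vector at iteration $t$, $b$ is the number of instances in the mini-batch, $g_i(w_t)$ is the (sub)gradient computed on instance $i$, and $\eta$ is a step size.

```latex
\bar{g}_t = \frac{1}{b} \sum_{i=1}^{b} g_i(w_t),
\qquad
w_{t+1} = w_t - \eta \, \bar{g}_t
```

Under this reading, $\bar{g}_t$ plays the role of a consolidated set of gradients that is an average of multiple gradients.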

Claim 20 is rejected under 35 U.S.C. 103 as being unpatentable over Ooi et al. ("SINGA: A Distributed Deep Learning Platform"; hereinafter "Ooi") in view of Parkhi et al. ("Cats and Dogs"; hereinafter "Parkhi") and Han et al. ("Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding"; hereinafter "Han"), and further in view of Georgescu et al. (US 2016/0174902; hereinafter "Georgescu").

Regarding Claim 20,
	The Ooi/Parkhi/Han combination teaches the computer-implemented method of claim 1.  Ooi does not explicitly teach determining that the one or more operands, used 

	Georgescu teaches determining that the one or more operands, used in the algorithm that models the entity type, comprise a first operand and a second operand; determining that results of any operations that uses a multiplication operator is inconsequential to modeling the first entity type; determining that the first operand is multiplied by the second operand in the first algorithm; and in response to determining that the first operand is multiplied by the second operand, applying a weight that approaches zero to a product of the first operand and the second operand, wherein applying the weight that approaches zero to the first operand removes the product of the first operand and the second operand from the algorithm before executing the algorithm ("As shown in FIG. 15, at 1502, a deep neural network is pre-trained.... The sparsity injection method... is applied over a preset number of T training rounds (stages).  In each round, at 1506, from the active weights of the considered filters, a percentage of active weights having the smallest absolute values are greedily selected and set to zero to permanently remove the corresponding neural connections from the deep neural network." [0096]; this teaches removing values that are inconsequential, Examiner notes that it would be obvious to one of ordinary skill in the art that this includes terms that are multiplied together; "This re-weighting scheme reduces the effect of the L1 norm in the adapted objective function by multiplying each weight in the L1 norm with a term approximating the inverse of its magnitude. The re-weighting of the L1 norm makes the regularization look more like L0 norm regularization, and drives a large number of weights that are less relevant to the final classification result to zero." [0099]; this teaches applying a weight to the inconsequential terms to remove the terms from the algorithm).
	Ooi and Georgescu are analogous art because both are directed to training deep neural networks.  It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the distributed training of the Ooi/Parkhi/Han combination with the sparse deep neural network training of Georgescu.  The modification would have been obvious because one of ordinary skill in the art would be motivated by the computational efficiency, as suggested by Georgescu ([0037]). 
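
For illustration only, the step quoted from Georgescu at [0096], in which a percentage of the active weights having the smallest absolute values is set to zero, can be sketched as follows. This is a minimal sketch of a single round under stated assumptions: the function name, the flat list of weights, and the 50% fraction are hypothetical, and the retraining between rounds is omitted.

```python
# Illustrative sketch only; names and the chosen fraction are hypothetical.

def prune_smallest(weights, fraction):
    """Zero out the given fraction of active (nonzero) weights that have
    the smallest absolute values; already-zero weights stay removed."""
    active = [i for i, w in enumerate(weights) if w != 0.0]
    k = int(len(active) * fraction)
    smallest = sorted(active, key=lambda i: abs(weights[i]))[:k]
    pruned = list(weights)
    for i in smallest:
        pruned[i] = 0.0
    return pruned


weights = [0.8, -0.05, 0.3, 0.01, -0.6, 0.2]
pruned = prune_smallest(weights, 0.5)
```

A zeroed weight contributes nothing to any product it takes part in, which is the sense in which the corresponding connection is removed.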

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to CHARLES C KUO whose telephone number is (571)270-7477.  The examiner can normally be reached on M-F: 9:00 a.m. - 6:00 p.m.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ann Lo, can be reached at (571) 272-9767.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer 






/CHARLES C KUO/Examiner, Art Unit 2126      
/ANN J LO/Supervisory Patent Examiner, Art Unit 2126