DETAILED ACTION
This action is in response to the application 15/945647 filed on April 04, 2018. Claims 1-20 are pending and have been examined.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Objections
3.	Claim 17 is objected to because of the following informalities:
In claim 17, “wherein” should read “wherein the method further comprises” as in claim 16. 
Appropriate correction is required.
Claim Rejections - 35 USC § 112
4.	The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


5.	Claim 16 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.

Claim Rejections - 35 USC § 101
7.	35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


8.	Claims 1, 5, 8, 12, 15, and 19 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 analysis:
In the instant case, claims 1 and 5 are directed to a system, claims 8 and 12 are directed to a method, and claims 15 and 19 are directed to a non-transitory computer-readable medium. Thus, the claims fall within one of the four statutory categories (i.e., process, machine, manufacture, or composition of matter).
Step 2 analysis:
Step 2A: Prong 1 analysis:
The claim(s) recite(s):
Claims 1, 8, and 15:
- identify a plurality of work servers … (mental process);
- processing samples via a model of the DNN (mental process);
- determine differences in the GPU processing powers between the work servers … (mathematical concept);

Accordingly, the claims recite an abstract idea which is one of the judicial exceptions.
Step 2A: Prong 2 analysis:
This judicial exception is not integrated into a practical application because the additional element “receive information indicating GPU processing powers…” in claims 1, 8, and 15 is mere data gathering, which is insignificant extra-solution activity, as discussed in MPEP 2106.05(g). Additionally, the additional elements “Deep Neural Network”, “work servers”, and “training data” in claims 1, 8, and 15 generally link the use of the judicial exception to a particular technological environment or field of use (e.g., neural networks), as discussed in MPEP 2106.05(h). The additional elements “GPU” in claims 1, 8, and 15; “memory” in claim 1; and “computer readable medium” and “processor” in claim 15 are recited at a high level of generality such that they amount to no more than mere instructions to apply the exception using a generic computer component. Please see MPEP 2106.04(a)(2)(III)(C). The claims do not recite additional elements that integrate the judicial exception into a practical application. The claims are directed to an abstract idea.
Step 2B analysis:
The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of claims 1, 8, and 15 merely add insignificant extra-solution activity to the judicial exception and generally link the use of the judicial exception to a particular technological environment or field of use. The receiving step is an insignificant extra-solution activity that is a well understood, routine, and 
9.	Claims 5, 12, and 19 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more, and the rejection of claims 1, 8, and 15 is incorporated into the rejection of claims 5, 12, and 19, respectively. Claims 5, 12, and 19 recite more specifics of the judicial exceptions identified in the rejection of claims 1, 8, and 15, respectively. The limitations “determine an integer number of epochs of training…” and “allocate the samples among the work servers…” are more specific to the judicial exceptions in claims 1, 8, and 15, respectively. The limitations are an abstract idea of the “mental process” grouping. The additional elements “Deep Neural Network” and “work servers” generally link the use of the judicial exception to a particular technological environment or field of use (e.g., neural networks), as discussed in MPEP 2106.05(h). Claims 5, 12, and 19 do not recite any additional elements, other than the ones recited in claims 1, 8, and 15, which integrate the judicial exception into a practical application or amount to significantly more. The claims are not patent eligible.
Claim Rejections - 35 USC § 103
10.	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

11.	Claims 1-4, 6, 8-11, 13, 15-18, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over “Model Accuracy and Runtime Tradeoff in Distributed Deep Learning: A Systematic Study” by Gupta et al. (hereinafter, “Gupta”), in view of US 20170091668 A1 to Kadav et al. (hereinafter, “Kadav”).
12.	As per claim 1, Gupta teaches: A system comprising: a memory configured to store samples of training data for a Deep Neural Network (DNN) (Gupta pg. 174 right col. para. 3 “this system contains… 128GB of memory”);
	a distributor configured to identify a plurality of work servers provisioned for training the DNN by processing the samples via a model of the DNN (Gupta on pg. 172 discloses learners (work servers) that use data parallelism and mini-batches to train a model of deep neural networks across multiple learners),
to receive information indicating Graphics Processing Unit (GPU) processing powers at the work servers (Gupta on pg. 171 right col. para. 1 discloses “improve the neural network’s performance (measured as classification accuracy) while working under the constraints of the computational resources available in a single computing node (CPU with or without GPU acceleration)”). Gupta fails to explicitly teach to determine differences in… processing powers between the work servers based on the information and to allocate the samples among the work servers based on the differences. However, Kadav teaches: 
	to determine differences in… processing powers between the work servers based on the information (Kadav in claim 6 has to measure the computation capability of the processors in order to dynamically adjust the communication “batch size to balance computation and communication overhead and ensuring convergence even with a mismatch in processing abilities on different computer learning nodes.”)
to allocate the samples among the work servers based on the differences (Kadav on pg. 2 para. 1 allocates updates by first preferring connections to machines with higher throughput compared to machines with lower throughput). 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Gupta and incorporate the teachings of Kadav with a motivation to determine differences in GPU processing powers and to allocate samples to servers based on the differences. One would be motivated to use the combination because Gupta teaches GPU processing, which can be combined with Kadav to determine differences in processing and to select machines with higher throughput (Kadav pg. 4 last para.). Machines with higher throughput are preferred because of efficiency, faster training, low running costs, and improved model accuracy (Kadav pg. 2 para. 2).
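For illustration only, the limitation of allocating samples among work servers based on differences in processing power can be pictured as a proportional split. The following Python sketch is not taken from the application, Gupta, or Kadav; the function name `allocate_samples` and the per-server power scores are hypothetical, and the sketch assumes each server reports a single positive integer score.

```python
def allocate_samples(num_samples, gpu_powers):
    """Split num_samples among servers in proportion to GPU power.

    gpu_powers: hypothetical positive integer scores, one per server.
    Returns one sample count per server; the counts sum to num_samples.
    """
    total = sum(gpu_powers)
    # Floor of each server's proportional share.
    counts = [num_samples * p // total for p in gpu_powers]
    # Hand any leftover samples to the most powerful servers first.
    leftover = num_samples - sum(counts)
    by_power = sorted(range(len(gpu_powers)),
                      key=lambda i: gpu_powers[i], reverse=True)
    for i in by_power[:leftover]:
        counts[i] += 1
    return counts


print(allocate_samples(1000, [10, 20, 10]))  # [250, 500, 250]
```

Under these assumed scores, the server with twice the power score receives twice as many samples.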
13.	As per claim 2, the combination of Gupta and Kadav as shown above teaches the system of claim 1, wherein:
	the distributor is configured to determine throughputs of the work servers (Gupta on pg. 176 right col. para. 2 discloses “Reducing the mini-batch size cause a proportionate decrease in the GEMM throughput and slower processing of the mini-batch by the learner”),
	to select batch sizes for the work servers based on the throughputs (Kadav in claim 13 discloses “adjusting the communication batch sizes to automatically balance processor and network loads.” Kadav selects the batch sizes based on machines with different network throughputs and then prefers networks with higher throughput, as disclosed on pg. 2 para. 1),
	to report the batch sizes to the work servers (Gupta on pg. 178 discloses “reduce the mini-batch size as the number of learners is increased”);
	each batch size defines a number of the samples in a batch for simultaneous processing by one of the work servers (Gupta in Figure 5 and pg. 176 last para. discloses “the contour labeled μ = 128 is the configurations with the mini-batch size per learner is kept constant at 128” where the number of learners ranges from 1 to 30).
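As a hedged illustration of the limitation of selecting batch sizes based on throughputs, the sketch below scales a base batch size by each server's throughput relative to the slowest server. It is an assumption made for clarity, not the method of the application, Gupta, or Kadav; `select_batch_sizes`, the samples-per-second figures, and the base size of 128 (chosen only to echo the mini-batch size quoted above) are hypothetical.

```python
def select_batch_sizes(throughputs, base_batch=128):
    """Pick a batch size per server, scaled to its measured throughput.

    throughputs: hypothetical samples-per-second figures, one per server.
    The slowest server gets base_batch; faster servers get proportionally more.
    """
    slowest = min(throughputs)
    return [max(1, round(base_batch * t / slowest)) for t in throughputs]


print(select_batch_sizes([100, 200, 150]))  # [128, 256, 192]
```

Calling the function again with fresh throughput measurements during training would correspond to adjusting the batch sizes dynamically.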
	As per claim 3, the combination of Gupta and Kadav as shown above teaches the system of claim 2, wherein: 
the distributor is configured to dynamically determine the throughputs (Gupta on pg. 176 right col. para. 2 discloses “Reducing the mini-batch size cause a proportionate decrease in the GEMM throughput”),
to adjust the batch sizes based on the throughputs during training of the DNN (Kadav on claim 6 discloses “dynamically adjusting the communication batch size to balance computation and communication overhead and ensuring convergence even with a mismatch in processing abilities on different computer learning nodes”).
As per claim 4, the combination of Gupta and Kadav as shown above teaches the system of claim 2, further comprising:
at least one revision element configured to receive input from one of the work servers upon completion of processing of a batch of the samples at the one of the work servers (Gupta on pg. 175 para. 2 inputs an image with a batch size of 256 as training data for deep convolutional neural network),
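to determine adjustments to the DNN based on the input (Gupta on pg. 171 last para – pg. 172 para. 1 teaches, “A neural network computes a parametric, non-linear transformation fθ : X → Y, where θ represents a set of adjustable parameters (or weights). In a supervised learning context (such as image classification), X is the input image and Y corresponds to the label assigned to the image...  The terminal layer generates the network’s output Ŷ = fθ (X), corresponding to the input X”), and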
to report the adjustments to the work servers for updating the model of the DNN (Gupta on pg. 172 left col. teaches “pushGradient: Send the computed gradients to the parameter server” that sends the adjustments to the learners.)
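The receive, determine, and report steps mapped above resemble a parameter-server update loop. The following minimal Python sketch shows one such loop under stated assumptions: the class name `RevisionElement`, the plain gradient-descent rule, and the learning rate are hypothetical and are not asserted to be what the application or Gupta implements.

```python
class RevisionElement:
    """Toy parameter-server-style element (hypothetical names throughout)."""

    def __init__(self, weights, learning_rate=0.5):
        self.weights = list(weights)
        self.learning_rate = learning_rate

    def receive(self, gradient):
        """Take a gradient from one work server after it finishes a batch,
        determine the adjustments, apply them, and return them for reporting."""
        adjustments = [-self.learning_rate * g for g in gradient]
        self.weights = [w + a for w, a in zip(self.weights, adjustments)]
        return adjustments


element = RevisionElement([1.0, 2.0])
print(element.receive([2.0, -4.0]))  # [-1.0, 2.0]
print(element.weights)               # [0.0, 4.0]
```

The returned adjustments stand in for what would be reported back to the work servers so that each can update its copy of the model.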
As per claim 6, the combination of Gupta and Kadav as shown above teaches the system of claim 1, wherein:
at least one of the samples comprises an image (Gupta on pg. 175 left col. para. 2 discloses “The training set is a subset of the hand-labeled ImageNet database and contains 1.2 million images”).
14.	As per claim 8, Gupta teaches: A method comprising:
	identifying a plurality of work servers provisioned for training a Deep Neural Network (DNN) by processing samples of training data via a model of the DNN (Gupta on pg. 172 discloses learners (work servers) that use data parallelism and mini-batches to train a model of deep neural networks across multiple learners),
receiving information indicating Graphics Processing Unit (GPU) processing powers at the work servers (Gupta on pg. 171 right col. para. 1 discloses “improve the neural network’s performance (measured as classification accuracy) while working under the constraints of the computational resources available in a single computing node (CPU with or without GPU acceleration)”). Gupta fails to explicitly teach determining differences in… processing powers between the work servers based on the information and allocating the samples among the work servers based on the differences. However, Kadav teaches: 
	determining differences in… processing powers between the work servers based on the information (Kadav in claim 6 has to measure the computation capability of the processors in order to dynamically adjust the communication “batch size to balance computation and communication overhead and ensuring convergence even with a mismatch in processing abilities on different computer learning nodes.”)
	allocating the samples among the work servers based on the differences (Kadav on pg. 2 para. 1 allocates updates by first preferring connections to machines with higher throughput compared to machines with lower throughput).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Gupta and incorporate the teachings of Kadav with a motivation to determine differences in GPU processing powers and to allocate samples to servers based on the differences. One would be motivated to use the combination because Gupta teaches GPU processing, which can be combined with Kadav to determine differences in processing and to select machines with higher throughput (Kadav pg. 4 last para.). Machines with higher throughput are preferred because of efficiency, faster training, low running costs, and improved model accuracy (Kadav pg. 2 para. 2).
15.	As per claim 9, the combination of Gupta and Kadav as shown above teaches the method of claim 8, further comprising:
	determining throughputs of the work servers (Gupta on pg. 176 right col. para. 2 discloses “Reducing the mini-batch size cause a proportionate decrease in the GEMM throughput and slower processing of the mini-batch by the learner”),
	selecting batch sizes for the work servers based on the throughputs (Kadav in claim 13 discloses “adjusting the communication batch sizes to automatically balance processor and network loads.” Kadav selects the batch sizes based on machines with different network throughputs and then prefers networks with higher throughput, as disclosed in para. [0005]),
	reporting the batch sizes to the work servers (Gupta on pg. 178 discloses “reduce the mini-batch size as the number of learners is increased”); wherein
	each batch size defines a number of the samples in a batch for simultaneous processing by one of the work servers (Gupta in Figure 5 and pg. 176 last para. discloses “the contour labeled μ = 128 is the configurations with the mini-batch size per learner is kept constant at 128” where the number of learners ranges from 1 to 30).
	As per claim 10, the combination of Gupta and Kadav as shown above teaches the method of claim 9 further comprising: 
dynamically determining the throughputs (Gupta on pg. 176 right col. para. 2 discloses “Reducing the mini-batch size cause a proportionate decrease in the GEMM throughput”),
adjusting the batch sizes based on the throughputs during training of the DNN (Kadav on claim 6 discloses “dynamically adjusting the communication batch size to balance computation and communication overhead and ensuring convergence even with a mismatch in processing abilities on different computer learning nodes”).
As per claim 11, the combination of Gupta and Kadav as shown above teaches the method of claim 9, further comprising:
receiving input from one of the work servers upon completion of processing of a batch of the samples at the one of the work servers (Gupta on pg. 175 para. 2 inputs an image with a batch size of 256 as training data for deep convolutional neural network),
determining adjustments to the DNN based on the input (Gupta on pg. 171 last para – pg. 172 para. 1 teaches, “A neural network computes a parametric, non-linear transformation fθ : X → Y, where θ represents a set of adjustable parameters (or weights). In a supervised learning context (such as image classification), X is the input image and Y corresponds to the label assigned to the image...  The terminal layer generates the network’s output Ŷ = fθ (X), corresponding to the input X”), and
reporting the adjustments to the work servers for updating the model of the DNN (Gupta on pg. 172 left col. teaches “pushGradient: Send the computed gradients to the parameter server” that sends the adjustments to the learners.)
As per claim 13, the combination of Gupta and Kadav as shown above teaches the method of claim 8 wherein:
at least one of the samples comprises an image (Gupta on pg. 175 left col. para. 2 discloses “The training set is a subset of the hand-labeled ImageNet database and contains 1.2 million images”).
16.	As per claim 15, Gupta teaches: A non-transitory computer readable medium embodying programmed instructions which, when executed by a processor, are operable for performing a method comprising:
	identifying a plurality of work servers provisioned for training a Deep Neural Network (DNN) by processing samples of training data via a model of the DNN (Gupta on pg. 172 discloses learners (work servers) that use data parallelism and mini-batches to train a model of deep neural networks across multiple learners),
receiving information indicating Graphics Processing Unit (GPU) processing powers at the work servers (Gupta on pg. 171 right col. para. 1 discloses “improve the neural network’s performance (measured as classification accuracy) while working under the constraints of the computational resources available in a single computing node (CPU with or without GPU acceleration)”). Gupta fails to explicitly teach determining differences in… processing powers between the work servers based on the information and allocating the samples among the work servers based on the differences. However, Kadav teaches: 
	determining differences in… processing powers between the work servers based on the information (Kadav in claim 6 has to measure the computation capability of the processors in order to dynamically adjust the communication “batch size to balance computation and communication overhead and ensuring convergence even with a mismatch in processing abilities on different computer learning nodes.”)
	allocating the samples among the work servers based on the differences (Kadav in pg. 2 para. 1 allocates updates by first preferring connections to machines with higher throughput compared to machines with lower throughput). 
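Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Gupta and incorporate the teachings of Kadav with a motivation to determine differences in GPU processing powers and to allocate samples to servers based on the differences. One would be motivated to use the combination because Gupta teaches GPU processing, which can be combined with Kadav to determine differences in processing and to select machines with higher throughput (Kadav pg. 4 last para.). Machines with higher throughput are preferred because of efficiency, faster training, low running costs, and improved model accuracy (Kadav pg. 2 para. 2).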
17.	As per claim 16, the combination of Gupta and Kadav as shown above teaches the medium of claim 15 wherein the method further comprises:
	determining throughputs of the work servers (Gupta on pg. 176 right col. para. 2 discloses “Reducing the mini-batch size cause a proportionate decrease in the GEMM throughput and slower processing of the mini-batch by the learner”),
	selecting batch sizes for the work servers based on the throughputs (Kadav in claim 13 discloses “adjusting the communication batch sizes to automatically balance processor and network loads.” Kadav selects the batch sizes based on machines with different network throughputs and then prefers networks with higher throughput, as disclosed in para. [0005]),
	reporting the batch sizes to the work servers (Gupta on pg. 178 discloses “reduce the mini-batch size as the number of learners is increased”); wherein
	each batch size defines a number of the samples in a batch for simultaneous processing by one of the work servers (Gupta in Figure 5 and pg. 176 last para. discloses “the contour labeled μ = 128 is the configurations with the mini-batch size per learner is kept constant at 128” where the number of learners ranges from 1 to 30).
	As per claim 17, the combination of Gupta and Kadav as shown above teaches the medium of claim 16 wherein: 
dynamically determining the throughputs (Gupta on pg. 176 right col. para. 2 discloses “Reducing the mini-batch size cause a proportionate decrease in the GEMM throughput”),
adjusting the batch sizes based on the throughputs during training of the DNN (Kadav on claim 6 discloses “dynamically adjusting the communication batch size to balance computation and communication overhead and ensuring convergence even with a mismatch in processing abilities on different computer learning nodes”).
As per claim 18, the combination of Gupta and Kadav as shown above teaches the medium of claim 16 wherein the method further comprises:
receiving input from one of the work servers upon completion of processing of a batch of the samples at the one of the work servers (Gupta on pg. 175 para. 2 inputs an image with a batch size of 256 as training data for deep convolutional neural network),
determining adjustments to the DNN based on the input (Gupta on pg. 171 last para – pg. 172 para. 1 teaches, “A neural network computes a parametric, non-linear transformation fθ : X → Y, where θ represents a set of adjustable parameters (or weights). In a supervised learning context (such as image classification), X is the input image and Y corresponds to the label assigned to the image...  The terminal layer generates the network’s output Ŷ = fθ (X), corresponding to the input X”), and
reporting the adjustments to the work servers for updating the model of the DNN (Gupta on pg. 172 left col. teaches “pushGradient: Send the computed gradients to the parameter server” that sends the adjustments to the learners.)
As per claim 20, the combination of Gupta and Kadav as shown above teaches the medium of claim 15 wherein:
at least one of the samples comprises an image (Gupta on pg. 175 left col. para. 2 discloses “The training set is a subset of the hand-labeled ImageNet database and contains 1.2 million images”).
18.	Claims 5, 12, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Gupta, in view of Kadav, further in view of US 20190095794 A1 to Aldana et al. (hereinafter, “Aldana”).
As per claims 5, 12, and 19, the combination of Gupta and Kadav teaches the system of claim 1, the method of claim 8, and the medium of claim 15, respectively. Gupta and Kadav fail to explicitly teach determine an integer number of epochs of training to perform on the DNN; and allocate the samples among the work servers so that the integer number of the epochs will be completed but not exceeded during training. However, Aldana teaches:
determine an integer number of epochs of training to perform on the DNN (Aldana on para. [0063] discloses an epoch counter that “Upon reaching the maximum number of desired epochs, the neural network should be sufficiently trained and have reached stability”);
allocate the samples among the work servers so that the integer number of the epochs will be completed but not exceeded during training (Aldana on para. [0063] discloses that if the epoch counter meets or exceeds the maximum number of desired epochs the example training process will terminate).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Gupta and Kadav and incorporate the teachings of Aldana with a motivation to determine the number and threshold of epochs for training. One would be motivated to use the combination because training DNN by determining .
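To make the “completed but not exceeded” limitation concrete, the sketch below fixes the total number of samples to process at the dataset size times the integer epoch count and divides that total among the servers. This is an illustrative assumption only and is not drawn from the application or from Aldana; `per_server_sample_budget` and the integer `shares` weights are hypothetical.

```python
def per_server_sample_budget(dataset_size, epochs, shares):
    """Divide exactly dataset_size * epochs samples among servers.

    shares: hypothetical positive integer weights, one per server.
    The budgets sum to the total, so the epoch count is met and not exceeded.
    """
    total = dataset_size * epochs
    weight = sum(shares)
    budget = [total * s // weight for s in shares]
    # Give the rounding remainder to the last server so the sum is exact.
    budget[-1] += total - sum(budget)
    return budget


print(per_server_sample_budget(1000, 3, [1, 2, 1]))  # [750, 1500, 750]
```

Because the budgets always sum to the dataset size times the epoch count, training stops exactly at the chosen number of epochs in this sketch.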
19.	Claims 7 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Gupta, in view of Kadav, further in view of US 20150324690 A1 to Chilimbi et al. (hereinafter, “Chilimbi”).
As per claims 7 and 14, the combination of Gupta and Kadav teaches the system of claim 1 and the method of claim 8, respectively. Gupta and Kadav fail to explicitly teach at least one of the samples comprises a sound file. However, Chilimbi teaches:
	at least one of the samples comprises a sound file (Chilimbi on para. [0026] discloses using distributed neural network processing for speech recognition).
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Gupta and Kadav and incorporate the teachings of Chilimbi with a motivation to include sound files for training. One would be motivated to use the combination because data for speech and/or visual object recognition (image recognition as taught in Gupta), text processing, and other tasks are used as training input for convolutional and fully-connected network layers (Chilimbi para. [0026]).
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to RAHUL GURUNG whose telephone number is (571) 272-8406. The examiner can normally be reached on 8:30 am to 3:30 pm from Mondays to Thursdays.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://portal.uspto.gov/external/portal. Should you have questions about access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
/RAHUL GURUNG/Examiner, Art Unit 2122
/ERIC NILSSON/Primary Examiner, Art Unit 2122