DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment
Acknowledgement is made of Applicant's claim amendments on 12/30/2021. The claim amendments are entered. Presently, claims 1-3, 5, 8-11, 14, 16, 19, 21, 23, 24, and 26 remain pending. Claims 1, 11, and 16 are amended. Claims 4, 6, 7, 12, 13, 15, 17, 18, 20, 22, and 25 have previously been cancelled. 

Response to Arguments
Applicant's arguments filed on 12/30/2021 have been fully considered but they are not persuasive.

Applicant presents a summary of Chilimbi and argues that it does not teach the amended claim limitations because parameter data is allegedly a different concept, there allegedly being no correlation between the parameter data and the memory and compute budgets in Chilimbi (Applicant’s reply pgs. 7-11). These arguments are not persuasive. Memory and compute budgets are examples of finite resources that can be considered when operating neural networks to achieve a desired outcome, e.g. for optimization purposes or to improve performance. To achieve this outcome while accounting for the limited resources (i.e. optimizing such resources), adjustments can be made to the neural network. For example, Chilimbi teaches finite resources such as memory and speed metrics (i.e. compute budgets) in correlation with parameter values, e.g. weights or activation values, as shown in the updated mapping below. That is, Chilimbi explicitly teaches that the memory optimizations and computation speed metrics/budgets are related to parameter data. Thus, contrary to Applicant’s argument, the concept of parameter data is related to memory and compute speed metrics/budgets, and Chilimbi teaches such a correlation as shown in the updated mapping below.

Applicant also provides an overview of Shi and argues that it in conjunction with Chilimbi allegedly does not teach the various claim limitations (Applicant’s reply pgs. 12-13). This is not persuasive because Shi in conjunction with Chilimbi does teach the amended limitations as shown in the updated mapping below. 

Similarly, for the dependent claims, Applicant also provides a summary of Lin and Chen and argues that these references in conjunction with Chilimbi and Shi allegedly do not teach the various claim limitations (Applicant’s reply pgs. 13-15). This is not persuasive. As explained above, Chilimbi and Shi do teach the claim limitations; thus the combination of these references with Lin and Chen teaches the various claim limitations.

Claim Rejections - 35 USC § 103
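In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.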
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1-3, 5, 8, 11, 14, 16, 19, 21, 23, 24, and 26 are rejected under 35 U.S.C. 103 as being unpatentable over Chilimbi et al. (U.S. Pat. App. Pre-Grant Pub. No. 2015/0324690, hereinafter Chilimbi) in view of Shi et al. (U.S. Pat. App. Pre-Grant Pub. No. 2017/0270408, hereinafter Shi).

Regarding claim 1, Chilimbi teaches:
An apparatus to facilitate neural network training, comprising ([0026]: describing “a scalable distributed deep learning training system comprised of commodity servers to train large neural network models for providing training input to model training machines”): 
a memory to store data, including data for neural network training ([0050]: describing memory and computer storage media); and
one or more processors including a graphics processing unit (GPU) to perform neural network operations including performing training to generate a trained neural network ([0048]: describing “[p]rocessing unit(s) 612” that “can represent, for example, a CPU-type processing unit, a GPU-type processing unit…” Wherein the processing units can direct a plurality of servers and other machines for neural network operations, e.g. training (Figs. 6 and 7 and [0052]). That is, “FIG. 7 is a diagram illustrating the system for deep learning training as described in FIG. 6 with more detail, including partitioning models across training machines. Data servers 702 may be any of the servers 610 in FIG. 6. The data servers 702 may be leveraged for fast data serving as described below. Replicas 704A-704N represent groups of computing devices or machines. Machines 1-M may be any of the machines 610 in FIG. 6. Each of the replicas 704A-704N may train a same (but duplicate) model.” ([0052]). Whereby the various neural network operations can comprise model training, multi-threaded training, and fast weight updates for training ([0056]-[0058]).);
wherein performing training of a neural network includes the GPU to:  
receive one or more network constraints to be applied in processing for the neural network ([0076]-[0079]: describing receiving batch of data that can include parameter data, e.g. weights, error terms, activation values, etc. The parameter data being accessed and used/applied in training the neural network ([0057]-[0058] and [0065]-[0066]). Wherein the parameter data correlates with computational constraints of the neural network (NN) by optimizing the available resources and NN’s performance ([0061]-[0062] and [0065]-[0066]).), 
the one or more network constraints including memory and compute budget constraints to be applied in processing for operation of the neural network ([0059] and [0061]-[0063]: describing application of memory system optimizations and mitigating the impact of slow machines. See also [0053], [0055], and [0068]: describing a determination and an application for fast handling and training of vast amounts of data, as well as throughput optimizations. Wherein the fast handling/speed of the machines denotes a compute budget. See also [0028]-[0029]: describing the network parameters being related to nodal connections via weights and activation values that can be optimized/constrained for optimal operation of the NN as computed via error values. The network parameters are accessed and used/applied as described above as part of the process to achieve the fast handling and memory system optimization constraints.);
…
([0056]-[0058]: describing various training of the machine learning models. Wherein the “deep learning training module” can operate in conjunction with the global parameter server(s) to train the plurality of machine learning models, thus resulting in updated machine learning models with an updated set of network parameter values, e.g. updated weights, updated activation values, etc. ([0036] and [0076]-[0080]).), 
including enforcing the one or more network constraints in the generation of the trained neural network ([0075]-[0080]: describing the process by which the “deep learning training module” can receive data to continuously learn and update the models to ensure the models operate optimally by operating below a predetermined error threshold. Wherein this process is performed in conjunction with the various model training machines and parameter server (Figs. 6 and 7 and as cited above). The error threshold being computed in relation to a backward propagation of error terms and weight updates ([0034]-[0036]). That is, the network constraints being related to nodal connections via weights and activation values ([0028]-[0029]).).

While the cited reference Chilimbi teaches the above limitations of claim 1, it does not explicitly teach: 
“determine initial data points for the neural network and initial parameter precision values for nodes within the neural network based at least in part on the one or more network constraints, the initial data points including initial network parameters [[for]] to be applied in a network layout of the neural network; perform training of the neural network utilizing a training routine based at least in part on received training data and the initial data points, wherein the …” Shi discloses the claim limitations, teaching: 
“determine initial data points for the neural network and initial parameter precision values for nodes within the neural network based at least in part on the one or more network constraints (Shi [0031]-[0035] and [0070]: describing initial parameter values, e.g. weights with a certain precision bit-length in the neural network (NN) in correlation with hardware complexity cost and accuracy cost, i.e. network constraints, associated with the various weights in the NN. Wherein these costs can comprise network constraints. The NN comprising various nodes with weights being applied as a connection between the nodes ([0030], [0054], and [0092]).),
the initial data points including initial network parameters [[for]] to be applied in a network layout of the neural network (Shi [0082]-[0084]: describing a layout of the NN based on the network parameters, e.g. weight configuration. That is, the weights are being applied into the NN via its layout (Shi [0034], [0054], and [0060]). Wherein the weight configurations and its initial values were previously cited above.); 
perform training of the neural network utilizing a training routine based at least in part on received training data and the initial data points (Shi [0031]: describing that data received by the neural network can comprise initial weight parameter values and “[t]raining data 34 [that] is input to neural network 36 when training routines 46 are executed”. See also Shi [0034] and [0092]: describing training routines.), 
(Shi [0031] and [0034]: describing initial weight parameter values that can be optimized via training to obtain updated weight values in correlation with an accuracy cost. See also [0054]-[0060]: describing the accuracy consideration of the weight parameters as part of training to determine optimized parameters in further details.); 
Attorney Docket No.: P116244-4- Application Filed: April 24, 2017 Application No.: 15/494,826
adjust the parameter precision values for the neural network utilizing at least the one or more network constraints and the accuracy data generated for the neural network training (Shi [0030]: describing that the NN can be adjusted for optimal operation. Wherein such adjustments can be made via bit-depth adjustments to the parameters of the NN due to network constraints, e.g. storage/memory, and accuracy determinations to ensure an optimal performance of the NN ([0034]-[0035]). That is, adjustments to a bit-depth of weight can be performed, wherein a bit-depth can represent a level of precision in the weights ([0047], [0059], [0061], and [0063]-[0064]).); and
… including a best network layout (Shi [0082]-[0084] and [0086]: describing a layout of the NN based on an optimized bit-width of the NN parameters, e.g. weights.)….
Thus, it would have been obvious to a Person Having Ordinary Skill in the Art (PHOSITA) before the effective filing date (EFD) to modify the apparatus for training the neural network in the cited reference to include the initial data points and optimizations in Shi. Doing so would enable “A bit-depth optimization engine reduces the hardware cost of a neural network. When training data is applied to a neural network during training routines, accuracy cost and hardware costs are generated. A hardware complexity cost generator generates costs for weights near bit-depth steps where the number of binary bits required to represent a weight decreases, such as from 2^N to 2^N−1, where one less binary bit is required…. The selected weights are reduced during optimization. Over many cycles of optimization, a low-bit-depth neural network is generated that uses fewer binary bits per weight, resulting in lower hardware costs when the low-bit-depth neural network….” (Shi Abstract). 
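
For illustration only, the bit-depth step quoted from the Shi Abstract (a weight moving from 2^N to 2^N−1 needs one fewer binary bit) can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Shi or from the application: the function names `bits_required` and `reduce_near_step` and the `margin` parameter are hypothetical, weights are assumed to be integers, and only the weight magnitude is counted (no sign bit).

```python
def bits_required(weight: int) -> int:
    """Bits needed for the magnitude of an integer weight (hypothetical helper)."""
    return max(1, abs(weight).bit_length())


def reduce_near_step(weight: int, margin: int = 1) -> int:
    """Lower a weight sitting at or just above a power of two 2^N to 2^N - 1.

    `margin` (an assumption for this sketch) is how far above 2^N a weight
    may be and still be reduced. Other weights are returned unchanged.
    """
    magnitude = abs(weight)
    if magnitude < 2:
        return weight
    step = 1 << (magnitude.bit_length() - 1)  # largest power of two <= magnitude
    if magnitude - step < margin:
        reduced = step - 1
        return reduced if weight >= 0 else -reduced
    return weight


weights = [8, 9, 7, -16]
reduced = [reduce_near_step(w) for w in weights]
bits_before = sum(bits_required(w) for w in weights)
bits_after = sum(bits_required(w) for w in reduced)
print(reduced, bits_before, bits_after)
```

In this sketch only the weights exactly at a power of two (8 and -16) are lowered, so the total bit count drops while the other weights keep their values. A real optimizer would also weigh the accuracy cost of each reduction, which this sketch does not model.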

Regarding claim 2, Chilimbi teaches:
The apparatus of claim 1, wherein performing neural network training includes the GPU to determine an optimum neural network based on the one or more network constraints ([0036] and [0080]: describing that the trained/modified model is one that has been optimized to have an error value below a predetermined threshold based on constraints of network parameters, e.g. weights. Wherein the servers for training the machine learning models can include processing units comprising GPU-type processing units ([0026], [0046] and [0048]).).

Regarding claim 3, the rejection of claim 1 is incorporated. Shi further teaches: 
The apparatus of claim 1, wherein the initial data points further include training parameters for the neural network (Shi [0031] and [0054]: describing training data that can comprise initial parameter data for training, e.g. initial weight values in a neural network. Wherein the training and optimization process can be performed several times with varying initial parameter conditions ([0093]).).
Thus, it would have been obvious to a Person Having Ordinary Skill in the Art (PHOSITA) before the effective filing date (EFD) to modify the apparatus for training the neural network in the cited reference to include the training parameters in Shi. A motivation to combine the cited reference with Shi was previously given.

Regarding claim 5, the rejection of claim 1 is incorporated. Chilimbi teaches:
The apparatus of claim 1, wherein performing training further includes the GPU to: 
([0079]-[0080]: “the deep learning training module 616 may calculate a model prediction error based at least in part on the updated individual weight values and the new updated weight values. The deep learning training module 616 may process subsequent batches of data items by repeating process 1100 until the model prediction error converges to a value below a predetermined threshold”. Wherein the prediction error can denote an accuracy. See also [0036] and [0063]: describing the validation in correlation with the model prediction and error.).
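
For illustration only, the stopping rule quoted above (repeating the process until the model prediction error falls below a predetermined threshold) can be sketched as a simple loop. This is a minimal sketch under stated assumptions and is not Chilimbi's implementation: the function name `train_until_converged`, the one-weight model y = w * x, the mean squared error metric, and all parameter values are hypothetical.

```python
def train_until_converged(data, lr=0.05, threshold=1e-4, max_epochs=10_000):
    """Fit y = w * x by gradient descent and stop once the error is small.

    Returns the learned weight, the final mean squared error, and the
    number of passes over the data that were run.
    """
    w = 0.0
    error = float("inf")
    epochs = 0
    for epochs in range(1, max_epochs + 1):
        # Gradient of the mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad  # weight update
        # Model prediction error after the update.
        error = sum((w * x - y) ** 2 for x, y in data) / len(data)
        if error < threshold:  # predetermined threshold reached
            break
    return w, error, epochs


samples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # generated from y = 2x
weight, final_error, passes = train_until_converged(samples)
print(weight, final_error, passes)
```

The `max_epochs` cap is a design choice of the sketch so that the loop terminates even if the error never reaches the threshold.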

Regarding claim 8, the rejection of claim 1 is incorporated. Chilimbi teaches:
The apparatus of claim 1, wherein the network parameters comprise one or more of network depth ([0028], [0031], and [0056]: describing various neural network layers which can denote a depth of a neural network), number of nodes in each layer ([0028]-[0035] and [0056]: describing neurons in the various layers of the neural network), convolution dimensions, stride and padding numbers, activation functions at each layer, and pooling layer properties. 


Regarding independent claim 11, claim 11 is substantially similar to independent claim 1 and therefore is rejected on the same grounds as claim 1. Claim 11 is a method claim that corresponds to apparatus claim 1. 

Regarding claim 14, claim 14 is substantially similar to claim 5 and therefore is rejected on the same ground as claim 5. Claim 14 is a method claim that corresponds to apparatus claim 5. 

Regarding independent claim 16, claim 16 is substantially similar to independent claim 1 and therefore is rejected on the same grounds as claim 1. Claim 16 is a medium claim that corresponds to apparatus claim 1. A mapping is shown below for the limitations of claim 16 that differ from claim 1. Chilimbi teaches:
At least one computer readable medium having instructions, which when executed by one or more processors, cause the processors to ([0051] and [0111]: describing that “[o]ne or more computer-readable storage media encoded with instructions that, when executed by a processor”.): ….

Regarding claim 19, claim 19 is substantially similar to claim 5 and therefore is rejected on the same ground as claim 5. Claim 19 is a medium claim that corresponds to apparatus claim 5.

Regarding claim 21, claim 21 is substantially similar to claim 3 and therefore is rejected on the same ground as claim 3. Claim 21 is a medium claim that corresponds to apparatus claim 3.

Regarding claim 23, claim 23 is substantially similar to claim 8 and therefore is rejected on the same ground as claim 8. Claim 23 is a medium claim that corresponds to apparatus claim 8.

Regarding claim 24, claim 24 is substantially similar to claim 3 and therefore is rejected on the same ground as claim 3. Claim 24 is a method claim that corresponds to apparatus claim 3.

Regarding claim 26, claim 26 is substantially similar to claim 8 and therefore is rejected on the same ground as claim 8. Claim 26 is a method claim that corresponds to apparatus claim 8.

Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Chilimbi et al. (U.S. Pat. App. Pre-Grant Pub. No. 2015/0324690, hereinafter Chilimbi) and Shi et al. (U.S. Pat. App. Pre-Grant Pub. No. 2017/0270408, hereinafter Shi) in view of Lin et al. (U.S. Pat. App. Pre-Grant Pub. No. 2016/0328644, hereinafter Lin).

Regarding claim 9, the rejection of claim 1 is incorporated. The cited references in combination do not explicitly teach: “wherein the GPU is further to compress data bits representing weights associated with connections in the neural network”. Lin discloses the claim limitations, teaching: compression of a neural network by “fine-tun[ing] to adjust the weight values of the compressed and uncompressed layers. Fine-tuning recaptures the loss in classification accuracy due to compression. The compression parameters can be chosen to satisfy the requirements of system resources and performance specifications”. (Lin [0069] and [0073]-[0074]). Wherein the neural network and its operations are loaded onto a GPU (Lin [0027]-[0028] and [0047]).
Thus, it would have been obvious to a Person Having Ordinary Skill in the Art (PHOSITA) before the effective filing date (EFD) to modify the apparatus for training the neural network with the initial data points and optimizations in the combined cited references to include the compression logic for weights in Lin. Doing so would enable “[a] method of adaptively selecting a configuration for a machine learning process includes determining current system resources and performance specifications of a current system. A new configuration for the machine learning process is determined based at least in part on the current system resources and the performance specifications. The method also includes dynamically selecting between a current configuration and the new configuration based at least in part on the current system resources and the performance specifications.” (Lin Abstract).

Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Chilimbi et al. (U.S. Pat. App. Pre-Grant Pub. No. 2015/0324690, hereinafter Chilimbi) and Shi et al. (U.S. Pat. App. Pre-Grant Pub. No. 2017/0270408, hereinafter Shi) in view of Lin et al. (U.S. Pat. App. Pre-Grant Pub. No. 2016/0328644, hereinafter Lin) and Chen et al. (U.S. Pat. App. Pre-Grant Pub. No. 2018/0330239, hereinafter Chen).

Regarding claim 10, the rejection of claim 9 is incorporated. While the cited references in combination teach the limitation “further includes the GPU” as shown above, they do not explicitly teach: “wherein compressing the data bits … to execute an operation to load/store consecutive bit values”. Chen discloses the claim limitations, teaching: “compression coding for a neural network” (Chen [0018] and [0019]) wherein the coding comprises instructions enabling “[t]he input data and/or the weight values may be transmitted to and temporarily stored in a neuron data cache 212 and/or a weight cache 214. The neuron data cache 212 may be further configured to store first input neuron data, first output neuron data, output gradients, and input gradients for different layers of operations in a neural network.” (Chen [0019]). Wherein the weight values comprise various bits in a particular order and include a first and second weight value (Chen [0029] and [0034]-[0038]).
Thus, it would have been obvious to a Person Having Ordinary Skill in the Art (PHOSITA) before the effective filing date (EFD) to modify the apparatus for training the neural network with the initial data points and optimizations in the combined cited references to include the compression logic for load/store in Chen. Doing so would enable “[a] compression coding apparatus for artificial neural network, including memory interface unit, instruction cache, controller unit and computing unit, wherein the computing unit is configured to perform corresponding operation to data from the memory interface unit according to instructions of controller unit” (Chen Abstract). Wherein the coding instructions comprise steps enabling “weight values [composed of bits to] be retrieved from the memory 108 and stored on the neural network processor 206 during the processes” (Chen [0018]). 
Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
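A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.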

The prior art made of record and not relied upon is considered pertinent to Applicant's disclosure:
Qiu et al. “Going Deeper with Embedded FPGA Platform for Convolutional Neural Network”: describing implementation of a convolutional neural network (CNN) on a field-programmable gate array (FPGA) platform. The FPGA has limited on-chip memory and limited bandwidth, thus resulting in resource constraints when implementing the CNN. Optimization of the CNN to meet the resource constraints is achieved by applying singular value decomposition (SVD) to the weight matrix of a particular fully connected (FC) layer of the CNN and saving weights of the FC layer. 
Han et al. “An Approximation-Based Execution Framework for Deep Stream Processing Under Resource Constraints”.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SELENE A HAEDI whose telephone number is (571)270-5762.  The examiner can normally be reached on M-F 11 AM - 7 PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, OMAR FERNANDEZ RIVAS can be reached on (571)272-2589.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/S.H./Examiner, Art Unit 2121

/OMAR F FERNANDEZ RIVAS/Supervisory Patent Examiner, Art Unit 2128