Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statement filed 03/20/2019 fails to comply with the provisions of 37 CFR 1.97, 1.98 and MPEP § 609 because the citation for non-patent literature document (6) Molchanov et al. does not include a place of publication. This reference has been considered by Examiner and cited by Examiner in this office action (See PTO-892: Notice of References Cited).
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 8 and 17 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
The term "permissible" in claim 8 lines 3 and 4 (2 instances) and claim 17 lines 3 and 4 (2 instances) is a relative term which renders the claim indefinite.  The term "permissible" is not defined by the claim, the specification does not provide a standard for ascertaining the requisite degree, and one of ordinary skill in the art would not be reasonably apprised of the scope of the invention.  “Permissible” is a subjective term. See MPEP 2173.05(b)(IV). For purposes of examination, Examiner interprets “permissible” as meeting a predetermined threshold.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
CLAIM 1
Step 1: The claim recites a method, one of the four categories of eligible subject matter.
Step 2A Prong 1: The claim recites the following limitations: 
(1) computing… a set of statistics pertaining to resource utilization
(2) determining… multiple batch sizes to be used for inferencing 
Computing is a mathematical computation, and determining is a mathematical computation and/or a mental process which can reasonably be performed in one’s mind with the aid of pencil and paper. Accordingly, the claim recites an abstract idea.
Step 2A Prong 2: This judicial exception is not integrated into a practical application. The claim recites the following additional elements:
obtaining, as input for inferencing…  (ii) one or more resource constraints; 
deep neural network
multiple layers 
outputting, to at least one user
computing device
The step of obtaining one or more resource constraints is mere data gathering of inputs, an insignificant extra-solution activity under MPEP 2106.05(g). The outputting to a user is an insignificant extra-solution activity. The deep neural network is not a meaningful limitation under MPEP 2106.05(e).
Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception for the reasons given in Step 2A Prong 2. The claim is not patent eligible.

CLAIMS 2-13 incorporate the rejection of claim 1.
Step 1: The claims recite a method, one of the four categories of eligible subject matter.
Step 2A Prong 1: The judicial exceptions of claim 1 are incorporated. All additional limitations further describe the judicial exceptions. Accordingly, the claims recite an abstract idea.
Step 2A Prong 2: These judicial exceptions are not integrated into a practical application. There are no additional elements. The claims are directed to an abstract idea.
Step 2B: The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception for the reasons given in Step 2A Prong 2. The claims are not patent eligible.

CLAIM 14
Step 1: The claim recites a product, one of the four categories of eligible subject matter.
Step 2A Prong 1: The claim recites the following limitations: 
(1) compute… a set of statistics pertaining to resource utilization
(2) determine… multiple batch sizes to be used for inferencing 
Computing is a mathematical computation, and determining is a mathematical computation and/or a mental process which can reasonably be performed in one’s mind with the aid of pencil and paper. Accordingly, the claim recites an abstract idea.
Step 2A Prong 2: This judicial exception is not integrated into a practical application. The claim recites the following additional elements:
obtain, as input for inferencing…  (ii) one or more resource constraints; 
deep neural network
multiple layers 
outputting, to at least one user
a computer program product
a computer readable storage medium
program instructions
a computing device
The step of obtaining one or more resource constraints is mere data gathering of inputs, an insignificant extra-solution activity under MPEP 2106.05(g). The outputting to a user is an insignificant extra-solution activity.
Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception for the reasons given in Step 2A Prong 2. The claim is not patent eligible.

CLAIMS 15-18 incorporate the rejection of claim 14.
Step 1: The claims recite a product, one of the four categories of eligible subject matter.
Step 2A Prong 1: The judicial exceptions of claim 14 are incorporated. All additional limitations further describe the judicial exceptions. Accordingly, the claims recite an abstract idea.
Step 2A Prong 2: These judicial exceptions are not integrated into a practical application. There are no additional elements. The claims are directed to an abstract idea.
Step 2B: The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception for the reasons given in Step 2A Prong 2. The claims are not patent eligible.

CLAIM 19
Step 1: The claim recites a system, one of the four categories of eligible subject matter.
Step 2A Prong 1: The claim recites the following limitations: 
(1) computing… a set of statistics pertaining to resource utilization
(2) determining… multiple batch sizes to be used for inferencing 
Computing is a mathematical computation, and determining is a mathematical computation and/or a mental process which can reasonably be performed in one’s mind with the aid of pencil and paper. Accordingly, the claim recites an abstract idea.
Step 2A Prong 2: This judicial exception is not integrated into a practical application. The claim recites the following additional elements:
obtaining, as input for inferencing…  (ii) one or more resource constraints; 
deep neural network
multiple layers 
outputting, to at least one user
a memory
a processor
The step of obtaining one or more resource constraints is mere data gathering of inputs, an insignificant extra-solution activity under MPEP 2106.05(g). The outputting to a user is an insignificant extra-solution activity. The deep neural network is not a meaningful limitation under MPEP 2106.05(e) because it generally links the judicial exception of determining batch sizes to the particular technological environment of machine learning.
Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception for the reasons given in Step 2A Prong 2. The claim is not patent eligible.

CLAIM 20
Step 1: The claim recites a method, one of the four categories of eligible subject matter.
Step 2A Prong 1: The claim recites the following limitations: 
(1) computing… a set of statistics pertaining to resource utilization comprising (i)-(iv)
(2) determining… multiple batch sizes to be used for inferencing 
Computing is a mathematical computation, and determining is a mathematical computation and/or a mental process which can reasonably be performed in one’s mind with the aid of pencil and paper. Accordingly, the claim recites an abstract idea.
Step 2A Prong 2: This judicial exception is not integrated into a practical application. The claim recites the following additional elements:
obtaining, as input for inferencing…  (ii) one or more resource constraints comprising (a)-(c); 
deep neural network
multiple layers 
outputting, to at least one user
computing device
The step of obtaining one or more resource constraints is mere data gathering of inputs, an insignificant extra-solution activity under MPEP 2106.05(g). The outputting to a user is an insignificant extra-solution activity. The deep neural network is not a meaningful limitation under MPEP 2106.05(e) because it generally links the judicial exception of determining batch sizes to the particular technological environment of machine learning, and it is not an improvement in machine learning technology. The claims are focused on determining batch sizes for use with the neural network, in contrast to implementing the batch sizes in executing the neural network to obtain some output. The multiple layers as recited in line 7 are not a meaningful limitation because they are an intended use of the computing. The multiple layers as recited in line 12 are not a meaningful limitation because they are generally linked to inferencing (interpreted as part of the mental process of determining). The computing device is mere instructions to apply the judicial exception under MPEP 2106.05(f), and it is not a meaningful limitation.
Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception for the reasons given in Step 2A Prong 2. The claim is not patent eligible.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
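This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.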
Claims 1-2, 7-8, 11-17, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over U.S. Patent No. 10,019,668 B1 to Woo in view of U.S. Publication 2018/0341851 to Chung et al., hereinafter “Chung”.

Regarding claim 1, Woo teaches: A computer-implemented method, the method comprising steps of: 
obtaining, as input for inferencing of one or more deep neural networks, (i) an inferencing model (“convolutional neural network layers” C. 1 L. 18-19) and (ii) one or more resource constraints (total available on-chip memory: “a storage capacity… of on-chip memory may be 500 megabyte (MB)” C. 15 L. 65);
computing, based at least in part on the obtained input, a set of statistics pertaining to resource utilization for each of multiple layers in the one or more deep neural networks (A set of statistics is interpreted as at least one statistic. Woo teaches the statistic of a working set, defined as, “a size parameter that indicates an amount of memory needed to process the one or more inputs through each of the layers in the superlayer” (C. 2 L. 22-30) and where “Circuit 100 can then determine an amount of memory required to store respective sets of parameters for each layer of a neural network” (C. 15 L. 2-5)); and 
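determining, based at least in part on (i) the obtained input and (ii) the computed set of statistics, multiple batch sizes to be used for inferencing the multiple layers of the one or more deep neural networks (Multiple batch sizes is interpreted as multiple identical batch sizes. “For respective layers A, B, C, circuit 100 can determine a particular size parameter for inputs of working sets to be processed by respective layers and a corresponding batch size for the working set.” C. 15 L. 13-17); and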
outputting… the determined batch sizes to be used for inferencing the multiple layers of the one or more deep neural networks (Fig. 6A, C. 15 L. 37);
wherein the steps are carried out by at least one computing device (“a CPU or GPU” C. 4 L. 55).
However, Woo does not explicitly teach: [outputting] to at least one user.
	But Chung teaches: [outputting] to at least one user (“The dashboard 500 also includes a chart 520 providing a performance report.” [0070])
	Chung is in the same field of endeavor as the claimed invention, namely, optimizing the performance of a machine learning system. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have displayed the batch sizes from Woo using the chart display from Chung, with a motivation to adjust tuning parameters in real time while an application is running (“Providing the ability to “see” the current system performance is significant because at least some of the tuning parameters can be adjusted in real-time, while an application is running” Chung, [0070].)
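
For illustration only, the claim 1 steps as mapped above (obtain a model and a memory constraint, compute a per-layer working-memory statistic, determine a batch size for each layer, output the result) can be sketched as follows. This is a minimal sketch under stated assumptions, not Woo's scheduling method and not Applicant's implementation: the `Layer` fields, the linear memory model, and the candidate batch sizes are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    """Hypothetical per-layer description used only for this sketch."""
    name: str
    param_bytes: int        # memory needed to hold the layer's parameters
    activation_bytes: int   # input plus activation memory per sample


def working_set_bytes(layer, batch_size):
    # Assumed linear model: parameters are stored once,
    # activations scale with the number of samples in the batch.
    return layer.param_bytes + batch_size * layer.activation_bytes


def determine_batch_sizes(layers, memory_budget_bytes,
                          candidates=(1, 2, 4, 8, 16, 32)):
    """Pick, for each layer, the largest candidate batch size whose
    working set fits within the memory budget (the resource constraint)."""
    result = {}
    for layer in layers:
        fitting = [b for b in candidates
                   if working_set_bytes(layer, b) <= memory_budget_bytes]
        if not fitting:
            raise ValueError(f"layer {layer.name} does not fit in memory")
        result[layer.name] = max(fitting)
    return result
```

With a 500-byte budget, a layer with 100 bytes of parameters and 50 bytes per sample gets batch size 8, while a layer with 400 bytes of parameters and 25 bytes per sample gets batch size 4. The sketch only determines and returns the values; it does not execute the network.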

Regarding claim 2, the combination of Woo and Chung teaches: The computer-implemented method of claim 1, wherein the inferencing model comprises a feed forward model (Woo teaches convolutional neural network layers, which are interpreted as a feed forward model).

Regarding claim 7, the combination of Woo and Chung teaches: The computer-implemented method of claim 1, wherein the one or more resource constraints comprises at least one of (i) total available memory, (ii) maximum latency for inferencing, and (iii) maximum energy for inferencing. (Woo at C. 15 L. 65 teaches (i) total available memory, which may be 500 megabytes (MB).)

Regarding claim 8, the combination of Woo and Chung teaches: The computer-implemented method of claim 1, wherein the set of statistics comprises at least one of (i) amount of working memory, (ii) input and activation size for each sample, (iii) time to process a layer for each of multiple permissible batch sizes, and (iv) energy to process a layer for each of multiple permissible batch sizes. (Woo teaches (i), the statistic of a working set, defined as, “a size parameter that indicates an amount of memory needed to process the one or more inputs through each of the layers in the superlayer” (C. 2 L. 22-30) and where “Circuit 100 can then determine an amount of memory required to store respective sets of parameters for each layer of a neural network” (C. 15 L. 2-5))


Regarding claim 11, the combination of Woo and Chung teaches: The computer-implemented method of claim 1, wherein said determining decreases one or more energy values associated with the inferencing of the one or more deep neural networks. (Woo teaches “energy optimization” C. 3 L. 12-13 and “conserve component energy consumption in… energy-sensitive computing environments” C. 10 L. 41-42.)

Regarding claim 12, the combination of Woo and Chung teaches: The computer-implemented method of claim 1, wherein said determining decreases one or more latency values associated with the inferencing of the one or more deep neural networks. (Woo teaches that “external communications can… increase system latency” (C. 10 L. 55), and so the “use of this on-chip storage and other local resources can serve to minimize external communications by the hardware circuit during processing of inputs through layers of a neural network” (C. 10 L. 42-46), thereby resulting in decreased latency.) 

Regarding claim 13, the combination of Woo and Chung teaches: The computer-implemented method of claim 1, wherein said determining decreases one or more memory values associated with the inferencing of the one or more deep neural networks. (A memory value is interpreted as a working set size (C. 2 L. 22-30). Woo teaches “For example, at least with regard to batch processing at layer B for batch element 1, alternating between different batch elements can reduce a maximum working set size of layer B to 10 units, instead of the maximum working set size of 16 units required when using the conventional scheduling policy described above.” C. 12 L. 43-46)

Regarding claim 14, Woo teaches: A computer program product comprising a computer readable storage medium (“computer storage devices” C. 2 L. 62) having program instructions embodied therewith, the program instructions executable by a computing device to cause the computing device to (instructions executed by processor C. 2 L. 67 – C. 3 L. 2): 
obtain, as input for inferencing of one or more deep neural networks, (i) an inferencing model (“convolutional neural network layers” C. 1 L. 18-19) and (ii) one or more resource constraints (A resource constraint is a total available on-chip memory: “a storage capacity… of on-chip memory may be 500 megabyte (MB)” C. 15 L. 65);
compute, based at least in part on the obtained input, a set of statistics pertaining to resource utilization for each of multiple layers in the one or more deep neural networks (A set of statistics is interpreted as at least one statistic. Woo teaches the statistic of a working set, defined as, “a size parameter that indicates an amount of memory needed to process the one or more inputs through each of the layers in the superlayer” (C. 2 L. 22-30) and where “Circuit 100 can then determine an amount of memory required to store respective sets of parameters for each layer of a neural network” (C. 15 L. 2-5)); 
determine, based at least in part on (i) the obtained input and (ii) the computed set of statistics, multiple batch sizes to be used for inferencing the multiple layers of the one or more deep neural networks (Multiple batch sizes is interpreted as multiple identical batch sizes. “For respective layers A, B, C, circuit 100 can determine a particular size parameter for inputs of working sets to be processed by respective layers and a corresponding batch size for the working set.” C. 15 L. 13-17); and 
output, to at least one user, the determined batch sizes to be used for inferencing the multiple layers of the one or more deep neural networks (Fig. 6A, C. 15 L. 37).
However, Woo does not explicitly teach: [output] to at least one user.
	But Chung teaches: [output] to at least one user (“The dashboard 500 also includes a chart 520 providing a performance report.” [0070])
	Chung is in the same field of endeavor as the claimed invention, namely, optimizing the performance of a machine learning system. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have displayed the batch sizes from Woo using the chart display from Chung, with a motivation to adjust tuning parameters in real time while an application is running (“Providing the ability to “see” the current system performance is significant because at least some of the tuning parameters can be adjusted in real-time, while an application is running” Chung, [0070].)

Regarding claim 15, the combination of Woo and Chung teaches: The computer program product of claim 14, wherein the inferencing model comprises a feed forward model. (Woo teaches convolutional neural network layers, which are interpreted as a feed forward model)

Regarding claim 16, the combination of Woo and Chung teaches: The computer program product of claim 14, wherein the one or more resource constraints comprises at least one of (i) total available memory, (ii) maximum latency for inferencing, and (iii) maximum energy for inferencing. (Woo at C. 15 L. 65 teaches (i) total available memory, which may be 500 megabytes (MB).)

Regarding claim 17, the combination of Woo and Chung teaches: The computer program product of claim 14, wherein the set of statistics comprises at least one of (i) amount of working memory, (ii) input and activation size for each sample, (iii) time to process a layer for each of multiple permissible batch sizes, and (iv) energy to process a layer for each of multiple permissible batch sizes. (Woo teaches (i), the statistic of a working set, defined as, “a size parameter that indicates an amount of memory needed to process the one or more inputs through each of the layers in the superlayer” (C. 2 L. 22-30) and where “Circuit 100 can then determine an amount of memory required to store respective sets of parameters for each layer of a neural network” (C. 15 L. 2-5))

Regarding claim 19, Woo teaches: A system comprising: a memory (Fig. 1, 102 and 104); and at least one processor (“a CPU or GPU” C. 4 L. 55) operably coupled to the memory and configured for: 
obtaining, as input for inferencing of one or more deep neural networks, (i) an inferencing model (“convolutional neural network layers” C. 1 L. 18-19) and (ii) one or more resource constraints (A resource constraint is a total available on-chip memory: “a storage capacity… of on-chip memory may be 500 megabyte (MB)” C. 15 L. 65);
computing, based at least in part on the obtained input, a set of statistics pertaining to resource utilization for each of multiple layers in the one or more deep neural networks (A set of statistics is interpreted as at least one statistic. Woo teaches the statistic of a working set, defined as, “a size parameter that indicates an amount of memory needed to process the one or more inputs through each of the layers in the superlayer” (C. 2 L. 22-30) and where “Circuit 100 can then determine an amount of memory required to store respective sets of parameters for each layer of a neural network” (C. 15 L. 2-5));
determining, based at least in part on (i) the obtained input and (ii) the computed set of statistics, multiple batch sizes to be used for inferencing the multiple layers of the one or more deep neural networks (Multiple batch sizes is interpreted as multiple identical batch sizes. “For respective layers A, B, C, circuit 100 can determine a particular size parameter for inputs of working sets to be processed by respective layers and a corresponding batch size for the working set.” C. 15 L. 13-17); and  
outputting… the determined batch sizes to be used for inferencing the multiple layers of the one or more deep neural networks (Fig. 6A, C. 15 L. 37).
However, Woo does not explicitly teach: [outputting] to at least one user.
	But Chung teaches: [outputting] to at least one user (“The dashboard 500 also includes a chart 520 providing a performance report.” [0070])
	Chung is in the same field of endeavor as the claimed invention, namely, optimizing the performance of a machine learning system. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have displayed the batch sizes from Woo using the chart display from Chung, with a motivation to adjust tuning parameters in real time while an application is running (“Providing the ability to “see” the current system performance is significant because at least some of the tuning parameters can be adjusted in real-time, while an application is running” Chung, [0070].)

Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over the combination of Woo and Chung in view of “Pruning Convolutional Neural Networks for Resource Efficient Inference” to Molchanov et al.

Regarding claim 3, the combination of Woo and Chung teaches: The computer-implemented method of claim 1, 
However, the combination of Woo and Chung does not explicitly teach: wherein the inferencing model comprises a compressed model generated through weight-based pruning.
Molchanov teaches: wherein the inferencing model comprises a compressed model generated through weight-based pruning. (“Pruning by magnitude of kernel weights is perhaps the simplest possible criterion.” p. 3, § 2.2 Criteria for Pruning, ¶ Minimum weight. By default, pruning weights compresses the model.)
Molchanov is in the field of neural network optimization for inference. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have compressed the model and pruned the weights in Woo’s system using the method of Molchanov’s system. A motivation for this combination is given by Molchanov: “The motivation to apply this type of pruning is that a convolutional kernel with low L2 norm detects less important features than those with a high norm” (Molchanov, ¶ Minimum weight).
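
For illustration only, the minimum-weight criterion quoted from Molchanov (ranking convolutional kernels by the magnitude of their weights) can be sketched as below. This is a minimal sketch, not Molchanov's pruning procedure: kernels are represented as flat lists of floats, and the function names and the fixed `num_to_prune` count are hypothetical.

```python
import math


def kernel_l2_norm(kernel):
    """L2 norm of one kernel, given as a flat list of weights."""
    return math.sqrt(sum(w * w for w in kernel))


def prune_lowest_norm_kernels(kernels, num_to_prune):
    """Remove the num_to_prune kernels with the smallest L2 norm,
    keeping the remaining kernels in their original order."""
    ranked = sorted(range(len(kernels)),
                    key=lambda i: kernel_l2_norm(kernels[i]))
    pruned = set(ranked[:num_to_prune])
    return [k for i, k in enumerate(kernels) if i not in pruned]
```

Removing kernels in this way reduces the number of stored weights, which is the sense in which pruning compresses the model.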

Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over the combination of Woo and Chung in view of “Trained ternary quantization” to Zhu et al.
Regarding claim 4, the combination of Woo and Chung teaches: The computer-implemented method of claim 1, 
The combination of Woo and Chung does not explicitly teach: wherein the inferencing model comprises a compressed model generated through at least one of (i) quantization and (ii) weight sharing.
Zhu teaches: wherein the inferencing model comprises a compressed model generated through at least one of (i) quantization and (ii) weight sharing. (Zhu teaches (i) quantization: “During inference, only ternary values (2-bit weights) and scaling factors are needed” (Abstract). By default, quantizing weights compresses the model).
Zhu is in the field of neural network optimization for inference. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have quantized the weights in Woo’s system to ternary values from Zhu’s system, with a motivation to shrink the models (“our models are nearly 16x smaller than full-precision models” Zhu, Abstract). 
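
For illustration only, quantizing full-precision weights to ternary values with two scaling factors, as in the quoted passage from Zhu, can be sketched as below. This is a minimal sketch under stated assumptions: the threshold rule (a fixed fraction of the largest absolute weight) and the default values are assumptions, and the scaling factors are passed in as plain arguments here rather than learned during training.

```python
def ternarize(weights, threshold_ratio=0.05, pos_scale=1.0, neg_scale=1.0):
    """Map each weight to a ternary code in {-1, 0, +1} and return the
    codes together with the dequantized weights.

    threshold_ratio, pos_scale and neg_scale are hypothetical parameters
    chosen for this sketch.
    """
    delta = threshold_ratio * max(abs(w) for w in weights)
    codes = [1 if w > delta else -1 if w < -delta else 0 for w in weights]
    dequantized = [pos_scale if c == 1 else -neg_scale if c == -1 else 0.0
                   for c in codes]
    return codes, dequantized
```

Each code needs only 2 bits of storage, compared with 32 bits for a single-precision weight, which is the source of the size reduction that the rejection relies on.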

Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over the combination of Woo and Chung in view of “EIE: Efficient Inference Engine on Compressed Deep Neural Network” to Han et al.
Regarding claim 5, the combination of Woo and Chung teaches: The computer-implemented method of claim 1, 
However, the combination of Woo and Chung does not explicitly teach: wherein the inferencing model comprises a compressed model generated through relative indexing.
But Han teaches: wherein the inferencing model comprises a compressed model generated through relative indexing. (Han, Fig. 3 below shows memory layout for relative indexed CSC (compressed sparse column) format.)


[Figure: Han, Fig. 3, memory layout for the relative-indexed CSC format]


(Han p. 244, col. 1, first full paragraph) and to keep the weight matrix in sparse form instead of converting back to dense form (Han p. 244, end of col. 2).
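
For illustration only, relative indexing of one sparse column, as referenced for claim 5, can be sketched as below: each stored value is paired with the count of zeros that precede it instead of an absolute row index. This is a minimal sketch, not Han's hardware layout: the function names are hypothetical, and the 4-bit index width (at most 15 zeros per entry, with a padding zero stored when a run is longer) is an assumption of this sketch.

```python
def encode_relative(column, max_run=15):
    """Encode a dense column as (values, zero_counts).

    zero_counts[i] is the number of zeros preceding values[i].
    When a run of zeros exceeds max_run, a padding zero is stored
    as a value so that every count fits in the assumed index width.
    """
    values, zero_counts = [], []
    zeros = 0
    for x in column:
        if x == 0:
            if zeros == max_run:
                values.append(0)          # padding entry
                zero_counts.append(max_run)
                zeros = 0
            else:
                zeros += 1
        else:
            values.append(x)
            zero_counts.append(zeros)
            zeros = 0
    return values, zero_counts


def decode_relative(values, zero_counts, length):
    """Rebuild the dense column of the given length."""
    out = [0] * length
    pos = 0
    for v, z in zip(values, zero_counts):
        pos += z          # skip the zeros before this entry
        out[pos] = v
        pos += 1
    return out
```

Because the counts are small integers, they take fewer bits than absolute row indices, so the weight matrix can stay in sparse form during inference.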

Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over the combination of Woo and Chung in view of “Weightless: Lossy Weight Encoding For Deep Neural Network Compression” to Reagen et al.
Regarding claim 6, the combination of Woo and Chung teaches: The computer-implemented method of claim 1, 
However, the combination of Woo and Chung does not explicitly teach: wherein the inferencing model comprises a compressed model generated through encoding.
Reagen teaches: wherein the inferencing model comprises a compressed model generated through encoding. (End of p. 2: “Weightless is a lossy encoding scheme based around Bloomier filters… We then show how to encode neural network weights using this data structure and propose a set of augmentations to make it an effective compression strategy for deep neural networks.” Reagen teaches this in § 3.1, ¶Decoding and ¶Encoding. By default, encoding weights compresses the model.)
Reagen is in the field of neural network optimization for inference. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have compressed the neural network in Woo’s system by encoding weight matrices using Bloomier filters (Weightless) as in Reagen’s system with a motivation to compactly store weights in a neural network. (“We propose using the Bloomier filter to compactly store weights in a neural network”, p. 4 §3.2, ¶1)

Claims 9, 10 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Woo and Chung in view of “Low latency RNN inference with cellular batching” to Gao et al.
Regarding claim 9, the combination of Woo and Chung teaches: The computer-implemented method of claim 1, 
However, the combination of Woo and Chung does not explicitly teach: wherein said determining comprises determining a sequence of variable batch sizes corresponding to the multiple layers of the one or more deep neural networks.
	But Gao teaches: wherein said determining comprises determining a sequence of variable batch sizes corresponding to the multiple layers of the one or more deep neural networks. (“We perform microbenchmarks using various input batch sizes” (p. 12, col. 2)) 
	Gao is in the field of batching for improving neural network inferencing. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the cellular batching method on recurrent neural networks from Gao’s system into the batching in Woo’s system, with a motivation to achieve high throughput values and low latency simultaneously. (“We propose the technique of cellular batching, which improves both the latency and throughput of RNN inference.” (Gao, Abstract))


Regarding claim 10, the combination of Woo and Chung teaches: The computer-implemented method of claim 1, 
However, the combination of Woo and Chung does not explicitly teach: wherein said determining increases one or more throughput values associated with the inferencing of the one or more deep neural networks.
But Gao teaches: wherein said determining increases one or more throughput values associated with the inferencing of the one or more deep neural networks. (Throughput is interpreted as requests per second (req/s) as used by Gao. “The inference throughput of BatchMaker for TreeLSTM is 4× and 1.8× that of TensorFlow Fold and DyNet, respectively” (Gao, col. 1, end of section 1))
	Gao is in the field of batching for improving neural network inferencing. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the cellular batching method on recurrent neural networks from Gao’s system into the batching in Woo’s system, with a motivation to achieve high throughput values and low latency simultaneously. (“We propose the technique of cellular batching, which improves both the latency and throughput of RNN inference.” (Gao, Abstract))

Regarding claim 18, the combination of Woo and Chung teaches: The computer program product of claim 14, 
However, the combination of Woo and Chung does not explicitly teach: wherein said determining comprises determining a sequence of variable batch sizes corresponding to the multiple layers of the one or more deep neural networks.
But Gao teaches: wherein said determining comprises determining a sequence of variable batch sizes corresponding to the multiple layers of the one or more deep neural networks. (“We perform microbenchmarks using various input batch sizes” (p. 12, col. 2)) 
	Gao is in the field of batching for improving neural network inferencing. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the cellular batching method on recurrent neural networks from Gao’s system into the batching in Woo’s system, with a motivation to achieve high throughput values and low latency simultaneously. (“We propose the technique of cellular batching, which improves both the latency and throughput of RNN inference.” (Gao, Abstract))

Claim 20 is rejected under 35 U.S.C. 103 as being unpatentable over Woo in view of “An Analysis of Deep Neural Network Models for Practical Applications” to Canziani et al., and further in view of U.S. Patent Publication No. 20170344882 to Ambrose et al., and further in view of Molchanov and Chung.
Regarding claim 20, Woo teaches: A computer-implemented method, the method comprising steps of: 
obtaining, as input for inferencing of one or more deep neural networks, (i) an inferencing model, wherein the inferencing model comprises a feed forward model (Woo teaches convolutional neural network layers (C. 1 L. 18-19), which are interpreted as a feed forward model), and (ii) constraints comprising (a) total available memory (Woo in C. 15 L. 65 teaches (a), a total available memory that may be 500 megabytes (MB)),
computing, based at least in part on the obtained input, a set of statistics pertaining to resource utilization for each of multiple layers in the one or more deep neural networks, wherein the set of statistics comprises (i) amount of working memory (Woo teaches (i), the statistic of a working set, defined as, “a size parameter that indicates an amount of memory needed to process the one or more inputs through each of the layers in the superlayer” (C. 2 L. 22-30) and where “Circuit 100 can then determine an amount of memory required to store respective sets of parameters for each layer of a neural network” (C. 15 L. 2-5)), (ii) input size (“Thus, in this example, circuit 100 determines that aggregate memory usage for respective sets of parameters for layers A, B, and C is 300 MB, leaving 200 MB of available on-chip memory for use in storing inputs. For respective layers A, B, C, circuit 100 can determine a particular size parameter for inputs of working sets to be processed by respective layers and a corresponding batch size for the working set.” (C. 15 L. 10-17))
determining, based at least in part on (i) the obtained input and (ii) the computed set of statistics, the multiple batch sizes to be used for inferencing the multiple layers of the one or more deep neural networks (“For respective layers A, B, C, circuit 100 can determine a particular size parameter for inputs of working sets to be processed by respective layers and a corresponding batch size for the working set.” C. 15 L. 13-17)
displaying… the determined batch sizes to be used for inferencing the multiple layers of the one or more deep neural networks; (Fig. 6A, C. 15 L. 37)
wherein the steps are carried out by at least one computing device (“a CPU or GPU” C. 4 L. 55).
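
For illustration only, the memory arithmetic in the passages quoted from Woo (a 500 MB total, 300 MB of aggregate parameters for layers A, B, and C, and 200 MB left for inputs) can be sketched as below. This is one possible reading and not Woo's implementation: the function name, the even split of parameter memory across layers, and the per-input sizes are hypothetical assumptions that do not appear in Woo.

```python
# Illustrative sketch only. The 500 MB and 300 MB figures come from the
# quoted Woo passages; every other number and name here is an assumption.

def batch_sizes_for_layers(total_mb, param_mb_per_layer, input_mb_per_layer):
    """Return, per layer, how many inputs fit in the memory left after parameters."""
    remaining_mb = total_mb - sum(param_mb_per_layer.values())
    if remaining_mb < 0:
        raise ValueError("parameters alone exceed the available memory")
    # Integer division: only whole inputs can be held in the remaining memory.
    return {
        layer: remaining_mb // size_mb
        for layer, size_mb in input_mb_per_layer.items()
    }

# Hypothetical split of the 300 MB of parameters across layers A, B, C.
params_mb = {"A": 100, "B": 100, "C": 100}
# Hypothetical size of a single input (in MB) at each layer.
inputs_mb = {"A": 25, "B": 50, "C": 100}

sizes = batch_sizes_for_layers(500, params_mb, inputs_mb)
print(sizes)
```

Under these assumed input sizes, each layer receives a different batch size from the same 200 MB budget, which is the sense in which a batch size is determined per layer.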

Woo does not explicitly teach the constraints comprising (b) maximum latency for inferencing, and (c) maximum energy for inferencing; the set of statistics comprising (ii) activation size, (iii) time to process a layer for each of multiple batch sizes, and (iv) energy to process a layer for each of the multiple batch sizes; and [displaying] to at least one user.

	Canziani teaches: constraints comprising (b) maximum latency for inferencing (Latency is interpreted as inference time per image: “there is a linear relationship between operations count and inference time per image. Therefore, at design time, we can pose a constraint on the number of operation to keep processing speed in a usable range for real-time applications or resource-limited deployments.” That is, the number of operations is proportional to inference time. Canziani, end of p. 4), and (c) maximum energy for inferencing (“an energetic constraint, which could possibly be an essential designing factor for a network that needs to run on an embedded system.” Canziani, p. 6, first paragraph);
a set of statistics comprising (iv) energy to process a layer for each of the multiple batch sizes (the energy statistic is interpreted as the energy constraint recited above);
“The purpose of this paper is to stress the importance of these figures, which are essential hard constraints for the optimisation of these networks in practical deployments and applications.” (Canziani, p. 1)
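
For illustration only, the linear relationship quoted from Canziani implies that a maximum-latency constraint can be restated as a bound on operation count. The sketch below assumes a linear model of the form time = overhead + slope × operations; the function name, the coefficients, and the 33 ms budget are hypothetical and are not taken from Canziani.

```python
# Illustrative sketch only. Canziani reports a linear relationship between
# operation count and inference time per image; the coefficients below are
# hypothetical placeholders, not measured values from that paper.

def max_operations(max_latency_ms, ms_per_gop, overhead_ms=0.0):
    """Largest operation count (in G-ops) that keeps inference time within budget."""
    if ms_per_gop <= 0:
        raise ValueError("ms_per_gop must be positive")
    budget_ms = max_latency_ms - overhead_ms
    if budget_ms < 0:
        raise ValueError("fixed overhead already exceeds the latency budget")
    return budget_ms / ms_per_gop

# Hypothetical budget: 33 ms per image, 2 ms per G-op, 3 ms fixed overhead.
ops_bound = max_operations(33.0, 2.0, 3.0)
print(ops_bound)
```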

Molchanov teaches: set of statistics comprising (ii) activation size (“If an activation value (an output feature map) is small then this feature detector is not important for the prediction task at hand. We may evaluate this by the mean activation… or by the standard deviation of the activation” (end of p. 3))
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the teachings of Molchanov’s system into the combination of Woo and Canziani’s system. The activation value as taught by Molchanov would be computed by Woo’s system, with a motivation to determine a pruning criterion (Molchanov § 2).

	Ambrose teaches: a set of statistics comprising (iii) time to process a layer for each of multiple batch sizes (Ambrose [0153]: “Extended Cost Estimations”; [0154]: “The following are specific example formulations to estimate the memory size and execution time for different scheduling schemes, depending upon the location of the data.”; [0155]: “The minimum on-chip shared memory size required depends on the scheduling scheme and whether input/output data is stored in on-chip memory or external memory…”; [0183]: “layer execution time = input FM processing pipeline latency*inFM/numPU”)
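
For illustration only, the formula quoted from Ambrose [0183] can be evaluated as in the sketch below. The function and parameter names are hypothetical renderings of the terms in that formula (pipeline latency per input feature map, number of input feature maps, number of processing units), and the numeric values are assumptions chosen only to show the arithmetic.

```python
# Illustrative sketch only. Mirrors the quoted formula
#   layer execution time = input FM processing pipeline latency * inFM / numPU
# Names and example values are assumptions, not taken from Ambrose.

def layer_execution_time(pipeline_latency, in_fm, num_pu):
    """Estimate layer execution time in the same time unit as pipeline_latency."""
    if num_pu <= 0:
        raise ValueError("num_pu must be positive")
    return pipeline_latency * in_fm / num_pu

# Hypothetical values: 2.0 ms per input feature map, 64 input feature maps,
# 4 processing units working in parallel.
estimate_ms = layer_execution_time(2.0, 64, 4)
print(estimate_ms)
```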

Chung teaches: [displaying] to at least one user (“The dashboard 500 also includes a chart 520 providing a performance report.” [0070])
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the teachings of Chung’s system into the combination of Woo, Canziani, Molchanov, and Ambrose’s system. The batch sizes from Woo would be displayed using the chart display from Chung, with a motivation to adjust tuning parameters in real time while an application is running (“Providing the ability to “see” the current system performance is significant because at least some of the tuning parameters can be adjusted in real-time, while an application is running” Chung, [0070].)
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. U.S. Patent Publication 2019/0320115 to Dutta Bordoloi et al. teaches techniques for dynamically selecting a batch size used in vehicle camera image.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Asher Jablon whose telephone number is (571)270-7648.  The examiner can normally be reached on Monday - Friday, 9:00 am - 6:00 pm.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki can be reached on (571) 272-3719.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/ASHER JABLON/Examiner, Art Unit 2122
/ERIC NILSSON/Primary Examiner, Art Unit 2122