DETAILED ACTION
Status of Claims
This is a non-final office action on the merits in response to the arguments and amendments filed on 3 June 2022 and the request for continued examination filed on 3 June 2022.
Claims 1, 15, and 20 were amended. Claims 1-20 are currently pending and have been examined. 

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 3 June 2022 has been entered.
 
Note Regarding Amendments to the Specification
The 3 June 2022 amendments to the specification are not entered. 37 CFR 1.121(b)(1)(i) states that amendments to replace a paragraph of the specification are required to include: “An instruction, which unambiguously identifies the location, to … replace a paragraph with one or more replacement paragraphs.” The specification amendment as filed indicates where to replace a paragraph of the “published specification” (i.e., the pre-grant publication). This is not an unambiguous identification of the location in the specification as currently filed. Further, the amendment as filed would place the paragraph in the specification with an incorrect paragraph number.

Claim Rejections - 35 USC § 112(a)
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claims 1-20 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claims contain subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention. Claims not listed below are rejected for dependency.

Claim 1 recites the original limitation “generating, based on the ANN structure, a configuration for the computation engine, the configuration including information concerning a batch size of the one or more layers of the ANN.” 
MPEP 2163 states: “While there is a presumption that an adequate written description of the claimed invention is present in the specification as filed, In re Wertheim, 541 F.2d 257, 262, 191 USPQ 90, 96 (CCPA 1976), a question as to whether a specification provides an adequate written description may arise in the context of an original claim. An original claim may lack written description support when (1) the claim defines the invention in functional language specifying a desired result but the disclosure fails to sufficiently identify how the function is performed or the result is achieved.” In the present case, the identified limitation uses functional language specifying a desired result. 
The portions of the original disclosure most relevant to the generation of the configuration read as follows:
[0012] According to one example embodiments, a system for optimizing ANN computations based on automatic determination of a batch size for an ANN is provided. The system may include a computation engine capable of performing computations of one or more layers of the ANN and an optimization module. The optimization module can be capable of receiving an ANN structure associated with the ANN and generating, based on the ANN structure, a configuration for the computation engine. The configuration may include information concerning batch sizes of one or more layers of the ANN.

[0033] The system 400 may further include an optimization module 440 configured to generate a configuration 450 for the computational engine 405. The controller 420 of the computational engine 405 may assign, based on the configuration 450, batch sizes to layers of ANN. The controller 420 may also configure the processing units 430-i (i=1, . . . , N) to perform computations of the layers of ANN.

[0054] In block 804, the method 800 may generate, based on the ANN structure, a configuration for a computation engine capable of performing computations of the layers of the ANN. The configuration may include information concerning a batch size of one or more layers of the ANN and a sequence of performing computations for the layers of ANN. The optimization module can include a software-based module and the computation engine may include one or more hardware-based modules implemented on FPGAs.

[0056] The batch size of a layer of the ANN can be determined based on a bandwidth required to read data related to the layer, a number of parameters associated with the layer, and a time the layer processes one input dataset from the batch. The batch size of the layer of the ANN can differ from the batch size of the ANN. The ANN may include at least a first layer and a second layer such as a batch size of the first layer differs from a batch size of the second layer.

[0057] Determining the batch sizes for layers of the ANN may include performing, by the optimization module, one or more iterations of selecting batch sizes of the layers of the ANN to optimize a performance measure. The performance measure can be a function of one or more of: a latency of the ANN, a throughput of the ANN, and a desired size of ANN. The performance measure can be set based on a user input. The iterations can be carried out until a number of the iterations exceeds a predetermined threshold or the performance measure exceeds a pre-determined threshold.

The disclosures of [0012], [0033], and [0054] merely repeat the identified functionality with no elaboration. The disclosure at [0056] plausibly provides some indication of input parameters to the generation, but this too provides no details on the actual implementation. The disclosure at [0057] vaguely references the use of iterations to select batch sizes, but there is no explanation of what process is being iterated. None of these disclosures provide meaningful details as to how the function is performed or the result is achieved. The remainder of the original disclosure similarly fails to articulate how the function is performed or the result is achieved. 
MPEP 2163 further states: “The Federal Circuit has explained that a specification cannot always support expansive claim language and satisfy the requirements of 35 U.S.C. 112 "merely by clearly describing one embodiment of the thing claimed." LizardTech v. Earth Resource Mapping, Inc., 424 F.3d 1336, 1346, 76 USPQ2d 1731, 1733 (Fed. Cir. 2005). The issue is whether a person skilled in the art would understand applicant to have invented, and been in possession of, the invention as broadly claimed. In LizardTech, claims to a generic method of making a seamless discrete wavelet transformation (DWT) were held invalid under 35 U.S.C. 112, first paragraph, because the specification taught only one particular method for making a seamless DWT and there was no evidence that the specification contemplated a more generic method.” The present application includes expansive functional language while providing no examples of the thing claimed. 
Based on the broad functional language of the claims, and the lack of disclosure, one of ordinary skill in the art would not conclude that applicant was in possession of the claimed invention at the time of filing. Thus the claim is rejected for failing the written description requirement. Claims 15 and 20 are similarly rejected. 

Claim 1 recites the non-original limitation “generating, based on the ANN structure, a configuration for the computation engine, the configuration including information concerning a batch size of the one or more layers of the ANN and a sequence of performing computations of one or more layers of the ANN by the computation engine”. Applicants identified [0011], [0031-0033], [0039], and [0055] as support for the amendments at large. 
[0011] Provided are computer-implemented systems and methods for optimizing ANN computations based on automatic determination of a batch size. Some embodiments of the present disclosure can facilitate reduction in time required for computer systems to perform ANN computations for a batch by determining batch sizes individually for each layer of the ANN and a sequence of a computation of layers for inputs sets from the batch.

[0039] FIG. 6A shows a plot 610 of a sequence of computation of layers of the ANN 500, according to an example embodiment. In the example of FIG. 6A, the subpart of ANN 500 including the hidden layers 510 (L0), 515 (L1), and 520 (L2) is first executed twice, once for input dataset A and once for the input dataset B. Then, the hidden layer 525 (L3) can be executed once using the batch including two outputs of the hidden layer 515 (L1), where the two outputs are generated based on input dataset A and B. After computation of the hidden layer 525 (L3) are finished, the hidden layer 530 (L4) can be executed once using batch of two inputs, where a first input is based on of the outputs of layers 520 (L2) and 525 (L3) generated based on input dataset A and a second input is based on the outputs of layers 520 (L2) and 525 (L3) generated based on input dataset B. The whole sequence of computations of layers L0, L1, L2, L3, and L4 is further repeated one more time for the input datasets C and D. As a result, the layer L4 generates four outputs which are based on the input datasets A, B, C, and D. The four outputs of the layer L4 may form a batch for computation of the output layer 535 (L5).
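For reference, the order of execution recited in [0039] can be written out as a short Python sketch. This is only a hand transcription of the one example sequence the paragraph describes, with batches formed from pairs of input datasets; it is not a method for generating such a sequence, and nothing in it is taken from the specification beyond the layer labels L0-L5 and the dataset labels A-D. The function name `fig_6a_style_sequence` and the `Step` type are hypothetical.

```python
from typing import List, Tuple

# A step is (layer label, input datasets processed together as one batch).
Step = Tuple[str, Tuple[str, ...]]


def fig_6a_style_sequence(datasets: List[str]) -> List[Step]:
    """Write out the execution order described in [0039] (illustrative only)."""
    if len(datasets) % 2 != 0:
        raise ValueError("this sketch assumes an even number of input datasets")
    steps: List[Step] = []
    for start in range(0, len(datasets), 2):
        pair = tuple(datasets[start:start + 2])
        # The subpart L0, L1, L2 is executed once per dataset (batch size 1).
        for dataset in pair:
            for layer in ("L0", "L1", "L2"):
                steps.append((layer, (dataset,)))
        # L3 and then L4 are each executed once on a batch of two.
        steps.append(("L3", pair))
        steps.append(("L4", pair))
    # L5 is executed once on a batch built from every L4 output.
    steps.append(("L5", tuple(datasets)))
    return steps


sequence = fig_6a_style_sequence(["A", "B", "C", "D"])
for layer, batch in sequence:
    print(layer, batch)
```

The sketch fixes the batch sizes (1, 2, and 4) and the repetition of the subpart by hand; how those values would be chosen is exactly the question the cited paragraphs leave open.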

[0041] The sequence of computations of layers in the ANN can be based on the batch sizes of the layers, job allocation of computation of layers in the processing units 430-i (i=1, . . . , N), and memory allocation of weights of neurons of layers, inputs of layers, outputs of layers in memories of computational engine 405. The sequence of the computation of layers can depend on latencies and throughputs of memories storing input data of layers and other information related to the layers. Due to the latencies and throughputs, the sequence shown in FIG. 6B can be a better choice than the sequence described in FIG. 6A, even though results of the execution of the sequences can be the same.

The disclosure at [0011] describes the functionality of determining a sequence of computation of layers, but provides no indication of how such a determination is implemented. The disclosure of [0039] provides an example of a sequence, but provides no indication of how such a sequence is generated. The disclosure at [0041] describes factors relevant to the determination of the sequence, but does not provide any details regarding the implementation of such a determination. Because the specification does not explain how the non-originally claimed functionality is achieved, one of ordinary skill in the art would not recognize applicant as possessing the claimed invention at the time of filing. Thus the claim is rejected for failing the written description requirement. Claims 15 and 20 are similarly rejected.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Claim 1, which is representative of claims 15 and 20, recites receiving an ANN structure associated with an ANN; and generating, based on the ANN structure, a configuration for the computation engine, the configuration including information concerning a batch size of the one or more layers of the ANN and a sequence for performing computations of one or more layers of the ANN by the computation engine, wherein the sequence includes a subpart, the subpart including at least two connected layers of the ANN and being repeated in the sequence at least two times, the batch size and the sequence being based on optimized performance of computations of a layer of the ANN by the computation engine. These limitations describe a mental process of receiving information and analyzing the information to determine an optimal batch size and sequence for the layers of a neural network based on the structure of the neural network. 
MPEP 2106 notes that “A Claim With Limitation(s) That Cannot Practically be Performed in the Human Mind Does Not Recite a Mental Process.” However, the same section notes that “claims do recite a mental process when they contain limitations that can practically be performed in the human mind, including for example, observations, evaluations, judgments, and opinions. Examples of claims that recite mental processes include: a claim to ‘collecting information, analyzing it, and displaying certain results of the collection and analysis,’ where the data analysis steps are recited at a high level of generality such that they could practically be performed in the human mind.” The limitation identified above is recited at an extremely high level of generality, and does not include any requirements of scale or computational technique that would place it beyond the bounds of what can be practically performed by a human using pen and paper. As such, it is determined that the identified limitation can be practically performed in the human mind, and the identified limitation is not excluded from being interpreted as a mental process. Therefore the claims are determined to recite a mental process.
Under the 2019 PEG, the additional elements of the claims are considered for whether they integrate the abstract idea into a practical application. Claims 1 and 20 recite the additional elements of a processor and a memory; claim 15 recites the additional element of a processor. These additional elements are all recited at a high level of generality, and may be interpreted as generic computing devices implementing the above identified abstract idea. Under the 2019 PEG, the use of a generic computing device to implement an abstract idea does not integrate that abstract idea into a practical application. As such, these additional elements do not integrate the abstract idea into a practical application. The claims further recite the additional element of a computation engine comprising a controller and one or more processing units being capable of performing operations associated with the one or more layers of the ANN, where the controller is configured to receive a batch of input datasets and to configure, based on the configuration, the one or more processing units to perform computations of the one or more layers of the ANN for the input datasets. The abstract idea involves generating configuration information for a neural network, and this additional element essentially describes the use of that information. Having generated a set of batch sizes and a sequence for the batches for a neural network, the mere implementation of that generated information is neither significant nor more than necessary data outputting. Thus this additional element is properly understood as insignificant extra-solution activity. Under the 2019 PEG, the incorporation of an additional element that is insignificant extra-solution activity does not integrate the abstract idea into a practical application. There are no further additional elements.
When considered in combination, the additional elements do not require any particular device, do not represent any improvement to technology, do not effect the transformation of an article, and do not meaningfully limit the implementation of the abstract idea. Instead, the combination of additional elements only generally links the abstract idea to a computer environment for implementation. As such, the combination of additional elements does not integrate the abstract idea into a practical application. Therefore, as the additional elements of the claims do not integrate the abstract idea into a practical application, the claims are determined to be directed to an abstract idea.
In Step 2B of the Mayo/Alice analysis, the additional elements of the claims are considered for whether they amount to significantly more than the abstract idea. As previously noted, the claims recite additional elements which may be interpreted as generic computing devices used to implement the abstract idea. However, implementing an abstract idea on a generic computer does not add significantly more, similar to how the recitation of the computer in the claim in Alice amounted to mere instructions to apply the abstract idea of intermediated settlement on a generic computer. As such, these elements do not provide an inventive concept and do not constitute significantly more. As previously noted, the claims recite an additional element of implementing the information produced via the abstract idea, which was determined to be insignificant extra-solution activity. Note additionally that Devarakonda (AdaBatch: Adaptive Batch Sizes For Training Deep Neural Networks) suggests that running neural networks via selected batch sizes was conventional before the priority date of the claimed invention (“Training deep neural networks with Stochastic Gradient Descent, or its variants, requires careful choice of both learning rate and batch size. While smaller batch sizes generally converge in fewer training epochs, larger batch sizes offer more parallelism and hence better computational efficiency.” See Abstract. Also: “implementations typically divide the training set into a (potentially large) series of batches of some fixed size. Each batch is processed in sequence during one training epoch; however, the individual training samples within a single batch may be processed in parallel (Bottou et al., 2016; Goodfellow et al., 2016).” See Page 1. Also: “The user overseeing the training process typically chooses a static batch size r”. See at least Page 1). 
This further supports treating the implementation of a neural network based on a determined batch size as insignificant extra-solution activity. Thus this additional element is determined to be insignificant extra-solution activity, and does not amount to significantly more than the abstract idea. There are no further additional elements. As previously noted, the combination of additional elements only generally links the abstract idea to a computer environment for implementation. As such, the combination of additional elements does not amount to significantly more than the abstract idea. Therefore, when considered individually and as an ordered combination, the additional elements of the independent claims do not amount to significantly more than the judicial exception. Thus the independent claims are not patent eligible.
Dependent claims 2-7, 11-14, and 16-19 only further describe the abstract idea and do not recite any further additional elements. The previously identified additional elements do not integrate the further narrowed abstract idea into a practical application, and they do not amount to significantly more than the abstract idea. Dependent claims 8 and 9 further describe the additional element of implementing the generated information, but the additional element continues to amount to insignificant extra-solution activity, and does not integrate the abstract idea into a practical application nor amount to significantly more than the abstract idea, when considered individually and in combination with the other additional elements. Dependent claim 10 further describes the additional element interpreted as a generic computing device, where this device is now implemented on an FPGA. However, this additional element only generally links the abstract idea to a technological environment involving FPGAs. As such, when considered individually and in combination with prior additional elements, this additional element does not integrate the abstract idea into a practical application. Further, Nakaya (US 2009/0138770 A1), see at least [0002], demonstrates that FPGAs were conventional long before the priority date of the claimed invention. Thus, when considered individually and in combination with prior additional elements, this additional element does not amount to significantly more than the abstract idea. Thus as the dependent claims remain directed to a judicial exception, and as the additional elements of the claims do not amount to significantly more, the dependent claims are not patent eligible.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1-13 and 15-20 are rejected under 35 U.S.C. 103 as being unpatentable over Choudhury et al. (US 2020/0125926 A1) in view of Woo (US 2018/0373976 A1).

Regarding Claim 1: Choudhury discloses a system for optimizing artificial neural network (ANN) computations based on automatic determination of a batch size for an ANN, the system comprising: 
a computation engine comprising a controller and one or more processing units being capable of performing operations associated with the one or more layers of the ANN (inferencing can be carried out either on the cloud or the edge device itself. Inferencing, as used herein, refers to the stage wherein a trained network predicts and/or classifies input test samples. See at least [0002]); and
a processor and a memory storing an optimization module comprising processor-executable codes (See at least [0035]), wherein the processor is configured to implement the following operations upon executing the processor-executable codes:
receiving an ANN structure associated with an ANN (Step 502 includes obtaining a feed forward model and resource constraints for the system.  See at least [0030]). 
generating, based on the ANN structure, a configuration for the computation engine, the configuration including information concerning a batch size of the one or more layers of the ANN and a sequence of performing computations of one or more layers of the ANN by the computation engine, the batch size and the sequence being based on optimized performance of computations of a layer of the ANN by the computation engine (Step 506 includes running an optimizer to maximize throughput while maintaining latency, memory, and/or energy constraints. Step 508 includes outputting/returning an optimal batch size to be used for each layer in the inference. See at least [0030]. Also: determining one or more optimal batch size sequences to be used by the different layers of the model for inferencing, wherein the one or more batch size sequences increase throughput and/or reduce energy or power consumption. See at least [0014]. Also: as also depicted in FIG. 4, the optimal batch size sequence determination component 408, using inputs 402 and 404, as well as the statistics determined by the pre-processing component 406, determines one or more optimal batch size sequences for the layers of the feed forward network, as shown in algorithm 300. In making such determinations, component 408 attempts to maximize throughput, minimize energy consumption, maintain one or more latency parameters, and/or maintain one or more memory requirements, as detailed above. Further, component 408 outputs a batch size sequence 410 across multiple layers of the feed forward network. See at least [0029]).
wherein the controller is configured to: receive a batch of input datasets; and configure, based on the configuration, the one or more processing units to perform computations of the one or more layers of the ANN for the input datasets (inferencing can be carried out either on the cloud or the edge device itself. Inferencing, as used herein, refers to the stage wherein a trained network predicts and/or classifies input test samples. See at least [0002]. Also: Step 508 includes outputting/returning an optimal batch size to be used for each layer in the inference. See at least [0030]). 

Choudhury does not appear to disclose wherein the sequence includes a subpart, the subpart including at least two connected layers of the ANN and being repeated in the sequence at least two times. 
	Woo discloses generating a configuration for the computation engine, the configuration including a sequence of performing computations of one or more layers of the ANN by the computation engine, wherein the sequence includes a subpart, the subpart including at least one connected layers of the ANN and being repeated in the sequence at least two times (At block 404, circuit 100 determines a partitioning of the neural network layers into a sequence of superlayers. For example, circuit 100 can include, or have access to, compiler logic that is configured to determine one or more partitions of the neural network layers into sequences of superlayers. Alternatively, or in addition to the compiler logic, circuit 100 can include, or have access to, at least one hardware block configured to determine one or more partitions of the neural network layers into sequences of superlayers. In some implementations, each superlayer in the sequence of super layers is a partition of the directed graph that includes one or more layers. See at least [0077]. Also: Graph 500 further includes a sequence of superlayers along an X-axis of the graph. For example, graph 500 includes: i) a first superlayer 512 for processing batch elements 0, 1, 2, and 3 through each of layers A, B, C; and ii) a second superlayer 514 for processing batch elements 0, 1, 2, and 3 through each of layers D, E. According to the described teachings, a sequence of superlayers defined based on an improved neural network scheduling policy can support a relatively high working set batch size without exceeding on-chip memory capacity, or threshold capacity, of a hardware circuit that executes a neural network. See at least [0086] and Fig. 5).
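	The superlayer ordering quoted from Woo's [0086] can be illustrated with a brief Python sketch. It assumes that each batch element passes through every layer of a superlayer before the next batch element begins; the quoted passage does not spell out that inner ordering, so it is an assumption of this sketch rather than a statement of Woo's implementation. The function name `superlayer_schedule` is hypothetical, and only the layer labels A-E and batch elements 0-3 come from the quoted text.

```python
from typing import List, Sequence, Tuple


def superlayer_schedule(
    superlayers: Sequence[Sequence[str]],
    batch_elements: Sequence[int],
) -> List[Tuple[int, str]]:
    """List (batch element, layer) steps, one superlayer at a time."""
    schedule: List[Tuple[int, str]] = []
    for layers in superlayers:
        # Assumed inner ordering: finish one batch element across the whole
        # superlayer before starting the next batch element.
        for element in batch_elements:
            for layer in layers:
                schedule.append((element, layer))
    return schedule


# Superlayer 512 = layers A, B, C; superlayer 514 = layers D, E.
schedule = superlayer_schedule([["A", "B", "C"], ["D", "E"]], [0, 1, 2, 3])
for element, layer in schedule:
    print(element, layer)
```

	Under that assumption, the group of layers A, B, C recurs once per batch element, which is the repetition of a multi-layer subpart relied upon in this rejection.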
	Choudhury provides a system which receives a neural network structure and determines a set and sequence of optimal batch sizes. Choudhury differs from the claimed invention in that it uses generic batch sequences in place of a batch sequence including repeated subparts of multiple layers. However, Woo demonstrates that the prior art already knew of generating a configuration for a neural network that defines a sequence for neural network layer processing that includes repeated subparts of multiple layers. One of ordinary skill in the art could have easily substituted Woo’s techniques for determining a sequence of layers into the system of Choudhury. Further, one of ordinary skill in the art would have recognized that such a substitution would have resulted in improved neural network processing efficiency (see Woo, at least [0064] and [0086]). As such, the identified substitution and the claimed invention would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention in view of the disclosures of Choudhury and the teachings of Woo.

Regarding Claim 2: Choudhury in view of Woo teaches the above limitations. Additionally, Choudhury discloses wherein the processor determines the batch size of the one or more layers based on a bandwidth required to read data related to the one or more layers (the pre-processing component 406 determines, for each layer of the feed forward network 402, a set of statistics related to resource utilization. Such statistics can include, for example, working memory, input and output activation size for every batch size, time and/or energy to compute the layer for every batch size, etc. See at least [0028]. Also: the optimal batch size sequence determination component 408, using inputs 402 and 404, as well as the statistics determined by the pre-processing component 406, determines one or more optimal batch size sequences for the layers of the feed forward network, as shown in algorithm 300. See at least [0029]. Examiner’s note: As “bandwidth required to read data” will influence “time … to compute the layer”, Choudhury’s optimization is “based on a bandwidth required to read data related to the one or more layers.”).

Regarding Claim 3: Choudhury in view of Woo teaches the above limitations. Additionally, Choudhury discloses wherein the processor determines the batch size of the one or more layers based on a number of parameters associated with the one or more layers (the pre-processing component 406 determines, for each layer of the feed forward network 402, a set of statistics related to resource utilization. Such statistics can include, for example, working memory, input and output activation size for every batch size, time and/or energy to compute the layer for every batch size, etc. See at least [0028]. Also: the optimal batch size sequence determination component 408, using inputs 402 and 404, as well as the statistics determined by the pre-processing component 406, determines one or more optimal batch size sequences for the layers of the feed forward network, as shown in algorithm 300. See at least [0029]). 

Regarding Claim 4: Choudhury in view of Woo teaches the above limitations. Additionally, Choudhury discloses wherein the processor determines the batch size of the one or more layers based on a time the one or more layers processes one input dataset from the batch (the pre-processing component 406 determines, for each layer of the feed forward network 402, a set of statistics related to resource utilization. Such statistics can include, for example, working memory, input and output activation size for every batch size, time and/or energy to compute the layer for every batch size, etc. See at least [0028]. Also: the optimal batch size sequence determination component 408, using inputs 402 and 404, as well as the statistics determined by the pre-processing component 406, determines one or more optimal batch size sequences for the layers of the feed forward network, as shown in algorithm 300. See at least [0029]). 

Regarding Claim 5: Choudhury in view of Woo teaches the above limitations. Additionally, Choudhury discloses wherein the processor is configured to determine the batch size for the ANN based on the batch size of the layers of the ANN (component 408 outputs a batch size sequence 410 across multiple layers of the feed forward network. See at least [0029] and Fig. 4, Element 410. Also: Equation (1) can be derived as follows. Suppose in the optimal solution for OPTExact[i, j, b, mem], layer L.sub.k (i≤k≤j) is computed with batch size b. As such, the total time per sample to compute layers L.sub.i to L.sub.j in this scenario can be expressed as the sum of three quantities: (i) the optimal time per sample to compute layers L.sub.i to L.sub.k−1 using batch size at most b with memory mem, (ii) the optimal time per sample to compute layer L.sub.k with batch size b and memory mem (this is finite only if <k, b, mem> is feasible), and (iii) the optimal time per sample to compute layers L.sub.k+1 to L.sub.j using batch size at most b and memory mem. As the layer L.sub.k can be unknown, every layer between L.sub.i and L.sub.j can be considered, and the layer L.sub.k that provides the best solution can be selected. See at least [0023]). 

Regarding Claim 6: Choudhury in view of Woo teaches the above limitations. Additionally, Choudhury discloses wherein the batch size of the one or more layers differs from the batch size of the ANN (component 408 outputs a batch size sequence 410 across multiple layers of the feed forward network. See at least [0029] and Fig. 4, Element 410. Also: Equation (1) can be derived as follows. Suppose in the optimal solution for OPTExact[i, j, b, mem], layer L.sub.k (i≤k≤j) is computed with batch size b. As such, the total time per sample to compute layers L.sub.i to L.sub.j in this scenario can be expressed as the sum of three quantities: (i) the optimal time per sample to compute layers L.sub.i to L.sub.k−1 using batch size at most b with memory mem, (ii) the optimal time per sample to compute layer L.sub.k with batch size b and memory mem (this is finite only if <k, b, mem> is feasible), and (iii) the optimal time per sample to compute layers L.sub.k+1 to L.sub.j using batch size at most b and memory mem. As the layer L.sub.k can be unknown, every layer between L.sub.i and L.sub.j can be considered, and the layer L.sub.k that provides the best solution can be selected. See at least [0023]).

Regarding Claim 7: Choudhury in view of Woo teaches the above limitations. Additionally, Choudhury discloses wherein the ANN includes at least a first layer and a second layer, and a batch size of the first layer differs from a batch size of the second layer (component 408 outputs a batch size sequence 410 across multiple layers of the feed forward network. See at least [0029] and Fig. 4, Element 410. Also: Equation (1) can be derived as follows. Suppose in the optimal solution for OPTExact[i, j, b, mem], layer L.sub.k (i≤k≤j) is computed with batch size b. As such, the total time per sample to compute layers L.sub.i to L.sub.j in this scenario can be expressed as the sum of three quantities: (i) the optimal time per sample to compute layers L.sub.i to L.sub.k−1 using batch size at most b with memory mem, (ii) the optimal time per sample to compute layer L.sub.k with batch size b and memory mem (this is finite only if <k, b, mem> is feasible), and (iii) the optimal time per sample to compute layers L.sub.k+1 to L.sub.j using batch size at most b and memory mem. As the layer L.sub.k can be unknown, every layer between L.sub.i and L.sub.j can be considered, and the layer L.sub.k that provides the best solution can be selected. See at least [0023]).
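For illustration only, the recurrence quoted from Choudhury [0023] can be sketched as follows. This is a minimal sketch under stated assumptions, not Choudhury's algorithm 300: the tables TIME and MEM, the candidate list BATCH_SIZES, and the helper names feasible, opt_exact, and opt are hypothetical. Defining opt as the minimum of opt_exact over all batch sizes not exceeding b is an assumption inferred from the phrase "batch size at most b".

```python
import math
from functools import lru_cache

# Hypothetical cost tables (not from Choudhury).
# TIME[k][b]: time per sample to compute layer k with batch size b.
TIME = {
    1: {1: 4.0, 2: 3.0, 4: 2.0},
    2: {1: 5.0, 2: 4.0, 4: 3.5},
    3: {1: 3.0, 2: 2.0, 4: 1.0},
}
# MEM[k][b]: memory needed to compute layer k with batch size b.
MEM = {
    1: {1: 10, 2: 20, 4: 40},
    2: {1: 30, 2: 60, 4: 120},
    3: {1: 10, 2: 20, 4: 40},
}
BATCH_SIZES = (1, 2, 4)


def feasible(k, b, mem):
    """Assumed feasibility test for the triple <k, b, mem>."""
    return MEM[k][b] <= mem


@lru_cache(maxsize=None)
def opt_exact(i, j, b, mem):
    """Best time per sample for layers i..j when some layer k uses
    batch size exactly b and the remaining layers use at most b."""
    if i > j:
        return 0.0
    best = math.inf
    for k in range(i, j + 1):  # layer k is unknown, so try every layer
        if not feasible(k, b, mem):
            continue  # contributes an infinite cost
        total = opt(i, k - 1, b, mem) + TIME[k][b] + opt(k + 1, j, b, mem)
        best = min(best, total)
    return best


@lru_cache(maxsize=None)
def opt(i, j, b, mem):
    """Best time per sample for layers i..j using batch size at most b
    (assumed to be the minimum of opt_exact over smaller batch sizes)."""
    if i > j:
        return 0.0
    return min(opt_exact(i, j, bp, mem) for bp in BATCH_SIZES if bp <= b)


# Layer 2 cannot run at batch size 4 within 100 units of memory,
# so it falls back to batch size 2 while layers 1 and 3 keep batch size 4.
print(opt(1, 3, 4, 100))
```

The sketch returns only the optimal cost. Recovering the per-layer batch sizes would require recording the choice made at each step, as the auxiliary data structures described in Choudhury [0026] do.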

Regarding Claim 8: Choudhury in view of Woo teaches the above limitations. Additionally, Choudhury discloses wherein the controller being capable of configuring the one or more processing units to repeat operations of the ANN for different input datasets from the batch of input datasets (multiple batch sizes to be used for inferencing the multiple layers of the one or more deep neural networks; and outputting, to at least one user, the determined batch sizes to be used for inferencing the multiple layers of the one or more deep neural networks. See at least [0003]. Also:  determining one or more optimal batch size sequences to be used by the different layers of the model for inferencing, wherein the one or more batch size sequences increase throughput and/or reduce energy or power consumption. See at least [0014]. Also: Step 508 includes outputting/returning an optimal batch size to be used for each layer in the inference. See at least [0030]). As previously noted in combination with Choudhury, Woo teaches a subpart. The motivation to combine Choudhury and Woo is the same as explained under claim 1 above, and is incorporated herein.

Regarding Claim 9: Choudhury in view of Woo teaches the above limitations. Additionally, Choudhury discloses wherein the controller configures the one or more processing units to perform, at the same time, a computation of the first layer of the ANN for a first input dataset of a batch of input datasets and a computation of the first layer for a second input dataset of the batch of input datasets prior to computation of a second layer of the ANN, wherein an input dataset of the second layer includes an output dataset of the first layer (FIG. 1 is a diagram illustrating batch size optimization, according to an embodiment of the invention. By way of illustration, FIG. 1 depicts a first deep neural networks layer (L.sub.1) 102, a second layer (L.sub.2) 104, and a third layer (L.sub.3) 106. As detailed herein, given memory availability, one or more embodiments of the invention include computing different batch sizes for different layers. By way merely of example, with uniform batch size, a memory requirement of layer L.sub.2 104 can restrict the batch size that can be processed for the network. Additionally, a larger batch size of b can be used for layers L.sub.1 102 and L.sub.3 106, while a b′<b batch size can be used for layer L.sub.2 104. See at least [0016]. Also: By way of further explanation and/or illustration, such an embodiment can include utilization of a batch size optimizer. In such an embodiment, L.sub.1, L.sub.2, . . . , L.sub.n represent the n layers of the network. A simple path network, for example, can include an output of layer L.sub.i being fed only into its successor layer L.sub.i+1. As also used in conjunction with one or more such embodiments, time (i, b) refers to the time per sample to process layer L.sub.i with a batch size of b. Additionally, in(i, b) refers to the memory required to store activations for b input samples for layer L.sub.i, out(i, b) refers to the memory required to store activations for b output samples for layer L.sub.i, ws(i, b) refers to the temporary workspace required for processing layer L.sub.i with batch size of b, and Tot refers to the total memory available in the system. See at least [0018]).
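For illustration only, the memory quantities quoted from Choudhury [0016] and [0018] can be sketched as a per-layer feasibility check. This is a minimal sketch under stated assumptions: the quoted passages do not state the feasibility condition, so the test in(i, b) + out(i, b) + ws(i, b) ≤ Tot is assumed here, as is linear growth of each quantity with batch size. The class LayerMemory, the function largest_feasible_batch, and all numeric values are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class LayerMemory:
    """Hypothetical per-sample memory figures for one layer."""
    in_per_sample: int   # stands in for in(i, b) / b
    out_per_sample: int  # stands in for out(i, b) / b
    ws_per_sample: int   # stands in for ws(i, b) / b

    def required(self, b: int) -> int:
        # Assumed total: in(i, b) + out(i, b) + ws(i, b), linear in b.
        return (self.in_per_sample + self.out_per_sample + self.ws_per_sample) * b


def largest_feasible_batch(layer: LayerMemory, tot: int, candidates: Iterable[int]) -> int:
    """Largest candidate batch size whose assumed memory need fits in tot (0 if none)."""
    return max((b for b in candidates if layer.required(b) <= tot), default=0)


# Three layers mirroring the FIG. 1 scenario: the middle layer needs more memory.
layers = [LayerMemory(10, 10, 5), LayerMemory(20, 20, 10), LayerMemory(10, 10, 5)]
batch_sizes = [largest_feasible_batch(l, tot=200, candidates=(1, 2, 4, 8)) for l in layers]
print(batch_sizes)  # [8, 4, 8]
```

Under these assumed numbers, the first and third layers can use the larger batch size b = 8 while the second layer is limited to b′ = 4, matching the b′ < b situation described for layer L.sub.2.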

Regarding Claim 10: Choudhury in view of Woo teaches the above limitations. Choudhury does not appear to disclose wherein the computation engine is implemented on a field-programmable gate array. However, Choudhury does teach a field-programmable gate array (field-programmable gate arrays (FPGA) [0047]). 
Choudhury and Woo suggest a system where software determines optimal batch sizes for the layers of a neural network, and where that neural network is implemented by a computing device. This differs from the claimed invention by the substitution of Choudhury’s generic computing device with an FPGA device. However, Choudhury separately demonstrates that the prior art already knew of FPGAs. One of ordinary skill in the art could have trivially substituted Choudhury’s FPGA into Choudhury and Woo’s system as the device for implementing the neural network. One of ordinary skill in the art would have recognized that such a substitution would have predictably resulted in a system which would optimize batch sizes for a neural network operated on an FPGA. As such, the claimed invention would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention in view of the disclosures of Choudhury and the teachings of Woo.

Regarding Claim 11: Choudhury in view of Woo teaches the above limitations.  Additionally, Choudhury discloses wherein the processor is configured to perform one or more iterations of selecting batch sizes of the layers of the ANN to optimize a performance measure, the performance measure being a function of one or more of: a batch size of ANN, a latency of the ANN, and a throughput of the ANN (Step 506 includes running an optimizer to maximize throughput while maintaining latency, memory, and/or energy constraints. Step 508 includes outputting/returning an optimal batch size to be used for each layer in the inference. See at least [0030]). 

Regarding Claim 12: Choudhury in view of Woo teaches the above limitations. Additionally, Choudhury discloses wherein the processor is configured to perform the one or more iterations until a number of iterations exceeds a predetermined threshold or the performance measure exceeds a pre-determined threshold (Step 506 includes running an optimizer to maximize throughput while maintaining latency, memory, and/or energy constraints. Step 508 includes outputting/returning an optimal batch size to be used for each layer in the inference. See at least [0030]. Examiner’s note: The claim includes a single iteration of optimization). 

Regarding Claim 13: Choudhury in view of Woo teaches the above limitations. Additionally, Choudhury discloses wherein the performance measure is set based on a user input (The required optimal solution for inferencing b samples with available memory mem can ultimately be obtained from the entry OPT[1, n, b, TOT]. The optimal choice at each step can be tracked using auxiliary data structures aux1, aux2, aux3 in order to determine the batch sizes employed by different layers corresponding to the optimal solution. See at least [0026]. Also: Such an embodiment as described above can be extended to ensure that the latency of inferencing does not exceed some given requirement. This can be achieved by modifying Equation (1) so that whenever OPTExact[⋅,⋅,⋅,⋅] exceeds the required latency threshold, the value is set to infinity. Similarly, such an embodiment can also be extended to cater to optimizing battery/energy consumption. This can be done by filling the table entries in the base case with battery/energy consumption values instead of time values. See at least [0027]. Also: FIG. 4 depicts input 402 and input 404, wherein input 402 includes a feed forward model and input 404 includes resource constraints for the given system (such as, for example, available memory, permissible latency, etc.). Inputs 402 and 404 are provided to pre-processing component 406 and optimal batch size sequence determination component 408. See at least [0028]).
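For illustration only, the two extensions quoted from Choudhury [0027] can be sketched for a single layer. This is a minimal sketch under stated assumptions, not Choudhury's implementation: the function names cap_by_latency and best_batch and both cost tables are hypothetical. Following the quoted text literally, the sketch compares the table value directly against the latency threshold.

```python
import math


def cap_by_latency(value, latency_limit):
    """Set a table value to infinity when it exceeds the latency threshold."""
    return math.inf if value > latency_limit else value


def best_batch(cost_by_batch, latency_limit=math.inf):
    """Pick the batch size with the lowest capped cost, or None if every entry is capped."""
    capped = {b: cap_by_latency(c, latency_limit) for b, c in cost_by_batch.items()}
    best = min(capped, key=capped.get)
    if math.isinf(capped[best]):
        return None, math.inf
    return best, capped[best]


# Hypothetical base-case tables for one layer, keyed by batch size.
time_per_sample = {1: 4.0, 2: 3.0, 4: 2.0}
energy_per_sample = {1: 1.0, 2: 1.5, 4: 3.0}

print(best_batch(time_per_sample))                     # optimize time
print(best_batch(time_per_sample, latency_limit=1.5))  # no entry meets the limit
print(best_batch(energy_per_sample))                   # optimize energy instead
```

Because the optimizer is unchanged and only the table contents or the threshold differ, both the quantity being optimized and the latency requirement act as inputs to the same procedure in this sketch.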

Regarding Claim 15: Choudhury discloses a method for optimizing artificial neural network (ANN) computations based on automatic determination of a batch size for an ANN, the method comprising: 
receiving, by a processor, an ANN structure associated with the ANN (Step 502 includes obtaining a feed forward model and resource constraints for the system. See at least [0030]. Also: the methods described herein can include an additional step of providing a system comprising distinct software modules. See at least [0052]); and
generating, based on the ANN structure, a configuration for a computation engine capable of performing a computation of the layers of the ANN, the configuration including information concerning a batch size of one or more layers of the ANN and a sequence of performing computations of one or more layers of the ANN by the computation engine, the batch size and the sequence being based on optimized performance of computations of a layer of the ANN by the computation engine (Step 506 includes running an optimizer to maximize throughput while maintaining latency, memory, and/or energy constraints. Step 508 includes outputting/returning an optimal batch size to be used for each layer in the inference. See at least [0030]. Also: determining one or more optimal batch size sequences to be used by the different layers of the model for inferencing, wherein the one or more batch size sequences increase throughput and/or reduce energy or power consumption. See at least [0014]. Also: as also depicted in FIG. 4, the optimal batch size sequence determination component 408, using inputs 402 and 404, as well as the statistics determined by the pre-processing component 406, determines one or more optimal batch size sequences for the layers of the feed forward network, as shown in algorithm 300. In making such determinations, component 408 attempts to maximize throughput, minimize energy consumption, maintain one or more latency parameters, and/or maintain one or more memory requirements, as detailed above. Further, component 408 outputs a batch size sequence 410 across multiple layers of the feed forward network. See at least [0029]).
determining, by the processor and based on the batch size of the one or more layers of the ANN, the batch size for the ANN (component 408 outputs a batch size sequence 410 across multiple layers of the feed forward network. See at least [0029] and Fig. 4, Element 410. Also: Equation (1) can be derived as follows. Suppose in the optimal solution for OPTExact[i, j, b, mem], layer L.sub.k (i≤k≤j) is computed with batch size b. As such, the total time per sample to compute layers L.sub.i to L.sub.j in this scenario can be expressed as the sum of three quantities: (i) the optimal time per sample to compute layers L.sub.i to L.sub.k−1 using batch size at most b with memory mem, (ii) the optimal time per sample to compute layer L.sub.k with batch size b and memory mem (this is finite only if <k, b, mem> is feasible), and (iii) the optimal time per sample to compute layers L.sub.k+1 to L.sub.j using batch size at most b and memory mem. As the layer L.sub.k can be unknown, every layer between L.sub.i and L.sub.j can be considered, and the layer L.sub.k that provides the best solution can be selected. See at least [0023]).
wherein the computation engine comprises: one or more processing units being capable of performing operations associated with the one or more layers of the ANN; and a controller configured to: receive a batch of input datasets; and configure, based on the configuration, the one or more processing units to perform computations of the one or more layers of the ANN for the input datasets (inferencing can be carried out either on the cloud or the edge device itself. Inferencing, as used herein, refers to the stage wherein a trained network predicts and/or classifies input test samples. See at least [0002]. Also: Step 508 includes outputting/returning an optimal batch size to be used for each layer in the inference. See at least [0030]).

Choudhury does not appear to disclose wherein the sequence includes a subpart, the subpart including at least two connected layers of the ANN and being repeated in the sequence at least two times. 
	Woo discloses generating a configuration for the computation engine, the configuration including a sequence of performing computations of one or more layers of the ANN by the computation engine, wherein the sequence includes a subpart, the subpart including at least two connected layers of the ANN and being repeated in the sequence at least two times (At block 404, circuit 100 determines a partitioning of the neural network layers into a sequence of superlayers. For example, circuit 100 can include, or have access to, compiler logic that is configured to determine one or more partitions of the neural network layers into sequences of superlayers. Alternatively, or in addition to the compiler logic, circuit 100 can include, or have access to, at least one hardware block configured to determine one or more partitions of the neural network layers into sequences of superlayers. In some implementations, each superlayer in the sequence of super layers is a partition of the directed graph that includes one or more layers. See at least [0077]. Also: Graph 500 further includes a sequence of superlayers along an X-axis of the graph. For example, graph 500 includes: i) a first superlayer 512 for processing batch elements 0, 1, 2, and 3 through each of layers A, B, C; and ii) a second superlayer 514 for processing batch elements 0, 1, 2, and 3 through each of layers D, E. According to the described teachings, a sequence of superlayers defined based on an improved neural network scheduling policy can support a relatively high working set batch size without exceeding on-chip memory capacity, or threshold capacity, of a hardware circuit that executes a neural network. See at least [0086] and Fig. 5).
	Choudhury provides a system which receives a neural network structure and determines a set and sequence of optimal batch sizes, which differs from the claimed invention by the substitution of Choudhury’s generic batch sequences for a batch sequence including repeated subparts of multiple layers. However, Woo demonstrates that the prior art already knew of generating a configuration for a neural network that defines a sequence for neural network layer processing that includes repeated subparts of multiple layers. One of ordinary skill in the art could have easily substituted Woo’s techniques for determining a sequence of layers into the system of Choudhury. Further, one of ordinary skill in the art would have recognized that such a substitution would have resulted in improved neural network processing efficiency (Woo; see at least [0086] and [0064]). As such, the identified substitution and the claimed invention would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention in view of the disclosures of Choudhury and the teachings of Woo.
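For illustration only, the superlayer ordering quoted from Woo [0086] and Fig. 5 can be sketched as follows. This is a minimal sketch under stated assumptions, not Woo's scheduling logic: the function name superlayer_schedule is hypothetical, and the ordering within a superlayer (each batch element passes through all of the superlayer's layers before the next batch element starts) is one reading of the quoted passage.

```python
from typing import List, Sequence, Tuple


def superlayer_schedule(
    superlayers: Sequence[Sequence[str]],
    batch_elements: Sequence[int],
) -> List[Tuple[int, str]]:
    """Return (batch element, layer) steps, one superlayer at a time."""
    schedule = []
    for superlayer in superlayers:          # e.g. (A, B, C), then (D, E)
        for element in batch_elements:      # e.g. batch elements 0, 1, 2, 3
            for layer in superlayer:        # connected layers inside the superlayer
                schedule.append((element, layer))
    return schedule


# Superlayers 512 (layers A, B, C) and 514 (layers D, E) over batch elements 0-3.
schedule = superlayer_schedule([("A", "B", "C"), ("D", "E")], range(4))
layer_order = [layer for _, layer in schedule]
print("".join(layer_order))  # ABCABCABCABCDEDEDEDE
```

In the resulting layer order, the subpart A, B, C appears four times before the subpart D, E begins, which is the repeated multi-layer subpart relied upon in the combination above.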

Regarding Claim 16: Choudhury in view of Woo teaches the above limitations. Additionally, Choudhury discloses wherein the batch size of the one or more layers is determined based on one or more of: a bandwidth required to read data related to the one or more layers; a number of parameters associated with the one or more layers; and a time the one or more layers processes one input dataset from the batch (the pre-processing component 406 determines, for each layer of the feed forward network 402, a set of statistics related to resource utilization. Such statistics can include, for example, working memory, input and output activation size for every batch size, time and/or energy to compute the layer for every batch size, etc. See at least [0028]. Also: the optimal batch size sequence determination component 408, using inputs 402 and 404, as well as the statistics determined by the pre-processing component 406, determines one or more optimal batch size sequences for the layers of the feed forward network, as shown in algorithm 300. See at least [0029]). 

Regarding Claim 17: Choudhury in view of Woo teaches the above limitations. Additionally, Choudhury discloses wherein the batch size of the one or more layers differs from the batch size of the ANN (component 408 outputs a batch size sequence 410 across multiple layers of the feed forward network. See at least [0029] and Fig. 4, Element 410. Also: Equation (1) can be derived as follows. Suppose in the optimal solution for OPTExact[i, j, b, mem], layer L.sub.k (i≤k≤j) is computed with batch size b. As such, the total time per sample to compute layers L.sub.i to L.sub.j in this scenario can be expressed as the sum of three quantities: (i) the optimal time per sample to compute layers L.sub.i to L.sub.k−1 using batch size at most b with memory mem, (ii) the optimal time per sample to compute layer L.sub.k with batch size b and memory mem (this is finite only if <k, b, mem> is feasible), and (iii) the optimal time per sample to compute layers L.sub.k+1 to L.sub.j using batch size at most b and memory mem. As the layer L.sub.k can be unknown, every layer between L.sub.i and L.sub.j can be considered, and the layer L.sub.k that provides the best solution can be selected. See at least [0023]).

Regarding Claim 18: Choudhury in view of Woo teaches the above limitations. Additionally, Choudhury discloses wherein the ANN includes at least a first layer and a second layer, and a batch size of the first layer differs from a batch size of the second layer (component 408 outputs a batch size sequence 410 across multiple layers of the feed forward network. See at least [0029] and Fig. 4, Element 410. Also: Equation (1) can be derived as follows. Suppose in the optimal solution for OPTExact[i, j, b, mem], layer L.sub.k (i≤k≤j) is computed with batch size b. As such, the total time per sample to compute layers L.sub.i to L.sub.j in this scenario can be expressed as the sum of three quantities: (i) the optimal time per sample to compute layers L.sub.i to L.sub.k−1 using batch size at most b with memory mem, (ii) the optimal time per sample to compute layer L.sub.k with batch size b and memory mem (this is finite only if <k, b, mem> is feasible), and (iii) the optimal time per sample to compute layers L.sub.k+1 to L.sub.j using batch size at most b and memory mem. As the layer L.sub.k can be unknown, every layer between L.sub.i and L.sub.j can be considered, and the layer L.sub.k that provides the best solution can be selected. See at least [0023]).

Regarding Claim 19: Choudhury in view of Woo teaches the above limitations.  Additionally, Choudhury discloses performing, by the processor, one or more iterations of selecting batch sizes of the layers of the ANN to optimize a performance measure, the performance measure being a function of one or more of: a latency of the ANN, a throughput of the ANN, and a user-specified batch size of ANN (multiple batch sizes to be used for inferencing the multiple layers of the one or more deep neural networks; and outputting, to at least one user, the determined batch sizes to be used for inferencing the multiple layers of the one or more deep neural networks. See at least [0003]. Also:  determining one or more optimal batch size sequences to be used by the different layers of the model for inferencing, wherein the one or more batch size sequences increase throughput and/or reduce energy or power consumption. See at least [0014]. Also: Step 508 includes outputting/returning an optimal batch size to be used for each layer in the inference. See at least [0030]).

Regarding Claim 20: Choudhury discloses a system for optimizing artificial neural network (ANN) computations based on automatic determination of a batch size for an ANN, the system comprising: 
a computation engine comprising a controller and one or more processing units being capable of performing operations associated with the one or more layers of the ANN (inferencing can be carried out either on the cloud or the edge device itself. Inferencing, as used herein, refers to the stage wherein a trained network predicts and/or classifies input test samples. See at least [0002]); and
a processor and a memory storing an optimization module comprising processor-executable codes (See at least [0035]), wherein the processor is configured to implement the following operations upon executing the processor-executable codes: 
receiving an ANN structure associated with the ANN (Step 502 includes obtaining a feed forward model and resource constraints for the system. See at least [0030]. Also: the methods described herein can include an additional step of providing a system comprising distinct software modules. See at least [0052]);
generating, based on the ANN structure, a configuration for the computation engine, the configuration including information concerning a batch size of at least one layer of the layers of the ANN and a sequence of performing computations of one or more layers of the ANN by the computation engine, the batch size and the sequence being based on optimized performance of computations of a layer of the ANN by the computation engine, wherein the batch size of the at least one layer is determined based on one or more of: a bandwidth required to read data related to the at least one layer, a number of parameters associated with the at least one layer, and a time the at least one layer processes one input dataset from the batch (Step 506 includes running an optimizer to maximize throughput while maintaining latency, memory, and/or energy constraints. Step 508 includes outputting/returning an optimal batch size to be used for each layer in the inference. See at least [0030]. Also: the pre-processing component 406 determines, for each layer of the feed forward network 402, a set of statistics related to resource utilization. Such statistics can include, for example, working memory, input and output activation size for every batch size, time and/or energy to compute the layer for every batch size, etc. See at least [0028]. Also: the optimal batch size sequence determination component 408, using inputs 402 and 404, as well as the statistics determined by the pre-processing component 406, determines one or more optimal batch size sequences for the layers of the feed forward network, as shown in algorithm 300. See at least [0029]. Also: determining one or more optimal batch size sequences to be used by the different layers of the model for inferencing, wherein the one or more batch size sequences increase throughput and/or reduce energy or power consumption. See at least [0014]. Also: as also depicted in FIG. 4, the optimal batch size sequence determination component 408, using inputs 402 and 404, as well as the statistics determined by the pre-processing component 406, determines one or more optimal batch size sequences for the layers of the feed forward network, as shown in algorithm 300. In making such determinations, component 408 attempts to maximize throughput, minimize energy consumption, maintain one or more latency parameters, and/or maintain one or more memory requirements, as detailed above. Further, component 408 outputs a batch size sequence 410 across multiple layers of the feed forward network. See at least [0029]).
 determining the batch size for the ANN based on the batch size of the one or more layers of the ANN (component 408 outputs a batch size sequence 410 across multiple layers of the feed forward network. See at least [0029] and Fig. 4, Element 410. Also: Equation (1) can be derived as follows. Suppose in the optimal solution for OPTExact[i, j, b, mem], layer L.sub.k (i≤k≤j) is computed with batch size b. As such, the total time per sample to compute layers L.sub.i to L.sub.j in this scenario can be expressed as the sum of three quantities: (i) the optimal time per sample to compute layers L.sub.i to L.sub.k−1 using batch size at most b with memory mem, (ii) the optimal time per sample to compute layer L.sub.k with batch size b and memory mem (this is finite only if <k, b, mem> is feasible), and (iii) the optimal time per sample to compute layers L.sub.k+1 to L.sub.j using batch size at most b and memory mem. As the layer L.sub.k can be unknown, every layer between L.sub.i and L.sub.j can be considered, and the layer L.sub.k that provides the best solution can be selected. See at least [0023]) 
wherein the controller is configured to: receive a batch of input datasets; and configure, based on the configuration, the one or more processing units to perform computations of the one or more layers of the ANN for the input datasets (inferencing can be carried out either on the cloud or the edge device itself. Inferencing, as used herein, refers to the stage wherein a trained network predicts and/or classifies input test samples. See at least [0002]. Also: Step 508 includes outputting/returning an optimal batch size to be used for each layer in the inference. See at least [0030]).

Choudhury does not appear to disclose wherein the sequence includes a subpart, the subpart including at least two connected layers of the ANN and being repeated in the sequence at least two times. 
	Woo discloses generating a configuration for the computation engine, the configuration including a sequence of performing computations of one or more layers of the ANN by the computation engine, wherein the sequence includes a subpart, the subpart including at least two connected layers of the ANN and being repeated in the sequence at least two times (At block 404, circuit 100 determines a partitioning of the neural network layers into a sequence of superlayers. For example, circuit 100 can include, or have access to, compiler logic that is configured to determine one or more partitions of the neural network layers into sequences of superlayers. Alternatively, or in addition to the compiler logic, circuit 100 can include, or have access to, at least one hardware block configured to determine one or more partitions of the neural network layers into sequences of superlayers. In some implementations, each superlayer in the sequence of super layers is a partition of the directed graph that includes one or more layers. See at least [0077]. Also: Graph 500 further includes a sequence of superlayers along an X-axis of the graph. For example, graph 500 includes: i) a first superlayer 512 for processing batch elements 0, 1, 2, and 3 through each of layers A, B, C; and ii) a second superlayer 514 for processing batch elements 0, 1, 2, and 3 through each of layers D, E. According to the described teachings, a sequence of superlayers defined based on an improved neural network scheduling policy can support a relatively high working set batch size without exceeding on-chip memory capacity, or threshold capacity, of a hardware circuit that executes a neural network. See at least [0086] and Fig. 5).
	Choudhury provides a system which receives a neural network structure and determines a set and sequence of optimal batch sizes, which differs from the claimed invention by the substitution of Choudhury’s generic batch sequences for a batch sequence including repeated subparts of multiple layers. However, Woo demonstrates that the prior art already knew of generating a configuration for a neural network that defines a sequence for neural network layer processing that includes repeated subparts of multiple layers. One of ordinary skill in the art could have easily substituted Woo’s techniques for determining a sequence of layers into the system of Choudhury. Further, one of ordinary skill in the art would have recognized that such a substitution would have resulted in improved neural network processing efficiency (Woo; see at least [0086] and [0064]). As such, the identified substitution and the claimed invention would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention in view of the disclosures of Choudhury and the teachings of Woo.

Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over Choudhury et al. (US 2020/0125926 A1) in view of Woo (US 2018/0373976 A1), and further in view of Prakash et al. (US 2019/0087721 A1). 

Regarding Claim 14: Choudhury in view of Woo teaches the above limitations. Choudhury does not appear to disclose wherein the processor is configured to select the batch sizes of some of the layers of the ANN based on a heuristic algorithm. However, Prakash teaches selecting the batch sizes of some of the layers of the ANN based on a heuristic algorithm (the batch size of the model may be set based on heuristics. See at least [0036]). 
	Choudhury and Woo suggest a system which determines the batch sizes of layers of a neural network, which differs from the claimed invention in that the claimed invention uses a heuristic for at least some of the determinations. However, Prakash demonstrates that the prior art already knew of using heuristics to determine batch size. One of ordinary skill in the art could have trivially substituted Prakash’s heuristic into Choudhury’s techniques for determining batch size. Further, one of ordinary skill in the art would have recognized that such a substitution would have predictably resulted in a system which would more quickly determine batch size. As such, the claimed invention would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention in view of the disclosures of Choudhury and the teachings of Woo and Prakash. 

Response to Arguments
Applicant’s Argument Regarding 112(a) Rejections of claims 1-20: Applicant respectfully submits that one skilled in the art should be able to implement a function that returns a batch size based at least on a time of execution of the layer or time for fetching layer-related data. One should also be able to implement a function for generating, based on ANN structure, possible sequences of performing computations of one or more layers and selecting one of the possible sequences. Thus, one skilled in the art would have sufficient information on how to implement the claimed invention. 
Examiner’s Response: Applicant's arguments filed 3 June 2022 have been fully considered but they are not persuasive. Applicant’s arguments (e.g., “One should also be able to implement…”) appear to be addressed to an enablement rejection rather than the written description rejection at hand. As such, the argument is unpersuasive and the rejection is maintained. 

Applicant’s Argument Regarding 101 Rejections of claims 1-20: The recited hardware and steps meaningfully limit the use of the abstract idea into a practical application of receiving an ANN description/structure by a computing system (a processor and a memory storing an optimization module) to determine a configuration of batches of layers of ANN and sequences of executing the layers of ANN. 
Examiner’s Response: Applicant's arguments filed 3 June 2022 have been fully considered but they are not persuasive. The claims do not recite any specific hardware, and the steps outside of the identified abstract idea amount to implementing the solution of the abstract idea. Neither of these is indicative of a practical application, and as such applicant’s argument is unpersuasive. 

Applicant’s Argument Regarding 103 Rejections of claims 1-20: Applicant respectfully submits that batch sizes of layers are selected, in block 740, based at least on time of execution of the layer by a computational engine. Accordingly, the performance measure depends on the time of execution of the layer by a computational engine. Thus, the performance is determined based on a computation of a layer of the ANN. Paragraph [0067] recites “The blocks 740, 750, and 760 can be iterated to find a configuration of batch sizes of layers of the ANN corresponding to an optimal performance measure.” Thus, the batch sizes of the layers are based on optimized performance of computations of a layer of the ANN by the computation engine. Choudhury does not execute layers of neural networks when solving the system of equations to find the batch sizes. 
Examiner’s Response: Applicant's arguments filed 3 June 2022 have been fully considered but they are not persuasive. Examiner initially notes that the claim language at issue is “generating, based on the ANN structure, a configuration for the computation engine, the configuration including information concerning a batch size of the one or more layers of the ANN and a sequence of performing computations of one or more layers of the ANN by the computation engine … the batch size and the sequence being based on optimized performance of computations of a layer of the ANN by the computation engine”. 
The key issue appears to be whether “the batch size and the sequence being based on an optimized performance of computations of a layer of the ANN by the computation engine” requires performing the computations of the layer to determine the batch size and sequence. The OED reports the verb “base” to mean “to place on (also upon) a foundation, fundamental principle, or underlying basis.” So the plain and ordinary meaning of the limitation refers to where an “optimized performance of computations of a layer by the computation engine” is used as an underlying basis for the batch size and sequence. Further, a process being based on “optimized performance of computations of a layer by the computation engine” does not appear to require the actual execution of such computations. Thus examiner initially understands this to encompass both (1) where the results of implementing an optimization of the performance of computations of a layer are a basis of the batch size and sequence and (2) where the intent to optimize the performance of computations of a layer is a basis of the batch size and sequence. 
The specification does not appear to exclude either of the above interpretations. For example, a batch size selected in order to minimize execution time would reasonably be “based at least on time of execution of the layer by a computational engine”, which is consistent with the pre-grant publication at [0065] (“The selection of a batch size for a layer can be based on the number of parameters of the layer, time of execution of the layer, and bandwidth required to fetch and store the layer-related data and parameters.”). Examiner further notes that alternative elements of the disclosure appear to contradict the first interpretation. For example, the pre-grant publication at [0066] states “The ANN batch size, latency of the ANN, and throughput of the ANN can be estimated based on the batch sizes of layers determined in block 740.” One of ordinary skill in the art would not understand why it would be necessary to estimate the latency and throughput if the batch size for layers had been experimentally determined by actually performing the calculations. Further, the pre-grant publication at [0065] states “In some embodiments, the selection of batch sizes of the layers of ANN can be based on a heuristic algorithm.” One of ordinary skill in the art would not understand why a heuristic algorithm would be used if the batch size for layers had been experimentally determined by actually performing the calculations and seeing what worked best. That these disclosures appear to contradict an experimentally determined batch size indicates to one of ordinary skill in the art that a model-based batch size determination is a reasonable interpretation. Examiner notes that the first interpretation cannot be excluded, because the apparently contradictory disclosures are disclosed as optional or in the alternative. 
Thus, the broadest reasonable interpretation is determined to encompass both (1) where the results of implementing an optimization of the performance of computations of a layer are a basis of the batch size and sequence and (2) where the intent to optimize the performance of computations of a layer is a basis of the batch size and sequence. 
Under the second interpretation, the failure of Choudhury to actually execute layers of the neural network in determining a batch size and sequence to optimize the performance of computations of the layer does not actually differentiate Choudhury from the claimed invention. Thus applicant’s argument is unpersuasive. 

Additional Considerations
The prior art made of record and not relied upon that is considered pertinent to applicant’s disclosure can be found in the PTO-892 Notice of References Cited. 
NIST (Engineering Statistics Handbook: 5.5.3. How do you optimize a process?) notes that “[t]he optimal region to run a process is usually determined after a sequence of experiments has been conducted and a series of empirical models obtained. In many engineering and science applications, experiments are conducted and empirical models are developed with the objective of improving the responses of interest.”
Vooturi et al. (Efficient Inferencing of Compressed Deep Neural Networks) performed various computation experiments comparing batch sizes of 16 and batch sizes of 256, and used the result to make conclusions regarding what batch sizes would be more efficient for the particular system. 
Suzuki et al. (US 2019/0228298 A1) describes evaluating mini-batch sizes to determine the best mini-batch size for a neural network. 

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Bion A Shelden whose telephone number is (571)270-0515. The examiner can normally be reached M-F, 12pm-10pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Hajime S Rojas, can be reached at (571)270-5491. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/Bion A Shelden/
Examiner, Art Unit 3681
2022-06-27