DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 01/12/2022 has been entered.

Amendments
Claims 1-20 are pending and have been examined. Claims 1, 8, 14, 17, and 19-20 have been amended.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claim(s) 1-2 and 7-19 are rejected under 35 U.S.C. 103 as being unpatentable over Young (US 20160342890 A1, cited in IDS filed 03/20/2019) in view of Woo et al. (US 10019668 B1, cited in PTO-892 filed 04/27/2021).

	Regarding CLAIM 1, Young teaches: A computer-implemented method, the method comprising: 
obtaining, as input for inferencing of one or more deep neural networks, (i) an inferencing model and (¶ [0017]-[0018] teaches a neural network model for inferencing, and ¶ [0021] teaches receiving weight inputs for a given layer of a neural network. Receiving weight inputs amounts to obtaining an inferencing model.)
… each of multiple permissible batch sizes, wherein the multiple permissible batch sizes … ; (¶ [0061] - [0064] and the corresponding Fig. 6 teach multiple permissible batch sizes of 1, 2, 8, and 32.)
determining, for each respective one of the multiple layers of the one or more deep neural networks, a corresponding one of the multiple permissible batch sizes, wherein the determining is based at least in part on (i) the obtained input and… , and wherein the batch size determined for a first one of the multiple layers is different than the batch size determined for a second one of the multiple layers; and (¶ [0061] - [0064] and the corresponding Fig. 6 teach determining a batch size for each layer from the set of multiple permissible batch sizes. Layer 1, Layer 2, Layer 3, and Layer 6 have different batch sizes from one another. Layers 4 and 5 have the same batch size as Layer 3.)
using the determined batch sizes for inferencing the multiple layers of the one or more deep neural networks;  (Last sentence of ¶ [0064] and all of ¶ [0066])
wherein the method is carried out by at least one computing device. (¶ [0030] teaches a systolic array and ¶ [0059] teaches CPU and GPU)
	Although Young teaches resource constraints of processing speed and memory access speed in ¶ [0047] and a clock rate of the memory storing the weight inputs, a number of arithmetic units inside the circuit, and a number of channels in memory in ¶ [0049], Young does not explicitly teach that these resource constraints are used to compute input and output activation sizes for each of multiple permissible batch sizes. Therefore, Young does not explicitly teach: obtaining (ii) one or more resource constraints;
computing, based at least in part on the obtained input, a set of statistics pertaining to resource utilization for each of multiple layers in the one or more deep neural networks, wherein the set of statistics comprises (i) input and output activation sizes for each of multiple permissible batch sizes, wherein the multiple permissible batch sizes are based at least in part on the one or more resource constraints;
	determining… a corresponding… batch sizes, wherein the determining is based at least in part on (ii) the computed set of statistics
	But Woo teaches: obtaining (ii) one or more resource constraints; (Obtaining resource constraints as inputs for inferencing is taught by a circuit 100 determining total capacity of memory of a hardware circuit — C. 14, L. 53-59: “In some implementations, determining a partitioning of neural network layers into a sequence of superlayers includes: … ii) circuit 100 determining a particular aggregate input activation and parameter capacity of a memory of a hardware circuit;” C. 14, L. 65 discloses a total available on-chip memory: “a storage capacity… of on-chip memory may be 500 megabyte (MB).”)
computing, based at least in part on the obtained input, a set of statistics pertaining to resource utilization for each of multiple layers in the one or more deep neural networks, wherein the set of statistics comprises (i) input and output activation sizes for each of multiple permissible batch sizes, wherein the multiple permissible batch sizes are based at least in part on the one or more resource constraints; (C. 8, L. 10-14, 17-20 teach that a “working set” includes input and output activation sizes based on an amount of memory. C. 14, L. 65 to C. 15, L. 13 teaches computing that there is 200 MB of memory available for storing inputs out of 500 MB of on-chip memory.)
	determining… a corresponding… batch sizes, wherein the determining is based at least in part on (ii) the computed set of statistics (C. 9, L. 12-22 teaches processing two batches 212, 214 simultaneously. C. 9, L. 51-C. 10, L. 14 teaches that processing both batches simultaneously exceeds available memory resources. C. 12, L. 20-31 teaches scheduling a single batch based on available memory resources. Evidence that a single batch is processed is found at C. 17, L. 20.)
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated Woo’s available memory constraint into Young’s list of resource constraints in ¶ [0049] which are used to determine a weight reuse value, and it would have been obvious to have computed input and output activation sizes in Young’s networks based on this weight reuse value. A motivation for the combination is that improving the scheduling of inputs and outputs in a neural network based on resource constraints maximizes efficient use of available on-chip resources, reduces external communications, and leads to an increase in available system bandwidth and an overall decrease in energy consumption by system components. (Woo, C. 11, L. 13-20)
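For illustration only, the combined teaching can be pictured with the following minimal Python sketch. It is not taken from Young or Woo. The function names, the 4-byte element size, the per-layer activation counts, and the 40,000-byte budget are all hypothetical, and the rule of picking the largest batch size that fits is an assumed selection policy rather than one stated in either reference.

```python
from typing import Dict, List, Tuple

BYTES_PER_ELEMENT = 4  # assumption: 32-bit activations


def activation_stats(
    layers: List[Tuple[int, int]], permissible: List[int]
) -> List[Dict[int, Tuple[int, int]]]:
    """For each layer, map every permissible batch size to (input_bytes, output_bytes)."""
    stats = []
    for in_elems, out_elems in layers:
        stats.append({
            b: (b * in_elems * BYTES_PER_ELEMENT, b * out_elems * BYTES_PER_ELEMENT)
            for b in permissible
        })
    return stats


def choose_batch_sizes(
    layers: List[Tuple[int, int]], permissible: List[int], memory_budget: int
) -> List[int]:
    """Pick, per layer, the largest permissible batch size whose activations fit the budget."""
    chosen = []
    for layer_stats in activation_stats(layers, permissible):
        fitting = [b for b, (i, o) in layer_stats.items() if i + o <= memory_budget]
        if not fitting:
            raise ValueError("no permissible batch size fits the memory budget")
        chosen.append(max(fitting))
    return chosen


# Hypothetical three-layer network: (input elements, output elements) per sample.
example_layers = [(1000, 500), (500, 100), (100, 10)]
print(choose_batch_sizes(example_layers, [1, 2, 8, 32], 40_000))  # prints [2, 8, 32]
```

With these assumed numbers the sketch assigns batch sizes of 2, 8, and 32 to the three layers, so at least two layers receive different batch sizes.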

Regarding CLAIM 2, the combination of Young and Woo teaches: The computer-implemented method of claim 1, 
Young teaches: wherein the inferencing model comprises a feed forward model. (¶ [0017] to ¶ [0019], line 4; ¶ [0066] and the corresponding Fig. 6)

Regarding CLAIM 7, the combination of Young and Woo teaches: The computer-implemented method of claim 1, 
Although ¶ [0048], last 2 lines teaches the circuit minimizes a latency being a memory access wait time, and ¶ [0049] teaches the weight reuse value can be based on a clock rate of the memory, Young does not explicitly teach: wherein the one or more resource constraints comprises at least one of (i) total available memory, (ii) maximum latency for inferencing, and (iii) maximum energy for inferencing.
But Woo teaches: wherein the one or more resource constraints comprises at least one of (i) total available memory, (ii) maximum latency for inferencing, and (iii) maximum energy for inferencing. (C. 14, L. 65-66 teaches (i) a total available on-chip memory, which may be 500 megabytes (MB).)

Regarding CLAIM 8, the combination of Young and Woo teaches: The computer-implemented method of claim 1, 
However, Young does not explicitly teach: wherein the set of statistics comprises at least one of (i) amount of working memory, (ii) time to process a layer for each of the multiple permissible batch sizes, and (iii) energy to process a layer for each of the multiple permissible batch sizes.
But Woo teaches: wherein the set of statistics comprises at least one of (i) amount of working memory, (ii) time to process a layer for each of the multiple permissible batch sizes, and (iii) energy to process a layer for each of the multiple permissible batch sizes. (Limitation (i): C. 14, L. 65 to C. 15, L. 13 teaches computing that there is 200 MB of memory available for storing inputs. This is an “amount of working memory” as claimed.)

	Regarding CLAIM 9, the combination of Young and Woo teaches: The computer-implemented method of claim 1, 
Young teaches: wherein said determining comprises determining a sequence of variable batch sizes corresponding to the multiple layers of the one or more deep neural networks. (¶ [0061] - [0064] and the corresponding Fig. 6 teach determining a batch size for each layer from the set of multiple permissible batch sizes. Layer 1, Layer 2, Layer 3, and Layer 6 have different batch sizes from one another. Layers 4 and 5 have the same batch size as Layer 3.)

Regarding CLAIM 10, the combination of Young and Woo teaches: The computer-implemented method of claim 1, 
Young teaches: wherein said determining increases one or more throughput values associated with the inferencing of the one or more deep neural networks. (¶ [0008], lines 7-12 and ¶ [0046], lines 1-6)

	Regarding CLAIM 11, the combination of Young and Woo teaches: The computer-implemented method of claim 1, 
Young teaches: wherein said determining decreases one or more energy values associated with the inferencing of the one or more deep neural networks. (¶ [0008], fourth-to-last line teaches that the circuit avoids stalling, which decreases energy consumption.)

Regarding CLAIM 12, the combination of Young and Woo teaches: The computer-implemented method of claim 1, 
Young teaches: wherein said determining decreases one or more latency values associated with the inferencing of the one or more deep neural networks. (¶ [0048], last 2 lines teaches the circuit minimizes memory access wait time.)

	Regarding CLAIM 13, the combination of Young and Woo teaches: The computer-implemented method of claim 1, 
Young teaches: wherein said determining decreases one or more memory values associated with the inferencing of the one or more deep neural networks. (¶ [0048], last 2 lines teaches the circuit minimizes memory access wait time. A memory access wait time fits within the broadest reasonable interpretation of a “memory value associated with the inferencing”.)

	Claims 14-18 recite the same features as claims 1-2 and 7-9 respectively. Claims 14-18 also recite the limitation: “A computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a computing device to cause the computing device to”. Young teaches this limitation on p. 8 in claim 19, lines 1-4.
Claims 14-18 are rejected for the reasons set forth in the rejections of claims 1-2 and 7-9 respectively. 

	Claim 19 recites the same features as claim 1. Claim 19 also recites the limitation: “A system comprising: a memory; and at least one processor operably coupled to the memory and configured for”. Young, ¶ [0030] teaches a systolic array and ¶ [0059] teaches a CPU and a GPU, all of which are processors. Memory is taught by Young, p. 8, claim 19, lines 1-4, where a computer is understood to comprise at least one processor.
	Claim 19 is rejected for the reasons set forth in the rejection of claim 1.

Claims 3-6 are rejected under 35 U.S.C. 103 as being unpatentable over Young (US 20160342890 A1, cited in IDS filed 03/20/2019) in view of Woo et al. (US 10019668 B1, cited in PTO-892 filed 04/27/2021) and Han et al. (“Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding” arXiv version 5, cited in the IDS filed 03/20/2018 as NPL doc. 11).

Regarding CLAIM 3, the combination of Young and Woo teaches: The computer-implemented method of claim 1,
In C. 6, L. 55-57, Woo teaches compressing activation data by storing only non-zero activation values in memory. However, neither Young nor Woo explicitly teaches: wherein the inferencing model comprises a compressed model generated through weight-based pruning.
But Han teaches: wherein the inferencing model comprises a compressed model generated through weight-based pruning. (P. 2, last paragraph, last 6 lines)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have compressed Young/Woo’s neural network by pruning connections with weights below a threshold. A motivation for the combination is to reduce the number of parameters. (Han, p. 2, last sentence)
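For illustration only, pruning connections whose weights fall below a threshold can be sketched as follows. The function name, the example matrix, and the 0.1 threshold are hypothetical and are not taken from Han; the sketch shows only the thresholding step.

```python
from typing import List


def prune_by_magnitude(weights: List[List[float]], threshold: float) -> List[List[float]]:
    """Zero every weight whose absolute value is below the threshold."""
    return [[w if abs(w) >= threshold else 0.0 for w in row] for row in weights]


def count_nonzero(weights: List[List[float]]) -> int:
    """Number of connections that survive pruning."""
    return sum(1 for row in weights for w in row if w != 0.0)


# Hypothetical 2x2 weight matrix.
dense = [[0.5, -0.01], [0.02, -0.9]]
pruned = prune_by_magnitude(dense, 0.1)
print(pruned)                 # prints [[0.5, 0.0], [0.0, -0.9]]
print(count_nonzero(pruned))  # prints 2
```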

Regarding CLAIM 4, the combination of Young and Woo teaches: The computer-implemented method of claim 1, 
However, neither Young nor Woo explicitly teaches: wherein the inferencing model comprises a compressed model generated through at least one of (i) quantization and (ii) weight sharing.
But Han teaches: wherein the inferencing model comprises a compressed model generated through at least one of (i) quantization and (ii) weight sharing. (Han teaches simultaneous network quantization and weight sharing on p. 3, § 3, first 2 paragraphs)
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have compressed Young/Woo’s neural network by performing quantization and weight sharing. A motivation for the combination is to reduce the number of bits required to represent each weight. (Han, p. 3, § 3, first sentence)
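For illustration only, weight sharing with a small codebook can be sketched as follows. The codebook values and function names are hypothetical, and the fixed codebook is an assumption made for brevity; the sketch does not reproduce how Han derives its shared values. Each weight is replaced by the index of its nearest codebook entry, so a four-entry codebook needs only 2 bits per weight.

```python
from typing import List


def quantize_to_codebook(weights: List[float], codebook: List[float]) -> List[int]:
    """Replace each weight with the index of the nearest shared codebook value."""
    return [
        min(range(len(codebook)), key=lambda i: abs(w - codebook[i]))
        for w in weights
    ]


def dequantize(indices: List[int], codebook: List[float]) -> List[float]:
    """Recover the shared weight value for each stored index."""
    return [codebook[i] for i in indices]


# Hypothetical four-entry codebook (2-bit indices) and example weights.
codebook = [-1.0, 0.0, 1.0, 2.0]
weights = [0.9, -1.2, 0.1, 1.7]
indices = quantize_to_codebook(weights, codebook)
print(indices)                        # prints [2, 0, 1, 3]
print(dequantize(indices, codebook))  # prints [1.0, -1.0, 0.0, 2.0]
```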

Regarding CLAIM 5, the combination of Young and Woo teaches: The computer-implemented method of claim 1, 
However, neither Young nor Woo explicitly teaches: wherein the inferencing model comprises a compressed model generated through relative indexing.
But Han teaches: wherein the inferencing model comprises a compressed model generated through relative indexing. (P. 3, second paragraph, and Fig. 2 and its caption)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have compressed Young/Woo’s neural network through relative indexing. A motivation for the combination is to achieve further compression. (Han, p. 3, second paragraph)
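For illustration only, relative indexing of a sparse weight vector can be sketched as follows. The function names, the filler convention, and the maximum skip of 3 are assumptions chosen for the example, not details confirmed from Han. Each stored entry records how many zeros precede a value instead of the value's absolute position.

```python
from typing import List, Tuple


def encode_relative(dense: List[float], max_skip: int) -> List[Tuple[int, float]]:
    """Store (zeros_skipped, value) pairs; emit a zero-valued filler when a run is too long."""
    entries = []
    skip = 0
    for v in dense:
        if v == 0:
            if skip == max_skip:
                # Filler entry: accounts for max_skip zeros plus this zero.
                entries.append((max_skip, 0.0))
                skip = 0
            else:
                skip += 1
        else:
            entries.append((skip, v))
            skip = 0
    return entries


def decode_relative(entries: List[Tuple[int, float]], length: int) -> List[float]:
    """Rebuild the dense vector; trailing zeros are restored from the known length."""
    out: List[float] = []
    for skip, v in entries:
        out.extend([0.0] * skip)
        out.append(v)
    out.extend([0.0] * (length - len(out)))
    return out


# Hypothetical sparse vector with a zero run longer than max_skip.
dense = [0.0, 0.0, 3.4, 0.0, 0.0, 0.0, 0.0, 0.0, 0.9, 0.0]
entries = encode_relative(dense, max_skip=3)
print(entries)  # prints [(2, 3.4), (3, 0.0), (1, 0.9)]
```

Because every stored skip is at most 3 in this example, each one fits in 2 bits regardless of how long the vector is.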

Regarding CLAIM 6, the combination of Young and Woo teaches: The computer-implemented method of claim 1, 
However, neither Young nor Woo explicitly teaches: wherein the inferencing model comprises a compressed model generated through encoding.
	But Han teaches: wherein the inferencing model comprises a compressed model generated through encoding. (P. 5, §4 teaches Huffman code)
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have compressed Young/Woo’s neural network through Huffman coding. A motivation for the combination is to save on network storage. (Han, p. 5, §4, last sentence)
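For illustration only, Huffman coding of quantized weight indices can be sketched as follows. This is the textbook algorithm built on Python's standard heapq and collections modules, with a hypothetical function name and example data; it is not asserted to match Han's implementation.

```python
import heapq
from collections import Counter
from typing import Dict, Hashable, Iterable


def huffman_code(symbols: Iterable[Hashable]) -> Dict[Hashable, str]:
    """Build a prefix-free binary code that gives frequent symbols shorter codewords."""
    freq = Counter(symbols)
    if not freq:
        raise ValueError("at least one symbol is required")
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap items: (count, unique tie-breaker, partial code table for that subtree).
    heap = [(count, i, {sym: ""}) for i, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        c1, _, codes1 = heapq.heappop(heap)
        c2, _, codes2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes1.items()}
        merged.update({s: "1" + c for s, c in codes2.items()})
        heapq.heappush(heap, (c1 + c2, tie, merged))
        tie += 1
    return heap[0][2]


# Hypothetical stream of 2-bit weight indices with a skewed distribution.
indices = [0] * 5 + [1] * 2 + [2] + [3]
codes = huffman_code(indices)
total_bits = sum(len(codes[s]) for s in indices)
print(total_bits)  # prints 15, versus 18 bits at a fixed 2 bits per index
```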

Claim 20 is rejected under 35 U.S.C. 103 as being unpatentable over Young (US 20160342890 A1, cited in IDS filed 03/20/2019) in view of Woo (US 10019668 B1, cited in PTO-892 filed 04/27/2021), Canziani et al. (“An Analysis of Deep Neural Network Models for Practical Applications”, cited in PTO-892 filed 04/27/2021), and Yang et al. (“A Method to Estimate the Energy Consumption of Deep Neural Networks”).

Regarding CLAIM 20, Young teaches: A computer-implemented method, the method comprising: 
obtaining, as input for inferencing of one or more deep neural networks, (i) an inferencing model, wherein the inferencing model comprises a feed forward model, and (ii) constraints (Limitation (i): ¶ [0017]-[0019], line 4; ¶ [0066] and the corresponding Fig. 6 teaches a feed forward neural network model for inferencing. ¶ [0021] teaches receiving weight inputs for a given layer of a neural network. Receiving weight inputs amounts to obtaining an inferencing model. Limitation (ii): ¶ [0047] teaches processing speed and memory access speed; ¶ [0049] teaches a clock rate of the memory storing the weight inputs, a number of arithmetic units inside the circuit, and a number of channels in memory.)
computing, based at least in part on the obtained input, a set of statistics pertaining to resource utilization for each of multiple layers in the one or more deep neural networks, wherein the set of statistics comprises…  (iii) time to process a layer for each of the multiple batch sizes (A weight reuse value is computed based on the constraints in ¶ [0049], and Limitation (iii) is taught by the weight reuse value, where time is measured in the number of clock cycles (¶ [0035]). It is evident that a weight reuse value is measured by a number of clock cycles. ¶ [0061] - [0064] and the corresponding Fig. 6 teach a 6-layer network which has multiple permissible batch sizes of 1, 2, 8, and 32.)
determining, for each respective one of the multiple layers of the one or more deep neural networks, a corresponding one of the multiple batch sizes, wherein the determining is based at least in part on (i) the obtained input and (ii) the computed set of statistics, and wherein the batch size determined for a first one of the multiple layers is different than the batch size determined for a second one of the multiple layers; and (¶ [0061] - [0064] and the corresponding Fig. 6 teach determining a batch size for each layer from the set of multiple permissible batch sizes based on the weight reuse value. The weight reuse value incorporates a clock rate. Layer 1, Layer 2, Layer 3, and Layer 6 have different batch sizes from one another. Layers 4 and 5 have the same batch size as Layer 3.)
using the determined batch sizes for inferencing the multiple layers of the one or more deep neural networks; (Last sentence of ¶ [0064] and all of ¶ [0066])
wherein the method is carried out by at least one computing device. (¶ [0030] teaches a systolic array and ¶ [0059] teaches CPU and GPU)
	Young teaches each of multiple batch sizes. However, Young does not explicitly teach: obtaining (ii) constraints comprising (a) total available memory, (b) maximum latency for inferencing, and (c) maximum energy for inferencing;
computing a set of statistics, wherein the set of statistics comprises (i) amount of working memory, (ii) input activation size and output activation size for each of multiple batch sizes,…  and (iv) energy to process a layer for each of the multiple batch sizes;
	However, Woo teaches: obtaining (ii) constraints comprising (a) total available memory, (Obtaining resource constraints as inputs for inferencing is taught by a circuit 100 determining total capacity of memory of a hardware circuit — C. 14, L. 53-59: “In some implementations, determining a partitioning of neural network layers into a sequence of superlayers includes: … ii) circuit 100 determining a particular aggregate input activation and parameter capacity of a memory of a hardware circuit;” C. 14, L. 65 discloses a total available on-chip memory: “a storage capacity… of on-chip memory may be 500 megabyte (MB).”)
computing a set of statistics, wherein the set of statistics comprises (i) amount of working memory, (ii) input activation size and output activation size (Limitation (i): C. 14, L. 66 starting at “Circuit 100” to C. 15, L. 13 teaches computing 200 MB of available memory. Limitation (ii): C. 8, L. 10-14, 17-20 teach that a “working set” includes input and output activation sizes based on an amount of memory. C. 14, L. 65 to C. 15, L. 13 teaches computing that there is 200 MB of memory available for storing inputs out of 500 MB of on-chip memory.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated Woo’s available memory constraint into Young’s list of resource constraints in ¶ [0049] which are used to determine a weight reuse value, and it would have been obvious to have computed input and output activation sizes in Young’s networks based on this weight reuse value. A motivation for the combination is that improving the scheduling of inputs and outputs in a neural network based on resource constraints maximizes efficient use of available on-chip resources, reduces external communications, and leads to an increase in available system bandwidth and an overall decrease in energy consumption by system components. (Woo, C. 11, L. 13-20)
However, neither Young nor Woo explicitly teaches: obtaining constraints comprising (b) maximum latency for inferencing, and (c) maximum energy for inferencing;
computing a set of statistics, wherein the set of statistics comprises (iv) energy to process a layer for each of the multiple batch sizes;
	But Canziani teaches: obtaining constraints comprising (b) maximum latency for inferencing, and (c) maximum energy for inferencing; (Constraint (b) is taught by p. 4, § 3.5. Constraint (c) is taught by p. 6, lines 1-3 starting with “Since the power” and by p. 6, last paragraph: “We show that an energy constraint will set a specific upper bound on the maximum achievable accuracy and model complexity, in terms of operations counts.”)
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated Canziani’s constraints of maximum latency for inferencing and maximum energy for inferencing into Young/Woo’s system. A motivation for this combination is to further optimize the network. (Canziani, p. 1, § 1, last sentence of the last paragraph)
However, none of Young, Woo, and Canziani explicitly teaches: computing a set of statistics, wherein the set of statistics comprises (iv) energy to process a layer for each of the multiple batch sizes;
But Yang teaches: computing a set of statistics, wherein the set of statistics comprises (iv) energy to process a layer for each of the multiple batch sizes; (p. 1917, § III-A, ¶ 1-2; p. 1919, col. 2, subsection B and the corresponding Fig. 6)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have computed energy per layer of the network in Young/Woo/Canziani’s system. Motivations for the combination include implementing mobile AI applications, where energy consumption is critical, by aiding neural network designers in selecting hyperparameters which optimize energy consumption. (Yang, p. 1916, § 1, ¶ 1-2)

Response to Arguments
	The following is a response to the claims and remarks filed 12/29/2021 and the Advisory Action filed 01/12/2022.

Claim Rejections under 35 U.S.C. §§ 102 and 103 (Remarks pp. 8-10): Applicant’s arguments with respect to claims 1-20 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Chen et al. (US 20180357541 A1) discloses determining mini-batch sizes that fit the estimated memory distribution.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Asher H. Jablon whose telephone number is (571)270-7648. The examiner can normally be reached Monday - Friday, 9:00 am - 6:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Abdullah Al Kawsar, can be reached on (571)270-3169. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/A.H.J./Examiner, Art Unit 2127
/ABDULLAH AL KAWSAR/Supervisory Patent Examiner, Art Unit 2127