DETAILED ACTION
This action is written in response to the Applicants' remarks and amendments dated 1/27/21. This action is made final. The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments
The Applicants argue that the previous art of record does not anticipate or render obvious the claims as currently amended. The Examiner provides updated prior art rejections below necessitated by the current amendments. Additional arguments are also addressed below.
First, “Alwani’s layer fusion process fails to disclose “identifying layers of the neural network that are partitioned into a sequence of a plurality of superlayers, wherein each superlayer in the sequence comprises two or more layers and is a partition of the directed graph,” as recited in amended claim 2”. (Remarks, p. 9.)

The Examiner is not persuaded. Although sec. III of Alwani looks at the functionality of a single fused layer (akin to one of the recited superlayers), sec. V makes it clear that real-world implementations of the techniques described can involve a plurality of fused layers (superlayers). (See e.g. p. 8, second col.: “Given a network with ℓ layers, there are 2^(ℓ-1) possible ways to fuse these layers”. Also, same col.: “The AlexNet CNN has five convolutional layers and three pooling layers; there are 128 possible combinations of different ways to fuse layers.”)
Because Alwani teaches the limitations of claim 2 as currently amended, the Examiner maintains the outstanding rejection of this claim under §102.
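For illustration only, the count quoted from Alwani can be checked with a short Python sketch. It assumes that a "way to fuse" means a split of the layer sequence into contiguous groups, so that each of the ℓ-1 boundaries between adjacent layers is either cut or fused through. The function name and the layer indices are hypothetical and do not come from Alwani or from the application.

```python
from itertools import product


def contiguous_fusions(num_layers):
    """Return every split of layers 0..num_layers-1 into contiguous groups."""
    layers = list(range(num_layers))
    partitions = []
    # Each of the num_layers - 1 boundaries between adjacent layers is
    # either a cut (True) or fused through (False).
    for cuts in product([False, True], repeat=num_layers - 1):
        groups, current = [], [layers[0]]
        for layer, cut in zip(layers[1:], cuts):
            if cut:
                groups.append(current)
                current = [layer]
            else:
                current.append(layer)
        groups.append(current)
        partitions.append(groups)
    return partitions


# Five convolutional plus three pooling layers, per the quoted passage.
alexnet_layers = 8
print(len(contiguous_fusions(alexnet_layers)))  # 128, i.e. 2 ** (8 - 1)
```

Under that assumption, the enumeration includes groupings that contain more than one multi-layer group, which is the situation the quoted passage of sec. V describes.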
“Second, Alwani fails to disclose at least the feature of “loading the respective set of parameters for each of the layers in the superlayer into a memory of the hardware circuit,” where this “loading” is “for each superlayer in the sequence,” as recited at amended claimed 2.” (Remarks, p. 9.)

As noted above, the Alwani system does disclose partitioning their neural network into a plurality of fused layers (superlayers). In this arrangement, each of the fused layers operates according 

Double Patenting
Claims 2, 11, and 20 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1, 10, and 19 of U.S. Patent No. 10,019,668 B1. Although the claims at issue are not identical, they are not patentably distinct from each other for the reasons described in the table below.
This application – 16/017052: 2. A method performed using a neural network implemented on a hardware circuit and having a plurality of layers arranged in a directed graph, the method comprising:
US 10,019,668 B1: 1. A method, comprising: ... receiving a batch of neural network inputs to be processed using a neural network on a hardware circuit

This application – 16/017052: receiving a batch of neural network inputs to be processed using the layers of the neural network, each layer having a respective set of parameters;
US 10,019,668 B1: receiving a batch of neural network inputs to be processed using a neural network on a hardware circuit, the neural network having a plurality of layers arranged in a directed graph, each layer having a respective set of parameters;

This application – 16/017052: identifying layers of the neural network that are partitioned into a sequence of a plurality of superlayers, wherein each superlayer in the sequence comprises two or more layers and is a partition of the directed graph;
US 10,019,668 B1: determining a partitioning of the neural network layers into a sequence of superlayers, each superlayer being a partition of the directed graph that includes one or more layers, and wherein a memory of the hardware circuit has a threshold storage capacity, and determining the partitioning of the neural network layers into a sequence of superlayers, comprises: partitioning the neural network layers into a sequence of superlayers based on the threshold storage capacity of the memory of the hardware circuit;

This application – 16/017052: processing the batch of neural network inputs using the hardware circuit, comprising, for each superlayer in the sequence:
US 10,019,668 B1: processing the batch of neural network inputs using the hardware circuit, comprising, for each superlayer in the sequence:

This application – 16/017052: loading the respective set of parameters for each of the layers in the superlayer into a memory of the hardware circuit; and
US 10,019,668 B1: loading the respective set of parameters for the layers in the superlayer into the memory of the hardware circuit; and

This application – 16/017052: for each neural network input in the batch: processing a superlayer input corresponding to the neural network input through each of the layers in the superlayer using parameters that were loaded in the memory of the hardware circuit; and generating a superlayer output for the neural network input in response to processing the superlayer input through each of the layers in the superlayer.
US 10,019,668 B1: for each neural network input in the batch: processing a superlayer input corresponding to the neural network input through each of the layers in the superlayer using the parameters in the memory of the hardware circuit to generate a superlayer output for the neural network input.

As illustrated in the table above, each limitation of claim 2 of this application has a substantially identical or broader corresponding limitation in claim 1 of the ‘668 patent. Thus, claim 1 of ‘668 anticipates claim 2 of this application.

This analysis applies equally to claims 11 and 20 of this application, whose limitations are substantially identical to those of claim 2 of this application, as well as to claims 10 and 19 of ‘668. (The claims vary only in form, i.e. method vs. system vs. machine-readable storage device.)



Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale or otherwise available to the public before the effective filing date of the claimed invention.

Claims 2, 3, 5-12, and 14-21 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Alwani (“Fused-Layer CNN Accelerators”, cited in PTO-892 dated 9/25/20).
Regarding claim 2, Alwani discloses a method performed using a neural network implemented on a hardware circuit and having a plurality of layers arranged in a directed graph, the method comprising:
P. 1, second col.: “We validate our approach by demonstrating CNN layer fusion on a Xilinx Virtex-7 FPGA.” (hardware circuit)
receiving a batch of neural network inputs to be processed using the layers of the neural network, each layer having a respective set of parameters;
P. 2, second col., paragraph 4: "Figure 2 shows the size (in MB) of the feature maps (input and output) and the filter weights of each of the convolutional layers of the VGGNet-E network [17]." VGGNet-E is a sequence of layers which is a simple directed graph, having a number of input parameters for each layer as shown in Fig. 2.
identifying layers of the neural network that are partitioned into a sequence of a plurality of superlayers, wherein each superlayer in the sequence comprises two or more layers and is a partition of the directed graph;
P. 3, second col., paragraph 2: "Figure 3 demonstrates the layer fusion process with an example that fuses two convolutional layers together (which we will refer to as Layer 1 and Layer 2). Note that, although the example in the following discussion focuses specifically on fusing two layers, the general form allows for more than two to be merged in an analogous way."
P. 8, second col.: “Given a network with ℓ layers, there are 2^(ℓ-1) possible ways to fuse these layers”. “The AlexNet CNN has five convolutional layers and three pooling layers; there are 128 possible combinations of different ways to fuse layers.”
processing the batch of neural network inputs using the hardware circuit, comprising, for each superlayer in the sequence:
loading the respective set of parameters for each of the layers in the superlayer into a memory of the hardware circuit; and
P. 3, second col., paragraph 3: "Layer 1 operates on a tile of its input feature maps, consisting of 5x5xN input values (the black dashed outline labeled "tile 1" and extending "down" through all N maps). This means that 5 x 5 x N words are brought from off-chip memory and stored in on-chip buffers." This loads the input parameters for the first layer of the superlayer, and the filter weights are stored on chip (see footnote).
for each neural network input in the batch:
processing a superlayer input corresponding to the neural network input through each of the layers in the superlayer using parameters that were loaded in the memory of the hardware circuit; and
P. 4, first col., paragraph 2 and Fig. 3: "Once input tile 1 (black outline) is loaded on chip, we compute the entire pyramid of intermediate values without transferring any additional feature map data to or from off chip memory. When we reach the tip of the pyramid (the end of the fused layers), only the values in the last output feature maps are retained." The input to the first layer is processed to ultimately result in the last layer of the pyramid's output values.
generating a superlayer output for the neural network input in response to processing the superlayer input through each of the layers in the superlayer.
P. 4, second col.: “we can analyze the effect of fusing two or more layers by starting from the output and working backwards to calculate the dimensions of the pyramid at each level (i.e., the values at each level upon which the final outputs depend).”
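For illustration only, the backward calculation described in this passage can be sketched in Python. The sketch assumes unpadded convolutions and uses standard convolution arithmetic (input width = stride x (output width - 1) + kernel width). The function name and the (kernel, stride) tuple format are hypothetical and are not taken from Alwani.

```python
def pyramid_tile_sizes(output_size, layers):
    """Work backwards from the output tile to the input tile of a fused group.

    layers: list of (kernel_size, stride) tuples, ordered first layer to last.
    Returns the tile widths from the group's input down to its output.
    """
    sizes = [output_size]
    for kernel, stride in reversed(layers):
        # Input width needed to produce sizes[-1] outputs without padding.
        sizes.append(stride * (sizes[-1] - 1) + kernel)
    return list(reversed(sizes))


# Two fused 3x3, stride-1 convolutions producing a single output position.
print(pyramid_tile_sizes(1, [(3, 1), (3, 1)]))  # [5, 3, 1]
```

Under these assumptions the result agrees with the quoted Figure 3 example, in which a 5x5xN input tile yields a 3x3xM intermediate region.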

Claims 11 and 20 each recite limitations which are substantially identical to those in claim 2, and are rejected for the same reason.
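For illustration only, the control flow recited in claim 2 (for each superlayer, load the parameters of its layers once; then, for each input in the batch, pass the input through every layer of that superlayer) can be sketched in Python. Every name below is hypothetical, and a plain dictionary stands in for the memory of the hardware circuit. The sketch is not the Applicants' or Alwani's implementation.

```python
def run_superlayers(batch, superlayers, parameter_store):
    """Process a batch superlayer by superlayer.

    superlayers: list of superlayers, each a list of (layer_fn, param_key).
    parameter_store: mapping that stands in for off-chip parameter storage.
    """
    current = list(batch)
    for superlayer in superlayers:
        # Load the parameters for every layer of this superlayer once.
        on_chip = {key: parameter_store[key] for _, key in superlayer}
        outputs = []
        for x in current:
            # Pass one input through each layer using the loaded parameters.
            for layer_fn, key in superlayer:
                x = layer_fn(x, on_chip[key])
            outputs.append(x)  # superlayer output for this input
        current = outputs  # becomes the input to the next superlayer
    return current


def scale(x, w):
    return x * w


def shift(x, b):
    return x + b


store = {"w1": 2.0, "b1": 1.0, "w2": 3.0, "b2": -1.0}
plan = [[(scale, "w1"), (shift, "b1")], [(scale, "w2"), (shift, "b2")]]
print(run_superlayers([1.0, 2.0], plan, store))  # [8.0, 14.0]
```

The toy layers are scalar functions chosen only to keep the example short; in a real accelerator each layer would be a convolution or pooling operation.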

Filed: June 25, 2018
Regarding claims 3, 12, and 21, Alwani discloses their further limitations wherein the sequence of superlayers includes a first superlayer comprising a first plurality of layers, and processing the batch of neural network inputs comprises:
for the first superlayer, loading the respective set of parameters for each layer of the first plurality of layers into the memory of the hardware circuit.
P. 3, second col., paragraph 3 and Fig. 3: "Layer 1 operates on a tile of its input feature maps, consisting of 5x5xN input values (the black dashed outline labeled "tile 1" and extending "down" through all N maps). This means that 5 x 5 x N words are brought from off-chip memory and stored in on-chip buffers. Layer 1 then convolves all M of its filters (each 3x3xN) across this tile, producing the 3 x 3 x M region illustrated with a black dashed outline in the intermediate feature maps (and extending downward through all M feature maps). Then, Layer 2 is able to use 

Regarding claims 5 and 14, Alwani discloses their further limitation comprising, prior to identifying the layers that are partitioned into the sequence of superlayers:
partitioning a first subset of the layers into a first superlayer comprising a first plurality of layers; and
partitioning a second subset of the layers into a second superlayer comprising a second plurality of layers.
See e.g. p. 8, second col.: “Given a network with ℓ layers, there are 2^(ℓ-1) possible ways to fuse these layers”. Also, same col.: “The AlexNet CNN has five convolutional layers and three pooling layers; there are 128 possible combinations of different ways to fuse layers.” See also fig. 6.

Regarding claims 6 and 15, Alwani discloses their further limitation wherein the first superlayer and the second superlayer represent respective partitions in the sequence of superlayers.
P. 5, second col., paragraph 4 and Fig. 4: “On the right, we consider decomposing the layers into two pyramids. This organization has greater off-chip memory transfer, because layer 3's output must be stored to DRAM and then read-back to compute the pyramid for layer 4. The benefit of this multi-pyramid approach is that the on-chip storage for the reuse model (or the amount of recomputation if using the recompute model) will be reduced, as the input tile and intermediate results are smaller.” Fusion into multiple pyramids is equivalent to partitioning into superlayers. An arbitrary number of layers can be fused, per paragraph 3.

Regarding claims 7 and 16, Alwani discloses their further limitation comprising:
processing a first set of superlayer inputs through each layer of the first plurality of layers in the first superlayer to generate a plurality of superlayer outputs from at least the superlayer input corresponding to the neural network input.


Regarding claims 8 and 17, Alwani discloses their further limitation wherein a superlayer input to the second superlayer in the sequence of superlayers corresponds to a first superlayer output generated by the superlayer in the sequence of superlayers.
P. 5, second col., paragraph 3: "Although our example (Figure 3) illustrates fusing two convolutional layers, fusing more layers is analogous. As the number of fused layers increases, the benefits (reduction of data transferred to and from DRAM) increase, but so do the costs (on-chip memory required or redundant computation performed). Thus, there is a tradeoff between the costs incurred and the benefits. We can consider the case where all layers are fused into a single pyramid as an extreme: increasing costs by the largest amount to save the most bandwidth. However, we can also choose other tradeoff points, decomposing the layers using more than one pyramid." Here the fusion into pyramids is equivalent to the creation of superlayers.

Regarding claims 9 and 18, Alwani discloses their further limitation wherein:
the memory of the hardware circuit is further configured to store the batch of neural network inputs for the neural network; and
the method further comprises loading a batch of neural network inputs for each superlayer in the sequence of superlayers in the memory of the hardware circuit.
P. 5, second col., paragraph 4 and Fig. 4: "On the right, we consider decomposing the layers into two pyramids. This organization has greater off-chip memory transfer, because layer 3's output must be stored to DRAM and then read-back to compute the pyramid for layer 4. The benefit of this multi-pyramid approach is that the on-chip storage for the reuse model (or the amount of recomputation if using the 

Regarding claims 10 and 19, Alwani discloses their further limitation wherein loading at least the respective set of parameters for each layer in the superlayer comprises:
loading the respective set of parameters for each layer based on a threshold aggregate parameter capacity of a parameter memory included in the memory of the hardware circuit, the parameter memory being configured to store parameters for the superlayer.
P. 6, second col., paragraph 2: “Given a hardware resource budget (e.g., a number of FPGA DSP slices available for the accelerator), one can find the optimal Tn and Tm for a given convolutional layer. In [19], a joint optimization process is proposed to create a design that can compute all of the convolutional layers in a given CNN. Given a resource budget, the optimization finds the (Tn, Tm) that maximizes the aggregate performance of the accelerator.” Loading of parameters is based on the given hardware resource budget.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all obviousness rejections set forth in this Office action:
(a) A patent may not be obtained though the invention is not identically disclosed or described as set forth in section 102 of this title, if the differences between the subject matter sought to be patented and the prior art are such that the subject matter as a whole would have been obvious at the time the invention was made to a person having ordinary skill in the art to which said subject matter pertains. Patentability shall not be negatived by the manner in which the invention was made.

Claims 4 and 13 are rejected under 35 U.S.C. 103(a) as being unpatentable over Alwani and Ambrose (US 2017/0344882).
Regarding claims 4 and 13, Ambrose discloses their further limitation, which Alwani does not seem to disclose explicitly, wherein:
the hardware circuit is configured to exchange data communications with a host controller that is external to the hardware circuit; and
the batch of neural network inputs and the respective set of parameters for each layer in the superlayer are received from the host controller based on a global scheduling process executed by the host controller.
[0054]: “As observed by researchers, due to varying parameters and a fixed scheduling scheme across all layers of the CNN algorithm, each layer can have varying performance and memory access patterns for a given hardware platform. The present disclosure departs from the current practice of designing hardware for a given scheduling scheme. Instead, the present disclosure builds on the concept of having a flexible architecture and then selecting the best possible scheduling scheme for each layer from a given set of scheduling schemes.” There is global management of selecting scheduling schemes for each layer / superlayer.
At the time of filing, it would have been obvious to a person of ordinary skill to use a global scheduling technique (as disclosed by Ambrose) on a CNN system with layer fusing (as disclosed by Alwani) because this would provide for more efficient use of computation resources. Both disclosures pertain to neural network processing hardware.

Conclusion
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). 
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Vincent Gonzales whose telephone number is (571) 270-3837. The examiner can normally be reached on Monday-Friday 7 a.m. to 4 p.m. MT.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda Huang, can be reached at (571) 270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/Vincent Gonzales/
Primary Examiner, Art Unit 2124