Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Election/Restrictions
Applicant’s election without traverse of Group I (claims 1-10, 20 and 21) in the reply filed on 2/4/2022 is acknowledged.  Claims 11-19, 22 and 23 are hereby cancelled.
Specification
The disclosure is objected to because of the following informalities: in ¶2, the term “neural Network” should be all lowercase for consistency.  
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):

(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 1-10, 20 and 21 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim 1 recites the limitation "the selected layer objects" in the second main limitation.  There is insufficient antecedent basis for this limitation in the claim.  To 
Claims 2-10, 20 and 21 are rejected as being dependent upon a rejected base claim.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 2, 5, 8-10, 20 and 21 are rejected under 35 U.S.C. 103 as being unpatentable over “Fused-Layer CNN Accelerators” to Alwani et al. (hereinafter Alwani) in view of “Tactics to Directly Map CNN Graphs on Embedded FPGAs” to Abdelouahab et al. (hereinafter Abdelouahab).
Per claim 1, Alwani discloses a method (Section III-IV and figs. 3-7…method of implementing fused-layer convolutional neural network accelerator) to optimize a neural network computational algorithm (Section IV and Listings 1-4…CNN accelerator algorithm), wherein the computational algorithm is used to execute neural network calculation (Section I…CNN performs calculations in applications such as computer vision recognition/classification of images) by a computational platform (Section I, VI…CNN layer fusion implemented on Xilinx Virtex-7 FPGA); wherein the computational platform reads data needed by the calculation from off-chip memory (Section II, last paragraph…“In this paper, we demonstrate layer fusion, our technique to minimize the off-chip data movement between layers…our results show that, on an FPGA implementation of the first five convolutional layers of VGGNet-E…we reduce the total data transfer required from 77MB to 3.6MB…at the cost of only 362KB of extra on-chip storage”), wherein the method comprises: 
selecting layers which can be fused (Section III-D and fig. 4…fusing groups of layers into ‘pyramids’, i.e., can selectively fuse all layers into one pyramid or fuse two subsets of layers into their own pyramids) at least based on an optimization rule to reduce frequency of data exchange between the computational platform and the off-chip memory (Section V…an exploration tool built to analyze pyramid combinations and how they reduce the frequency of data exchange between the FPGA and off-chip memory, the exploration tool being an optimization based on programmed rules: “a tool for exploring the tradeoffs of fused-layer CNN accelerators designs…analyzes the costs (in terms of added on-chip memory capacity or added arithmetic operations) and benefits (off-chip accesses saved) for all possible pyramids and combinations of pyramids”); 
fusing at least two adjacent layers in the computational graph according to the selected layer objects (Section III…“This work identifies a key opportunity in restructuring the CNN evaluation by fusing the computation of adjacent layers, largely eliminating the off-chip feature map data transfer”; fig. 4…at least two adjacent layers are selected to be fused into a pyramid, e.g., right side of fig. 4 shows bottom two layers are fused into one pyramid; Section V-B…can select to group various combinations of adjacent layers, “For example, if a network has three layers, we can choose to organize the layers into groups of (1,1,1), (1,2), (2,1), or (3)”), wherein the at least two adjacent layers are at least one of the following:
horizontally adjacent layers having same input of feature maps (fig. 3 shows layers 1 and 2 having same feature maps, layers 1 and 2 construed to be horizontally adjacent layers); and
longitudinally adjacent layers (fig. 4 shows stacked layer representation with the adjacent layers being grouped, construed to be longitudinally adjacent layers), wherein the calculation results of a feature map of previous layer are at least part of input for a next layer (fig. 3 shows calculation results of feature maps for layer 1 being used as input in layer 2).
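
For illustration only, the grouping of adjacent layers quoted above from Alwani Section V-B, i.e., (1,1,1), (1,2), (2,1), or (3) for a three-layer network, can be sketched as an enumeration of every way to split a run of consecutive layers into ordered groups. The function name layer_groupings is hypothetical and does not appear in Alwani; the sketch is not represented to be the reference's implementation.

```python
from itertools import combinations


def layer_groupings(num_layers):
    """Return every split of num_layers consecutive layers into ordered groups.

    Each result is a tuple of group sizes, e.g. (1, 2) means the first layer
    stands alone and the next two adjacent layers are fused together.
    """
    groupings = []
    # A grouping is determined by which of the boundaries between adjacent
    # layers are chosen as cut points (boundaries 1 .. num_layers - 1).
    for num_cuts in range(num_layers):
        for cuts in combinations(range(1, num_layers), num_cuts):
            bounds = (0, *cuts, num_layers)
            sizes = tuple(end - start for start, end in zip(bounds, bounds[1:]))
            groupings.append(sizes)
    return groupings


if __name__ == "__main__":
    for grouping in layer_groupings(3):
        print(grouping)
```

Under this sketch a network of n layers yields 2^(n-1) candidate groupings, which is consistent with the four groupings quoted for three layers.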

Alwani does not expressly disclose, but Abdelouahab does teach a neural network algorithm, specifically a CNN algorithm such as in Alwani, being described as a neural network computational graph (Abdelouahab: Section 1…“a CNN algorithm is described as a graph of dataflow actors exchanging data thru unidirectional channels and this dataflow graph is statically and physically mapped onto the target FPGA using a library of pre-defined computing elements implementing actors”).
Alwani and Abdelouahab are analogous art because they are from the same field of endeavor, namely implementing CNNs on FPGAs for machine vision applications.
Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to implement the CNN algorithm of Alwani in the framework of a computational graph, as taught by Abdelouahab.
The suggestion/motivation for doing so would have been to have an efficient mapping between the computational model of the CNN and the execution model (Abdelouahab: Abstract and Section I).
Per claim 2, Alwani combined with Abdelouahab discloses claim 1, Alwani further disclosing the at least two adjacent layers comprise: a convolution (CONV) layer, a non-linear (ReLU) layer and a pooling (POOL) layer which are successive (Section II, last paragraph…“FPGA implementation of the first five convolutional layers of VGGNet-E (along with adjacent pooling, padding, and ReLU layers)”); horizontal CONV layers sharing same input of feature maps (Section VI.B…“In all, we fuse two convolutional layers, two ReLU layers, two padding layers, and one pooling layer”).
Per claim 5, Alwani combined with Abdelouahab discloses claim 1, Alwani further disclosing a subsequent adjacent layer further reading needed data which is from other feature maps from the off-chip memory (fig. 4, right side and Section III.D…the two pyramids each require their own reading from off-chip memory, “On the right side, we consider decomposing the layers into two pyramids.  This organization has greater off-chip memory transfer, because layer 3’s output must be stored to DRAM and then read-back to compute the pyramid for layer 4”).
Per claim 8, Alwani combined with Abdelouahab discloses claim 1, Alwani further disclosing decomposing a layer with a plurality of previous horizontally side-by-side input feature maps (fig. 4, right side…decomposing layers to merge into pyramids; Section III.D…“On the right side, we consider decomposing the layers into two pyramids.”); and merging the layers obtained after the decomposition into respective input branches (fig. 3 and Section III.A…pyramids are construed to be input branches, “fusing two layers, the general form allows for more than two to be merged in an analogous way…if the layers are visualized spatially, this process creates a computational pyramid across multiple layers”).
Per claim 9, Alwani combined with Abdelouahab discloses claim 8, Alwani further disclosing the decomposed layer is a POOL layer on a trunk having branch inputs (Section III.B…“Because the pooling operation is performed localized over small tiles, we always fuse the pooling layer into the previous convolutional layer, as it saves bandwidth at virtually no cost”).
Per claim 10, Alwani combined with Abdelouahab discloses a method to optimize a neural network computational graph based on rules (see claim 1 analysis), comprising: making preset rules according to the method of claim 1 (Alwani: Section V…an exploration tool built to analyze pyramid combinations and how they reduce the frequency of data exchange between the FPGA and off-chip memory, the exploration tool being an optimization based on programmed rules: “a tool for exploring the tradeoffs of fused-layer CNN accelerators designs…analyzes the costs (in terms of added on-chip memory capacity or added arithmetic operations) and benefits (off-chip accesses saved) for all possible pyramids and combinations of pyramids”); searching a topology conforming to the preset rules in the neural network computational graph and reconstructing the computational graph (Alwani: Section V.B and fig. 7…different groupings of layers into pyramids, e.g., topologies, are explored using the exploration tool, e.g., reconstructing the original computational graph in different combinations of groupings, “For each network, we enumerate all possibilities and compute how much data must be transferred to and from DRAM and how much on-chip buffering is required. Figure 7 shows these results for AlexNet and VGG…For example, point A in Figure 7(b) has the lowest on-chip storage cost; it represents a layer-by-layer design that incurs no layer-fusion costs and transfers 86MB of data. Point C represents another extreme, where five convolutional layers are fused and only the input and final output feature maps are transferred. This design transfers only 3.6MB per image, a 24x reduction in DRAM traffic, but requires 362KB of on-chip memory for intermediate results. Other points between these extremes may represent attractive tradeoffs. For example, point B transfers 25MB of data, but requires only 118KB of extra on-chip storage”).
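
For illustration only, the tradeoff among points A, B and C quoted above from Alwani Section V.B can be sketched as a selection of the design with the least DRAM transfer that fits a given on-chip storage budget. The names DESIGN_POINTS and best_design are hypothetical, and the 0 KB figure for point A is an assumption (the quotation states only that point A has the lowest on-chip storage cost and incurs no layer-fusion costs); the sketch is not represented to be the exploration tool of the reference.

```python
# (label, extra on-chip storage in KB, DRAM transfer per image in MB),
# using the figures quoted from Alwani Fig. 7(b).
# ASSUMPTION: point A is given 0 KB of extra storage; the quotation does not
# state a number for it.
DESIGN_POINTS = [
    ("A", 0, 86.0),    # layer-by-layer, no fusion
    ("B", 118, 25.0),  # intermediate tradeoff
    ("C", 362, 3.6),   # five convolutional layers fused
]


def best_design(points, storage_budget_kb):
    """Return the point with the least DRAM transfer that fits the budget.

    Returns None when no point fits within storage_budget_kb.
    """
    feasible = [p for p in points if p[1] <= storage_budget_kb]
    if not feasible:
        return None
    return min(feasible, key=lambda p: p[2])


if __name__ == "__main__":
    for budget in (50, 200, 400):
        print(budget, best_design(DESIGN_POINTS, budget))
    # Ratio of point A to point C transfer, quoted as a 24x reduction.
    print(round(86.0 / 3.6))
```

With the quoted figures, a budget between 118KB and 362KB selects point B, and the ratio of 86MB to 3.6MB rounds to the quoted 24x reduction.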
Per claim 20, Alwani combined with Abdelouahab discloses a computational platform (Alwani: Section I, VI…CNN layer fusion implemented on Xilinx Virtex-7 FPGA) for a neural network (Alwani: Section I…CNN is a neural network performing calculations in applications such as computer vision recognition/classification of images), comprising: a data processing module (Xilinx Virtex-7 has DSPs construed to be processing modules…see extrinsic evidence Xilinx 7 Series FPGAs Data Sheet: Overview, pg. 5), used to carry out preset calculation processing for input data and generate output data (DSP in Virtex-7 processes input and generates output); a data storage module (Xilinx Virtex-7 has RAM blocks construed to be a data storage module…see extrinsic evidence Xilinx 7 Series FPGAs Data Sheet: Overview, pg. 5), used to cache input data needed by the data processing module or intermediate data outputted by the data processing module (RAM serves as cache/buffers for data processing operations); and a control module, controlling the data processing module and the data storage module to execute neural network calculation based on a computational graph optimized by the method of claim 1 (Xilinx Virtex-7 has logic cells and blocks construed to be a control module that controls the data processing module and the data storage module…see extrinsic evidence Xilinx 7 Series FPGAs Data Sheet: Overview, pg. 5).
Per claim 21, Alwani combined with Abdelouahab discloses a non-transitory machine-readable storage medium, wherein an executable code is stored thereon (Alwani: Section IV…fused-layer CNN implemented as C++ code that is transformed into hardware using the Vivado HLS tool, intrinsically requiring a non-transitory machine-readable storage medium); wherein when the executable code is executed by a processor of an electronic device (Alwani: Section VI.A…Xilinx Virtex-7 FPGA construed to be a processor on a board of an electronic device that executes the transformed C++ code), the processor executes the method of claim 1 (see claim 1 analysis).
Allowable Subject Matter
Claims 3, 4, 6 and 7 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
The following is the statement of reasons for the indication of allowable subject matter: The prior art disclosed by the applicant and cited by the Examiner fails to teach or suggest, alone or in combination, all the limitations of independent claim 1 and intervening claims, including the particular notable limitations provided below:
Claims 3-4: pruning layers only used to change data dimension or arrangement manner in the neural network, through storing operation result back to the off-chip memory in a required dimension arrangement manner and/or reading previous operation result from the off-chip memory in a required dimension arrangement manner
Claim 6: a subsequent adjacent layer further reading needed data which is from other feature maps from the off-chip memory
Claim 7: directly merging operation of a subsequent layer into a previous layer, wherein the subsequent layer is a Batch Norm layer and a Scale layer, and the previous layer is a CONV layer

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.  Patents and/or related publications are cited in the Notice of References Cited (Form PTO-892) attached to this action to further show the state of the art with respect to optimizing neural network computational graphs in part by fusing layers.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ALAN CHEN whose telephone number is (571) 272-4143. The examiner can normally be reached M-F 10-7.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kamran Afshar, can be reached on (571) 272-7796. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and 





/ALAN CHEN/Primary Examiner, Art Unit 2125