Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Remarks
This Office Action is responsive to Applicants' Amendment filed on September 28, 2022, in which claims 1, 6, and 13-17 are currently amended. Claims 2-5, 7-12, and 18-24 are canceled. Claims 25-34 are newly added.

Response to Arguments
Applicant’s arguments with respect to the rejection of claims 1-20 under 35 U.S.C. 103(a), based on the amendment, have been considered and are persuasive. The arguments are nevertheless moot in view of the new ground of rejection set forth below.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 1-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.

Regarding claims 1, 6, 11, and 16, the limitation “a smaller number of instantiations” is indefinite.  It is not clear from the instant specification what an instantiation is or how the smaller number of instantiations might be comparatively determined.

Regarding claims 4, 9, 13, and 19, the limitation of an ANN “incapable” of being implemented in a single NN processor device is indefinite.  It would be unclear to one of ordinary skill in the art what metrics are used to determine whether an ANN would be incapable of being implemented in a single NN processor device.  The limitation is circular and self-contradictory because the claims are directed towards a single NN processor device (processor integrated circuit device).

The remaining claims are rejected with respect to their dependence on the rejected claims. 


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.


	Claims 1-20 are rejected under 35 U.S.C. § 103 as being unpatentable over the combination of Majid (“Evolutionary Neural Network Parallelization with Multicore Systems on Chip”, 2012) and Pratas (US20170277658A1), and further in view of Yao (US20180046894A1).

	 Regarding claim 1, Majid teaches A neural network (NN) processor integrated circuit (IC) device for performing neural network calculations for an artificial neural network (ANN) having one or more network layers, comprising:([Abstract] "In this work, Parallel Evolutionary Neural Network algorithm is proposed and implemented on Multi-core system on chip. The algorithm is parallelized, partitioned, mapped, and scheduled on multicore." [p. 1 §II] "Evolutionary Neural Network (ENN) consists of Feed Forward Neural Networks and Genetic Algorithm (GA). A multilayer feed forward neural network consists of three types of layers; input, hidden and output")
	a plurality of computation circuits, each computation circuit including computing elements, associated dedicated memory elements, and related control logic configured to be dynamically mapped to sets of memory elements in accordance with a number of computations required in a network layer; ([p. 1 §IV] "Parallel Evolutionary Neural Network algorithm is divided into two parts, Parallel Feed Forward Neural Networks and Parallel GA. These parts are further divided into number of steps to find out the suitable candidates for parallel processing. These steps need to be partitioned, mapped, and scheduled on multi-core so that they can execute entire algorithm efficiently" [p. 2 §IVC] "To achieve the full performance of multi-core, the parallel part of the algorithm should divide into tasks; these tasks are executed on different cores concurrently...Tasks are created with as few dependencies as possible with other tasks. In case of one hidden layer, each task contains two matrix multiplications. First matrix multiplication involves multiplication of an input row vector with neurons of weight matrix giving a row vector. The resultant row vector is then multiplied with a matrix of the output layer")
	wherein said plurality of computation circuits, including computing elements and associated dedicated memory elements, are aggregated in multiple levels to form a hierarchy, each level having its own dedicated local memory;([p. 3 §VIII] "Each task with its associated chromosomes is sent to L2 cache. The Fitness Function and roulette selection is sent to L3 cache as L3 cache is shared among all the cores. These tasks are computed independently on different cores. The L2 cache is used for subpopulations and L3 cache for fitness functions and roulette selection as shown in Fig 6." L2 cache for subpopulations interpreted as synonymous with dedicated memory associated with a specific hierarchical level.)
	wherein higher levels in said hierarchy are generally more complex and include a smaller number of instantiations than lower levels;([p. 3 §VIII] "Each task with its associated chromosomes is sent to L2 cache. The Fitness Function and roulette selection is sent to L3 cache as L3 cache is shared among all the cores. These tasks are computed independently on different cores. The L2 cache is used for subpopulations and L3 cache for fitness functions and roulette selection as shown in Fig 6." L2 population cache interpreted as more complex higher level containing a smaller number of instantiations.)
	whereby one or more splits are made in accordance with bandwidth demand at the input and output of any ANN subnetworks mapped to said plurality of NN processor cores; ([p. 2 §V] "inspired by the functional programming concepts of mapping and reducing. Input matrix is split into a set of columns and the weight matrix into a set of rows" [p. 3 §VIIB] "The cache hit performance is improved by the concept of row-major organization [6]. The element of the second matrix is accessed column wise, and therefore is not in sequential order in memory. This is due to the fact that matrices are stored in memory as row-major order. As a result, the feed forward algorithm is bandwidth limited and displays poor performance and low efficiency because of time spent in loading data rather than computing. The second matrix is accessed using row-major order in order to optimize the performance." Majid explicitly teaches that the splits are made to ensure ANN input rows fit in the L1 cache and are explicitly organized in row-major order to maximize performance with respect to bandwidth demand.)
	However, Majid does not explicitly teach an internal bus providing synchronous communications between said plurality of NN processor cores utilizing a synchronous protocol as well as guaranteeing a required bandwidth therebetween; 
	wherein during an offline compilation process a compiler maps on a layer by layer basis a logical ANN model to a physical configuration that includes a plurality of NN processor cores 
	and wherein said mapping and resultant physical configuration are driven by available resources of each NN processor core, including memory capacity, computing capacity, availability of control resources, and input and output ports each having limited bandwidth.

	Pratas, in the same field of endeavor, teaches an internal bus providing synchronous communications between said plurality of NN processor cores utilizing a synchronous protocol as well as guaranteeing a required bandwidth therebetween; ([¶0138] "The outputs from the PUs are collected and processed in a synchronized manner in order to produce the expected results. Data is communicated to the execution cluster 2500 from either the memory 2501 via the I/O interface 2503 or the external world from external interfaces 2502 via the I/O interface 2503 (e.g., using point-to-point buses)." [¶0140] "Current designs propose traditional data cache organizations to reduce the bandwidth requirements." Reducing bandwidth requirements interpreted as synonymous with guaranteeing a required bandwidth.)
	and wherein said mapping and resultant physical configuration are driven by available resources of each NN processor core, including memory capacity, computing capacity, availability of control resources, and input and output ports each having limited bandwidth.([¶0119] "Thus, one embodiment of the invention comprises a unified scratch pad memory 1900 used for two types of data in convolution accelerators, input data and partial results. In this scratchpad memory all banks are partitioned in two areas (input data and partial results) and the amount devoted for each data type can be changed depending on the problem/application. Sharing the available storage capacity allows an optimal use for all problem sizes, leading to lower bandwidth requirements and lower energy per operation." [¶0120] "The embodiments of the invention also include a mapping technique that ensures a minimum Quality of Service for both types of data, even when using memory banks with only one Read/Write port and a shared interconnect. Allowing the usage of memory banks with only one port reduces the required area and energy of the scratchpad memory 1900.").

	Majid and Pratas are both directed towards a system on a chip for parallel processing of neural networks.  Therefore, Majid and Pratas are analogous art in the same field of endeavor.  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Majid with the teachings of Pratas by using the detailed hardware implementation in Pratas to perform the hierarchical task-based parallelization in Majid.  Pratas teaches as motivation for the combination ([¶0121] “One advantage of this unified design is that it achieves optimal utilization of the available capacity of the scratchpad memory 1900, and most importantly, without requiring multi-ported memory banks or additional array buses that typically require more area and consume more power. Additionally, better scratchpad memory utilization results in a significant external-memory bandwidth reduction, and therefore lower power and energy consumption.”).
	While Pratas does teach a compiler to compile instructions for the parallel system [¶0085-¶0086], the combination of Majid and Pratas does not explicitly teach that during an offline compilation process a compiler maps on a layer by layer basis a logical ANN model to a physical configuration that includes a plurality of NN processor cores.

	Yao, in the same field of endeavor, teaches during an offline compilation process a compiler maps on a layer by layer basis a logical ANN model to a physical configuration that includes a plurality of NN processor cores ([¶0013] " compiling step, for compiling said compressed ANN to generate instructions to be executed by an ANN accelerator, so as to implement said ANN on said ANN accelerator; wherein the compiling step is conducted on the basis of the quantized weights of CONV and FC layers of said ANN" See also Table 3 [¶0140] "Instructions for One CONV layer generated by the compiler").

	The combination of Majid and Pratas, and Yao, are each directed towards a system on a chip for parallel processing of neural networks.  Therefore, the combination of Majid and Pratas, and Yao, are analogous art in the same field of endeavor.  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of the combination of Majid and Pratas with the teachings of Yao by performing the mapping at compile time.  Yao provides as additional motivation for the combination ([¶0152] “With these parameters, the compiling step 415 may provide a set of customized instructions for said ANN. For example, the tiling and data reusing steps 710 and 715 may help achieve a better utilization of the accelerator's resources with these parameter”).  This motivation for combination also applies to the remaining claims which depend on this combination.

	 Regarding claim 2, the combination of Majid, Pratas, and Yao teaches The device according to claim 1, further comprising an NN processor system comprising a plurality of NN processor devices each interconnected via said device-to-device interface circuit, wherein implementation of the ANN over said plurality of interconnected NN processor devices is substantially seamless resulting in behavior equivalent to the ANN implemented on a single NN processor device.(Pratas [Abstract] “An apparatus and method are described for distributed and cooperative computation in artificial neural networks. For example, one embodiment of an apparatus comprises: an input/output (I/O) interface; a plurality of processing units communicatively coupled to the I/O interface to receive data for input neurons and synaptic weights associated with each of the input neurons, each of the plurality of processing units to process at least a portion of the data for the input neurons and synaptic weights to generate partial results; and an interconnect communicatively coupling the plurality of processing units, each of the processing units to share the partial results with one or more other processing units over the interconnect” [¶0126] “Still, using a simpler memory array requires handling read/write conflicts in the shared bus and a specialized data mapping to guarantee the required Quality of Service for both input-data and partial results” Guaranteeing quality of service across a shared bus and resolving read/write conflicts interpreted as synonymous with substantially seamless).
	
	 Regarding claim 3, the combination of Majid, Pratas, and Yao teaches The device according to claim 2, wherein said plurality of NN processor devices are interconnected in at least one of a scatter configuration, gather configuration, and feedforward configuration.(Majid [p. 1 §III] "In ENN, due to the Single Instruction Multiple Data (SIMD) nature of the feed forward, data level parallelism has been selected to partition the feed forward into different parts and each part is further divided into tasks").
	
	 Regarding claim 4, the combination of Majid, Pratas, and Yao teaches The device according to claim 2, wherein said interconnected plurality of NN processor devices is operative to implement an ANN too large to implement in a single NN processor device.(Pratas [Abstract] “An apparatus and method are described for distributed and cooperative computation in artificial neural networks. For example, one embodiment of an apparatus comprises: an input/output (I/O) interface; a plurality of processing units communicatively coupled to the I/O interface to receive data for input neurons and synaptic weights associated with each of the input neurons, each of the plurality of processing units to process at least a portion of the data for the input neurons and synaptic weights to generate partial results; and an interconnect communicatively coupling the plurality of processing units, each of the processing units to share the partial results with one or more other processing units over the interconnect”).
	
	 Regarding claim 5, the combination of Majid, Pratas, and Yao teaches The device according to claim 1, wherein said device-to-device interface circuit comprises at least one input port and at least one output port providing bidirectional communications between two NN processor devices.(Yao [¶0207] "datain_port_num. The maximum amount of data that can be transferred by DMA each cycle." [¶0209] "dataout_port_num. The maximum amount of results that can be transferred by DMA each cycle." See also FIG. 3.).

Claims 6-10, 11-15, and 16-20 are substantially similar to claims 1-5; therefore, the rejections applied to claims 1-5 also apply to claims 6-10, 11-15, and 16-20.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Ishii (US20210133552A1) is directed towards a hierarchical mapping of neural networks.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SIDNEY VINCENT BOSTWICK whose telephone number is (571)272-4720.  The examiner can normally be reached on M-F 7:30am-5:00pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda Huang, can be reached at (571)270-7092.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/SB/Examiner, Art Unit 2124

/MIRANDA M HUANG/Supervisory Patent Examiner, Art Unit 2124