DETAILED CORRESPONDENCE
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. This non-final Office action is in response to the patent application filed on 19 December 2018. Claims 1-20 are pending and considered below.

Priority
	Applicant’s claim for the benefit of a prior-filed application under 35 U.S.C. 119(e) or under 35 U.S.C. 120, 121, or 365(c) is acknowledged. Applicant’s claim of priority to provisional application 62/628130 filed 8 February 2018, provisional application 62/644352 filed 16 March 2018, and provisional application 62/675076 filed 22 May 2018 is recognized. Therefore, the application is afforded a priority date of 8 February 2018.

Claim Rejections - 35 USC § 112
Claims 1-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
a.	Claims 1 and 11 each include the limitation “a plurality of second processing units/elements (PEs) coupled to the crossbar and wherein each PE is configured to perform a sparse and/or irregular computation task of the ML operation on the data within the OCMs and/or from the POD units,” which is not clearly defined with respect to the written description. When read in the context of at least paragraph [40] and Figure 4 of the written description, the term “second processing units/elements” does not accurately reflect the written description’s definition of the second processing element as a “processing engine/element (PE), e.g., 430 (or 432, 434, 436).” As claimed, the PE is difficult to distinguish from the claimed POD (i.e., the first processing unit), and the Examiner recommends amending the claims to clearly differentiate the first and second processing elements from each other.
b.	Claim 11 includes the limitation “maintaining and outputting result of the ML operation performed by a processing tile that comprises at least a PE, a POD, and an OCM as an output data stream from the OCM,” which is unclear as to what is being claimed. The phrase “an OCM as an output data stream from the OCM” is circular. The OCM is a hardware element (an on-chip memory), yet the limitation recites an OCM (i.e., a hardware element) as an output data stream from the OCM (i.e., a hardware element). Confusion therefore results from a hardware element being claimed as an output data stream originating from an OCM hardware element that is not clearly identified.

Examiner Note
Although the claims recite an inference engine as performing the claimed steps, the structural elements of the inference engine are further defined within the claims. Accordingly, the claims are not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1-20 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Nurvitadhi et al. (20180315158).

Claims 1 and 11:	Nurvitadhi discloses a crossbar-based inference engine and method configured to perform a machine learning (ML) operation on an input data stream ([53, 60, 62, 142 “machine learning application 602 can include training and inference functionality for a neural network,” 143, 144, Fig. 2A]), comprising:
a plurality of on-chip memories (OCMs) coupled to a crossbar and each OCM is configured to load and maintain data from the input data stream for local access by components in the inference engine ([52, 53 “Within the parallel processing unit 202, the I/O unit 204 connects with a host interface 206 and a memory crossbar 216, where the host interface 206 receives commands directed to performing processing operations and the memory crossbar 216 receives commands directed to performing memory operations,” 55, 60, Fig. 2A (i.e. elements 224, 216, 212)]); 
maintain and output result of the ML operation performed by the components in the inference engine as an output data stream ([62 “clusters 214A-214N of the processing cluster array 212 can process data that will be written to any of the memory units 224A-224N within parallel processor memory 222. The memory crossbar 216 can be configured to transfer the output of each cluster 214A-214N to any partition unit 220A-220N or to another cluster 214A-214N, which can perform additional processing operations. … separate traffic streams between the clusters 214A-214N and the partition units,”]);
a plurality of first processing units (PODs) each coupled to one OCM of the plurality of OCMs and configured to perform a dense and/or regular computation task of the ML operation on the data within the corresponding OCM ([60 “Each of the one or more instances of the parallel processing unit 202 can couple with parallel processor memory 222. The parallel processor memory 222 can be accessed via the memory crossbar 216, which can receive memory requests from the processing cluster array 212 as well as the I/O unit 204. The memory crossbar 216 can access the parallel processor memory 222 via a memory interface 218. The memory interface 218 can include multiple partition units (e.g., partition unit 220A, partition unit 220B, through partition unit 220N) that can each couple to a portion (e.g., memory unit) of parallel processor memory,” 232-234, 235 “input buffer and unpack unit 2111A-2111N supports dense matrix format, compressed sparse matrix formats, as well as further sparse matrix format optimizations,” 236 “operation in both row-oriented and column-oriented formats for any combination of a sparse or dense matrix and a sparse or dense vector (e.g., sparse matrix, sparse vector; sparse matrix, dense vector; dense matrix, sparse vector; dense matrix, dense vector,” 237]); 
a plurality of second processing units/elements (PEs) coupled to the crossbar and wherein each PE is configured to perform a sparse and/or irregular computation task of the ML operation on the data within the OCMs and/or from the POD units ([60 “Each of the one or more instances of the parallel processing unit 202 can couple with parallel processor memory 222. The parallel processor memory 222 can be accessed via the memory crossbar 216, which can receive memory requests from the processing cluster array 212 as well as the I/O unit 204. The memory crossbar 216 can access the parallel processor memory 222 via a memory interface 218. The memory interface 218 can include multiple partition units (e.g., partition unit 220A, partition unit 220B, through partition unit 220N) that can each couple to a portion (e.g., memory unit) of parallel processor memory,” 232-234, 235 “input buffer and unpack unit 2111A-2111N supports dense matrix format, compressed sparse matrix formats, as well as further sparse matrix format optimizations,” 236 “operation in both row-oriented and column-oriented formats for any combination of a sparse or dense matrix and a sparse or dense vector (e.g., sparse matrix, sparse vector; sparse matrix, dense vector; dense matrix, sparse vector; dense matrix, dense vector,” 237]); and 
said crossbar configured to connect the plurality of PEs to the plurality of OCMs to enable each PE of the plurality of PEs to read data from and/or write data to the corresponding OCM ([60, 61, 62, 64 “L2 cache 221 is a read/write cache that is configured to perform load and store operations received from the memory crossbar 216,” Fig. 2A]);
maintaining and outputting result of the ML operation performed by a processing tile that comprises at least a PE, a POD, and an OCM as an output data stream from the OCM ([60, 61, 62, 64 “L2 cache 221 is a read/write cache that is configured to perform load and store operations received from the memory crossbar 216,” Fig. 2A]). 

Claims 2 and 12:	Nurvitadhi discloses the system and method as for Claims 1 and 11 above, and Nurvitadhi further discloses: 
a plurality of OCM streamers each configured to stream data between each OCM and its corresponding POD ([62 “memory crossbar 216 can use virtual channels to separate traffic streams between the clusters 214A-214N and the partition units 220A-220N,” 81 “configured as a streaming multiprocessor (SM) capable of simultaneous execution of a large number of execution threads,”]).

Claims 3 and 13:	Nurvitadhi discloses the system and method as for Claims 1 and 11 above, and Nurvitadhi further discloses: 
the input data stream includes data to be analyzed and inferred by the inference engine and/or training data used to train the inference engine for the ML operation, wherein the training data includes a polynomial with their respective weights ([69 “execution logic supports a variety of operations including integer and floating point arithmetic, comparison operations, Boolean operations, bit-shifting, and computation of various algebraic functions,” 163, 164, 166 “perform statistical language modeling to predict an upcoming word given a previous sequence of words. The illustrated RNN 1000 can be described has having an input layer 1002 that receives an input vector, hidden layers 1004 to implement a recurrent function, a feedback mechanism 1005 to enable a ‘memory’ of previous states, and an output layer 1006 to output a result,” 167 “trained layer-by-layer using greedy unsupervised learning. The learned weights of the DBN can then be used to provide pre-train neural networks by determining an optimal initial set of weights for the neural network,” 169, 170]).

Claims 4 and 14:	Nurvitadhi discloses the system and method as for Claims 1 and 11 above, and Nurvitadhi further discloses:
each PE of the plurality of PEs is configured to receive and execute a set of programming instructions directly from a core, wherein the core is configured to coordinate and program the inference engine to perform the ML operation ([43, 46 “parallel or vector processing system that can include a large number of processing cores and/or processing clusters, such as a many integrated core (MIC) processor,” 63 “different instances of the parallel processing unit 202 can be configured to inter-operate even if the different instances have different numbers of processing cores, different amounts of local parallel processor memory, and/or other configuration differences,”]).

Claims 5 and 15:	Nurvitadhi discloses the system and method as for Claims 4 and 14 above, and Nurvitadhi further discloses: 
the plurality of OCMs and/or the plurality of PODs are configured to receive and execute a set of programming instructions from the core via the crossbar ([52-62, 63 “different instances of the parallel processing unit 202 can be configured to inter-operate even if the different instances have different numbers of processing cores, different amounts of local parallel processor memory, and/or other configuration differences,”]).  

Claims 6 and 16:	Nurvitadhi discloses the system and method as for Claims 1 and 11 above, and Nurvitadhi further discloses:
each POD of the plurality of PODs is configured to perform a matrix multiplication operation on the data in its corresponding OCM ([159 “activations within the fully connected layers 908 can be computed using matrix multiplication,”]).  

Claims 7 and 17:	Nurvitadhi discloses the system and method as for Claims 6 and 16 above, and Nurvitadhi further discloses: 
each POD is configured to perform one or more post matrix multiplication operation on the POD output ([159 “activations within the fully connected layers 908 can be computed using matrix multiplication,” 195, 236 “sparse compute accelerator architecture 2100 is generally intended to operate on large matrix data, where performance is typically limited by the memory bandwidth available to access such data. Accordingly, the accelerator architecture has been designed to scale and take the most advantage of all available memory bandwidth,”]).  

Claims 8 and 18:	Nurvitadhi discloses the system and method as for Claims 1 and 11 above, and Nurvitadhi further discloses: 
the crossbar is configured to accept one read request or one write request per PE to read data from and write data to one OCM of the plurality of OCMs, respectively ([60 “parallel processor memory 222 can be accessed via the memory crossbar 216, which can receive memory requests from the processing cluster array 212 as well as the I/O unit 204,” 61, 62 “memory crossbar 216 can be configured to transfer the output of each cluster 214A-214N to any partition unit 220A-220N or to another cluster 214A-214N, which can perform additional processing operations on the output. Each cluster 214A-214N can communicate with the memory interface 218 through the memory crossbar 216 to read from or write to various external memory devices,” 64]).  

Claims 9 and 19:	Nurvitadhi discloses the system and method as for Claims 8 and 18 above, and Nurvitadhi further discloses: 
the crossbar is configured to route the read or the write request through the plurality of OCMs in the inference engine until the request reaches the OCM associated with the request ([60 “parallel processor memory 222 can be accessed via the memory crossbar 216, which can receive memory requests from the processing cluster array 212 as well as the I/O unit 204,” 61, 62 “memory crossbar 216 can be configured to transfer the output of each cluster 214A-214N to any partition unit 220A-220N or to another cluster 214A-214N, which can perform additional processing operations on the output. Each cluster 214A-214N can communicate with the memory interface 218 through the memory crossbar 216 to read from or write to various external memory devices,” 64]).  

Claims 10 and 20:	Nurvitadhi discloses the system and method as for Claims 8 and 18 above, and Nurvitadhi further discloses:
the crossbar is configured to merge a plurality of read and/or write requests to a same address in the same OCM ([64 “read/write cache that is configured to perform load and store operations received from the memory crossbar,” 72 “processing cluster 214 may include an MMU 245 (memory management unit) that is configured to map virtual addresses into physical addresses. In other embodiments, one or more instances of the MMU 245 may reside within the memory interface 218 of FIG. 2A. The MMU 245 includes a set of page table entries (PTEs) used to map a virtual address to a physical address of a tile and optionally a cache line index,” 76]).  

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure.
Please see the attached Notice of References Cited (form PTO-892).
See Barik et al. (20180307980)
See Molchanov et al. (20180114114)
Any inquiry concerning this communication or earlier communications from the examiner should be directed to David Stoltenberg whose telephone number is (571) 270-3472. 
The examiner can normally be reached on Monday-Friday 8:30AM to 5:00PM EST. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Waseem Ashraf, can be reached on (571) 270-3948. The fax phone number for the organization where this application or proceeding is assigned is (571) 273-8300, or the examiner’s direct fax phone number is (571) 270-4472.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool.  To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.  
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center at (866) 217-9197 (toll free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call (800) 786-9199 (IN USA OR CANADA) or (571) 272-1000.

/DAVID J STOLTENBERG/Primary Examiner, Art Unit 3682