Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Reasons for Allowance
The following is an examiner’s statement of reasons for allowance: Claim 1 requires, among other things: 1. An integrated circuit device, comprising: a state buffer operable to receive data elements of a tensor; a results buffer; and a processing element array, wherein the integrated circuit device is configured to execute a set of compiler-generated instructions to transpose the tensor, and the set of compiler-generated instructions is operable to: map a block of data elements of the tensor to a number of row partitions of the state buffer,… the results buffer having a same number of column partitions as columns in the processing element array, wherein the summed multiplication products for subsequent multiplication operations are stored in subsequent rows for each corresponding column partition, and wherein the results buffer has a same number of rows as the processing element array; and load the summed multiplication products stored in the column partitions in the results buffer to a corresponding number of row partitions in the state buffer.
The closest prior art includes Schreiber (patent No. 6,438,747) and Young (patent application publication No. 2018/0260690). Young taught a system including an input memory (310), a matrix computation unit (312), and a vector computation unit (314) (e.g., see fig. 3), with the processor configured as an array of cells (e.g., see fig. 4) (activation register and weight register, which store and provide input data).
Claim 5 recites similar limitations.
Claim 19 requires, among other things: 19. A computer-implemented method,… and generating, by the compiler, a set of instructions operable to transpose the tensor by loading data elements of the tensor from a state buffer of the execution engine into a systolic array of the execution engine, performing an identity multiplication on the data elements in the systolic array, storing a result of the identity multiplication in a results buffer of the execution engine, and loading the result from the results buffer into the state buffer.
Schreiber and Young did not teach these limitations.  
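For illustration only, the transposition recited in claim 19 can be modeled in software. The following Python sketch is not taken from the application or from the cited art. It assumes that the data elements are held in the array as stationary values, that rows of an identity matrix are streamed through the array one per step, and that each array column sums its products. The function name transpose_via_identity and all variable names are hypothetical.

```python
def transpose_via_identity(block):
    """Software model of a transpose performed by an identity multiplication.

    Assumption (not from the record): the data elements sit in the array as
    stationary values and rows of an identity matrix are streamed through it.
    """
    rows, cols = len(block), len(block[0])

    # State buffer: row partition r holds row r of the block.
    state_buffer = [list(row) for row in block]

    # Load the data elements into the array: element (r, c) holds block[r][c].
    pe_array = [list(row) for row in state_buffer]

    # Results buffer: one column partition per array column, one row per step.
    results_buffer = [[0] * cols for _ in range(rows)]

    for step in range(rows):
        # Row `step` of the identity matrix is fed to the array rows.
        identity_row = [1 if r == step else 0 for r in range(rows)]
        for c in range(cols):
            # Each column sums its multiplication products over all rows.
            results_buffer[step][c] = sum(
                identity_row[r] * pe_array[r][c] for r in range(rows)
            )

    # Load each column partition of the results buffer into the
    # corresponding row partition of the state buffer.
    return [[results_buffer[step][c] for step in range(rows)] for c in range(cols)]


if __name__ == "__main__":
    print(transpose_via_identity([[1, 2, 3], [4, 5, 6]]))
```

In this model the multiplication leaves the values unchanged; the transposition comes from writing results by column partition and reading them back by row partition.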
Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”
	Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
	Greenberg (patent No. 6,675,187) disclosed a pipelined linear array of processing elements performing matrix computations (e.g., see abstract).
	Kurak (patent No. 6,754,687) disclosed a system for cosine transform (e.g., see abstract).
	Rao (patent No. 8,473,430) disclosed a decoder and triangular matrix (e.g., see abstract).
	Zeda (patent No. 11,036,827) disclosed a software defined buffer/transposer for general matrix multiplication in a programmable IC (e.g., see abstract).
	Gustavson (patent application publication No. 2009/0063529) disclosed a system for in-place transformation of standard full and packed matrix data formats (e.g., see abstract).

	Meeker (patent application publication No. 2013/0103925) disclosed a system for folding a SIMD array (e.g., see abstract).
	Zhang (patent application publication No. 2018/03114671) disclosed a system for systolic array design from a high-level program (e.g., see abstract).
	Nishikawa (patent No. 5,842,035) disclosed a parallel computer utilizing less memory having first and second memory arrays (e.g., see abstract).
	
	Baker, K., Singular Value Decomposition Tutorial, kbaker@ling.osu.edu, March 19, 2005, pp. 1-24. (Year: 2005).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ERIC COLEMAN whose telephone number is (571)272-4163. The examiner can normally be reached M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jyoti Mehta, can be reached on 0-3995. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is 

ERIC COLEMAN
Primary Examiner
Art Unit 2183



/ERIC COLEMAN/Primary Examiner, Art Unit 2183