DETAILED ACTION
Status of Claims 
Claims 1, 3-11, and 13-26 have been considered. It is hereby acknowledged that the following papers have been received and placed of record in the file:
Applicant Remarks 						-Receipt Date 01/12/2021
Amended Claims 						-Receipt Date 01/12/2021

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 01/12/2021 has been entered.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 01/12/2021 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.

Response to Amendment
This office action is in response to the amendment filed on 01/12/2021. Claims 1, 3-11, and 13-26 are pending. Claims 1, 3, 10-11, 13-14, 17, 19, 21-22, and 25 are amended. Claim 2 is canceled. 

Response to Arguments
Applicant’s arguments with respect to claims 1, 22, and 25 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Claim Objections
Claim 1 is objected to because of the following informalities:
Claim 1, line 13: “a post processing unit” should be “a post-processing unit”.
Appropriate correction is required.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1, 3-11, 13-21, and 25-26 are rejected under 35 U.S.C. 103 as being unpatentable over Phelps et al. US 2018/0336164 (hereinafter, Phelps) in view of Lacy et al. US 2018/0260220 (hereinafter, Lacy) and Minoya et al. US 2015/0331832 (hereinafter, Minoya).
Regarding claim 1, Phelps teaches:
1. A microprocessor system, comprising: 
a computational array that includes a plurality of computation units, wherein the computation units are grouped into a plurality of lanes ([0057] and [0063]: the matrix multiply unit, i.e. a computational array, includes a plurality of multiply-add sub unit cells, i.e. computation units, which are grouped into columns/lanes); and 
a vector computational unit in communication with the computational array ([0054] and [0057]-[0058]: the vector processing unit 106 is in communication with the matrix multiply unit, see also Fig. 1B communication between 106 and 113), the vector computational unit comprising a plurality of processing elements ([0053]: the vector processing unit includes a plurality of lanes/processing elements), 
wherein the processing elements are arranged as a vector ([0053]: the lanes/processing elements of the vector processing unit are arranged as a vector) and configured to process the received output data elements in parallel to form a processing result ([0057]-[0058]: the 128 received output results from the MXU are processed by the vector lanes in parallel, i.e. to form a processing result).
	Phelps does not explicitly teach:
wherein the plurality of lanes are associated with respective first-in-first-out queues;
wherein each processing element from the plurality of processing elements is a configured to receive an output data element from an individual first-in-first-out queue associated with a respective lane of the plurality of lanes,
wherein the vector computational unit is configured to directly provide the processing result to a post processing unit in communication with the vector computational unit.
However, Lacy discloses further details regarding the vector processing unit disclosed in Phelps (see Abstract). In particular, Lacy teaches:
wherein the plurality of lanes are associated with respective first-in-first-out queues ([0065] and [0072]-[0073]: each of 128 VPU lanes sends 8 data words to the MXU 110 and MXU 110 shifts out 128 results, one for each lane each clock cycle, thus the MXU is grouped into lanes associated with a respective matrix result FIFO (MRF) 114, i.e. respective first-in-first-out queues);
wherein each processing element from the plurality of processing elements is a configured to receive an output data element from an individual first-in-first-out queue associated with a respective lane of the plurality of lanes ([0065] and [0072]-[0073]: each lane of the VPU receives output data elements from an individual MRF 114 associated with the respective lane from the MXU).
	It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the MXU of Phelps to receive output data elements from respective FIFO queues associated with each lane as further disclosed by Lacy. One of ordinary skill in the art would have been motivated to make this modification to exploit parallelism across VPU lanes (Lacy [0073]). 
	Further, in the analogous art of neural networks, Minoya teaches:
wherein an activation unit is configured to directly provide a processing result to a post processing unit in communication with the activation unit ([0031]-[0032] and Fig. 4: activation portion 103 performs a nonlinear function on outputs from 102 to form processing results and directly provides the processing result to a pooling portion/post processing unit in communication with the activation unit).
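	It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Phelps in view of Lacy to include a pooling portion/unit for performing pooling functions as taught by Minoya such that the vector computation unit of Phelps, which performs nonlinear functions (Phelps [0058]), would use the pooling unit to generate pooled values. One of ordinary skill in the art would have been motivated to make this modification because providing a separate unit for performing a function is a known technique on the known device of a computer processor for increasing processing resources and would yield the predictable result of speeding up processing by freeing up computation resources on the vector unit.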

	Regarding claim 3, Phelps in view of Lacy and Minoya teaches: 
3. The system of claim 1, wherein the processing elements process in parallel the received output data elements in response to a single processor instruction (Phelps [0053] and [0057]; Lacy [0072]-[0073]: the received outputs from the matrix multiply unit are processed in a SIMD manner, i.e. in parallel in response to a single instruction).

	Regarding claim 4, Phelps in view of Lacy and Minoya teaches: 
4. The system of claim 1, wherein the computational array includes a matrix processor (Phelps [0057]: the matrix multiply unit is a matrix processor). 

	Regarding claim 5, Phelps in view of Lacy and Minoya teaches:
5. The system of claim 1, wherein the computational array is configured to receive two vector input operands (Phelps [0066]: the matrix multiply unit MXU receives two vector input operands from a first and second source bus containing data sent from the vector processing unit). 
	
	Regarding claim 6, Phelps in view of Lacy and Minoya teaches: 
6. The system of claim 1, wherein each computation unit of the plurality of computation units includes an arithmetic logic unit, an accumulator, and a shadow register (Phelps [0063] and [0072]-[0073]: the MXU includes multiply-add sub unit cells, each of which uses a multiplier/ALU and adds the result to a partial sum to produce a new partial sum, i.e. by an adder/accumulator; the sub unit cells also include a weight matrix register, i.e. a shadow register).

Regarding claim 7, Phelps in view of Lacy and Minoya teaches: 
7. The system of claim 1, wherein each computation unit of the plurality of computation units is configured to perform a multiply operation and an add operation (Phelps [0063]: the multiply-add subunit cells each perform a multiply operation and an add operation). 

	Regarding claim 8, Phelps in view of Lacy and Minoya teaches: 
8. The system of claim 1, wherein each computation unit of the plurality of computation units is configured to perform a dot-product component operation (Phelps [0063]: the multiply-add is a dot-product component operation). 
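For illustration only, the relationship relied on in the mapping above (each multiply-add step being one component of a dot product) can be sketched as follows. The function names and example values are hypothetical and are not taken from Phelps or from the claims.

```python
def multiply_add(partial_sum, activation, weight):
    """One cell operation: multiply two inputs and add the product to the incoming partial sum."""
    return partial_sum + activation * weight


def dot_product(activations, weights):
    """Chain multiply-add steps; each step contributes one component of the dot product."""
    partial_sum = 0
    for activation, weight in zip(activations, weights):
        partial_sum = multiply_add(partial_sum, activation, weight)
    return partial_sum
```

Under these assumptions, calling dot_product([1, 2, 3], [4, 5, 6]) performs three multiply-add steps, one per pair of inputs.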

	Regarding claim 9, Phelps in view of Lacy and Minoya teaches: 
9. The system of claim 1, wherein each computation unit of the plurality of computation units is configured to compute a dot-product result component in parallel in response to a single computational array instruction (Phelps [0058] and [0063]: the matrix multiply unit computes a number of results, i.e. dot product result components, per cycle, i.e. in parallel; [0045] and [0049]: the operations performed by the matrix multiply unit are in response to an instruction specifying extended vector unit instructions, i.e. a single computational array instruction).

	Regarding claim 10, Phelps in view of Lacy and Minoya teaches:
10. The system of claim 1, wherein each processing element of the plurality of processing elements includes an arithmetic logic unit configured to perform arithmetic logic unit operations in parallel with other processing elements (Phelps [0053]-[0054] and [0058]: the vector unit lanes operate in parallel with each other and include arithmetic logic units). 

	Regarding claim 11, Phelps in view of Lacy and Minoya teaches:
11. The system of claim 1, wherein a notification signal identifies that output data elements from the computational array are ready for the vector computational unit (Phelps [0057]-[0058]: the results being stored in the vector registers is a notification signal that they are ready for the vector processing unit). 

	Regarding claim 13, Phelps in view of Lacy and Minoya teaches:
13. The system of claim 1, wherein the output data elements from the computational array correspond to dot-product results (Phelps [0057]-[0058]: the output results of the matrix multiply unit are matrix multiply, i.e. dot-product, results). 

	Regarding claim 14, Phelps in view of Lacy and Minoya teaches: 
14. The system of claim 1, wherein the output data elements from the computational array correspond to convolution results performed on image data (Phelps [0071]-[0074] and [0081]: the matrix multiply unit cells perform convolution, i.e. the output of the matrix multiply unit correspond to convolution results, and the image data is the data given to the matrix multiply unit to perform the convolution). 

	Regarding claim 15, Phelps in view of Lacy and Minoya teaches: 
15. The system of claim 3 wherein the single processor instruction is used to calculate a result of a non-linear function (Phelps [0045], [0053], and [0058]: the vector processing unit performs a non-linear function, i.e. calculates a result of a non-linear function, in response to the instruction). 

Regarding claim 16, Phelps in view of Lacy and Minoya teaches:
16. The system of claim 15, 
	Phelps in view of Lacy and Minoya, as currently mapped, does not explicitly teach:
wherein the non-linear function is a rectified linear unit function or a sigmoid function. 
	However, Minoya further teaches:
a non-linear function is a rectified linear unit function or a sigmoid function ([0031]: ReLU and sigmoid functions are well-known non-linear functions that may be used as the activation function).
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the non-linear function of Phelps to be a sigmoid function as taught by Minoya. One of ordinary skill in the art would have been motivated to make this modification because ReLU and sigmoid functions are well-known non-linear functions (Minoya [0031]) and would enable learning techniques.
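For reference, the two non-linear functions named in claim 16 have the following standard mathematical definitions. This is a minimal sketch; the function names are illustrative and do not reflect how Phelps or Minoya implement them.

```python
import math


def relu(x):
    """Rectified linear unit: passes positive inputs through and clamps negative inputs to zero."""
    return max(0.0, x)


def sigmoid(x):
    """Logistic sigmoid: maps any real input into the open interval (0, 1).

    Note: this simple form can overflow in math.exp for very large negative x.
    """
    return 1.0 / (1.0 + math.exp(-x))
```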

Regarding claim 17, Phelps in view of Lacy and Minoya teaches:
17. The system of claim 1, further comprising the post-processing unit (Minoya [0031]-[0032] and Fig. 4: pooling portion 104 is a post-processing unit).

Regarding claim 18, Phelps in view of Lacy and Minoya teaches:
18. The system of claim 17, wherein the post-processing unit is configured to perform a pooling function (Phelps [0058] and Minoya [0031]-[0032]: the vector processing unit of Phelps will use the pooling portion/post-processing unit of Minoya to perform its pooling function).

Regarding claim 19, Phelps in view of Lacy and Minoya teaches:
19. The system of claim 1, wherein the received output data elements from the computational array are stored in an accumulator (Phelps [0082]: the matrix multiply unit accumulates the final accumulated results in register accumulators at the bottom of the array before they are transferred to the vector unit). 

	Regarding claim 20, Phelps in view of Lacy and Minoya teaches:
20. The system of claim 19, wherein each processing element of the plurality of processing elements is configured to access a slice of the accumulator and a slice of one or more vector registers (Phelps [0058] and [0082]; Lacy [0057] and [0072]-[0073]: each lane of the VPU unit accesses respective accumulated results, i.e. a slice of the accumulator, and each lane accesses vector registers, i.e. a slice of one or more vector registers).

Regarding claim 21, Phelps in view of Lacy and Minoya teaches:
21. The system of claim 1, wherein the vector computational unit further includes a plurality of vector registers sized to fit the output data elements from the computational array (Phelps [0057]-[0058]: the vector processing unit 106 includes vector registers to store the output of the matrix multiply unit, i.e. the vector registers are sized to fit the output data elements). 


Regarding claim 25, Phelps teaches:
25. A method comprising:
receiving a single processor instruction for a vector computational unit ([0045]: instructions are encoded in the VLIW instruction slots for the vector processing unit, including a single processor instruction), wherein the vector computational unit is in communication with a computational array ([0054] and [0057]-[0058]: the vector processing unit 106 is in communication with the matrix multiply unit, see also Fig. 1B communication between 106 and 113) and includes a plurality of processing elements arranged as a vector ([0053]: the lanes/processing elements of the vector processing unit are arranged as a vector), the processing elements configured to receive output data elements from respective lanes of a plurality of lanes of the computational array ([0053] and [0057]-[0058]: the vector processing unit receives results from respective cells/lanes in the matrix unit that produce those results via a FIFO);
receiving the output data elements from the computational array ([0053] and [0057]: the vector processing unit receives data from the matrix multiply unit), wherein the computational array includes a plurality of computation units, wherein the computation units are grouped into the plurality of lanes ([0057] and [0063]: the matrix multiply unit, i.e. a computational array, includes a plurality of multiply-add sub unit cells, i.e. computation units, which are grouped into columns/lanes); and 
processing in parallel the received output data elements in response to the single processor instruction ([0053] and [0057]: the received outputs from the matrix multiply unit are processed in a SIMD manner, i.e. in response to a single instruction) to form a processing result ([0053] and [0057]-[0058]: the 128 received output results from the MXU are processed by the vector lanes in parallel, i.e. to form a processing result).

	Phelps does not explicitly teach:
wherein each processing element is configured to receive an output data element from an individual first-in-first-out queue associated with a respective lane of the plurality of lanes;
wherein the vector computational unit is configured to directly provide the processing result to a post processing unit in communication with the vector computational unit.
However, Lacy discloses further details regarding the vector processing unit disclosed in Phelps (see Abstract). In particular, Lacy teaches:
wherein each processing element is configured to receive an output data element from an individual first-in-first-out queue associated with a respective lane of the plurality of lanes ([0065] and [0072]-[0073]: each of 128 VPU lanes sends 8 data words to the MXU 110 and MXU 110 shifts out 128 results, one for each lane each clock cycle, thus the MXU is grouped into lanes associated with a respective matrix result FIFO (MRF) 114, i.e. respective first-in-first-out queues; each lane of the VPU receives output data elements from an individual MRF 114 associated with the respective lane from the MXU).
	It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the MXU of Phelps to receive output data elements from respective FIFO queues associated with each lane as further disclosed by Lacy. One of ordinary skill in the art would have been motivated to make this modification to exploit parallelism across VPU lanes (Lacy [0073]). 
	Further, in the analogous art of neural networks, Minoya teaches:
wherein an activation unit is configured to directly provide a processing result to a post processing unit in communication with the activation unit ([0031]-[0032] and Fig. 4: activation portion 103 performs a nonlinear function on outputs from 102 to form processing results and directly provides the processing result to a pooling portion/post processing unit in communication with the activation unit).


Regarding claim 26, Phelps in view of Lacy and Minoya teaches:
26. The system of claim 1, wherein the computation units are arranged as an MxN matrix (Phelps [0063]: the matrix unit is arranged in rows and columns, i.e. the computation units of the matrix unit are arranged as an MxN matrix), and wherein processing elements are arranged as a 1xN vector (Phelps [0053]: the vector processing unit is arranged as a 1xN vector of lanes).

Claims 22-24 are rejected under 35 U.S.C. 103 as being unpatentable over Phelps et al. US 2018/0336164 (hereinafter, Phelps) in view of Minoya et al. US 2015/0331832 (hereinafter, Minoya).
	Regarding claim 22, Phelps teaches: 
22. A microprocessor system, comprising: 
a computational array that includes a plurality of computation units, wherein the computation units are grouped into a plurality of lanes ([0057] and [0063]: the matrix multiply unit, i.e. a computational array, includes a plurality of multiply-add sub unit cells, i.e. computation units, which are grouped into columns/lanes), and wherein each computation unit of the plurality of computation units is configured to perform a dot-product component operation in response to a single computational array instruction ([0058] and [0063]: the matrix multiply unit computes a number of results, i.e. dot product result components, per cycle; [0045] and [0049]: the operations performed by the matrix multiply unit are in response to an instruction specifying extended vector unit instructions, i.e. a single computational array instruction); and
a vector computational unit in communication with the computational array ([0054] and [0057]-[0058]: the vector processing unit 106 is in communication with the matrix multiply unit, see also Fig. 1B communication between 106 and 113), wherein the vector computational unit includes a plurality of processing elements ([0053]: the vector processing unit includes a plurality of lanes/processing elements), wherein the processing elements are configured to receive output data elements from respective lanes ([0053] and [0057]-[0058]: the outputs from the matrix multiply unit lanes are received by lanes of the vector processing unit) and process the received output data elements in response to a single vector computational unit instruction ([0053] and [0057]: the received outputs from the matrix multiply unit are processed in a SIMD manner, i.e. in parallel in response to a single instruction),
wherein the processing elements are arranged as a vector ([0053]: the lanes/processing elements of the vector processing unit are arranged as a vector) and configured to process the received output data elements in parallel to form a processing result ([0057]-[0058]: the 128 received output results from the MXU are processed by the vector lanes in parallel, i.e. to form a processing result).
	Phelps does not explicitly teach:
wherein the vector computational unit is configured to directly provide the processing result to a post processing unit in communication with the vector computational unit.

	However, in the analogous art of neural networks, Minoya teaches:
wherein an activation unit is configured to directly provide a processing result to a post processing unit in communication with the activation unit ([0031]-[0032] and Fig. 4: activation portion 103 performs a nonlinear function on outputs from 102 to form processing results and directly provides the processing result to a pooling portion/post processing unit in communication with the activation unit).
	It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Phelps to include a pooling portion/unit for performing pooling functions as taught by Minoya such that the vector computation unit of Phelps, which performs nonlinear functions (Phelps [0058]) would use the pooling unit to generate pooled values. One of ordinary skill in the art would have been motivated to make this modification because providing a separate unit for performing a function is a known technique on the known device of a computer processor for increasing processing resources and would yield the predictable result of speeding up processing by freeing up computation resources on the vector unit. 
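For illustration only, the data flow of the proposed combination (matrix unit lanes feeding vector lanes that apply a non-linear function, whose results pass directly to a separate pooling unit) can be sketched as follows. All names are hypothetical. ReLU and non-overlapping max pooling are assumptions chosen for concreteness, and the list comprehension merely models lanes that would operate in parallel in hardware.

```python
def matrix_unit(lane_weights, inputs):
    """Each lane (column of weights) produces one dot-product result."""
    return [sum(w * x for w, x in zip(column, inputs)) for column in lane_weights]


def vector_unit(lane_outputs):
    """Apply the same non-linear operation (ReLU, assumed) to every lane's output."""
    return [max(0, value) for value in lane_outputs]


def pooling_unit(values, window=2):
    """Separate post-processing stage: max pooling over non-overlapping windows (assumed)."""
    return [max(values[i:i + window]) for i in range(0, len(values), window)]


def pipeline(lane_weights, inputs, window=2):
    """Vector unit results are handed directly to the pooling unit."""
    return pooling_unit(vector_unit(matrix_unit(lane_weights, inputs)), window)
```

With four hypothetical lanes [[1, 0], [0, 1], [-1, -1], [2, 2]] and inputs [3, 4], the lane outputs are 3, 4, -7, and 14; the vector stage clamps -7 to 0, and the pooling stage reduces each pair of lanes to its maximum.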

Regarding claim 23, Phelps in view of Minoya teaches:
23. The system of claim 22, further comprising: 
a control unit configured to provide the single computational array instruction to the computational array (Phelps [0049]: instructions that specify extended vector unit operations for the matrix multiply unit are provided, i.e. by a control unit, to the matrix multiply unit) and the single vector computational unit instruction to the vector computational unit (Phelps [0045]: instructions are encoded in the VLIW instruction slots for the vector processing unit and the unit providing the instruction is a control unit). 

	Regarding claim 24, Phelps in view of Minoya teaches: 
24. The system of claim 23, wherein the control unit synchronizes the output data elements transferred from the computational array to the processing elements of the vector computational unit (Phelps [0057]: the multiply result FIFO synchronizes the output of the matrix multiply unit to the vector processing unit and is part of the control unit since it is used to control execution on the vector processing unit). 

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to KASIM ALLI whose telephone number is (571)270-1476.  The examiner can normally be reached on Monday - Friday, 9am - 5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.  
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Aimee Li, can be reached at (571)272-4169.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.






/KASIM ALLI/Examiner, Art Unit 2183