DETAILED ACTION
Status of Claims 
Claims 1-3, 5-7, 10-17, and 20 have been considered. It is hereby acknowledged that the following papers have been received and placed of record in the file:
Applicant Remarks - Receipt Date 03/19/2021
Amended Claims - Receipt Date 03/19/2021

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 03/19/2021 has been entered.

Information Disclosure Statement
The information disclosure statements (IDS) submitted on 03/22/2021, 04/29/2021, 06/17/2021, and 07/16/2021 are in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statements are being considered by the examiner.

Response to Amendment
This office action is in response to the amendment filed on 03/19/2021. Claims 1-3, 5-7, 10-17, and 20 are pending. Claims 1, 5-7, 15-16, and 20 are amended. Claims 4, 8-9, and 18-19 are canceled. 

Response to Arguments
Applicant’s arguments, see Remarks pages 7-9, filed 03/19/2021, with respect to the rejection(s) of claim(s) 1 and 20 under 35 U.S.C. 102 have been fully considered and are persuasive.  Therefore, the rejection has been withdrawn.  However, upon further consideration, a new ground(s) of rejection is made over Phelps et al. US 2018/0336164 (hereinafter, Phelps) in view of Lacy et al. US 2018/0260220 (hereinafter, Lacy), Barry et al. US 6,446,190 (hereinafter, Barry), and Sugimoto US 2002/0169942.

Claim Objections
Claims 1-2, 5-7, 10, 12, 15-17, and 20 are objected to because of the following informalities:  
Claim 1 line 9- “the single processor instruction specifies at least three differenta plurality of component instructions” should be “the single processor instruction specifies at least three different component instructions”
Claim 1 line 20- “the different component instructions” should be “different component instructions”
Claim 1 line 23- “a first of the component instruction” should be “a first component instruction”
Claim 1 line 23- “the first processor instruction” should be “a first processor instruction”
Claim 1 line 26- “the first of the component instructions” should be “the first component instruction”
Claim 1 line 26- “second of the component instructions” should be “the second component instruction”
Claim 1 line 27- “third of the component instructions” should be “the third component instruction”
Claim 2- “the component instructions” should be “the at least three different component instructions”
Claim 5- “the three component instructions” should be “the at least three different component instructions”
Claims 6 and 16- “the first processor instruction” should be “a first processor instruction” or “a first processor instruction” should be introduced earlier
Claim 7- “the different component instructions” should be “the at least three different component instructions”
Claims 10, 12, 15, and 17- “the plurality of component instructions” should be “the at least three different component instructions”
Claim 20- “the single processor instruction” should be “a single processor instruction”
Claim 20- “and third of the different component instructions” should be “and the third of the different component instructions” 
Appropriate correction is required.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-3, 5-7, 10-17, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Phelps et al. US 2018/0336164 (hereinafter, Phelps) in view of Lacy et al. US 2018/0260220 (hereinafter, Lacy), Barry et al. US 6,446,190 (hereinafter, Barry), and Sugimoto US 2002/0169942.
Regarding claim 1, Phelps teaches:
1. A microprocessor system, comprising: 
a vector computational unit that includes a plurality of processing elements arranged as a vector ([0053]-[0054]: vector processing unit 106 includes a 2D array of vector processing units in which a plurality of processing elements in a first dimension are arranged as a vector along a second dimension), wherein each processing element is configured to receive data elements from a lane of a plurality of lanes of a computational array ([0057]-[0058]: the vector processing unit, i.e. each of its processing elements, receives outputs of the matrix multiply unit/computational array, where the matrix multiply unit is divided into cells that data moves across, i.e. in lanes of a plurality of lanes of the matrix multiply unit); and 
a control unit configured to provide at least a single processor instruction to the vector computational unit ([0061]: instruction decode and issue 102 is a control unit that provides VLIW instructions to the vector unit), the control unit configured to synchronize receipt of the data elements from the plurality of lanes to respective processing elements ([0057] and [0060]: the instruction issue unit 102/control unit issues an instruction that causes the vector processing unit, i.e. respective processing elements, to grab/receive result elements that reside in the multiply result FIFO, i.e. data elements from the plurality of lanes, in one clock cycle; that is, an instruction from a control unit synchronizes, i.e. coordinates/causes to occur at the same time, receipt of the data elements that come from the lanes of the matrix unit and go to the vector processing unit);
wherein the single processor instruction specifies at least three different component instructions to be executed by the vector computational unit in response to the single processor instruction ([0045]: the VLIW instruction includes slots to specify instructions to be executed by the vector processing unit; [0054]: each vector unit of the vector processing unit executes two ALU instructions, one load, and one store instruction, i.e. the VLIW instruction specifies a load, store, and ALU instruction to be executed by the vector processing unit) and each of the plurality of processing elements of the vector computational unit is configured to process the received data elements in parallel with other processing elements in response to the single processor instruction ([0053] and [0058]: the processing elements of the vector unit execute in a SIMD manner to process the received data elements in parallel with other processing elements), 
	Phelps does not explicitly teach: 
		wherein each processing element comprises an arithmetic logic unit (ALU)
wherein the at least three different component instructions utilize different hardware resources of each of the processing elements included in the vector computational unit, the hardware resources of each of the processing elements comprising, at least, the ALU, 
wherein for a particular clock cycle, the processing elements are configured to execute the different component instructions of different single processor instructions, wherein the different single processor instructions are executed using staggered starts by the vector computational unit, 
wherein a first of the component instructions is specified by the first processor instruction, a second component instruction is specified by a second processor instruction, and a third component instruction is specified by a third processor instruction, and 
wherein the first of the component instructions, second of the component instructions, and third of the component instructions, utilize different hardware resources of the vector computational unit during the particular clock cycle.
	However, Lacy teaches further details regarding the vector processing unit of Phelps (see Abstract). In particular, Lacy teaches:
wherein each processing element comprises an arithmetic logic unit (ALU) ([0045]: each sublane/processing element includes an ALU)
	It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the vector units of Phelps to each include an ALU as taught by Lacy. One of ordinary skill in the art would have been motivated to make this modification because using an ALU in each vector unit is a known technique on the known device of a computer processor for vectorizing ALU operations and would yield the predictable result of speeding ALU operations. 
	Further, Barry teaches:
different hardware resources of each of the processing elements (col 7 lines 53-55 and Fig. 1A: each processing element includes an ALU, load unit, and store unit)
	It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the processing elements of Phelps in view of Lacy to further include a load unit and store unit as taught by Barry. This combination would teach:
wherein the at least three different component instructions utilize different hardware resources of each of the processing elements included in the vector computational unit (each processing element of Phelps in view of Lacy and Barry would include an ALU, load unit, and store unit for the ALU instruction, load instruction, and store instruction of Phelps to utilize), the hardware resources of each of the processing elements comprising, at least, the ALU (Lacy teaches that each vector unit in the vector processing unit includes an ALU), 
One of ordinary skill in the art would have been motivated to make this modification because providing dedicated execution units is a known technique on the known device of a computer processor for executing specific instructions and would yield the predictable result of speeding up processing time by increasing processing resources.
	Furthermore, in the analogous art of VLIW processors, Sugimoto teaches:
wherein for a particular clock cycle, the processing elements are configured to execute the different component instructions of different single processor instructions, wherein the different single processor instructions are executed using staggered starts ([0045] and Fig. 2: in cycle T5, different component instructions INT1, MUL, and LD of different VLIW instructions 1-3 are executed one step at a time/using staggered starts), 
wherein a first of the component instructions is specified by the first processor instruction, a second component instruction is specified by a second processor instruction, and a third component instruction is specified by a third processor instruction ([0045] and Fig. 2: INT1 is specified by instruction 1, MUL is specified by instruction 2, and LD is specified by instruction 3), and 
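wherein the first of the component instructions, second of the component instructions, and third of the component instructions, utilize different hardware resources during the particular clock cycle ([0045] and Fig. 2: INT1, MUL, and LD utilize execution pipelines/hardware resources 33, 32, and 31 respectively during the T5 cycle).
	It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the vector processing unit of Phelps in view of Lacy and Barry to execute the different component instructions using staggered starts as taught by Sugimoto. One of ordinary skill in the art would have been motivated to make this modification to reduce data hazards among the VLIW instructions and enhance program processing performance (Sugimoto [0045]).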
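For illustration only, the staggered-start limitation discussed above can be sketched as a toy schedule. This is not taken from Phelps, Lacy, Barry, or Sugimoto; the resource names, the three-component layout, and the one-cycle offset between processor instructions are all assumptions chosen to make the claim language concrete.

```python
# Toy model of staggered starts. All names and the three-component layout
# are illustrative assumptions, not details from any cited reference.
RESOURCES = ("load_unit", "alu", "store_unit")  # hypothetical hardware resources


def schedule(num_instructions):
    """Map each clock cycle to the (instruction, resource) pairs active in it.

    Processor instruction i starts at cycle i, and its k-th component
    instruction occupies RESOURCES[k] during cycle i + k.
    """
    timeline = {}
    for instr in range(num_instructions):
        for stage, resource in enumerate(RESOURCES):
            timeline.setdefault(instr + stage, []).append((instr, resource))
    return timeline


timeline = schedule(3)
for cycle in sorted(timeline):
    print(cycle, timeline[cycle])
```

Under these assumptions, cycle 2 is the "particular clock cycle": component instructions from three different processor instructions are active at once, each on a different hardware resource.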


	Regarding claim 2, Phelps in view of Lacy, Barry, and Sugimoto teaches:
2. The system of claim 1, wherein the component instructions include an encoded memory access operation component instruction and an encoded arithmetic logic unit operation component instruction (Phelps [0054]: the component instructions executed by each vector unit include a load, i.e. an encoded memory access operation component instruction, and an ALU instruction, i.e. an encoded arithmetic logic unit operation component instruction).

	Regarding claim 3, Phelps in view of Lacy, Barry, and Sugimoto teaches:
3. The system of claim 2, wherein the encoded memory access operation component instruction is an encoded load operation component instruction (Phelps [0054]: the component instructions executed by each vector unit include a load, i.e. an encoded load operation component instruction) or an encoded store operation component instruction.

	Regarding claim 5, Phelps in view of Lacy, Barry, and Sugimoto teaches:
5. The system of claim 1, wherein the three component instructions include an encoded load operation component instruction, an encoded arithmetic logic unit operation component instruction, and an encoded store operation component instruction (Phelps [0054]: the component instructions executed by each vector unit include a load, ALU, and store instruction; [0045]: the VLIW instruction specifies component instructions for the vector processing unit, i.e. the three instructions executed by each processing unit are component instructions).

Regarding claim 6, Phelps in view of Lacy, Barry, and Sugimoto teaches:
6. The system of claim 5, wherein for the particular clock cycle of the vector computational unit, a load operation associated with the first processor instruction, an arithmetic logic unit operation associated with the second processor instruction executed in parallel (Sugimoto [0045] and Fig. 2: in cycle T5 a load operation is executed in parallel with an INT/ALU operation each associated with different instructions)
	Phelps in view of Lacy, Barry, and Sugimoto does not explicitly teach:
the load, ALU, and a store operation associated with the third processor instruction, are executed in parallel.
However, Phelps further teaches:
a store operation ([0054]: the vector units may execute a store instruction) associated with the third processor instruction ([0045]: the store instruction may be a component instruction of a VLIW instruction)
	It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to further modify Phelps in view of Lacy, Barry, and Sugimoto to execute the store operation associated with a third processor instruction of Phelps in parallel with the load and ALU operations by using staggered starts as taught by Sugimoto. One of ordinary skill in the art would have been motivated to make this modification to reduce data hazards among the VLIW instructions and enhance program processing performance (Sugimoto [0045]).

	Regarding claim 7, Phelps in view of Lacy, Barry, and Sugimoto teaches:
7. The system of claim 6, wherein the load operation, the arithmetic logic unit operation, and the store operation (Phelps [0054]: each vector unit executes a load, ALU, and store operation) correspond to the different component instructions of the different single processor instructions (Sugimoto [0045] and Fig. 2: the different component instructions being executed in parallel correspond to different VLIW instructions).

Regarding claim 10, Phelps in view of Lacy, Barry, and Sugimoto teaches:
10. The system of claim 1, wherein the vector computational unit is configured to process an execute stage for each of the plurality of component instructions in parallel (Sugimoto [0045] and Fig. 2: the execute stages of the component instructions are executed in parallel in cycle T5).

Regarding claim 11, Phelps in view of Lacy, Barry, and Sugimoto teaches:
11. The system of claim 1, wherein the vector computational unit includes a plurality of vector registers, a control logic, an input buffer, and an output buffer (Phelps [0054]: the vector processing unit includes vector registers; Phelps [0047]: the vector instruction dispatcher is control logic; Phelps [0057]-[0058]: the multiply result FIFO is an input buffer and the vector memory is an output buffer).

Regarding claim 12, Phelps in view of Lacy, Barry, and Sugimoto teaches:
12. The system of claim 1, wherein one of the plurality of component instructions references one or more vector registers of the vector computational unit (Phelps [0054]: the component instructions use/reference the vector registers as operands).


Regarding claim 13, Phelps in view of Lacy, Barry, and Sugimoto teaches:
13. The system of claim 1, wherein the vector computational unit includes one or more aliased vector registers (Phelps [0054]: the vector processing unit includes vector register V1 which is a vector register named/aliased as V1).

Regarding claim 14, Phelps in view of Lacy, Barry, and Sugimoto teaches:
14. The system of claim 13, wherein the one or more aliased vector registers include an aliased 8-bit vector register, an aliased 16-bit vector register, or an aliased 32-bit vector register (Phelps [0057]: each vector register contains 32-bits, i.e. is an aliased 32-bit vector register).

Regarding claim 15, Phelps in view of Lacy, Barry, and Sugimoto teaches:
15. The system of claim 1, 
Phelps in view of Lacy, Barry, and Sugimoto, as currently mapped, does not explicitly teach:
wherein one of the plurality of component instructions references three source registers and one destination register.
	However, Barry further teaches:
a MAC instruction references three source registers and one destination register (col 14 lines 35-37: the MAC instruction references source registers Rt, Rx, Ry and destination register Rt)
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the multiply accumulate operation in Phelps in view of Lacy, Barry, and Sugimoto to reference three source registers and one destination register as further taught by Barry. One of ordinary skill in the art would have been motivated to make this modification because 

Regarding claim 16, Phelps in view of Lacy, Barry, and Sugimoto teaches:
16. The system of claim 1, wherein the first single processor instruction encodes a vector mask move instruction (Phelps [0054]: the ALU of the vector unit performs mask operations, i.e. a VLIW instruction encodes a vector mask move component instruction that is received by the vector processing unit).

Regarding claim 17, Phelps in view of Lacy, Barry, and Sugimoto teaches:
17. The system of claim 1, wherein one of the plurality of component instructions includes a vector bit mask, a register size field (Phelps [0054]: the field of the add instruction specifying a register, i.e. V1, V2, or V3, is a register size field since it has a size to specify a register), a mask bit, or an immediate valid bit.


	Regarding claim 20, Phelps teaches:
20. A method comprising: 
receiving a plurality of processor instructions ([0041]: each core receives a stream/plurality of VLIW/processor instructions) from a control unit ([0061]: the instructions are received from the instruction decode and issue 102, i.e. a control unit), the plurality of processor instructions comprising a first processor instruction, a second processor instruction, and a third processor instruction, wherein each of the processor instruction specifies a plurality of component instructions ([0045]-[0047]: the plurality of VLIW instructions includes first, second, and third instructions, and each VLIW instruction includes slots to specify a plurality of component instructions); 
decoding each of the processor instructions into the plurality of component instructions ([0045]: the VLIW instructions are decoded into component scalar and vector instructions); 
using a vector computational unit that includes a plurality of processing elements to execute the decoded plurality of component instructions in response to the single processor instruction, the processing elements arranged as a vector ([0053]-[0054]: vector processing unit 106 includes a 2D array of vector processing units in which a plurality of processing elements in a first dimension are arranged as a vector along a second dimension; [0047]: the processing elements receive the decoded vector/component instructions in response to a VLIW instruction) and configured to receive data elements from a lane of a plurality of lanes of a computational array ([0057]-[0058]: the vector processing unit, i.e. each of its processing elements, receives outputs of the matrix multiply unit/computational array, where the matrix multiply unit is divided into cells that data moves across, i.e. in lanes of a plurality of lanes of the matrix multiply unit), 
using each of the plurality of processing elements to process respective received data elements in parallel with other processing elements ([0053] and [0058]: the processing elements of the vector unit execute in a SIMD manner to process the received data elements in parallel with other processing elements), 
wherein the control unit is configured to synchronize receipt of the data elements from the plurality of lanes to respective processing elements ([0057] and [0060]: the instruction issue unit 102/control unit issues an instruction that causes the vector processing unit, i.e. respective processing elements, to grab/receive result elements that reside in the multiply result FIFO, i.e. data elements from the plurality of lanes, in one clock cycle; that is, an instruction from a control unit synchronizes, i.e. coordinates/causes to occur at the same time, receipt of the data elements that come from the lanes of the matrix unit and go to the vector processing unit)
 	Although Phelps teaches the processor core including ALUs (Fig. 1C 126) and the VLIW instruction specifying instructions for the vector processing unit ([0045]), Phelps does not teach:
wherein each processing element comprises an arithmetic logic unit (ALU), and wherein different component instructions utilize different hardware resources of each of the processing elements included in the vector computational unit; and
wherein for a particular clock cycle, the processing elements are configured to execute different component instructions of the plurality of processor instructions, wherein the plurality of processor instructions are executed using staggered starts, 
wherein a first of the different component instructions is specified by the first processor instruction, a second of the different component instructions is specified by the second processor instruction, and a third of the different component instructions is specified by the third processor instruction, and 
wherein the first of the different component instructions, the second of the different component instructions, and third of the different component instructions, utilize different hardware resources of the vector computational unit during the particular clock cycle.
		However, Lacy teaches further details regarding the vector processing unit of Phelps (see Abstract). In particular, Lacy teaches:
wherein each processing element comprises an arithmetic logic unit (ALU) ([0045]: each sublane/processing element includes an ALU)
	It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the vector units of Phelps to each include an ALU as taught by Lacy. One of ordinary skill in the art would have been motivated to make this modification because using an ALU in each vector unit is a known technique on the known device of a computer processor for vectorizing ALU operations and would yield the predictable result of speeding ALU operations. 
	Further, Barry teaches:
different hardware resources of each of the processing elements (col 7 lines 53-55 and Fig. 1A: each processing element includes an ALU, load unit, and store unit)
	It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the processing elements of Phelps in view of Lacy to further include a load unit and store unit as taught by Barry. This combination would teach:
wherein different component instructions utilize different hardware resources of each of the processing elements included in the vector computational unit (each processing element of Phelps in view of Lacy and Barry would include an ALU, load unit, and store unit for the ALU instruction, load instruction, and store instruction of Phelps to utilize), 
One of ordinary skill in the art would have been motivated to make this modification because providing dedicated execution units is a known technique on the known device of a computer processor for executing specific instructions and would yield the predictable result of speeding up processing time by increasing processing resources.
Furthermore, in the analogous art of VLIW processors, Sugimoto teaches:
wherein for a particular clock cycle, the processing elements are configured to execute different component instructions of the plurality of processor instructions, wherein the plurality of processor instructions are executed using staggered starts ([0045] and Fig. 2: in cycle T5, different component instructions INT1, MUL, and LD of different VLIW instructions 1-3 are executed one step at a time/using staggered starts), 
wherein a first of the different component instructions is specified by the first processor instruction, a second of the different component instructions is specified by the second processor instruction, and a third of the different component instructions is specified by the third processor instruction ([0045] and Fig. 2: INT1 is specified by instruction 1, MUL is specified by instruction 2, and LD is specified by instruction 3), and 
wherein the first of the different component instructions, the second of the different component instructions, and third of the different component instructions, utilize different hardware resources of the vector computational unit during the particular clock cycle ([0045] and Fig. 2: INT1, MUL, and LD of the different component instructions utilize execution pipelines/hardware resources 33, 32, and 31 respectively during the T5 cycle).
	It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the vector processing unit of Phelps in view of Lacy and Barry to execute the different component instructions using staggered starts as taught by Sugimoto. One of ordinary skill in the art would have been motivated to make this modification to reduce data hazards among the VLIW instructions and enhance program processing performance (Sugimoto [0045]).
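For illustration only, the method steps recited in claim 20 (receiving a processor instruction, decoding it into component instructions, and having each processing element apply them to its own lane's data) can be sketched as follows. The class, the slot names, and the dictionary encoding are hypothetical and are not drawn from the application or from any cited reference; the inner loop over lanes stands in for hardware that would operate in parallel.

```python
from dataclasses import dataclass


@dataclass
class ProcessingElement:
    """One lane of a vector unit (hypothetical model with a single register)."""
    register: int = 0

    def load(self, value):        # stands in for a load unit
        self.register = value

    def alu_add(self, operand):   # stands in for an ALU
        self.register += operand

    def store(self, out, lane):   # stands in for a store unit
        out[lane] = self.register


def decode(instruction):
    """Split one processor instruction (a dict of slots) into ordered component instructions."""
    return [(slot, instruction[slot]) for slot in ("load", "alu", "store") if slot in instruction]


def execute(instruction, elements, lane_data, out):
    """Apply each decoded component instruction across all lanes."""
    for op, arg in decode(instruction):
        for lane, pe in enumerate(elements):  # conceptually parallel across lanes
            if op == "load":
                pe.load(lane_data[lane])
            elif op == "alu":
                pe.alu_add(arg)
            elif op == "store":
                pe.store(out, lane)


elements = [ProcessingElement() for _ in range(4)]
out = [0] * 4
execute({"load": None, "alu": 10, "store": None}, elements, [1, 2, 3, 4], out)
print(out)
```

Each component instruction in this sketch touches a different method of the processing element, which is the sense in which the component instructions "utilize different hardware resources" in this simplified model.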

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to KASIM ALLI whose telephone number is (571)270-1476.  The examiner can normally be reached Monday - Friday, 9am - 5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.  
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Aimee Li, can be reached at (571)272-4169.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/KASIM ALLI/Examiner, Art Unit 2183