DETAILED ACTION

Status of Application
Claims 1-14 and 21-29 are pending in the present application.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 06/23/2021 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.


Response to Arguments
Applicant’s arguments with respect to claims 1, 9, and 21 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Claim Rejections - 35 USC § 103
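The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
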
Claims 1-3, 5, 8-9, 11-13, 21, and 24-26 are rejected under 35 U.S.C. 103 as being unpatentable over Hook et al (hereinafter Hook), U.S. Publication No. 2009/0249039 A1, in view of Espasa et al (hereinafter Espasa), U.S. Publication No. 2015/0277904 A1, in view of Boggs et al (hereinafter Boggs), U.S. Publication No. 2006/0095713 A1.
	Referring to claims 1, 9, and 21, Hook discloses a processor [fig. 2] comprising:
fetch circuitry to fetch a single vector multiplication instruction having fields for an opcode, first and second source identifiers [paragraphs 8, 13, 38, an arithmetic operation such as add and multiply on SIMD vectors; an arithmetic instruction is fetched from memory and is decoded. Then, the first vector register and the second vector register are read from the register file as specified in the arithmetic instruction; The instruction opcode specifies an arithmetic operation such as add, multiply, or subtract in its opcode field];
decode circuitry to decode the fetched vector multiplication instruction [paragraph 13, “Next, an arithmetic instruction is fetched from memory and is decoded”]; and 
execution circuitry to, on each of a plurality of pairs of corresponding fixed-sized elements of the identified first and second sources, execute the decoded vector multiplication instruction to:
generate a signed fixed-sized result by rounding the most significant fixed-sized portion of the product to fit into the identified destination [paragraph 13, 63, 
Hook does not explicitly disclose the instruction having a destination identifier;
generating a double-sized product of each pair of fixed-sized elements, the double-sized product being represented by at least twice a number of bits of the fixed size.
However, Espasa discloses the instruction having a destination identifier [paragraph 8, It should be understood that the term destination vector operand (or destination operand) is defined as the direct result of performing the operation specified by an instruction, including the storage of that destination operand at a location (be it a register or at a memory address specified by that instruction)];
generating a double-sized product of each pair of fixed-sized elements, the double-sized product being represented by at least twice a number of bits of 
One of ordinary skill in the art before the effective filing date of the claimed invention would have clearly recognized that it is quite advantageous for the processor of Hook to provide improved power consumption as only one instruction is decoded. It is for this reason one of ordinary skill in the art would have been motivated to implement the instruction having a destination identifier, and generating a double-sized product of each pair of fixed-sized elements, the double-sized product being represented by at least twice a number of bits of the fixed size.
The modified Hook does not explicitly disclose the rounding according to a rounding control, wherein the rounding control is selectable between truncating, rounding up, and convergent rounding.
However, Boggs discloses the rounding according to a rounding control, wherein the rounding control is selectable between truncating, rounding up, and convergent rounding [paragraph 50, “In some embodiments, rounding is performed on the result data”; “in some such embodiments, a rounding mode control register value ROUND_MODE specifies a rounding mode (such as ceiling, floor, zero, or nearest)”; 
One of ordinary skill in the art before the effective filing date of the claimed invention would have clearly recognized that it is quite advantageous for the processor of the modified Hook to provide a single instruction that performs clipping, picking, rounding, and packing in a single operation as opposed to performing a multi-instruction sequence. It is for this reason one of ordinary skill in the art would have been motivated to implement the rounding according to a rounding control, wherein the rounding control is selectable between truncating, rounding up, and convergent rounding.
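As a rough illustration of the operation mapped above (a double-sized product per element pair, reduced to its most significant fixed-sized portion under a selectable rounding control), the following Python sketch may help. It is not taken from Hook, Espasa, or Boggs. The names Rounding, mul_round_high, and vector_mul_round_high are hypothetical, and the sketch assumes that "truncating" simply drops the low half of the product (rounding toward negative infinity), that "rounding up" rounds exact halves upward, and that "convergent rounding" rounds exact halves to the nearest even value.

```python
from enum import Enum


class Rounding(Enum):
    """Hypothetical rounding-control values (names are assumptions)."""
    TRUNCATE = 0
    ROUND_UP = 1
    CONVERGENT = 2


def mul_round_high(a: int, b: int, mode: Rounding, bits: int = 16) -> int:
    """Multiply two signed `bits`-wide elements and return the most
    significant `bits` of the 2*bits-wide product, rounded per `mode`."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not (lo <= a <= hi and lo <= b <= hi):
        raise ValueError("operand does not fit in the fixed element size")

    product = a * b                       # double-sized product (2*bits wide)
    high = product >> bits                # upper half, arithmetic shift
    frac = product & ((1 << bits) - 1)    # discarded lower half, 0 .. 2**bits - 1
    half = 1 << (bits - 1)                # value of exactly one half

    if mode is Rounding.TRUNCATE:
        return high
    if mode is Rounding.ROUND_UP:
        return high + 1 if frac >= half else high
    # Convergent: ties go to the even neighbour.
    if frac > half or (frac == half and (high & 1)):
        return high + 1
    return high


def vector_mul_round_high(src1, src2, mode: Rounding, bits: int = 16):
    """Apply mul_round_high to each pair of corresponding elements."""
    if len(src1) != len(src2):
        raise ValueError("sources must have the same number of elements")
    return [mul_round_high(x, y, mode, bits) for x, y in zip(src1, src2)]
```

For example, with 16-bit elements, 16384 x 2 yields the 32-bit product 0x00008000, whose discarded low half is exactly one half: under these assumptions truncating gives 0, rounding up gives 1, and convergent rounding gives 0.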
Referring to claim 2, the modified Hook discloses the processor of claim 1, wherein the fixed size is 16 bits, and wherein the execution circuitry is further to execute the decoded vector multiplication instruction in parallel on every element of the first and second identified sources [Hook, paragraphs 29, 39, arithmetic operation operates on the corresponding elements of the registers independently and in parallel; operate on data that is the full width of the local on-chip memories, up to 64 bits, and this allows parallel operations on eight 8-bit, four 16-bit, two 32-bit, or one 64-bit elements in one cycle].
Referring to claims 3 and 12, taking claim 3 as exemplary, the modified Hook discloses the processor of claim 1, wherein the first source identifier, the second source identifier, and the destination identifier each identifies a same-sized one of a 32-bit general purpose register, a 64-bit general purpose register, a 128-bit vector register, a 256-bit vector register, and a 512-bit vector register, having 16-bit elements [Hook, paragraph 13, fig. 9, data element comprises N bits, resulting element is transformed into N-bit width element; Espasa, paragraph 165].
Referring to claims 5 and 13, taking claim 5 as exemplary, the modified Hook discloses the processor of claim 1, wherein the execution circuitry is to execute the decoded vector multiplication instruction in parallel on every corresponding pair of fixed-size elements [Hook, paragraphs 29, 39, arithmetic operation operates on the corresponding elements of the registers independently and in parallel; operate on data that is the full width of the local on-chip memories, up to 64 bits, and this allows parallel operations on eight 8-bit, four 16-bit, two 32-bit, or one 64-bit elements in one cycle].
Referring to claims 8, 11, and 24, taking claim 8 as exemplary, the modified Hook discloses the processor of claim 1, further comprising a software-accessible 
Referring to claim 25, the modified Hook discloses the system of claim 21, wherein the vector multiplication instruction further identifies a write mask to conditionally control per-element computational operation and updating of results to the identified destination [Espasa, paragraphs 111, 131, write mask field allows for partial vector operations, including loads, stores, arithmetic, logical, etc.; and the combination of write mask field and data element width field creates typed instructions in that they allow the mask to be applied based on different data element widths]. 
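The write-mask behavior mapped to claim 25 can be pictured with the short Python sketch below. It is an illustration only and is not drawn from Espasa: the function name masked_vector_op is hypothetical, bit i of the mask is assumed to govern element i, and masked-off elements are assumed to keep the prior destination value (a merging policy chosen for this sketch).

```python
def masked_vector_op(dest, src1, src2, mask: int, op):
    """Apply `op` to each pair of corresponding elements whose mask bit is set.

    Elements whose mask bit is clear are neither computed nor updated,
    so the previous destination value is preserved (assumed merging policy).
    """
    if not (len(dest) == len(src1) == len(src2)):
        raise ValueError("destination and sources must have equal lengths")
    result = list(dest)                  # start from the old destination
    for i in range(len(result)):
        if (mask >> i) & 1:              # bit i of the mask enables element i
            result[i] = op(src1[i], src2[i])
    return result
```

With a mask of 0b0101, only elements 0 and 2 are computed and written; elements 1 and 3 retain their earlier contents.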
Referring to claim 26, the modified Hook discloses the method of claim 9, further comprising, after decoding the fetched vector multiplication instruction, retrieving data associated with the identified first and second sources, and scheduling execution of the decoded vector multiplication instruction [Hook, paragraphs 8, 13, 38, The present invention first loads, from a memory, a first set of data elements into a first vector register and a second set of data elements into a second vector register. Each data element comprises N bits. Next, an arithmetic instruction is fetched from memory and is decoded. Then, the first vector register and the second vector register are read from the register file as specified in the arithmetic instruction. The present invention then executes the arithmetic instruction on corresponding data elements in the first and second vector registers].
Claims 4, 10, 22, and 23 are rejected under 35 U.S.C. 103 as being unpatentable over Hook, in view of Espasa, in view of Boggs, as applied to claims 1 and 9 above, and further in view of Desai et al (hereinafter Desai), U.S. Publication No. 2003/0014457 A1.
Referring to claims 4 and 22, taking claim 4 as exemplary, the modified Hook discloses the processor of claim 1, wherein the fixed size is 16 bits [Hook, paragraphs 29, 39, arithmetic operation operates on the corresponding elements of the registers independently and in parallel; operate on data that is the full width of the local on-chip memories, up to 64 bits, and this allows parallel operations on eight 8-bit, four 16-bit, two 32-bit, or one 64-bit elements in one cycle; Espasa, paragraph 99, “alternative embodiments may support more, less and/or different vector operand sizes”; Hook, paragraphs 29, 35].
The modified Hook does not explicitly disclose wherein the identified first source, second source, and destination each comprises a 128-bit vector register.
However, Desai discloses wherein the identified first source, second source, and destination each comprises a 128-bit vector register [paragraph 36, For example, a vector MAC operation on eight 16-bit data elements produces eight 32-bit product elements, which are added to the current contents of vector accumulator registers 512 and 514, and the results of the operation can be locally stored back into the vector accumulator registers 512 and 514 (each of which provide 128-bits of storage)], in order to provide high throughput signal processing without adding significantly to the cost or power consumption [paragraph 3].
One of ordinary skill in the art before the effective filing date of the claimed invention would have clearly recognized that it is quite advantageous for the processor of the modified Hook to provide high throughput signal processing without adding significantly to the cost or power consumption. It is for this reason one of ordinary skill in 
Referring to claims 10 and 23, taking claim 10 as exemplary, the modified Hook discloses the method of claim 9, wherein the fixed size is 16 bits [Hook, paragraphs 29, 39, arithmetic operation operates on the corresponding elements of the registers independently and in parallel; operate on data that is the full width of the local on-chip memories, up to 64 bits, and this allows parallel operations on eight 8-bit, four 16-bit, two 32-bit, or one 64-bit elements in one cycle; Espasa, paragraph 99, “alternative embodiments may support more, less and/or different vector operand sizes”; Hook, paragraphs 29, 35], and further comprising executing, by the execution circuitry, the decoded vector multiplication instruction in parallel on every element of the first and second identified sources [Hook, paragraphs 29, 39, arithmetic operation operates on the corresponding elements of the registers independently and in parallel; operate on data that is the full width of the local on-chip memories, up to 64 bits, and this allows parallel operations on eight 8-bit, four 16-bit, two 32-bit, or one 64-bit elements in one cycle].
The modified Hook does not explicitly disclose wherein the identified first source, second source, and destination each comprises a 128-bit vector register.
However, Desai discloses wherein the identified first source, second source, and destination each comprises a 128-bit vector register [paragraph 36, For example, a vector MAC operation on eight 16-bit data elements produces eight 32-bit product elements, which are added to the current contents of vector accumulator registers 512 and 514, and the results of the operation can be locally stored back into the vector accumulator registers 512 and 514 (each of which provide 128-bits of storage)], in order 
One of ordinary skill in the art before the effective filing date of the claimed invention would have clearly recognized that it is quite advantageous for the processor of the modified Hook to provide high throughput signal processing without adding significantly to the cost or power consumption. It is for this reason one of ordinary skill in the art would have been motivated to implement wherein the identified first source, second source, and destination each comprises a 128-bit vector register.
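The arithmetic behind the cited Desai passage (eight 16-bit elements in a 128-bit register producing eight 32-bit products) can be checked with the Python sketch below. It is illustrative only and not taken from Desai: the function names are hypothetical, and little-endian lane ordering is an assumption of the sketch. Note that eight 32-bit products occupy 8 x 32 = 256 bits, which corresponds to two 128-bit registers' worth of storage.

```python
import struct

LANES = 8            # eight 16-bit elements per 128-bit register
REGISTER_BYTES = 16  # 128 bits


def unpack_lanes(reg: bytes) -> list:
    """Split a 128-bit register image into eight signed 16-bit elements.

    Little-endian lane order is an assumption made for this sketch.
    """
    if len(reg) != REGISTER_BYTES:
        raise ValueError("expected a 16-byte (128-bit) register image")
    return list(struct.unpack("<8h", reg))


def widening_products(reg1: bytes, reg2: bytes) -> list:
    """Multiply corresponding 16-bit elements; each product needs 32 bits."""
    return [x * y for x, y in zip(unpack_lanes(reg1), unpack_lanes(reg2))]


def pack_products(products) -> bytes:
    """Pack eight signed 32-bit products into 32 bytes (256 bits)."""
    if len(products) != LANES:
        raise ValueError("expected eight products")
    return struct.pack("<8i", *products)
```

The extreme case -32768 x -32768 = 2**30 does not fit in 16 bits but does fit in a signed 32-bit element, which is why the product elements are double-sized.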
Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Hook, in view of Espasa, in view of Boggs, as applied to claim 3 above, and further in view of Damron, U.S. Publication No. 2011/0145543 A1.
Referring to claim 7, the modified Hook does not explicitly disclose the processor of claim 3, wherein the vector multiplication instruction further comprises a vector size identifier to specify the size of the same-sized register.
	However, Damron discloses wherein the vector multiplication instruction further comprises a vector size identifier to specify the size of the same-sized register [paragraph 48, executed instruction specifies setting the width of the vector register], in order to provide a single set of instructions that can be utilized for vector processing while supporting many different vector widths [paragraph 5].
One of ordinary skill in the art before the effective filing date of the claimed invention would have clearly recognized that it is quite advantageous for the processor of the modified Hook to provide a single set of instructions that can be utilized for vector processing while supporting many different vector widths. It is for this reason one of .	
Claims 27, 28, and 29 are rejected under 35 U.S.C. 103 as being unpatentable over Hook, in view of Espasa, in view of Boggs, as applied to claims 1, 9 and 21 above, and further in view of Palmer et al (hereinafter Palmer), U.S. Patent No. 4,338,675.
Referring to claims 27, 28 and 29, taking claim 27 as exemplary, the modified Hook does not explicitly disclose the processor of claim 1, wherein the rounding control specifies the truncating when set to a first value, the rounding up when set to a second value, and the convergent rounding otherwise.
However, Palmer discloses wherein the rounding control specifies the truncating when set to a first value, the rounding up when set to a second value, and the convergent rounding otherwise [col. 15, lines 21-28, Rounding mode is captured in an "RC" field, which comprises a two-bit code for the rounding modes: "nearest", "up", and "chop". In the chop mode, the result is merely chopped or truncated. An example of two-bit codes (values) can be 00, 01, and 11 for the specified modes], in order to provide a fast, accurate, and precise processor [col. 23, lines 47-54].
One of ordinary skill in the art before the effective filing date of the claimed invention would have clearly recognized that it is quite advantageous for the processor of the modified Hook to provide a fast, accurate, and precise processor. It is for this reason one of ordinary skill in the art would have been motivated to implement wherein the .
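The "first value / second value / otherwise" selection recited in claims 27-29 can be sketched in Python as follows. This is an illustration only: the function name decode_rounding_control is hypothetical, and the assignment of 0b00 to truncating and 0b01 to rounding up is an assumption of the sketch, not an encoding stated in Palmer or in the claims.

```python
def decode_rounding_control(rc: int) -> str:
    """Map a two-bit rounding-control field to a rounding mode.

    The specific encoding is assumed for illustration:
      0b00 -> truncating (first value)
      0b01 -> rounding up (second value)
      any other two-bit value -> convergent rounding
    """
    if not 0 <= rc <= 0b11:
        raise ValueError("rounding control must be a two-bit value")
    if rc == 0b00:
        return "truncate"
    if rc == 0b01:
        return "round_up"
    return "convergent"
```

Under this assumed encoding, both 0b10 and 0b11 fall through to convergent rounding, which mirrors the "otherwise" language of the claim.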

Allowable Subject Matter
Claims 6 and 14 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
The following is a statement of reasons for the indication of allowable subject matter:  The prior art of record taken alone or in combination fails to teach and/or fairly suggest wherein the execution circuitry is to execute the decoded vector multiplication instruction on 16 corresponding pairs of elements at a time, taking one cycle, two cycles, and four cycles to execute the decoded vector multiplication instruction when the first source identifier, the second source identifier, and the destination identifier identify the 128-bit vector register, the 256-bit vector register, and the 512-bit vector register, respectively, in combination with other recited limitations in claim 6.
The prior art of record taken alone or in combination fails to teach and/or fairly suggest executing, by the execution circuitry, the decoded vector multiplication instruction on 16 corresponding pairs of elements at a time, taking one cycle, two cycles, and four cycles to execute the decoded vector multiplication instruction when the first source identifier, the second source identifier, and the destination identifier each identifies the 128-bit vector register, the 256-bit vector register, and the 512-bit vector register, respectively, in combination with other recited limitations in claim 14.


Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to FARLEY J ABAD whose telephone number is (571)270-3425.  The examiner can normally be reached on M-Th 6:30 - 3:00 PM; Fri 7:30 - 4:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/Farley Abad/Primary Examiner, Art Unit 2181