DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
EXAMINER’S AMENDMENT
An examiner’s amendment to the record appears below. Should the changes and/or additions be unacceptable to applicant, an amendment may be filed as provided by 37 CFR 1.312. To ensure consideration of such an amendment, it MUST be submitted no later than the payment of the issue fee.
Authorization for this examiner’s amendment was given in an interview with Attorney Brian Short on 08/31/2022.


The application has been amended as follows: 
AMENDMENTS TO THE CLAIMS

1. (Currently Amended) A method of processing, comprising:
scheduling, by a scheduler, a thread for execution;
executing, by a processor of a plurality of processors, the thread; 
fetching, by the processor, a plurality of instructions for the thread from a memory;
selecting, by a thread arbiter of the processor, an instruction of the plurality of instructions for execution in an arithmetic logic unit (ALU) pipeline of the processor, and reading the instruction; and
determining, by a macro-instruction iterator of the processor, whether the instruction is a Sum-Of-Multiply-Accumulate (SOMAC) instruction with an instruction size, wherein the instruction size indicates a number of iterations that the SOMAC instruction is to be executed, wherein the number of iterations includes multiple iterations;
executing the SOMAC instruction, comprising:
reading, by the processor, a first source operand Ai of a plurality of source operands of the SOMAC instruction from a register file, wherein the first source operand Ai includes multiple terms i, wherein the first source operand Ai includes a vector that is common and same for all iterations of the SOMAC instruction, wherein each source operand of the plurality of source operands is one or more of a plurality of registers from a corresponding register file that is an input to the SOMAC instruction; 
determining, by the macro-instruction iterator of the processor, if all terms of the first source operand Ai are zero and accumulating the terms of the first source operand Ai into destination operands C leaves the destination operands C unchanged, thereby allowing the SOMAC instruction to be pruned; and
executing the number of iterations of the SOMAC instruction when all terms of the first source operand Ai are not zero, comprising, for each iteration of the number of iterations, reading a single term of a second operand Bi[j], wherein the second operand Bi includes multiple terms i, reading a destination operand C[j] of the destination operands C, and writing a single multiply-accumulate result back to the destination operand C[j] according to: 
C[j] = ∑Ai*Bi[j] + C[j]; where j is a loop index that varies from 0 to the instruction size and increments each iteration, resulting in a number of destination operands C equal to the instruction size;
wherein for each iteration of the number of iterations: 
each of the multiple terms i of the first source operand Ai are multiplied with the single term of the second operand Bi[j] producing a plurality of multiplication results, each of the plurality of multiplication results are added together to produce a summation, and the summation is added to the destination operand C[j] producing the single multiply-accumulate result.

2. (Previously Canceled)

3. (Previously Presented) The method of claim 1, wherein execution of the SOMAC instruction is skipped and a next instruction is read for execution when all terms of the first source operand are zero.

4. (Previously Presented) The method of claim 3, further comprising:
selecting, by the thread arbiter of the processor, a second instruction of the plurality of instructions for execution in an arithmetic logic unit (ALU) pipeline of the processor, and reading the second instruction; and
determining, by the macro-instruction iterator of the processor, whether the second instruction is a second Sum-Of-Multiply-Accumulate (SOMAC) instruction with an instruction size of the second instruction, wherein the instruction size of the second instruction indicates a number of iterations of the second instruction that the second SOMAC instruction is to be executed.

5. (Previously Presented) The method of claim 1, further comprising:
wherein the second operand includes a number of sets of one or more terms, wherein the number of sets is the instruction size;
determining, by the macro-instruction iterator of the processor, an instruction mask, wherein the instruction mask includes a plurality of bits, and each bit is determined based on which sets of the number of sets of the second operand have all terms of the set being zero.

6. (Previously Presented) The method of claim 5, wherein each bit of the plurality of bits corresponding to a set of the number of sets of the second operand having all terms of zero are reset, and each bit of the plurality of bits corresponding to a set of the number of sets of the second operand having at least one term non-zero are set.

7. (Original) The method of claim 6, further comprising:
executing, by the processor, multiply and accumulate operations of the SOMAC operation for the iterations which are not disabled (mask bit is set) and skipping the iterations which are disabled (mask bit is reset) based on the instruction mask.

8. (Currently Amended) The method of claim 7, further comprising:
adding a sum-of-multiply result to the destination operand C[j]; and
writing the multiply-accumulate result back to the destination operand C[j].

9. (Original) The method of claim 1, wherein the instruction is one of a plurality of Sum-Of-Multiply-Accumulate (SOMAC) instructions of an implementation of a neural network.

10. (Original) The method of claim 9, wherein each of the plurality of SOMAC instructions includes at least one of a multiply-accumulate operation, a dot product-accumulate operation, or a convolve-accumulate operation.

11. (Currently Amended) A system, comprising:
a scheduler operative to schedule a thread;
a plurality of processors operative to execute the thread;
logic encoded in one or more non-transitory computer-readable storage media for execution by the plurality of processors and when executed operable to cause a processor of the plurality of processors to operate to:
fetch a plurality of instructions for the thread from a memory;
select by a thread arbiter of the processor an instruction of the plurality of instructions for execution in an arithmetic logic unit (ALU) pipeline of the processor, and read the instruction;
determine, by a macro-instruction iterator of the processor, whether the instruction is a Sum-Of-Multiply-Accumulate (SOMAC) instruction with an instruction size, wherein the instruction size indicates a number of iterations that the SOMAC instruction is to be executed, wherein the number of iterations includes multiple iterations;
executing the SOMAC instruction, comprising:
reading, by the processor, a first source operand Ai of a plurality of source operands of the SOMAC instruction from a register file, wherein the first source operand Ai includes multiple i terms, wherein the first source operand Ai includes a vector that is common and same for all iterations of the SOMAC instruction, wherein each source operand of the plurality of source operands is one of a plurality of registers from a corresponding register file that is an input to the SOMAC instruction; 
determining, by the macro-instruction iterator of the processor, if all terms of the first source operand Ai are zero and accumulating the terms of the first source operand Ai into destination operands C leaves the destination operands C unchanged, thereby allowing the SOMAC instruction to be pruned; and
executing the number of iterations of the SOMAC instruction when all terms of the first source operand Ai are not zero, comprising, for each iteration of the number of iterations, reading a single term of a second operand Bi[j], wherein the second operand Bi includes multiple terms i, reading a destination operand C[j] of the destination operands C, and writing a single multiply-accumulate result back to the destination operand C[j] according to: 
C[j] = ∑Ai*Bi[j] + C[j]; where j is a loop index that varies from 0 to the instruction size and increments each iteration;
wherein for each iteration of the number of iterations: 
each of the multiple terms i of the first source operand Ai are multiplied with the single term of the second operand Bi[j] producing a plurality of multiplication results, each of the plurality of multiplication results are added together to produce a summation, and the summation is added to the destination operand C[j] producing the single multiply-accumulate result.

12. (Cancel)

13. (Previously Presented) The system of claim 11, wherein execution of the SOMAC instruction is skipped and a next instruction is read for execution when all terms of the first source operand are zero.

14. (Previously Presented) The system of claim 13, wherein the thread arbiter of the processor further operates to:
select a second instruction of the plurality of instructions for execution in an arithmetic logic unit (ALU) pipeline of the processor, and read the second instruction; and
wherein the macro-instruction iterator of the processor further operates to:
determine whether the second instruction is a second Sum-Of-Multiply-Accumulate (SOMAC) instruction with an instruction size of the second instruction, wherein the instruction size of the second instruction indicates a number of iterations of the second instruction that the second SOMAC instruction is to be executed.

15. (Previously Presented) The system of claim 11, wherein the processor further operates to:
wherein the second operand includes a number of sets of one or more terms, wherein the number of sets is the instruction size;
wherein a macro-instruction iterator of the processor operates to:
determine an instruction mask, wherein the instruction mask includes a plurality of bits, and each bit is determined based on which sets of the number of sets of the second operand have all terms of the set being zero.

16. (Previously Presented) The system of claim 15, wherein each bit of the plurality of bits corresponding to a set of the number of sets of the second operand having all terms of zero are reset, and each bit of the plurality of bits corresponding to a set of the number of sets of the second operand having at least one term non-zero are set.

17. (Original) The system of claim 16, wherein the processor further operates to:
execute multiply and accumulate operations of the SOMAC operation for the iterations which are not disabled (mask bit is set) and skip the iterations which are disabled (mask bit is reset) based on the instruction mask.

18. (Currently Amended) The system of claim 17, wherein the processor further operates to:
add a sum-of-multiply result to the destination operand C[j]; and 
write the multiply-accumulate result back to the destination operand C[j].

19. (Original) The system of claim 11, wherein the instruction is one of a plurality of Sum-Of-Multiply-Accumulate (SOMAC) instructions of an implementation of a neural network.

20. (Original) The system of claim 19, wherein each of the plurality of SOMAC instructions includes at least one of a multiply-accumulate operation, a dot product-accumulate operation, or a convolve-accumulate operation.
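The arithmetic recited in method claims 1, 3 and 5-7 (mirrored in system claims 11, 13 and 15-17) can be illustrated with a minimal Python sketch. It is a software model written only for illustration, not the applicant's hardware; the names `somac_mask` and `execute_somac` are hypothetical. It assumes that Bi[j] denotes the i-th term of the j-th set of the second operand (consistent with claim 5) and that the loop index j runs from 0 to the instruction size minus 1, so that exactly one destination operand C[j] is produced per iteration.

```python
from typing import List, Sequence


def somac_mask(b_sets: Sequence[Sequence[float]]) -> List[bool]:
    """Hypothetical model of the instruction mask of claims 5-6.

    One bit per iteration j: set (True) when set j of the second operand
    has at least one non-zero term, reset (False) when all its terms are zero.
    """
    return [any(term != 0 for term in b_set) for b_set in b_sets]


def execute_somac(a: Sequence[float],
                  b_sets: Sequence[Sequence[float]],
                  c: List[float],
                  use_mask: bool = True) -> bool:
    """Hypothetical software model of one SOMAC instruction; updates c in place.

    a      -- first source operand Ai, common to every iteration
    b_sets -- second operand; b_sets[j][i] is read as Bi[j] (assumption)
    c      -- destination operands C, one per iteration
    Returns False when the instruction is pruned, True when it executes.
    """
    size = len(b_sets)  # instruction size = number of iterations
    if len(c) != size:
        raise ValueError("expected one destination operand per iteration")
    if any(len(b_set) != len(a) for b_set in b_sets):
        raise ValueError("each set of the second operand needs one term per Ai term")

    # Claims 1 and 3: if every term of Ai is zero, every summation is zero,
    # C is left unchanged, and the whole instruction can be skipped.
    if all(term == 0 for term in a):
        return False

    mask = somac_mask(b_sets) if use_mask else [True] * size
    for j in range(size):  # j = 0 .. size-1, one C[j] per iteration
        if not mask[j]:
            continue  # claim 7: skip an iteration whose mask bit is reset
        # Multiply each term of Ai by its Bi[j] term, add the products
        # together, then add that single summation to C[j].
        summation = sum(a_i * b_ij for a_i, b_ij in zip(a, b_sets[j]))
        c[j] = summation + c[j]
    return True


if __name__ == "__main__":
    c = [1.0, 1.0, 1.0]
    execute_somac([1.0, 2.0], [[3.0, 4.0], [0.0, 0.0], [1.0, -1.0]], c)
    print(c)  # [12.0, 1.0, 0.0]
```

Calling the sketch with `use_mask=False` yields the same C, which illustrates why skipping an iteration whose set of the second operand is all zero (claim 7) leaves the result unchanged and only avoids work.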



Reasons for Allowance
Claims 1, 3-11, and 13-20 are allowed.
The following is an examiner’s statement of reasons for allowance: 
The known prior art of record, taken alone or in combination, was not found to teach, in combination with other limitations in the claims, when all terms of a first source operand are not zero, for each iteration of a number of iterations, multiplying each term of a first source operand with a single term of a second operand to produce a plurality of multiplication results, adding each of the plurality of multiplication results together to produce a summation, and adding the summation to a destination operand to produce a multiply-accumulate result that is written back to the destination operand, as required by claims 1 and 11.
The closest prior art of record was found to be:
US 2020/0409705 (hereinafter, Vall) which teaches skipping multiplications based on detected values in the multiplicand that would produce zero-valued results ([0174]). However, Vall does not teach executing a number of iterations of a SOMAC instruction as required by claims 1 and 11.
US 2010/0274990 (hereinafter, Wilder) which teaches a repeating MAC instruction ([0087]) which is executed over multiple iterations, where each term of a first operand vd is multiplied with a single term of a second operand vc, which are then each added to a destination operand vacc (Fig. 1B). However, Wilder does not teach this multiplication producing a plurality of results which are then added together to produce a summation, and the summation being added to the destination operand, for each iteration, as required by claims 1 and 11.
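The distinction drawn over Wilder can be illustrated with a simplified Python sketch. It models only the characterization stated above (element-wise accumulation into vacc versus a single summation added to C[j] per iteration), not the actual hardware of Wilder or of the applicant; the function names `per_term_mac` and `somac_iteration` are hypothetical.

```python
from typing import List, Sequence


def per_term_mac(vacc: List[float], vd: Sequence[float], vc: float) -> None:
    """Element-wise MAC as characterized for Wilder (hypothetical model).

    Each product vd[k] * vc is added to its own accumulator vacc[k];
    the products are never summed with one another.
    """
    for k, d in enumerate(vd):
        vacc[k] += d * vc


def somac_iteration(c: List[float], j: int,
                    a: Sequence[float], b_j: Sequence[float]) -> None:
    """One SOMAC iteration as recited in claims 1 and 11 (hypothetical model).

    The products a[i] * b_j[i] are first added together, and only that
    single summation is accumulated into the destination operand c[j].
    """
    c[j] += sum(a_i * b_i for a_i, b_i in zip(a, b_j))


if __name__ == "__main__":
    vacc = [0.0, 0.0]
    per_term_mac(vacc, [1.0, 2.0], 3.0)
    print(vacc)  # [3.0, 6.0]: one accumulator per product

    c = [0.0]
    somac_iteration(c, 0, [1.0, 2.0], [3.0, 3.0])
    print(c)  # [9.0]: one accumulator holding the sum of the products
```

With the same inputs, the element-wise form leaves two separate accumulated values, while the SOMAC iteration adds the sum of the products to a single destination operand.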
Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to KASIM ALLI whose telephone number is (571)270-1476. The examiner can normally be reached Monday - Friday, 9am - 5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jyoti Mehta can be reached on (571) 270-3995. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/KASIM ALLI/Examiner, Art Unit 2183
/JYOTI MEHTA/Supervisory Patent Examiner, Art Unit 2182