DETAILED ACTION
Continued Examination Under 37 CFR 1.114
	A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 2/5/2021 has been entered.
 	This communication is responsive to the request for continued examination filed on 2/5/21.  This action is Non-Final.  Claims 1-7 and 9-20 are pending.  Claims 1-3, 9, 15-16 and 19-20 have been amended.  Claim 8 has been canceled.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 101
2.	35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


3.	Claims 1-7 and 9-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (i.e. abstract idea) without significantly more. 

4.	Claims 1, 15 and 20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.

5.	At step 2A, prong one, the claims recite “identify data element values that are associated with one another in the first and second input registers based on the first and second plurality of indices, perform one or more same reduction operations on the associated data element values in the first and second input registers based on the identification regardless of the associated data element values being from the first or second input registers”.  The limitations above describe a process that, under its broadest reasonable interpretation, covers performance of the limitations in the mind, as well as mathematical concepts, but for the recitation of computer components.  That is, other than reciting “a processor”, nothing in the claim elements precludes the steps from practically being performed in the mind.  For example, “identifying” in the context of this claim encompasses an observation or evaluation that values in two input vectors correspond to a same index value of two index registers (also see paragraphs [00247-00250 and 00258-00259] which disclose identifying values used in different tree reduction operations based on index values); wherein further the “one or more same reduction operations” in the context of the claim encompasses performing one or more mathematical operations on the values of the input vectors which correspond to the same index values (see paragraphs [00251-00253 and 00258-00259] which disclose

6.	At step 2A, prong two, the judicial exception is not integrated into a practical application.  For example, claim 1 additionally recites “A processor comprising: decoding circuitry to decode an instruction, wherein the instruction specifies a first input register containing a plurality of data element values, a first index register containing a first plurality of indices, and an output register, wherein each index of the first plurality of indices maps to one unique data element position of the first input register; and execution circuitry to execute the decoded instruction, wherein the execution is performed based on the first input register and a second input register, and wherein another plurality of data element values within the second input register and a second plurality of indices within a second index register are used to perform the execution… and store results of the one or more same reduction operations in the output register and another output register.”  Similar to claim 1, claim 15 additionally recites “A method comprising: decoding an instruction, wherein the instruction specifies a first input register containing a plurality of data element values, a first index register containing a first plurality of indices, and an output register, wherein each index of the first plurality of indices maps to one unique data element position of the first input register; and executing the decoded instruction, wherein the execution is performed based on the first input register and a second input register, and wherein another plurality of data element values within the second input register and a second plurality of indices within a second index register are used to perform the execution... and store results of the one or more same reduction operations in the output register and another output register.”  Similar to claims 1 and 15, claim 20 additionally recites “A non-transitory machine-readable medium storing an instruction, which when executed by a processor causes the processor to perform operations, the operations comprising: decoding the instruction, wherein the instruction specifies a first input register containing a plurality of data element values, a first index register containing a first plurality of indices, and an output register, wherein each index of the first plurality of indices maps to one unique data element position of the first input register; and executing the decoded instruction, wherein the execution is performed based on the first input register and a second input register, and wherein another plurality of data element values within the second input register and a second plurality of indices within a second index register are used in performing the execution... store results of the one or more same reduction operations in the output register and another output register.”
	The limitations which claim “A processor comprising: decoding circuitry to decode an instruction, wherein the instruction specifies a first input register containing a plurality of data element values, a first index register containing a first plurality of indices, and an output register, wherein each index of the first plurality of indices maps to one unique data element position of the first input register; and execution circuitry to execute the decoded instruction, wherein the execution is performed based on the first input register and a second input register, and wherein another plurality of data element values within the second input register and a second plurality of indices within a second index register are used to perform the execution… and store results of the one or more same reduction operations in the output register and another output register” recite at a high-level of generality a generic processor which can be viewed as an attempt to generally link the use of the judicial exception to the technological environment of a computer (see MPEP 2106.05(h)).  Furthermore, the processor is recited at a high-level of generality such that it amounts to no more than mere instructions to apply the exception using a generic computer component (see MPEP 2106.05 (f)).
	Also, the limitation “store results of the one or more same reduction operations in the output register and another output register” is recited at a high level of generality and therefore represents insignificant extra-solution activity because it amounts to mere data outputting (see MPEP 2106.05(g)).  Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea.  Claims 1, 15 and 20, which each include similar limitations as described above, are therefore directed to an abstract idea.

7. 	At step 2B, the claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of the claim “A processor comprising: decoding circuitry to decode an instruction, wherein the instruction specifies a first input register containing a plurality of data element values, a first index register containing a first plurality of indices, and an output register, wherein each index of the first plurality of indices maps to one unique data element position of the first input register; and execution circuitry to execute the decoded instruction, wherein the execution is performed based on the first input register and a second input register, and wherein another plurality of data element values within the second input register and a second plurality of indices within a second index register are used to perform the execution… and store results of the one or more same reduction operations in the output register and another output register” can be viewed as an attempt to generally link the use of the judicial exception to the technological environment of a computer, such that they amount to no more than mere instructions to apply the exception using a generic computer (see MPEP 2106.05(f) and 2106.05(h)).  In particular, the above cited limitations are well-understood, routine and conventional.  For example, paragraphs [00107-00113] of the applicant’s disclosure discuss a processor core which supports well-known instruction set architectures such as x86, as well as instruction sets from ARM and MIPS, and describe well-known core types such as RISC and CISC.
The above cited limitations are further well-understood, routine and conventional, based on paragraphs [0003-0004] of the applicant’s disclosure, which discuss well-known trademarked Intel processors and disclose that an instruction set includes instruction formats which specify operations (opcodes) and operands (input and output registers).  Further, the prior art reference Kunzman, PGPUB No. 2016/0179537, teaches a processor being one of various well-known processors (i.e. RISC, CISC, MIPS, ARM processors) which fetch, decode and execute instructions (see [0005-0006, 0106-0111] and Figs. 4A-B).  The NPL reference “Computer Architecture: A Quantitative Approach” by John Hennessy and David Patterson, cited in the pertinent art section, teaches that in order to execute instructions the instructions must move through a pipeline which utilizes fetch, decode and execute stages, wherein each stage includes circuitry (see pages C-34 to C-37).  Therefore, merely applying a judicial exception to a particular technological environment cannot provide an inventive concept, and independent claims 1, 15 and 20 are not patent eligible.

8.	Dependent claims 2-7, 9-14 and 16-19 do not aid in the eligibility of the respective independent claims.  Claims 2-7, 9-14 and 16-19 merely provide further embellishments of the limitations recited in the respective independent claims.  Specifically, these claims recite limitations which discuss further operands, fields, or operations used/performed by the instruction (see claims 2-3, 5-7, 9, 16, 18-19), discuss different generic computing units used to implement the abstract idea (see claims 10-14), or discuss mere data output (i.e. storing results) (see claims 4 and 17).  Thus, dependent claims 2-7, 9-14 and 16-19 are also ineligible.

Claim Rejections - 35 USC § 112
9.	The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA  35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

10.	Claims 1-7 and 9-20 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA ), first paragraph, as failing to comply with the written description requirement. The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA  35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.

11.	In regards to claim 1, the limitations “decode an instruction, wherein the instruction specifies a first input register containing a plurality of data element values, a first index register containing a first plurality of indices, and an output register, wherein each index of the first plurality of indices maps to one unique data element position of the first input register…execute the decoded instruction, wherein the execution is performed based on the first input register and a second input register, and wherein another plurality of data element values within the second input register and a second plurality of indices within a second index register are used to perform the execution” fail to comply with the written description requirement.  The original disclosure does not describe, in sufficient detail such that one skilled in the art can reasonably conclude that the inventor had possession of the claimed invention, executing an instruction based on first and second input and index registers when the instruction only specifies a first input register, a first index register, and an output register.
	Specifically, while original paragraphs [00248-00253] appear to teach a “vmatchindx” instruction which is executed based on a first input register, first index register, and first output register (See Fig. 17), the original disclosure does not appear to provide support for the “vmatchindx” instruction of Fig. 17 being performed based on first and second input and index registers.  Rather, paragraphs [00258-00259] and Fig. 18 of the disclosure illustrate a second instruction “vmatchindx2” which is executed based on the instruction specifying first and second input and index registers.  Therefore, the disclosure provides support for executing a “vmatchindx2” instruction based on first and second input and index registers because the instruction specifies each of those operands.  However, claim 1 appears to be describing an instruction such as “vmatchindx” performed in Fig. 17, with functionalities performed by “vmatchindx2” as disclosed in Fig. 18, and the disclosure does not provide support for such a combination.
	The examiner suggests that, if applicant is intending to claim the embodiment of Fig. 18, which executes the instruction “vmatchindx2”, the applicant amend the claims to specify all inputs and outputs used by the “vmatchindx2” instruction.  If applicant is intending to claim the embodiment of Fig. 17, the applicant should remove limitations discussing the first and second inputs and outputs from the independent claims.
	
12.	Claims 15 and 20 are similarly rejected on the same basis as claim 1 above.
	Claims 2-7, 9-14 and 16-19 are dependent upon one of the independent claims above and therefore are similarly rejected for including the deficiencies of one of the independent claims above. 

13.	The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.




14.  Claims 1-7 and 10-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.

15.	In regards to claim 1, the limitations “execution circuitry to execute the decoded instruction, wherein the execution is performed based on the first input register and a second input register, and wherein another plurality of data element values within the second input register and a second plurality of indices within a second index register are used to perform the execution” lack clarity.  The limitations lack clarity because the decoded instruction only specifies first input, index and output registers (see lines 2-6 of claim 1), yet the execution of the instruction is based on additional second input and index registers.  It is therefore unclear how the execution of the instruction identifies the additional second input and index registers.  Are the registers specified by bits of the opcode of the instruction, or are the registers specified by additional operands of the instruction?  The examiner suggests that the applicant amend the claim to clarify how the instruction identifies the additional elements (such as amending the claim with limitations of claim 9, for example).
	For purposes of examination the examiner will interpret the claim to identify the additional registers using additional specifiers in the instruction.

	Claims 15 and 20 are similarly rejected on the same basis as claim 1 above.
	Claims 2-7, 10-14 and 16-19 are similarly rejected on the same basis as claim 1 above because they are dependent upon one of the rejected claims above and further do not include clarifying limitations such as disclosed in dependent claim 9. 

Claim Rejections - 35 USC § 103
16.	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

17.	Claims 1-7, 9 and 14-20 are rejected under 35 U.S.C. 103 as being unpatentable over Kunzman, PGPUB No.:  2016/0179537, and further in view of Ronen, PGPUB No.: 2002/0087955.

	In regards to claim 1, Kunzman teaches “A processor comprising: decoding circuitry to decode instructions” ([0106-0107]:  wherein a processor core includes a decoder (element 440) for decoding instructions (also see Fig. 6, processor (element 600))) “a first input register containing a plurality of data element values, a first index register containing a first plurality of indices, and an output register, wherein each index of the first plurality of indices maps to one unique data element position of the first input register” (See Fig. 13:  wherein an input register (element 1302) that includes a plurality of data elements, an output register (element 1303), and an index register (element 1301) comprising a plurality of indices that each map to one unique data element position in the input register are disclosed) “and execution circuitry to execute decoded instructions” ([0153-0158]:  wherein SIMD reduction logic (element 1305) executes a reduction operation based on decoded conflict detection instructions ([0186] indicates logic is hardware)) “wherein the execution is performed based on the first input register” (See Fig. 13:  wherein the execution is based on the input register (element 1302)) “wherein the execution includes to: identify data element values that are associated with one another in the first input register based on the first plurality of indices, perform one or more same reduction operations on the associated data element values in the first input register based on the identification, and store results of the one or more same reduction operations in the output register” ([0156-0158]:  wherein element 1305 identifies elements that are associated with one another based on indices in element 1301, performs reduction operations on the associated elements and stores the results in the output register element 1303 (See Fig. 13)).
	Kunzman does not teach “an instruction, wherein the instruction specifies registers”, “wherein the execution is performed based on a second input register, and wherein another plurality of data element values within the second input register and a second plurality of indices within a second index register are used to perform the execution” nor “wherein the execution includes to: identify data element values that are associated with one another in second input register based on the second plurality of indices, perform one or more same reduction operations on the associated data element values in second input register based on the identification regardless of the associated data element values being from the second input register, and store results of the one or more same reduction operations in another output register.” While Kunzman discloses performing one or more same reduction operations using first input and index registers, as well as an output register, based on the execution of several vector conflict instructions, Kunzman includes no discussion of performing one or more same reduction operations using second input and index registers, as well as another output register, based on executing a single instruction.
	However, Ronen teaches executing a fused instruction which performs two add operations, based on fusing two separate add instructions which perform a same functionality (see [0026-0030 and 0040]:  wherein a fused instruction execution unit (element 143c) is used to execute a fused add operation, which includes two combined separate add instructions. Wherein an add operation can be considered a type of reduction operation and therefore the cited portions teach fusing two reduction instructions in order to form one reduction instruction with a single opcode).  The combination would have a processor like Kunzman, which would execute a single instruction specifying two fused reduction operations, wherein the two reduction instructions function as the reduction operation described in Fig. 13 of Kunzman.  Therefore, the single instruction of the combination would be based on first and second input, index and output registers; and wherein the single instruction would be performed on data element values associated with one another in the first and second input registers based on associations identified by the first and second index registers, in order to store results in both a first and second output register.  One of ordinary skill in the art would be motivated to generate a single fused reduction instruction, which can be used to perform reduction operations of one or more simple reduction operations (wherein the reduction operations function as disclosed in Fig. 13 of Kunzman) in a single instruction, for the benefits of faster decoding and overall improved processor performance (See Ronen [0020-0024]).
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the processor of Kunzman, which performs one or more same reduction operations, to perform the one or more same reduction operations using a single fused reduction instruction as taught in Ronen. It would have been obvious to one of ordinary skill in the art because using a single fused reduction instruction to perform the reduction operations of two fused reduction operations may save instruction memory space, because fewer instructions would be stored in the memory space.  Furthermore, using a single fused instruction would improve overall processor performance (See Ronen [0020-0024]).
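	For illustration only, the behavior attributed to the proposed combination can be sketched in software.  The following is a minimal Python sketch, not code from Kunzman, Ronen, or the applicant's disclosure.  The names `accumulate_matching` and `fused_reduce`, the modeling of registers as lists, and the choice to write each group's total to every position of that group are assumptions made for this example.

```python
def accumulate_matching(values, indices):
    """Sum the values whose index values match (assumed Fig. 13 style behavior).

    Assumption: each output position receives the total of its index group.
    """
    totals = {}
    for value, index in zip(values, indices):
        totals[index] = totals.get(index, 0) + value
    return [totals[index] for index in indices]


def fused_reduce(in1, idx1, in2, idx2):
    """Hypothetical single operation standing in for one fused instruction.

    It applies the same reduction to two (input, index) register pairs and
    returns two output registers, one per pair.
    """
    out1 = accumulate_matching(in1, idx1)
    out2 = accumulate_matching(in2, idx2)
    return out1, out2


out1, out2 = fused_reduce([1, 2, 3, 4], [0, 1, 0, 1],
                          [10, 20, 30, 40], [2, 2, 3, 3])
print(out1)  # [4, 6, 4, 6]
print(out2)  # [30, 30, 70, 70]
```

	The sketch treats the two register pairs independently, which mirrors the examiner's reading of the combination as two fused instances of the same reduction.  It is not a statement of how any actual hardware implements the operation.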

	Claim 15 is similarly rejected on the same basis as claim 1 above because claim 15 is the method claim corresponding to the processor of claim 1.
	Claim 20 is similarly rejected on the same basis as claim 1 above because claim 20 is the non-transitory machine-readable medium claim corresponding to the processor of claim 1.   The examiner notes that claim 20 further includes the limitation stating “a non-transitory machine-readable medium storing an instruction” which is taught by Kunzman paragraphs [0143-0143], which discloses a non-transitory medium for storing instructions executed by a processor. Therefore, the additional limitation would be taught by the reference Kunzman discussed in the rejection of claim 1 above.

	In regards to claim 2, the overall combination of Kunzman and Ronen teaches “The processor of claim 1” (see rejection of claim 1 above) “wherein operation code (opcode) of the instruction specifies the one or more same reduction operations.” (Ronen [0026-0029 and See Fig. 2]:  wherein an opcode of the instruction specifies one or more same reduction (add) operations (note the overall combination of Kunzman and Ronen teaches the above reduction operations, and therefore the combination of references teaches the above limitations))

	In regards to claim 3, the overall combination of Kunzman and Ronen teaches “The processor of claim 1” (see rejection of claim 1 above) “wherein a group of data element values being associated with one another when the group of data element values have a same index value, and wherein to perform the one or more same reduction operations is to, for the group of data element values sharing the same index value, combine the group of data element values to generate an arithmetic combination as a result.” (Kunzman [0156-0158]:  wherein a group of data elements being associated with one another have a same index value (i.e. A, B, C or D) and wherein performing reduction operations for the groups of data elements sharing the same index includes combining the data elements to generate an accumulation as a result (See Fig. 13))
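	As a hedged illustration of the grouping described above, the following minimal Python sketch accumulates data elements that share a same index value.  The function name `reduce_by_index`, the list representation of the registers, and the index labels A through D used in the example call are assumptions for this sketch and are not taken from the cited figures.

```python
from collections import defaultdict


def reduce_by_index(values, indices):
    """Accumulate the values that share the same index value.

    Assumption for illustration: each position of the result holds the
    accumulated total of the group to which its index belongs.
    """
    if len(values) != len(indices):
        raise ValueError("values and indices must have the same length")
    totals = defaultdict(int)
    for value, index in zip(values, indices):
        totals[index] += value
    return [totals[index] for index in indices]


result = reduce_by_index([1, 2, 3, 4, 5, 6, 7, 8],
                         ["A", "B", "A", "C", "B", "A", "D", "C"])
print(result)  # [10, 7, 10, 12, 7, 10, 7, 12]
```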

	Claim 16 is similarly rejected on the same basis as claim 3 above because claim 16 is the method claim corresponding to the processor of claim 3.

	In regards to claim 4, the overall combination of Kunzman and Ronen teaches “The processor of claim 1” (see rejection of claim 1 above) “wherein the results are stored in a plurality of data element positions of the output register, each data element position corresponding to one of corresponding associated data element values.” (Kunzman [0156-0158]:  wherein the results are stored in element positions of the output register (element 1303), each data element position corresponding to one of the corresponding associated data element values)

	Claim 17 is similarly rejected on the same basis as claim 4 above because claim 17 is the method claim corresponding to the processor of claim 4.

	In regards to claim 5, the overall combination of Kunzman and Ronen teaches “The processor of claim 1” (see rejection of claim 1 above) “wherein the one or more same reduction operations are performed in a plurality of iterations on a group of associated data element values, and intermediate results of the plurality of iterations are stored in data element positions corresponding to ones of corresponding associated data element values involved in getting the intermediate results.” (Kunzman:  See Fig. 13:  wherein the reduction operations are performed in a plurality of iterations on a group of associated data elements, and intermediate results of the plurality of iterations are stored in element positions corresponding to associated data element values involved in getting the intermediate results)

	Claim 18 is similarly rejected on the same basis as claim 5 above because claim 18 is the method claim corresponding to the processor of claim 5.

	In regards to claim 6, the overall combination of Kunzman and Ronen teaches “The processor of claim 1” (see rejection of claim 1 above) “wherein the one or more same reduction operations comprises one or more of: accumulation of the associated data element values, selection of a maximum value or minimum value of the associated data element values, and computation of a mean or median value of the associated data element values.” (Kunzman [0156-0158]:  wherein reduction operations include accumulation of associated data elements; Ronen [0029]:  wherein accumulation (addition) of data elements is performed)
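	To make the alternatives recited in the limitation concrete, the following minimal Python sketch applies a selectable reduction (accumulation, maximum, minimum, mean or median) to each group of values sharing an index.  The table `REDUCTIONS`, the function name `reduce_groups`, and the dictionary return value are assumptions for this example and are not drawn from the cited references.

```python
from statistics import mean, median

# Hypothetical table mapping an operation name to a Python reduction function.
REDUCTIONS = {
    "sum": sum,
    "max": max,
    "min": min,
    "mean": mean,
    "median": median,
}


def reduce_groups(values, indices, op):
    """Apply the same reduction to every group of values sharing an index."""
    groups = {}
    for value, index in zip(values, indices):
        groups.setdefault(index, []).append(value)
    reduce_fn = REDUCTIONS[op]
    return {index: reduce_fn(group) for index, group in groups.items()}


print(reduce_groups([1, 5, 3, 7], [0, 1, 0, 1], "sum"))  # {0: 4, 1: 12}
print(reduce_groups([1, 5, 3, 7], [0, 1, 0, 1], "max"))  # {0: 3, 1: 7}
```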

	Claim 19 is similarly rejected on the same basis as claim 6 above because claim 19 is the method claim corresponding to the processor of claim 6.

	In regards to claim 7, the overall combination of Kunzman and Ronen thus far teaches “The processor of claim 1” (see rejection of claim 1 above).
	The overall combination of Kunzman and Ronen thus far does not teach “wherein the instruction further specifies a mask vector containing a plurality of mask values, wherein each mask value indicates a data element position of the output register being active or inactive, and wherein the results do not write to the data element position that is inactive.”
	However, an embodiment of Kunzman does teach “wherein the instruction further specifies a mask vector containing a plurality of mask values, wherein each mask value indicates a data element position of the output register being active or inactive, and wherein the results do not write to the data element position that is inactive.”  ([0029, 0043] and See Figs. 1A-B:  wherein a vector instruction format specifies a mask vector, wherein each value of the mask indicates a data position in a destination register being active or inactive, and wherein the results of the operations are not written to the data element positions of the destination register when the mask value is zero)
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the instruction of the combination of Kunzman and Ronen to specify a mask as taught in an embodiment of a vector instruction of Kunzman.  It would have been obvious to one of ordinary skill in the art because using a mask in a vector instruction can provide more control over destination updates and also provide added flexibility to an instruction.
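	The masking behavior discussed above can be sketched, for illustration only, as a masked write to a destination register.  This minimal Python sketch assumes merging behavior (an inactive position keeps its previous value); the function name `masked_write` and the list representation of registers are hypothetical and are not taken from Kunzman.

```python
def masked_write(destination, results, mask):
    """Return the destination after a masked write.

    A position whose mask value is 0 is treated as inactive and keeps its
    old value (merging behavior is an assumption of this sketch); a
    position whose mask value is 1 receives the new result.
    """
    if not (len(destination) == len(results) == len(mask)):
        raise ValueError("destination, results and mask must have equal length")
    return [new if active else old
            for old, new, active in zip(destination, results, mask)]


print(masked_write([9, 9, 9, 9], [1, 2, 3, 4], [1, 0, 1, 0]))  # [1, 9, 3, 9]
```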

	In regards to claim 9, the overall combination of Kunzman and Ronen teaches “The processor of claim 1” (see rejection of claim 1 above) “wherein the instruction further specifies the second input register containing the another plurality of data element values” (Kunzman:  See Fig. 13:  wherein an input register is disclosed; Ronen [0026-0029]: wherein duplicate add instructions are fused to form one double add instruction, and therefore two input registers are disclosed. Wherein the 103 combination of the above references would teach a second input register with the same functionality as the first input register of Kunzman and therefore the overall combination presented in claim 1 has taught a second input register) “and the second index register containing the second plurality of indices, each index of the second plurality of indices mapping to one unique data element position of the second input register” (Kunzman:  See Fig. 13:  wherein an index register is disclosed; Ronen [0026-0029]: wherein duplicate add instructions are fused to form one double add instruction, and therefore two source registers are disclosed. Wherein the 103 combination of the above references would teach a second index register with the same functionality as the first index register of Kunzman and therefore the overall combination presented in claim 1 has taught a second index register. Wherein the second index register would include indices which map to one unique data element position of the second input register.  Wherein the 103 combination in claim 1 includes a motivation to include a second index register with the same functionality as the first index register of Kunzman) “and wherein the one or more same reduction operations are performed on the data element values of the first and second input registers” (Kunzman:  See Fig. 13:  wherein one or more reduction operations are performed for data element values of the first input register; Ronen [0026-0029]: wherein duplicate add instructions are fused to form one double add instruction, and therefore two input registers are used to perform an add operation. Wherein the 103 combination of the above references would teach performing one or more reduction operations, as described in Fig. 13 of Kunzman, on a first and second input register) “based on the first and second plurality of indices of the first and second index registers.” (Kunzman:  See Fig. 13:  wherein an index register is used to perform one or more reduction operations; Ronen [0026-0029]: wherein duplicate add instructions are fused to form one double add instruction, and therefore two source registers are disclosed.  Wherein the 103 combination of references would teach performing reduction operations on a first and second input register using first and second index registers using the same functionality of Fig. 13 of Kunzman. Therefore the overall combination presented in claim 1 has taught a first and second index register used to perform one or more reduction operations)

         In regards to claim 14, the overall combination of Kunzman and Ronen teaches “The processor of claim 1” (see rejection of claim 1 above) “wherein the processor is a graphics processing unit (GPU).” (Kunzman [0119]:  wherein the processor is a graphics processing unit)

18.	Claims 10-13 are rejected under 35 U.S.C. 103 as being unpatentable over Kunzman, Ronen and further in view of Fahs, PGPUB No.:  2011/0078417.

In regards to claim 10, the overall combination of Kunzman and Ronen teaches “The processor of claim 1” (see rejection of claim 1 above) 
The overall combination of Kunzman and Ronen does not teach “wherein the instruction is executed by two or more computing units.”  Kunzman does teach a processor including graphics processing and multithreading environments (see [0112 and 0119]), however Kunzman does not explicitly teach two or more threads executing a reduction instruction.  Therefore another reference is brought in for that teaching.
Fahs teaches “wherein the instruction is executed by two or more computing units.” ([0057-0060]:  wherein an instruction is executed by two or more threads (i.e. computing units) in a thread group (see [0039-0040] for cooperative thread array discussion))
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the instruction of Kunzman and Ronen to be executed by multiple threads in a thread group as taught in Fahs.  It would have been obvious to one of ordinary skill in the art because concurrently executing threads can improve execution by increasing parallelism and throughput in a processor.

In regards to claim 11, the overall combination of Kunzman, Ronen and Fahs teaches “The processor of claim 10” (see rejection of claim 10 above) “wherein each of the two or more computing units is a warp or a thread.” (Fahs [0057-0060]:  wherein an instruction is executed by two or more threads (i.e. computing units) in a thread group (see [0039-0040] for cooperative thread array discussion))

In regards to claim 12, the overall combination of Kunzman, Ronen and Fahs teaches “The processor of claim 10” (see rejection of claim 10 above) 
The overall combination of Kunzman, Ronen and Fahs thus far does not teach “wherein the two or more computing units are synchronized in performing the one or more same reduction operations.”
Fahs teaches “wherein the two or more computing units are synchronized in performing the one or more same reduction operations.” ([0060-0065]:  wherein threads of a thread group are synchronized in performing one or more reduction operations when performing a barrier aggregate instruction)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the execution of reduction operations of Kunzman and Ronen to be synchronized between threads as taught in Fahs.  It would have been obvious to one of ordinary skill in the art because synchronizing threads which execute in parallel and share resources can be used to avoid resource conflicts between threads.

In regards to claim 13, the overall combination of Kunzman, Ronen and Fahs teaches “The processor of claim 12” (see rejection of claim 12 above) 
The overall combination of Kunzman, Ronen and Fahs thus far does not teach “wherein the instruction further specifies a location that stores a value indicating whether the two or more computing units are synchronized.”
Fahs teaches “wherein the instruction further specifies a location that stores a value indicating whether the two or more computing units are synchronized.” (Fahs [0066-0067]:  wherein the instruction implicitly specifies an arrival counter (i.e. counter register location) that stores a value indicating whether the two threads have reached the barrier point (i.e. synchronized))
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the instruction of Kunzman and Ronen to specify a location that indicates when threads are synchronized as taught in Fahs.  It would have been obvious to one of ordinary skill in the art because it could have been used for the benefit of using a fast synchronization technique (See Fahs [0057]).

Response to Arguments
19.	Applicant’s arguments, see page 7 of the remarks, filed on 2/05/2021, with respect to the previous 112(b) rejections have been fully considered and are persuasive.  Therefore the previous 112(b) rejection of claim 7 has been withdrawn. 

20.	Applicant’s arguments with respect to the rejections of similar claim(s) 1, 15 and 20 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument. 
	The dependent claims are argued at least for being directly or indirectly dependent upon rejected claims 1 and 15 above, and therefore remain rejected at least based on their dependency.

Conclusion
21.	The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
NPL reference “Computer Architecture A Quantitative Approach” for teaching a processor pipeline which implements a fetch, decode, execute and write back stage; wherein each stage includes circuitry.

22.	Any inquiry concerning this communication or earlier communications from the examiner should be directed to COURTNEY P CARMICHAEL-MOODY whose telephone number is (571)431-0692.  The examiner can normally be reached on M-F, 10am-7pm, EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Aimee Li can be reached on 571-272-4169.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/COURTNEY P CARMICHAEL-MOODY/Examiner, Art Unit 2183