DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 04/18/2022 has been entered.
 
Response to Arguments
Applicant's arguments filed 04/18/2022 have been fully considered. 
Regarding the double patenting rejection, the applicant does not present arguments against it. The applicant states that the Applicant will address this double patenting rejection once the claims are allowed. Therefore, the examiner maintains the double patenting rejection.
Regarding claim 1, the applicant argues that the claim limitations include “the minimum integer obtainment operation only happens for the selected threads”. The arguments have been fully considered but are not persuasive. The amended claim limitation does not include “the minimum integer obtainment operation only happens for the selected threads”; in particular, the claim limitation does not include “only”.
	Regarding claim 1, the applicant argues that Dimitrov in view of Karras, Howes, and Bolz fails to teach or suggest “returning the minimum integer from values, each value of the values being obtained from one of a set of threads that are selected from the plurality of threads based on a mask that comprises one bit associated with each thread of the plurality of threads to indicate whether a corresponding value from a thread of the plurality of threads is included to obtain the minimum integer” as required by amended claim 1. The arguments have been fully considered but are not persuasive. The examiner cannot concur with the applicant for the following reasons:
Karras discloses “returning a minimum value from values associated with a set of threads”. For example, in Fig. 2 and paragraph [0062], Karras teaches that a thread is an instantiation of a set of instructions configured to be executed by the PPU 200; Karras further teaches that the PPU 200 processes a large number of threads in parallel; Karras furthermore teaches that executed instructions and instruction results are associated with threads in the PPU. In Fig. 2 and paragraph [0050], Karras teaches returning a minimum, as illustrated by the min function in Table 1; Karras further teaches that the instructions in functions executed by a thread are associated with that thread, as illustrated in Fig. 2. Karras furthermore teaches that the min() function selects a minimum value from a set of values with four parameters, as illustrated by the min function in Table 1:
[Image: min() function excerpt from Karras, Table 1]
In paragraph [0051], Karras teaches that the function bslab() is called to calculate and return a minimum value and a maximum value for the parametric variable according to a potential intersection for each dimension; Karras further teaches that the function bslab is executed and is associated with a thread; Karras further teaches that an overall minimum value of the parametric variable is selected as the maximum value from a first set of values that includes a minimum value of the parametric variable range for the intersection query and minimum values of the parametric variable for at least one dimension. In paragraph [0072], Karras teaches that a thread block refers to a plurality of groups of threads including instructions to perform the task; Karras further teaches that threads in the same thread block may exchange data through shared memory. In paragraph [0079], Karras teaches a programmable streaming processor that is configured to process tasks represented by a number of threads.
Dimitrov discloses “the set of threads selected from the plurality of threads based on a mask”. For example, in paragraph [0060], Dimitrov teaches one true state bit in the channel mask. In paragraph [0061], Dimitrov teaches the channel mask includes an arbitrary number of bits, e.g., 64 bits, which may be set or cleared to either include or exclude specific threads.
Dimitrov further discloses “a mask that comprises one bit associated with each thread of the plurality of threads”. For example, in paragraph [0060], Dimitrov teaches one true state bit in the channel mask. In paragraph [0061], Dimitrov teaches the channel mask includes an arbitrary number of bits, e.g., 64 bits.
Dimitrov furthermore discloses “to indicate whether a corresponding value from a thread of the plurality of threads is included”. For example, in paragraph [0061], Dimitrov teaches the channel mask may include an arbitrary number of bits, e.g., 64 bits, which may be set or cleared to either include or exclude specific threads.
Howes discloses “returning the minimum integer”. For example, in Fig. 2 and paragraph [0037], Howes teaches finding runnable work items with the minimum valued program counters. In paragraph [0039], Howes teaches the find_minimum_runnable_pc( ) function accesses the stored program counter vector and finds the one or more minimum valued entries in that vector; Howes further teaches the find_minimum_runnable_pc( ) function returns the one or more minimum valued entries.
Howes further discloses “to obtain the minimum integer”. For example, in Fig. 2 and paragraph [0037], Howes teaches finding runnable work items with the minimum valued program counters. In paragraph [0039], Howes teaches the find_minimum_runnable_pc( ) function accesses the stored program counter vector and finds the one or more minimum valued entries in that vector; Howes further teaches the find_minimum_runnable_pc( ) function returns the one or more minimum valued entries.
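For reference, the limitation discussed in the preceding arguments (selecting a subset of threads with a mask holding one bit per thread, and returning the minimum integer of the selected threads' values) can be illustrated with the following minimal Python sketch. The function names, the eight-thread example, and the convention that bit i of the mask corresponds to thread i (1 = included, 0 = excluded) are illustrative assumptions and are not taken from the application or from any cited reference. A masked maximum, as in claim 5, differs only in the final selection step.

```python
from typing import List, Optional, Sequence


def selected_values(values: Sequence[int], mask: int) -> List[int]:
    """Values of the threads whose mask bit is 1 (bit i <-> thread i, assumed)."""
    return [v for i, v in enumerate(values) if (mask >> i) & 1]


def masked_min(values: Sequence[int], mask: int) -> Optional[int]:
    """Minimum integer among the selected threads; None if no thread is selected."""
    chosen = selected_values(values, mask)
    return min(chosen) if chosen else None


def masked_max(values: Sequence[int], mask: int) -> Optional[int]:
    """Maximum integer among the selected threads (the claim 5 counterpart)."""
    chosen = selected_values(values, mask)
    return max(chosen) if chosen else None


# Eight hypothetical threads; mask 0b00101101 selects threads 0, 2, 3 and 5.
values = [7, -3, 12, 5, 0, 9, -8, 4]
mask = 0b00101101
print(masked_min(values, mask))  # 5  (thread 6's value -8 is excluded by the mask)
print(masked_max(values, mask))  # 12
```

The example shows why the mask matters to the result: the unmasked minimum of the eight values would be -8, but with thread 6's bit cleared the returned minimum is 5.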

	Claim 7 has been amended similarly to claim 1. Therefore, claim 7 is not allowable for reasons similar to those discussed above.

Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b). 
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.

Claims 1-12 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 2-13 of U.S. Application 16/996,208 in view of Dimitrov (US 20190206023 A1). 

Application 17/079,191, Claim 1:
A non-transitory machine-readable storage medium storing instructions, which when executed by a machine, cause the machine to perform the operations of:
scheduling instructions of a plurality of threads to be executed by a processor; and
executing a first instruction including a first operand specifying values associated with the plurality of threads, wherein executing the first instruction comprises:
returning a minimum value from values associated with a set of threads, the set of threads selected from the plurality of threads based on a mask.

Application 16/996,208, Claim 2 (New):
A graphics processing unit, comprising:
a plurality of multi-core groups, wherein a multi-core group comprises:
a plurality of graphics cores;
a plurality of tensor cores to perform matrix operations;
a plurality of ray tracing cores to perform ray tracing operations; and
a set of register files to store operand values; and
wherein execution circuitry of at least one of the graphics cores, tensor cores, and ray tracing cores is to execute a first instruction including a first operand specifying values associated with a plurality of threads to perform the operation of:
returning a minimum value from values associated with a set of threads, the set of threads selected from the plurality of threads based on a mask.


In the same field of endeavor, Dimitrov teaches: 
A non-transitory machine-readable storage medium storing instructions, which when executed by a machine, cause the machine to perform the operations of (Fig. 1F; [0054]: a GPU and DRAMs; [0056]: GPU and CPU process the data stored within the DRAMs 168, L2 caches 167, and data stored within memory circuits; [0057-0058]; Fig. 5; [0108]: computer programs and computer control logic algorithms, are stored in the main memory 504; [0109]):
scheduling instructions of a plurality of threads to be executed by a processor ([0039]: the driver schedules a copy command to copy the first page of data from the second GPU to the first GPU; the copy command within a GPU command stream; Fig. 4; [0093]: a scheduler unit 410 manages instruction scheduling for one or more groups of threads; the scheduler unit 410 schedules threads for execution in groups of parallel threads; [0094]: transmit instructions to one or more of the functional units).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Application 16/996,208 to include a non-transitory machine-readable storage medium storing instructions, which when executed by a machine, cause the machine to perform the operations of: scheduling instructions of a plurality of threads to be executed by a processor, as taught by Dimitrov. The motivation for doing so would have been to improve overall performance and to manage instruction scheduling for one or more groups of threads, as taught by Dimitrov in paragraphs [0043] and [0093].

Application 17/079,191, Claim 2: The non-transitory machine-readable storage medium of claim 1, wherein the mask comprises one bit associated with each thread, wherein a first bit value indicates that a corresponding thread is included in the set of threads, and a second bit value indicates that the thread is not included in the set of threads.
Application 16/996,208, Claim 3 (New): The graphics processing unit of claim 2, wherein the mask comprises one bit associated with each thread, wherein a first bit value indicates that a corresponding thread is included in the set of threads, and a second bit value indicates that the thread is not included in the set of threads.

Application 17/079,191, Claim 3: The non-transitory machine-readable storage medium of claim 1, wherein each of the values associated with the plurality of threads comprises an integer.
Application 16/996,208, Claim 4 (New): The graphics processing unit of claim 2, wherein each of the values associated with the plurality of threads comprises an integer.

Application 17/079,191, Claim 4: The non-transitory machine-readable storage medium of claim 1, wherein the set of threads are synchronized.
Application 16/996,208, Claim 5 (New): The graphics processing unit of claim 2, wherein the set of threads are synchronized.

Application 17/079,191, Claim 5: The non-transitory machine-readable storage medium of claim 1, wherein the machine is caused to further perform the operations of: executing a second instruction including a second operand specifying the values associated with the plurality of threads, wherein executing the second instruction comprises:
returning a maximum value from the values associated with the set of threads, the set of threads selected from the plurality of threads based on the mask.
Application 16/996,208, Claim 6 (New): The graphics processing unit of claim 2, wherein the execution circuitry of at least one of the graphics cores, tensor cores, and ray tracing cores to execute a second instruction including a second operand specifying values associated with the plurality of threads to perform the operation of:
returning a maximum value from the values associated with the set of threads, the set of threads selected from the plurality of threads based on the mask.

Application 17/079,191, Claim 6: The non-transitory machine-readable storage medium of claim 1, wherein the machine is caused to further perform the operations of:
generating rays for traversal through a graphics scene;
constructing a hierarchical acceleration data structure comprising a plurality of hierarchically arranged nodes; and
traversing one or more of the rays through the hierarchical acceleration data structure and intersecting the one or more of the rays with primitives contained within the hierarchically arranged nodes.
Application 16/996,208, Claim 7 (New): The graphics processing unit of claim 2, wherein performing the ray tracing operations comprises:
generating rays for traversal through a graphics scene;
constructing a hierarchical acceleration data structure comprising a plurality of hierarchically arranged nodes; and
traversing one or more of the rays through the hierarchical acceleration data structure and intersecting the one or more rays with primitives contained within the hierarchically arranged nodes.

Application 17/079,191, Claims 7-12
Application 16/996,208, Claims 8-13




Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claims 1 and 11 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention. The specification teaches “If the ith value in the execution mask is set to 1, then a check is performed to ensure that the source lane is within the range of 0 to the SIMD width” in paragraph [00349]. The specification describes “if flag bit for a source lane is set to 1, the source lane is marked as invalid and the instruction proceeds as described above” in paragraph [00354]. The specification further describes “a cross lane minimum/maximum instruction are supported for float and integer data types” in paragraph [00358]. The specification further describes “The cross lane maximum instruction operates in substantially the same manner, the only difference being that the maximum of the data element in position I and the destination value is selected” in paragraph [00359]. However, the specification does not describe “a mask that comprises one bit associated with each thread of the plurality of threads to indicate whether a corresponding value from a thread of the plurality of threads is included to obtain the minimum integer”. Therefore, the claim limitation “a mask that comprises one bit associated with each thread of the plurality of threads to indicate whether a corresponding value from a thread of the plurality of threads is included to obtain the minimum integer” is new matter. 
Claims 5 and 7 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.  The specification teaches “If the ith value in the execution mask is set to 1, then a check is performed to ensure that the source lane is within the range of 0 to the SIMD width” in paragraph [00349]. The specification describes “if flag bit for a source lane is set to 1, the source lane is marked as invalid and the instruction proceeds as described above” in paragraph [00354]. The specification further describes “a cross lane minimum/maximum instruction are supported for float and integer data types” in paragraph [00358]. The specification further describes “The cross lane maximum instruction operates in substantially the same manner, the only difference being that the maximum of the data element in position I and the destination value is selected” in paragraph [00359]. However, the specification does not describe “a mask that comprises one bit associated with each thread of the plurality of threads to indicate whether a corresponding value from a thread of the plurality of threads is included to obtain the maximum integer”. Therefore, the claim limitation “a mask that comprises one bit associated with each thread of the plurality of threads to indicate whether a corresponding value from a thread of the plurality of threads is included to obtain the maximum integer” is new matter.
Claims 2, 4-6, and 13 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, due to their dependence on claim 1. Claims 8, 10-12, and 14 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, due to their dependence on claim 7.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-2, 4-8, and 10-12 are rejected under 35 U.S.C. 103 as being unpatentable over Dimitrov (US 20190206023 A1) in view of Karras (US 20160071310 A1), in view of Howes (US 20140365752 A1), and further in view of Bolz (US 20110063296 A1).
Regarding claim 1 (Currently Amended), Dimitrov discloses a non-transitory machine-readable storage medium storing instructions, which when executed by a machine (Fig. 1F; [0054]: a GPU and DRAMs; [0056]: GPU and CPU process the data stored within the DRAMs 168, L2 caches 167, and data stored within memory circuits; [0057-0058]; Fig. 5; [0108]: computer programs and computer control logic algorithms, are stored in the main memory 504; [0109]), cause the machine to perform the operations of:
scheduling instructions of a plurality of threads to be executed by a processor (Fig. 1C; [0039]: the driver schedules a copy command to copy the first page of data from the second GPU to the first GPU; the copy command within a GPU command stream; Fig. 4; [0093]: a scheduler unit 410 manages instruction scheduling for one or more groups of threads; the scheduler unit 410 schedules threads for execution in groups of parallel threads; [0094]: transmit instructions to one or more of the functional units); and
executing a first instruction including a first operand specifying values associated with the plurality of threads (Fig. 1C; [0039]: copy the first page of data, i.e., operand, from the second GPU to the first GPU; Fig. 2; [0070]: process a large number of threads in parallel; [0093]: dispatch tasks for execution on the GPCs 250 of the PPU 200; [0096]: processing cores execute instructions; [0086]: transmit instructions to one or more of the functional units; [0087]: process a different set of data based on the same set of instructions; [0095]: The register file 420 provides temporary storage for operands connected to the data paths of the functional units; [0096]: a floating point arithmetic logic unit and an integer arithmetic logic unit), wherein each of the values associated with the plurality of threads comprises one integer (Dimitrov; [0029]: an integer value of N; [0096]: an integer arithmetic logic unit executes an integer), and wherein executing the first instruction comprises: 
returning a value from values associated with a set of threads (Fig. 2; [0070]: process a large number of threads in parallel; [0093]: dispatch tasks for execution on the GPCs 250 of the PPU 200; [0096]: processing cores execute instructions; Fig. 6; [0112]: generate output data) that are selected from the plurality of threads based on a mask ([0060]: one true state bit in the channel mask; [0061]: the channel mask includes an arbitrary number of bits, e.g., 64 bits, which may be set or cleared to either include or exclude specific threads) that comprises one bit associated with each thread of the plurality of threads (Dimitrov; [0060]: one true state bit in the channel mask; [0061]: the channel mask includes an arbitrary number of bits, e.g., 64 bits) to indicate whether a corresponding value from a thread of the plurality of threads is included (Dimitrov; [0061]: the channel mask may include an arbitrary number of bits, e.g., 64 bits, which may be set or cleared to either include or exclude specific threads; Fig. 1G; [0066]).
Dimitrov fails to explicitly disclose: 
wherein executing the first instruction is to select a minimum integer from the values;
returning the minimum integer from values, each value of the values being obtained from one of a set of threads;
to obtain the minimum integer.
In the same field of endeavor, Karras teaches:
wherein executing the first instruction is to select a minimum value from the values ([0050]: the min() function selects a minimum value from a set of values with four parameters, as illustrated by the min function in Table 1 [image: min() function excerpt from Karras, Table 1]; [0051]);
returning a minimum value from values associated with a set of threads ([0050]: returns a minimum, as illustrated by the min function in Table 1 [image: excerpt from Karras, Table 1]; [0051]: an overall minimum value of the parametric variable is selected as the maximum value from a first set of values that includes a minimum value of the parametric variable range for the intersection query and minimum values of the parametric variable for at least one dimension; [0053]: Table 2).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Dimitrov to include wherein executing the first instruction is to select a minimum value from the values; returning a minimum value from values associated with a set of threads as taught by Karras. The motivation for doing so would have been to determine whether the query beam intersects the target bounding volume based on at least the parametric variable range for the first dimension; to select an overall minimum value as taught by Karras in paragraphs [0039] and [0050-0053].
Dimitrov in view of Karras fails to explicitly disclose:
a minimum integer;
returning the minimum integer;
each value of the values being obtained from one of a set of threads;
to obtain the minimum integer.
In the same field of endeavor, Howes teaches:
a minimum integer (Fig. 2; [0037]: finds runnable work items with the minimum valued program counters [image: excerpt from Howes]; [0039]: the find_minimum_runnable_pc( ) function accesses the stored program counter vector and finds the one or more minimum valued entries in that vector);
returning the minimum integer (Fig. 2; [0037]: finds runnable work items with the minimum valued program counters; [0039]: the find_minimum_runnable_pc( ) function accesses the stored program counter vector and finds the one or more minimum valued entries in that vector);
to obtain the minimum integer (Fig. 2; [0037]: finds runnable work items with the minimum valued program counters; [0039]: the find_minimum_runnable_pc( ) function accesses the stored program counter vector and finds the one or more minimum valued entries in that vector).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Dimitrov in view of Karras to include a minimum integer; returning the minimum integer; to obtain the minimum integer as taught by Howes. The motivation for doing so would have been to yield substantial improvements in processing efficiency and flexibility in programming; to significantly improve the performance of systems; to find the one or more minimum valued entries in that vector as taught by Howes in paragraphs [0011], [0023], [0037], and [0039].
Dimitrov in view of Karras and Howes fails to explicitly disclose:
each value of the values being obtained from one of a set of threads.
In the same field of endeavor, Bolz teaches each value of the values being obtained from one of a set of threads ([0047]: the atomic command also specifies a numerical operation to be performed on data stored within the buffer object specified by the atomic command; [0049]: perform 32-bit integer and unsigned integer atomic minimum operation; [0050]: perform 32-bit integer and unsigned integer atomic maximum operation; [0060]: atomic operations are useful for synchronizing between threads, e.g. to implement critical sections; collect results from multiple threads running in parallel; identify and return minimum/maximum values from one of a set of threads).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Dimitrov in view of Karras and Howes to include each value of the values being obtained from one of a set of threads as taught by Bolz. The motivation for doing so would have been to improve the performance of the MMU; to perform the atomic commands, such as, 32-bit integer and unsigned integer atomic minimum; to synchronize between threads; to identify and return minimum/maximum values; to enable a dramatic performance increase in CPU-limited applications as taught by Bolz in paragraphs [0023], [0047-0050], [0060], and [0067].
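As a rough illustration of the Bolz teaching relied upon above (an atomic minimum operation used to collect results from multiple threads running in parallel), the following minimal Python sketch emulates an atomic minimum with a lock. The class `SharedMin`, its method `atomic_min`, and the helper `collect_min` are hypothetical names chosen for illustration; they are not Bolz's API or any GPU instruction, and the lock merely stands in for the hardware atomicity that Bolz describes.

```python
import threading
from typing import List


class SharedMin:
    """Hypothetical shared location updated by an emulated atomic minimum."""

    def __init__(self, initial: int) -> None:
        self._value = initial
        self._lock = threading.Lock()

    def atomic_min(self, candidate: int) -> int:
        """Store min(current, candidate) and return the previous value.

        The lock makes the read-compare-write indivisible, so concurrent
        updates cannot overwrite a smaller value with a larger one.
        """
        with self._lock:
            old = self._value
            if candidate < old:
                self._value = candidate
            return old

    @property
    def value(self) -> int:
        with self._lock:
            return self._value


def collect_min(values: List[int]) -> int:
    """Each value is contributed by its own thread; the shared slot keeps the minimum."""
    # Start at the largest signed 32-bit integer so any thread's value replaces it.
    shared = SharedMin(2**31 - 1)
    threads = [threading.Thread(target=shared.atomic_min, args=(v,)) for v in values]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared.value


print(collect_min([42, 17, 99, -5, 8]))  # -5
```

The result does not depend on the order in which the threads run, which is the property that makes an atomic minimum suitable for collecting a minimum from many parallel threads.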

Regarding claim 2 (Currently amended), Dimitrov in view of Karras, Howes, and Bolz discloses the non-transitory machine-readable storage medium of claim 1, wherein a first bit value indicates that the corresponding value from the thread is included in the set of threads, and a second bit value indicates that the corresponding value from the thread is not included in the set of threads (Dimitrov; [0061]: the channel mask may include an arbitrary number of bits, e.g., 64 bits, which may be set or cleared to either include or exclude specific threads; Fig. 1G; [0066]).

Regarding claim 4 (Original), Dimitrov in view of Karras, Howes, and Bolz discloses the non-transitory machine-readable storage medium of claim 1, wherein the set of threads are synchronized (Dimitrov; [0041]: synchronize progress of the copy command stream and the general command stream; Fig. 2; [0070]: synchronize and process a large number of threads in parallel; [0093]: schedule and synchronize threads for execution in groups of parallel threads).

Regarding claim 5 (Currently Amended), Dimitrov in view of Karras, Howes, and Bolz discloses the non-transitory machine-readable storage medium of claim 1, wherein the machine is caused to further perform the operations of:
executing a second instruction including a second operand specifying the values associated with the plurality of threads (Dimitrov; Fig. 1C; [0039]: copy the first page of data, i.e. operand, from the second GPU to the first GPU; Fig. 2; [0070]: process a large number of threads in parallel; [0093]: dispatch tasks for execution on the GPCs 250 of the PPU 200; [0096]: processing cores execute instructions; [0086]: transmit instructions to one or more of the functional units; [0087]: process a different set of data based on the same set of instructions; [0095]: the register file 420 provides temporary storage for operands connected to the data paths of the functional units; [0096]: a floating point arithmetic logic unit and an integer arithmetic logic unit), wherein each of the values associated with the plurality of threads comprises one integer (Dimitrov; [0029]: an integer value of N; [0096]: an integer arithmetic logic unit executes an integer), wherein executing the second instruction comprises:
returning a value from the values associated with the set of threads (Dimitrov; Fig. 2; [0070]: process a large number of threads in parallel; [0093]: dispatch tasks for execution on the GPCs 250 of the PPU 200; [0096]: processing cores execute instructions; Fig. 6; [0112]: generate output data) that are selected from the plurality of threads based on the mask (Dimitrov; [0060]: one true state bit in the channel mask; [0061]: the channel mask may include an arbitrary number of bits, e.g., 64 bits, which may be set or cleared to either include or exclude specific threads) that comprises one bit associated with each thread of the plurality of threads (Dimitrov; [0060]: one true state bit in the channel mask; [0061]: the channel mask includes an arbitrary number of bits, e.g., 64 bits) to indicate whether a corresponding value from a thread of the plurality of threads is included to obtain the maximum integer (Dimitrov; [0061]: the channel mask may include an arbitrary number of bits, e.g., 64 bits, which may be set or cleared to either include or exclude specific threads; Fig. 1G; [0066]).
wherein executing the second instruction is to select a maximum value from the values (Karras; [0050]: the max function selects a maximum value from a set of values; [0051]);
returning the maximum value from the values obtained from the set of threads (Karras; [0050]: return the maximum value, as illustrated by the max function in Table 1; [0051]: an overall maximum value parametric variable is selected as the minimum value from a second set of values that includes a maximum value of the parametric variable range for the intersection query, and maximum values of the parametric variable for at least one dimension; [0053]: Table 2).
	Dimitrov in view of Karras, Howes, and Bolz further discloses a maximum integer (Howes; [0004]: single instruction multiple data; SIMD; [0038]: maximum program counter; Fig. 2; [0056]: a maximum integer MAX_INT as illustrated in Fig. 2).
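For illustration only, and not as a characterization of the applicant's implementation or of any cited reference's implementation, the masked selection recited above (one mask bit per thread, where a set bit includes that thread's value in the maximum) can be sketched sequentially in Python. The function name masked_max and the convention of returning None when no thread is selected are assumptions; the claims do not address the empty-mask case.

```python
from typing import Optional, Sequence


def masked_max(values: Sequence[int], mask: int) -> Optional[int]:
    """Return the maximum integer among threads whose mask bit is set.

    Bit i of `mask` corresponds to thread i: a set bit includes that
    thread's value, a cleared bit excludes it. Returning None for an
    empty selection is an assumed convention, not taken from the claims.
    """
    selected = [v for i, v in enumerate(values) if (mask >> i) & 1]
    return max(selected) if selected else None


# 8 threads; mask 0b10110101 selects threads 0, 2, 4, 5, and 7.
values = [3, 42, -7, 19, 8, 11, 100, 5]
print(masked_max(values, 0b10110101))  # prints 11 (threads 1 and 6 are excluded)
```

In this sketch the larger values held by threads 1 and 6 do not affect the result because their mask bits are cleared.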

Regarding claim 6 (Original), Dimitrov in view of Karras, Howes, and Bolz discloses the non-transitory machine-readable storage medium of claim 1, wherein the machine is caused to further perform the operations of:
generating rays for traversal through a graphics scene (Karras; [0096]: general tree traversal operations; [0099]: intersect a ray with a BVH data structure that represents each of the geometric primitives in a 3D scene or 3D model; [0118]: perform a tree traversal operation for a specific application, such as ray-tracing; [0133]: a ray 690 that is associated with a tree traversal operation);
constructing a hierarchical acceleration data structure comprising a plurality of hierarchically arranged nodes (Karras; [0118]: the tree is implemented as a bounding volume hierarchy; Fig. 6A; Fig. 6B; [0128]: a bounding volume hierarchy; all other nodes in the tree data structure 600 descend from the root node 601; [0129-0130]; [0133]: a ray 690 that is associated with a tree traversal operation); and
traversing one or more of the rays through the hierarchical acceleration data structure and intersecting the one or more of the rays with primitives contained within the hierarchically arranged nodes (Karras; [0119]: the interface 505 may push a root node for the BVH onto the traversal stack data structure; [0120]: the tree traversal operation associated with the ray; [0121]; [0124]: if the root node of the block intersects the ray data structure, then each of the child nodes of the root node may be passed to a particular traversal unit 530 to continue traversing the BVH in parallel; [0126]: the geometric primitives included in the result queue were those primitives associated with nodes that intersected the ray; [0133]: a ray 690 that is associated with a tree traversal operation; ray-tracing techniques involve the operation of intersecting a plurality of rays with the geometric primitives of a model).
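For illustration only, the steps of claim 6 can be sketched as a bounding volume hierarchy of axis-aligned boxes traversed with an explicit stack, in the general manner Karras describes at a high level in [0119]-[0126]. The Node layout, the slab test, and all names below are assumptions and are not Karras's implementation. The slab test also mirrors the min/max selection quoted from Karras [0051]: the overall entry value is the maximum of the per-axis minimums, and the overall exit value is the minimum of the per-axis maximums. Primitive-level intersection is omitted; the sketch returns candidate primitive IDs from the leaves the ray reaches.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Node:
    lo: Vec3                                   # box minimum corner
    hi: Vec3                                   # box maximum corner
    children: List["Node"] = field(default_factory=list)
    primitives: List[int] = field(default_factory=list)  # primitive IDs (leaves)


def ray_hits_box(origin: Vec3, direction: Vec3, lo: Vec3, hi: Vec3) -> bool:
    """Slab test for a ray starting at t = 0 against an axis-aligned box."""
    t_enter, t_exit = 0.0, float("inf")
    for o, d, a, b in zip(origin, direction, lo, hi):
        if d == 0.0:
            if o < a or o > b:                 # parallel to this slab and outside it
                return False
            continue
        t0, t1 = (a - o) / d, (b - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_enter = max(t_enter, t0)             # overall entry: max of per-axis minimums
        t_exit = min(t_exit, t1)               # overall exit: min of per-axis maximums
    return t_enter <= t_exit


def traverse(root: Node, origin: Vec3, direction: Vec3) -> List[int]:
    """Collect primitive IDs from every node whose box the ray intersects."""
    hits: List[int] = []
    stack: List[Node] = [root]                 # push the root node onto the stack
    while stack:
        node = stack.pop()
        if not ray_hits_box(origin, direction, node.lo, node.hi):
            continue                           # prune the whole subtree
        hits.extend(node.primitives)
        stack.extend(node.children)
    return hits
```

Because a missed box prunes its entire subtree, only nodes along the ray's path are visited, which is the efficiency rationale for arranging the nodes hierarchically.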

Regarding claim 7 (Currently Amended), Dimitrov discloses a non-transitory machine-readable storage medium storing instructions, which when executed by a machine (Fig. 1F; [0054]: a GPU and DRAMs; [0056]: GPU and CPU process the data stored within the DRAMs 168, L2 caches 167, and data stored within memory circuits; [0057-0058]; Fig. 5; [0108]: computer programs and computer control logic algorithms, are stored in the main memory 504; [0109]), cause the machine to perform the operations of:
scheduling instructions of a plurality of threads to be executed by a processor (Fig. 1C; [0039]: the driver schedules a copy command to copy the first page of data from the second GPU to the first GPU; the copy command within a GPU command stream; Fig. 4; [0093]: a scheduler unit 410 manages instruction scheduling for one or more groups of threads; the scheduler unit 410 schedules threads for execution in groups of parallel threads; [0094]: transmit instructions to one or more of the functional units); and
executing a first instruction including a first operand specifying values associated with the plurality of threads (Fig. 1C; [0039]: copy the first page of data, i.e. operand, from the second GPU to the first GPU; Fig. 2; [0070]: process a large number of threads in parallel; [0093]: dispatch tasks for execution on the GPCs 250 of the PPU 200; [0096]: processing cores execute instructions; [0086]: transmit instructions to one or more of the functional units; [0087]: process a different set of data based on the same set of instructions; [0095]: The register file 420 provides temporary storage for operands connected to the data paths of the functional units; [0096]: a floating point arithmetic logic unit and an integer arithmetic logic unit), wherein each of the values associated with the plurality of threads comprises one integer (Dimitrov; [0029]: an integer value of N; [0096]: an integer arithmetic logic unit), and, wherein executing the first instruction comprises: 
returning a value from values associated with a set of threads (Fig. 2; [0070]: process a large number of threads in parallel; [0093]: dispatch tasks for execution on the GPCs 250 of the PPU 200; [0096]: processing cores execute instructions; Fig. 6; [0112]: generate output data) that are selected from the plurality of threads based on a mask ([0060]: one true state bit in the channel mask; [0061]: the channel mask may include an arbitrary number of bits, e.g., 64 bits, which may be set or cleared to either include or exclude specific threads) that comprises one bit associated with each thread of the plurality of threads (Dimitrov; [0060]: one true state bit in the channel mask; [0061]: the channel mask includes an arbitrary number of bits, e.g., 64 bits) to indicate whether a corresponding value from a thread of the plurality of threads is included (Dimitrov; [0061]: the channel mask may include an arbitrary number of bits, e.g., 64 bits, which may be set or cleared to either include or exclude specific threads; Fig. 1G; [0066]).
Dimitrov fails to explicitly disclose: 
wherein executing the first instruction is to select a maximum integer from the values;
returning the maximum integer from values, each value of the values being obtained from one of a set of threads;
to obtain the maximum integer.
In the same field of endeavor, Karras teaches:
wherein executing the first instruction is to select a maximum value from the values ([0050]: the max function selects a maximum value from a set of values; [0051]);
returning a maximum value from values associated with a set of threads (Karras; [0050]: return the minimum and maximum values, as illustrated by the min and max functions in Table 1; [0051]: an overall maximum value parametric variable is selected as the minimum value from a second set of values that includes a maximum value of the parametric variable range for the intersection query, and maximum values of the parametric variable for at least one dimension; [0053]: Table 2).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Dimitrov to include wherein executing the first instruction is to select a maximum value from the values; returning a maximum value from values associated with a set of threads as taught by Karras. The motivation for doing so would have been to determine whether the query beam intersects the target bounding volume based on at least the parametric variable range for the first dimension; to select an overall minimum value as taught by Karras in paragraphs [0039] and [0050-0053].
Dimitrov in view of Karras fails to explicitly disclose:
a maximum integer;
returning the maximum integer;
each value of the values being obtained from one of a set of threads;
to obtain the maximum integer.
In the same field of endeavor, Howes teaches:
a maximum integer (Howes; [0004]: single instruction multiple data; SIMD; [0038]: maximum program counter; Fig. 2; [0056]: a maximum integer MAX_INT as illustrated in Fig. 2);
returning the maximum integer (Howes; [0004]: single instruction multiple data; SIMD; [0038]: maximum program counter; Fig. 2; [0056]: a maximum integer MAX_INT as illustrated in Fig. 2);
to obtain the maximum integer (Howes; [0004]: single instruction multiple data; SIMD; [0038]: maximum program counter; Fig. 2; [0056]: a maximum integer MAX_INT as illustrated in Fig. 2).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Dimitrov in view of Karras to include a maximum integer; returning the maximum integer; to obtain the maximum integer as taught by Howes. The motivation for doing so would have been to yield substantial improvements in processing efficiency and flexibility in programming; to significantly improve the performance of systems; to find the one or more minimum valued entries in that vector as taught by Howes in paragraphs [0011], [0023], [0037], and [0039].
Dimitrov in view of Karras and Howes fails to explicitly disclose:
each value of the values being obtained from one of a set of threads.
In the same field of endeavor, Bolz teaches each value of the values being obtained from one of a set of threads ([0047]: the atomic command also specifies a numerical operation to be performed on data stored within the buffer object specified by the atomic command; [0049]: perform 32-bit integer and unsigned integer atomic minimum operation; [0050]: perform 32-bit integer and unsigned integer atomic maximum operation; [0060]: atomic operations are useful for synchronizing between threads, e.g., to implement critical sections; collect results from multiple threads running in parallel; identify and return minimum/maximum values from one of a set of threads).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Dimitrov in view of Karras and Howes to include each value of the values being obtained from one of a set of threads as taught by Bolz. The motivation for doing so would have been to improve the performance of the MMU; to perform the atomic commands, such as, 32-bit integer and unsigned integer atomic minimum; to synchronize between threads; to identify and return minimum/maximum values; to enable a dramatic performance increase in CPU-limited applications as taught by Bolz in paragraphs [0023], [0047-0050], [0060], and [0067].
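For illustration only, the use of atomic minimum and maximum operations to collect results from multiple threads running in parallel (Bolz, [0049]-[0050], [0060]) can be sketched in Python. Here a threading.Lock stands in for hardware atomicity; this substitution, the class name AtomicMinMax, the helper collect, and the 32-bit identity values are assumptions and do not describe Bolz's mechanism.

```python
import threading
from typing import List

INT32_MAX = 2**31 - 1   # assumed identity for a 32-bit signed minimum
INT32_MIN = -(2**31)    # assumed identity for a 32-bit signed maximum


class AtomicMinMax:
    """Shared location updated by atomic min/max, emulated with a lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.minimum = INT32_MAX
        self.maximum = INT32_MIN

    def update(self, value: int) -> None:
        # The read-modify-write is done under the lock, so concurrent
        # updates from different threads cannot lose each other's results.
        with self._lock:
            self.minimum = min(self.minimum, value)
            self.maximum = max(self.maximum, value)


def collect(values: List[int]) -> AtomicMinMax:
    """Start one thread per value; each contributes its value atomically."""
    result = AtomicMinMax()
    workers = [threading.Thread(target=result.update, args=(v,)) for v in values]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return result
```

For example, collect([5, -3, 17, 0]) yields minimum -3 and maximum 17 regardless of the order in which the threads run, since min and max are order-independent.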

Regarding claim 8 (Currently Amended), the claim limitations are similar to the claim limitations recited in claim 2. Therefore, the same rationale used to reject claim 2 is also used to reject claim 8.

Regarding claim 10 (Original), the claim limitations are similar to the claim limitations recited in claim 4. Therefore, the same rationale used to reject claim 4 is also used to reject claim 10.

Regarding claim 11 (Currently Amended), Dimitrov in view of Karras, Howes, and Bolz discloses the non-transitory machine-readable storage medium of claim 7, wherein the machine is caused to further perform the operations of:
executing a second instruction including a second operand specifying the values associated with the plurality of threads (Dimitrov; Fig. 1C; [0039]: copy the first page of data, i.e. operand, from the second GPU to the first GPU; Fig. 2; [0070]: process a large number of threads in parallel; [0093]: dispatch tasks for execution on the GPCs 250 of the PPU 200; [0096]: processing cores execute instructions; [0086]: transmit instructions to one or more of the functional units; [0087]: process a different set of data based on the same set of instructions; [0095]: the register file 420 provides temporary storage for operands connected to the data paths of the functional units; [0096]: a floating point arithmetic logic unit and an integer arithmetic logic unit), wherein each of the values associated with the plurality of threads comprises one integer (Dimitrov; [0029]: an integer value of N; [0096]: an integer arithmetic logic unit executes an integer), wherein executing the second instruction comprises:
returning a value from the values associated with the set of threads (Dimitrov; Fig. 2; [0070]: process a large number of threads in parallel; [0093]: dispatch tasks for execution on the GPCs 250 of the PPU 200; [0096]: processing cores execute instructions; Fig. 6; [0112]: generate output data) that are selected from the plurality of threads based on the mask (Dimitrov; [0060]: one true state bit in the channel mask; [0061]: the channel mask includes an arbitrary number of bits, e.g., 64 bits, which may be set or cleared to either include or exclude specific threads) that comprises one bit associated with each thread of the plurality of threads (Dimitrov; [0060]: one true state bit in the channel mask; [0061]: the channel mask includes an arbitrary number of bits, e.g., 64 bits) to indicate whether a corresponding value from a thread of the plurality of threads is included (Dimitrov; [0061]: the channel mask may include an arbitrary number of bits, e.g., 64 bits, which may be set or cleared to either include or exclude specific threads; Fig. 1G; [0066]).
Dimitrov in view of Karras, Howes, and Bolz further discloses: 
wherein executing the second instruction is to select a minimum value from the values (Karras; [0050]: the min() function selects a minimum value from a set of values with four parameters, as illustrated by the min function in Table 1; [0051]);
returning the minimum value from the values obtained from the set of threads (Karras; [0050]: return the minimum value, as illustrated by the min function in Table 1; [0051]: an overall minimum value parametric variable is selected as the maximum value from a first set of values that includes a minimum value of the parametric variable range for the intersection query, and minimum values of the parametric variable for at least one dimension; [0053]: Table 2).
Dimitrov in view of Karras, Howes, and Bolz further discloses:
a minimum integer (Howes; Fig. 2; [0037]: finds runnable work items with the minimum valued program counters; [0039]: the find_minimum_runnable_pc( ) function accesses the stored program counter vector and finds the one or more minimum valued entries in that vector);
returning the minimum integer (Howes; Fig. 2; [0037]: finds runnable work items with the minimum valued program counters; [0039]: the find_minimum_runnable_pc( ) function accesses the stored program counter vector and finds the one or more minimum valued entries in that vector);
to obtain the minimum integer (Howes; Fig. 2; [0037]: finds runnable work items with the minimum valued program counters; [0039]: the find_minimum_runnable_pc( ) function accesses the stored program counter vector and finds the one or more minimum valued entries in that vector).
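For illustration only, a masked minimum across a SIMD group can also be sketched as a pairwise (tree) reduction, where each step halves the number of active lanes. Masked-off lanes contribute an identity value so they can never be selected. The power-of-two lane count, the identity INT32_MAX, and the name masked_min_tree are assumptions; the sketch is not asserted to reflect how Howes uses MAX_INT or how any cited reference performs the reduction.

```python
from typing import List

INT32_MAX = 2**31 - 1  # assumed identity for min: larger than any 32-bit signed value


def masked_min_tree(values: List[int], mask: int) -> int:
    """Masked minimum over a power-of-two group of lanes via tree reduction.

    Lane i participates only if bit i of `mask` is set; cleared lanes are
    replaced by INT32_MAX. Returns INT32_MAX when no lane is selected.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("lane count must be a nonzero power of two")
    lanes = [v if (mask >> i) & 1 else INT32_MAX for i, v in enumerate(values)]
    stride = n // 2
    while stride:
        # Lane i combines with lane i + stride; after log2(n) steps lane 0
        # holds the minimum of all participating lanes.
        for i in range(stride):
            lanes[i] = min(lanes[i], lanes[i + stride])
        stride //= 2
    return lanes[0]
```

With values [3, 42, -7, 19, 8, 11, 100, 5] and mask 0b01000010, only lanes 1 and 6 participate, so the result is 42; the -7 in lane 2 is ignored because its mask bit is cleared.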

Regarding claim 12 (Original), the claim limitations are similar to the claim limitations recited in claim 6. Therefore, the same rationale used to reject claim 6 is also used to reject claim 12.

Claims 13-14 are rejected under 35 U.S.C. 103 as being unpatentable over Dimitrov (US 20190206023 A1) in view of Karras (US 20160071310 A1), Howes (US 20140365752 A1), Bolz (US 20110063296 A1), and further in view of Harris (US 20200082491 A1).
Regarding claim 13 (Previously Presented), Dimitrov in view of Karras, Howes, and Bolz discloses the non-transitory machine-readable storage medium of claim 1.
Dimitrov in view of Karras, Howes, and Bolz fails to explicitly disclose wherein the first instruction is a graphics processor instruction.
In the same field of endeavor, Harris teaches wherein the first instruction is a graphics processor instruction ([0139]: the instruction replacement is performed at the instruction preparation stage in the execution pipeline of the graphics processor; the input operand value; [0141]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Dimitrov in view of Karras, Howes, and Bolz to include wherein the first instruction is a graphics processor instruction as taught by Harris. The motivation for doing so would have been to improve performance when executing the shader program; to perform the instruction replacement in the instruction preparation stage in the execution pipeline of the graphics processor as taught by Harris in paragraphs [0039] and [0139].

Regarding claim 14 (Previously Presented), the claim limitations are similar to the claim limitations recited in claim 13. Therefore, the same rationale used to reject claim 13 is also used to reject claim 14.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Hai Tao Sun whose telephone number is (571)272-5630. The examiner can normally be reached 9:00AM-6:00PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kee Tung, can be reached at (571)272-7794. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/HAI TAO SUN/Primary Examiner, Art Unit 2616