DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
Applicant’s amendment/response filed 12/11/2020 has been entered and made of record. Claims 1, 11, and 19 were amended. Claims 1, 3-7, 9-11, 13-17, 19, and 21-23 are pending in the application.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 3, 9, 11, 13, 19, and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Stinger et al. (US 2013/0055166) in view of Ould-Ahmed-Vall et al. (US 2018/0315159), Shah et al. (US 2012/0019542), and Schaff (US 2008/0148190).
Regarding claim 1, Stinger teaches/suggests: A method for indicating resource utilization by a graphics processing unit (GPU) (Stinger [0044]: “The computing 
obtaining, for the GPU, data indicating a hierarchy of architectural units for executing processing threads on the GPU, wherein the hierarchy of architectural units specifies multiple layers of architectural units relating to a plurality of single instruction multiple data (SIMD) modules (Stinger [0072]: “FIG. 5 illustrates an example of a hierarchy that may be used to organize information collected from a computing system. While this hierarchy may be purely logical, it may also be displayed as a tree in tree view area 402;” [0043]: “In embodiments where the computing device 300 includes one or more processing units 321, or a processing unit 321 including one or more processing cores, the processors can execute a single instruction simultaneously on multiple pieces of data (SIMD).”); 
displaying, via an interface, indications representative of the hierarchy of architectural units including the multiple layers of architectural units (Stinger [0072]: “The example of FIG. 5 shows information about computing system 500 (labeled "Computer A"). The information about computing system 500 may be sub-divided into two groups: hardware information 501 and software information 502. The hardware information group 501 includes processor information group 511, memory information group 512, and BIOS information group 512. Examples of processor information include the type of processor(s) used by the computing system and current or historical processor utilization.”); 
Stinger further teaches/suggests a collection of threads for executing on the GPU (Stinger [0064]: “Examples of status information for a computing system include processor load, temperature, error logs or error messages, available memory, a sampling of a server's response time to various requests, the number of simultaneous users of a system, local or remote storage usage, logs of application changes, the number of busy threads, numbers of computing requests that are ready to be executed, queued to be executed, or currently being executed, etc.”). Stinger does not teach/suggest a slot, which is one of multiple slots concurrently executed by a single instruction multiple data (SIMD) module, assigned to a collection of threads for executing on the GPU, wherein the SIMD module is capable of concurrently executing multiple collections of threads. Ould-Ahmed-Vall, however, teaches/suggests a slot, which is one of multiple slots concurrently executed by a single instruction multiple data (SIMD) module, assigned to a collection of threads for executing on the GPU, wherein the SIMD module is capable of concurrently executing multiple collections of threads (Ould-Ahmed-Vall [0077]-[0078]: “The GPGPU cores 262 can each include floating point units (FPUs) and/or integer arithmetic logic units (ALUs) that are used to execute instructions of the graphics multiprocessor 234 ... In one embodiment the GPGPU cores 262 include SIMD logic capable of performing a single instruction on multiple sets of data;” [0215]: “To increase SIMD utilization, one embodiment divides eight SIMD lanes into two SIMD4 slots (e.g., SIMD4 slot 1910, SIMD4 slot 1912). The SIMD4 slots can be filled in a variety of ways. In one embodiment, two separate SIMD threads (SIMD thread 1902, SIMD thread 1904) that combine to cover a total of four SIMD lanes are assigned to a SIMD4 slot (e.g., SIMD4 slot 1910). In one embodiment, the SIMT thread group 1906 can be assigned to a SIMD4 slot 1912.”). 
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to modify the SIMD processing cores of Stinger to include SIMD slots and threads as taught/suggested by Ould-Ahmed-Vall in order to increase SIMD utilization.

Stinger as modified by Ould-Ahmed-Vall does not teach/suggest:
receiving an indication of a slot; 
determining, based on the data indicating the hierarchy of architectural units and based on the indication of the slot, a first architectural unit of a first layer of the multiple layers to which the slot is assigned, wherein the first architectural unit is the SIMD module; 
determining, based on the data indicating the hierarchy of architectural units, a second architectural unit of a second layer of the multiple layers that includes the first architectural unit; and 
highlighting, via the interface and based on the determining the first architectural unit and the determining the second architectural unit, a first indication of the indications representing the first architectural unit as executing the collection of threads and a second indication of the indications representing the second architectural unit that includes the first architectural unit.

Shah, however, teaches/suggests receiving an indication of a slot (Ould-Ahmed-Vall [0215]: “To increase SIMD utilization, one embodiment divides eight SIMD lanes into two SIMD4 slots (e.g., SIMD4 slot 1910, SIMD4 slot 1912). The SIMD4 slots can be filled in a variety of ways. In one embodiment, two separate SIMD threads (SIMD thread 1902, SIMD thread 1904) that combine to cover a total of four SIMD lanes are assigned to a SIMD4 slot (e.g., SIMD4 slot 1910). In one embodiment, the SIMT thread group 1906 can be assigned to a SIMD4 slot 1912;” Shah [0012]: “In one implementation, this LBPW technique monitors the number of arithmetic logic unit (ALU) instructions and fetch instructions executed within each SIMD. Additionally, newly assigned thread loads (i.e. wavefronts) are queued and are monitored. This monitoring is used to assess current and future utilization of the SIMDs.”).
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to modify the FPUs and/or ALUs (the claimed SIMD modules) of Stinger as modified by Ould-Ahmed-Vall to be monitored as taught/suggested by Shah in order to assess SIMD utilization. As such, Stinger as modified by Ould-Ahmed-Vall and Shah teaches/suggests:
determining, based on the data indicating the hierarchy of architectural units and based on the indication of the slot, a first architectural unit of a first layer of the multiple layers to which the slot is assigned, wherein the first architectural unit is the SIMD module (Stinger [0072]: “The example of FIG. 5 shows information about computing system 500 (labeled "Computer A"). The information about computing system 500 may be sub-divided into two groups: hardware information 501 and software information 502. The hardware information group 501 includes processor information group 511, memory information group 512, and BIOS information group 512. Examples of processor information include the type of processor(s) used by the computing system and current or historical processor utilization;” Ould-Ahmed-Vall [0215]: “To increase SIMD utilization, one embodiment divides eight SIMD lanes into two SIMD4 slots (e.g., SIMD4 slot 1910, SIMD4 slot 1912). The SIMD4 slots can be filled in a variety of ways. In one embodiment, two separate SIMD threads (SIMD thread 1902, SIMD thread 1904) that combine to cover a total of four SIMD lanes are assigned to a SIMD4 slot (e.g., SIMD4 slot 1910). In one embodiment, the SIMT thread group 1906 can be assigned to a SIMD4 slot 1912;” Shah [0012]: “In one implementation, this LBPW technique monitors the number of arithmetic logic unit (ALU) instructions and fetch instructions executed within each SIMD. Additionally, newly assigned thread loads (i.e. wavefronts) are queued and are monitored. This monitoring is used to assess current and future utilization of the SIMDs.”); 
determining, based on the data indicating the hierarchy of architectural units, a second architectural unit of a second layer of the multiple layers that includes the first architectural unit (Stinger [0072]: “The example of FIG. 5 shows information about computing system 500 (labeled "Computer A"). The information about computing system 500 may be sub-divided into two groups: hardware information 501 and software information 502. The hardware information group 501 includes processor information group 511, memory information group 512, and BIOS information group 512. Examples of processor information include the type of processor(s) used by the computing system and current or historical processor utilization;” Ould-Ahmed-Vall [0215]: “To increase SIMD utilization, one embodiment divides eight SIMD lanes into two SIMD4 slots (e.g., SIMD4 slot 1910, SIMD4 slot 1912). The SIMD4 slots can be filled in a variety of ways. In one embodiment, two separate SIMD threads (SIMD thread 1902, SIMD thread 1904) that combine to cover a total of four SIMD lanes are assigned to a SIMD4 slot (e.g., SIMD4 slot 1910). In one embodiment, the SIMT thread group 1906 can be assigned to a SIMD4 slot 1912;” Shah [0012]: “In one implementation, this LBPW technique monitors the number of arithmetic logic unit (ALU) instructions and fetch instructions executed within each SIMD. Additionally, newly assigned thread loads (i.e. wavefronts) are queued and are monitored. This monitoring is used to assess current and future utilization of the SIMDs.”); and 
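highlighting, via the interface and based on the determining the first architectural unit and the determining the second architectural unit, a first indication of the indications representing the first architectural unit as executing the collection of threads and a second indication of the indications representing the second architectural unit that includes the first architectural unit (Ould-Ahmed-Vall [0215]: “To increase SIMD utilization, one embodiment divides eight SIMD lanes into two SIMD4 slots (e.g., SIMD4 slot 1910, SIMD4 slot 1912). The SIMD4 slots can be filled in a variety of ways. In one embodiment, two separate SIMD threads (SIMD thread 1902, SIMD thread 1904) that combine to cover a total of four SIMD lanes are assigned to a SIMD4 slot (e.g., SIMD4 slot 1910). In one embodiment, the SIMT thread group 1906 can be assigned to a SIMD4 slot 1912;” Shah [0012]: “In one implementation, this LBPW technique monitors the number of arithmetic logic unit (ALU) instructions and fetch instructions executed within each SIMD. Additionally, newly assigned thread loads (i.e. wavefronts) are queued and are monitored. This monitoring is used to assess current and future utilization of the SIMDs.”).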

Stinger as modified by Ould-Ahmed-Vall and Shah does not teach/suggest highlighting. Schaff, however, teaches/suggests highlighting (Schaff [0019]: “A path within the expandable hierarchical tree structure interface leading to the target location is determined at 106. The path is a traversal through a subset of the plurality of nodes within the expandable hierarchical tree structure interface. At 108, each node in the path is visually highlighted to direct the user to the target location. A node in the path is visually highlighted responsive to the node being graphically displayed to the user.”). Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to modify the tree view of Stinger as modified by Ould-Ahmed-Vall and Shah such that the FPUs and/or ALUs in execution are highlighted as taught/suggested by Schaff in order to direct the user to them.

Regarding claim 3, Stinger as modified by Ould-Ahmed-Vall, Shah, and Schaff teaches/suggests: The method of claim 1, wherein receiving the information comprises receiving information of multiple slots assigned to a plurality of collections of threads, wherein determining the first architectural unit comprises determining, for each of the multiple slots and based on the data indicating the hierarchy of architectural units, the corresponding architectural unit to which the slot is assigned, and wherein highlighting the first indication is part of highlighting multiple ones of the indications of each of the corresponding architectural units (Ould-Ahmed-Vall [0215]: “To increase SIMD utilization, one embodiment divides eight SIMD lanes into two SIMD4 slots (e.g., SIMD4 slot 1910, SIMD4 slot 1912). The SIMD4 slots can be filled in a variety of ways. In one embodiment, two separate SIMD threads (SIMD thread 1902, SIMD thread 1904) that combine to cover a total of four SIMD lanes are assigned to a SIMD4 slot (e.g., SIMD4 slot 1910). In one embodiment, the SIMT thread group 1906 can be assigned to a SIMD4 slot 1912;” Shah [0012]: “In one implementation, this LBPW technique monitors the number of arithmetic logic unit (ALU) instructions and fetch instructions executed within each SIMD. Additionally, newly assigned thread loads (i.e. wavefronts) are queued and are monitored. This monitoring is used to assess current and future utilization of the SIMDs.”). The same rationale to combine as set forth in the rejection of claim 1 above is incorporated herein.

Regarding claim 9, Stinger as modified by Ould-Ahmed-Vall, Shah, and Schaff teaches/suggests: The method of claim 1, wherein displaying the indications comprises displaying, based on the data indicating the hierarchy of architectural units, a representation of the hierarchy of architectural units that includes one or more labels for one or more of the hierarchy of architectural units as indicated in the data (Stinger [0072]: “The example of FIG. 5 shows information about computing system 500 (labeled "Computer A").”).

Claims 11 and 13 recite limitations similar in scope to those of claims 1 and 3, respectively, and are rejected using the same rationales. Stinger as modified by Ould-Ahmed-Vall, Shah, and Schaff further teaches/suggests a memory storing one or more parameters or instructions for executing an operating system and one or more applications including a tracking application; and at least one processor coupled to the memory (Stinger [0022]: “Software may be stored within memory 115 and/or other storage to provide instructions to processor 103 for enabling generic computing device 101 to perform various functions.”).

Claims 19 and 21 recite limitations similar in scope to those of claims 1 and 3, respectively, and are rejected using the same rationales. Stinger as modified by Ould-Ahmed-Vall, Shah, and Schaff further teaches/suggests a computer-readable medium, comprising code executable by one or more processors (Stinger [0022]: “Software may be stored within memory 115 and/or other storage to provide instructions to processor 103 for enabling generic computing device 101 to perform various functions.”).

Claims 4-7, 14-17, and 22-23 are rejected under 35 U.S.C. 103 as being unpatentable over Stinger et al. (US 2013/0055166) in view of Ould-Ahmed-Vall et al. (US 2018/0315159), Shah et al. (US 2012/0019542), and Schaff (US 2008/0148190) as applied to claims 1, 11, and 19 above, and further in view of Brackman (US 2012/0272224).
Regarding claim 4, Stinger as modified by Ould-Ahmed-Vall, Shah, and Schaff does not teach/suggest: The method of claim 1, further comprising requesting, from a graphics driver specific to the GPU, the data indicating the hierarchy of architectural units, wherein obtaining the data comprises obtaining the data from the graphics driver based on the requesting. Brackman, however, teaches/suggests requesting from a graphics driver specific to the GPU (Brackman [0062]: “GPU driver 112 represents an instructions that, when executed, cause CPU 92 to provide an interface by which to communicate with GPU 94.”). Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to modify the system of Stinger to include a graphics driver as taught/suggested by Brackman in order to communicate with the GPU.

As such, Stinger as modified by Ould-Ahmed-Vall, Shah, Schaff, and Brackman teaches/suggests requesting, from a graphics driver specific to the GPU, the data indicating the hierarchy of architectural units, wherein obtaining the data comprises obtaining the data from the graphics driver based on the requesting (Stinger [0072]: “The example of FIG. 5 shows information about computing system 500 (labeled "Computer A"). The information about computing system 500 may be sub-divided into two groups: hardware information 501 and software information 502. The hardware information group 501 includes processor information group 511, memory information group 512, and BIOS information group 512. Examples of processor information include the type of processor(s) used by the computing system and current or historical processor utilization;” Brackman [0062]: “GPU driver 112 represents an instructions that, when executed, cause CPU 92 to provide an interface by which to communicate with GPU 94.”).

Regarding claim 5, Stinger as modified by Ould-Ahmed-Vall, Shah, Schaff, and Brackman teaches/suggests: The method of claim 4, wherein the data indicates the multiple layers of architectural units where a last layer includes SIMD modules of the GPU, including the SIMD module (Stinger [0072]: “The example of FIG. 5 shows information about computing system 500 (labeled "Computer A"). The information about computing system 500 may be sub-divided into two groups: hardware information 501 and software information 502. The hardware information group 501 includes processor information group 511, memory information group 512, and BIOS information group 512. Examples of processor information include the type of processor(s) used by the computing system and current or historical processor utilization;” Ould-Ahmed-Vall [0077]-[0078]: “The GPGPU cores 262 can each include floating point units (FPUs) and/or integer arithmetic logic units (ALUs) that are used to execute instructions of the graphics multiprocessor 234 ... In one embodiment the GPGPU cores 262 include SIMD logic capable of performing a single instruction on multiple sets of data.”). The same rationale to combine as set forth in the rejection of claim 1 above is incorporated herein.

Regarding claim 6, Stinger as modified by Ould-Ahmed-Vall, Shah, Schaff, and Brackman teaches/suggests: The method of claim 5, wherein the multiple layers include at least a computational unit layer that includes a plurality of SIMD modules, and an engine layer that includes one or more computational unit layers (Stinger [0043]: “In embodiments where the computing device 300 includes one or more processing units 321, or a processing unit 321 including one or more processing cores, the processors can execute a single instruction simultaneously on multiple pieces of data (SIMD);” Ould-Ahmed-Vall [0077]-[0078]: “The GPGPU cores 262 can each include floating point units (FPUs) and/or integer arithmetic logic units (ALUs) that are used to execute instructions of the graphics multiprocessor 234 ... In one embodiment the GPGPU cores 262 include SIMD logic capable of performing a single instruction on multiple sets of data.”). The FPUs and/or ALUs meet the claimed computational unit layer; the SIMD processing cores meet the claimed engine layer. The same rationale to combine as set forth in the rejection of claim 1 above is incorporated herein.

Regarding claim 7, Stinger as modified by Ould-Ahmed-Vall, Shah, Schaff, and Brackman teaches/suggests: The method of claim 6, wherein determining the first architectural unit comprises determining the SIMD module, a computational unit from the computational unit layer that includes the SIMD module, and an engine from the engine layer that include the computational unit (Stinger [0043]: “In embodiments where the computing device 300 includes one or more processing units 321, or a processing unit 321 including one or more processing cores, the processors can execute a single instruction simultaneously on multiple pieces of data (SIMD);” Ould-Ahmed-Vall [0077]-[0078]: “The GPGPU cores 262 can each include floating point units (FPUs) and/or integer arithmetic logic units (ALUs) that are used to execute instructions of the graphics multiprocessor 234 ... In one embodiment the GPGPU cores 262 include SIMD logic capable of performing a single instruction on multiple sets of data.”). The same rationale to combine as set forth in the rejection of claim 1 above is incorporated herein.

Claims 14-17 recite limitations similar in scope to those of claims 4-7, respectively, and are rejected using the same rationales.

Claims 22 and 23 recite limitations similar in scope to those of claims 4 and 5, respectively, and are rejected using the same rationales.

Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Stinger et al. (US 2013/0055166) in view of Ould-Ahmed-Vall et al. (US 2018/0315159), Shah et al. (US 2012/0019542), and Schaff (US 2008/0148190) as applied to claim 1 above, and further in view of Du et al. (US 2018/0213232).
Regarding claim 10, Stinger as modified by Ould-Ahmed-Vall, Shah, and Schaff does not teach/suggest: The method of claim 1, further comprising executing a playback of GPU rendering instructions, wherein receiving the information of the slot comprises receiving the information during the playback based on executing the collection of threads. Du, however, teaches/suggests executing a playback of GPU rendering instructions (Du [0037]: “As shown in FIG. 1, a cloud game is run on the trace end. The trace end may capture each graphical interface drawing operation, record a rendering instruction, to generate graphical instruction data, and send the graphical instruction data to the retrace end by using a network. The retrace end receives the instruction data and parses and plays back the rendering instruction, invokes a related graphical drawing application programming interface (API) to draw an image, and plays an interface of the cloud game.”). Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to modify the GPU of Stinger to play back rendering instructions as taught/suggested by Du in order to enable tracing.

As such, Stinger as modified by Ould-Ahmed-Vall, Shah, Schaff, and Du teaches/suggests executing a playback of GPU rendering instructions, wherein receiving the information of the slot comprises receiving the information during the playback based on executing the collection of threads (Ould-Ahmed-Vall [0215]: “To increase SIMD utilization, one embodiment divides eight SIMD lanes into two SIMD4 slots (e.g., SIMD4 slot 1910, SIMD4 slot 1912). The SIMD4 slots can be filled in a variety of ways. In one embodiment, two separate SIMD threads (SIMD thread 1902, SIMD thread 1904) that combine to cover a total of four SIMD lanes are assigned to a SIMD4 slot (e.g., SIMD4 slot 1910). In one embodiment, the SIMT thread group 1906 can be assigned to a SIMD4 slot 1912;” Shah [0012]: “In one implementation, this LBPW technique monitors the number of arithmetic logic unit (ALU) instructions and fetch instructions executed within each SIMD. Additionally, newly assigned thread loads (i.e. wavefronts) are queued and are monitored. This monitoring is used to assess current and future utilization of the SIMDs;” Du [0037]: “As shown in FIG. 1, a cloud game is run on the trace end. The trace end may capture each graphical interface drawing operation, record a rendering instruction, to generate graphical instruction data, and send the graphical instruction data to the retrace end by using a network. The retrace end receives the instruction data and parses and plays back the rendering instruction, invokes a related graphical drawing application programming interface (API) to draw an image, and plays an interface of the cloud game.”).
Response to Arguments
Applicant's arguments filed 12/11/2020 have been fully considered but they are moot. Specifically, Applicant’s arguments regarding “receiving an indication of a slot, which is one of multiple slots concurrently executed by a single instruction multiple data (SIMD) module, assigned to a collection of threads for executing on the GPU” are moot in view of the new ground(s) of the rejection set forth in this Office action.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
US 2008/0143730 – SIMD slots and threads
US 2018/0307606 – monitoring slot addresses
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). 
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ANH-TUAN V NGUYEN whose telephone number is 571-270-7513. The examiner can normally be reached on M-F 9AM-5PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Barry Drennan can be reached on 571-270-7262. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/ANH-TUAN V NGUYEN/
Primary Examiner, Art Unit 2618