Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-2 and 11-20 are rejected under 35 U.S.C. 103 as being unpatentable over Goldman et al. (U.S. PGPUB 20190213706) in view of Williams (U.S. PGPUB 20130106880) and further in view of Andre et al. (U.S. PGPUB 20080266302).
With respect to claim 1, Goldman et al. disclose a computing system (paragraph 26, As shown in FIG. 2, operating environment 200) comprising:
a memory configured to store a shader program (paragraph 26, The application 114 may include an original non-instrumented application that includes original API-based code (OAC) 206 (e.g., for implementing various compute operations, a graphics renderer, a graphics shader); paragraph 37, any of the instruction-level GPU profiling framework 108, the profiling application 110, the binary instrumentation module 112, the GPU driver 120, the example application 114, the GPU compiler 202, the GPU hardware 204, the memory 218, the application interface 252, the compiler interface 254, the instrumentation interface 256, the GPU interface 258, the schema interface 262, the instruction inserter 264, the driver interface 266, and/or the memory interface 268 may be implemented by one or more analog or digital circuit(s), logic circuits, programmable processor(s), application specific integrated circuit(s) (ASIC(s)), programmable logic device(s) (PLD(s)) and/or field programmable logic device(s) (FPLD(s))); and
a graphics processing unit (GPU) configured to obtain the shader program stored in the memory in a profile mode (paragraph 26, The application 114 may communicate with the corresponding GPU driver 120 (or a runtime system API) as defined by the specific graphics API interface(s), paragraph 27, After the OAC 206 is compiled, the resulting OBC 208 may be in form for execution by the GPU hardware device 204), the GPU being configured to perform:
inserting, into the shader program, one or more monitor associative codes (paragraph 30, a high-level API-based user-specified performance profiling parameter in the instrumentation schema 212 may instruct the binary instrumentation module 112 to insert profiling instructions at particular locations of target object code that measure different aspects of high-level graphics operations (e.g., different aspects of a graphics renderer, different aspects of a graphics shader, different aspects of a graphics compute kernel, etc.). The different aspects may include the performance of one or more move instructions, one or more add instructions, one or more multiply instructions, one or more shift instructions, etc. and/or any combination of machine instruction-level instructions that make up different portions of high-level graphics operations);
compiling the shader program, into which the one or more monitor associative codes are inserted, into a language that is capable of being processed by a plurality of cores (paragraph 32, During the binary instrumentation process, the binary instrumentation module 112 may obtain the performance profiling parameter settings or configurations from the instrumentation schema 212 to identify the types of profiling instructions to insert in the OBC 208 and locations in the OBC 208 at which to insert the profiling instructions to generate example instrumented binary code (IBC) 124, paragraph 35, the GPU driver 120 may be provided with or otherwise associated with an application interface 252, a compiler interface 254, an instrumentation interface 256, and/or a GPU interface 258 to enable the GPU driver 120 to receive, arbitrate, and send ones of the OAC 206, OBC 208, and IBC 124 from and/or to ones of the example application 114, the example GPU compiler 202, the example GPU hardware device 204, and the example binary instrumentation module 112, paragraph 53, The computing architecture 800 includes various common computing elements, such as one or more processors, multi-core processors); and
obtaining a runtime performance characteristic of the shader program by executing the compiled shader program and the one or more monitor associative codes (paragraph 33, when the GPU hardware device 204 executes the IBC 124, the IBC 124 causes the GPU hardware device 204 to perform the graphics operations programmed in the OBC 208 and also causes the GPU hardware device 204 to generate and collect profiling data based on the instrumented profiling instructions. In the illustrated example of FIG. 2, the collected profiling data is shown as generated profiling data (GPD) 216). However, Goldman et al. do not expressly disclose that each of the plurality of code blocks includes a plurality of codes to be sequentially executed, and that a first code block among the plurality of code blocks comprises a first code for requesting a resource shared by at least two or more cores of the plurality of cores.
Williams, who also deals with modifying shader code, discloses a method wherein each of the plurality of code blocks includes a plurality of codes to be sequentially executed (paragraph 65, the code generator 300 can rewrite a shader program to generate indications of which blocks and/or branches of the code have been hit. In an example implementation, a block of code is defined as a section containing instructions that always execute sequentially), and wherein a first code block among the plurality of code blocks comprises a first code (paragraph 68, The user can insert the source code for a vertex shader program into section 354, and the source code for a fragment shader program into section 356), and the inserting comprises inserting [additional] code at a position after the first code (paragraph 69, The user can then define one or several test suite routines, such as test suite routines 360 and 362, test suite code is inserted after the vertex shader code and fragment shader code).
	Goldman et al. and Williams are in the same field of endeavor, namely computer graphics.
	Before the effective filing date of the claimed invention, it would have been obvious to apply the method wherein each of the plurality of code blocks includes a plurality of codes to be sequentially executed, wherein a first code block among the plurality of code blocks comprises a first code, and the inserting comprises inserting additional code at a position after the first code, as taught by Williams, to insert the monitor associative code of Goldman et al. after the first code, because the code generator 300 may define a global integer that stores code coverage state for a 
	Andre et al., who also deal with graphics code, disclose a method wherein a first code block among the plurality of code blocks comprises a first code for requesting a resource shared by at least two or more cores of the plurality of cores (paragraph 121, program code causes the enablement of embodiments of the present invention, including the following embodiments: (i) the functions of the systems and techniques disclosed herein (such as granting a GPU in a multi-GPU environment controlled access to a shared resource); functions accomplished and/or structure provided by the systems and techniques described above can be represented in a core (such as a GPU core), thus requesting a resource among the plurality of cores in a multi-GPU environment).
	Goldman et al., Williams, and Andre et al. are in the same field of endeavor, namely computer graphics.
	Before the effective filing date of the claimed invention, it would have been obvious to apply the method wherein a first code block among the plurality of code blocks comprises a first code for requesting a resource shared by at least two or more cores of the plurality of cores, as taught by Andre et al., to the Goldman et al. as modified by Williams system, because this would provide a mechanism for granting controlled access to a shared resource, without requiring a single point of control (paragraph 8 of Andre et al.).
the code generator 300 can rewrite a shader program to generate indications of which blocks and/or branches of the code have been hit. In an example implementation, a block of code is defined as a section containing instructions that always execute sequentially. The code generator 300 accordingly can insert instructions 322 into each block of code with a corresponding numerical identifier of the block). Goldman et al. in combination with Williams disclose inserting monitor associative codes, as in claim 1.
	With respect to claim 11, Goldman et al. as modified by Williams and Andre et al. disclose the computing system of claim 1, wherein the GPU further comprises a shared memory configured to store the resource shared by the at least two or more cores of the plurality of cores, and the obtaining the runtime performance characteristic comprises storing the obtained runtime performance characteristic in the shared memory (Goldman et al.: paragraph 33, Based on the instrumented profiling instructions in the IBC 124, the GPU hardware device 204 stores the GPD 216 at one or more locations in memory 218 specified by the instrumented profiling instructions. The instrumented profiling instructions may cause the GPU hardware device 204 to allocate memory space in the memory 218 at which to store the GPD 216, Goldman et al.: paragraph 37, the instruction-level GPU profiling framework 108, the profiling application 110, the binary instrumentation module 112, the GPU driver 120, the application 114, the GPU compiler 202, the GPU hardware 204, the memory 218, the application interface 252, the compiler interface 254, the instrumentation interface 256, the GPU interface 258, the schema interface 262, the instruction inserter 264, the driver interface 266, and/or the memory interface 268 of FIG. 1 and/or FIG. 2 may be implemented by hardware, software, firmware and/or any combination of hardware, software and/or firmware). Goldman et al. in combination with Andre et al. disclose the resource shared by the at least two or more cores, as in claim 1.
	With respect to claim 12, Goldman et al. as modified by Williams and Andre et al. disclose the computing system of claim 11, wherein the GPU is further configured to output the runtime performance characteristic stored in the shared memory according to termination of execution of the shader program (Goldman et al.: paragraph 34, after completion of execution of the IBC 124 (e.g., during or after execution of a portion of the application 114, during or after a draw command, after completing processing of a command buffer, etc.), the profiling application 110 may operate with the binary instrumentation module 112 to retrieve and access the GPD 216 from the memory 218. In the illustrated embodiment, the profiling application 110 may display performance measures based on the GPD 216 via a user interface).
	With respect to claim 13, Goldman et al. as modified by Williams and Andre et al. disclose the computing system of claim 1, wherein the one or more monitor associative codes comprise a monitor code for directly monitoring a performance characteristic of a code block into which the one or more monitor associative codes are inserted (Goldman et al.: paragraph 30, a high-level API-based user-specified performance profiling parameter in the instrumentation schema 212 may instruct the binary instrumentation module 112 to insert profiling instructions at particular locations of target object code that measure different aspects of high-level graphics operations (e.g., different aspects of a graphics renderer, different aspects of a graphics shader, different aspects of a graphics compute kernel, etc.). The different aspects may include the performance of one or more move instructions, one or more add instructions, one or more multiply instructions, one or more shift instructions, etc. and/or any combination of machine instruction-level instructions that make up different portions of high-level graphics operations).
With respect to claim 14, Goldman et al. as modified by Williams and Andre et al. disclose the computing system of claim 1, wherein the one or more monitor associative codes comprise a branch code that instructs to branch (Williams: paragraph 65, the code generator 300 can rewrite a shader program to generate indications of which blocks and/or branches of the code have been hit) into a general purpose monitor code, the general purpose monitor code being used for monitoring a performance characteristic of a code block (Goldman et al.: paragraph 30, a high-level API-based user-specified performance profiling parameter in the instrumentation schema 212 may instruct the binary instrumentation module 112 to insert profiling instructions at particular locations of target object code that measure different aspects of high-level graphics operations (e.g., different aspects of a graphics renderer, different aspects of a graphics shader)).
With respect to claim 15, Goldman et al. as modified by Williams and Andre et al. disclose a method of operating a computing system, the method comprising:
outputting a shader program including a plurality of code blocks, in response to a profile command including a graphics command (Goldman et al.: paragraph 26, The application 114 may include an original non-instrumented application that includes original API-based code (OAC) 206 (e.g., for implementing various compute operations, a graphics renderer, a graphics shader); Goldman et al.: paragraph 27, the GPU compiler 202 receives and compiles the OAC 206 to generate example original binary code (OBC) 208 (e.g., in the form of a file); Goldman et al.: paragraph 29, the GPU driver 120 may be configured to reroute the OBC 208 to the binary instrumentation module 112 so that the binary instrumentation module 112 can instrument the OBC 208 for performance profiling by inserting machine instruction-level profiling instructions into the OBC 208 to generate the example IBC 124);
inserting monitor associative codes into each of at least part of the plurality of code blocks (Goldman et al.: paragraph 30, a high-level API-based user-specified performance profiling parameter in the instrumentation schema 212 may instruct the binary instrumentation module 112 to insert profiling instructions at particular locations of target object code that measure different aspects of high-level graphics operations (e.g., different aspects of a graphics renderer, different aspects of a graphics shader, different aspects of a graphics compute kernel, etc.). The different aspects may include the performance of one or more move instructions, one or more add instructions, one or more multiply instructions, one or more shift instructions, etc. and/or any combination of machine instruction-level instructions that make up different portions of high-level graphics operations);
compiling the shader program, into which the monitor associative codes are inserted (Goldman et al.: paragraph 35, the GPU driver 120 may be provided with or otherwise associated with an application interface 252, a compiler interface 254, an instrumentation interface 256, and/or a GPU interface 258 to enable the GPU driver 120 to receive, arbitrate, and send ones of the OAC 206, OBC 208, and IBC 124 from and/or to ones of the example application 114, the example GPU compiler 202, the example GPU hardware device 204, and the example binary instrumentation module 112);
obtaining a runtime performance characteristic of the shader program by executing the plurality of code blocks and the monitor associative codes in a plurality of cores (Goldman et al.: paragraph 33, Since the IBC 124 of the illustrated example includes the original code of the OBC 208 and the instrumented profiling instructions inserted by the binary instrumentation module 112, when the GPU hardware device 204 executes the IBC 124, the IBC 124 causes the GPU hardware device 204 to perform the graphics operations programmed in the OBC 208 and also causes the GPU hardware device 204 to generate and collect profiling data based on the instrumented profiling instructions);
recording the obtained runtime performance characteristic of the shader program in a memory shared by the plurality of cores (Goldman et al.: paragraph 33, Based on the instrumented profiling instructions in the IBC 124, the GPU hardware device 204 stores the GPD 216 at one or more locations in memory 218 specified by the instrumented profiling instructions. For example, the instrumented profiling instructions may cause the GPU hardware device 204 to allocate memory space in the memory 218 at which to store the GPD 216); and
outputting the recorded runtime performance characteristic of the shader program in response to termination of the shader program (Goldman et al.: paragraph 34, after completion of execution of the IBC 124 (e.g., during or after execution of a portion of the application 114, during or after a draw command, after completing processing of a command buffer, etc.), the profiling application 110 may operate with the binary instrumentation module 112 to retrieve and access the GPD 216 from the memory 218. In the illustrated embodiment, the profiling application 110 may display performance measures based on the GPD 216 via a user interface. In some embodiments, the profiling application 110 may apply one or more different types of analyses to the GPD 216 and display results of such analyses via a user interface),
wherein each of the plurality of code blocks includes a plurality of codes to be sequentially executed (Williams: paragraph 65, the code generator 300 can rewrite a shader program to generate indications of which blocks and/or branches of the code have been hit. In an example implementation, a block of code is defined as a section containing instructions that always execute sequentially), and wherein a first code block among the plurality of code blocks comprises a first code (Williams: paragraph 68, The user can insert the source code for a vertex shader program into section 354, and the source code for a fragment shader program into section 356), and the inserting comprises inserting [additional] code at a position after the first code (Williams: paragraph 69, The user can then define one or several test suite routines, such as test suite routines 360 and 362, test suite code is inserted after the vertex shader code and fragment shader code). Goldman et al. in combination with Williams disclose inserting monitor associative codes, as in claim 1. Goldman et al. in combination with Andre et al. disclose the code for requesting a resource shared by the at least two or more cores of the plurality of cores, as in claim 1.

	With respect to claim 16, Goldman et al. as modified by Williams and Andre et al. disclose the method of claim 15, wherein the obtaining comprises generating a time stamp that corresponds to a performance time of a code block including the monitor associative codes according to execution of the monitor associative codes (Goldman et al.: paragraph 31, the resulting instrumentation of the OBC 208 with the time-stamp start/stop read (or counter start/stop read) profiling instructions added at corresponding instruction insertion points can be used to measure an execution duration (e.g., in a time unit of measure or in GPU clock cycles) of the bounded code sequence inclusive of the machine instructions A and B).
	With respect to claim 17, Goldman et al. as modified by Williams and Andre et al. disclose the method of claim 15, further comprising generating performance data about the shader program based on the output runtime performance characteristic of the shader program (Goldman et al.: paragraph 31, the resulting instrumentation of the OBC 208 with the time-stamp start/stop read (or counter start/stop read) profiling instructions added at corresponding instruction insertion points can be used to measure an execution duration (e.g., in a time unit of measure or in GPU clock cycles) of the bounded code sequence inclusive of the machine instructions A and B. Alternatively, an instruction insertion statement may specify to measure a particular performance parameter (e.g., an execution duration) for a code sequence bound by machine instructions A and B in the OBC 208). 
	With respect to claim 18, Goldman et al. as modified by Williams and Andre et al. disclose the method of claim 17, wherein the generating comprises analyzing at least (Goldman et al.: paragraph 31, the resulting instrumentation of the OBC 208 with the time-stamp start/stop read (or counter start/stop read) profiling instructions added at corresponding instruction insertion points can be used to measure an execution duration (e.g., in a time unit of measure or in GPU clock cycles) of the bounded code sequence inclusive of the machine instructions A and B. Alternatively, an instruction insertion statement may specify to measure a particular performance parameter (e.g., an execution duration) for a code sequence bound by machine instructions A and B in the OBC 208), efficiency of the shader program, instruction issue-efficiency, and a statistics value regarding whether a branching condition is satisfied for each branch point, based on the output runtime performance characteristic of the shader program.
	With respect to claim 19, Goldman et al. as modified by Williams and Andre et al. disclose the method of claim 15, wherein the runtime performance characteristic of the shader program comprises information about at least one of a shader program number, a time stamp (Goldman et al.: paragraph 31, the resulting instrumentation of the OBC 208 with the time-stamp start/stop read (or counter start/stop read) profiling instructions added at corresponding instruction insertion points can be used to measure an execution duration (e.g., in a time unit of measure or in GPU clock cycles) of the bounded code sequence inclusive of the machine instructions A and B), mask execution, a resource score board, branch condition satisfaction/non-satisfaction, and an invocation count.
	With respect to claim 20, Goldman et al. as modified by Williams and Andre et al. disclose a graphics processing unit (GPU) (Goldman et al.: paragraph 26, the GPU driver 120 is in communication with the application 114, the binary instrumentation module 112, a GPU compiler 202, and a GPU hardware device 204) comprising:
a compiler configured to obtain a program including a plurality of code blocks (Goldman et al.: paragraph 27, The example GPU compiler 202 may include a graphics processor compiler that compiles source code such as the OAC 206 to object code based on a target instruction set architecture (ISA) for execution by a target GPU device such as the GPU hardware device 204), the compiler comprising a monitor associative code injector configured to insert one or more monitor associative codes into each of at least part of the plurality of code blocks (Goldman et al.: paragraph 29, the GPU driver 120 may be configured to reroute the OBC 208 to the binary instrumentation module 112 so that the binary instrumentation module 112 can instrument the OBC 208 for performance profiling by inserting machine instruction-level profiling instructions into the OBC 208 to generate the example IBC 124), and configured to compile the one or more monitor associative codes and the program in a profile mode (Goldman et al.: paragraph 27, the binary instrumentation module 112 may be implemented as part of the GPU compiler 202);
a plurality of cores (Goldman et al.: paragraph 53, The computing architecture 800 includes various common computing elements, such as one or more processors, multi-core processors) configured to perform the compiled program and the one or more monitor associative codes in a parallel manner (Goldman et al.: paragraph 2, A graphics processing unit (GPU) provides a parallel hardware environment, Goldman et al.: paragraph 123, activities described with respect to the methods identified herein can be executed in serial or parallel fashion) and configured to obtain a runtime performance characteristic of the program (Goldman et al.: paragraph 33, when the GPU hardware device 204 executes the IBC 124, the IBC 124 causes the GPU hardware device 204 to perform the graphics operations programmed in the OBC 208 and also causes the GPU hardware device 204 to generate and collect profiling data based on the instrumented profiling instructions); and
a shared memory configured to store the obtained runtime performance characteristic of the program (Goldman et al.: paragraph 33, Based on the instrumented profiling instructions in the IBC 124, the GPU hardware device 204 stores the GPD 216 at one or more locations in memory 218 specified by the instrumented profiling instructions. For example, the instrumented profiling instructions may cause the GPU hardware device 204 to allocate memory space in the memory 218 at which to store the GPD 216),
wherein each of the plurality of code blocks includes a plurality of codes to be sequentially executed (Williams: paragraph 65, the code generator 300 can rewrite a shader program to generate indications of which blocks and/or branches of the code have been hit. In an example implementation, a block of code is defined as a section containing instructions that always execute sequentially), and wherein a first code block among the plurality of code blocks comprises a first code (Williams: paragraph 68, The user can insert the source code for a vertex shader program into section 354, and the source code for a fragment shader program into section 356), and the inserting comprises inserting [additional] code at a position after the first code (Williams: paragraph 69, The user can then define one or several test suite routines, such as test suite routines 360 and 362, test suite code is inserted after the vertex shader code and fragment shader code). Goldman et al. in combination with Williams disclose inserting monitor associative codes, as in claim 1. Goldman et al. in combination with Andre et al. disclose the code for requesting a resource shared by the at least two or more cores of the plurality of cores, as in claim 1.

Claims 4-6 are rejected under 35 U.S.C. 103 as being unpatentable over Goldman et al. (U.S. PGPUB 20190213706) in view of Williams (U.S. PGPUB 20130106880), Andre et al. (U.S. PGPUB 20080266302), and further in view of Kalaiah et al. (U.S. PGPUB 20080007559).
	With respect to claim 4, Goldman et al. as modified by Williams and Andre et al. disclose the computing system of claim 2, wherein a second code block among the plurality of code blocks comprises a second code for using the resource (Andre et al.: paragraph 121, The program code causes the enablement of embodiments of the present invention, including the following embodiments: (i) the functions of the systems and techniques disclosed herein (such as granting a GPU in a multi-GPU environment controlled access to a shared resource), granting access to the resource comprises using the resource). However, Goldman et al. as modified by Williams and Andre et al. do not expressly disclose the inserting comprises inserting a second monitor associative code at a starting point of the second code block. 
	Kalaiah et al., who also deal with shader code, disclose a method wherein the inserting comprises inserting a [second code] at a starting point of the second code block (paragraph 103, Effectively, the three tasks above consist of adding some predetermined code components at the beginning of the code as well as a string replacement code).
	Goldman et al., Williams, Andre et al., and Kalaiah et al. are in the same field of endeavor, namely computer graphics.
	Before the effective filing date of the claimed invention, it would have been obvious to apply the method wherein inserting comprises inserting a [second code] at a starting point of the second code block, as taught by Kalaiah et al., to insert a second monitor associative code at a starting point of the second code block of the Goldman et al. as modified by Williams and Andre et al. system, because this would ensure the custom code will be executed in advance.
	With respect to claim 5, Goldman et al. as modified by Williams and Andre et al. disclose the computing system of claim 4, wherein the obtaining the runtime performance characteristic comprises: requesting the resource by executing the first code (Andre et al.: paragraph 121, program code causes the enablement of embodiments of the present invention, including the following embodiments: (i) the functions of the systems and techniques disclosed herein (such as granting a GPU in a multi-GPU environment controlled access to a shared resource)); and in a time period until when the resource is available, obtaining a runtime performance characteristic of the first code block by executing the first monitor associative code (Goldman et al.: paragraph 33, when the GPU hardware device 204 executes the IBC 124, the IBC 124 causes the GPU hardware device 204 to perform the graphics operations programmed in the OBC 208 and also causes the GPU hardware device 204 to generate and collect profiling data based on the instrumented profiling instructions). 
the resulting instrumentation of the OBC 208 with the time-stamp start/stop read (or counter start/stop read) profiling instructions added at corresponding instruction insertion points can be used to measure an execution duration (e.g., in a time unit of measure or in GPU clock cycles) of the bounded code sequence inclusive of the machine instructions A and B), or the resource.

Claims 7-8 are rejected under 35 U.S.C. 103 as being unpatentable over Goldman et al. (U.S. PGPUB 20190213706) in view of Williams (U.S. PGPUB 20130106880), Andre et al. (U.S. PGPUB 20080266302), and further in view of Bienkowski et al. (U.S. Patent No. 9,064,052).
With respect to claim 7, Goldman et al. as modified by Williams and Andre et al. disclose the computing system of claim 2, wherein the inserting comprises inserting a monitor associative code at an ending point of the code block (Williams: paragraph 69, The user can then define one or several test suite routines, such as test suite routines 360 and 362, test suite code is inserted after the vertex shader code and fragment shader code, see Goldman et al. regarding inserting monitor associative code). However, Goldman et al. as modified by Williams and Andre et al. do not expressly disclose a third code block among the plurality of code blocks comprises a conditional statement and a code for performing an operation when a condition of the conditional statement is satisfied.
Bienkowski et al., who also deal with modifying code, disclose a method wherein a third code block among the plurality of code blocks comprises a conditional statement and a code for performing an operation when a condition of the conditional statement is satisfied (column 10, lines 13-18, a third block of code (e.g., a conditional IF statement with a result shown as FuncX Wins!), as shown by reference number 504).
Goldman et al., Williams, Andre et al., and Bienkowski et al. are in the same field of endeavor, namely computer graphics.
Before the effective filing date of the claimed invention, it would have been obvious to apply the method wherein a third code block among the plurality of code blocks comprises a conditional statement and a code for performing an operation when a condition of the conditional statement is satisfied, as taught by Bienkowski et al., to the Goldman et al. as modified by Williams and Andre et al. system for inserting a third monitor associative code at an ending point of the third code block, because this would control a manner in which intermediate results are provided for display (column 7, lines 56-62 of Bienkowski et al.).
	With respect to claim 8, Goldman et al. as modified by Williams, Andre et al., and Bienkowski et al. disclose the computing system of claim 7, wherein the obtaining the runtime performance characteristic comprises obtaining a runtime performance characteristic of the third code block by executing the third monitor associative code (Goldman et al.: paragraph 33, when the GPU hardware device 204 executes the IBC 124, the IBC 124 causes the GPU hardware device 204 to perform the graphics operations programmed in the OBC 208 and also causes the GPU hardware device 204 to generate and collect profiling data based on the instrumented profiling instructions), and the runtime performance characteristic of the third code block comprises information about at least one of a shader program number, a time stamp generated based on a performance time of the third code block (Goldman et al.: paragraph 31, the resulting instrumentation of the OBC 208 with the time-stamp start/stop read (or counter start/stop read) profiling instructions added at corresponding instruction insertion points can be used to measure an execution duration (e.g., in a time unit of measure or in GPU clock cycles) of the bounded code sequence inclusive of the machine instructions A and B), or satisfaction of the conditional statement.

Claims 9-10 are rejected under 35 U.S.C. 103 as being unpatentable over Goldman et al. (U.S. PGPUB 20190213706) in view of Williams (U.S. PGPUB 20130106880), Andre et al. (U.S. PGPUB 20080266302), Bienkowski et al. (U.S. Patent No. 9,064,052), and further in view of Kalaiah et al. (U.S. PGPUB 20080007559).
	With respect to claim 9, Goldman et al. as modified by Williams, Andre et al., Bienkowski et al., and Kalaiah et al. disclose the computing system of claim 7, wherein a fourth code block among the plurality of code blocks comprises a code for performing an operation when the condition of the conditional statement is not satisfied (Bienkowski et al.: column 10, lines 13-18, a third block of code (e.g., a conditional IF statement with a result shown as FuncX Wins!), as shown by reference number 504; the fourth code block is the conditional ELSE performed when the IF statement is not satisfied), and the inserting comprises inserting a fourth monitor associative code at a starting time of the fourth code block (Effectively, the three tasks above consist of adding some predetermined code components at the beginning of the code as well as a string replacement code). Goldman et al. in combination with Kalaiah et al. disclose inserting monitor associative code at a starting time, as in claim 4.
	With respect to claim 10, Goldman et al. as modified by Williams, Andre et al., Bienkowski et al., and Kalaiah et al. disclose the computing system of claim 9, wherein the obtaining the runtime performance characteristic comprises obtaining a runtime performance characteristic of the fourth code block by executing the fourth monitor associative code (Goldman et al.: paragraph 33, when the GPU hardware device 204 executes the IBC 124, the IBC 124 causes the GPU hardware device 204 to perform the graphics operations programmed in the OBC 208 and also causes the GPU hardware device 204 to generate and collect profiling data based on the instrumented profiling instructions), and the runtime performance characteristic of the fourth code block comprises information about at least one of a shader program number, a time stamp generated based on a performance time of the fourth code block (Goldman et al.: paragraph 31, the resulting instrumentation of the OBC 208 with the time-stamp start/stop read (or counter start/stop read) profiling instructions added at corresponding instruction insertion points can be used to measure an execution duration (e.g., in a time unit of measure or in GPU clock cycles) of the bounded code sequence inclusive of the machine instructions A and B), and non-satisfaction of the conditional statement.
Response to Arguments
Applicant’s arguments with respect to claim 7 have been considered but are moot in view of the new ground(s) of rejection.
Applicant’s arguments filed December 22, 2020 have been fully considered but they are not persuasive. Applicant argues that Williams does not disclose that the inserting comprises inserting a first monitor associative code at a position after the first code, in that Williams does not mention “inserting a first monitor associative code at a position after the first code” and is silent as to determining an insertion position of the instructions among a plurality of codes in a code block (bottom of page 10 to top of page 11 of remarks). However, Williams discloses inserting instructions after the first code (paragraph 69, The user can then define one or several test suite routines, such as test suite routines 360 and 362; test suite code is inserted after the vertex shader code and fragment shader code). Defining the test suites comprises inserting code (see Fig. 6, test suite 360), which is positioned after the first code (vertex shader code and fragment shader code in Fig. 6). Furthermore, the claim does not require determining an exact insertion position of instructions; instead, the claim recites inserting a first monitor associative code at a position after a first code. Williams meets this limitation by defining additional code after the first code block.
Applicant argues that Andre does not teach inserting a first monitor associative code at a certain position, the runtime performance characteristic of the shader program itself, or obtaining a runtime performance characteristic of the shader program, and concludes that Andre does not disclose obtaining a runtime performance characteristic of the shader program by executing the compiled shader program and the one or more monitor associative codes (page 11). However, Andre is not cited to teach obtaining a runtime performance characteristic of the shader program by executing the compiled shader program and the one or more monitor associative codes; instead, Andre is cited for requesting a resource shared among a plurality of cores (program code causes the enablement of embodiments of the present invention, including the following embodiments: (i) the functions of the systems and techniques disclosed herein (such as granting a GPU in a multi-GPU environment controlled access to a shared resource); functions accomplished and/or structure provided by the systems and techniques described above can be represented in a core (such as a GPU core), thus requesting a resource among the plurality of cores in a multi-GPU environment).
	Applicant argues that neither Williams nor Goldman teaches the specific features of inserting a first monitor associative code at a position after the first code for requesting a resource shared by at least two or more cores, and obtaining a runtime performance characteristic of the first code block by executing the first monitor associative code in a time period until when the resource requested by executing the first code is available. In particular, Applicant argues that Goldman does not disclose anything about obtaining a runtime performance characteristic of the first code block, which is for requesting a resource shared by at least two or more cores of the plurality of cores, by executing the first monitor associative code inserted at a position after the first code in a time period until when the requested resource is available (page 12). However, the claim does not explicitly require that obtaining the performance characteristic of the first code block by executing the first monitor associative code terminate when the resource becomes available. Furthermore, the obtaining of a performance characteristic of a first code block could continue to be performed if the condition “when the resource is available” is not met.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
U.S. Patent No. 8,296,738 to Kiel et al. for a method of modifying shader code for performance.
U.S. PGPUB 20180165200 to Sethuraman for a method of sharing resources among a plurality of cores.
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to ANDREW GUS YANG whose telephone number is (571)272-5514.  The examiner can normally be reached on M-F 9 AM - 5:30 PM.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Mark Zimmerman can be reached on (571)272-7653.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/ANDREW G YANG/ Primary Examiner, Art Unit 2619
2/18/21