DETAILED ACTION
Claims 1-20 are pending.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1, 6, 7, 9, 14, 15, and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Diamond (US Patent No. US 9,087,161 B1) in view of Gordon et al. (US Patent No. US 10,102,015 B1).

Regarding claim 1, Diamond teaches the invention substantially as claimed including a method of assigning tasks to dedicated processing resources (Col. 13, lines 22-34: the primitive allocation unit 2406 divides the graphics application workload into a first GPU workload for the GPU 2421 and a second GPU workload for the GPU 2422…The GPU 2421 and the GPU 2422 both execute their respective allocated GPU workloads.), comprising: 
obtaining hardware information of a plurality of dedicated processing resources, the plurality of dedicated processing resources comprising a first dedicated processing resource and a second dedicated processing resource, and the hardware information comprising first hardware information of the first dedicated processing resource and second hardware information of the second dedicated processing resource (Col. 13, lines 7-14: system configuration information 2403 regarding the capabilities of the plurality of GPUs coupled to the computer system. Based upon the profiling and the capabilities of the GPUs (reasonably teaches “obtaining”), a front end processor 2405 functions with a graphics primitive allocation unit 2406 to divide the workload from the graphics application into portions appropriate for the capabilities of the coupled GPUs; Col. 11, line 66 through Col. 12, line 5: The GPUs 2201-2203 are asymmetric, meaning that their rendering capabilities and/or rendering power is not equal. For example, in the present exemplary embodiment, the embedded GPU 2201 is a two pipeline GPU, the card based GPU 2202 (e.g., add-in AGP card or PCI Express card) is a four pipeline GPU, and the DGS 2203 hosts a 16 pipeline GPU.); 
generating a first task based on the first hardware information and a second task based on the second hardware information (Col. 12, lines 19-23: The graphics application workload from the graphics application can be divided as appropriate to fit the capabilities of each of the GPUs 2201-2203. Thus for example, the more powerful GPU 2203 would be allocated a higher graphics processing workload then the GPU 2201; Col. 13, lines 9-31: Based upon the profiling and the capabilities of the GPUs, a front end processor 2405 functions with a graphics primitive allocation unit 2406 to divide (i.e., generate) the workload from the graphics application into portions appropriate for the capabilities of the coupled GPUs. The dividing of the graphics workload from the graphics application is done in order to make the best use of the graphics hardware/software of the plurality of coupled GPUs. As described above, the GPUs can be asymmetric in their capability. The objective would be to make the most efficient use of the processing hardware to yield the highest overall system performance. In the FIG. 24 embodiment, the primitive allocation unit 2406 divides the graphics application workload into a first GPU workload for the GPU 2421 and a second GPU workload for the GPU 2422.); and 
allocating the first task to the first dedicated processing resource and the second task to the second dedicated processing resource (Col. 13, lines 25-38: In one embodiment, the primitive allocation unit 2406 allocates a larger number of primitives comprising the 3-D scene to be rendered to the more powerful GPU 2422. In one embodiment, the primitive allocation unit 2406 allocates larger, more complex blocks comprising the 3-D scene to be rendered to the more powerful GPU 2422. These blocks can include more complex textures, for example. The GPU 2421 and the GPU 2422 both execute their respective allocated GPU workloads.).

	Diamond does not expressly disclose a step for obtaining hardware information and allocating… the second task to the second dedicated processing resource; and
the first task and second task being generated by utilizing a first compilation rule and a second compilation rule based on the first hardware information and the second hardware information, respectively.

	However, Diamond does teach:
-Col. 13, lines 9-14: “Based upon the profiling and the capabilities of the GPUs, a front end processor 2405 functions with a graphics primitive allocation unit 2406 to divide the workload from the graphics application into portions appropriate for the capabilities of the coupled GPUs.”



It would have been obvious to one of ordinary skill in the art to understand Diamond's teachings to encompass the claimed invention for at least the following reasons. First, in order to determine how to divide the workload among different GPUs, the capabilities of the GPUs must be obtained. Therefore, although Diamond does not explicitly disclose an obtaining step, because the division of the workload is based on the application profile and the GPUs' capabilities, Diamond necessarily obtains/determines hardware information of the GPUs. Second, while Diamond does teach allocating a portion of the workload to a first, high-performing GPU, it does not explicitly disclose a step of allocating the second portion of the workload to the second GPU. However, as shown in the citation above, the step of allocating the second task is implied, as Diamond states “The GPU 2421 and the GPU 2422 both execute their respective allocated GPU workloads.” As such, Diamond reasonably teaches the limitations of obtaining hardware information and allocating… the second task to the second dedicated processing resource.

	While Diamond does generate first and second tasks for the heterogeneous GPUs to fit the capabilities of each of the GPUs, Diamond does not expressly teach wherein the first task and second task are generated by utilizing a first compilation rule and a second compilation rule based on the first hardware information and the second hardware information, respectively.

However, Gordon teaches the first task and second task being generated by utilizing a first compilation rule and a second compilation rule based on the first hardware information and the second hardware information, respectively (Col. 2, lines 26-53: In one example, the first GPU 18A of the first computing device 12 is architecturally distinct from the second GPU 18B of the second computing device 14. As shown in FIG. 1, the first GPU 18A has a first instruction set architecture (ISA) 22A and a first application binary interface (ABI) 24A, while the second GPU 18B has a second ISA 22B and a second ABI different from the first ISA 22A and first ABI 24A of the first GPU 18A. Due to architectural differences between the first GPU 18A and the second GPU 18B, application programs configured to be executed using the first processor 16A and first GPU 18A may not be successfully executed using the second processor 16B and second GPU 18B, and vice versa. For example, a compiled binary of an application program 26 (e.g., first task) may utilize GPU-executed programs configured to be executed on the first GPU 18A having the first ISA 22A and the first ABI 24A. Thus, as the compiled binary of the application … (e.g., second task)).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Gordon with the teachings of Diamond to compile tasks based on rules associated with the instruction set architecture of each of the heterogeneous GPUs, thereby translating the tasks so that they are compatible with each GPU. The modification would have been motivated by the desire to maximize the utilization of different versions of GPUs in a system (see at least Gordon's Col. 1, line 63 through Col. 2, line 10).

Regarding claim 6, Diamond further teaches: 
receiving, from the first dedicated processing resource, a first result for the first task, receiving, from the second dedicated processing resource, a second result for the second task and combining the first result and the second result (Col. 10, lines 1-14: The frame synchronization master 1301 functions by synchronizing the rendered 3D graphics frames produced by the respective GPUs 901-904. The output of the respective GPUs 901-904 are combined by the output multiplexer 1302 to produce a resulting GPU output stream 1330.).

Regarding claim 7, Diamond teaches wherein the plurality of dedicated processing resources further comprises a third dedicated processing resource of the same type as the second dedicated processing resource, wherein the assigning the second task to the second dedicated processing resource comprises: allocating the second task to the second dedicated processing resource and the third dedicated processing resource (Col. 11, line 63 through Col. 12, line 8: The GPUs 2201-2203 are asymmetric, meaning that their rendering capabilities and/or rendering power is not equal. For example, in the present exemplary embodiment, the embedded GPU 2201 is a two pipeline GPU, the card based GPU 2202 (e.g., add-in AGP card or PCI Express card) is a four pipeline GPU, and the DGS 2203 hosts a 16 pipeline GPU (all three are resources of the same type, i.e., GPUs); Col. 12, lines 10-28: The control unit 2204 allocates the graphics processing workload among GPUs 2201-2203 such that the total available hardware is utilized as efficiently as possible. This includes, for example, keeping all available pipelines busy executing graphics instructions… The graphics application workload from the graphics application can be divided as appropriate to fit the capabilities of each of the GPUs 2201-2203; Col. 10, lines 46-47: Graphics processing workload can be allocated among available GPUs such that the workload is executed parallel).

Regarding claim 9, it is a system type claim having the same limitations as claim 1 above. Therefore, it is rejected under the same rationale as claim 1 above. Further, the additional limitations of a processing unit and a memory coupled to the processing unit and storing instructions thereon, the instructions, when executed by the processing unit, executing the acts, are taught by Diamond in Col. 5, lines 54-61: “certain processes and steps of the present invention are realized, in one embodiment, as a series of instructions (e.g., software program) that reside within computer readable memory (e.g., system memory 102) of a computer system (e.g., system 100) and are executed by the CPU 101 and DGS 110 of system 100. When executed, the instructions cause the computer system 100 to implement the functionality of the present invention as described”.

As per claim 14, it is a system type claim having similar limitations to claim 6 above. Therefore, it is rejected under the same rationale as claim 6 above.

As per claim 15, it is a system type claim having similar limitations to claim 7 above. Therefore, it is rejected under the same rationale as claim 7 above.

Regarding claim 17, it is a media/product type claim having the same limitations as claim 1 above. Therefore, it is rejected under the same rationale as claim 1 above.

Claims 2, 10, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Diamond and Gordon, as applied to claim 1, in further view of Balci et al. (US PGPUB US 2018/0165788 A1).

Regarding claim 2, Diamond teaches wherein the obtaining hardware information of a plurality of dedicated processing resources comprises: 
obtaining high-performance computing tasks (Col. 7, lines 33-35: when high-performance 3D rendering is desired (e.g., for a high fidelity real-time 3D rendering application)); 
allocating the plurality of dedicated processing resources to the high-performance computing tasks (Col. 13, lines 22-32: the primitive allocation unit 2406 divides the graphics application workload into a first GPU workload for the GPU 2421 and a second GPU workload for the GPU 2422. In one embodiment, the primitive allocation unit 2406 allocates a larger number of primitives comprising the 3-D scene to be rendered to the more powerful GPU 2422. In one embodiment, the primitive allocation unit 2406 allocates larger, more complex blocks comprising the 3-D scene to be rendered to the more powerful GPU 2422.); and 
obtaining hardware information of each of the plurality of dedicated processing resources (Col. 13, lines 7-14: system configuration information 2403 regarding the capabilities of the plurality of GPUs coupled to the computer system. Based upon the profiling and the capabilities of the GPUs (reasonably teaches “obtaining”), a front end processor 2405 functions with a graphics primitive allocation unit 2406 to divide the workload from the graphics application into portions appropriate for the capabilities of the coupled GPUs; Col. 11, line 66 through Col. 12, line 5: The GPUs 2201-2203 are asymmetric, meaning that their rendering capabilities and/or rendering power is not equal. For example, in the present exemplary embodiment, the embedded GPU 2201 is a two pipeline GPU, the card based GPU 2202 (e.g., add-in AGP card or PCI Express card) is a four pipeline GPU, and the DGS 2203 hosts a 16 pipeline GPU.).

	Diamond and Gordon do not explicitly disclose wherein the obtaining hardware information of each of the plurality of dedicated processing resources is performed in real time.

	However, Balci teaches wherein the obtaining hardware information of each of the plurality of dedicated processing resources is performed in real time (¶ [0024]: The techniques of this disclosure also allow for finer-grained, real-time GPU state information (e.g., heuristics) that the GPU may use to determine the rendering mode for a scene; ¶ [0091]: In another example, the heuristics may include determining hardware capabilities of GPU 12 such that GPUs having different hardware capabilities may render scenes differently. Additionally, the determined …).

	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Balci with the teachings of Diamond and Gordon to determine GPU capabilities in real time. The modification would have been motivated by the desire to ensure that capability/statistical data is accurate and up to date. See at least Balci's ¶ [0024].

As per claim 10, it is a system type claim having similar limitations to claim 2 above. Therefore, it is rejected under the same rationale as claim 2 above.

As per claim 18, it is a media/product type claim having similar limitations to claim 2 above. Therefore, it is rejected under the same rationale as claim 2 above.

Claims 3, 11, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Diamond and Gordon, as applied to claim 1, in further view of Parke et al. (US PGPUB US 2018/0357746 A1).
 
Regarding claim 3, Diamond and Gordon do not expressly disclose wherein the obtaining hardware information of a plurality of dedicated processing resources comprises: 
querying hardware information of each of the plurality of dedicated processing resources periodically; 
storing the queried hardware information into a database, the hardware information comprising an identifier, a type, and a performance parameter of each dedicated processing resource; and 
obtaining, from the database, the hardware information of the plurality of dedicated processing resources.

However, Parke teaches wherein the obtaining hardware information of a plurality of dedicated processing resources comprises: 
querying hardware information of each of the plurality of dedicated processing resources periodically (¶ [0004]: GPU features and capabilities may be determined via one or more query functions to determine capabilities iteratively); 
storing the queried hardware information into a database, the hardware information comprising an identifier (Fig. 7, Configuration ID; ¶ [0246]: As an example, a ConfigurationID may be a 64-bit value that consists of four 16-bit components: PCI Vendor ID, PCI Device ID, PCI Revision ID, and a vendor-supplied ID. Providing the PCI Vendor ID, PCI Device ID, PCI Revision ID, and vendor-supplied ID in this fashion allows a software vendor to quickly perform the appropriate lookups), a type (¶ [0004]: GPU corresponds to type), and a performance parameter of each dedicated processing resource (¶ [0013]: FIG. 7 shows, in table form, examples of configuration identifiers for the same hardware or different devices according to one or more embodiments; ¶ [0243]: Third party vendors typically provide hardware specifications to software vendors that detail the hardware feature set for their devices. These specifications provide the software vendor with the details required to simulate hardware functionality through other means (for example, shaders). With the introduction of the …); and 
obtaining, from the database, the hardware information of the plurality of dedicated processing resources (¶ [0004]: GPU features and capabilities may be determined via one or more query functions to determine capabilities iteratively or by using a look up table containing configuration identifiers tied to specific capabilities of a GPU.).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Parke with the teachings of Diamond and Gordon to have a data table storing information regarding GPUs. The modification would have been motivated by the desire to quickly perform the appropriate lookups and to meet the needs of both the software vendor and the vendor's choice of available functionality (¶ [0246]).

As per claim 11, it is a system type claim having similar limitations to claim 3 above. Therefore, it is rejected under the same rationale as claim 3 above.

As per claim 19, it is a media/product type claim having similar limitations to claim 3 above. Therefore, it is rejected under the same rationale as claim 3 above.

Claims 4, 12, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Diamond and Gordon, as applied to claim 1, in further view of McGrane et al. (US PGPUB US 2011/0022870).

Regarding claim 4, Diamond teaches generating a first task based on the first hardware information and a second task based on the second hardware information (Col. 12, lines 19-23: The graphics application workload from the graphics application can be divided as appropriate to fit the capabilities of each of the GPUs 2201-2203. Thus for example, the more powerful GPU 2203 would be allocated a higher graphics processing workload then the GPU 2201; Col. 13, lines 9-31: Based upon the profiling and the capabilities of the GPUs, a front end processor 2405 functions with a graphics primitive allocation unit 2406 to divide the workload from the graphics application into portions appropriate for the capabilities of the coupled GPUs. The dividing of the graphics workload from the graphics application is done in order to make the best use of the graphics hardware/software of the plurality of coupled GPUs. As described above, the GPUs can be asymmetric in their capability. The objective would be to make the most efficient use of the processing hardware to yield the highest overall system performance. In the FIG. 24 embodiment, the primitive allocation unit 2406 divides the graphics application workload into a first GPU workload for the GPU 2421 and a second GPU workload for the GPU 2422.).

	Diamond does not expressly teach wherein the generating further comprises: 
determining the first compilation rule and the second compilation rule based on the first hardware information and the second hardware information respectively, the first hardware information indicating that the first dedicated processing resource enables a specific function, and the second hardware information indicating the second dedicated processing resource disables the specific function; and 
generating the first task and the second task using the first compilation rule and the second compilation rule respectively.

However, Gordon teaches wherein the generating further comprises: 
determining the first compilation rule and the second compilation rule based on the first hardware information and the second hardware information respectively and generating the first task and the second task using the first compilation rule and the second compilation rule respectively (Col. 2, lines 26-53: In one example, the first GPU 18A of the first computing device 12 is architecturally distinct from the second GPU 18B of the second computing device 14. As shown in FIG. 1, the first GPU 18A has a first instruction set architecture (ISA) 22A and a first application binary interface (ABI) 24A, while the second GPU 18B has a second ISA 22B and a second ABI different from the first ISA 22A and first ABI 24A of the first GPU 18A. Due to architectural differences between the first GPU 18A and the second GPU 18B, application programs configured to be executed using the first processor 16A and first GPU 18A may not be successfully executed using the second processor 16B and second GPU 18B, and vice versa. For example, a compiled binary of an application program 26 may utilize GPU-executed programs configured to be executed on the first GPU 18A having the first ISA 22A and the first ABI 24A. Thus, as the compiled binary of the application program 26 was configured for the specific architecture of the processor 16A and GPU 18A of the first computing device 12, the application program 26 may be run natively on the first computing …).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Gordon with the teachings of Diamond to compile tasks based on rules associated with the instruction set architecture of each of the heterogeneous GPUs, thereby translating the tasks so that they are compatible with each GPU. The modification would have been motivated by the desire to maximize the utilization of different versions of GPUs in a system (see at least Gordon's Col. 1, line 63 through Col. 2, line 10).

While Gordon discusses compilation rules based on hardware information and deciding how to compile and allocate code to GPUs of different versions and different instruction set architectures (ISAs), neither Diamond nor Gordon expressly teaches the first hardware information indicating that the first dedicated processing resource enables a specific function, and the second hardware information indicating the second dedicated processing resource disables the specific function.

However, McGrane teaches the first hardware information indicating that the first dedicated processing resource enables a specific function, and the second hardware information indicating the second dedicated processing resource disables the specific function (¶ [0042]: The system configuration may be the environment in which the workload is …).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of McGrane with the teachings of Diamond and Gordon to further describe the hardware information to include enabled and disabled features. The modification would have been motivated by the desire to optimize workload execution in different system configurations. See at least McGrane's Abstract.

As per claim 12, it is a system type claim having similar limitations to claim 4 above. Therefore, it is rejected under the same rationale as claim 4 above.

As per claim 20, it is a media/product type claim having similar limitations to claim 4 above. Therefore, it is rejected under the same rationale as claim 4 above.

Claims 5 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Diamond, Gordon, and McGrane, as applied to claims 4 and 12, in further view of Krishnamurthy et al. (US PGPUB US 2009/0217275 A1).

Regarding claim 5, Diamond teaches generating a first task based on a first hardware information and a second task based on a second hardware information (Col. 12, lines 19-23: The graphics application workload from the graphics application can be divided as appropriate to fit the capabilities of each of the GPUs 2201-2203. Thus for example, the more powerful GPU 2203 would be allocated a higher graphics processing workload then the GPU 2201; Col. 13, lines 9-31: Based upon the profiling and the capabilities of the GPUs, a front end processor 2405 functions with a graphics primitive allocation unit 2406 to divide the workload from the graphics application into portions appropriate for the capabilities of the coupled GPUs. The dividing of the graphics workload from the graphics application is done in order to make the best use of the graphics hardware/software of the plurality of coupled GPUs. As described above, the GPUs can be asymmetric in their capability. The objective would be to make the most efficient use of the processing hardware to yield the highest overall system performance. In the FIG. 24 embodiment, the primitive allocation unit 2406 divides the graphics application workload into a first GPU workload for the GPU 2421 and a second GPU workload for the GPU 2422.). 

Diamond, Gordon, and McGrane do not expressly teach further comprises: 
associating the first task with a first identifier of the first dedicated processing resource; and 
associating the second task with a second identifier of the second dedicated processing resource, each identifier comprising an Internet Protocol (IP) address and a local identification of each dedicated processing resource.

However, Krishnamurthy teaches: 
associating the first task with a first identifier of the first dedicated processing resource and associating the second task with a second identifier of the second dedicated processing resource (¶ [0032]: Upon association, work requests or tasks may be assigned to IP addresses corresponding to the mapped accelerators.), each identifier comprising an Internet Protocol (IP) address and a local identification of each dedicated processing resource (¶ [0008]: The method includes associating hardware addresses to at least one processing unit (PU) or at least one logical partition (LPAR) of the computing system, receiving a work request for an associated hardware accelerator address, and queuing the work request on the PU or in a hardware accelerator using the associated hardware accelerator address; ¶ [0032]: Upon association, work requests or tasks may be assigned to IP addresses corresponding to the mapped accelerators.).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Krishnamurthy with the teachings of Diamond, Gordon, and McGrane to associate the partitioned tasks with the addresses of the accelerators, thereby ensuring that the tasks are properly routed. The modification would have been motivated by the desire to ensure proper task scheduling.

As per claim 13, it is a system type claim having similar limitations to claim 5 above. Therefore, it is rejected under the same rationale as claim 5 above.

Claims 8 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Diamond and Gordon, as applied to claim 1, in further view of Gupta et al. (US PGPUB US 2018/0341525 A1).

Regarding claim 8, Diamond teaches wherein the dedicated processing resources are graphics processing units (GPUs) (Background: Graphics Processing Units (GPUs) are specialized integrated circuit devices that are commonly used in graphics systems to accelerate the performance of a 3D rendering application), the method further comprising: 
performing the deep learning tasks on the first dedicated processing resource and the second dedicated processing resource respectively (Col. 13, lines 33-38: The GPU 2421 and the GPU 2422 both execute their respective allocated GPU workloads.).

	Diamond and Gordon do not expressly disclose wherein the first task and the second task are deep learning tasks; and 
performing deep learning tasks in graphics processing units (GPUs).

	However, Gupta teaches wherein the first task and the second task are deep learning tasks (¶ [0070]: The deep learning model can be configured to assign each of the tasks in the received job to one of the CPU 103 or one of the GPU 104 in the worker node 102. Whether the …); and 
performing deep learning tasks in graphics processing units (GPUs) (¶ [0072]: a neural network is in electronic communication with the GPU 104. For example, the neural network can be deployed on GPU 104 or a CPU to execute the deep learning model for the corresponding image processing task; Claim 9).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Gupta with the teachings of Diamond and Gordon to further utilize GPUs for deep learning tasks. The modification would have been motivated by the desire to utilize GPUs to meet computing needs and achieve maximum throughput.

As per claim 16, it is a system type claim having similar limitations to claim 8 above. Therefore, it is rejected under the same rationale as claim 8 above.

Response to Arguments
Applicant’s arguments with respect to claims 1-20 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.




Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Auerback et al. (US PGPUB US 2013/0036409 A1) TECHNIQUE FOR COMPILING AND RUNNING HIGH-LEVEL PROGRAMS ON HETEROGENEOUS COMPUTERS. See at least ¶¶ [0005] and [0020] “An example hybrid computer 104 shown includes a number of different types of processors, a CPU 126, a GPU 128, a FPGA 130, an XML 132 and another application specific integrated circuit (ASIC) 134. A host compiler 106 may generate an executable code or entity 116 that is compatible to run on a CPU 104. In addition, a GPU compiler 108 may operate on the same source code 102 and generate an executable entity or code 118 compatible to run on the GPU 128; a FPGA compiler 110 may operate on the same source code 102 and generate an executable entity or code 120 compatible to run on the FPGA 130; an XML compiler 112 may operate on the same source code 102 and generate an executable entity or code 122 compatible to run on the XML processor 132; another compiler 114 may operate on the same source code 102 and generate an executable entity or code 124 compatible to run on another special processor or application specific integrated circuit (ASIC).”
Pechanec et al. (US PGPUB US 2015/0199787 A1) DISTRIBUTE WORKLOAD OF AN APPLICATION TO A GRAPHICS PROCESSING UNIT. See at least ¶¶ [0043]-[0045] “A compiler that is specific to a GPU is dependent on the architecture of the GPU. For example, GPUs that are provided by different vendors may have different instruction sets and different compilers may be specific to each of the GPUs. In an example, node 110 is a JVM and GPU 128 is based on an ARM instruction set architecture that enables the JVM to execute Java bytecode.”
Hervas et al. (US PGPUB US 2010/0329564 A1) Automatic Generation and Use of Region of Interest and Domain of Definition Functions. See at least ¶ [0220] “In addition, some embodiments may separately analyze multiple variants of GPU code corresponding to the different code types generated by the code translation engine. This may be done for each code type because different GPUs may have different capabilities, and thus a particular GPU may use different resources to implement a particular set of operations than another GPU. Thus, as in the example of FIG. 12, each GPU processing code variant 1260 may be analyzed separately in order to generate its corresponding fingerprint 1265.”
Applicant's amendment necessitated the new grounds of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Meng-Ai T An can be reached on (571)-272-3756.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/JORGE A CHU JOY-DAVILA/Primary Examiner, Art Unit 2195