DETAILED ACTION
Claims 1-20 are pending.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1, 6, 7, 9, 14, 15, and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Diamond (US Patent No. US 9,087,161 B1) in view of Gordon et al. (US Patent No. US 10,102,015 B1), in further view of Darrington et al. (US PGPUB US 2010/0095152 A1).

Regarding claim 1, Diamond teaches the invention substantially as claimed including a method of assigning tasks to dedicated processing resources (Col. 13, lines 22-34: the primitive allocation unit 2406 divides the graphics application workload into a first GPU workload for the GPU 2421 and a second GPU workload for the GPU 2422…The GPU 2421 and the GPU 2422 both execute their respective allocated GPU workloads.), comprising: 
obtaining hardware information of a plurality of dedicated processing resources, the plurality of dedicated processing resources comprising a first dedicated processing resource and a second dedicated processing resource, and the hardware information comprising first hardware information of the first dedicated processing resource and second hardware information of the second dedicated processing resource (Col. 13, lines 7-14: system configuration information 2403 regarding the capabilities of the plurality of GPUs coupled to the computer system. Based upon the profiling and the capabilities of the GPUs (reasonably teaches “obtaining”), a front end processor 2405 functions with a graphics primitive allocation unit 2406 to divide the workload from the graphics application into portions appropriate for the capabilities of the coupled GPUs; Col. 11, line 66 through Col. 12, line 5: The GPUs 2201-2203 are asymmetric, meaning that their rendering capabilities and/or rendering power is not equal. For example, in the present exemplary embodiment, the embedded GPU 2201 is a two pipeline GPU, the card based GPU 2202 (e.g., add-in AGP card or PCI Express card) is a four pipeline GPU, and the DGS 2203 hosts a 16 pipeline GPU.);
generating a first task based on the first hardware information and a second task based on the second hardware information (Col. 12, lines 19-23: The graphics application workload from the graphics application can be divided as appropriate to fit the capabilities of each of the GPUs 2201-2203. Thus for example, the more powerful GPU 2203 would be allocated a higher graphics processing workload then the GPU 2201; Col. 13, lines 9-31: Based upon the profiling and the capabilities of the GPUs, a front end processor 2405 functions with a graphics primitive allocation unit 2406 to divide (generates) the workload from the graphics application into portions appropriate for the capabilities of the coupled GPUs. The dividing of the graphics workload from the graphics application is done in order to make the best use of the graphics hardware/software of the plurality of coupled GPUs. As described above, the GPUs can be asymmetric in their capability. The objective would be to make the most efficient use of the processing hardware to yield the highest overall system performance. In the FIG. 24 embodiment, the primitive allocation unit 2406 divides the graphics application workload into a first GPU workload for the GPU 2421 and a second GPU workload for the GPU 2422.); and
allocating the first task to the first dedicated processing resource and the second task to the second dedicated processing resource (Col. 13, lines 25-38: In one embodiment, the primitive allocation unit 2406 allocates a larger number of primitives comprising the 3-D scene to be rendered to the more powerful GPU 2422. In one embodiment, the primitive allocation unit 2406 allocates larger, more complex blocks comprising the 3-D scene to be rendered to the more powerful GPU 2422. These blocks can include more complex textures, for example. The GPU 2421 and the GPU 2422 both execute their respective allocated GPU workloads.).

	Diamond does not expressly disclose a step for obtaining hardware information and allocating… the second task to the second dedicated processing resource; and
the first task and second task being generated by utilizing a first compilation rule and a second compilation rule based on the first hardware information and the second hardware information, respectively, wherein the first compilation rule is used to compile kernel codes associated with different hardware capabilities to be executed on a first type of a dedicated processing resource, and the second compilation rule is used to compile kernel codes associated with different hardware capabilities to be executed on a second type of dedicated processing resource.


The relevant passages of Diamond read:
-Col. 13, lines 9-14: “Based upon the profiling and the capabilities of the GPUs, a front end processor 2405 functions with a graphics primitive allocation unit 2406 to divide the workload from the graphics application into portions appropriate for the capabilities of the coupled GPUs.”
-Col. 13, lines 33-34: “The GPU 2421 and the GPU 2422 both execute their respective allocated GPU workloads.”

It would have been obvious to one of ordinary skill in the art to understand Diamond's teachings to encompass the claimed invention for at least the following reasons. First, in order to determine how to divide the workload among different GPUs, the capabilities of the GPUs must be obtained. Therefore, although Diamond does not explicitly disclose an obtaining step, because the division of the workload is based on the application profile and the GPUs' capabilities, Diamond necessarily obtains/determines hardware information of the GPUs. Second, while Diamond teaches allocating a portion of the workload to a first, higher-performing GPU, it does not explicitly disclose a step of allocating the second portion of the workload to the second GPU. However, as shown in the citation above, the step of allocating the second task is implied, as Diamond states “The GPU 2421 and the GPU 2422 both execute their respective allocated GPU workloads.” As such, Diamond reasonably teaches the limitations of obtaining hardware information and allocating… the second task to the second dedicated processing resource.

	While Diamond does generate first and second tasks for the heterogeneous GPU elements to fit the capabilities of each of the GPUs, Diamond does not expressly teach wherein the first task and second task being generated by utilizing a first compilation rule and a second compilation rule based on the first hardware information and the second hardware information, respectively, wherein the first compilation rule is used to compile kernel codes associated with different hardware capabilities to be executed on a first type of a dedicated processing resource, and the second compilation rule is used to compile kernel codes associated with different hardware capabilities to be executed on a second type of dedicated processing resource.

However, Gordon teaches “different graphics processing unit architectures may have different instruction set architectures, application binary interfaces, and memory environments. For example, graphics processing units from different manufacturers, or different generations of graphics processing unit technologies from a single manufacturer may have binary incompatible architectures. These differences may cause a program that was specified for one graphics processing unit architecture to be binary incompatible with another graphics processing unit architecture. The systems and methods described herein perform a just-in-time cross compilation of graphics processing unit executed programs by performing instruction set architecture and application binary interface translation in O(N) complexity to the number of instructions in the program.” (Col. 1, line 63 through Col. 2, line 10). Further, Gordon teaches the first task and second task being generated by utilizing a first compilation rule and a second compilation rule based on the first hardware information and the second hardware information, respectively (Col. 2, lines 26-53: In one example, the first GPU 18A of the first computing device 12 is architecturally distinct from the second GPU 18B of the second computing device 14. As shown in FIG. 1, the first GPU 18A has a first instruction set architecture (ISA) 22A and a first application binary interface (ABI) 24A, while the second GPU 18B has a second ISA 22B and a second ABI different from the first ISA 22A and first ABI 24A of the first GPU 18A. Due to architectural differences between the first GPU 18A and the second GPU 18B, application programs configured to be executed using the first processor 16A and first GPU 18A may not be successfully executed using the second processor 16B and second GPU 18B, and vice versa. For example, a compiled binary of an application program 26 (e.g., first task) may utilize GPU-executed programs configured to be executed on the first GPU 18A having the first ISA 22A and the first ABI 24A. Thus, as the compiled binary of the application program 26 was configured for the specific architecture of the processor 16A and GPU 18A of the first computing device 12, the application program 26 may be run natively on the first computing device 12 without needing modifications. However, the same compiled binary of the application program 26 is not binary compatible with the second ISA 22B and second ABI 24B of the second GPU 18B of the second computing device 14.
Thus, the application program 26 will not successfully be executed on the second computing device 14 without modification; Col. 7, line 12 through Col. 8, line 27; Col. 17, lines 24-61: Another aspect provides a computing device for just-in-time cross-compiling compiled binaries of application programs that utilize graphics processing unit (GPU) executed programs configured to be executed on a first GPU having a first instruction set architecture (ISA) and a first application binary interface (ABI), the computing device comprising a second GPU having a second ISA and a second ABI different from the first ISA and first ABI of the first GPU, and a processor configured to execute an application program that utilizes a plurality of GPU-executed programs configured to be executed for the first ISA and first ABI of the first GPU, execute a run-time executable cross-compiler configured to, while the application program is being executed, preprocess the plurality … (e.g., second task)).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Gordon with the teachings of Diamond to compile the tasks using rules associated with the instruction set architecture of each of the heterogeneous GPUs, thereby translating the tasks so that they are compatible with each GPU. The modification would have been motivated by the desire to maximize the utilization of different versions of GPUs in a system (See at least Gordon, Col. 1, line 63 through Col. 2, line 10).

While Diamond and Gordon discuss the creation/division of a workload into multiple pieces executable by accelerators, neither Diamond nor Gordon expressly teaches wherein the first compilation rule is used to compile kernel codes associated with different hardware capabilities to be executed on a first type of a dedicated processing resource, and the second compilation rule is used to compile kernel codes associated with different hardware capabilities to be executed on a second type of dedicated processing resource.

However, Darrington teaches that “[t]he hybrid architecture parallel processing computing system is configured to receive a workload and divide the workload into parts, or tasks, that are operable to be executed, or processed, primarily by the hybrid nodes” (See at least Abstract and ¶ [0037]). Further, Darrington teaches wherein the first compilation rule is used to compile kernel codes associated with different hardware capabilities to be executed on a first type of a dedicated processing resource, and the second compilation rule is used to compile kernel codes associated with different hardware capabilities to be executed on a second type of dedicated processing resource (¶ [0037]: The hybrid architecture parallel processing computing system is configured to receive a workload and divide the workload into parts, or tasks, that are operable to be executed, or processed, primarily by the hybrid nodes. In particular, the tasks may be further subdivided to be processed by the host element and/or at least one accelerator element, and may be further subdivided to be processed by the one or more cores of a multithreaded processor of the host element and/or one or more elements of a multi-element processor of the at least one accelerator element. As such, the parallel processing computing system is configured to perform several computations at once. In particular, each synergistic processing element and/or general purpose processing element of a multi-element processor may execute one computation kernel, depending on the configuration of the hybrid architecture parallel processing computing system.
The at least one multithreaded processor of the host element may be coupled to a memory configured with an application, or a portion of an application, to execute tasks and configure tasks into at least one computation kernel for at least one accelerator element, while the at least one multi-element processor of the at least one accelerator element may be coupled to a memory configured with at least one control unit to dispatch at least one computation kernel to at least one control unit configured to execute on a general purpose processing element of a multi-element processor of an accelerator node, which in turn may configure and manage the execution of the at least one computation kernel on at least one synergistic processing element of that multi-element processor; ¶ [0074]: Work may be entered for the application at the management node. The work may be divided into at least one connected unit workload that may be distributed to at least one service node. A service node may in turn divide the connected unit workload into at least one workload configured to be executed by a hybrid node, and distribute the at least one workload to at least one hybrid node. As such, the program code may receive a workload at a host element of a hybrid node (block 222) and configure the workload into at least one task (block 224). In turn, the program code may compile at least one task into at least one computation kernel (block 226) configured to be executed by an accelerator element. In response to compiling at least one computation kernel, the program code may select at least one accelerator element and configure the at least one computation kernel on that at least one accelerator element (block 228). The program code may then execute the at least one computation kernel on the at least one accelerator element (block 230).).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Darrington with the teachings of Diamond and Gordon to compile kernel codes based on rules associated with the architecture of the processing elements (i.e., processors and accelerators), for them to be able to execute the …

Regarding claim 6, Diamond teaches the method further comprising:
receiving, from the first dedicated processing resource, a first result for the first task, receiving, from the second dedicated processing resource, a second result for the second task and combining the first result and the second result (Col. 10, lines 1-14: The frame synchronization master 1301 functions by synchronizing the rendered 3D graphics frames produced by the respective GPUs 901-904. The output of the respective GPUs 901-904 are combined by the output multiplexer 1302 to produce a resulting GPU output stream 1330.).

Regarding claim 7, Diamond teaches wherein the plurality of dedicated processing resources further comprises a third dedicated processing resource of the same type as the second dedicated processing resource, wherein the assigning the second task to the second dedicated processing resource comprises: allocating the second task to the second dedicated processing resource and the third dedicated processing resource (Col. 11, line 63 through Col. 12, line 8: The GPUs 2201-2203 are asymmetric, meaning that their rendering capabilities and/or rendering power is not equal. For example, in the present exemplary embodiment, the embedded GPU 2201 is a two pipeline GPU, the card based GPU 2202 (e.g., add-in AGP card or PCI Express card) is a four pipeline GPU, and the DGS 2203 hosts a 16 pipeline GPU (all three are resources of the same type, i.e., GPUs); Col. 12, lines 10-28: The control unit 2204 allocates the graphics processing workload among GPUs 2201-2203 such that the total available hardware is utilized as efficiently as possible. This includes, for example, keeping all available pipelines busy …).

Regarding claim 9, it is a system type claim having the same limitations as claim 1 above. Therefore, it is rejected under the same rationale as claim 1 above. Further, the additional limitations of a processing unit, and a memory coupled to the processing unit and storing instructions thereon, the instructions, when executed by the processing unit, executing the acts, are taught by Diamond in Col. 5, lines 54-61: “certain processes and steps of the present invention are realized, in one embodiment, as a series of instructions (e.g., software program) that reside within computer readable memory (e.g., system memory 102) of a computer system (e.g., system 100) and are executed by the CPU 101 and DGS 110 of system 100. When executed, the instructions cause the computer system 100 to implement the functionality of the present invention as described”.

As per claim 14, it is a system type claim having similar limitations as claim 6 above. Therefore, it is rejected under the same rationale as claim 6 above.

As per claim 15, it is a system type claim having similar limitations as claim 7 above. Therefore, it is rejected under the same rationale as claim 7 above.

Regarding claim 17, it is a media/product type claim having the same limitations as claim 1 above. Therefore, it is rejected under the same rationale as claim 1 above.

Claims 2, 10, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Diamond, Gordon, and Darrington, as applied to claim 1, in further view of Balci et al. (US PGPUB US 2018/0165788 A1).

Regarding claim 2, Diamond teaches wherein the obtaining hardware information of a plurality of dedicated processing resources comprises: 
obtaining high-performance computing tasks (Col. 7, lines 33-35: when high-performance 3D rendering is desired (e.g., for a high fidelity real-time 3D rendering application)); 
allocating the plurality of dedicated processing resources to the high-performance computing tasks (Col. 13, lines 22-32: the primitive allocation unit 2406 divides the graphics application workload into a first GPU workload for the GPU 2421 and a second GPU workload for the GPU 2422. In one embodiment, the primitive allocation unit 2406 allocates a larger number of primitives comprising the 3-D scene to be rendered to the more powerful GPU 2422. In one embodiment, the primitive allocation unit 2406 allocates larger, more complex blocks comprising the 3-D scene to be rendered to the more powerful GPU 2422. These blocks can include more complex textures, for example.); and 
obtaining hardware information of each of the plurality of dedicated processing resources (Col. 13, lines 7-14: system configuration information 2403 regarding the capabilities of the plurality of GPUs coupled to the computer system. Based upon the profiling and the capabilities of the GPUs (reasonably teaches “obtaining”), a front end processor 2405 functions with a graphics primitive allocation unit 2406 to divide the workload from the graphics application into portions appropriate for the capabilities of the coupled GPUs; Col. 11, line 66 through Col. 12, line 5: The GPUs 2201-2203 are asymmetric, meaning that their rendering capabilities and/or rendering power is not equal. For example, in the present exemplary embodiment, the embedded GPU 2201 is a two pipeline GPU, the card based GPU 2202 (e.g., add-in AGP card or PCI Express card) is a four pipeline GPU, and the DGS 2203 hosts a 16 pipeline GPU.).

	Diamond, Gordon, and Darrington do not explicitly disclose wherein the obtaining hardware information of each of the plurality of dedicated processing resources is performed in real time.

	However, Balci teaches obtaining hardware information of each of the plurality of dedicated processing resources is performed in real time (¶ [0024]: The techniques of this disclosure also allow for finer-grained, real-time GPU state information (e.g., heuristics) that the GPU may use to determine the rendering mode for a scene; ¶ [0091]: In another example, the heuristics may include determining hardware capabilities of GPU 12 such that GPUs having different hardware capabilities may render scenes differently. Additionally, the determined heuristics may include other GPU performances queries, as well as state determinations at the time of execution not explicitly disclosed herein.).



As per claim 10, it is a system type claim having similar limitations as claim 2 above. Therefore, it is rejected under the same rationale as claim 2 above.

As per claim 18, it is a media/product type claim having similar limitations as claim 2 above. Therefore, it is rejected under the same rationale as claim 2 above.

Claims 3, 11, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Diamond, Gordon, and Darrington, as applied to claim 1, in further view of Parke et al. (US PGPUB US 2018/0357746 A1).
 
Regarding claim 3, Diamond, Gordon, and Darrington do not expressly disclose wherein the obtaining hardware information of a plurality of dedicated processing resources comprises: 
querying hardware information of each of the plurality of dedicated processing resources periodically; 
storing the queried hardware information into a database, the hardware information comprising an identifier, a type, and a performance parameter of each dedicated processing resource; and 
obtaining, from the database, the hardware information of the plurality of dedicated processing resources.

However, Parke teaches wherein the obtaining hardware information of a plurality of dedicated processing resources comprises: 
querying hardware information of each of the plurality of dedicated processing resources periodically (¶ [0004]: GPU features and capabilities may be determined via one or more query functions to determine capabilities iteratively); 
storing the queried hardware information into a database, the hardware information comprising an identifier (Fig. 7, Configuration ID; ¶ [0246]: As an example, a ConfigurationID may be a 64-bit value that consists of four 16-bit components: PCI Vendor ID, PCI Device ID, PCI Revision ID, and a vendor-supplied ID. Providing the PCI Vendor ID, PCI Device ID, PCI Revision ID, and vendor-supplied ID in this fashion allows a software vendor to quickly perform the appropriate lookups), a type (¶ [0004]: GPU corresponds to type), and a performance parameter of each dedicated processing resource (¶ [0013]: FIG. 7 shows, in table form, examples of configuration identifiers for the same hardware or different devices according to one or more embodiments; ¶ [0243]: Third party vendors typically provide hardware specifications to software vendors that detail the hardware feature set for their devices. These specifications provide the software vendor with the details required to simulate hardware functionality through other means (for example, shaders). With the introduction of the …); and
obtaining, from the database, the hardware information of the plurality of dedicated processing resources (¶ [0004]: GPU features and capabilities may be determined via one or more query functions to determine capabilities iteratively or by using a look up table containing configuration identifiers tied to specific capabilities of a GPU.).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Parke with the teachings of Diamond, Gordon, and Darrington to store information regarding the GPUs in a data table. The modification would have been motivated by the desire to quickly perform the appropriate lookups and to meet the needs of both the software vendor and the vendor's choice of available functionality (See Parke ¶ [0246]).

As per claim 11, it is a system type claim having similar limitations as claim 3 above. Therefore, it is rejected under the same rationale as claim 3 above.

As per claim 19, it is a media/product type claim having similar limitations as claim 3 above. Therefore, it is rejected under the same rationale as claim 3 above.

Claims 4, 12, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Diamond, Gordon, and Darrington, as applied to claim 1, in further view of McGrane et al. (US PGPUB US 2011/0022870).

Regarding claim 4, Diamond teaches generating a first task based on the first hardware information and a second task based on the second hardware information (Col. 12, lines 19-23: The graphics application workload from the graphics application can be divided as appropriate to fit the capabilities of each of the GPUs 2201-2203. Thus for example, the more powerful GPU 2203 would be allocated a higher graphics processing workload then the GPU 2201; Col. 13, lines 9-31: Based upon the profiling and the capabilities of the GPUs, a front end processor 2405 functions with a graphics primitive allocation unit 2406 to divide the workload from the graphics application into portions appropriate for the capabilities of the coupled GPUs. The dividing of the graphics workload from the graphics application is done in order to make the best use of the graphics hardware/software of the plurality of coupled GPUs. As described above, the GPUs can be asymmetric in their capability. The objective would be to make the most efficient use of the processing hardware to yield the highest overall system performance. In the FIG. 24 embodiment, the primitive allocation unit 2406 divides the graphics application workload into a first GPU workload for the GPU 2421 and a second GPU workload for the GPU 2422.).

	Diamond does not expressly teach that the generating further comprises:
determining the first compilation rule and the second compilation rule based on the first hardware information and the second hardware information respectively, the first hardware information indicating that the first dedicated processing resource enables a specific function, and the second hardware information indicating the second dedicated processing resource disables the specific function; and 
generating the first task and the second task using the first compilation rule and the second compilation rule respectively.

However, Gordon teaches that the generating further comprises:
determining the first compilation rule and the second compilation rule based on the first hardware information and the second hardware information respectively and generating the first task and the second task using the first compilation rule and the second compilation rule respectively (Col. 2, lines 26-53: In one example, the first GPU 18A of the first computing device 12 is architecturally distinct from the second GPU 18B of the second computing device 14. As shown in FIG. 1, the first GPU 18A has a first instruction set architecture (ISA) 22A and a first application binary interface (ABI) 24A, while the second GPU 18B has a second ISA 22B and a second ABI different from the first ISA 22A and first ABI 24A of the first GPU 18A. Due to architectural differences between the first GPU 18A and the second GPU 18B, application programs configured to be executed using the first processor 16A and first GPU 18A may not be successfully executed using the second processor 16B and second GPU 18B, and vice versa. For example, a compiled binary of an application program 26 may utilize GPU-executed programs configured to be executed on the first GPU 18A having the first ISA 22A and the first ABI 24A. Thus, as the compiled binary of the application program 26 was configured for the specific architecture of the processor 16A and GPU 18A of the first computing device 12, the application program 26 may be run natively on the first computing device 12 without needing modifications. However, the same compiled binary of the application program 26 is not binary compatible with the second ISA 22B and second ABI 24B of the second GPU 18B of the second computing device 14.).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Gordon with the teachings of Diamond to compile the tasks using rules associated with the instruction set architecture of each of the heterogeneous GPUs, thereby translating the tasks so that they are compatible with each GPU. The modification would have been motivated by the desire to maximize the utilization of different versions of GPUs in a system (See at least Gordon, Col. 1, line 63 through Col. 2, line 10).

While Gordon discusses compilation rules based on hardware information and deciding how to compile and allocate code to different GPUs of different versions and different instruction set architectures (ISAs), neither Diamond, Gordon, nor Darrington expressly teaches the first hardware information indicating that the first dedicated processing resource enables a specific function, and the second hardware information indicating the second dedicated processing resource disables the specific function.

However, McGrane teaches the first hardware information indicating that the first dedicated processing resource enables a specific function, and the second hardware information indicating the second dedicated processing resource disables the specific function (¶ [0042]: The system configuration may be the environment in which the workload is …).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of McGrane with the teachings of Diamond, Gordon, and Darrington to further define the hardware information to include enabled and disabled features. The modification would have been motivated by the desire to optimize workload execution in different system configurations (See at least McGrane’s Abstract).

As per claim 12, it is a system type claim having similar limitations as claim 4 above. Therefore, it is rejected under the same rationale as claim 4 above.

As per claim 20, it is a media/product type claim having similar limitations as claim 4 above. Therefore, it is rejected under the same rationale as claim 4 above.

Claims 5 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Diamond, Gordon, Darrington, and McGrane, as applied to claims 4 and 12, in further view of Krishnamurthy et al. (US PGPUB US 2009/0217275 A1).

Regarding claim 5, Diamond teaches generating a first task based on a first hardware information and a second task based on a second hardware information (Col. 12, lines 19-23: The graphics application workload from the graphics application can be divided as appropriate to fit the capabilities of each of the GPUs 2201-2203. Thus for example, the more powerful GPU 2203 would be allocated a higher graphics processing workload then the GPU 2201; Col. 13, lines 9-31: Based upon the profiling and the capabilities of the GPUs, a front end processor 2405 functions with a graphics primitive allocation unit 2406 to divide the workload from the graphics application into portions appropriate for the capabilities of the coupled GPUs. The dividing of the graphics workload from the graphics application is done in order to make the best use of the graphics hardware/software of the plurality of coupled GPUs. As described above, the GPUs can be asymmetric in their capability. The objective would be to make the most efficient use of the processing hardware to yield the highest overall system performance. In the FIG. 24 embodiment, the primitive allocation unit 2406 divides the graphics application workload into a first GPU workload for the GPU 2421 and a second GPU workload for the GPU 2422.). 

Diamond, Gordon, Darrington, and McGrane do not expressly teach wherein the method further comprises:
associating the first task with a first identifier of the first dedicated processing resource; and 
associating the second task with a second identifier of the second dedicated processing resource, each identifier comprising an Internet Protocol (IP) address and a local identification of each dedicated processing resource.

However, Krishnamurthy teaches:
associating the first task with a first identifier of the first dedicated processing resource and associating the second task with a second identifier of the second dedicated processing resource (¶ [0032]: Upon association, work requests or tasks may be assigned to IP addresses corresponding to the mapped accelerators.), each identifier comprising an Internet Protocol (IP) address and a local identification of each dedicated processing resource (¶ [0008]: The method includes associating hardware addresses to at least one processing unit (PU) or at least one logical partition (LPAR) of the computing system, receiving a work request for an associated hardware accelerator address, and queuing the work request on the PU or in a hardware accelerator using the associated hardware accelerator address; ¶ [0032]: Upon association, work requests or tasks may be assigned to IP addresses corresponding to the mapped accelerators.).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Krishnamurthy with the teachings of Diamond, Gordon, Darrington, and McGrane so that, after the tasks are partitioned, each task is associated with the address of its accelerator, ensuring that the tasks are properly routed. The modification would have been motivated by the desire to ensure proper task scheduling.

As per claim 13, it is a system type claim having limitations similar to those of claim 5 above. Therefore, it is rejected under the same rationale as claim 5 above.

Claims 8 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Diamond, Gordon, and Darrington, as applied to claim 1, in further view of Gupta et al. (US PGPUB US 2018/0341525 A1).

Regarding claim 8, Diamond teaches wherein the dedicated processing resources are graphics processing units (GPUs) (Background: Graphics Processing Units (GPUs) are specialized integrated circuit devices that are commonly used in graphics systems to accelerate the performance of a 3D rendering application), the method further comprising: 
performing the deep learning tasks on the first dedicated processing resource and the second dedicated processing resource respectively (Col. 13, lines 33-38: The GPU 2421 and the GPU 2422 both execute their respective allocated GPU workloads.).

	Diamond, Gordon, and Darrington do not expressly disclose wherein the first task and the second task are deep learning tasks; and 
performing deep learning tasks in graphics processing units (GPUs).

	However, Gupta teaches wherein the first task and the second task are deep learning tasks (¶ [0070]: The deep learning model can be configured to assign each of the tasks in the received job to one of the CPU 103 or one of the GPU 104 in the worker node 102.); and
performing deep learning tasks in graphics processing units (GPUs) (¶ [0072]: a neural network is in electronic communication with the GPU 104. For example, the neural network can be deployed on GPU 104 or a CPU to execute the deep learning model for the corresponding image processing task; Claim 9).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Gupta with the teachings of Diamond, Gordon, and Darrington to further utilize GPUs for deep learning tasks. The modification would have been motivated by the desire to utilize GPUs to meet computing needs and achieve maximum throughput.

As per claim 16, it is a system type claim having limitations similar to those of claim 8 above. Therefore, it is rejected under the same rationale as claim 8 above.

Response to Arguments
Applicant’s arguments with respect to claims 1-20 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.




Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Kim et al. (US PGPUB US 2019/0384370 A1) Software Assisted Power Management. See at least ¶ [0025]: “Most DL/AI workloads comprise networks or layers operating in a "stream" mode where specialized kernels (also referred to herein as "instruction streams") are downloaded to an accelerator. An accelerator is a type of microprocessor or computer system designed as hardware acceleration for machine learning applications. Hardware acceleration involves the use of computer hardware to perform certain functions more efficiently than typical software running on a general-purpose processor. Because these instruction streams in a DL/AI workload can be accelerator-specific, they can be compiled and optimized for the best performance.”
Gummaraju et al. (US PGPUB US 2013/0160016 A1) 
Krishnamurthy et al. (US PGPUB US 2012/0054770 A1) 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JORGE A CHU JOY-DAVILA whose telephone number is (571)270-0692.  The examiner can normally be reached on Monday-Friday, 9:00am-5:00pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/JORGE A CHU JOY-DAVILA/Primary Examiner, Art Unit 2195