DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.


Allowable Subject Matter
Claims 1-17 are allowed.

Reasons for Allowance
The following is an examiner’s statement of reasons for allowance: 

In the Office Action dated 06-18-2021, the Office submitted pertinent prior art that included the teachings of U.S. Publication No. 2020/0301736 by Kutschbach et al. in view of U.S. Publication No. 2016/0170476 by Pillai et al. Those teachings are incorporated herein by reference. In addition, an updated search following the communication filed on 06-18-2021 identified prior art that is believed to be relevant to the claimed subject matter.

These references are as follows: 

Pertinent prior art for the instant application is U.S. Publication No. 2017/0256023 by Li et al. which discloses the invention directed to the “presented local CDN processing systems and methods facilitate efficient image processing and storage.  methods enable faster response to user image requests, reduced consumption of network bandwidth and reduced source server CPU use.  The FPGA resources are shared for storage controller and image processor operations in a PCIe flash device.  This allows processing burdens to be offloaded from source server CPUs and reduces the expense of the CPU complex (including bus, memory, operating system, etc.).  An original image can be selectively copied onto the CDN server and later requests regarding the original image (including processed image requests associated with the original image) can be accomplished by the CDN server.  Thus, the network bandwidth utilization on this service is reduced.  An image processing service's file access granularity is based on the individual images instead of the same-size block.  A unique indexing of images can be used for image operations.  The unique indexing assigns a non-repeatable index and ensures the images have their respective index.  Therefore, the file management is simplified and consequently, after bypassing the traditional file system, the access to the image is significantly accelerated, and the overhead spent on file system construction and maintenance is reduced.  Furthermore, DRAM-less design can exploit the internal page buffer resources inside each Flash LUN, and provide a data buffer functionality similar to a DRAM cache conventional used on an add-in card without actually using a DRAM.  Facilitating removal of the DRAM can further reduces the cost, the power consumption and the system design complexity of the add-in card.  The caching algorithm for the processed image is developed to balance the storage capacity and the online FPGA processing load” (¶ [0060], Fig. 4)

The offload server 1 according to the present embodiment improves the efficiency of the system by appropriately performing function allocation and offloading on each of the device layer 70, network layer 60, and cloud layer 50.  The improvement of the efficiency is mainly achieved by allocating each function to an appropriate layer of the three layers to efficiently perform processing and by offloading functional processing such as image analysis to heterogeneous hardware such as a GPU and FPGA.  In the cloud layer 50, there are an increasing number of servers provided with heterogeneous hardware (hereinafter referred to as heterogeneous device(s)) such as a GPU and FPGA.  For example, Bing search of Microsoft (registered trademark) uses FPGAs.  Performance improvement can be achieved utilizing heterogenous devices, for example, by offloading matrix calculations to a GPU or by offloading specific processing such as Fast Fourier Transformation (FFT) to an FPGA. (¶ [0060])

Therefore, based on the closest prior art of record found and the interpretation of the claims in light of the specification, the examiner submits that the claimed invention is patently distinct from the prior art of record. None of these prior art references, whether individually or in combination, teaches or suggests the claimed invention directed to obtaining a monetary cost using an external server that determines a threshold range of execution speed for a plurality of hardware devices.

Hence, based on the reasons above, claims 1-17 are allowable.

Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”


EXAMINER’S AMENDMENT
An Examiner’s amendment to the record appears below. Should the changes and/or additions be unacceptable to applicant, an amendment may be filed as provided by 37 CFR 1.312. To ensure consideration of such an amendment, it MUST be submitted no later than the payment of the issue fee.

Authorization for this Examiner’s amendment for claims 1-27 was given in a telephone interview with Michael Power (Reg. No. 61,942) on August 30, 2021.

Claims have been amended as follows:

1.	(Currently Amended) A computer-implemented method for hardware device selection in a computing environment, the method comprising:
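receiving, by a processor, a request to execute a programming code, wherein the processor is operating in a hybrid computing environment comprising a plurality of hardware devices;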
generating, by the processor, a compute kernel for each hardware device in the plurality of hardware devices based at least in part on the programming code;
	obtaining, by the processor, a performance model associated with the programming code; 
	obtaining, by the processor, an associated monetary cost for computation associated with each of the plurality of hardware devices, wherein the associated monetary cost for computation comprises a server utilization monetary cost for operating each of the plurality of hardware devices on an external server, and wherein obtaining, by the processor, the associated monetary cost for computation comprises:
	determining a threshold range of execution speed for processing the programming code;
           wherein selecting the target hardware device from the plurality of hardware devices based on the execution costs and the associated monetary costs comprises:
           selecting the target hardware device from the plurality of hardware devices having an execution speed within the threshold range of execution speed and having a lowest associated monetary cost;
obtaining, by the processor, runtime data associated with the hybrid computing environment;
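feeding the runtime data into the performance model to determine an execution cost for executing the compute kernel for each of the plurality of hardware devices; and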
selecting a target hardware device from the plurality of hardware devices based on the execution costs and the associated monetary cost.
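The selection step recited in claim 1 (keep the hardware devices whose execution speed falls within a threshold range, then choose the one with the lowest associated monetary cost) can be illustrated with a minimal Python sketch. It is not the applicant's implementation: the names DeviceEstimate and select_target_device, the speed units, the way the threshold range is supplied by the caller, and returning None when no device qualifies are all hypothetical assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class DeviceEstimate:
    """Hypothetical per-device estimate; field names are illustrative only."""
    name: str               # e.g. "GPU", "FPGA", "MIC"
    execution_speed: float  # predicted speed from a performance model (units assumed)
    monetary_cost: float    # assumed server utilization cost on an external server


def select_target_device(
    estimates: Sequence[DeviceEstimate],
    speed_range: Tuple[float, float],
) -> Optional[DeviceEstimate]:
    """Return the cheapest device whose execution speed lies within speed_range.

    Returns None when no device falls inside the range (assumed behavior;
    the claim does not address that case).
    """
    low, high = speed_range
    # Keep only devices whose predicted speed is inside the threshold range.
    candidates = [e for e in estimates if low <= e.execution_speed <= high]
    if not candidates:
        return None
    # Among the qualifying devices, choose the one with the lowest monetary cost.
    return min(candidates, key=lambda e: e.monetary_cost)


if __name__ == "__main__":
    # Illustrative numbers only; they are not taken from the record.
    devices = [
        DeviceEstimate("GPU", execution_speed=900.0, monetary_cost=3.0),
        DeviceEstimate("FPGA", execution_speed=600.0, monetary_cost=1.5),
        DeviceEstimate("MIC", execution_speed=300.0, monetary_cost=1.0),
    ]
    chosen = select_target_device(devices, speed_range=(500.0, 1000.0))
    print(chosen.name if chosen else "no device in range")  # prints "FPGA"
```

Filtering first and minimizing second mirrors the claim order: the speed range acts as a hard constraint, and monetary cost only breaks ties among devices that already satisfy it.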
2.	(Original) The computer-implemented method of Claim 1 further comprising:
	obtaining, by the processor, hardware description data associated with each of the plurality of hardware devices;
determining an updated execution cost for executing the programming code on each of the plurality of hardware devices based on the hardware description data; and
selecting an updated target hardware device from the plurality of hardware devices based on a determination that the updated execution cost is lower than the execution cost.
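The re-selection recited in claim 2 can likewise be sketched, again purely as a hypothetical illustration. The function maybe_update_target, the recompute_cost callback standing in for a performance model that takes hardware description data into account, and the reading that the best updated cost is compared against the current target's original execution cost are all assumptions, not details taken from the record.

```python
from typing import Callable, Mapping


def maybe_update_target(
    current_target: str,
    execution_costs: Mapping[str, float],
    hardware_descriptions: Mapping[str, Mapping[str, float]],
    recompute_cost: Callable[[float, Mapping[str, float]], float],
) -> str:
    """Hypothetical re-selection step based on hardware description data."""
    # Recompute an execution cost for every device from its original cost
    # and its (hypothetical) hardware description data.
    updated = {
        name: recompute_cost(cost, hardware_descriptions[name])
        for name, cost in execution_costs.items()
    }
    best = min(updated, key=lambda name: updated[name])
    # Switch targets only if the best updated cost is lower than the
    # current target's original execution cost.
    if updated[best] < execution_costs[current_target]:
        return best
    return current_target
```

For example, with execution costs {"GPU": 10.0, "FPGA": 12.0}, a hypothetical "load_factor" field of 1.5 for the GPU and 0.5 for the FPGA, and recompute_cost multiplying the cost by that factor, the updated costs become 15.0 and 6.0, so the sketch moves the target from "GPU" to "FPGA".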
3.	(Previously Presented) The computer-implemented method of Claim 1 further comprising scheduling the compute kernel for execution on the target hardware device.
4.	(Previously Presented) The computer-implemented method of Claim 1, further comprising:
wherein prior to receiving the request to execute the programming code:
		analyzing, by the processor, the programming code to determine that the programming code is a candidate for execution on the plurality of hardware devices;
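		generating the performance model to estimate an execution cost for the compute kernel to execute on each of the plurality of hardware devices.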
5.	(Original) The computer-implemented method of Claim 1, wherein the target hardware device comprises a graphics processing unit (GPU), a field-programmable gate array (FPGA), and a many integrated core (MIC) co-processor.
6. 	(Original) The computer-implemented method of Claim 1, wherein the performance model comprises one or more algorithms for determining execution costs for execution of the programming code.
7.	(Original) The computer-implemented method of Claim 1, wherein the runtime data comprises memory access patterns for the programming code.
8.	(Currently Amended) A computer program product for hardware device selection in a computing environment, the computer program product comprising a computer readable storage medium having program instructions embodied therewith, wherein the computer readable storage medium is not a transitory signal per se, the program instructions executable by a processor to cause the processor to perform a method comprising:
receiving, by a processor, a request to execute a programming code, wherein the processor is operating in a hybrid computing environment comprising a plurality of hardware devices;
generating, by the processor, a compute kernel for each hardware device in the plurality of hardware devices based at least in part on the programming code;
	obtaining, by the processor, a performance model associated with the programming code; 
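	obtaining, by the processor, an associated monetary cost for computation associated with each of the plurality of hardware devices, wherein the associated monetary cost for computation comprises a server utilization monetary cost for operating each of the plurality of hardware devices on an external server, and wherein obtaining, by the processor, the associated monetary cost for computation comprises: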
		determining a threshold range of execution speed for processing the programming code;
		wherein selecting the target hardware device from the plurality of hardware devices based on the execution costs and the associated monetary costs comprises:
		selecting the target hardware device from the plurality of hardware devices having an execution speed within the threshold range of execution speed and having a lowest associated monetary cost;
obtaining, by the processor, runtime data associated with the hybrid computing environment;
feeding the runtime data into the performance model to determine an execution cost for executing the compute kernel for each of the plurality of hardware devices; and
selecting a target hardware device from the plurality of hardware devices based on the execution costs and the associated monetary cost.
9.	(Original) The computer program product of Claim 8 further comprising:
	obtaining, by the processor, hardware description data associated with each of the plurality of hardware devices;
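determining an updated execution cost for executing the programming code on each of the plurality of hardware devices based on the hardware description data; and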
selecting an updated target hardware device from the plurality of hardware devices based on a determination that the updated execution cost is lower than the execution cost.
10.	(Previously Presented) The computer program product of Claim 8 further comprising scheduling the compute kernel for execution on the target hardware device.
11.	(Previously Presented) The computer program product of Claim 8, further comprising:
wherein prior to receiving the request to execute the programming code:
		analyzing, by the processor, the programming code to determine that the programming code is a candidate for execution on the plurality of hardware devices;
		generating the performance model to estimate an execution cost for the compute kernel to execute on each of the plurality of hardware devices.
12.	(Original) The computer program product of Claim 8, wherein the target hardware device comprises a graphics processing unit (GPU), a field-programmable gate array (FPGA), and a many integrated core (MIC) co-processor.
13. 	(Original) The computer program product of Claim 8, wherein the performance model comprises one or more algorithms for determining execution costs for execution of the programming code.
14.	(Original) The computer program product of Claim 8, wherein the runtime data comprises memory access patterns for the programming code.
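15.	(Currently Amended) A system for hardware device selection in a computing environment, the system comprising: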
a processor communicatively coupled to a memory and a plurality of hardware devices, wherein the processor is configured to:
receive a request to execute a programming code, wherein the processor is operating in a hybrid computing environment comprising a plurality of hardware devices;
generate a compute kernel for each hardware device in the plurality of hardware devices based at least in part on the programming code;
obtain a performance model associated with the programming code; 
obtain an associated monetary cost for computation associated with each of the plurality of hardware devices, wherein the associated monetary cost for computation comprises a server utilization monetary cost for operating each of the plurality of hardware devices on an external server, and wherein obtaining, by the processor, the associated monetary cost for computation comprises:
			determining a threshold range of execution speed for processing the programming code;
			wherein selecting the target hardware device from the plurality of hardware devices based on the execution costs and the associated monetary costs comprises:
		selecting the target hardware device from the plurality of hardware devices having an execution speed within the threshold range of execution speed and having a lowest associated monetary cost;
obtain runtime data associated with the hybrid computing environment;
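feed the runtime data into the performance model to determine an execution cost for executing the compute kernel for each of the plurality of hardware devices; and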
select a target hardware device from the plurality of hardware devices based on the execution costs and the associated monetary cost.
16.	(Previously Presented) The system of Claim 15, wherein the processor is further configured to schedule the compute kernel for execution on the target hardware device.
17.	(Previously Presented) The system of Claim 15, wherein prior to receiving the request to execute the programming code, the processor is further configured to:
	analyze the programming code to determine that the programming code is a candidate for execution on the plurality of hardware devices;
	generate the performance model to estimate an execution cost for the compute kernel to execute on each of the plurality of hardware devices.

18. – 27. (Canceled)



Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to AUREL PRIFTI whose telephone number is (571)270-1743.  The examiner can normally be reached on M-F 8 a.m.-6 p.m.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/AUREL PRIFTI/Primary Examiner, Art Unit 2186

Aurel Prifti     
 Primary Examiner
Art Unit 2186
Tel. (571) 270-1743
Fax (571) 270-2743

aurel.prifti@uspto.gov