Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This Office Action is in response to claims filed 09/29/2017.
Claims 1-28 are pending.

Drawings
Figs. 6-7 (including the replacement drawing filed 12/13/17) are objected to because they fail to comply with 37 CFR 1.84(p)(3), which requires that numbers, letters, and reference characters must measure at least .32 cm. (1/8 inch) in height. They should not be placed in the drawing so as to interfere with its comprehension. Therefore, they should not cross or mingle with the lines. They should not be placed upon hatched or shaded surfaces. When necessary, such as indicating a surface or cross section, a reference character may be underlined and a blank space may be left in the hatching or shading where the character occurs so that it appears distinct. Corrected drawings in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. The replacement sheet(s) should be labeled “Replacement Sheet” in the page header (as per 37 CFR 1.84(c)) so as not to obstruct any portion of the drawing figures. If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance. The instant Figs. 6-7 comprise reference characters smaller than the minimum height

Specification
Applicant is reminded of the proper language and format for an abstract of the disclosure. The abstract should be in narrative form and generally limited to a single paragraph on a separate sheet within the range of 50 to 150 words in length. The abstract should describe the disclosure sufficiently to assist readers in deciding whether there is a need for consulting the full patent text for details. The language should be clear and concise and should not repeat information given in the title. It should avoid using phrases which can be implied, such as, “The disclosure concerns,” “The disclosure defined by this invention,” “The disclosure describes,” etc.  In addition, the form and legal phraseology often used in patent claims, such as “means” and “said,” should be avoided. The instant abstract is greater than 150 words.

Claim Objections
Claims 4 and 16 are objected to because of the following informalities: These claims recite a list of kernel parameters concatenated as “A and/or B”. While Examiner notes that a proper claim construction would be A alone, B alone, or A and B, the preferred language is “at least one of A and B”, and “and/or”, on its face, appears ambiguous as to whether both elements are required. Appropriate correction is required.

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
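(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 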
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitation(s) is/are: “management logic unit” in claims 1-12. Examiner notes that claim 25 also recites various “circuitry for …”; however, MPEP § 2181(A) third paragraph recites, in part, “…examples of structural terms that have been found not to invoke 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, paragraph 6: "circuit,"…” and therefore Examiner has not interpreted “circuitry for” as invoking 35 U.S.C. 112(f). Lastly, claims 26-28 recite “receiving”, “determining”, “selecting”, “determining (again)”, “registering” and “scheduling” that are all performed “by the compute device”. MPEP § 2181(A) first paragraph recites, in part, “… The following is a list of non-structural generic placeholders that may invoke 35 U.S.C. 112(f): "mechanism for," "module for," "device for," "unit for," …” wherein “device for” may invoke 35 U.S.C. 112(f) and Examiner does not believe “compute” is a structural modifier. However, within the context of the claims and the disclosure as a whole, the “compute device” is described, and indeed claimed in at least claim 26, to comprise “accelerator devices”. Accelerator devices in the art (as well as the instant application, see at least claim 3 “the accelerator devices is a field programmable gate array (FPGA)”) are structural devices
Further, claim 25 recites claim limitation(s) explicitly using “means for” which is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitation(s) is/are “means for selecting…”, “means for determining…” and “means for scheduling…” in claim 25.
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof. A review of the specification, as originally filed, reveals the corresponding structure of the “management logic unit” in ¶ [0023] and [0025], specifically “the management logic unit 132 may be included in the processor 120”. Further, as the “management logic unit” performs the same functions as the various “means for” identified in claim 25, the corresponding structure of the “management logic unit” also corresponds to the various “means for” of claim 25.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) 

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 13-24 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter.
With regard to claim 13, the claim is drawn to “One or more machine-readable storage media”. The specification recites in ¶ [0009], “The disclosed embodiments may also be implemented as instructions carried by or stored on one or more transitory or non-transitory machine-readable (e.g., computer-readable) storage media, which may be read and executed by one or more processors. A machine-readable storage medium may be embodied as any storage device, mechanism, or other physical structure for storing or transmitting information in a form readable by a machine (e.g., a volatile or non-volatile memory, a media disc, or other media device)”. Examiner notes that the description does not disavow transitory embodiments, and that signals themselves, although transient, are physical. Thus, applying the broadest reasonable interpretation in light of the specification and taking into account the meaning of the words in their ordinary usage as they would be understood by one of ordinary skill in the art (MPEP 2111), the claim as a whole covers both transitory and non-transitory media.  A 
The claim may be amended by changing “One or more machine-readable storage media” to “One or more non-transitory machine-readable storage media”, thus excluding that portion of the scope covering transitory signals.
Claims 14-24 depend, directly or indirectly, from claim 13 and do not resolve the deficiencies thereof and are therefore rejected for at least the same reasons.

Claims 1-28 are rejected under 35 U.S.C. 101 because the claimed invention recites a judicial exception (an abstract idea), is directed to that judicial exception because it has not been integrated into a practical application, and does not recite significantly more than the judicial exception. Examiner has evaluated the claims under the framework provided in the 2019 Patent Eligibility Guidance published in the Federal Register on 01/07/2019 and provides that analysis below.
Step 1: Claims 1-12 and 25 are compute devices and thus fall within the statutory category of machines. Claims 13-24 are directed to signals, see above, but we still continue analysis under the Mayo/Alice framework to Step 2 to additionally analyze whether the claims are directed to a judicial exception. Claims 26-28 are methods which fall within the statutory category of processes. Therefore, “Are the claims to a process, machine, manufacture or composition of matter?” Yes (continued analysis for signals of claims 13-24 as well).
In order to evaluate the Step 2A inquiry “Is the claim directed to a law of nature, a natural phenomenon or an abstract idea?” we must determine, at Step 2A Prong 1, 
Step 2A Prong 1:
Claim 1: The limitation of “determine one or more job parameters of each requested job based on the corresponding job execution request”, “select an accelerator device of the compute device to execute each job based at least in part on the job parameters of the corresponding job”, “determine, for each job, whether one or more kernels are to be registered on the corresponding accelerator device selected for the corresponding job to enable the corresponding accelerator device to execute the job”, “register, in response to a determination that the one or more kernels are to be registered, the one or more kernels on the corresponding accelerator device” and “schedule, for each accelerator device of the compute device, the kernels of the corresponding accelerator device based on a kernel prediction”, as drafted, is a process that, but for the recitation of generic computing components, under its broadest reasonable interpretation, covers performance of the limitation in the mind. For example, determining, selecting, determining, registering and scheduling in the context of this claim encompasses a person thinking about parameters associated with a job (first determine), thinking about what accelerator could meet the requirements of those parameters (select), thinking about what kernels will be needed for the job at the accelerator (second determine), thinking about and mentally assigning/associating kernels that will be needed to the accelerator (register) and finally think about when the 
Therefore, Yes, claim 1 recites judicial exceptions.
Because the claim has been identified as reciting judicial exceptions, Step 2A Prong 2 evaluates whether the claim is directed to the judicial exception.
Step 2A Prong 2: 
Claim 1: The judicial exception is not integrated into a practical application. In particular, the claim recites the following additional elements – “a plurality of accelerator devices; and a management logic unit to” which is merely a recitation of generic computing components (see MPEP § 2106.05(b) as well as claim interpretation under 35 U.S.C. § 112(f)) and merely constitutes the field of use/technological environment in which the judicial exception is being applied and does not meaningfully limit the judicial exception, see MPEP § 2106.05(h); and “receive a plurality of job execution requests, each job execution request including a job requested to be accelerated received from an orchestrator server” which is merely insignificant pre-solution data gathering activity which does not meaningfully limit the judicial exception, see MPEP § 2106.05(g).
Therefore, “Do the claims recite additional elements that integrate the judicial exception into a practical application?” No, these additional elements do not integrate the abstract idea into a practical application and they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
After having evaluated the inquiries set forth in Steps 2A Prong 1 and 2, it has been concluded that claim 1 not only recites a judicial exception but that the claims 
Step 2B: 
Claim 1: The claim does not include additional elements, alone or in combination, that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements amount to no more than insignificant pre-solution data gathering (as evidenced by court decisions discussed in MPEP § 2106.05(g) and 2106.05(d)(II) and in accordance with the Office guidance in the memorandum addressing the decision in Berkheimer v. HP, Inc.), generic computing components and field of use/technological environment without imposing meaningful limits on practicing the abstract idea and thus cannot provide an inventive concept.
Therefore, “Do the claims recite additional elements that amount to significantly more than the judicial exception?” No, these additional elements, alone or in combination, do not amount to significantly more than the judicial exception.
Having concluded analysis within the provided framework, Claim 1 does not recite patent eligible subject matter under 35 U.S.C. § 101.
	With regard to claim 25, the above analysis is incorporated herein by reference as it applies equally to claim 25 except in that claim 25 recites some elements as circuitry or “means for” language interpreted under 35 U.S.C. § 112(f). These are, however, additional generic computing component recitations and for the same reasons as above with regard to integration into practical application and whether additional elements amount to significantly more, claim 25 also fails both Step 2A prong 2, thus the claim is directed to the judicial exception as it has not been integrated into practical application, and fails Step 2B as not amounting to significantly more. Therefore, Claim 25 does not recite patent eligible subject matter under 35 U.S.C. § 101.
With regard to claim 13, the above analysis is incorporated herein by reference as it applies equally to claim 13 except in that claim 13 recites additional elements not included in claim 1: “One or more machine-readable storage media”. These are, however, additional generic computing component recitations and for the same reasons as above with regard to integration into practical application and whether additional elements amount to significantly more, claim 13 also fails both Step 2A prong 2, thus the claim is directed to the judicial exception as it has not been integrated into practical application, and fails Step 2B as not amounting to significantly more. Therefore, Claim 13 does not recite patent eligible subject matter under 35 U.S.C. § 101.
With regard to claim 26, the above analysis is incorporated herein by reference as it applies equally to claim 26 except in that claim 26 lacks some of the structural/generic computing components as it is directed to a method. Therefore, for the same reasons as above Claim 26 recites a judicial exception, is directed to the judicial exception and does not amount to significantly more. Therefore, Claim 26 does not recite patent eligible subject matter under 35 U.S.C. § 101.
	With regard to claims 2 and 14, they recite additional abstract idea recitations of “determine whether each kernel associated with a corresponding requested job has been previously registered on the compute device” as drafted, is a process that, but for the recitation of generic computing components, under its broadest reasonable interpretation, covers performance of the limitation in the mind. For example, Claims 2 and 14 do not recite patent eligible subject matter under 35 U.S.C. § 101.
With regard to claims 3 and 15, they recite additional abstract idea recitations of “determine one or more kernel parameters of each kernel” as drafted, is a process that, but for the recitation of generic computing components, under its broadest reasonable interpretation, covers performance of the limitation in the mind. For example, “determining” in the context of this claim encompasses a person thinking about the parameters associated with the kernel. Further, claims 3 and 15 recite the following additional elements – “wherein each of the plurality of accelerator devices is a field programmable gate array (FPGA)” which is merely a recitation of generic computing components (see MPEP § 2106.05(b) as well as claim interpretation under 35 U.S.C. § 112(f)) and merely constitutes the field of use/technological environment in which the judicial exception is being applied and does not meaningfully limit the judicial exception, see MPEP § 2106.05(h). Thus, for the same reasons as above with regard to integration into practical application and whether additional elements amount to significantly more, claims 3 and 15 also fail both Step 2A prong 2, thus the claims are directed to the judicial exception as it has not been integrated into practical application, and fail Step 2B as not amounting to significantly more. Therefore, Claims 3 and 15 do not recite patent eligible subject matter under 35 U.S.C. § 101.
With regard to claims 4 and 16, they recite additional abstract idea recitations of “determine an application identification (ID) of an application requesting the requested job to be accelerated, a kernel identification (ID) of each kernel, a bit-stream, an estimated runtime of each kernel based on one or more previous executions of each kernel, and/or one or more previous timestamps of each kernel”, as drafted, is a process that, but for the recitation of generic computing components, under its broadest reasonable interpretation, covers performance of the limitation in the mind. For example, “determining” in the context of this claim encompasses a person further thinking about parameters associated with a kernel. Claims 4 and 16 do not recite any further additional elements and therefore, for the same reasons as above with regard to integration into practical application and whether additional elements amount to significantly more, claims 4 and 16 also fail both Step 2A prong 2, thus the claims are directed to the judicial exception as it has not been integrated into practical application, and fail Step 2B as not amounting to significantly more. Therefore, Claims 4 and 16 do not recite patent eligible subject matter under 35 U.S.C. § 101.
With regard to claims 5 and 17, they recite additional abstract idea recitations of “determine a kernel identification (ID) of the kernel associated with each requested job” as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind. For example, “determining” in the context of this claim encompasses a person further thinking about a name/identifier of a kernel Claims 5 and 17 do not recite patent eligible subject matter under 35 U.S.C. § 101.
With regard to claims 6 and 18, they recite additional abstract idea recitations of “determine a payload of each requested job” as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind. For example, “determining” in the context of this claim encompasses a person further thinking about a workload of a job. Further, claims 6 and 18 do not recite any further additional elements and for the same reasons as above with regard to integration into practical application and whether additional elements amount to significantly more, claims 6 and 18 also fail both Step 2A prong 2, thus the claims are directed to the judicial exception as it has not been integrated into practical application, and fail Step 2B as not amounting to significantly more. Therefore, Claims 6 and 18 do not recite patent eligible subject matter under 35 U.S.C. § 101.
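With regard to claims 7 and 19, they recite additional abstract idea recitations of “determine an estimated runtime of each requested job” as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind. For example, “determining” in the context of this claim encompasses a person further thinking about a runtime of a job. Further, claims 7 and 19 do not recite any further additional elements and for the same reasons as above with regard to integration into practical application and whether additional elements amount to significantly more, claims 7 and 19 also fail both Step 2A prong 2, thus the claims are directed to the judicial exception as it has not been integrated into practical application, and fail Step 2B as not amounting to significantly more. Therefore, Claims 7 and 19 do not recite patent eligible subject matter under 35 U.S.C. § 101.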
With regard to claims 8, 20 and 27, they recite additional abstract idea recitations of “prioritize the kernels registered on the compute device based on the kernel prediction” as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind. For example, “prioritize” in the context of this claim encompasses a person further thinking about the registering/associating based on the thought about kernel prediction and further thinking about which kernels are more and less important in terms of the registration. Further, claims 8, 20 and 27 do not recite any further additional elements and for the same reasons as above with regard to integration into practical application and whether additional elements amount to significantly more, claims 8, 20 and 27 also fail both Step 2A prong 2, thus the claims are directed to the judicial exception as it has not been integrated into practical application, and fail Step 2B as not amounting to significantly more. Therefore, Claims 8, 20 and 27 do not recite patent eligible subject matter under 35 U.S.C. § 101.
With regard to claims 9 and 21, they recite additional abstract idea recitations of “prioritize the kernels based on an estimated runtime of each kernel or a past execution history of each kernel” as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind. For example, “prioritize” Claims 9 and 21 do not recite patent eligible subject matter under 35 U.S.C. § 101.
With regard to claims 10 and 22, they recite additional abstract idea recitations of “prioritize a next most probable kernel to receive a job to be accelerated” as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind. For example, “prioritize” in the context of this claim encompasses a person further thinking about the prioritizing and in choosing the more/less important kernels thinking about and making that determination on the basis of the kernel thought to be most probable to receive a job to be accelerated. Further, claims 10 and 22 do not recite any further additional elements and for the same reasons as above with regard to integration into practical application and whether additional elements amount to significantly more, claims 10 and 22 also fail both Step 2A prong 2, thus the claims are directed to the judicial exception as it has not been integrated into practical application, and fail Step 2B as not amounting to significantly more. Therefore, Claims 10 and 22 do not recite patent eligible subject matter under 35 U.S.C. § 101.
Claims 11, 23 and 28 do not recite patent eligible subject matter under 35 U.S.C. § 101.
With regard to claims 12 and 24, they recite additional abstract idea recitations of “predict an execution pattern of each kernel registered on the accelerator devices of the compute device for each application” as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind. For example, “predicting” in the context of this claim encompasses a person further thinking about the execution pattern of kernels for each application. Further, claims 12 and 24 do not recite any further additional elements and for the same reasons as above with regard to integration into practical application and whether additional elements amount to significantly more, claims 12 and 24 also fail both Step 2A prong 2, thus the claims are directed to the judicial exception as it has not been integrated into practical application, and fail Step 2B as not amounting to significantly more. Therefore, Claims 12 and 24 do not recite patent eligible subject matter under 35 U.S.C. § 101.
Therefore, Claims 1-28 do not recite patent eligible subject matter under 35 U.S.C. § 101.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 3-9, 13, 15-21 and 25-27 are rejected under 35 U.S.C. 103 as being unpatentable over Krishnamurthy et al. Pub. No. US 2012/0054770 A1 (hereafter Krishnamurthy) in view of Chen et al. “Enabling FPGAs in the Cloud” (hereafter Chen).

With regard to claim 1, Krishnamurthy teaches a compute device comprising: a plurality of accelerator devices (the hybrid system 112 is a heterogeneous system. Therefore, in one embodiment, the hybrid computing environment 100 implements a cross-platform parallel programming environment such as, but not limited to, an OpenCL (Open Compute Language) environment. This type of environment allows for parallel programming across a number of heterogeneous devices such as CPUs, GPUs, and accelerators. In other words, a cross-platform parallel programming environment 
a management logic unit to (The workload manager 118 comprises an SLA manager 124, a workload allocator 126, a cluster scheduler 128, a batch window manager 130, a dynamic workload reallocator 132, a workload mobility manager 134, and workload queue 136 in at least ¶ [0033] and Fig. 1 and The computer 1302 has a processor(s) 1304 (such as processors 114 or 116) that is connected to a main memory 1306, mass storage interface 1308, and network adapter hardware 1310. A system bus 1312 interconnects these system components. The main memory 1306, in one embodiment, comprises either the components of the server system 102 such as the workload manager 118 (and its components) in at least ¶ [0076] and Fig. 13, Refer to claim interpretation above for corresponding structure, as well as equivalents, for limitations invoking 35 U.S.C. § 112(f)):
a queue of workloads to be accelerated (one or more workload queues 138 for queuing various workloads to be performed at the accelerator 104 in at least ¶ [0033])
received from an orchestrator server (the server system 102 comprises, among other things, a workload manager 118 … A workload manager negotiates levels and orchestrates workloads to meet those service levels in at least ¶ [0030]);
determine one or more job parameters of each requested job (the server system 102 comprises, among other things, a workload manager 118, one or more processors 114, and a plurality of Service Level Agreements (SLAs) <job parameters> 120 stored within a database or memory 122 … An SLA, in one embodiment, is a stipulation by a user of service levels that need to be met by a workload <job> … SLA 
select an accelerator device of the compute device to execute each job based at least in part on the job parameters of the corresponding job (In this example, the kernels at the accelerators 104, as an aggregate, may only be able to perform 5,000 operations per second, which would not satisfy the SLA 120. Therefore, the workload manager 118 schedules a portion 202, 204 of the accelerator workload (e.g., tasks) as one or more compute kernels 206, 208 at the accelerator processors 116 <selects accelerators> and a remaining portion 210, 212 of the workload as one or more compute kernels 214, 216 at the server system processors 114 so that the SLA 120 of 10,000 operations per second <job parameter to be met> can be achieved in at least ¶ [0039]);
determine, for each job, whether one or more kernels are to be registered on the corresponding accelerator device selected for the corresponding job to enable the corresponding accelerator device to execute the job (Therefore, the workload manager 118 schedules a portion 202, 204 of the accelerator workload (e.g., tasks) as one or more compute kernels 206, 208 at the accelerator processors 116 in at least ¶ [0039]);
register, in response to a determination that the one or more kernels are to be registered, the one or more kernels on the corresponding accelerator device (Therefore, the workload manager 118 schedules a portion 202, 204 of the accelerator workload (e.g., tasks) as one or more compute kernels 206, 208 at the accelerator processors 116 in at least ¶ [0039], scheduling workload tasks at kernels at the accelerator means that kernels have been registered/associated with the accelerator as registering is an association of one to the other (the kernels at the accelerator are associated/registered with it) and further This process can include adding or deleting additional accelerators on the fly and/or adding/deleting server system surrogate processors on the fly. Each additional accelerator can run compute kernels that allow the contracted batch time window to be met. If the batch window specification is relaxed, then the workload manager 134 can remove accelerators and associated kernels so that these maybe reused by other workloads. The workload can complete within the relaxed batch window using the optimized set of accelerator resources in at least ¶ [0042], wherein adding/removing accelerators and kernels is dynamic/on-the-fly, therefore the kernels must be registered/associated with the accelerators
The workload manager 118 sends these compute kernels to the accelerators 104. Alternatively, an OpenCL runtime process on the server can send these kernels to the accelerators when a process calling these kernel functions is launched. The accelerators can also choose to store OpenCL kernels in their local tertiary storage. These kernels can be invoked when a calling process on the server is run. These compute kernels are then launched on the accelerators 104 in at least ¶ [0035]); and
schedule, for each accelerator device of the compute device, the kernels of the corresponding accelerator device based on a kernel prediction (In this example, the kernels at the accelerators 104, as an aggregate, may only be able to perform 5,000 operations per second <predicted to not be sufficient>, which would not satisfy the SLA 120. Therefore, the workload manager 118 schedules a portion 202, 204 of the accelerator workload (e.g., tasks) as one or more compute kernels 206, 208 at the accelerator processors 116 and a remaining portion 210, 212 of the workload as one or more compute kernels 214, 216 at the server system processors 114 so that the SLA 120 of 10,000 operations per second <above selection, registration and scheduling predicted to meet this job parameter> can be achieved in at least ¶ [0039] and A kernel scheduler 519 schedules a physical accelerator compute resource (processors 502, 504, 506) or pseudo-accelerator compute resource (e.g. processors/kernels 508/514, 510/516, 512/518) at the server system 102 or accelerator to satisfy the kernel call in at least ¶ [0055]).
Krishnamurthy teaches evaluating workload SLAs, required kernels and accelerators and making predictions, registrations and schedules on those bases (see mapping above). Although Krishnamurthy teaches workloads with corresponding SLAs governing required parameters, Krishnamurthy does not specifically teach receiving job execution requests specifying the job parameters.
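However, in analogous art Chen teaches receive a plurality of job execution requests, each job execution request including a job requested to be accelerated; determine one or more job parameters of each requested job based on the corresponding job execution request (FPGAs must be exposed to the cloud stack as a resource pool that can be actively managed, i.e. it can be requested, allocated and deallocated by a tenant in at least 1. Introduction 1. Abstraction and instead of requesting programmable resources in PRP, a tenant directly requests various combination of accelerator functions and performance. A cloud provides a list of pre-defined accelerators, handles tenant requests and configures accelerators into idle slots. If no accelerator matches the requirements, a tenant can submit his own designs and the cloud owner performs the compilation and adds the tenant design into the accelerator list in at least 3.1 FPGA Resources Abstraction ¶ 3 and A tenant can issue requests for an arbitrary amount of registers, LUTs, and memory and the controller provides a virtual FPGA chip consisting of what was requested in at least 3.1 FPGA Resources Abstraction ¶ 1).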
It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to combine the job execution requests specifying the job parameters of Chen with the systems and methods of Krishnamurthy, resulting in a system in which the workload parameters/SLAs of Krishnamurthy are communicated via job requests as in Chen, wherein the workload parameters/SLAs are included in job requests for accelerators such as the FPGAs of Chen. A person having ordinary skill in the art would have been motivated to make this combination, with a reasonable expectation of success, for the purpose of providing finer-grained control of accelerator provisioning, precise quantitative acceleration resource allocation, and priority-based workload scheduling (See at least Chen abstract).

With regard to claim 3, Krishnamurthy teaches determine one or more kernel parameters of each kernel (The kernel stealing the data portion can provide its identifier and location <kernel parameters> so it can participate in any synchronization activity that the "stolen from" kernel might require in at least ¶ [0053]).
Krishnamurthy does not specifically teach that the accelerators are FPGAs.
However, in analogous art Chen teaches wherein each of the plurality of the accelerator devices is a field programmable gate array (FPGA) (an accelerator pool (AP) abstraction as a trade-off between current FPGA limitations and cloud principles. In the AP abstraction, each FPGA chip has several pre-defined accelerator slots, e.g. slots A, B, C and D shown in Figure 2. By using the dynamic partial reconfiguration mechanism of modern FPGAs, each slot can be considered as a virtual FPGA chip with standardized resource types, capacity and interfaces in at least 3. Enabling FPGA in Cloud, ¶ 3).
It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to combine the FPGA accelerators of Chen with the systems and methods of Krishnamurthy, resulting in a system in which the accelerators of Krishnamurthy are FPGAs as in Chen. A person having ordinary skill in the art would have been motivated to make this combination, with a reasonable expectation of success, for the purpose of providing finer-grained control of accelerator provisioning, precise quantitative acceleration resource allocation, and priority-based workload scheduling (See at least Chen abstract) through the use of FPGAs, taking advantage of their re-configurability and customizability.
Krishnamurthy teaches (in view of Chen wherein accelerators are FPGAs) wherein to register the one or more kernels on the corresponding accelerator device comprises to register the one or more kernels on the corresponding FPGA (The workload manager 118 sends these compute kernels to the accelerators 104. Alternatively, an OpenCL runtime process on the server can send these kernels to the accelerators when a process calling these kernel functions is launched. The accelerators can also choose to store OpenCL kernels in their local tertiary storage. These kernels can be invoked when a calling process on the server is run. These compute kernels are then launched on the accelerators 104 in at least ¶ [0035]).

With regard to claim 4, Krishnamurthy teaches wherein to determine the one or more kernel parameters of each kernel comprises to determine an application identification (ID) of an application requesting the requested job to be accelerated, a kernel identification (ID) of each kernel, a bit-stream, an estimated runtime of each kernel based on one or more previous executions of each kernel, and/or one or more previous timestamps of each kernel (The kernel stealing the data portion can provide its identifier and location so it can participate in any synchronization activity that the "stolen from" kernel might require in at least ¶ [0053]).

With regard to claim 5, Krishnamurthy teaches wherein to determine the one or more job parameters of each requested job based on the corresponding job execution request comprises to determine a kernel identification (ID) of the kernel associated with each requested job (In this example, the kernels at the accelerators 104, as an aggregate, may only be able to perform 5,000 operations per second, which would not satisfy the SLA 120. Therefore, the workload manager 118 schedules a portion 202, 204 of the accelerator workload (e.g., tasks) as one or more compute kernels 206, 208 at the accelerator processors 116 and a remaining portion 210, 212 of the workload as one or more compute kernels 214, 216 at the server system processors 114 so that the SLA 120 of 10,000 operations per second can be achieved in at least ¶ [0039] and The kernel stealing the data portion can provide its identifier and location so it can participate in any synchronization activity that the "stolen from" kernel might require in at least ¶ [0053]).

With regard to claim 6, Krishnamurthy teaches wherein to determine the one or more job parameters of each requested job based on the corresponding job execution request comprises to determine a performance-throughput and response time of each requested job (SLA record can have multiple sub-records for performance-throughput, performance-response time-batch window, Energy, Reliability and Availability. SLA record values can change dynamically during a workload run in at least ¶ [0030] and the SLA manager 124 retrieves an SLA 120 from a storage area 122 and determines that the SLA 120 requires 10,000 operations per second over a two day period of time in at least ¶ [0039]).
Krishnamurthy teaches determining performance throughput and response time, which is indicative of how quickly the payload must be processed, but not specifically indicative of the payload itself (although it could be calculated from the operations per second over the time period as above).
However, in analogous art Chen teaches determine a payload of each requested job (A job queue receives acceleration jobs from software. A scheduler manages the job queue and schedules jobs into accelerators according to software specified strategies, such as priority and workload size <payload> in at least 4.1 Hardware layer, ¶ 7).
It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to combine the indication of the payload of Chen with the systems and methods of Krishnamurthy, resulting in a system in which Krishnamurthy considers not only the performance throughput and response time, which is indicative of how quickly the payload must be processed, but also the payload itself as in Chen. A person having ordinary skill in the art would have been motivated to make this combination, with a reasonable expectation of success, for the purpose of improving precise quantitative acceleration resource allocation based on the workload (See at least Chen abstract).

With regard to claim 7, Krishnamurthy teaches wherein to determine the one or more job parameters of each requested job based on the corresponding job execution request comprises to determine an estimated runtime of each requested job (SLA record can have multiple sub-records for performance-throughput, performance-response time-batch window, Energy, Reliability and Availability. SLA record values can change dynamically during a workload run in at least ¶ [0030] and the SLA manager 124 retrieves an SLA 120 from a storage area 122 and determines that the SLA 120 requires 10,000 operations per second over a two day period of time <estimated runtime> in at least ¶ [0039]).

With regard to claim 8, Krishnamurthy teaches wherein to schedule the kernels registered on the accelerator device of the compute device comprises to prioritize the kernels registered on the compute device based on the kernel prediction (An SLA may also include fields for prioritization between batch windows and energy windows in at least ¶ [0047] and in this example, the kernels at the accelerators 104, as an aggregate, may only be able to perform 5,000 operations per second, which would not satisfy the SLA 120. Therefore, the workload manager 118 schedules a portion 202, 204 of the accelerator workload (e.g., tasks) as one or more compute kernels 206, 208 at the accelerator processors 116 and a remaining portion 210, 212 of the workload as one or more compute kernels 214, 216 at the server system processors 114 so that the SLA 120 of 10,000 operations per second can be achieved in at least ¶ [0039], Kernels at accelerator 104 predicted to only perform 5,000 ops, not enough, therefore prioritizing the use of kernels 206 and 208 at accelerator 116 as part of scheduling to meet the SLA
and … It should be noted that for a workload with throughput SLA, Energy SLA and batch-window SLA, prioritization is possible … in at least ¶ [0069]).

With regard to claim 9, Krishnamurthy teaches wherein to prioritize the kernels registered on the compute device based on the kernel prediction comprises to prioritize the kernels based on an estimated runtime of each kernel or a past execution history of each kernel (In this example, the kernels at the accelerators 104, as an aggregate, may only be able to perform 5,000 operations per second, which would not satisfy the SLA 120. Therefore, the workload manager 118 schedules a portion 202, 204 of the accelerator workload (e.g., tasks) as one or more compute kernels 206, 208 at the accelerator processors 116 and a remaining portion 210, 212 of the workload as one or more compute kernels 214, 216 at the server system processors 114 so that the SLA 120 of 10,000 operations per second can be achieved in at least ¶ [0039], Kernels at accelerator 104 predicted to only perform 5,000 ops, not enough, therefore prioritizing the use of kernels 206 and 208 at accelerator 116 as part of scheduling to meet the SLA and the SLA manager 124 retrieves an SLA 120 from a storage area 122 and determines that the SLA 120 requires 10,000 operations per second over a two day period of time <estimated runtime> in at least ¶ [0039] and ¶ [0047] and ¶ [0069]).

With regard to claim 13, Krishnamurthy teaches one or more machine-readable storage media comprising a plurality of instructions stored thereon that, when executed by a compute device cause the compute device to (a computer program product for providing high-throughput computing in a hybrid processing system is disclosed. The computer program product comprises a storage medium readable by a processing circuit and storing instructions for execution by the processing circuit for performing a method in at least ¶ [0006]):
a queue of workloads to be accelerated (one or more workload queues 138 for queuing various workloads to be performed at the accelerator 104 in at least ¶ [0033])
received from an orchestrator server (the server system 102 comprises, among other things, a workload manager 118 … A workload manager negotiates levels and orchestrates workloads to meet those service levels in at least ¶ [0030]);
determine one or more job parameters of each requested job (the server system 102 comprises, among other things, a workload manager 118, one or more processors 114, and a plurality of Service Level Agreements (SLAs) <job parameters> 120 stored within a database or memory 122 … An SLA, in one embodiment, is a stipulation by a user of service levels that need to be met by a workload <job> … SLA record can have multiple sub-records for performance-throughput, performance-response time-batch window, Energy, Reliability and Availability. SLA record values can change dynamically during a workload run in at least ¶ [0030] and the SLA manager 124 retrieves an SLA 120 from a storage area 122 and determines that the SLA 120 requires 10,000 operations per second over a two day period of time in at least ¶ [0039])
select an accelerator device of the compute device to execute each job based at least in part on the job parameters of the corresponding job (In this example, the kernels at the accelerators 104, as an aggregate, may only be able to perform 5,000 operations per second, which would not satisfy the SLA 120. Therefore, the workload manager 118 schedules a portion 202, 204 of the accelerator workload (e.g., tasks) as one or more compute kernels 206, 208 at the accelerator processors 116 <selects accelerators> and a remaining portion 210, 212 of the workload as one or more compute kernels 214, 216 at the server system processors 114 so that the SLA 120 of 10,000 operations per second <job parameter to be met> can be achieved in at least ¶ [0039]);
determine, for each job, whether one or more kernels are to be registered on the corresponding accelerator device selected for the corresponding job to enable the corresponding accelerator device to execute the job (Therefore, the workload manager 118 schedules a portion 202, 204 of the accelerator workload (e.g., tasks) as one or more compute kernels 206, 208 at the accelerator processors 116 in at least ¶ [0039]);
register, in response to a determination that the one or more kernels are to be registered, the one or more kernels on the corresponding accelerator device (Therefore, the workload manager 118 schedules a portion 202, 204 of the accelerator workload (e.g., tasks) as one or more compute kernels 206, 208 at the accelerator processors 116 in at least ¶ [0039], scheduling workload tasks at kernels at the accelerator means that kernels have been registered/associated with the accelerator as registering is an association of one to the other (the kernels at the accelerator are associated/registered with it) and further This process can include adding or deleting additional accelerators on the fly and/or adding/deleting server system surrogate processors on the fly. Each additional accelerator can run compute kernels that allow the contracted batch time window to be met. If the batch window specification is relaxed, then the workload manager 134 can remove accelerators and associated kernels so that these maybe reused by other workloads. The workload can complete within the relaxed batch window using the optimized set of accelerator resources in at least ¶ [0042], wherein adding/removing accelerators and kernels is dynamic/on-the-fly, therefore the kernels must be registered/associated with the accelerators
The workload manager 118 sends these compute kernels to the accelerators 104. Alternatively, an OpenCL runtime process on the server can send these kernels to the accelerators when a process calling these kernel functions is launched. The accelerators can also choose to store OpenCL kernels in their local tertiary storage. These kernels can be invoked when a calling process on the server is run. These compute kernels are then launched on the accelerators 104 in at least ¶ [0035]); and
schedule, for each accelerator device of the compute device, the kernels of the corresponding accelerator device based on a kernel prediction (In this example, the kernels at the accelerators 104, as an aggregate, may only be able to perform 5,000 operations per second <predicted to not be sufficient>, which would not satisfy the SLA 120. Therefore, the workload manager 118 schedules a portion 202, 204 of the accelerator workload (e.g., tasks) as one or more compute kernels 206, 208 at the accelerator processors 116 and a remaining portion 210, 212 of the workload as one or more compute kernels 214, 216 at the server system processors 114 so that the SLA 120 of 10,000 operations per second <above selection, registration and scheduling predicted to meet this job parameter> can be achieved in at least ¶ [0039] and A kernel scheduler 519 schedules a physical accelerator compute resource (processors 502, 504, 506) or pseudo-accelerator compute resource (e.g. processors/kernels 508/514, 510/516, 512/518) at the server system 102 or accelerator to satisfy the kernel call in at least ¶ [0055]).
Krishnamurthy teaches evaluating workload SLAs, required kernels and accelerators and making predictions, registrations and schedules on those bases (see mapping above). Although Krishnamurthy teaches workloads with corresponding SLAs governing required parameters, Krishnamurthy does not specifically teach receiving job execution requests specifying the job parameters.

However, in analogous art Chen teaches receive a plurality of job execution requests, each job execution request including a job requested to be accelerated; determine one or more job parameters of each requested job based on the corresponding job execution request (FPGAs must be exposed to the cloud stack as a resource pool that can be actively managed, i.e. it can be requested, allocated and deallocated by a tenant in at least 1. Introduction 1. Abstraction and instead of requesting programmable resources in PRP, a tenant directly requests various combination of accelerator functions and performance. A cloud provides a list of pre-defined accelerators, handles tenant requests and configures accelerators into idle slots. If no accelerator matches the requirements, a tenant can submit his own designs and the cloud owner performs the compilation and adds the tenant design into the accelerator list in at least 3.1 FPGA Resources Abstraction ¶ 3 and A tenant can issue requests for an arbitrary amount of registers, LUTs, and memory and the controller provides a virtual FPGA chip consisting of what was requested in at least 3.1 FPGA Resources Abstraction ¶ 1).
It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to combine the job execution requests specifying the job parameters of Chen with the systems and methods of Krishnamurthy, resulting in a system in which the workload parameters/SLAs of Krishnamurthy are communicated via job requests as in Chen, wherein the workload parameters/SLAs are included in job requests for accelerators such as the FPGAs of Chen. A person having ordinary skill in the art would have been motivated to make this combination, with a reasonable expectation of success, for the purpose of providing finer-grained control of accelerator provisioning, precise quantitative acceleration resource allocation, and priority-based workload scheduling (See at least Chen abstract).

With regard to claim 15, Krishnamurthy teaches determine one or more kernel parameters of each kernel (The kernel stealing the data portion can provide its identifier and location <kernel parameters> so it can participate in any synchronization activity that the "stolen from" kernel might require in at least ¶ [0053]).
Krishnamurthy does not specifically teach that the accelerators are FPGAs.
However, in analogous art Chen teaches wherein each of the plurality of the accelerator devices is a field programmable gate array (FPGA) (an accelerator pool (AP) abstraction as a trade-off between current FPGA limitations and cloud principles. In the AP abstraction, each FPGA chip has several pre-defined accelerator slots, e.g. slots A, B, C and D shown in Figure 2. By using the dynamic partial reconfiguration mechanism of modern FPGAs, each slot can be considered as a virtual FPGA chip with standardized resource types, capacity and interfaces in at least 3. Enabling FPGA in Cloud, ¶ 3).
It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to combine the FPGA accelerators of Chen with the systems and methods of Krishnamurthy, resulting in a system in which the accelerators of Krishnamurthy are FPGAs as in Chen. A person having ordinary skill in the art would have been motivated to make this combination, with a reasonable expectation of success, for the purpose of providing finer-grained control of accelerator provisioning, precise quantitative acceleration resource allocation, and priority-based workload scheduling (See at least Chen abstract) through the use of FPGAs, taking advantage of their re-configurability and customizability.
Krishnamurthy teaches (in view of Chen wherein accelerators are FPGAs) wherein to register the one or more kernels on the corresponding accelerator device comprises to register the one or more kernels on the corresponding FPGA (The workload manager 118 sends these compute kernels to the accelerators 104. Alternatively, an OpenCL runtime process on the server can send these kernels to the accelerators when a process calling these kernel functions is launched. The accelerators can also choose to store OpenCL kernels in their local tertiary storage. These kernels can be invoked when a calling process on the server is run. These compute kernels are then launched on the accelerators 104 in at least ¶ [0035]).

With regard to claim 16, Krishnamurthy teaches wherein to determine the one or more kernel parameters of each kernel comprises to determine an application identification (ID) of an application requesting the requested job to be accelerated, a kernel identification (ID) of each kernel, a bit-stream, an estimated runtime of each kernel based on one or more previous executions of each kernel, and/or one or more previous timestamps of each kernel (The kernel stealing the data portion can provide its identifier and location so it can participate in any synchronization activity that the "stolen from" kernel might require in at least ¶ [0053]).

With regard to claim 17, Krishnamurthy teaches wherein to determine the one or more job parameters of each requested job based on the corresponding job execution request comprises to determine a kernel identification (ID) of the kernel associated with each requested job (In this example, the kernels at the accelerators 104, as an aggregate, may only be able to perform 5,000 operations per second, which would not satisfy the SLA 120. Therefore, the workload manager 118 schedules a portion 202, 204 of the accelerator workload (e.g., tasks) as one or more compute kernels 206, 208 at the accelerator processors 116 and a remaining portion 210, 212 of the workload as one or more compute kernels 214, 216 at the server system processors 114 so that the SLA 120 of 10,000 operations per second can be achieved in at least ¶ [0039] and The kernel stealing the data portion can provide its identifier and location so it can participate in any synchronization activity that the "stolen from" kernel might require in at least ¶ [0053]).

With regard to claim 18, Krishnamurthy teaches wherein to determine the one or more job parameters of each requested job based on the corresponding job execution request comprises to determine a performance-throughput and response time of each requested job (SLA record can have multiple sub-records for performance-throughput, performance-response time-batch window, Energy, Reliability and Availability. SLA record values can change dynamically during a workload run in at least ¶ [0030] and the SLA manager 124 retrieves an SLA 120 from a storage area 122 and determines that the SLA 120 requires 10,000 operations per second over a two day period of time in at least ¶ [0039]).
Krishnamurthy teaches determining performance throughput and response time, which is indicative of how quickly the payload must be processed, but not specifically indicative of the payload itself (although it could be calculated from the operations per second over the time period as above).
However, in analogous art Chen teaches determine a payload of each requested job (A job queue receives acceleration jobs from software. A scheduler manages the job queue and schedules jobs into accelerators according to software specified strategies, such as priority and workload size <payload> in at least 4.1 Hardware layer, ¶ 7).
It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to combine the indication of the payload of Chen with the systems and methods of Krishnamurthy, resulting in a system in which Krishnamurthy considers not only the performance throughput and response time, which is indicative of how quickly the payload must be processed, but also the payload itself as in Chen. A person having ordinary skill in the art would have been motivated to make this combination, with a reasonable expectation of success, for the purpose of improving precise quantitative acceleration resource allocation based on the workload (See at least Chen abstract).

With regard to claim 19, Krishnamurthy teaches wherein to determine the one or more job parameters of each requested job based on the corresponding job execution request comprises to determine an estimated runtime of each requested job (SLA record can have multiple sub-records for performance-throughput, performance-response time-batch window, Energy, Reliability and Availability. SLA record values can change dynamically during a workload run in at least ¶ [0030] and the SLA manager 124 retrieves an SLA 120 from a storage area 122 and determines that the SLA 120 requires 10,000 operations per second over a two day period of time <estimated runtime> in at least ¶ [0039]).

With regard to claim 20, Krishnamurthy teaches wherein to schedule the kernels registered on the accelerator device of the compute device comprises to prioritize the kernels registered on the compute device based on the kernel prediction (An SLA may also include fields for prioritization between batch windows and energy windows in at least ¶ [0047] and in this example, the kernels at the accelerators 104, as an aggregate, may only be able to perform 5,000 operations per second, which would not satisfy the SLA 120. Therefore, the workload manager 118 schedules a portion 202, 204 of the accelerator workload (e.g., tasks) as one or more compute kernels 206, 208 at the accelerator processors 116 and a remaining portion 210, 212 of the workload as one or more compute kernels 214, 216 at the server system processors 114 so that the SLA 120 of 10,000 operations per second can be achieved in at least ¶ [0039], Kernels at accelerator 104 predicted to only perform 5,000 ops, not enough, therefore prioritizing the use of kernels 206 and 208 at accelerator 116 as part of scheduling to meet the SLA
and … It should be noted that for a workload with throughput SLA, Energy SLA and batch-window SLA, prioritization is possible … in at least ¶ [0069]).

With regard to claim 21, Krishnamurthy teaches wherein to prioritize the kernels registered on the compute device based on the kernel prediction comprises to prioritize the kernels based on an estimated runtime of each kernel or a past execution history of each kernel (In this example, the kernels at the accelerators 104, as an aggregate, may only be able to perform 5,000 operations per second, which would not satisfy the SLA 120. Therefore, the workload manager 118 schedules a portion 202, 204 of the accelerator workload (e.g., tasks) as one or more compute kernels 206, 208 at the accelerator processors 116 and a remaining portion 210, 212 of the workload as one or more compute kernels 214, 216 at the server system processors 114 so that the SLA 120 of 10,000 operations per second can be achieved in at least ¶ [0039], Kernels at accelerator 104 predicted to only perform 5,000 ops, not enough, therefore prioritizing the use of kernels 206 and 208 at accelerator 116 as part of scheduling to meet the SLA and the SLA manager 124 retrieves an SLA 120 from a storage area 122 and determines that the SLA 120 requires 10,000 operations per second over a two day period of time <estimated runtime> in at least ¶ [0039] and ¶ [0047] and ¶ [0069]).

With regard to claim 25, Krishnamurthy teaches a compute device comprising (the hybrid system 112 is a heterogeneous system. Therefore, in one embodiment, the hybrid computing environment 100 implements a cross-platform parallel programming environment such as, but not limited to, an OpenCL (Open Compute Language) environment. This type of environment allows for parallel programming across a number of heterogeneous devices such as CPUs, GPUs, and accelerators. In other words, a cross-platform parallel programming environment allows programs to execute across heterogeneous components in at least ¶ [0034] and Fig. 1):
circuitry and means for (The workload manager 118 comprises an SLA manager 124, a workload allocator 126, a cluster scheduler 128, a batch window manager 130, a dynamic workload reallocator 132, a workload mobility manager 134, and workload queue 136 in at least ¶ [0033] and Fig. 1 and The computer 1302 has a processor(s) 1304 (such as processors 114 or 116) that is connected to a main memory 1306, mass storage interface 1308, and network adapter hardware 1310. A system bus 1312 interconnects these system components. The main memory 1306, in one embodiment, comprises either the components of the server system 102 such as the workload manager 118 (and is components) in at least ¶ [0076] and Fig. 13, Refer to claim interpretation above for corresponding structure, as well as equivalents, for limitations invoking 35 U.S.C. § 112(f))
a queue of workloads to be accelerated (one or more workload queues 138 for queuing various workloads to be performed at the accelerator 104 in at least ¶ [0033])
received from an orchestrator server (the server system 102 comprises, among other things, a workload manager 118 … A workload manager negotiates levels and orchestrates workloads to meet those service levels in at least ¶ [0030]);
determining one or more job parameters of each requested job (the server system 102 comprises, among other things, a workload manager 118, one or more processors 114, and a plurality of Service Level Agreements (SLAs) <job parameters> 120 stored within a database or memory 122 … An SLA, in one embodiment, is a stipulation by a user of service levels that need to be met by a workload <job> … SLA record can have multiple sub-records for performance-throughput, performance-response time-batch window, Energy, Reliability and Availability. SLA record values can change dynamically during a workload run in at least ¶ [0030] and the SLA manager 124 retrieves an SLA 120 from a storage area 122 and determines that the SLA 120 requires 10,000 operations per second over a two day period of time in at least ¶ [0039])
selecting an accelerator device of the compute device to execute each job based at least in part on the job parameters of the corresponding job (In this example, the kernels at the accelerators 104, as an aggregate, may only be able to perform 5,000 operations per second, which would not satisfy the SLA 120. Therefore, the workload manager 118 schedules a portion 202, 204 of the accelerator workload (e.g., tasks) as one or more compute kernels 206, 208 at the accelerator processors 116 <selects accelerators> and a remaining portion 210, 212 of the workload as one or more compute kernels 214, 216 at the server system processors 114 so that the SLA 120 of 10,000 operations per second <job parameter to be met> can be achieved in at least ¶ [0039]);
determining, for each job, whether one or more kernels are to be registered on the corresponding accelerator device selected for the corresponding job to enable the corresponding accelerator device to execute the job (Therefore, the workload manager 118 schedules a portion 202, 204 of the accelerator workload (e.g., tasks) as one or more compute kernels 206, 208 at the accelerator processors 116 in at least ¶ [0039]);
registering, in response to a determination that the one or more kernels are to be registered, the one or more kernels on the corresponding accelerator device (Therefore, the workload manager 118 schedules a portion 202, 204 of the accelerator workload (e.g., tasks) as one or more compute kernels 206, 208 at the accelerator processors 116 in at least ¶ [0039], scheduling workload tasks at kernels at the accelerator means that kernels have been registered/associated with the accelerator as registering is an association of one to the other (the kernels at the accelerator are associated/registered with it) and further This process can include adding or deleting additional accelerators on the fly and/or adding/deleting server system surrogate processors on the fly. Each additional accelerator can run compute kernels that allow the contracted batch time window to be met. If the batch window specification is relaxed, then the workload manager 134 can remove accelerators and associated kernels so that these maybe reused by other workloads. The workload can complete within the relaxed batch window using the optimized set of accelerator resources in at least ¶ [0042], wherein adding/removing accelerators and kernels is dynamic/on-the-fly, therefore the kernels must be registered/associated with the accelerators
The workload manager 118 sends these compute kernels to the accelerators 104. Alternatively, an OpenCL runtime process on the server can send these kernels to the accelerators when a process calling these kernel functions is launched. The accelerators can also choose to store OpenCL kernels in their local tertiary storage. These kernels can be invoked when a calling process on the server is run. These compute kernels are then launched on the accelerators 104 in at least ¶ [0035]); and
scheduling, for each accelerator device of the compute device, the kernels of the corresponding accelerator device based on a kernel prediction (In this example, the kernels at the accelerators 104, as an aggregate, may only be able to perform 5,000 operations per second <predicted to not be sufficient>, which would not satisfy the SLA 120. Therefore, the workload manager 118 schedules a portion 202, 204 of the accelerator workload (e.g., tasks) as one or more compute kernels 206, 208 at the accelerator processors 116 and a remaining portion 210, 212 of the workload as one or more compute kernels 214, 216 at the server system processors 114 so that the SLA 120 of 10,000 operations per second <above selection, registration and scheduling predicted to meet this job parameter> can be achieved in at least ¶ [0039] and A kernel scheduler 519 schedules a physical accelerator compute resource (processors 502, 504, 506) or pseudo-accelerator compute resource (e.g. processors/kernels 508/514, 510/516, 512/518) at the server system 102 or accelerator to satisfy the kernel call in at least ¶ [0055]).

However, in analogous art Chen teaches receiving a plurality of job execution requests, each job execution request including a job requested to be accelerated; determining one or more job parameters of each requested job based on the corresponding job execution request (FPGAs must be exposed to the cloud stack as a resource pool that can be actively managed, i.e. it can be requested, allocated and deallocated by a tenant in at least 1. Introduction 1. Abstraction and instead of requesting programmable resources in PRP, a tenant directly requests various combination of accelerator functions and performance A cloud provides a list of pre-defined accelerators, handles tenant requests and configures accelerators into idle slots. If no accelerator matches the requirements, a tenant can submit his own designs and the cloud owner performs the compilation and adds the tenant design into the accelerator list in at least 3.1 FPGA Resources Abstraction ¶ 3 and A tenant can issue requests for an arbitrary amount of registers, LUTs, and memory and the controller provides a virtual FPGA chip consisting of what was requested in at least 3.1 FPGA Resources Abstraction ¶ 1);
It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to combine the job execution requests specifying the job parameters of Chen with the systems and methods of Krishnamurthy 

With regard to claim 26, Krishnamurthy teaches a method for overprovisioning accelerator devices of a compute device, the method comprising (the hybrid system 112 is a heterogeneous system. Therefore, in one embodiment, the hybrid computing environment 100 implements a cross-platform parallel programming environment such as, but not limited to, an OpenCL (Open Compute Language) environment. This type of environment allows for parallel programming across a number of heterogeneous devices such as CPUs, GPUs, and accelerators. In other words, a cross-platform parallel programming environment allows programs to execute across heterogeneous components in at least ¶ [0034] and Fig. 1):
a queue of workloads to be accelerated (one or more workload queues 138 for queuing various workloads to be performed at the accelerator 104 in at least ¶ [0033])
received from an orchestrator server (the server system 102 comprises, among other things, a workload manager 118 … A workload manager negotiates levels and orchestrates workloads to meet those service levels in at least ¶ [0030]);
determining, by the compute device, one or more job parameters of each requested job (the server system 102 comprises, among other things, a workload manager 118, one or more processors 114, and a plurality of Service Level Agreements (SLAs) <job parameters> 120 stored within a database or memory 122 … An SLA, in one embodiment, is a stipulation by a user of service levels that need to be met by a workload <job> … SLA record can have multiple sub-records for performance-throughput, performance-response time-batch window, Energy, Reliability and Availability. SLA record values can change dynamically during a workload run in at least ¶ [0030] and the SLA manager 124 retrieves an SLA 120 from a storage area 122 and determines that the SLA 120 requires 10,000 operations per second over a two day period of time in at least ¶ [0039])
selecting, by the compute device, an accelerator device of the compute device to execute each job based at least in part on the job parameters of the corresponding job (In this example, the kernels at the accelerators 104, as an aggregate, may only be able to perform 5,000 operations per second, which would not satisfy the SLA 120. Therefore, the workload manager 118 schedules a portion 202, 204 of the accelerator workload (e.g., tasks) as one or more compute kernels 206, 208 at the accelerator processors 116 <selects accelerators> and a remaining portion 210, 212 of the workload as one or more compute kernels 214, 216 at the server system processors 114 so that the SLA 120 of 10,000 operations per second <job parameter to be met> can be achieved in at least ¶ [0039]);
determining, by the compute device and for each job, whether one or more kernels are to be registered on the corresponding accelerator device selected for the corresponding job to enable the corresponding accelerator device to execute the job (Therefore, the workload manager 118 schedules a portion 202, 204 of the accelerator workload (e.g., tasks) as one or more compute kernels 206, 208 at the accelerator processors 116 in at least ¶ [0039]);
registering, by the compute device and in response to a determination that the one or more kernels are to be registered, the one or more kernels on the corresponding accelerator device (Therefore, the workload manager 118 schedules a portion 202, 204 of the accelerator workload (e.g., tasks) as one or more compute kernels 206, 208 at the accelerator processors 116 in at least ¶ [0039], scheduling workload tasks at kernels at the accelerator means that kernels have been registered/associated with the accelerator as registering is an association of one to the other (the kernels at the accelerator are associated/registered with it) and further This process can include adding or deleting additional accelerators on the fly and/or adding/deleting server system surrogate processors on the fly. Each additional accelerator can run compute kernels that allow the contracted batch time window to be met. If the batch window specification is relaxed, then the workload manager 134 can remove accelerators and associated kernels so that these maybe reused by other workloads. The workload can complete within the relaxed batch window using the optimized set of accelerator resources in at least ¶ [0042], wherein adding/removing accelerators and kernels is dynamic/on-the-fly, therefore the kernels must be registered/associated with the accelerators
The workload manager 118 sends these compute kernels to the accelerators 104. Alternatively, an OpenCL runtime process on the server can send these kernels to 
scheduling, for each accelerator device of the compute device and by the compute device, the kernels of the corresponding accelerator device based on a kernel prediction (In this example, the kernels at the accelerators 104, as an aggregate, may only be able to perform 5,000 operations per second <predicted to not be sufficient>, which would not satisfy the SLA 120. Therefore, the workload manager 118 schedules a portion 202, 204 of the accelerator workload (e.g., tasks) as one or more compute kernels 206, 208 at the accelerator processors 116 and a remaining portion 210, 212 of the workload as one or more compute kernels 214, 216 at the server system processors 114 so that the SLA 120 of 10,000 operations per second <above selection, registration and scheduling predicted to meet this job parameter> can be achieved in at least ¶ [0039] and A kernel scheduler 519 schedules a physical accelerator compute resource (processors 502, 504, 506) or pseudo-accelerator compute resource (e.g. processors/kernels 508/514, 510/516, 512/518) at the server system 102 or accelerator to satisfy the kernel call in at least ¶ [0055]).
Krishnamurthy teaches evaluating workload SLAs, required kernels and accelerators and making predictions, registrations and schedules on those bases (see mapping above). Although Krishnamurthy teaches workloads with corresponding SLAs governing required parameters, Krishnamurthy does not specifically teach receiving job execution requests specifying the job parameters.
However, in analogous art Chen teaches receiving, by the compute device, a plurality of job execution requests, each job execution request including a job requested to be accelerated; determining, by the compute device, one or more job parameters of each requested job based on the corresponding job execution request (FPGAs must be exposed to the cloud stack as a resource pool that can be actively managed, i.e. it can be requested, allocated and deallocated by a tenant in at least 1. Introduction 1. Abstraction and instead of requesting programmable resources in PRP, a tenant directly requests various combination of accelerator functions and performance A cloud provides a list of pre-defined accelerators, handles tenant requests and configures accelerators into idle slots. If no accelerator matches the requirements, a tenant can submit his own designs and the cloud owner performs the compilation and adds the tenant design into the accelerator list in at least 3.1 FPGA Resources Abstraction ¶ 3 and A tenant can issue requests for an arbitrary amount of registers, LUTs, and memory and the controller provides a virtual FPGA chip consisting of what was requested in at least 3.1 FPGA Resources Abstraction ¶ 1);
It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to combine the job execution requests specifying the job parameters of Chen with the systems and methods of Krishnamurthy resulting in a system in which the workload parameters/SLAs of Krishnamurthy are communicated via job request as in Chen wherein the workload parameters/SLAs are included with job requests for accelerators such as the FPGAs of Chen. A person having ordinary skill in the art would have been motivated to make this combination, with a reasonable expectation of success, for the purpose of improving finer-grain 

With regard to claim 27, Krishnamurthy teaches wherein scheduling the kernels registered on the accelerator device of the compute device comprises prioritizing the kernels registered on the compute device based on the kernel prediction (An SLA may also include fields for prioritization between batch windows and energy windows in at least ¶ [0047] and in this example, the kernels at the accelerators 104, as an aggregate, may only be able to perform 5,000 operations per second, which would not satisfy the SLA 120. Therefore, the workload manager 118 schedules a portion 202, 204 of the accelerator workload (e.g., tasks) as one or more compute kernels 206, 208 at the accelerator processors 116 and a remaining portion 210, 212 of the workload as one or more compute kernels 214, 216 at the server system processors 114 so that the SLA 120 of 10,000 operations per second can be achieved in at least ¶ [0039], Kernels at accelerator 104 predicted to only perform 5,000 ops, not enough, therefore prioritizing the use of kernels 206 and 208 at accelerator 116 as part of scheduling to meet the SLA
… It should be noted that for a workload with throughput SLA, Energy SLA and batch-window SLA, prioritization is possible …in at least ¶ [0069]).
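
For illustration only, the throughput example quoted from Krishnamurthy ¶ [0039] can be sketched as follows. This is a minimal sketch and not the reference's implementation: the function name `split_workload` and the rule of filling the accelerator kernels first are assumptions made here for clarity; only the figures of 5,000 and 10,000 operations per second come from the quoted passage.

```python
def split_workload(required_ops_per_sec, accelerator_ops_per_sec):
    """Split a throughput requirement between accelerator and server kernels.

    Hypothetical helper: assumes accelerator kernels are used up to their
    aggregate capacity and server-side kernels absorb the remainder.
    """
    on_accelerator = min(required_ops_per_sec, accelerator_ops_per_sec)
    on_server = required_ops_per_sec - on_accelerator
    return on_accelerator, on_server


# The SLA requires 10,000 ops/s; the accelerator kernels manage 5,000 ops/s
# in aggregate, so the remaining portion goes to the server processors.
accel_share, server_share = split_workload(10_000, 5_000)
print(accel_share, server_share)  # 5000 5000
```

Under these assumptions the accelerator capacity alone is treated as insufficient for the SLA, and the shortfall is what gets scheduled on the server system processors.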

Claims 2, 10-12, 14, 22-24 and 28 are rejected under 35 U.S.C. 103 as being unpatentable over Krishnamurthy et al. Pub. No. US 2012/0054770 A1 (hereafter Krishnamurthy) in view of Chen et al. “Enabling FPGAs in the Cloud” (hereafter Chen), and further in view of Becchi.

With regard to claim 2, Krishnamurthy and Chen teach the compute device of claim 1,
Krishnamurthy and Chen do not specifically teach determining whether kernels have already been registered on the compute device.
However, in analogous art Becchi teaches wherein to determine whether one or more kernels are to be registered on the corresponding accelerator device comprises to determine whether each kernel associated with a corresponding requested job has been previously registered on the compute device (Block 604 determines whether the CPU or one of the accelerators would most efficiently execute the kernel. If the data is already located at the fastest processor for executing the kernel, the remaining steps may be skipped and the kernel may be executed at that processor. Block 606 determines how long it would take to transfer the requested data from its current location to another processing element. For example, if the data is stored in the CPU's memory, determining how long it would take to transfer that data to the accelerator memory. Block 608 then determines whether the transfer time required is too great (e.g., if it exceeds a specified threshold). If so, the kernel is assigned to whichever processing element currently stores the requested data at block 601. If not, block 610 transfers the data to the more efficient processing element and the kernel is assigned there at block 612 in at least ¶ [0073] and A runtime according to the present principles analyzes such situations using history-based models to predict processing as 
It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to combine the determining whether kernels have already been registered on the compute device of Becchi with the systems and methods of Krishnamurthy and Chen resulting in a system in which the selection of an accelerator of Krishnamurthy further takes into consideration the kernel data already available at the compute device as in Becchi. A person having ordinary skill in the art would have been motivated to make this combination, with a reasonable expectation of success, for the purpose of increasing efficiency by allowing for compute devices which previously executed kernels to be selected, as there would not be a need to transfer data, thus saving processing time and resources (See at least Becchi ¶ [0073] and ¶ [0028]).
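
For illustration only, the decision flow quoted from Becchi ¶ [0073] (blocks 604 through 612) can be sketched as below. This is a sketch under stated assumptions, not Becchi's code: the function name `assign_kernel`, the dictionary inputs, and the numeric threshold are hypothetical and chosen only to make the quoted steps concrete.

```python
def assign_kernel(exec_seconds, data_location, transfer_seconds, threshold):
    """Pick the processing element for a kernel, following the quoted steps.

    exec_seconds: estimated run time per processing element (hypothetical input).
    data_location: element that currently holds the requested data.
    transfer_seconds: estimated transfer time keyed by (source, destination).
    threshold: maximum acceptable transfer time (hypothetical value).
    """
    # Block 604: find the element that would execute the kernel fastest.
    fastest = min(exec_seconds, key=exec_seconds.get)
    if data_location == fastest:
        # Data is already at the fastest element; remaining steps are skipped.
        return fastest
    # Block 606: estimate how long moving the data would take.
    cost = transfer_seconds[(data_location, fastest)]
    # Block 608: if the transfer is too costly, run where the data already is.
    if cost > threshold:
        return data_location
    # Blocks 610 and 612: transfer the data and assign the kernel there.
    return fastest


estimates = {"cpu": 5.0, "gpu": 1.0}
print(assign_kernel(estimates, "cpu", {("cpu", "gpu"): 10.0}, threshold=2.0))  # cpu
print(assign_kernel(estimates, "cpu", {("cpu", "gpu"): 1.0}, threshold=2.0))   # gpu
```

The first call keeps the kernel on the element that already holds the data because the estimated transfer exceeds the threshold; the second moves it to the faster element.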

With regard to claim 10, Krishnamurthy and Chen teach the compute device of claim 8,
Krishnamurthy teaches prioritizing kernels which can meet the SLA (see mapping above) but Krishnamurthy and Chen do not specifically teach prioritizing the next most probable kernel.
However, in analogous art Becchi teaches wherein to prioritize the kernels registered on the compute device based on the kernel prediction comprises to prioritize a next most probable kernel to receive a job to be accelerated (If an 
It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to combine the prioritizing the next most probable kernel of Becchi with the systems and methods of Krishnamurthy and Chen resulting in a system in which the selection of an accelerator of Krishnamurthy further prioritizes selecting not only the kernel and accelerator estimated to be the fastest but also considers the kernel and accelerator most likely to use the data as in Becchi. A person having ordinary skill in the art would have been motivated to make this combination, with a reasonable expectation of success, for the purpose of increasing efficiency by analyzing situations using history-based models to predict processing as 

With regard to claim 11, Krishnamurthy and Chen teach the compute device of claim 1,
Krishnamurthy teaches prioritizing kernels which can meet the SLA (see mapping above) but Krishnamurthy and Chen do not specifically teach prioritizing the next most probable kernel based on execution pattern.
However, in analogous art Becchi teaches wherein the management logic unit is further to predict a next probable kernel from the kernels registered on the accelerator devices of the compute device to receive a job to be accelerated based on an execution pattern of each kernel (If an application has three candidate kernels with both CPU and GPU implementations and, during a certain execution path, the first kernel is estimated to be much faster, but the second and third much slower on the GPU (based on the sizes of their parameters), a data-agnostic scheduler is likely to run the first kernel on the GPU, and the rest on the CPU. However if the runtime discovers that the first kernel produces a large amount of data that is consumed by the second kernel, a better schedule may be to run the second kernel also on the GPU. Although the GPU is slower in processing the second kernel compared to the CPU, this schedule will obviate the large intermediate data transfer, and potentially result in an overall speedup. A runtime according to the present principles analyzes such situations using history-based models to predict processing as well as data transfer time and uses these to guide the scheduling policy. The runtime intercepts calls to candidate kernels, 
It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to combine the prioritizing the next most probable kernel based on execution pattern of Becchi with the systems and methods of Krishnamurthy and Chen resulting in a system in which the selection of an accelerator of Krishnamurthy further prioritizes selecting not only the kernel and accelerator estimated to be the fastest but also considers the kernel and accelerator most likely to use the data as in Becchi. A person having ordinary skill in the art would have been motivated to make this combination, with a reasonable expectation of success, for the purpose of increasing efficiency by analyzing situations using history-based models to predict processing as well as data transfer time and using these to guide the scheduling policy (See at least Becchi ¶ [0028]).
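
For illustration only, the data-aware scheduling idea quoted from Becchi ¶ [0028] can be sketched as below. This is a sketch under stated assumptions and not the reference's runtime: the class `HistoryModel`, the function `choose_device`, and the use of a simple mean over past run times as the history-based model are all hypothetical choices made here.

```python
from collections import defaultdict
from statistics import mean


class HistoryModel:
    """Hypothetical history-based predictor: mean of previously observed run times."""

    def __init__(self):
        self._samples = defaultdict(list)

    def record(self, kernel, device, seconds):
        self._samples[(kernel, device)].append(seconds)

    def predict(self, kernel, device):
        samples = self._samples[(kernel, device)]
        # With no history, treat the device as unknown (infinitely slow).
        return mean(samples) if samples else float("inf")


def choose_device(model, kernel, devices, data_device, transfer_seconds):
    """Choose the device minimizing predicted processing plus data transfer time."""

    def total(device):
        move = 0.0 if device == data_device else transfer_seconds
        return model.predict(kernel, device) + move

    return min(devices, key=total)


model = HistoryModel()
model.record("kernel2", "cpu", 1.0)  # second kernel is faster on the CPU...
model.record("kernel2", "gpu", 2.0)  # ...and slower on the GPU

# The first kernel left a large intermediate result on the GPU, so moving it
# to the CPU is assumed to cost 5.0 seconds.
print(choose_device(model, "kernel2", ["cpu", "gpu"], "gpu", 5.0))  # gpu
# Ignoring transfer cost (data-agnostic), the CPU would be chosen instead.
print(choose_device(model, "kernel2", ["cpu", "gpu"], "gpu", 0.0))  # cpu
```

This mirrors the quoted scenario in which the second kernel is run on the GPU, despite being slower there, to avoid the large intermediate data transfer.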

With regard to claim 12, Krishnamurthy and Chen teach the compute device of claim 11,
Krishnamurthy teaches prioritizing kernels which can meet the SLA (see mapping above) but Krishnamurthy and Chen do not specifically teach prioritizing the next most probable kernel based on execution pattern of kernels for each application.
However, in analogous art Becchi teaches wherein to predict a next probable kernel from the kernels registered on the accelerator devices of the compute device comprises to predict an execution pattern of each kernel registered on the accelerator devices of the compute device for each application (If an application 
It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to combine the prioritizing the next most probable kernel based on execution pattern of kernels for each application of Becchi with the systems and methods of Krishnamurthy and Chen resulting in a system in which the selection of an accelerator of Krishnamurthy further prioritizes selecting not only the kernel and accelerator estimated to be the fastest but also considers the kernel and accelerator most likely to use the data as in Becchi. A person having ordinary skill in the art would have been motivated to make this combination, with a reasonable expectation of success, for the purpose of increasing efficiency by analyzing situations 

With regard to claim 14, Krishnamurthy and Chen teach the one or more machine-readable storage media of claim 13,
Krishnamurthy and Chen do not specifically teach determining whether kernels have already been registered on the compute device.
However, in analogous art Becchi teaches wherein to determine whether one or more kernels are to be registered on the corresponding accelerator device comprises to determine whether each kernel associated with a corresponding requested job has been previously registered on the compute device (Block 604 determines whether the CPU or one of the accelerators would most efficiently execute the kernel. If the data is already located at the fastest processor for executing the kernel, the remaining steps may be skipped and the kernel may be executed at that processor. Block 606 determines how long it would take to transfer the requested data from its current location to another processing element. For example, if the data is stored in the CPU's memory, determining how long it would take to transfer that data to the accelerator memory. Block 608 then determines whether the transfer time required is too great (e.g., if it exceeds a specified threshold). If so, the kernel is assigned to whichever processing element currently stores the requested data at block 601. If not, block 610 transfers the data to the more efficient processing element and the kernel is assigned there at block 612 in at least ¶ [0073] and A runtime according to the present principles analyzes such situations using history-based models to predict processing as 
It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to combine the determining whether kernels have already been registered on the compute device of Becchi with the systems and methods of Krishnamurthy and Chen resulting in a system in which the selection of an accelerator of Krishnamurthy further takes into consideration the kernel data already available at the compute device as in Becchi. A person having ordinary skill in the art would have been motivated to make this combination, with a reasonable expectation of success, for the purpose of increasing efficiency by allowing for compute devices which previously executed kernels to be selected, as there would not be a need to transfer data, thus saving processing time and resources (See at least Becchi ¶ [0073] and ¶ [0028]).

With regard to claim 22, Krishnamurthy and Chen teach the one or more machine-readable storage media of claim 20,
Krishnamurthy teaches prioritizing kernels which can meet the SLA (see mapping above) but Krishnamurthy and Chen do not specifically teach prioritizing the next most probable kernel.
However, in analogous art Becchi teaches wherein to prioritize the kernels registered on the compute device based on the kernel prediction comprises to prioritize a next most probable kernel to receive a job to be accelerated (If an 
It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to combine the prioritizing the next most probable kernel of Becchi with the systems and methods of Krishnamurthy and Chen resulting in a system in which the selection of an accelerator of Krishnamurthy further prioritizes selecting not only the kernel and accelerator estimated to be the fastest but also considers the kernel and accelerator most likely to use the data as in Becchi. A person having ordinary skill in the art would have been motivated to make this combination, with a reasonable expectation of success, for the purpose of increasing efficiency by analyzing situations using history-based models to predict processing as 

With regard to claim 23, Krishnamurthy and Chen teach the one or more machine-readable storage media of claim 13,
Krishnamurthy teaches prioritizing kernels which can meet the SLA (see mapping above) but Krishnamurthy and Chen do not specifically teach prioritizing the next most probable kernel based on execution pattern.
However, in analogous art Becchi teaches wherein the plurality of instructions, when executed, further cause the compute device to predict a next probable kernel from the kernels registered on the accelerator devices of the compute device to receive a job to be accelerated based on an execution pattern of each kernel (If an application has three candidate kernels with both CPU and GPU implementations and, during a certain execution path, the first kernel is estimated to be much faster, but the second and third much slower on the GPU (based on the sizes of their parameters), a data-agnostic scheduler is likely to run the first kernel on the GPU, and the rest on the CPU. However if the runtime discovers that the first kernel produces a large amount of data that is consumed by the second kernel, a better schedule may be to run the second kernel also on the GPU. Although the GPU is slower in processing the second kernel compared to the CPU, this schedule will obviate the large intermediate data transfer, and potentially result in an overall speedup. A runtime according to the present principles analyzes such situations using history-based models to predict processing as well as data transfer time and uses these to guide the 
It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to combine the prioritizing the next most probable kernel based on execution pattern of Becchi with the systems and methods of Krishnamurthy and Chen resulting in a system in which the selection of an accelerator of Krishnamurthy further prioritizes selecting not only the kernel and accelerator estimated to be the fastest but also considers the kernel and accelerator most likely to use the data as in Becchi. A person having ordinary skill in the art would have been motivated to make this combination, with a reasonable expectation of success, for the purpose of increasing efficiency by analyzing situations using history-based models to predict processing as well as data transfer time and using these to guide the scheduling policy (See at least Becchi ¶ [0028]).

With regard to claim 24, Krishnamurthy and Chen teach the one or more machine-readable storage media of claim 23,
Krishnamurthy teaches prioritizing kernels which can meet the SLA (see mapping above) but Krishnamurthy and Chen do not specifically teach prioritizing the next most probable kernel based on execution pattern of kernels for each application.
However, in analogous art Becchi teaches wherein to predict a next probable kernel from the kernels registered on the accelerator devices of the compute device comprises to predict an execution pattern of each kernel registered on the accelerator devices of the compute device for each application (If an application has three candidate kernels with both CPU and GPU implementations and, during a certain execution path, the first kernel is estimated to be much faster, but the second and third much slower on the GPU (based on the sizes of their parameters), a data-agnostic scheduler is likely to run the first kernel on the GPU, and the rest on the CPU. However if the runtime discovers that the first kernel produces a large amount of data that is consumed by the second kernel, a better schedule may be to run the second kernel also on the GPU. Although the GPU is slower in processing the second kernel compared to the CPU, this schedule will obviate the large intermediate data transfer, and potentially result in an overall speedup. A runtime according to the present principles analyzes such situations using history-based models to predict processing as well as data transfer time and uses these to guide the scheduling policy. The runtime intercepts calls to candidate kernels, examines their arguments, and uses historical information and prior decisions to devise a schedule on-the-fly in at least ¶ [0028]).
It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to combine the prioritizing the next most probable kernel based on execution pattern of kernels for each application of Becchi with the systems and methods of Krishnamurthy and Chen resulting in a system in which the selection of an accelerator of Krishnamurthy further prioritizes selecting not only the kernel and accelerator estimated to be the fastest but also considers the kernel and accelerator most likely to use the data as in Becchi. A person having ordinary skill in the art would have been motivated to make this combination, with a reasonable expectation of success, for the purpose of increasing efficiency by analyzing situations using history-based models to predict processing as well as data transfer time and using these to guide the scheduling policy (See at least Becchi ¶ [0028]).

With regard to claim 28, Krishnamurthy and Chen teach the method of claim 26, further comprising
Krishnamurthy teaches prioritizing kernels which can meet the SLA (see mapping above) but Krishnamurthy and Chen do not specifically teach prioritizing the next most probable kernel based on execution pattern.
However, in analogous art Becchi teaches predicting, by the compute device, a next probable kernel from the kernels registered on the accelerator devices of the compute device to receive a job to be accelerated based on an execution pattern of each kernel (If an application has three candidate kernels with both CPU and GPU implementations and, during a certain execution path, the first kernel is estimated to be much faster, but the second and third much slower on the GPU (based on the sizes of their parameters), a data-agnostic scheduler is likely to run the first kernel on the GPU, and the rest on the CPU. However if the runtime discovers that the first kernel produces a large amount of data that is consumed by the second kernel, a better schedule may be to run the second kernel also on the GPU. Although the GPU is slower in processing the second kernel compared to the CPU, this schedule will obviate the large intermediate data transfer, and potentially result in an overall speedup. A runtime according to the present principles analyzes such situations using history-based models to predict processing as well as data transfer time and uses these to guide the scheduling policy. The runtime intercepts calls to candidate kernels, examines their arguments, and uses historical information and prior decisions to devise a schedule on-the-fly in at least ¶ [0028]).
It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to combine the prioritizing the next most probable kernel based on execution pattern of Becchi with the systems and methods of Krishnamurthy and Chen resulting in a system in which the selection of an accelerator of Krishnamurthy further prioritizes selecting not only the kernel and accelerator estimated to be the fastest but also considers the kernel and accelerator most likely to use the data as in Becchi. A person having ordinary skill in the art would have been motivated to make this combination, with a reasonable expectation of success, for the purpose of increasing efficiency by analyzing situations using history-based models to predict processing as well as data transfer time and using these to guide the scheduling policy (See at least Becchi ¶ [0028]).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
US 20120054771 A1 teaches Rescheduling workload in a hybrid computing environment
US 20110161972 A1 teaches Goal oriented performance management of workload utilizing accelerators
US 20170212563 A1 teaches Prediction-based power management strategy for gpu compute workloads
US 20170123684 A1 teaches Emulating memory mapped i/o for coherent accelerators in error state
US 20170132163 A1 teaches Enabling poll/select style interfaces with coherent accelerators
US 20190155239 A1 teaches Method and apparatus for remote field programmable gate array processing
US 20180150334 A1 teaches Technologies for providing accelerated functions as a service in a disaggregated architecture
US 20190065253 A1 teaches Technologies for pre-configuring accelerators by predicting bit-streams
US 20190065260 A1 teaches Technologies for kernel scale-out


Examiner respectfully requests, in response to this Office action, support be shown for language added to any original claims on amendment and any new claims. That is, indicate support for newly added claim language by specifically pointing to page(s) and line number(s) in the specification and/or drawing figure(s). This will assist Examiner in prosecuting the application.

When responding to this Office Action, Applicant is advised to clearly point out the patentable novelty which he or she thinks the claims present, in view of the state of the art disclosed by the references cited or the objections made. He or she must also show how the amendments avoid such references or objections.  See 37 CFR 1.111(c).

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to BRADLEY A TEETS whose telephone number is (571)272-3338.  The examiner can normally be reached on Monday - Friday, 6am-2pm.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Meng An, can be reached at (571)272-3756.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/BRADLEY A TEETS/
Primary Examiner, Art Unit 2195