Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This Office Action is in response to claims filed 04/30/2021.
Claims 1-29 are pending.

Drawings
Fig. 6 is objected to because it fails to comply with 37 CFR 1.84(p)(3), which requires that numbers, letters, and reference characters must measure at least .32 cm. (1/8 inch) in height. They should not be placed in the drawing so as to interfere with its comprehension. Therefore, they should not cross or mingle with the lines. They should not be placed upon hatched or shaded surfaces. When necessary, such as indicating a surface or cross section, a reference character may be underlined and a blank space may be left in the hatching or shading where the character occurs so that it appears distinct. In the instant application, in Fig. 6, the text “Submit Job + Hints”, “Allocate FPGA 0 and 1” and “Consider Input from Scheduling Logic and Monitoring and Prediction Logic” (all three instances) cross or mingle with lines.
Corrected drawings in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. The replacement sheet(s) should be labeled “Replacement Sheet” in the page header (as per 37 CFR 1.84(c)) so as not to obstruct any portion of the drawing figures. If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. “Drawing and specification corrections, presentation of a new oath and the like are generally considered as formal matters, although the filing of drawing corrections in reply to an objection to the drawings cannot normally be held in abeyance … An amendment which does not comply with the provisions of 37 CFR 1.121(b), (c), (d), and (h) may be held not fully responsive” (MPEP § 714.02). The objection to the drawings will not be held in abeyance.

Specification
The disclosure is objected to because of the following informalities: See 37 CFR 1.72(a) and MPEP § 606. The title of the invention should be placed at the top of the first page of the specification unless the title is provided in an application data sheet. The instant disclosure begins with an extraneous cover page listing inventors and attorney docket number. This cover page should be removed and the pagination of the remainder of the specification corrected. Appropriate correction is required.

The disclosure is objected to because of the following informalities: Every page recites “Docket No. P119569-C1”. This is extraneous information that should be removed. Appropriate correction is required.

Claim Objections
Claim 24 is objected to because of the following informalities: Claim 24 appears to contain a typographical error reciting “The method of claim 121” which should be corrected to -- The method of claim [[1]]21 --. For the purpose of compact prosecution, Examiner will interpret Claim 24 as depending on Claim 21. Appropriate correction is required.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-29 are rejected under 35 U.S.C. 101 because the claimed invention recites a judicial exception, is directed to that judicial exception, an abstract idea, as it has not been integrated into practical application and the claims further do not recite significantly more than the judicial exception. Examiner has evaluated the claims under the framework provided in the 2019 Patent Eligibility Guidance published in the Federal Register 01/07/2019 and has provided such analysis below.
Step 1: Claims 1-11 are directed to devices and fall within the statutory category of machines; Claims 12-20 are directed to computer-readable media and fall within the statutory category of articles of manufacture; and Claims 21-29 are directed to methods and fall within the statutory category of processes. Therefore, “Are the claims to a process, machine, manufacture or composition of matter?” Yes.
In order to evaluate the Step 2A inquiry “Is the claim directed to a law of nature, a natural phenomenon or an abstract idea?” we must determine, at Step 2A Prong 1, whether the claim recites a law of nature, a natural phenomenon or an abstract idea and further, at Step 2A Prong 2, whether the claim recites additional elements that integrate the judicial exception into a practical application.
Step 2A Prong 1:
Claims 1, 12 and 21: The limitations of “schedule the first job execution request for execution by a first kernel at the FPGA; and schedule the second job execution request for execution by a second kernel at the FPGA”, as drafted, recite a process that, but for the recitation of generic computing components, under its broadest reasonable interpretation, covers performance of the limitations in the mind. For example, a person can mentally evaluate the execution requests and make a mental plan, i.e., a schedule, for which kernel at the FPGA to schedule each request.
Therefore, Yes, claims 1, 12 and 21 recite judicial exceptions.
Because the claims have been identified as reciting judicial exceptions, Step 2A Prong 2 will evaluate whether the claims are directed to the judicial exception.
Step 2A Prong 2: 
Claims 1, 12 and 21: The judicial exception is not integrated into a practical application. In particular, the claim recites the following additional elements – “a field programmable gate array (FPGA)”, “accelerator management circuitry to”, “a first kernel”, “a second kernel”, “a first processor”, “a second processor”, “One or more non-transitory machine-readable storage media comprising a plurality of instructions stored thereon that, when executed by a system at a compute device cause the system to” and “a field programmable gate array (FPGA) resident on the compute device” which are merely recitations of generic computing components and functions (see MPEP § 2106.05(b)) which does not integrate a judicial exception into practical application. Further, the claims recite the following additional elements – “receive a first job execution request from a first processor; receive a second job execution request from a second processor” which is merely a recitation of insignificant pre-solution data gathering activity (see MPEP § 2106.05(g)) which does not integrate a judicial exception into practical application. This element will be further analyzed below at step 2B with regard to being Well-Understood, Routine and Conventional.
Therefore, “Do the claims recite additional elements that integrate the judicial exception into a practical application?” No, these additional elements do not integrate the abstract idea into a practical application and they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
After having evaluated the inquiries set forth in Steps 2A Prong 1 and 2, it has been concluded that claims 1, 12 and 21 not only recite a judicial exception but that the claims are directed to the judicial exception as the judicial exception has not been integrated into practical application.
Step 2B: 
Claims 1, 12 and 21: The claims do not include additional elements, alone or in combination, that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements amount to no more than generic computing components and insignificant pre-solution data gathering activity which do not amount to significantly more than the abstract idea. Moreover, the insignificant pre-solution data gathering activity is also Well-Understood, Routine and Conventional, see MPEP § 2106.05(d)(II) “The courts have recognized the following computer functions as well‐understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity. i. Receiving or transmitting data over a network … iv. Storing and retrieving information in memory” wherein the receiving of requests is similar to the Court identified Well-Understood, Routine and Conventional activity of receiving or retrieving data.
Therefore, “Do the claims recite additional elements that amount to significantly more than the judicial exception?” No, these additional elements, alone or in combination, do not amount to significantly more than the judicial exception.
Having concluded analysis within the provided framework, Claims 1, 12 and 21 do not recite patent eligible subject matter under 35 U.S.C. § 101.
With regard to claims 2, 13 and 22, they recite additional abstract idea recitations of “causes an overprovisioning of the FPGA that includes the first and second processors” which, as drafted, is a process that, but for the recitation of generic computing components, under its broadest reasonable interpretation, covers performance of the limitation in the mind. For example, a person can mentally evaluate the execution requests and make a mental plan, schedule, for which kernel at the FPGA to schedule each request wherein that mental plan overprovisions the resources of the FPGA. Further, claims 2, 13 and 22 recite “in a disaggregated architecture” which is merely a field of use/technological environment (See MPEP § 2106.05(h)) without imposing meaningful limitation which does not integrate a judicial exception into practical application and for the same reasons as above with regard to integration into practical application and whether additional elements amount to significantly more, claims 2, 13 and 22 also fail both Step 2A prong 2, thus the claims are directed to the judicial exception as they have not been integrated into practical application, and fail Step 2B as not amounting to significantly more. Therefore, Claims 2, 13 and 22 do not recite patent eligible subject matter under 35 U.S.C. § 101.
With regard to claims 3, 14 and 23, they recite additional element recitations of “wherein the first processor and the second processor are located on the compute device” which is merely a recitation of field of use/technological environment (See MPEP § 2106.05(h)) without imposing meaningful limitation which does not integrate a judicial exception into practical application. Claims 3, 14 and 23 do not recite any further additional elements and for the same reasons as above with regard to integration into practical application and whether additional elements amount to significantly more, claims 3, 14 and 23 also fail both Step 2A prong 2, thus the claims are directed to the judicial exception as they have not been integrated into practical application, and fail Step 2B as not amounting to significantly more. Therefore, Claims 3, 14 and 23 do not recite patent eligible subject matter under 35 U.S.C. § 101.
With regard to claims 4, 15 and 24, they recite additional element recitations of “wherein the first processor is located on the compute device and the second processor is located on a second compute device communicatively coupled with the compute device via a network” which is merely a recitation of field of use/technological environment (See MPEP § 2106.05(h)) without imposing meaningful limitation which does not integrate a judicial exception into practical application. Claims 4, 15 and 24 do not recite any further additional elements and for the same reasons as above with regard to integration into practical application and whether additional elements amount to significantly more, claims 4, 15 and 24 also fail both Step 2A prong 2, thus the claims are directed to the judicial exception as they have not been integrated into practical application, and fail Step 2B as not amounting to significantly more. Therefore, Claims 4, 15 and 24 do not recite patent eligible subject matter under 35 U.S.C. § 101.
With regard to claim 5, it recites additional element recitations of “wherein the first processor is located on a second compute device and the second processor is located on a third compute device, the second and third compute devices communicatively coupled with compute device via a network” which is merely a recitation of field of use/technological environment (See MPEP § 2106.05(h)) without imposing meaningful limitation which does not integrate a judicial exception into practical application. Claim 5 does not recite any further additional elements and for the same reasons as above with regard to integration into practical application and whether additional elements amount to significantly more, claim 5 also fails both Step 2A prong 2, thus the claim is directed to the judicial exception as it has not been integrated into practical application, and fails Step 2B as not amounting to significantly more. Therefore, Claim 5 does not recite patent eligible subject matter under 35 U.S.C. § 101.
With regard to claim 6, it recites additional element recitations of “wherein the acceleration management circuitry is included in the FPGA” which is merely a recitation of field of use/technological environment (See MPEP § 2106.05(h)) without imposing meaningful limitation which does not integrate a judicial exception into practical application. Claim 6 does not recite any further additional elements and for the same reasons as above with regard to integration into practical application and whether additional elements amount to significantly more, claim 6 also fails both Step 2A prong 2, thus the claim is directed to the judicial exception as it has not been integrated into practical application, and fails Step 2B as not amounting to significantly more. Therefore, Claim 6 does not recite patent eligible subject matter under 35 U.S.C. § 101.
With regard to claims 7, 16 and 25, they recite similar limitations to those of independent claims 1, 12 and 21. Claims 7, 16 and 25 merely recite performing the same steps for a third and fourth job execution request thus these claims are rejected for the same reasons as above regarding the first and second job execution requests and scheduling. Further, claims 7, 16 and 25 recite “a second FPGA” which is merely a generic computing component (See MPEP § 2106.05(b)) without imposing meaningful limitation which does not integrate a judicial exception into practical application and for the same reasons as above with regard to integration into practical application and whether additional elements amount to significantly more, claims 7, 16 and 25 also fail both Step 2A prong 2, thus the claims are directed to the judicial exception as they have not been integrated into practical application, and fail Step 2B as not amounting to significantly more. Therefore, Claims 7, 16 and 25 do not recite patent eligible subject matter under 35 U.S.C. § 101.
With regard to claims 8, 17 and 26, they recite additional element recitations of “wherein the first and second processors are located on the compute device and the third and fourth processors are located on a second compute device communicatively coupled with the compute device via a network” which is merely a recitation of field of use/technological environment (See MPEP § 2106.05(h)) without imposing meaningful limitation which does not integrate a judicial exception into practical application. Claims 8, 17 and 26 do not recite any further additional elements and for the same reasons as above with regard to integration into practical application and whether additional elements amount to significantly more, claims 8, 17 and 26 also fail both Step 2A prong 2, thus the claims are directed to the judicial exception as they have not been integrated into practical application, and fail Step 2B as not amounting to significantly more. Therefore, Claims 8, 17 and 26 do not recite patent eligible subject matter under 35 U.S.C. § 101.
With regard to claims 9, 18 and 27, they recite additional abstract idea recitations of “schedule the first and second job execution requests for execution by respective first and second kernels at the FPGA based on separate priorities assigned to the first and second kernels” which, as drafted, is a process that, but for the recitation of generic computing components, under its broadest reasonable interpretation, covers performance of the limitation in the mind. For example, a person can mentally evaluate the execution requests and make a mental plan, schedule, for which kernel at the FPGA to schedule each request wherein determining the mental plan includes mentally evaluating priorities of the kernels. Claims 9, 18 and 27 do not include any further additional elements and for the same reasons as above with regard to integration into practical application and whether additional elements amount to significantly more, claims 9, 18 and 27 also fail both Step 2A prong 2, thus the claims are directed to the judicial exception as they have not been integrated into practical application, and fail Step 2B as not amounting to significantly more. Therefore, Claims 9, 18 and 27 do not recite patent eligible subject matter under 35 U.S.C. § 101.
With regard to claims 10, 19 and 28, they recite additional abstract idea recitations of “wherein a first priority is assigned to the first kernel based on an estimated runtime to fulfill the first job execution request and a second priority is assigned to the second kernel based on an estimated runtime to fulfill the second job execution request” which, as drafted, is a process that, but for the recitation of generic computing components, under its broadest reasonable interpretation, covers performance of the limitation in the mind. For example, a person can mentally evaluate the execution requests and make a mental plan, schedule, for which kernel at the FPGA to schedule each request wherein determining the mental plan includes mentally evaluating priorities based on an estimated runtime. Claims 10, 19 and 28 do not include any further additional elements and for the same reasons as above with regard to integration into practical application and whether additional elements amount to significantly more, claims 10, 19 and 28 also fail both Step 2A prong 2, thus the claims are directed to the judicial exception as they have not been integrated into practical application, and fail Step 2B as not amounting to significantly more. Therefore, Claims 10, 19 and 28 do not recite patent eligible subject matter under 35 U.S.C. § 101.
With regard to claims 11, 20 and 29, they recite additional abstract idea recitations of “wherein a first priority is assigned to the first kernel based on previously fulfilled job execution requests of the first kernel and a second priority is assigned to the second kernel based on previously fulfilled job execution requests of the second kernel” which, as drafted, is a process that, but for the recitation of generic computing components, under its broadest reasonable interpretation, covers performance of the limitation in the mind. For example, a person can mentally evaluate the execution requests and make a mental plan, schedule, for which kernel at the FPGA to schedule each request wherein determining the mental plan includes mentally evaluating priorities based on previously fulfilled jobs. Claims 11, 20 and 29 do not include any further additional elements and for the same reasons as above with regard to integration into practical application and whether additional elements amount to significantly more, claims 11, 20 and 29 also fail both Step 2A prong 2, thus the claims are directed to the judicial exception as they have not been integrated into practical application, and fail Step 2B as not amounting to significantly more. Therefore, Claims 11, 20 and 29 do not recite patent eligible subject matter under 35 U.S.C. § 101.
Therefore, Claims 1-29 do not recite patent eligible subject matter under 35 U.S.C. § 101.
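For illustration only, the scheduling limitations discussed above (receiving two job execution requests from two processors and scheduling each for execution by a respective kernel, with priority based on estimated runtime as in claims 10, 19 and 28) can be sketched as follows. This is a minimal sketch and is not taken from the application or the cited references; the names JobRequest, schedule, estimated_runtime_ms and the kernel labels are hypothetical, and the shortest-runtime-first ordering is an assumption made only to give the example a concrete priority rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobRequest:
    """Hypothetical job execution request received from a processor."""
    job_id: str
    source_processor: str
    estimated_runtime_ms: float


def schedule(requests, kernels):
    """Map each request to a distinct kernel.

    Assumption: a shorter estimated runtime means a higher priority,
    so requests are ordered shortest-first and paired with the kernels
    in the order the kernels are listed.
    """
    if len(requests) > len(kernels):
        raise ValueError("not enough kernels for the received requests")
    ordered = sorted(requests, key=lambda r: r.estimated_runtime_ms)
    return {req.job_id: kernel for req, kernel in zip(ordered, kernels)}


# Two requests, each received from a different (hypothetical) processor.
requests = [
    JobRequest("job-1", "cpu-0", estimated_runtime_ms=40.0),
    JobRequest("job-2", "cpu-1", estimated_runtime_ms=15.0),
]
plan = schedule(requests, ["kernel-A", "kernel-B"])
print(plan)
```

The sketch only receives data and produces an assignment plan; it does not program, reconfigure or otherwise operate any FPGA hardware.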

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 3-10, 12, 14-19, 21 and 23-28 are rejected under 35 U.S.C. 103 as being unpatentable over Krishnamurthy et al. Pub. No. US 2012/0054770 A1 (hereafter Krishnamurthy) in view of Zievers Pub. No US 2012/0284501 A1 (hereafter Zievers).

With regard to claim 1, Krishnamurthy teaches a compute device comprising: an accelerator (The combination of the server system 102 and the accelerator systems 104 is herein referred to as a hybrid server/system 112 within the hybrid environment 100 because of the heterogeneous combination of various system types of the server 102 and accelerators 104 … The processor(s) 114 comprises a set of architectural registers (not shown) that defines the host computer architecture. Each accelerator 104 also comprises one or more processors 116 that comprise a set of architectural registers (not shown) that defines the accelerator architecture in at least ¶ [0028]); and
accelerator management circuitry to (the server system 102 comprises, among other things, a workload manager 118 in at least ¶ [0030]): receive a first job execution request and receive a second job execution request (A set of data-parallel workload tasks is dynamically scheduled across at least one resource in the first set of resources and at least one resource in the second set of resources in at least ¶ [0004] and A task, at step 1206, calls a compute kernel at the accelerator system 104 or pseudo accelerator system. The kernel scheduler 519, at step 1208, schedules a physical accelerator compute resource or pseudo-accelerator compute resource at the server system 102 or accelerator to satisfy the kernel call in at least ¶ [0073]) from a first processor; from a second processor (Fig. 12, 1206 and These kernels can be invoked when a calling process on the server is run. These compute kernels are then launched on the accelerators 104 in at least ¶ [0035] and FIG. 5 shows server system processors 502, 504, 506 that comprise cross-platform parallel programming OpenCL calling address spaces and server system processors 508, 510, 512. The server system processors 508, 510, 512 each comprise one or more compute kernels 514, 516, 518 in at least ¶ [0054]);
schedule the first job execution request for execution by a first kernel at the accelerator; and schedule the second job execution request for execution by a second kernel at the accelerator (Each process may execute several tasks and each task can offload certain compute intensive portions to a compute kernel, which are basic units of executable code. The compute kernel may execute on one compute element or several compute elements such as a processor core. A compute kernel can be either task-parallel with respect to CPUs such as the processors found within the server system 102 or data-parallel with respect to CPUs such as the processors found within the accelerators 104 ... Workloads are queued in the workload queues 136, 138 of the server 102 and accelerators 104, respectively in at least ¶ [0034] and The workload manager 118 sends these compute kernels to the accelerators 104 in at least ¶ [0035] and the workload manager 118 schedules a portion 202, 204 of the accelerator workload (e.g., tasks) as one or more compute kernels 206, 208 at the accelerator processors 116 in at least ¶ [0039]).
Krishnamurthy does not specifically teach that the accelerator is an FPGA.
However, in analogous art Zievers teaches a compute device comprising: a field programmable gate array (FPGA) (the microkernel 604 and the kernel modules 606 can altogether be implemented with an ASIC, an FPGA, or a combination thereof in at least ¶ [0097] and The first kernel unit 906 is one of the kernel units 406 of FIG. 4. The first microkernel 908 is the microkernel 604 of FIG. 6 in at least ¶ [0101] and The second kernel unit 916 is another of the kernel units 406. The second microkernel 918 is another of the microkernel 604 in at least ¶ [0105]); and
schedule the first and second job execution request for execution at the FPGA (The computing system 100 can include a request module 924 for evaluating a user request 926 to initialize the application 304 in at least ¶ [0112] and The computing system 100 can include an allocation module 928 for performing scheduling and optimization to generate placement and interconnection of reconfigurable resources 930 for implementing the application 304 in at least ¶ [0114] and The allocation module 928 can perform a scheduling procedure 932 and an optimization procedure 934 for implementing functions of the application 304. The scheduling procedure 932 is a method that determines which and when microkernel resources 936 are allocated to perform the optimization procedure. The optimization procedure is a method that determines the reconfigurable resources 930 in the first reconfigurable hardware devices 912, the second reconfigurable hardware devices 922, or a combination thereof that are used to implement the functionalities of the application 304 in at least ¶ [0116], Examiner notes that FPGAs are comprised of reconfigurable resources) by a first and second kernel (The provision module 902 can generate a first cluster 904, which is one of the clusters 404. The first cluster 904 can be generated by grouping a first kernel unit 906 having a first microkernel 908 with a first user interface unit 910. The first cluster 904 can include a number of first reconfigurable hardware devices 912 in at least ¶ [0099] – [0100] and The provision module 902 can generate a second cluster 914, which is another of the clusters 404. The second cluster 914 can be generated by grouping a second kernel unit 916 having a second microkernel 918 with a second user interface unit 920. The second cluster 914 can include a number of second reconfigurable hardware devices 922 in at least ¶ [0103] – [0104]).
It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to combine the FPGA accelerator of Zievers with the systems and methods of Krishnamurthy resulting in a system in which accelerators of Krishnamurthy are implemented as FPGAs as in Zievers. A person having ordinary skill in the art would have been motivated to make this combination, with a reasonable expectation of success, for the purpose of improving dynamic allocation of hardware resources by providing a platform, an FPGA, for fragmenting and mapping tasks across reconfigurable hardware devices of the FPGA through use of one or more kernel units and further improving performance by reducing delays, cost and complexity and improving scalability by providing for management of any number of reconfigurable hardware devices of FPGAs (see at least Zievers ¶ [0053] – [0054] and ¶ [0130] – [0131]).

With regard to claim 3, Krishnamurthy teaches wherein the first processor and the second processor are located on the compute device (Fig. 12, 1206 and These kernels can be invoked when a calling process on the server is run. These compute kernels are then launched on the accelerators 104 in at least ¶ [0035] and FIG. 5 shows server system processors 502, 504, 506 that comprise cross-platform parallel programming OpenCL calling address spaces and server system processors 508, 510, 512. The server system processors 508, 510, 512 each comprise one or more compute kernels 514, 516, 518 in at least ¶ [0054]).

With regard to claim 4, Krishnamurthy teaches wherein the first processor is located on the compute device and the second processor is located on a second compute device communicatively coupled with the compute device via a network (Some embodiments of the present invention utilize architectures, such as an object oriented framework mechanism, that allows instructions of the components of operating system (not shown) to be executed on any processor located within the information processing system 1300 in at least ¶ [0079] and FIG. 1 shows a host system such as a server system 102, a plurality of accelerator systems 104, and one or more user clients 106 communicatively coupled via one or more networks 108 in at least ¶ [0026] and The combination of the server system 102 and the accelerator systems 104 is herein referred to as a hybrid server/system 112 within the hybrid environment 100 because of the heterogeneous combination of various system types of the server 102 and accelerators 104 … comprises one or more processors 114 … also comprises one or more processors 116, Examiner notes that some embodiments can utilize any processor of the system and the system is disclosed as teaching processors on different devices coupled by a network).

With regard to claim 5, Krishnamurthy teaches wherein the first processor is located on a second compute device and the second processor is located on a third compute device, the second and third compute devices communicatively coupled with compute device via a network (Some embodiments of the present invention utilize architectures, such as an object oriented framework mechanism, that allows instructions of the components of operating system (not shown) to be executed on any processor located within the information processing system 1300 in at least ¶ [0079] and FIG. 1 shows a host system such as a server system 102, a plurality of accelerator systems 104, and one or more user clients 106 communicatively coupled via one or more networks 108 in at least ¶ [0026] and The combination of the server system 102 and the accelerator systems 104 is herein referred to as a hybrid server/system 112 within the hybrid environment 100 because of the heterogeneous combination of various system types of the server 102 and accelerators 104 … comprises one or more processors 114 … also comprises one or more processors 116, Examiner notes that some embodiments can utilize any processor of the system and the system is disclosed as teaching processors on different devices coupled by a network, further, there are a plurality of accelerators 104 comprising processors 116, thus at least a second and third device with processors 116).

With regard to claim 6, Krishnamurthy teaches the compute device of claim 1.
Krishnamurthy does not specifically teach that the acceleration management circuitry is included in the FPGA.
However, in analogous art Zievers teaches wherein the acceleration management circuitry is included in the FPGA (The computing system 100 can include an allocation module 928 for performing scheduling and optimization to generate placement and interconnection of reconfigurable resources 930 for implementing the application 304. The allocation module 928 can be coupled to the request module 924. The reconfigurable resources 930 are defined as programmable logic blocks and interconnects in the first reconfigurable hardware devices 912, the second reconfigurable hardware devices 922, or a combination thereof. The reconfigurable resources 930 include a number of programmable devices in at least ¶ [0114] – [0115] and the reconfigurable hardware devices 202 can represent the programmable devices including field-programmable gate arrays (FPGAs) in at least ¶ [0037], Examiner notes that the reconfigurable resources 930 are reconfigurable hardware devices as in 202 of an FPGA. Thus, the allocation module, acceleration management circuitry, is included in the FPGA because the allocation module and reconfigurable resources are co-located (see at least Fig. 9)).
It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to combine the acceleration management circuitry included in an FPGA of Zievers with the systems and methods of Krishnamurthy resulting in a system in which the acceleration management circuitry of Krishnamurthy is implemented in an FPGA as in Zievers. A person having ordinary skill in the art would have been motivated to make this combination, with a reasonable expectation of success, for the purpose of improving dynamic allocation of hardware resources by providing a platform within an FPGA for acceleration management which aids in fragmenting and mapping tasks across reconfigurable hardware devices of the FPGA through use of one or more kernel units and further improving performance by reducing delays, cost and complexity and improving scalability by providing for management of any number of reconfigurable hardware devices of FPGAs (see at least Zievers ¶ [0053] – [0054] and ¶ [0130] – [0131]).

With regard to claim 7, Zievers teaches a second FPGA (The line cards 112 can include an electronic component including an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) in at least ¶ [0033]); and
Krishnamurthy teaches the accelerator management circuitry to (the server system 102 comprises, among other things, a workload manager 118 in at least ¶ [0030]): receive a third job execution request and receive a fourth job execution request (A set of data-parallel workload tasks is dynamically scheduled across at least one resource in the first set of resources and at least one resource in the second set of resources in at least ¶ [0004] and A task, at step 1206, calls a compute kernel at the accelerator system 104 or pseudo accelerator system. The kernel scheduler 519, at step 1208, schedules a physical accelerator compute resource or pseudo-accelerator compute resource at the server system 102 or accelerator to satisfy the kernel call in at least ¶ [0073]) from a third processor; from a fourth processor (Fig. 12, 1206 and These kernels can be invoked when a calling process on the server is run. These compute kernels are then launched on the accelerators 104 in at least ¶ [0035] and FIG. 5 shows server system processors 502, 504, 506 that comprise cross-platform parallel programming OpenCL calling address spaces and server system processors 508, 510, 512. The server system processors 508, 510, 512 each comprise one or more compute kernels 514, 516, 518 in at least ¶ [0054], Examiner notes kernels 1-N request scheduling of workload tasks on CPUs 1-N, thus there are third and fourth execution requests and third and fourth processors);
schedule the third job execution request for execution by a third kernel at the second accelerator; and schedule the fourth job execution request for execution by a fourth kernel at the second accelerator (Each process may execute several tasks and each task can offload certain compute intensive portions to a compute kernel, which are basic units of executable code. The compute kernel may execute on one compute element or several compute elements such as a processor core. A compute kernel can be either task-parallel with respect to CPUs such as the processors found within the server system 102 or data-parallel with respect to CPUs such as the processors found within the accelerators 104 ... Workloads are queued in the workload queues 136, 138 of the server 102 and accelerators 104, respectively in at least ¶ [0034] and The workload manager 118 sends these compute kernels to the accelerators 104 in at least ¶ [0035] and the workload manager 118 schedules a portion 202, 204 of the accelerator workload (e.g., tasks) as one or more compute kernels 206, 208 at the accelerator processors 116 in at least ¶ [0039]).
Krishnamurthy does not specifically teach that the accelerator is an FPGA.
However, in analogous art Zievers teaches schedule the third and fourth job execution request for execution at the FPGA (The computing system 100 can include a request module 924 for evaluating a user request 926 to initialize the application 304 in at least ¶ [0112] and The computing system 100 can include an allocation module 928 for performing scheduling and optimization to generate placement and interconnection of reconfigurable resources 930 for implementing the application 304. in at least ¶ [0114] and The allocation module 928 can perform a scheduling procedure 932 and an optimization procedure 934 for implementing functions of the application 304. The scheduling procedure 932 is a method that determines which and when microkernel resources 936 are allocated to perform the optimization procedure. The optimization procedure is a method that determines the reconfigurable resources 930 in the first reconfigurable hardware devices 912, the second reconfigurable hardware devices 922, or a combination thereof that are used to implement the functionalities of the application 304 in at least ¶ [0116] and The line cards 112 can include an electronic component including an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) in at least ¶ [0033]).
It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to combine the FPGA accelerator of Zievers with the systems and methods of Krishnamurthy resulting in a system in which accelerators of Krishnamurthy are implemented as FPGAs as in Zievers. A person having ordinary skill in the art would have been motivated to make this combination, with a reasonable expectation of success, for the purpose of improving dynamic allocation of hardware resources by providing a platform, an FPGA, for fragmenting and mapping tasks across reconfigurable hardware devices of the FPGA through use of one or more kernel units and further improving performance by reducing delays, cost and complexity and improving scalability by providing for management of any number of reconfigurable hardware devices of FPGAs (see at least Zievers ¶ [0053] – [0054] and ¶ [0130] – [0131]).

With regard to claim 8, Krishnamurthy teaches wherein the first and second processors are located on the compute device (Fig. 12, 1206 and These kernels can be invoked when a calling process on the server is run. These compute kernels are then launched on the accelerators 104 in at least ¶ [0035] and FIG. 5 shows server system processors 502, 504, 506 that comprise cross-platform parallel programming OpenCL calling address spaces and server system processors 508, 510, 512. The server system processors 508, 510, 512 each comprise one or more compute kernels 514, 516, 518 in at least ¶ [0054]) and the third and fourth processors are located on a second compute device communicatively coupled with the compute device via a network (Some embodiments of the present invention utilize architectures, such as an object oriented framework mechanism, that allows instructions of the components of operating system (not shown) to be executed on any processor located within the information processing system 1300 in at least ¶ [0079] and FIG. 1 shows a host system such as a server system 102, a plurality of accelerator systems 104, and one or more user clients 106 communicatively coupled via one or more networks 108 in at least ¶ [0026] and The combination of the server system 102 and the accelerator systems 104 is herein referred to as a hybrid server/system 112 within the hybrid environment 100 because of the heterogeneous combination of various system types of the server 102 and accelerators 104 … comprises one or more processors 114 … also comprises one or more processors 116, Examiner notes that some embodiments can utilize any processor of the system and the system is disclosed as teaching processors on different devices coupled by a network, thus the first and second processors may be co-located whereas the third and fourth processors may be on another device).

With regard to claim 9, Krishnamurthy teaches the accelerator management circuitry to schedule the first and second job execution requests for execution by respective first and second kernels at the FPGA based on separate priorities assigned to the first and second kernels (In this environment a workload comprises multiple processes/address spaces. Each process may execute several tasks and each task can offload certain compute intensive portions to a compute kernel, which are basic units of executable code in at least ¶ [0033] and The workload manager 118, at step 806, determines if at least one of the workloads has exceeded a threshold of highest priority SLA limits on one of the systems 102, 104. It should be noted that for a workload with throughput SLA, Energy SLA and batch-window SLA, prioritization is possible. Highest priority SLA limits refers to the SLA with highest priority of the aforementioned three SLA types. If the result of this determination is negative, the control flow returns to step 804. If the result of this determination is positive, the workload manager 118, at step 808, determines if the workload is likely to meet highest SLA priority limits on a second system that has the resource capacity for one additional workload in at least ¶ [0068], Examiner notes that each workload/kernel may be prioritized based on SLA).

With regard to claim 10, Krishnamurthy teaches wherein a first priority is assigned to the first kernel based on an estimated runtime to fulfill the first job execution request and a second priority is assigned to the second kernel based on an estimated runtime to fulfill the second job execution request (The workload manager 118, at step 806, determines if at least one of the workloads has exceeded a threshold of highest priority SLA limits on one of the systems 102, 104. It should be noted that for a workload with throughput SLA, Energy SLA and batch-window SLA, prioritization is possible in at least ¶ [0068] and A batch window is a time window that specifies when a batch processing workload may complete. This is an attribute of an SLA (Service Level Agreement). A batch window specification is used by the workload manager to allocate resources so that the workload can complete within the batch window in at least ¶ [0041], Examiner notes that each workload/kernel may be prioritized based on SLA).

With regard to claim 12, Krishnamurthy teaches one or more non-transitory machine-readable storage media comprising a plurality of instructions stored thereon that, when executed (in at least ¶ [0057] and ¶ [0060]) by a system at a compute device cause the system to (The combination of the server system 102 and the accelerator systems 104 is herein referred to as a hybrid server/system 112 within the hybrid environment 100 because of the heterogeneous combination of various system types of the server 102 and accelerators 104 … The processor(s) 114 comprises a set of architectural registers (not shown) that defines the host computer architecture. Each accelerator 104 also comprises one or more processors 116 that comprise a set of architectural registers (not shown) that defines the accelerator architecture in at least ¶ [0028] and the server system 102 comprises, among other things, a workload manager 118 in at least ¶ [0030]):
receive a first job execution request and receive a second job execution request (A set of data-parallel workload tasks is dynamically scheduled across at least one resource in the first set of resources and at least one resource in the second set of resources in at least ¶ [0004] and A task, at step 1206, calls a compute kernel at the accelerator system 104 or pseudo accelerator system. The kernel scheduler 519, at step 1208, schedules a physical accelerator compute resource or pseudo-accelerator compute resource at the server system 102 or accelerator to satisfy the kernel call in at least ¶ [0073]) from a first processor; from a second processor (Fig. 12, 1206 and These kernels can be invoked when a calling process on the server is run. These compute kernels are then launched on the accelerators 104 in at least ¶ [0035] and FIG. 5 shows server system processors 502, 504, 506 that comprise cross-platform parallel programming OpenCL calling address spaces and server system processors 508, 510, 512. The server system processors 508, 510, 512 each comprise one or more compute kernels 514, 516, 518 in at least ¶ [0054]);
schedule the first job execution request for execution by a first kernel at an accelerator resident on the compute device; and schedule the second job execution request for execution by a second kernel at the accelerator (Each process may execute several tasks and each task can offload certain compute intensive portions to a compute kernel, which are basic units of executable code. The compute kernel may execute on one compute element or several compute elements such as a processor core. A compute kernel can be either task-parallel with respect to CPUs such as the processors found within the server system 102 or data-parallel with respect to CPUs such as the processors found within the accelerators 104 ... Workloads are queued in the workload queues 136, 138 of the server 102 and accelerators 104, respectively in at least ¶ [0034] and The workload manager 118 sends these compute kernels to the accelerators 104 in at least ¶ [0035] and the workload manager 118 schedules a portion 202, 204 of the accelerator workload (e.g., tasks) as one or more compute kernels 206, 208 at the accelerator processors 116 in at least ¶ [0039]).
Krishnamurthy does not specifically teach that the accelerator is an FPGA.
However, in analogous art Zievers teaches a field programmable gate array (FPGA) resident on the compute device (the microkernel 604 and the kernel modules 606 can altogether be implemented with an ASIC, an FPGA, or a combination thereof in at least ¶ [0097] and  The first kernel unit 906 is one of the kernel units 406 of FIG. 4. The first microkernel 908 is the microkernel 604 of FIG. 6 in at least ¶ [0101] and The second kernel unit 916 is another of the kernel units 406. The second microkernel 918 is another of the microkernel 604 in at least ¶ [0105]); and
schedule the first and second job execution request for execution at the FPGA (The computing system 100 can include a request module 924 for evaluating a user request 926 to initialize the application 304 in at least ¶ [0112] and The computing system 100 can include an allocation module 928 for performing scheduling and optimization to generate placement and interconnection of reconfigurable resources 930 for implementing the application 304. in at least ¶ [0114] and The allocation module 928 can perform a scheduling procedure 932 and an optimization procedure 934 for implementing functions of the application 304. The scheduling procedure 932 is a method that determines which and when microkernel resources 936 are allocated to perform the optimization procedure. The optimization procedure is a method that determines the reconfigurable resources 930 in the first reconfigurable hardware devices 912, the second reconfigurable hardware devices 922, or a combination thereof that are used to implement the functionalities of the application 304 in at least ¶ [0116], Examiner notes that FPGAs are comprised of reconfigurable resources) by a first and second kernel (The provision module 902 can generate a first cluster 904, which is one of the clusters 404. The first cluster 904 can be generated by grouping a first kernel unit 906 having a first microkernel 908 with a first user interface unit 910. The first cluster 904 can include a number of first reconfigurable hardware devices 912 in at least ¶ [0099] – [0100] and The provision module 902 can generate a second cluster 914, which is another of the clusters 404. The second cluster 914 can be generated by grouping a second kernel unit 916 having a second microkernel 918 with a second user interface unit 920. The second cluster 914 can include a number of second reconfigurable hardware devices 922 in at least ¶ [0103] – [0104]).
It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to combine the FPGA accelerator of Zievers with the systems and methods of Krishnamurthy resulting in a system in which accelerators of Krishnamurthy are implemented as FPGAs as in Zievers. A person having ordinary skill in the art would have been motivated to make this combination, with a reasonable expectation of success, for the purpose of improving dynamic allocation of hardware resources by providing a platform, an FPGA, for fragmenting and mapping tasks across reconfigurable hardware devices of the FPGA through use of one or more kernel units and further improving performance by reducing delays, cost and complexity and improving scalability by providing for management of any number of reconfigurable hardware devices of FPGAs (see at least Zievers ¶ [0053] – [0054] and ¶ [0130] – [0131]).

With regard to claim 14, Krishnamurthy teaches wherein the first processor and the second processor are located on the compute device (Fig. 12, 1206 and These kernels can be invoked when a calling process on the server is run. These compute kernels are then launched on the accelerators 104 in at least ¶ [0035] and FIG. 5 shows server system processors 502, 504, 506 that comprise cross-platform parallel programming OpenCL calling address spaces and server system processors 508, 510, 512. The server system processors 508, 510, 512 each comprise one or more compute kernels 514, 516, 518 in at least ¶ [0054]).

With regard to claim 15, Krishnamurthy teaches wherein the first processor is located on the compute device and the second processor is located on a second compute device communicatively coupled with the compute device via a network (Some embodiments of the present invention utilize architectures, such as an object oriented framework mechanism, that allows instructions of the components of operating system (not shown) to be executed on any processor located within the information processing system 1300 in at least ¶ [0079] and FIG. 1 shows a host system such as a server system 102, a plurality of accelerator systems 104, and one or more user clients 106 communicatively coupled via one or more networks 108 in at least ¶ [0026] and The combination of the server system 102 and the accelerator systems 104 is herein referred to as a hybrid server/system 112 within the hybrid environment 100 because of the heterogeneous combination of various system types of the server 102 and accelerators 104 … comprises one or more processors 114 … also comprises one or more processors 116, Examiner notes that some embodiments can utilize any processor of the system and the system is disclosed as teaching processors on different devices coupled by a network).

With regard to claim 16, Zievers teaches a second FPGA (The line cards 112 can include an electronic component including an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) in at least ¶ [0033]); and
Krishnamurthy teaches receive a third job execution request and receive a fourth job execution request (the server system 102 comprises, among other things, a workload manager 118 in at least ¶ [0030] and A set of data-parallel workload tasks is dynamically scheduled across at least one resource in the first set of resources and at least one resource in the second set of resources in at least ¶ [0004] and A task, at step 1206, calls a compute kernel at the accelerator system 104 or pseudo accelerator system. The kernel scheduler 519, at step 1208, schedules a physical accelerator compute resource or pseudo-accelerator compute resource at the server system 102 or accelerator to satisfy the kernel call in at least ¶ [0073]) from a third processor; from a fourth processor (Fig. 12, 1206 and These kernels can be invoked when a calling process on the server is run. These compute kernels are then launched on the accelerators 104 in at least ¶ [0035] and FIG. 5 shows server system processors 502, 504, 506 that comprise cross-platform parallel programming OpenCL calling address spaces and server system processors 508, 510, 512. The server system processors 508, 510, 512 each comprise one or more compute kernels 514, 516, 518 in at least ¶ [0054], Examiner notes kernels 1-N request scheduling of workload tasks on CPUs 1-N, thus there are third and fourth execution requests and third and fourth processors);
schedule the third job execution request for execution by a third kernel at a second accelerator; and schedule the fourth job execution request for execution by a fourth kernel at the second accelerator (Each process may execute several tasks and each task can offload certain compute intensive portions to a compute kernel, which are basic units of executable code. The compute kernel may execute on one compute element or several compute elements such as a processor core. A compute kernel can be either task-parallel with respect to CPUs such as the processors found within the server system 102 or data-parallel with respect to CPUs such as the processors found within the accelerators 104 ... Workloads are queued in the workload queues 136, 138 of the server 102 and accelerators 104, respectively in at least ¶ [0034] and The workload manager 118 sends these compute kernels to the accelerators 104 in at least ¶ [0035] and the workload manager 118 schedules a portion 202, 204 of the accelerator workload (e.g., tasks) as one or more compute kernels 206, 208 at the accelerator processors 116 in at least ¶ [0039]).
Krishnamurthy does not specifically teach that the accelerator is an FPGA.
However, in analogous art Zievers teaches schedule the third and fourth job execution request for execution at the FPGA resident on the compute device (The computing system 100 can include a request module 924 for evaluating a user request 926 to initialize the application 304 in at least ¶ [0112] and The computing system 100 can include an allocation module 928 for performing scheduling and optimization to generate placement and interconnection of reconfigurable resources 930 for implementing the application 304. in at least ¶ [0114] and The allocation module 928 can perform a scheduling procedure 932 and an optimization procedure 934 for implementing functions of the application 304. The scheduling procedure 932 is a method that determines which and when microkernel resources 936 are allocated to perform the optimization procedure. The optimization procedure is a method that determines the reconfigurable resources 930 in the first reconfigurable hardware devices 912, the second reconfigurable hardware devices 922, or a combination thereof that are used to implement the functionalities of the application 304 in at least ¶ [0116] and The line cards 112 can include an electronic component including an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) in at least ¶ [0033]).
It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to combine the FPGA accelerator of Zievers with the systems and methods of Krishnamurthy resulting in a system in which accelerators of Krishnamurthy are implemented as FPGAs as in Zievers. A person having ordinary skill in the art would have been motivated to make this combination, with a reasonable expectation of success, for the purpose of improving dynamic allocation of hardware resources by providing a platform, an FPGA, for fragmenting and mapping tasks across reconfigurable hardware devices of the FPGA through use of one or more kernel units and further improving performance by reducing delays, cost and complexity and improving scalability by providing for management of any number of reconfigurable hardware devices of FPGAs (see at least Zievers ¶ [0053] – [0054] and ¶ [0130] – [0131]).

With regard to claim 17, Krishnamurthy teaches wherein the first and second processors are resident on the compute device (Fig. 12, 1206 and These kernels can be invoked when a calling process on the server is run. These compute kernels are then launched on the accelerators 104 in at least ¶ [0035] and FIG. 5 shows server system processors 502, 504, 506 that comprise cross-platform parallel programming OpenCL calling address spaces and server system processors 508, 510, 512. The server system processors 508, 510, 512 each comprise one or more compute kernels 514, 516, 518 in at least ¶ [0054]) and the third and fourth processors are resident on a second compute device communicatively coupled with the compute device via a network (Some embodiments of the present invention utilize architectures, such as an object oriented framework mechanism, that allows instructions of the components of operating system (not shown) to be executed on any processor located within the information processing system 1300 in at least ¶ [0079] and FIG. 1 shows a host system such as a server system 102, a plurality of accelerator systems 104, and one or more user clients 106 communicatively coupled via one or more networks 108 in at least ¶ [0026] and The combination of the server system 102 and the accelerator systems 104 is herein referred to as a hybrid server/system 112 within the hybrid environment 100 because of the heterogeneous combination of various system types of the server 102 and accelerators 104 … comprises one or more processors 114 … also comprises one or more processors 116, Examiner notes that some embodiments can utilize any processor of the system and the system is disclosed as teaching processors on different devices coupled by a network, thus the first and second processors may be co-located whereas the third and fourth processors may be on another device).

With regard to claim 18, Krishnamurthy teaches schedule the first and second job execution requests for execution by respective first and second kernels at the FPGA based on separate priorities assigned to the first and second kernels (In this environment a workload comprises multiple processes/address spaces. Each process may execute several tasks and each task can offload certain compute intensive portions to a compute kernel, which are basic units of executable code in at least ¶ [0033] and The workload manager 118, at step 806, determines if at least one of the workloads has exceeded a threshold of highest priority SLA limits on one of the systems 102, 104. It should be noted that for a workload with throughput SLA, Energy SLA and batch-window SLA, prioritization is possible. Highest priority SLA limits refers to the SLA with highest priority of the aforementioned three SLA types. If the result of this determination is negative, the control flow returns to step 804. If the result of this determination is positive, the workload manager 118, at step 808, determines if the workload is likely to meet highest SLA priority limits on a second system that has the resource capacity for one additional workload in at least ¶ [0068], Examiner notes that each workload/kernel may be prioritized based on SLA).

With regard to claim 19, Krishnamurthy teaches wherein a first priority is assigned to the first kernel based on an estimated runtime to fulfill the first job execution request and a second priority is assigned to the second kernel based on an estimated runtime to fulfill the second job execution request (The workload manager 118, at step 806, determines if at least one of the workloads has exceeded a threshold of highest priority SLA limits on one of the systems 102, 104. It should be noted that for a workload with throughput SLA, Energy SLA and batch-window SLA, prioritization is possible in at least ¶ [0068] and A batch window is a time window that specifies when a batch processing workload may complete. This is an attribute of an SLA (Service Level Agreement). A batch window specification is used by the workload manager to allocate resources so that the workload can complete within the batch window in at least ¶ [0041], Examiner notes that each workload/kernel may be prioritized based on SLA).

With regard to claim 21, Krishnamurthy teaches a method comprising:
receiving a first job execution request and receiving a second job execution request (A set of data-parallel workload tasks is dynamically scheduled across at least one resource in the first set of resources and at least one resource in the second set of resources in at least ¶ [0004] and A task, at step 1206, calls a compute kernel at the accelerator system 104 or pseudo accelerator system. The kernel scheduler 519, at step 1208, schedules a physical accelerator compute resource or pseudo-accelerator compute resource at the server system 102 or accelerator to satisfy the kernel call in at least ¶ [0073]) from a first processor; from a second processor (Fig. 12, 1206 and These kernels can be invoked when a calling process on the server is run. These compute kernels are then launched on the accelerators 104 in at least ¶ [0035] and FIG. 5 shows server system processors 502, 504, 506 that comprise cross-platform parallel programming OpenCL calling address spaces and server system processors 508, 510, 512. The server system processors 508, 510, 512 each comprise one or more compute kernels 514, 516, 518 in at least ¶ [0054]);
scheduling the first job execution request for execution by a first kernel at an accelerator scheduling the second job execution request for execution by a second kernel at the accelerator (Each process may execute several tasks and each task can offload certain compute intensive portions to a compute kernel, which are basic units of executable code. The compute kernel may execute on one compute element or several compute elements such as a processor core. A compute kernel can be either task-parallel with respect to CPUs such as the processors found within the server system 102 or data-parallel with respect to CPUs such as the processors found within the accelerators 104 ... Workloads are queued in the workload queues 136, 138 of the server 102 and accelerators 104, respectively in at least ¶ [0034] and The workload manager 118 sends these compute kernels to the accelerators 104 in at least ¶ [0035] and the workload manager 118 schedules a portion 202, 204 of the accelerator workload (e.g., tasks) as one or more compute kernels 206, 208 at the accelerator processors 116 in at least ¶ [0039]).
resident on the compute device (The combination of the server system 102 and the accelerator systems 104 is herein referred to as a hybrid server/system 112 within the hybrid environment 100 because of the heterogeneous combination of various system types of the server 102 and accelerators 104 … The processor(s) 114 comprises a set of architectural registers (not shown) that defines the host computer architecture. Each accelerator 104 also comprises one or more processors 116 that comprise a set of architectural registers (not shown) that defines the accelerator architecture in at least ¶ [0028] and the server system 102 comprises, among other things, a workload manager 118 in at least ¶ [0030]); and
Krishnamurthy does not specifically teach that the accelerator is an FPGA.
However, in analogous art Zievers teaches a field programmable gate array (FPGA) resident on the compute device (the microkernel 604 and the kernel modules 606 can altogether be implemented with an ASIC, an FPGA, or a combination thereof in at least ¶ [0097] and  The first kernel unit 906 is one of the kernel units 406 of FIG. 4. The first microkernel 908 is the microkernel 604 of FIG. 6 in at least ¶ [0101] and The second kernel unit 916 is another of the kernel units 406. The second microkernel 918 is another of the microkernel 604 in at least ¶ [0105]); and
scheduling the first and second job execution requests for execution at the FPGA (The computing system 100 can include a request module 924 for evaluating a user request 926 to initialize the application 304 in at least ¶ [0112] and The computing system 100 can include an allocation module 928 for performing scheduling and optimization to generate placement and interconnection of reconfigurable resources 930 for implementing the application 304 in at least ¶ [0114] and The allocation module 928 can perform a scheduling procedure 932 and an optimization procedure 934 for implementing functions of the application 304. The scheduling procedure 932 is a method that determines which and when microkernel resources 936 are allocated to perform the optimization procedure. The optimization procedure is a method that determines the reconfigurable resources 930 in the first reconfigurable hardware devices 912, the second reconfigurable hardware devices 922, or a combination thereof that are used to implement the functionalities of the application 304 in at least ¶ [0116], Examiner notes that FPGAs are comprised of reconfigurable resources) by a first and second kernel (The provision module 902 can generate a first cluster 904, which is one of the clusters 404. The first cluster 904 can be generated by grouping a first kernel unit 906 having a first microkernel 908 with a first user interface unit 910. The first cluster 904 can include a number of first reconfigurable hardware devices 912 in at least ¶ [0099] – [0100] and The provision module 902 can generate a second cluster 914, which is another of the clusters 404. The second cluster 914 can be generated by grouping a second kernel unit 916 having a second microkernel 918 with a second user interface unit 920. The second cluster 914 can include a number of second reconfigurable hardware devices 922 in at least ¶ [0103] – [0104]).
It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to combine the FPGA accelerator of Zievers with the systems and methods of Krishnamurthy, resulting in a system in which the accelerators of Krishnamurthy are implemented as FPGAs as in Zievers. A person having ordinary skill in the art would have been motivated to make this combination, with a reasonable expectation of success, for the purpose of improving dynamic allocation of hardware resources by providing a platform, an FPGA, for fragmenting and mapping tasks across reconfigurable hardware devices of the FPGA through use of one or more kernel units and further improving performance by reducing delays, cost and complexity and improving scalability by providing for management of any number of reconfigurable hardware devices of FPGAs (see at least Zievers ¶ [0053] – [0054] and ¶ [0130] – [0131]).

With regard to claim 23, Krishnamurthy teaches wherein the first processor and the second processor are located on the compute device (Fig. 12, 1206 and These kernels can be invoked when a calling process on the server is run. These compute kernels are then launched on the accelerators 104 in at least ¶ [0035] and FIG. 5 shows server system processors 502, 504, 506 that comprise cross-platform parallel programming OpenCL calling address spaces and server system processors 508, 510, 512. The server system processors 508, 510, 512 each comprise one or more compute kernels 514, 516, 518 in at least ¶ [0054]).

With regard to claim 24, Krishnamurthy teaches wherein the first processor is located on the compute device and the second processor is located on a second compute device communicatively coupled with the compute device via a network (Some embodiments of the present invention utilize architectures, such as an object oriented framework mechanism, that allows instructions of the components of operating system (not shown) to be executed on any processor located within the information processing system 1300 in at least ¶ [0079] and FIG. 1 shows a host system such as a server system 102, a plurality of accelerator systems 104, and one or more user clients 106 communicatively coupled via one or more networks 108 in at least ¶ [0026] and The combination of the server system 102 and the accelerator systems 104 is herein referred to as a hybrid server/system 112 within the hybrid environment 100 because of the heterogeneous combination of various system types of the server 102 and accelerators 104 … comprises one or more processors 114 … also comprises one or more processors 116, Examiner notes that some embodiments can utilize any processor of the system and the system is disclosed as teaching processors on different devices coupled by a network).

With regard to claim 25, Zievers teaches a second FPGA (The line cards 112 can include an electronic component including an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) in at least ¶ [0033]); and
Krishnamurthy teaches receiving a third job execution request and receiving a fourth job execution request (the server system 102 comprises, among other things, a workload manager 118 in at least ¶ [0030] and A set of data-parallel workload tasks is dynamically scheduled across at least one resource in the first set of resources and at least one resource in the second set of resources in at least ¶ [0004] and A task, at step 1206, calls a compute kernel at the accelerator system 104 or pseudo accelerator system. The kernel scheduler 519, at step 1208, schedules a physical accelerator compute resource or pseudo-accelerator compute resource at the server system 102 or accelerator to satisfy the kernel call in at least ¶ [0073]) from a third processor; from a fourth processor (Fig. 12, 1206 and These kernels can be invoked when a calling process on the server is run. These compute kernels are then launched on the accelerators 104 in at least ¶ [0035] and FIG. 5 shows server system processors 502, 504, 506 that comprise cross-platform parallel programming OpenCL calling address spaces and server system processors 508, 510, 512. The server system processors 508, 510, 512 each comprise one or more compute kernels 514, 516, 518 in at least ¶ [0054], Examiner notes kernels 1-N request scheduling of workload tasks on CPUs 1-N, thus there are third and fourth execution requests and third and fourth processors);
scheduling the third job execution request for execution by a third kernel at a second accelerator; and scheduling the fourth job execution request for execution by a fourth kernel at the second accelerator (Each process may execute several tasks and each task can offload certain compute intensive portions to a compute kernel, which are basic units of executable code. The compute kernel may execute on one compute element or several compute elements such as a processor core. A compute kernel can be either task-parallel with respect to CPUs such as the processors found within the server system 102 or data-parallel with respect to CPUs such as the processors found within the accelerators 104 ... Workloads are queued in the workload queues 136, 138 of the server 102 and accelerators 104, respectively in at least ¶ [0034] and The workload manager 118 sends these compute kernels to the accelerators 104 in at least ¶ [0035] and the workload manager 118 schedules a portion 202, 204 of the accelerator workload (e.g., tasks) as one or more compute kernels 206, 208 at the accelerator processors 116 in at least ¶ [0039]).
Krishnamurthy does not specifically teach that the accelerator is an FPGA.
However, in analogous art Zievers teaches scheduling the third and fourth job execution requests for execution at the FPGA resident on the compute device (The computing system 100 can include a request module 924 for evaluating a user request 926 to initialize the application 304 in at least ¶ [0112] and The computing system 100 can include an allocation module 928 for performing scheduling and optimization to generate placement and interconnection of reconfigurable resources 930 for implementing the application 304 in at least ¶ [0114] and The allocation module 928 can perform a scheduling procedure 932 and an optimization procedure 934 for implementing functions of the application 304. The scheduling procedure 932 is a method that determines which and when microkernel resources 936 are allocated to perform the optimization procedure. The optimization procedure is a method that determines the reconfigurable resources 930 in the first reconfigurable hardware devices 912, the second reconfigurable hardware devices 922, or a combination thereof that are used to implement the functionalities of the application 304 in at least ¶ [0116] and The line cards 112 can include an electronic component including an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) in at least ¶ [0033]).
It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to combine the FPGA accelerator of Zievers with the systems and methods of Krishnamurthy, resulting in a system in which the accelerators of Krishnamurthy are implemented as FPGAs as in Zievers. A person having ordinary skill in the art would have been motivated to make this combination, with a reasonable expectation of success, for the purpose of improving dynamic allocation of hardware resources by providing a platform, an FPGA, for fragmenting and mapping tasks across reconfigurable hardware devices of the FPGA through use of one or more kernel units and further improving performance by reducing delays, cost and complexity and improving scalability by providing for management of any number of reconfigurable hardware devices of FPGAs (see at least Zievers ¶ [0053] – [0054] and ¶ [0130] – [0131]).

With regard to claim 26, Krishnamurthy teaches wherein the first and second processors are resident on the compute device (Fig. 12, 1206 and These kernels can be invoked when a calling process on the server is run. These compute kernels are then launched on the accelerators 104 in at least ¶ [0035] and FIG. 5 shows server system processors 502, 504, 506 that comprise cross-platform parallel programming OpenCL calling address spaces and server system processors 508, 510, 512. The server system processors 508, 510, 512 each comprise one or more compute kernels 514, 516, 518 in at least ¶ [0054]) and the third and fourth processors are resident on a second compute device communicatively coupled with the compute device via a network (Some embodiments of the present invention utilize architectures, such as an object oriented framework mechanism, that allows instructions of the components of operating system (not shown) to be executed on any processor located within the information processing system 1300 in at least ¶ [0079] and FIG. 1 shows a host system such as a server system 102, a plurality of accelerator systems 104, and one or more user clients 106 communicatively coupled via one or more networks 108 in at least ¶ [0026] and The combination of the server system 102 and the accelerator systems 104 is herein referred to as a hybrid server/system 112 within the hybrid environment 100 because of the heterogeneous combination of various system types of the server 102 and accelerators 104 … comprises one or more processors 114 … also comprises one or more processors 116, Examiner notes that some embodiments can utilize any processor of the system and the system is disclosed as teaching processors on different devices coupled by a network, thus the first and second processors may be co-located whereas the third and fourth processors may be on another device).

With regard to claim 27, Krishnamurthy teaches wherein to schedule the first and second job execution requests for execution by respective first and second kernels at the FPGA is based on separate priorities assigned to the first and second kernels (In this environment a workload comprises multiple processes/address spaces. Each process may execute several tasks and each task can offload certain compute intensive portions to a compute kernel, which are basic units of executable code in at least ¶ [0033] and The workload manager 118, at step 806, determines if at least one of the workloads has exceeded a threshold of highest priority SLA limits on one of the systems 102, 104. It should be noted that for a workload with throughput SLA, Energy SLA and batch-window SLA, prioritization is possible. Highest priority SLA limits refers to the SLA with highest priority of the aforementioned three SLA types. If the result of this determination is negative, the control flow returns to step 804. If the result of this determination is positive, the workload manager 118, at step 808, determines if the workload is likely to meet highest SLA priority limits on a second system that has the resource capacity for one additional workload in at least ¶ [0068], Examiner notes that each workload/kernel may be prioritized based on SLA).

With regard to claim 28, Krishnamurthy teaches wherein a first priority is assigned to the first kernel based on an estimated runtime to fulfill the first job execution request and a second priority is assigned to the second kernel based on an estimated runtime to fulfill the second job execution request (The workload manager 118, at step 806, determines if at least one of the workloads has exceeded a threshold of highest priority SLA limits on one of the systems 102, 104. It should be noted that for a workload with throughput SLA, Energy SLA and batch-window SLA, prioritization is possible in at least ¶ [0068] and A batch window is a time window that specifies when a batch processing workload may complete. This is an attribute of an SLA (Service Level Agreement). A batch window specification is used by the workload manager to allocate resources so that the workload can complete within the batch window in at least ¶ [0041], Examiner notes that each workload/kernel may be prioritized based on SLA).

Claims 2, 13 and 22 are rejected under 35 U.S.C. 103 as being unpatentable over Krishnamurthy et al. Pub. No. US 2012/0054770 A1 (hereafter Krishnamurthy) in view of Zievers Pub. No. US 2012/0284501 A1 (hereafter Zievers) as applied to claims 1, 3-10, 12, 14-19, 21 and 23-28 above and in further view of Chung-Sheng et al. “Composable architecture for rack scale big data computing” (hereafter Chung-Sheng).

With regard to claim 2, Krishnamurthy and Zievers teach the compute device of claim 1, wherein the accelerator management circuitry to schedule the first and second job execution requests for respective execution by the first and second kernels at the FPGA (see mapping in rejection of independent claim above).
Krishnamurthy and Zievers do not specifically teach overprovisioning the FPGA.
However, in analogous art Chung-Sheng teaches causes an overprovisioning (best performance requires a memory overprovisioning of a factor of three or the workload suffers a substantial performance penalty in at least 9.2 Giraph workload, ¶ 2) of the FPGA in a disaggregated architecture that includes the first and second processors (Disaggregating GPU and FPGA are much less demanding as each GPU and FPGA is likely to have its local memory, and will often engage in computations that last many microseconds or milliseconds in at least 5. Network considerations, ¶ 3).
It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to combine the overprovisioning of the FPGA of Chung-Sheng with the systems and methods of Krishnamurthy and Zievers, resulting in a system in which the accelerator of Krishnamurthy (which has been modified by Zievers to be an FPGA) is disaggregated and overprovisioned as in Chung-Sheng. A person having ordinary skill in the art would have been motivated to make this combination, with a reasonable expectation of success, for the purpose of improving system performance (see at least Chung-Sheng 9.2 Giraph workload, ¶ 2, 5. Network considerations, ¶ 3 and 2. Composable system architecture ¶ 1).

With regard to claim 13, Krishnamurthy and Zievers teach the one or more non-transitory machine-readable storage media of claim 12, wherein the instructions to cause the system to schedule the first and second job execution requests for respective execution by the first and second kernels at the FPGA (see mapping in rejection of independent claim above).
Krishnamurthy and Zievers do not specifically teach overprovisioning the FPGA.
However, in analogous art Chung-Sheng teaches causes an overprovisioning (best performance requires a memory overprovisioning of a factor of three or the workload suffers a substantial performance penalty in at least 9.2 Giraph workload, ¶ 2) of the FPGA in a disaggregated architecture that includes the first and second processors (Disaggregating GPU and FPGA are much less demanding as each GPU and FPGA is likely to have its local memory, and will often engage in computations that last many microseconds or milliseconds in at least 5. Network considerations, ¶ 3).
It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to combine the overprovisioning of the FPGA of Chung-Sheng with the systems and methods of Krishnamurthy and Zievers, resulting in a system in which the accelerator of Krishnamurthy (which has been modified by Zievers to be an FPGA) is disaggregated and overprovisioned as in Chung-Sheng. A person having ordinary skill in the art would have been motivated to make this combination, with a reasonable expectation of success, for the purpose of improving system performance (see at least Chung-Sheng 9.2 Giraph workload, ¶ 2, 5. Network considerations, ¶ 3 and 2. Composable system architecture ¶ 1).

With regard to claim 22, Krishnamurthy and Zievers teach the method of claim 21, wherein scheduling the first and second job execution requests for respective execution by the first and second kernels at the FPGA (see mapping in rejection of independent claim above).
Krishnamurthy and Zievers do not specifically teach overprovisioning the FPGA.
However, in analogous art Chung-Sheng teaches causes an overprovisioning (best performance requires a memory overprovisioning of a factor of three or the workload suffers a substantial performance penalty in at least 9.2 Giraph workload, ¶ 2) of the FPGA in a disaggregated architecture that includes the first and second processors (Disaggregating GPU and FPGA are much less demanding as each GPU and FPGA is likely to have its local memory, and will often engage in computations that last many microseconds or milliseconds in at least 5. Network considerations, ¶ 3).
It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to combine the overprovisioning of the FPGA of Chung-Sheng with the systems and methods of Krishnamurthy and Zievers, resulting in a system in which the accelerator of Krishnamurthy (which has been modified by Zievers to be an FPGA) is disaggregated and overprovisioned as in Chung-Sheng. A person having ordinary skill in the art would have been motivated to make this combination, with a reasonable expectation of success, for the purpose of improving system performance (see at least Chung-Sheng 9.2 Giraph workload, ¶ 2, 5. Network considerations, ¶ 3 and 2. Composable system architecture ¶ 1).

Claims 11, 20 and 29 are rejected under 35 U.S.C. 103 as being unpatentable over Krishnamurthy et al. Pub. No. US 2012/0054770 A1 (hereafter Krishnamurthy) in view of Zievers Pub. No. US 2012/0284501 A1 (hereafter Zievers) as applied to claims 1, 3-10, 12, 14-19, 21 and 23-28 above and in further view of Goodman Pub. No. US 2014/0130056 A1 (hereafter Goodman).

With regard to claim 11, Krishnamurthy and Zievers teach the compute device of claim 9, a first kernel and a second kernel (see mapping in rejection of independent claim above).
Krishnamurthy and Zievers do not specifically teach assigning priority based on previously fulfilled job execution requests.
However, in analogous art Goodman teaches wherein a first priority is assigned to the first work unit based on previously fulfilled job execution requests of the first work unit and a second priority is assigned to the second work unit based on previously fulfilled job execution requests of the second work unit (The scheduler 100 may then send a next set of one or more work units to the task server 102. The scheduler 100 might have the next set of work units already "teed up" (for example, in a queue identifying an order of specific tasks to be performed) to send to the task server 102, or it might determine the next set of work units on a just-in-time basis. This determination of the next set of work units could be based, for example, upon the speed at which the particular task server 102 finished the prior work units, and/or on an assigned priority of the next work units, so that a set of lower-priority work units might be sent to a task server 102 that had previously demonstrated that it took a relatively-long time to finish prior work units).
It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to combine the assigning of priority based on previously fulfilled job execution requests of Goodman with the systems and methods of Krishnamurthy and Zievers, resulting in a system in which the priority assignment of workloads/kernels as in Krishnamurthy considers previous work units as in Goodman. A person having ordinary skill in the art would have been motivated to make this combination, with a reasonable expectation of success, for the purpose of improving system efficiency by considering speed and time work units take to complete in prioritizing scheduling of next work units (see at least Goodman ¶ [0039] and abstract).

With regard to claim 20, Krishnamurthy and Zievers teach the one or more non-transitory machine-readable storage media of claim 18, a first kernel and a second kernel (see mapping in rejection of independent claim above).
Krishnamurthy and Zievers do not specifically teach assigning priority based on previously fulfilled job execution requests.
However, in analogous art Goodman teaches wherein a first priority is assigned to the first work unit based on previously fulfilled job execution requests of the first work unit and a second priority is assigned to the second work unit based on previously fulfilled job execution requests of the second work unit (The scheduler 100 may then send a next set of one or more work units to the task server 102. The scheduler 100 might have the next set of work units already "teed up" (for example, in a queue identifying an order of specific tasks to be performed) to send to the task server 102, or it might determine the next set of work units on a just-in-time basis. This determination of the next set of work units could be based, for example, upon the speed at which the particular task server 102 finished the prior work units, and/or on an assigned priority of the next work units, so that a set of lower-priority work units might be sent to a task server 102 that had previously demonstrated that it took a relatively-long time to finish prior work units).
It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to combine the assigning of priority based on previously fulfilled job execution requests of Goodman with the systems and methods of Krishnamurthy and Zievers, resulting in a system in which the priority assignment of workloads/kernels as in Krishnamurthy considers previous work units as in Goodman. A person having ordinary skill in the art would have been motivated to make this combination, with a reasonable expectation of success, for the purpose of improving system efficiency by considering speed and time work units take to complete in prioritizing scheduling of next work units (see at least Goodman ¶ [0039] and abstract).

With regard to claim 29, Krishnamurthy and Zievers teach the method of claim 27, a first kernel and a second kernel (see mapping in rejection of independent claim above).
Krishnamurthy and Zievers do not specifically teach assigning priority based on previously fulfilled job execution requests.
However, in analogous art Goodman teaches wherein a first priority is assigned to the first work unit based on previously fulfilled job execution requests of the first work unit and a second priority is assigned to the second work unit based on previously fulfilled job execution requests of the second work unit (The scheduler 100 may then send a next set of one or more work units to the task server 102. The scheduler 100 might have the next set of work units already "teed up" (for example, in a queue identifying an order of specific tasks to be performed) to send to the task server 102, or it might determine the next set of work units on a just-in-time basis. This determination of the next set of work units could be based, for example, upon the speed at which the particular task server 102 finished the prior work units, and/or on an assigned priority of the next work units, so that a set of lower-priority work units might be sent to a task server 102 that had previously demonstrated that it took a relatively-long time to finish prior work units).
It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to combine the assigning of priority based on previously fulfilled job execution requests of Goodman with the systems and methods of Krishnamurthy and Zievers, resulting in a system in which the priority assignment of workloads/kernels as in Krishnamurthy considers previous work units as in Goodman. A person having ordinary skill in the art would have been motivated to make this combination, with a reasonable expectation of success, for the purpose of improving system efficiency by considering speed and time work units take to complete in prioritizing scheduling of next work units (see at least Goodman ¶ [0039] and abstract).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
US 20190205271 A1 teaches Computing system with hardware reconfiguration mechanism and method of operation thereof.
US 20170262567 A1 teaches Code partitioning for the array of devices.
US 20190102224 A1 teaches Technologies for opportunistic acceleration overprovisioning for disaggregated architectures.
US 20190097635 A1 teaches Techniques for reducing uneven aging in integrated circuits.


Examiner respectfully requests, in response to this Office action, support be shown for language added to any original claims on amendment and any new claims. That is, indicate support for newly added claim language by specifically pointing to page(s) and line number(s) in the specification and/or drawing figure(s). This will assist Examiner in prosecuting the application.

When responding to this Office Action, Applicant is advised to clearly point out the patentable novelty which he or she thinks the claims present, in view of the state of the art disclosed by the references cited or the objections made. He or she must also show how the amendments avoid such references or objections.  See 37 CFR 1.111(c).

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to BRADLEY A TEETS whose telephone number is (571)272-3338.  The examiner can normally be reached on Monday - Friday, 6am-2pm.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Meng An, can be reached on (571)272-3756.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/BRADLEY A TEETS/Primary Examiner, Art Unit 2195