DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 1-20 are pending in this application. 


Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969). A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP §§ 706.02(l)(1) - 706.02(l)(3) for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b). The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms.
The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.

Claims 1-20 are provisionally rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-3, 5-9, 12-15, 17-21 and 24-28 of copending application 15/911,321 in view of Sun et al. (US Patent No. 10,262,390 B1), Hundley (US Pub. 2005/0278502 A1) and further in view of Gloster et al. (US Pub. 2011/0167225 A1).

Although the claims at issue are not identical, they are not patentably distinct from each other.
Regarding claim 1 of the instant application, the following comparison sets claim 1 of the instant application against claim 1 of copending application 15/911,321. The differences have been bolded and underlined.
Instant Application: 1. One or more non-transitory machine-readable storage media comprising a plurality of instructions stored thereon that, in response to being executed, cause a system to:
15/911,321: 1. A compute device comprising: a processor to execute an application; and an accelerator pool including multiple field programmable gate arrays (FPGA),

Instant Application: schedule, in response to a request from an application through an application programming interface (API) call, acceleration of a function among a plurality of Field Programmable Gate Arrays,
15/911,321: the processor to (i) receive, from the application, a request to accelerate a function; (ii) determine a queue depth associated with each FPGA in the accelerator pool; (iii) schedule, in response to the request and the determined queue depth associated with each FPGA, acceleration of the function on multiple FPGAs to produce output data,
(Sun, Fig. 1, 114 GPU application programming interface (as API), 140 GPU service controller, 150-1-4 GPU server nodes, 154 GPU devices; Col 5, lines 36-45, For example, the GPU API 114 is configured transmit a service request to the GPU service platform 130 to request GPU processing services provided by the GPU service platform 130. In addition, the GPU API 114 is configured to transmit blocks of application code (e.g., compute kernels) of the GPU-accelerated application 112 and any associated data, which are to be processed by one or more GPU server nodes within the server cluster 150 that have been allocated by the GPU service platform 130 to handle the service request)

Instant Application: to offload execution of the function to the Field Programmable Gate Array from a processor;
15/911,321: (Hundley, Fig. 1, 130, HW accelerators; [0027] lines 2-6, hardware accelerators 130 all occupying a fraction of the logic space available on a circuit card assembly. The logic space consumed by the hardware accelerators 130 may be implemented in a variety of methods, including Field Programmable Gate Arrays (FPGAs); [0004] lines 7-14, One aspect of hardware acceleration is that algorithmic operations are performed on data using specially designed hardware rather than performing those same operations using generic hardware, such as software running on a microprocessor. Thus, a hardware accelerator can be any hardware that is designed to perform specific algorithmic operations on data. Hardware accelerators generally perform a specific task to off-load CPU (software) cycles (as offload execution of the function to the FPGA from a processor); see Fig. 1, 122 CPU).

Instant Application: store, in a library, a bit stream to enable one or more Field Programmable Gate Arrays to perform the function; and
15/911,321: (Gloster, Fig. 1, 210 FPGA; [0030] lines 1-3, ASDSPs are stored in a central processor library. Each ASDSP is stored as an FPGA bit stream; [0055] lines 5-7, Each bit stream in the library is used to program an FPGA to function as an algorithm-specific digital signal processor; [0053] lines 1-8, digital signal processors is provided wherein each processor executes a specific DSP algorithm…these algorithm specific digital signal processors (ASDSPs) are used to mitigate bottlenecks in software by replacing computationally intense portions of a high-level DSP application with custom hardware; also see [0059] The goal of the system is to produce algorithm-specific DSPs that best utilize the available FPGA resources (as loading the bit stream to enable one or more Field Programmable Gate Arrays to perform the function (i.e., a specific digital signal processing algorithm), see [0025] lines 1-4, An application-specific digital signal processor (ASDSP) is a high-performance, floating-point or fixed-point, vector processor that executes a specific digital signal processing algorithm)).

Instant Application: load the bit stream associated with the function to be accelerated on one or more Field Programmable Gate Arrays.
15/911,321: each of the multiple FPGAs to load a bit stream associated with the function to be accelerated; and (iv) provide, in response to completion of acceleration of the function, the output data.


Although the claims at issue are not identical, they are not patentably distinct from each other because claim 1 of copending application ‘321’ is narrower than claim 1 of the instant application. Claim 1 of copending application ‘321’ is directed to “a compute device”, whereas claim 1 of the instant application is directed to “non-transitory machine-readable storage media”. The difference in statutory class between the two claims is merely an obvious variation, because the invention defined in the claims of copending application ‘321’ could readily be embodied as non-transitory machine-readable storage media. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify claim 1 of copending application ‘321’ to be practiced as non-transitory machine-readable storage media. The ordinary artisan would have been motivated to modify the copending application claim for the simple purpose of implementing the computer instructions recited therein as instructions stored in non-transitory machine-readable storage media and executed within a compute device.

In addition, the copending application ‘321’ does not explicitly claim: 
schedule, in response to a request from an application through an application programming interface (API) call,

However, Sun teaches schedule, in response to a request from an application through an application programming interface (API) call (Sun, Fig. 1, 114 GPU application programming interface (as API), 140 GPU service controller, 150-1-4 GPU server nodes, 154 GPU devices; Col 5, lines 36-45, For example, the GPU API 114 is configured transmit a service request to the GPU service platform 130 to request GPU processing services provided by the GPU service platform 130. In addition, the GPU API 114 is configured to transmit blocks of application code (e.g., compute kernels) of the GPU-accelerated application 112 and any associated data, which are to be processed by one or more GPU server nodes within the server cluster 150 that have been allocated by the GPU service platform 130 to handle the service request).

It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the claim of the copending application by including the step of “schedule, in response to a request from an application through an application programming interface (API) call” as taught by Sun. One of ordinary skill in the art would have been motivated to modify the claim of copending application ‘321’ in the manner described above for the purpose of clarifying that the request is received through an application programming interface (API) call, which allows the system to use a single interface to accept different requests and thereby improves system efficiency and performance.

Neither copending application ‘321’ nor Sun explicitly recites:
offload execution of the function to the Field Programmable Gate Array from a processor.

However, Hundley teaches offload execution of the function to the Field Programmable Gate Array from a processor (Hundley, Fig. 1, 130, HW accelerators; [0027] lines 2-6, hardware accelerators 130 all occupying a fraction of the logic space available on a circuit card assembly. The logic space consumed by the hardware accelerators 130 may be implemented in a variety of methods, including Field Programmable Gate Arrays (FPGAs); [0004] lines 7-14, One aspect of hardware acceleration is that algorithmic operations are performed on data using specially designed hardware rather than performing those same operations using generic hardware, such as software running on a microprocessor. Thus, a hardware accelerator can be any hardware that is designed to perform specific algorithmic operations on data. Hardware accelerators generally perform a specific task to off-load CPU (software) cycles (as offload execution of the function to the FPGA from a processor); see Fig. 1, 122 CPU).

It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the claim of the copending application and Sun by including the step of “offload execution of the function to the Field Programmable Gate Array from a processor” as taught by Hundley. One of ordinary skill in the art would have been motivated to modify the claim of copending application ‘321’ and Sun in the manner described above for the purpose of clarifying that the function to be accelerated is offloaded from the processor, which allows the system to increase execution speed and improve system efficiency and performance.


Copending application ‘321’, Sun and Hundley fail to explicitly recite:
store, in a library, a bit stream to enable one or more Field Programmable Gate Arrays to perform the function.

However, Gloster teaches store, in a library, a bit stream to enable one or more Field Programmable Gate Arrays to perform the function (Gloster, Fig. 1, 210 FPGA; [0030] lines 1-3,  ASDSPs are stored in a central processor library. Each ASDSP is stored as an FPGA bit stream; [0055] lines 5-7, Each bit stream in the library is used to program an FPGA to function as an algorithm-specific digital signal processor; [0053] lines 1-8,  digital signal processors is provided wherein each processor executes a specific DSP algorithm…these algorithm specific digital signal processors (ASDSPs) are used to mitigate bottlenecks in software by replacing computationally intense portions of a high-level DSP application with custom hardware; also see [0059] The goal of the system is to produce algorithm-specific DSPs that best utilize the available FPGA resources (as loading the bit stream to enable one or more Field Programmable Gate Arrays to perform the function (i.e., a specific digital signal processing algorithm), see [0025] lines 1-4, An application-specific digital signal processor (ASDSP) is a high-performance, floating-point or fixed-point, vector processor that executes a specific digital signal processing algorithm)).

It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the claim of the copending application, Sun and Hundley by including the step of “store, in a library, a bit stream to enable one or more Field Programmable Gate Arrays to perform the function” as taught by Gloster. One of ordinary skill in the art would have been motivated to modify the claim of copending application ‘321’, Sun and Hundley in the manner described above for the purpose of clarifying that the bit stream configures the FPGA to perform a particular function, which allows the configured FPGA to execute that function and thereby improves the processing speed of applications, tasks and functions.

Similar claim mappings of the remaining claims would have been obvious to a person having ordinary skill in the art but have been omitted for the sake of brevity.


Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (i.e., a law of nature, a natural phenomenon, or an abstract idea) without significantly more.  
Claim 1 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Step 1, Statutory Category: Yes. Claim 1 recites non-transitory machine-readable storage media having a plurality of instructions that are executed to perform a series of steps and therefore falls in the statutory category of a process.
Step 2A, Prong 1: Judicial Exception Recited: Yes. The claim recites: “schedule, in response to a request, acceleration of a function among a plurality of Field Programmable Gate Arrays”. As drafted, the claim as a whole recites non-transitory machine-readable storage media having a plurality of instructions that are executed to perform a series of steps, including a step that could be performed in the human mind but for the recitation of generic computing components. A person can readily plan or schedule, in the mind, which of several devices (i.e., FPGAs) is to perform a function based on a received request. Therefore, but for the recitation of generic computing components, this step falls within the Mental Processes grouping of abstract ideas (concepts performed in the human mind, including an observation, evaluation, judgment, opinion).
Therefore, yes, the claims do recite judicial exceptions.
Step 2A, Prong 2: Integrated into a Practical Application: No, this judicial exception is not integrated into a practical application. In particular, the claim recites the additional limitation “a request from an application through an application programming interface (API) call”, which is insignificant pre-solution data gathering (see MPEP § 2106.05(g)). In addition, the elements “one or more non-transitory machine-readable storage media”, “a plurality of instructions”, “an application programming interface (API) call”, “a plurality of Field Programmable Gate Arrays”, “a processor”, “a library”, “a bit stream” and “function” are generic computing components that merely apply the judicial exception and do not amount to a particular machine (see MPEP § 2106.05(b)). Further, the limitations “store, in a library, a bit stream to enable one or more Field Programmable Gate Arrays to perform the function; and load the bit stream associated with the function to be accelerated on one or more Field Programmable Gate Arrays” are insignificant extra-solution activity (i.e., mere data storing; see MPEP § 2106.05(g)). The combination of these additional elements is no more than mere instructions to apply the exception using a generic computer component. Accordingly, even in combination, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. Therefore, the claim is directed to the abstract idea.
Step 2B: Claim Provides an Inventive Concept: No. As discussed with respect to Step 2A, Prong 2, the additional elements “one or more non-transitory machine-readable storage media”, “a plurality of instructions”, “an application programming interface (API) call”, “a plurality of Field Programmable Gate Arrays”, “a processor”, “a library”, “a bit stream” and “function” are generic computing components that merely apply the judicial exception and do not amount to a particular machine (see MPEP § 2106.05(b)). In addition, the limitation [receive] “a request from an application through an application programming interface (API) call” is insignificant pre-solution data gathering (see MPEP § 2106.05(g)); the limitations “store, in a library, a bit stream to enable one or more Field Programmable Gate Arrays to perform the function; and load the bit stream associated with the function to be accelerated on one or more Field Programmable Gate Arrays” are insignificant extra-solution activity (i.e., mere data storing; see MPEP § 2106.05(g)); and the limitation “offload execution of the function to the Field Programmable Gate Array from a processor” is an attempt to generally link the use of the judicial exception to a particular technological environment or field of use (see MPEP § 2106.05(h)) and is also insignificant extra-solution activity. These limitations are additionally well-understood, routine, conventional activity (see MPEP § 2106.05(d)). The same analysis applies here in Step 2B: mere instructions to apply an exception using a generic computer component cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept at Step 2B. These additional elements, individually and in combination, do not amount to significantly more than the exception itself and do not provide an inventive concept at Step 2B.

Under the 2019 PEG, a conclusion that an additional element is insignificant extra-solution activity in Step 2A should be re-evaluated in Step 2B. Here, the “store”, “load” and “offload” steps were considered in Step 2A to be insignificant extra-solution activity and, for the “offload” step, an attempt to generally link the use of the judicial exception to a particular technological environment or field of use (see MPEP § 2106.05(h)); they are therefore re-evaluated in Step 2B to determine whether they are more than what is well-understood, routine, conventional activity in the field. The “store” and “load” steps serve to store and retrieve data, which the courts have recognized as well-understood, routine, conventional activity (storing and retrieving information in memory, Versata Dev. Group, Inc. v. SAP Am., Inc., 793 F.3d 1306, 1334, 115 USPQ2d 1681, 1701 (Fed. Cir. 2015); OIP Techs., 788 F.3d at 1363, 115 USPQ2d at 1092-93). Accordingly, a conclusion that the “store” and “load” steps are well-understood, routine, conventional activity is supported under Berkheimer option 2.
Further, for the “offload” step, the background of the specification does not provide any indication that the “offload” step involves anything other than generic, off-the-shelf computer components. Specification paragraph [0002], lines 4-9, specifically recites “the FPGA may be configured to perform a compression function, an encryption function, a convolution function, or other function that is amenable to acceleration (e.g., able to be performed faster using specialized hardware). Typically, the general purpose processor, executing software (e.g., the applications and/or hardware driver(s)) coordinates the scheduling (e.g., assignment) of functions to the FPGA”; that is, functions of the software/applications are offloaded from the processor to the FPGA. Accordingly, a conclusion that the step of “offload execution of the function to the Field Programmable Gate Array from a processor” is well-understood, routine, conventional activity is supported under Berkheimer option 1.
For these reasons, there is no inventive concept in the claim, and thus the claim is ineligible.

Independent claims 8 and 14 are rejected for the same reason as claim 1 above. Claim 14 further recites “circuitry”. This additional element is directed to generic computer components providing generic computer functions (see MPEP § 2106.05(b)). 

With respect to dependent claim 2, the claim elaborates that the function is accelerated on multiple Field Programmable Gate Arrays (“multiple Field Programmable Gate Arrays” is directed to generic computer components providing generic computer functions (see MPEP § 2106.05(b))).

With respect to dependent claim 3, the claim elaborates that the bit stream associated with the function to be accelerated is loaded on one or more Field Programmable Gate Arrays based on one or more of a type of function to be accelerated, a size of a data set to be operated on, or a time period in which acceleration of the function is to be completed (“type of function to be accelerated, a size of a data set to be operated on, or a time period in which acceleration of the function is to be completed” is an attempt to generally link the use of the judicial exception to a particular technological environment or field of use (see MPEP § 2106.05(h))).

With respect to dependent claim 4, the claim elaborates that the one or more Field Programmable Gate Arrays are to perform the function to be accelerated based on a queue depth associated with each of the one or more Field Programmable Gate Arrays (“perform the function to be accelerated based on a queue depth” merely applies the judicial exception using generic computing components rather than a particular machine (see MPEP § 2106.05(b)) and is an attempt to generally link the use of the judicial exception to a particular technological environment or field of use (see MPEP § 2106.05(h))). In addition, the claim as a whole recites a Mental Process that can be performed in the human mind (including an observation, evaluation, judgment, opinion).

With respect to dependent claim 5, the claim elaborates that the bit stream associated with the function to be accelerated is loaded on a Field Programmable Gate Array that has a shortest queue depth (“function to be accelerated…based on a shortest queue depth” merely applies the judicial exception using generic computing components rather than a particular machine (see MPEP § 2106.05(b)) and is an attempt to generally link the use of the judicial exception to a particular technological environment or field of use (see MPEP § 2106.05(h))). In addition, the claim as a whole recites a Mental Process that can be performed in the human mind (including an observation, evaluation, judgment, opinion).

With respect to dependent claim 6, the claim elaborates that acceleration of the function is scheduled based on a type of function each of the one or more Field Programmable Gate Arrays is presently configured to accelerate (“scheduled based on a type of function” is a Mental Process that can be performed in the human mind (including an observation, evaluation, judgment, opinion)).

With respect to dependent claim 7, the claim elaborates that the one or more Field Programmable Gate Arrays are to send a notification indicative of completion of acceleration of the function (“send, a notification” is considered insignificant extra-solution activity, as supported by Electric Power Group, LLC v. Alstom S.A., 830 F.3d 1350, 1354-55, 119 USPQ2d 1739, 1742 (Fed. Cir. 2016) (collecting, analyzing and displaying data); see MPEP § 2106.05(g)).

Dependent claims 9-13 and 15-20 recite the same features as claims 2-7 above and are therefore rejected under the same rationale.
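
For illustration only, the selection criteria addressed in dependent claims 4-6 (a queue depth per FPGA, the shortest queue depth, and the type of function each FPGA is presently configured to accelerate) can be sketched as follows. This is a minimal sketch under stated assumptions and does not represent the applicant's implementation or any cited reference: the names Fpga and select_fpga are hypothetical, the FPGAs are modeled purely in software, and the ordering of the two criteria (presently configured function first, then shortest queue) is an assumption that the claims do not specify.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Fpga:
    """Hypothetical software stand-in for one FPGA in a pool."""
    name: str
    configured_function: Optional[str] = None       # function currently loaded, if any
    queue: List[str] = field(default_factory=list)  # pending acceleration requests

    @property
    def queue_depth(self) -> int:
        # Queue depth is modeled as the number of pending requests.
        return len(self.queue)


def select_fpga(pool: List[Fpga], function: str) -> Fpga:
    """Pick a target FPGA for `function`.

    Assumed policy: prefer an FPGA already configured for the function,
    then break ties by the shortest queue depth.
    """
    if not pool:
        raise ValueError("accelerator pool is empty")
    # False sorts before True, so already-configured FPGAs are preferred.
    return min(pool, key=lambda f: (f.configured_function != function, f.queue_depth))


if __name__ == "__main__":
    pool = [
        Fpga("a", "compress", ["r1", "r2"]),
        Fpga("b", None, []),
        Fpga("c", "encrypt", ["r1"]),
    ]
    print(select_fpga(pool, "encrypt").name)  # FPGA already configured for "encrypt"
    print(select_fpga(pool, "fft").name)      # no match, so the shortest queue wins
```

The single tuple key keeps both criteria in one comparison; a different priority order would only require reordering the tuple.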



Claim Rejections - 35 USC § 112(b)
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

Claims 1-20 are rejected under 35 U.S.C. 112(b) as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor regards as the invention.
As per claims 1, 8 and 14 (line# refers to claim 1):
In line 5, the claim recites the phrase “offload execution of the function to the field programmable gate array”. However, prior to this phrase, at lines 4-5, the claim recites “acceleration of a function among a plurality of field programmable gate arrays”. Thus, it is unclear whether the “execution of the function” is to be executed/accelerated on “the field programmable gate array” or on the “plurality of field programmable gate arrays”. In addition, “the field programmable gate array” lacks antecedent basis. It is uncertain whether this term is intended to refer to one of the “plurality of field programmable gate arrays” recited in claim 1, lines 4-5.

In line 7, the claim recites the phrase “one or more field programmable gate arrays”. However, prior to this phrase, at lines 4-5, the claim recites “a plurality of field programmable gate arrays”. Thus, it is unclear whether the “one or more field programmable gate arrays” are the same as or different from the previously recited “plurality of field programmable gate arrays”. If they are the same, the same term should be used.

In lines 9-10, the claim recites the phrase “one or more field programmable gate arrays”. However, prior to this phrase, at line 7, the claim recites “one or more field programmable gate arrays”. Thus, it is unclear whether the second recitation of “one or more field programmable gate arrays” is the same as or different from the first recitation of “one or more field programmable gate arrays”. If they are the same, “the” or “said” should be used.

As per claims 3, 10 and 16 (line# refers to claim 3):
In lines 2-3, the claim recites the phrase “one or more field programmable gate arrays”. However, prior to this phrase, claim 1, at lines 9-10, recites “one or more field programmable gate arrays”. Thus, it is unclear whether the second recitation of “one or more field programmable gate arrays” is the same as or different from the first recitation of “one or more field programmable gate arrays”. If they are the same, “the” or “said” should be used.

As per claims 2, 4-7, 9, 11-13, 15 and 17-20:
They are non-transitory machine-readable storage media, method and system claims that depend on claims 1, 8 and 14, respectively. Therefore, they have the same deficiencies as claims 1, 8 and 14 above.


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-3, 6-10, 13-16 and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Sun et al. (US Patent No. 10,262,390 B1) in view of Hundley (US Pub. 2005/0278502 A1) and further in view of Gloster et al. (US Pub. 2011/0167225 A1).
Hundley was cited in the IDS filed 11/15/2021.

As per claim 1, Sun teaches the invention substantially as claimed including One or more non-transitory machine-readable storage media comprising a plurality of instructions stored thereon that, in response to being executed, cause a system to (Sun, claim 10, lines 1-5, a non-transitory processor-readable storage medium having stored therein program code of one or more software programs, wherein the program code is executable by one or more processors to implement a process comprising): 
schedule, in response to a request from an application through an application programming interface (API) call, acceleration of a function among a plurality of GPU devices (Sun, Fig. 1, 114 GPU application programming interface (as API), 140 GPU service controller, 150-1-4 GPU server nodes, 154 GPU devices; Col 5, lines 36-45, For example, the GPU API 114 is configured transmit a service request to the GPU service platform 130 to request GPU processing services provided by the GPU service platform 130. In addition, the GPU API 114 is configured to transmit blocks of application code (e.g., compute kernels) of the GPU-accelerated application 112 (as acceleration of a function) and any associated data, which are to be processed by one or more GPU server nodes within the server cluster 150 that have been allocated by the GPU service platform 130 to handle the service request; also see Col 6, lines 35-42, the GPU service controller 140 is configured to receive a service request from the client system 110 for GPU processing services provided by the GPU service platform 130, and then invoke the GPU server allocation and scheduling module 142 to allocate and schedule one or more of the GPU server nodes 150-1, 150-2, . . . , 150-s within the GPU server cluster 150 to handle execution of GPU processing tasks associated with the received service request; Col 13, handle execution of tasks at the scheduled times using the assigned GPU devices) to offload execution of the function to the GPU device (Sun, Col 5, lines 21-26, an application program having compute-intensive portions or routines (e.g., compute kernels) which are included within the program code of the GPU-accelerated application 112, and which are offloaded to a GPU device for accelerated computing);

Sun fails to specifically teach that the devices are Field Programmable Gate Arrays and, when offloading, offloading execution of the function to the Field Programmable Gate Array from a processor.


However, Hundley teaches that hardware accelerators may be implemented as Field Programmable Gate Arrays and, when offloading, offloading execution of the function to the Field Programmable Gate Array from a processor (Hundley, Fig. 1, 130, HW accelerators; [0027] lines 2-6, hardware accelerators 130 all occupying a fraction of the logic space available on a circuit card assembly. The logic space consumed by the hardware accelerators 130 may be implemented in a variety of methods, including Field Programmable Gate Arrays (FPGAs); [0004] lines 7-14, One aspect of hardware acceleration is that algorithmic operations are performed on data using specially designed hardware rather than performing those same operations using generic hardware, such as software running on a microprocessor. Thus, a hardware accelerator can be any hardware that is designed to perform specific algorithmic operations on data. Hardware accelerators generally perform a specific task to off-load CPU (software) cycles (as offload execution of the function to the FPGA from a processor); see Fig. 1, 122 CPU).

It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined the teaching of Sun with Hundley because Hundley’s teaching of using an FPGA to accelerate execution offloaded from the processor would have provided Sun’s system with the advantage and capability of increasing execution speed, thereby improving system efficiency and performance.

Sun and Hundley fail to specifically teach store, in a library, a bit stream to enable one or more Field Programmable Gate Arrays to perform the function; and load the bit stream associated with the function to be accelerated on one or more Field Programmable Gate Arrays.

However, Gloster teaches store, in a library, a bit stream to enable one or more Field Programmable Gate Arrays to perform the function (Gloster, Fig. 1, 210 FPGA; [0030] lines 1-3,  ASDSPs are stored in a central processor library. Each ASDSP is stored as an FPGA bit stream; [0055] lines 5-7, Each bit stream in the library is used to program an FPGA to function as an algorithm-specific digital signal processor; [0053] lines 1-8,  digital signal processors is provided wherein each processor executes a specific DSP algorithm…these algorithm specific digital signal processors (ASDSPs) are used to mitigate bottlenecks in software by replacing computationally intense portions of a high-level DSP application with custom hardware; also see [0059] The goal of the system is to produce algorithm-specific DSPs that best utilize the available FPGA resources (as loading the bit stream to enable one or more Field Programmable Gate Arrays to perform the function (i.e., a specific digital signal processing algorithm), see [0025] lines 1-4, An application-specific digital signal processor (ASDSP) is a high-performance, floating-point or fixed-point, vector processor that executes a specific digital signal processing algorithm)); and 
load the bit stream associated with the function to be accelerated on one or more Field Programmable Gate Arrays (Gloster, Fig. 7, 730; [0055] lines 5-7, Each bit stream in the library is used to program an FPGA to function as an algorithm-specific digital signal processor; [0070] lines 1-6, In step 730, the data unit and control unit are loaded onto a circuit board of the integrated circuit device, such as for example an FPGA. In one embodiment, one or both the data unit and control unit comprise a bit stream which is configured to be loaded onto the FPGA).

It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined the teaching of Sun and Hundley with Gloster because Gloster’s teaching of loading the bit stream onto the FPGA so that the FPGA executes a specific digital signal processing algorithm would have provided Sun and Hundley’s system with the advantage and capability of configuring the FPGA according to the execution need, allowing the configured FPGA to execute a particular function and thereby improving system efficiency and performance.

As per claim 2, Sun, Hundley and Gloster teach the invention according to claim 1 above. Sun further teaches wherein the function is accelerated on multiple GPU devices (Sun, Col 6, lines 35-42, the GPU service controller 140 is configured to receive a service request from the client system 110 for GPU processing services provided by the GPU service platform 130, and then invoke the GPU server allocation and scheduling module 142 to allocate and schedule one or more of the GPU server nodes 150-1, 150-2, . . . , 150-s within the GPU server cluster 150 to handle execution of GPU processing tasks associated with the received service request; Col 13, handle execution of tasks at the scheduled times using the assigned GPU devices). In addition, Hundley teaches GPU devices are Field Programmable Gate Arrays (Hundley, Fig. 4, 420, 440 transmit header to selected hardware accelerator, 450 perform operation; Fig. 5A; [0020] lines 1-3, FIG. 5A is a table illustrating an exemplary playlist containing instructions for multiple accelerators to perform multiple operations on a block of data; [0027] lines 2-6, hardware accelerators 130 all occupying a fraction of the logic space available on a circuit card assembly. The logic space consumed by the hardware accelerators 130 may be implemented in a variety of methods, including Field Programmable Gate Arrays (FPGAs)).

As per claim 3, Sun, Hundley and Gloster teach the invention according to claim 1 above. Gloster further teaches wherein the bit stream associated with the function to be accelerated is loaded on one or more Field Programmable Gate Arrays based on one or more of a type of function to be accelerated, a size of a data set to be operated on, or a time period in which acceleration of the function is to be completed (Gloster, Fig. 1, 210 FPGA; [0055] lines 5-7, Each bit stream in the library is used to program an FPGA to function as an algorithm-specific digital signal processor; [0053] lines 1-8,  digital signal processors is provided wherein each processor executes a specific DSP algorithm…these algorithm specific digital signal processors (ASDSPs) are used to mitigate bottlenecks in software by replacing computationally intense portions of a high-level DSP application with custom hardware; also see [0059] The goal of the system is to produce algorithm-specific DSPs that best utilize the available FPGA resources; [0070] lines 1-6, In step 730, the data unit and control unit are loaded onto a circuit board of the integrated circuit device, such as for example an FPGA. In one embodiment, one or both the data unit and control unit comprise a bit stream which is configured to be loaded onto the FPGA; [0025] lines 1-4, An application-specific digital signal processor (ASDSP) is a high-performance, floating-point or fixed-point, vector processor that executes a specific digital signal processing algorithm; also see claim 15, generating a function core configured to perform a specific mathematical expression in order to perform at least a portion of a specific application (as bit stream loaded on one or more Field Programmable Gate Arrays based on one or more of a type of function to be accelerated (i.e., algorithm-specific DSPs that best utilize the available FPGA resources to perform at least a portion of a specific application))).

As per claim 6, Sun, Hundley and Gloster teach the invention according to claim 1 above. Hundley further teaches wherein acceleration of the function is scheduled based on a type of function each of the one or more Field Programmable Gate Arrays is presently configured to accelerate (Hundley, [0027] lines 4-6, the hardware accelerators 130 may be implemented in a variety of methods, including Field Programmable Gate Arrays (FPGAs); [0057] lines 8-9, determining which accelerator 330 should next operate on the data; [0069] lines 3-13, The TMU 355 accesses the file stored in memory 340 and determines which operations listed in Rule 1 list 560 should next be executed. Each of the comparisons with Results1 data may determine what type of file is in the Results1 data. For example, the Result_A may represent an image file (e.g. .jpg, .gif, .tif), the Result_B may represent an executable file (e.g. .exe), and the Result_C may represent a compressed file (e.g. .zip, .rar, .hqx). The accelerators A, B, C, and D may perform different types of antivirus or decompression operations that are suitable for specific file types; [0041] lines 22-27, the TMU 355 may generate instructions for any of hardware accelerators 130, transmit the instructions to the hardware accelerators 130 via the interconnect 150, and allow the accelerators 130 to perform the requested operations by accessing the input data directly from memory 140).

As per claim 7, Sun, Hundley and Gloster teach the invention according to claim 1 above. Hundley further teaches wherein the one or more Field Programmable Gate Arrays is to send a notification indicative of completion of acceleration of the function (Hundley, Fig. 4, 460 Notify TMU of completion of operation; [0054] lines 1-3, the accelerator 330 that operated on the data transmits a signal (as notification) to the TMU 355 indicating that the operation has been completed).

As per claims 8-10 and 13, they are method claims of claims 1-3 and 6 respectively above. Therefore, they are rejected for the same reasons as claims 1-3 and 6 respectively above.

As per claim 14, it is a system claim of claim 1 above. Therefore, it is rejected for the same reason as claim 1 above. In addition, Hundley further teaches circuitry to schedule (Hundley, Fig. 7, 355 Task management unit; Abstract, lines 12-14, A task management unit on the circuit card assembly receive the playlist and schedules the hardware acceleration operations); Gloster teaches circuitry to store/load (Gloster, Abstract, lines 1-3, An integrated circuit device is provided comprising a circuit board and one or more digital signal processors implemented thereon).

As per claims 15-16 and 19-20, they are system claims of claims 2-3 and 6-7 respectively above. Therefore, they are rejected for the same reasons as claims 2-3 and 6-7 respectively above.


Claims 4, 11 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Sun, Hundley and Gloster, as applied to claims 1, 8 and 14 respectively above, and further in view of Fong et al. (US Pub. 2018/0052709 A1).
	Fong was cited in the IDS filed 06/02/2022.

As per claim 4, Sun, Hundley and Gloster teach the invention according to claim 1 above. Sun, Hundley and Gloster fail to specifically teach wherein the one or more Field Programmable Gate Arrays to perform the function to be accelerated based on a queue depth associated with each of the one or more Field Programmable Gate Arrays.

	However, Fong teaches wherein the one or more Field Programmable Gate Arrays to perform the function to be accelerated based on a queue depth associated with each of the one or more Field Programmable Gate Arrays (Fong, Fig. 4, 418-(1-n) Queues, 408-(1-n) GPUs (as FPGA); [0074] line 24, FPGA; Abstract, lines 3-4, receiving a task request for associated with a workload; [0001] lines 2-5, a hybrid computing infrastructure may be comprised…one or more accelerators, such as graphical processing units (GPUs); [0037] lines 5-6, a plurality of GPUs 408-1 through 408-n, each having a queue 418-1 through 418-n, respectively; [0038] lines 2-9, determine an appropriate GPU among GPU 408-1 through GPU 408-n for offloading a task workload. In one embodiment, the appropriate GPU is determined based on one or more considerations. One such consideration is queue length (as queue depth). For example, if CPU 406 first selects GPU 408-1 to offload thread 406a, but GPU 408-1 has a long queue 418-1, the CPU can look for a GPU with a shorter queue; [0042] lines 1-3, The GPU information may include the current queue length of each GPU). 

It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined the teaching of Sun, Hundley and Gloster with Fong because Fong’s teaching of determining queue length/depth and assigning the task workload according to the queue length/depth would have provided Sun, Hundley and Gloster’s system with the advantage and capability of evenly distributing the tasks among the different accelerators, thereby improving workload balancing and efficiency.

As per claim 11, it is a method claim of claim 4 above. Therefore, it is rejected for the same reason as claim 4 above.

As per claim 17, it is a system claim of claim 4 above. Therefore, it is rejected for the same reason as claim 4 above.

Claims 5, 12 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Sun, Hundley, Gloster and Fong, as applied to claims 4, 11 and 17 respectively above, and further in view of Bird et al. (US Pub. 2014/0181833 A1).
	Bird was cited in the IDS filed on 11/15/2021.

As per claim 5, Sun, Hundley, Gloster and Fong teach the invention according to claim 4 above. Gloster teaches wherein the bit stream associated with the function to be accelerated is loaded on a Field Programmable Gate Array (Gloster, Fig. 7, 730; [0055] lines 5-7, Each bit stream in the library is used to program an FPGA to function as an algorithm-specific digital signal processor; [0070] lines 1-6, In step 730, the data unit and control unit are loaded onto a circuit board of the integrated circuit device, such as for example an FPGA. In one embodiment, one or both the data unit and control unit comprise a bit stream which is configured to be loaded onto the FPGA). In addition, Fong teaches a Field Programmable Gate Array that has a shorter queue depth (Fong, Fig. 4, 418-(1-n) Queues, 408-(1-n) GPUs (as FPGA); [0074] line 24, FPGA; Abstract, lines 3-4, receiving a task request for associated with a workload; [0001] lines 2-5, a hybrid computing infrastructure may be comprised…one or more accelerators, such as graphical processing units (GPUs); [0037] lines 5-6, a plurality of GPUs 408-1 through 408-n, each having a queue 418-1 through 418-n, respectively; [0038] lines 2-9, determine an appropriate GPU among GPU 408-1 through GPU 408-n for offloading a task workload. In one embodiment, the appropriate GPU is determined based on one or more considerations. One such consideration is queue length (as queue depth). For example, if CPU 406 first selects GPU 408-1 to offload thread 406a, but GPU 408-1 has a long queue 418-1, the CPU can look for a GPU with a shorter queue; [0042] lines 1-3, The GPU information may include the current queue length of each GPU).

Sun, Hundley, Gloster and Fong fail to specifically teach that the bit stream associated with the function to be accelerated is loaded on the Field Programmable Gate Array that has a shortest queue depth.

However, Bird teaches a shortest queue depth (Bird, Abstract, lines 7-8, A single processing queue is created for each processor; [0049] lines 11-14, Simple load balancing such as a round robin dispatching or scheduling of incoming threads /tasks, or adding new tasks to the shortest queue can be used to ensure the run queue lengths remain relatively even).

It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined the teaching of Sun, Hundley, Gloster and Fong with Bird because Bird’s teaching of assigning/scheduling the tasks to the shortest queue would have provided Sun, Hundley, Gloster and Fong’s system with the advantage and capability of evenly distributing the tasks, thereby improving system efficiency.

As per claim 12, it is a method claim of claim 5 above. Therefore, it is rejected for the same reason as claim 5 above.

As per claim 18, it is a system claim of claim 5 above. Therefore, it is rejected for the same reason as claim 5 above.


Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ZUJIA XU whose telephone number is (571)272-0954. The examiner can normally be reached M-F 9:00-5:30 EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Meng-Ai An, can be reached at (571) 272-3756. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/MENG AI T AN/Supervisory Patent Examiner, Art Unit 2195

/Z.X./Examiner, Art Unit 2195