DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This Office Action is in response to Applicant’s Amendment and Remarks filed on 13 June 2022. 
Claims 1-3, 5-9, 12-15, 17-21 and 24-28 are pending for examination. Claims 4, 10-11, 16 and 22-23 were cancelled. 


Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f): 
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.  Such claim limitation(s) are: “A compute device…means for scheduling” in claim 25.
Because these claim limitations are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, they are being interpreted to cover the corresponding structure, material, or acts described in the specification as performing the claimed function, and equivalents thereof.  The corresponding structure can be found in paragraph [0008], which discloses “The disclosed embodiments may be implemented, in some cases, in hardware, firmware, software, or any combination thereof.”

If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.


Claim Rejections - 35 USC § 112(b)
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

Claims 1-3, 5-9, 12-15, 17-21 and 24-28 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor, or for pre-AIA applications the applicant, regards as the invention.
As per claims 1, 13, 25 and 26 (line# refers to claim 1):
Line 9 recites the phrase “multiple FPGAs”. However, line 3 earlier recites “multiple field programmable gate arrays (FPGA)”. Thus, it is unclear whether the second recitation of “multiple FPGAs” refers to the same elements as the first recitation of “multiple field programmable gate arrays (FPGA)” or to different ones. If they are the same, “the” or “said” should be used. For examination purposes, the examiner will interpret them as the same.

As per claims 2-3, 5-9, 12, 14-15, 17-21, 24 and 27-28:
They are compute device, one or more non-transitory machine-readable storage media, computing device and method claims that depend on claims 1, 13, 25 and 26 respectively above. Therefore, they have the same deficiencies as claims 1, 13, 25 and 26 respectively above.


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-3, 5-6, 12-15, 18 and 24-28 are rejected under 35 U.S.C. 103 as being unpatentable over Hundley (US Pub. 2005/0278502 A1) in view of Fong et al. (US Pub. 2018/0052709 A1) and further in view of Hebert et al. (US Pub. 2017/0346902 A1).
Hundley and Fong were cited in the previous Office Action.
Hebert was cited in the PTO-892 on 04/14/2021.

As per claim 1, Hundley teaches the invention substantially as claimed including A compute device comprising (Hundley, Fig. 1; [0004] lines 1-2, Data processing hardware, such as computers and personal computers): 
a processor to execute an application (Hundley, Fig. 1, 100, 122 CPU; [0004] lines 1-2, Data processing hardware, such as computers and personal computers (as include processor); [0032] lines 1-4, the computing device 100 executes software instructions. The software instructions can direct the computing device to operate on a block of data, such as a file or some other predetermined block of data); and
an accelerator pool including multiple field programmable gate arrays (FPGA) (Hundley, Fig. 1, 125 (as accelerator pool), 130 HW accelerators; [0027] lines 4-6, the hardware accelerators 130 may be implemented in a variety of methods, including Field Programmable Gate Arrays (FPGAs)); 
the processor to (i) receive, from the application, a request to accelerate a function (Hundley, Fig. 7, 355 Task management unit; Abstract, lines 12-14, A task management unit on the circuit card assembly receive the playlist and schedules the hardware acceleration operations; [0011] lines 3-4, reducing the CPU cycles required to manage the hardware acceleration operations (as the task management unit tied to the CPU for managing the hardware acceleration operations); [0040] line 2, a task management unit (“TMU”) 355 in the circuit card assembly; [0032] lines 5-10, When the computing device 100 determines that a block of data (as function) is to be operated on by one of the hardware accelerators 130, a command (as request) to perform the operation is transmitted to the circuit card assembly 125 via the I/O bus 132; [0048] lines 1-3, the computing device 320 generates a data structure of commands to be performed by accelerators 330 in the hardware domain. Lines 11-14, The table of commands, also referred to herein as a playlist, lists one or more of the accelerators 330A-330N and available acceleration command options for particular accelerators; [0049] lines 1-3, the computing device 320 transmits the playlist to the TMU 355 in the hardware domain); 
(ii) determine suitability associated with each FPGA in the accelerator pool (Hundley, [0065] lines 6-14, The TMU 355 may receive an input from the computing device 320, either as part of the playlist 500, a rules based playlist, or a separate instruction, indicating the operations that may be executed out of order. Alternatively, the TMU 355 may intelligently determine, based on the types of accelerators 540 in the playlist 500 and the options 550 associated with the listed accelerators 540; [0069] lines 3-13, The TMU 355 accesses the file stored in memory 340 and determines which operations listed in Rule 1 list 560 should next be executed. Each of the comparisons with Results1 data may determine what type of file is in the Results1 data. For example, the Result_A may represent an image file (e.g. .jpg, .gif, .tif), the Result_B may represent an executable file (e.g. .exe), and the Result_C may represent a compressed file (e.g. .zip, .rar, .hqx). The accelerators A, B, C, and D may perform different types of antivirus or decompression operations that are suitable for specific file types (as suitability to perform the function); also see [0069] lines 16-17, accelerator A, which may be an image decompression and/or optimization accelerator; lines 20-21, accelerator B, which may be an antivirus accelerator; lines 28-29, accelerator C, which may be a decompression accelerator; [0004] lines 11-13, a hardware accelerator can be any hardware that is designated to perform specific algorithmic operations on data; [Examiner noted: each accelerator’s suitability is determined in order to allow each hardware accelerator to perform different types of operations]); 
(iii) schedule, in response to the request and the determined suitability associated with FPGA, acceleration of the function on multiple FPGAs to produce output data (Hundley, Fig. 4, 420, 440 transmit header to selected hardware accelerator, 450 perform operation; Fig. 5A; [0020] lines 1-3, FIG. 5A is a table illustrating an exemplary playlist containing instructions for multiple accelerators to perform multiple operations on a block of data; [0041] lines 22-27, the TMU 355 may generate instructions for any of hardware accelerators 130, transmit the instructions to the hardware accelerators 130 via the interconnect 150, and allow the accelerators 130 to perform the requested operations by accessing the input data directly from memory 140; [0036] lines 3-4, The hardware accelerator 130A uses data 210A as input data and processes the data to produce output data); and 
(iv) provide in response to completion of acceleration of the function, the output data (Hundley, Fig. 4, 470 additional commands in Playlist, NO to 480 Transmit requested output data; [0032] lines 30-31, the hardware accelerator 130 may process the data, ultimately returning the results to the computing device 120 in the software domain; [0058] lines 1-6, If, in block 470, the TMU 355 determines that there are no algorithmic operations remaining to be performed on the block of data, the method continues to block 480 where output data is transmitted from the memory 340 to the computing device 320 via the interconnect 350 and I/O bus 332; [0080] lines 16-20, each command is executed by the associated hardware accelerator until…the end of processing and the data is returned to the software domain).

Hundley fails to specifically teach determine a queue depth associated with each FPGA, and when scheduling, it is in response to the request and the determined queue depth associated with each FPGA.

 However, Fong teaches determine a queue depth associated with each FPGA, and when scheduling, it is in response to the request and the determined queue depth associated with each FPGA (Fong, Fig. 4, 418-(1-n) Queues, 408-(1-n) GPUs (as FPGA); [0074] line 24, FPGA; Abstract, lines 3-4, receiving a task request for associated with a workload; [0001] lines 2-5, a hybrid computing infrastructure may be comprised…one or more accelerators, such as graphical processing units (GPUs); [0037] lines 5-6, a plurality of GPUs 408-1 through 408-n, each having a queue 418-1 through 418-n, respectively; [0038] lines 2-9, determine an appropriate GPU among GPU 408-1 through GPU 408-n for offloading a task workload. In one embodiment, the appropriate GPU is determined based on one or more considerations. One such consideration is queue length (as queue depth). For example, if CPU 406 first selects GPU 408-1 to offload thread 406a, but GPU 408-1 has a long queue 418-1, the CPU can look for a GPU with a shorter queue (as scheduling based on queue depth/length); also see [0042] lines 1-3, The GPU information may include the current queue length of each GPU).

It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined the teaching of Hundley with Fong because Fong’s teaching of determining queue length/depth and assigning the task workload according to the queue length/depth would have provided Hundley’s system with the advantage and capability to evenly distribute the tasks among the different accelerators, which improves workload balancing and efficiency.
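For illustration only, the queue-length consideration relied upon from Fong [0038] can be sketched as follows. This is a minimal sketch and not code from any cited reference; the names Accelerator, schedule, first_choice and long_queue_threshold are hypothetical, and the numeric threshold for what counts as a “long” queue is an assumption, since the cited passage does not quantify it.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Accelerator:
    """Hypothetical stand-in for one accelerator (e.g., a GPU or FPGA) and its queue."""
    name: str
    queue: deque = field(default_factory=deque)

    @property
    def queue_depth(self) -> int:
        # Queue depth is modeled simply as the number of tasks waiting.
        return len(self.queue)


def schedule(task, pool, first_choice=0, long_queue_threshold=4):
    """Enqueue `task` on the first-choice accelerator unless its queue is long.

    If the first choice has a long queue (assumed threshold), fall back to
    the accelerator in the pool with the shortest queue.
    """
    chosen = pool[first_choice]
    if chosen.queue_depth >= long_queue_threshold:
        chosen = min(pool, key=lambda acc: acc.queue_depth)
    chosen.queue.append(task)
    return chosen


pool = [Accelerator("acc-1"), Accelerator("acc-2"), Accelerator("acc-3")]
pool[0].queue.extend(["t1", "t2", "t3", "t4", "t5"])  # long queue on the first choice
pool[2].queue.extend(["t6", "t7"])
selected = schedule("new-task", pool)
print(selected.name)
```

In this sketch the first-selected accelerator is bypassed only when its queue is considered long, which mirrors the example quoted from Fong of looking for an accelerator with a shorter queue.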

Both Hundley and Fong fail to specifically teach each of the multiple FPGAs to load a bit stream associated with the function to be accelerated.

However, Hebert teaches each of the multiple FPGAs to load a bit stream associated with the function to be accelerated (Hebert, [0066] lines 3-19, server module 114 can determine that FPGA, GPU, and/or CPU computing resource(s) can execute faster once configured for the determined application…the model can indicate (i.e., in resource requirement(s)) that a certain bit stream of an FPGA is needed. Server module 114 can compare this resource requirement to the resource attributes of the selected computing resource (i.e., the FPGA). Server module 114 can then determine that this FPGA, while can support the resource requirement, should be reconfigured prior to executing the determined application. Furthermore, server module 114 can determine that multiple FPGAs (e.g., on a single resource node) should be reconfigured to support a parallel execution mode (as specified by resource requirements); [0106] lines 1-6, each FPGA can be re-configured (e.g., using a configuration script for the relevant request) in order to facilitate execution of application 1106. For example, this configuration script can configure FPGA computing resources 1104(1)-1104(N) with at least portions of application 1106 (e.g., with certain functions of application 1106 that can be executed in parallel by multiple FPGAs); Claim 3, wherein the reconfigurable logic device is a field-programmable gate array (FPGA), and the executing the configuration script causes a bit stream to be loaded into the FPGA).

It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined the teaching of Hundley and Fong with Hebert because Hebert’s teaching of loading a bit stream to each FPGA to reconfigure the FPGAs to perform the requested operation would have provided Hundley and Fong’s system with the advantage and capability to ensure that each FPGA in the resource node has the correct corresponding configuration for performing the application, which improves system performance and efficiency.

As per claim 2, Hundley, Fong and Hebert teach the invention according to claim 1 above. Hundley further teaches determine parameters of the request to accelerate the function and wherein to schedule acceleration of the function further comprises to schedule acceleration of the function based on the determined parameters of the request (Hundley, [0065] lines 6-14, The TMU 355 may receive an input from the computing device 320, either as part of the playlist 500, a rules based playlist, or a separate instruction, indicating the operations that may be executed out of order. Alternatively, the TMU 355 may intelligently determine, based on the types of accelerators 540 in the playlist 500 and the options 550 associated with the listed accelerators 540 (as determine parameters of the request); [0069] lines 3-13, The TMU 355 accesses the file stored in memory 340 and determines which operations listed in Rule 1 list 560 should next be executed. Each of the comparisons with Results1 data may determine what type of file is in the Results1 data. For example, the Result_A may represent an image file (e.g. .jpg, .gif, .tif), the Result_B may represent an executable file (e.g. .exe), and the Result_C may represent a compressed file (e.g. .zip, .rar, .hqx). The accelerators A, B, C, and D may perform different types of antivirus or decompression operations that are suitable for specific file types).

As per claim 3, Hundley, Fong and Hebert teach the invention according to claim 2 above. Hundley further teaches determine one or more of a type of function to be accelerated, a size of a data set to be operated on, or a time period in which acceleration of the function is to be completed (Hundley, [0069] lines 3-13, The TMU 355 accesses the file stored in memory 340 and determines which operations listed in Rule 1 list 560 should next be executed. Each of the comparisons with Results1 data may determine what type of file is in the Results1 data. For example, the Result_A may represent an image file (e.g. .jpg, .gif, .tif), the Result_B may represent an executable file (e.g. .exe), and the Result_C may represent a compressed file (e.g. .zip, .rar, .hqx). The accelerators A, B, C, and D may perform different types of antivirus or decompression operations that are suitable for specific file types (as determine type of function to be accelerated)).

As per claim 5, Hundley, Fong and Hebert teach the invention according to claim 1 above. Fong further teaches wherein to schedule acceleration of the function comprises to assign the function to multiple accelerator devices based on number of functions presently assigned (Fong, [0038] lines 2-9, determine an appropriate GPU among GPU 408-1 through GPU 408-n for offloading a task workload. In one embodiment, the appropriate GPU is determined based on one or more considerations. One such consideration is queue length (as queue depth). For example, if CPU 406 first selects GPU 408-1 to offload thread 406a, but GPU 408-1 has a long queue 418-1, the CPU can look for a GPU with a shorter queue (as scheduling based on number of functions presently assigned)).

As per claim 6, Hundley, Fong and Hebert teach the invention according to claim 1 above. Hundley further teaches determine a type of function each FPGA is presently configured to accelerate (Hundley, Fig. 5A, [0027] lines 4-6, the hardware accelerators 130 may be implemented in a variety of methods, including Field Programmable Gate Arrays (FPGAs); [0065] lines 6-14, The TMU 355 may receive an input from the computing device 320, either as part of the playlist 500, a rules based playlist, or a separate instruction, indicating the operations that may be executed out of order. Alternatively, the TMU 355 may intelligently determine, based on the types of accelerators 540 in the playlist 500 and the options 550 associated with the listed accelerators 540); and 
wherein to schedule acceleration of the function comprises to schedule acceleration of the function based additionally on the determined type of function each FPGA is presently configured to accelerate (Hundley, [0027] lines 4-6, the hardware accelerators 130 may be implemented in a variety of methods, including Field Programmable Gate Arrays (FPGAs); [0057] lines 8-9, determining which accelerator 330 should next operate on the data; [0069] lines 3-13, The TMU 355 accesses the file stored in memory 340 and determines which operations listed in Rule 1 list 560 should next be executed. Each of the comparisons with Results1 data may determine what type of file is in the Results1 data. For example, the Result_A may represent an image file (e.g. .jpg, .gif, .tif), the Result_B may represent an executable file (e.g. .exe), and the Result_C may represent a compressed file (e.g. .zip, .rar, .hqx). The accelerators A, B, C, and D may perform different types of antivirus or decompression operations that are suitable for specific file types; [0041] lines 22-27, the TMU 355 may generate instructions for any of hardware accelerators 130, transmit the instructions to the hardware accelerators 130 via the interconnect 150, and allow the accelerators 130 to perform the requested operations by accessing the input data directly from memory 140).

As per claim 12, Hundley, Fong and Hebert teach the invention according to claim 1 above. Hundley further teaches wherein the FPGA is to send, to the processor, a notification indicative of completion of the acceleration of the function (Hundley, Fig. 4, 460 Notify TMU of completion of operation; [0054] lines 1-3, the accelerator 330 that operated on the data transmits a signal to the TMU 355 indicating that the operation has been completed).


As per claims 13-15, 18 and 24, they are one or more non-transitory machine-readable storage media claims of claims 1-3, 6 and 12 respectively above. Therefore, they are rejected for the same reason as claims 1-3, 6 and 12 respectively above.

As per claim 25, it is a computing device claim of claim 1 above. Therefore, it is rejected for the same reason as claim 1 above. In addition, Hundley further teaches circuitry for executing an application (Hundley, Fig. 1, 100, 122 CPU, 126 memory (as circuitry); [0032] lines 1-4, the computing device 100 executes software instructions. The software instructions can direct the computing device to operate on a block of data, such as a file or some other predetermined block of data).

As per claims 26-28, they are method claims of claims 1-3 respectively above. Therefore, they are rejected for the same reason as claims 1-3 respectively above.


Claims 7-8 and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Hundley, Fong and Hebert, as applied to claims 1 and 13 respectively above, and further in view of Krishnamurthy et al. (US Pub. 2011/0131430 A1).
Krishnamurthy was cited in the previous Office Action.

As per claim 7, Hundley, Fong and Hebert teach the invention according to claim 1 above. Hundley further teaches wherein the function is one of multiple functions in a sequence of functions to be accelerated (Hundley, Fig. 5A; [0020] lines 1-3, FIG. 5A is a table illustrating an exemplary playlist containing instructions for multiple accelerators to perform multiple operations (as sequence of functions) on a block of data).

Hundley, Fong and Hebert fail to specifically teach the processor is further to determine whether to accelerate the multiple functions on a single FPGA in the accelerator pool.

However, Krishnamurthy teaches the processor is further to determine whether to accelerate the multiple functions on a single FPGA in the accelerator pool (Krishnamurthy, [0033] lines 1-10, Assume Virtual Queue 4, which runs tasks with constant known execution times, has six tasks queued and each task has a completion time of 10 ms. The accelerator scheduler examines the completion time of all six tasks (not just the first task) and determines that all the tasks can be run on its corresponding accelerator and still finish on time. Thus, other accelerators are not brought up for these tasks (i.e., not run at all or remain in a hibernate state), and these tasks do not need to be moved for proper servicing (as determine whether to accelerate the multiple functions on a single accelerator)).

It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined the teaching of Hundley, Fong and Hebert with Krishnamurthy because Krishnamurthy’s teaching of determining whether the tasks can be performed within one accelerator based on the tasks’ processing time would have provided Hundley, Fong and Hebert’s system with the advantage and capability to distribute the tasks among the accelerators based on the processing time of the tasks, which improves system performance.
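For illustration only, the determination relied upon from Krishnamurthy [0033], namely examining all queued tasks and deciding whether they can all run on one accelerator and still finish on time, can be sketched as follows. This is a minimal sketch under stated assumptions and not code from the reference: the function name fits_on_single_accelerator is hypothetical, and modeling each task as an (execution time, deadline) pair run back to back on one accelerator is an assumption made for the example.

```python
def fits_on_single_accelerator(tasks):
    """Return True if every queued task meets its deadline on one accelerator.

    `tasks` is a list of (exec_ms, deadline_ms) pairs in queue order
    (an assumed model). Tasks are assumed to run sequentially, so each
    task finishes at the running sum of execution times so far.
    """
    elapsed_ms = 0
    for exec_ms, deadline_ms in tasks:
        elapsed_ms += exec_ms
        if elapsed_ms > deadline_ms:
            return False  # this task would finish late on a single accelerator
    return True


# Six queued tasks of 10 ms each, all assumed to be due within 60 ms.
six_tasks = [(10, 60)] * 6
print(fits_on_single_accelerator(six_tasks))
```

When the check succeeds, no additional accelerator needs to be brought up for the queued tasks; when it fails, some tasks would have to be moved elsewhere to be serviced on time.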

As per claim 8, Hundley, Fong, Hebert and Krishnamurthy teach the invention according to claim 7 above. Krishnamurthy further teaches determine a time estimate to reconfigure the FPGA for each function in the sequence (Krishnamurthy, [0033] lines 1-10, Assume Virtual Queue 4, which runs tasks with constant known execution times, has six tasks queued and each task has a completion time of 10 ms. The accelerator scheduler examines the completion time of all six tasks (not just the first task) and determines that all the tasks can be run on its corresponding accelerator and still finish on time. Thus, other accelerators are not brought up for these tasks (i.e., not run at all or remain in a hibernate state), and these tasks do not need to be moved for proper servicing (as reconfigure the accelerator device for processing tasks if the time is not met); [0042] lines 1-5, A determination is then made as to whether the selected queue could meet the specified criteria of the task, such as, for instance, start time and completion time, and/or acceptable energy level, etc., INQUIRY 510. If so, then the task is enqueued on that queue).

As per claims 19-20, they are one or more non-transitory machine-readable storage media claims of claims 7-8 respectively above. Therefore, they are rejected for the same reason as claims 7-8 respectively above.

Claims 9 and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Hundley, Fong, Hebert and Krishnamurthy, as applied to claims 7 and 19 respectively above, and further in view of Bharadwaj et al. (US Pub. 2019/0042110 A1).
Bharadwaj was cited in the previous Office Action.

As per claim 9, Hundley, Fong, Hebert and Krishnamurthy teach the invention according to claim 7 above. Hundley teaches transfer output data from one FPGA to another FPGA in the accelerator pool (Hundley, Fig. 3, step 2, data 310A to HW accelerator 330A, step 3, 312 data from output of HW accelerator 330A is transferred to HW accelerator 330B at step 4).

Hundley, Fong, Hebert and Krishnamurthy fail to specifically teach determine a time estimate to transfer output data.

However, Bharadwaj teaches determine a time estimate to transfer output data (Bharadwaj, [0094] lines 12-14, determining a transfer time required to transfer the data length over the port; upon receiving a next IO request, determining whether a time interval between the IO request and the next IO request is less than the transfer time).

It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined the teaching of Hundley, Fong, Hebert and Krishnamurthy with Bharadwaj because Bharadwaj’s teaching of determining the time for transferring the data length would have provided Hundley, Fong, Hebert and Krishnamurthy’s system with the advantage and capability to calculate the total amount of time needed for processing the tasks and transferring the data in order to determine the different approaches for processing the tasks, which improves system efficiency.
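For illustration only, the transfer-time determination relied upon from Bharadwaj [0094] can be sketched as follows. This is a minimal sketch and not code from the reference: the function names are hypothetical, and computing the transfer time as data length divided by port bandwidth is an assumption, since the quoted passage states only that a transfer time for the data length over the port is determined and then compared with the interval between IO requests.

```python
def transfer_time_s(data_length_bytes, port_bandwidth_bytes_per_s):
    """Estimate the time to move `data_length_bytes` over a port.

    Assumed model: time = length / bandwidth, ignoring latency and protocol overhead.
    """
    if port_bandwidth_bytes_per_s <= 0:
        raise ValueError("port bandwidth must be positive")
    return data_length_bytes / port_bandwidth_bytes_per_s


def next_request_arrives_during_transfer(interval_s, data_length_bytes,
                                         port_bandwidth_bytes_per_s):
    """Return True if the interval between two IO requests is less than the transfer time."""
    return interval_s < transfer_time_s(data_length_bytes, port_bandwidth_bytes_per_s)


# 1,000,000 bytes over a port assumed to sustain 500,000 bytes per second.
print(transfer_time_s(1_000_000, 500_000))
print(next_request_arrives_during_transfer(1.5, 1_000_000, 500_000))
```

An estimate of this kind could be added to a processing-time estimate to obtain the total time for a sequence of accelerated functions, which is the use proposed in the rationale above.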

As per claim 21, it is one or more non-transitory machine-readable storage media claim of claim 9 above. Therefore, it is rejected for the same reason as claim 9 above.

Claim 17 is rejected under 35 U.S.C. 103 as being unpatentable over Hundley, Fong and Hebert, as applied to claim 13 above, and further in view of Bird et al. (US Pub. 2014/0181833 A1).
Bird was cited in the previous Office Action.

As per claim 17, Hundley, Fong and Hebert teach the invention according to claim 13 above. Fong further teaches wherein to schedule acceleration of the function comprises to assign the function to one of the FPGAs that has shorter queue depth (Fong, [0038] lines 2-9, determine an appropriate GPU among GPU 408-1 through GPU 408-n for offloading a task workload. In one embodiment, the appropriate GPU is determined based on one or more considerations. One such consideration is queue length (as queue depth). For example, if CPU 406 first selects GPU 408-1 to offload thread 406a, but GPU 408-1 has a long queue 418-1, the CPU can look for a GPU with a shorter queue (as scheduling based on number of functions presently assigned)).

Hundley, Fong and Hebert fail to specifically teach that the function is assigned to the one of the FPGAs that has the shortest queue depth.

However, Bird teaches assigning the function to the one of the FPGAs that has the shortest queue depth (Bird, Abstract, lines 7-8, A single processing queue is created for each processor; [0049] lines 11-14, Simple load balancing such as a round robin dispatching or scheduling of incoming threads /tasks, or adding new tasks to the shortest queue can be used to ensure the run queue lengths remain relatively even).

It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined the teaching of Hundley, Fong and Hebert with Bird because Bird’s teaching of assigning/scheduling the tasks to the shortest queue would have provided Hundley, Fong and Hebert’s system with the advantage and capability to evenly distribute the tasks, which improves system efficiency.


Response to Arguments
The Amendment filed on 06/13/2022 has been entered. Applicant’s amendment has overcome the previous rejections under 35 U.S.C. § 112(b). However, a new 112(b) rejection has been made in response to Applicant’s amendment.

In the Remarks, Applicant argues in substance:
(a) Fong fails to disclose or suggest at least the applicant's claimed:  "determine, a queue depth associated with each of multiple field programmable gate arrays (FPGA) in an accelerator pool of the compute device; schedule in response to the request and the determined queue depth associated with each FPGA, acceleration of the function on multiple FPGAs to produce output data, each of the multiple FPGAs to load a bit stream associated with the function to be accelerated" as claimed by the applicant in Claim 1 (as amended). The additional references Hundley and LI fail to cure the deficiencies of Fong noted above. Therefore, separately or in combination, Hundley, Fong and LI do not teach or suggest the applicant's claimed invention.

Examiner respectfully disagrees with Applicant’s argument for the following reasons:
As to point (a), Examiner would like to point out that Hundley teaches each of multiple field programmable gate arrays (FPGA) in an accelerator pool of the compute device and schedule in response to the request, acceleration of the function on multiple FPGAs to produce output data. 
For example, Hundley teaches a system that includes an accelerator pool having multiple FPGAs, where the FPGAs accelerate the functions based on the request and output the result after performing the acceleration (see Hundley, Fig. 1, 125 (as accelerator pool), 130 HW accelerators; [0027] lines 4-6, the hardware accelerators 130 may be implemented in a variety of methods, including Field Programmable Gate Arrays (FPGAs); Fig. 4, 420, 440 transmit header to selected hardware accelerator, 450 perform operation; Fig. 5A; [0020] lines 1-3; [0041] lines 22-27; [0036] lines 3-4; [0032] lines 30-31; [0058] lines 1-6; [0080] lines 16-20, each command is executed by the associated hardware accelerator until…the end of processing and the data is returned to the software domain).
In addition, Examiner relied on Fong for teaching the concept of determining a queue depth associated with each FPGA and scheduling in response to the request and the determined queue depth associated with each FPGA (see Fong, Fig. 4, 418-(1-n) Queues, 408-(1-n) GPUs (as FPGA); [0074] line 24, FPGA; Abstract, lines 3-4, receiving a task request for associated with a workload; [0001] lines 2-5; [0037] lines 5-6; [0038] lines 2-9, determine an appropriate GPU among GPU 408-1 through GPU 408-n for offloading a task workload. In one embodiment, the appropriate GPU is determined based on one or more considerations. One such consideration is queue length (as queue depth). For example, if CPU 406 first selects GPU 408-1 to offload thread 406a, but GPU 408-1 has a long queue 418-1, the CPU can look for a GPU with a shorter queue (as scheduling based on queue depth/length); also see [0042] lines 1-3, The GPU information may include the current queue length of each GPU).
Further, Examiner relied on Hebert (cited in the PTO-892 dated 04/14/2021) for teaching the newly amended limitation of each of the multiple FPGAs to load a bit stream associated with the function to be accelerated (see Hebert, [0066] lines 3-19, server module 114 can determine that FPGA, GPU, and/or CPU computing resource(s) can execute faster once configured for the determined application…the model can indicate (i.e., in resource requirement(s)) that a certain bit stream of an FPGA is needed; [0106] lines 1-6; Claim 3, wherein the reconfigurable logic device is a field-programmable gate array (FPGA), and the executing the configuration script causes a bit stream to be loaded into the FPGA). Please refer to the rejection under 35 U.S.C. § 103 above. To the extent that Applicant is arguing against the references individually, the examiner reminds Applicant that one cannot show nonobviousness by attacking references individually where the rejections are based on combinations of references.  See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981); In re Merck & Co., 800 F.2d 1091, 231 USPQ 375 (Fed. Cir. 1986).

For the reasons above, Applicant’s argument has not been found to be persuasive, and therefore the rejections are maintained. 


Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ZUJIA XU whose telephone number is (571)272-0954. The examiner can normally be reached M-F 9:00-5:30 EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Meng-Ai An can be reached on (571) 272-3756. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/MENG AI T AN/Supervisory Patent Examiner, Art Unit 2195

/Z.X./Examiner, Art Unit 2195