DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 1-12 and 14-21 are pending in this application.

Response to Arguments
Applicant’s arguments regarding the rejections of claims 3-6, 8-10, 12, 14, 17, and 18 under 35 U.S.C. 112(b) have been fully considered and are persuasive. The rejections have been withdrawn. However, new rejections under 35 U.S.C. 112(b) are applied to claims 9-10 based on the amendments.

Applicant’s arguments regarding 35 U.S.C. 101 rejections of claims 14-18 have been fully considered and are persuasive. The rejections have been withdrawn. 

Applicant’s arguments regarding the 35 U.S.C. 103 rejections of claims 1-12 and 14-20 have been fully considered. The arguments are moot because they do not apply to the references relied upon in the new grounds of rejection set forth in the current rejection.

Claim Objections
Claims 2, 12, 15, and 16 are objected to because of the following informalities: 
Lines 8-9 of claim 2 and lines 9-10 of claim 16 recite “migrate the the”, which repeats the word “the”;
Line 12 of claim 12 lacks a space between the comma and the word “wherein” that follows it; and
Line 2 of claim 15 recites “circuity”, which should read “circuitry”.
Appropriate correction is required.

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:

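(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function;
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 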
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
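Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
The claim limitations in this application that use the word “means” and are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, are: means for identifying, means for migrating, and means for migrating in claims 19-20.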
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof. The corresponding structure can be found in paragraphs [0088]-[0089], which disclose “The kernel analysis and decision logic unit 1617 may be embodied as any device or circuitry… the kernel analysis and decision logic unit 1617 may evaluate the telemetry data relative to one or more policies (e.g., a service level agreement (SLA) or one or more quality-of-service (QoS) requirements). The policies may include thresholds, which, when exceeded, trigger the kernel analysis and decision logic unit 1617 to orchestrate actions to perform in response. For instance, as further described herein, the kernel analysis and decision logic unit 1617 may cause a given accelerator device to migrate an accelerator kernel to a target accelerator device.” 
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.  Such claim limitations are: circuitry to monitor…identify…and migrate, circuitry to…suspend…serialize…migrate…deserialize, circuitry is further to monitor a power consumption, circuitry is further to receive a notification, circuitry is further to determine, circuitry is further to, upon a determination that the fragmentation situation is present, migrate, circuitry is further to: detect…determine…migrate, circuitry to…generate…configure…initialize…update, circuitry is further to…generate an alert, circuitry for monitoring in claims 1-12, 14-19, and 21. 
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof. The corresponding structure can be found in paragraph [0088] that discloses “The kernel analysis and decision logic unit 1617 may be embodied as any device or circuitry to obtain telemetry data indicative of resource usage and power consumption of the accelerator sleds 1610, 1612 and the compute sled 1617”, in paragraph [0107] which recites “orchestrator server 1616 monitors resource usage of a kernel on a source accelerator device”, in paragraph [0108] which recites “orchestrator server 1616 identifies a target accelerator device”, in paragraph [0110] which recites “the orchestrator server 1616 performs a migration process”, in paragraph [0110] which recites “the orchestrator server 1616 causes the source accelerator device to serialize data”, in paragraph [0110] which recites “orchestrator server 1616 causes the target accelerator device to deserialize data”, in paragraph [0114] which recites “the orchestrator server 1616 receives a notification of available accelerator devices”, in paragraph [0116] which recites “orchestrator server 1616 determines, based on the evaluation, whether a fragmenting situation is present. If not, then the method 2400 ends. 
Otherwise, in block 2414, the orchestrator server 1616 migrates a portion of the workload”, in paragraph [0120] which recites “the orchestrator server 1616 detects a trigger”, in paragraph [0121] which recites “the orchestrator server 1616 may determine a configuration of accelerator devices”, in paragraph [0122] which recites “the orchestrator server 1616 migrates one or more of the accelerator kernels associated with the workload to the accelerator devices identified in the configuration”, in paragraph [0122] which recites “the orchestrator server 1616 generates, from each kernel to be scaled-out to a given accelerator device, a corresponding kernel bit stream…the orchestrator server 1616 configures the kernel bit streams on the accelerator devices… the orchestrator server 1616 initializes inter-kernel communication channels…the orchestrator server 1616 may, in block 2616, update a managed node registry”, in paragraph [0093] which recites “the orchestrator server 1616 may generate an alert”, and in paragraph [0082] which recites “an orchestrator server 1520, which may be embodied as a managed node comprising a compute device (e.g., a processor 820 on a compute sled 800)”.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA  35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claims 1-12 and 14-21 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA ), first paragraph, as failing to comply with the written description requirement. The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA  35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention. 

As per claims 1, 15, and 19 (line numbers refer to claim 1):
	Lines 5-8 recite “responsive to the monitored resource usage meeting or exceeding a resource usage amount indicated in one or more policies, identify a target accelerator device to which to migrate the accelerator kernel” but paragraph [00107] of the specification recites “In block 2106, the orchestrator server 1616 determines whether a specified threshold is exceeded. If not, then the method 2100 returns to block 2102. Otherwise, if a threshold is exceeded, then in block 2108, the orchestrator server 1616 identifies a target accelerator device (e.g., from a registry of accelerator devices in the system 1600) to which to migrate the kernel.” The specification thus describes identifying a target accelerator device only when a threshold is exceeded, not when the threshold is merely met. Therefore, there is a lack of written description for “responsive to the monitored resource usage meeting a resource usage amount indicated in one or more policies, identify a target accelerator device to which to migrate the accelerator kernel.”


As per claim 3:
	Lines 4-6 recite “responsive to the monitored power consumption meeting or exceeding a power consumption amount indicated in the one or more policies, scale-out the workload to one or more accelerator devices on a target accelerator sled”, but paragraph [00110] of the specification recites “In block 2306, the orchestrator server 1616 determines whether a power consumption threshold is exceeded. If not, then the method 2300 returns to block 2302. Otherwise, the orchestrator server 1616 determines whether accelerator resources in the system 1300 are available for a scale-out operation.” The specification thus describes proceeding toward a scale-out operation only when the power consumption threshold is exceeded, not when the threshold is merely met. Therefore, there is a lack of written description for “responsive to the monitored power consumption meeting a power consumption amount indicated in the one or more policies, scale-out the workload to one or more accelerator devices on a target accelerator sled”.

Claims 2, 4-12, 14, 16-18, and 20-21 depend from claims 1, 15, and 19 and fail to resolve the deficiencies of those claims. Therefore, they are rejected for the same reasons as claims 1, 15, and 19 above. 
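
The distinction drawn above between a threshold being met and a threshold being exceeded can be illustrated with a minimal sketch. The function names and numeric values below are hypothetical and appear in neither the claims nor the specification; the sketch only shows that the two conditions differ at the boundary value.

```python
def exceeds(usage: float, threshold: float) -> bool:
    """Condition quoted from the specification: the threshold is exceeded."""
    return usage > threshold


def meets_or_exceeds(usage: float, threshold: float) -> bool:
    """Condition recited in the claims: the threshold is met or exceeded."""
    return usage >= threshold


# Hypothetical threshold and sample readings, chosen only for illustration.
THRESHOLD = 80.0
for usage in (79.0, 80.0, 81.0):
    print(usage, exceeds(usage, THRESHOLD), meets_or_exceeds(usage, THRESHOLD))
```

The two functions agree for every reading except the one equal to the threshold, which is the case addressed by the rejection.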

The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 9-10 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.

As per claim 9:
	Line 3 recites “the one or more first accelerator devices” which lacks antecedent basis. 

As per claim 10, it depends from claim 9 and does not cure the deficiency of claim 9. Therefore, it is rejected for the same reason as claim 9 above.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 3, 4, 6, 11, 15, and 17-19 are rejected under 35 U.S.C. 103 as being unpatentable over Krishnamurthy et al. (US 8739171 B2 herein Krishnamurthy) in view of Ratering et al. (US 20110161495 A1 herein Ratering).

As per claim 1, Krishnamurthy teaches a device comprising: circuitry to: monitor resource usage of an accelerator kernel while the accelerator kernel is being executed by a source accelerator device (Figs. 1, 3, 102 server, 116 processor(s), 118 workload manager (as circuitry), 104 accelerators; Fig. 8, 804 monitor…accelerator system; Col. 1 lines 47-49 a second set of resources at the set of accelerator systems are monitored based on the set of high-resource guarantees with respect to input/output, memory, and processor components; Col. 11 lines 18-20 Kernels executing similar functions can run on the host or the accelerator; Col. 14 lines 45-47 The workload manager 118, at step 606, monitors a second set of resources such as, but not limited to, compute kernels (as accelerator kernel)); 
responsive to the monitored resource usage meeting or exceeding a resource usage amount indicated in one or more policies, identify a target accelerator device to which to migrate a workload; and migrate the workload from the source accelerator device to the target accelerator device to cause the workload to be executed by the target accelerator device (Figs. 8, 9; Col. 9 lines 37-41 determines that the given threshold is not being met or, alternatively, is being exceeded then the mobility manager 134 migrates at least a portion of the workload from its current system (server 102 or accelerator 104) to the other system (server 102 or accelerator 104); Col. 15 lines 13-32 The workload manager 118, at step 804, monitors a set of workloads scheduled on each of a server system 102 and an accelerator system 104. The workload manager 118, at step 806, determines if at least one of the workloads has exceeded a threshold of highest priority SLA limits on one of the systems 102, 104…If the result of this determination is positive, the workload manager 118, at step 808, determines if the workload is likely to meet highest SLA priority limits on a second system that has the resource capacity for one additional workload (as identify)…If the result of this determination is positive, the workload manager 118, at step 810, dynamically reallocates the at least one workload to the threshold has been met)…If the result of this determination is positive, the workload manager 118, at step 1110, saves workload data required to migrate the workload associated with the condition and/or threshold over to the server system 102; Col. 9 lines 4-6 the executable can be stopped at the migrating from processor and restarted on the migrated to processor).  

	Krishnamurthy teaches migrating the workload, but fails to specifically teach identify a target accelerator device to which to migrate the accelerator kernel and cause the accelerator kernel to be executed by the target accelerator device.

	However, Ratering teaches identify a target accelerator device to which to migrate the accelerator kernel and cause the accelerator kernel to be executed by the target accelerator device (abstract lines 4-6 one or more computing operations (as accelerator kernel) may be offloaded from a local processor to a virtual device that represents available resources of a cloud; [0020] lines 1-9 the virtual OpenCL device 110 may represent the available resources in the cloud 130 to the client(s) 102. If the application 104 is for instance looking for the device with the highest performance, it may select (as identify) the virtual device 110 from the list and use it through the same OpenCL functions (as accelerator kernel) as a local device. In an embodiment, one special property of the virtual device is that it may not execute the OpenCL functions locally, but instead forwards them over the network to a compute cloud 130; [0021] line 1 handle would select the device from the list and use it through the same OpenCL functions as a local device. Accordingly, an application may determine if it makes sense to run a given OpenCL kernel on a cloud system or locally). 

It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined Krishnamurthy with the teachings of Ratering because Ratering’s teaching of offloading kernels to a virtual accelerator in the cloud allows for the performance advantages of the cloud to be utilized (see Ratering, [0012] lines 4-12 compute-intensive OpenCL applications are accelerated by offloading one or more compute kernel(s) of an application to a compute cloud over a local network (such as the Internet or an intranet)…This allows OpenCL applications to run on light-weight systems and tap into the performance potential of large servers in a back-end cloud.).
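
As a rough sketch of the flow discussed for claim 1 (monitor usage, compare against a policy amount, identify a target, migrate the kernel), the steps may be expressed as follows. All class, function, and field names are hypothetical and are taken from neither Krishnamurthy nor Ratering; the target-selection rule (lowest current usage) is an assumption made only to keep the example concrete.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Accelerator:
    """Hypothetical accelerator device with a usage reading and its kernels."""
    name: str
    usage: float  # fraction of capacity in use, 0.0 to 1.0
    kernels: List[str] = field(default_factory=list)


def identify_target(source: Accelerator, candidates: List[Accelerator],
                    limit: float) -> Optional[Accelerator]:
    """Pick the least-loaded device other than the source that is below the limit."""
    eligible = [a for a in candidates if a is not source and a.usage < limit]
    return min(eligible, key=lambda a: a.usage) if eligible else None


def migrate_if_needed(source: Accelerator, kernel: str,
                      candidates: List[Accelerator],
                      limit: float) -> Optional[Accelerator]:
    """Move the kernel when the source usage meets or exceeds the policy limit."""
    if source.usage < limit:
        return None  # policy amount not reached; nothing to do
    target = identify_target(source, candidates, limit)
    if target is None:
        return None  # no suitable target device available
    source.kernels.remove(kernel)
    target.kernels.append(kernel)
    return target


src = Accelerator("acc0", usage=0.95, kernels=["k1"])
dst = Accelerator("acc1", usage=0.20)
chosen = migrate_if_needed(src, "k1", [src, dst], limit=0.90)
print(chosen.name if chosen else None)
```

The sketch deliberately treats the kernel as an opaque name; how execution state is transferred between devices is outside its scope.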

As per claim 3, Krishnamurthy and Ratering teach the device of claim 1. Krishnamurthy specifically teaches wherein the circuitry is further to: monitor a power consumption of a source accelerator sled having one or more accelerator devices executing a workload; and responsive to the monitored power consumption meeting or exceeding a power consumption amount indicated in the one or more policies, scale-out the workload to one or more accelerator devices on a target accelerator sled (Col. 9 lines 33-41 The mobility manager 134 monitors performance metrics and/or energy metrics with respect to the workload certain energy requirement; Col. 8 lines 53-56 Therefore, the workload manager 118, at T2, migrates the workload 304 from the accelerator 104 to a compute core (or accelerator) 310 at the server system 102; Col. 10 lines 28-29 unused capacity of server processors can be used to execute accelerator kernels; Col. 4 lines 55-59 Each accelerator 104 also comprises one or more processors 116 that comprise a set of architectural registers (not shown) that defines the accelerator architecture. It should be noted that each of the accelerator systems 104 can comprise the same or different type of processor; In other words, accelerators 104 are accelerator sleds with one or more accelerator devices because the accelerators 104 are comprised of multiple processors with accelerator architecture. Server systems 102 are also accelerator sleds because they contain accelerator 310 and they have a plurality of processors that can execute accelerator kernels.).

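As per claim 4, Krishnamurthy and Ratering teach the device of claim 3. Krishnamurthy specifically teaches wherein to scale-out the workload to the one or more accelerator devices on the target accelerator sled comprises to migrate the workload from one or more of the accelerator devices of the source accelerator sled to the one or more accelerator devices of the target accelerator sled (Col. 9 lines 33-41 The mobility manager 134 monitors performance metrics and/or energy metrics with respect to the workload to determine if these metrics are above or below a respective given threshold. If the workload mobility manager 134 determines that the given threshold is not being met or, alternatively, is being exceeded then the mobility manager 134 migrates at least a portion of the workload from its current system (server 102 or accelerator 104) to the other system (server 102 or accelerator 104)).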

As per claim 6, Krishnamurthy and Ratering teach the device of claim 3. Krishnamurthy specifically teaches wherein to scale-out the workload to the one or more accelerator devices on the target accelerator sled comprises to migrate the workload from one or more accelerator sleds to one or more instances of an accelerator device on the target accelerator sled (Col. 9 lines 33-41 The mobility manager 134 monitors performance metrics and/or energy metrics with respect to the workload…the mobility manager 134 migrates at least a portion of the workload from its current system (server 102 or accelerator 104) to the other system (server 102 or accelerator 104); Col. 8 lines 53-55 Therefore, the workload manager 118, at T2, migrates the workload 304 from the accelerator 104 to a compute core (or accelerator) 310; One or more instances of an accelerator device on the target accelerator sled is taught because as mentioned above, each accelerator 104 (accelerator sled) has a plurality of processors (accelerator device) and each processor contains accelerator architecture.). 

As per claim 11, Krishnamurthy and Ratering teach the device of claim 1. Krishnamurthy specifically teaches wherein the circuitry is further to: detect a trigger to initiate a scale-out operation of a workload (Col. 9 lines 37-41 determines that the given threshold is not being met or, alternatively, is being exceeded then the mobility manager 134 migrates at least a portion of the workload from its current system (server 102 or accelerator 104) to the other system (server 102 or accelerator 104)); 
determine, as a function of the one or more policies, one or more types of accelerator devices to which to migrate the workload; and migrate the workload to the one or more types of accelerator devices to scale-out the workload (Col. 15 lines 41-52 The workload manager 118, at step 906, determines that a resource at a first one of the systems 102, 104 has the capacity for at least one additional workload. The workload manager 118, at step 908, determines if reallocating one additional workload from a second one of the systems 102, 104 would violate an SLA requirement…the workload manager 118, at step 912, dynamically reallocates a workload currently scheduled on a second one of the systems 102, 104 to the resource at the first one of the systems 102, 104; Col. 15 lines 13-32 The workload manager 118, at step 804, monitors a set of workloads scheduled on each of a server system 102 and an accelerator system 104. The workload manager 118, at step 806, determines if at least one of the workloads has exceeded a threshold of highest priority SLA limits (as policy) on one of the systems 102, 104… If the result of this determination is positive, the workload manager 118, at step 810, dynamically reallocates the at least one workload to the second system 102, 104; Col. 6 lines 2-4 programming across a number of heterogeneous devices such as CPUs, GPUs (as type), and accelerators).
Additionally, Ratering teaches that the scale-out operation is for one or more accelerator kernels associated with a workload ([0011] lines 1-4 Even on standard desktops and workstations compute-intensive OpenCL applications could be accelerated by offloading OpenCL workloads to server farms in a compute cloud; [0012] lines 4-6 compute-intensive OpenCL applications are accelerated by offloading one or more compute kernel(s) of an application to a compute cloud); 
determine one or more types of accelerator devices to which to migrate the one or more accelerator kernels; and migrate the one or more accelerator kernels to the one or more types of accelerator devices to scale-out the one or more accelerator kernels ([0020] lines 1-9 the virtual OpenCL device 110 may represent the available resources in the cloud 130 to the client(s) 102. If the application 104 is for instance looking for the device with the highest performance, it may select the virtual device 110 from the list and use it through the same OpenCL functions (as accelerator kernel) as a local device. In an embodiment, one special property of the virtual device is that it may not execute the OpenCL functions locally, but instead forwards them over the network to a compute cloud 130; [0021] line 1 handle the kernel offload; [0024] lines 4-8 the cloud systems may contain GPUs (Graphics Processing Units), accelerators, etc., in which case the device type would be CL_DEVICE_TYPE_GPU or CL_DEVICE_TYPE_ACCELERATOR).  
	
As per claim 15, it is a non-transitory machine-readable storage media claim of claim 1, so it is rejected for the same reasons as claim 1. Additionally, Krishnamurthy teaches a plurality of instructions, which, when executed, causes circuity of a device to perform operations (Col. 2 lines 12-15 The computer program product comprises a storage medium readable by a processing circuit and storing instructions for execution by the processing circuit for performing a method).

As per claims 17 and 18, they are non-transitory machine-readable storage media claims of claims 3 and 4, so they are rejected for the same reasons as claims 3 and 4 above.

As per claim 19, it is a device claim of claim 1, so it is rejected for the same reasons as claim 1 above.

Claims 2, 16, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Krishnamurthy and Ratering, as applied to claims 1, 15, and 19 above, in view of Becker (US 20140063027 A1).
Becker was cited in the previous office action.

As per claim 2, Krishnamurthy and Ratering teach the device of claim 1. Krishnamurthy specifically teaches wherein to migrate the workload from the source accelerator device to the target accelerator device comprises the circuitry to (Col. 9 lines 39-41 migrates at least a portion of the workload from its current system (server 102 or accelerator 104) to the other system (server 102 or accelerator 104); Col. 8 lines 53-55 Therefore, the workload manager 118, at T2, migrates the workload 304 from the accelerator 104 to a compute core (or accelerator) 310; Col. 4 lines 55-58 Each accelerator 104 also comprises one or more processors 116 that comprise a set of architectural registers (not shown) that defines the accelerator architecture): 
cause the source accelerator device to suspend execution of the accelerator kernel (Col. 9 lines 4-6 the executable can be stopped at the migrating from processor and restarted on the migrated to processor; Col. 6 lines 7-10 Each process may execute several tasks and each task can offload certain compute intensive portions to a compute kernel, which are basic units of executable code.); 
data associated with the execution of the accelerator kernel by the source accelerator device (Col. 6 lines 31-35 These kernels can be invoked when a calling process on the server is run. These compute kernels are then launched on the accelerators 104. The server passes data to these compute kernels and computes a result based on the data.); and 
cause the source accelerator device to migrate the the workload to the target accelerator device (Col. 8 lines 54-55 migrates the workload 304 from the accelerator 104 to a compute core (or accelerator) 310).
Additionally, Ratering teaches to migrate the accelerator kernel to the target accelerator device comprises: migrate the the accelerator kernel to the target accelerator device (abstract lines 4-6 one or more computing operations (as accelerator kernel) may be offloaded from a local processor to a virtual device that represents available resources of a cloud; [0020] lines 1-9 the virtual OpenCL device 110 may represent the available resources in the cloud 130 to the client(s) 102. If the application 104 is for instance looking for the device with the highest performance, it may select the virtual device 110 from the list and use it through the same OpenCL functions (as accelerator kernel) as a local device. In an embodiment, one special property of the virtual device is that it may not execute the OpenCL functions locally, but instead forwards them over the network to a compute cloud 130; [0021] line 1 handle the kernel offload; [0024] lines 4-8 the cloud systems may contain GPUs (Graphics Processing Units), accelerators, etc., in which case the device type would be CL_DEVICE_TYPE_GPU or CL_DEVICE_TYPE_ACCELERATOR).

Krishnamurthy and Ratering fail to teach cause the source accelerator device to serialize data associated with the execution of the accelerator kernel; and cause the target accelerator device to deserialize the data associated with execution of the accelerator kernel for use of the deserialized data by the target accelerator device to execute the accelerator kernel.

However, Becker teaches cause the source accelerator device to serialize data associated with the execution of the accelerator kernel (Abstract lines 6-7 serialized request formed from the GPU kernel function code and input data; [0028] lines 5-9 For this reason, data is serialized using JSON. Each request consists of a dictionary that contains the kernel code to be executed, a list of the data, and its structure definition (which are passed as parameters to the kernel), and the grid and block sizes of the kernel.); and 
cause the target accelerator device to deserialize the data associated with execution of the accelerator kernel for use of the deserialized data by the target accelerator device to execute the accelerator kernel (Abstract lines 8-12 The request is then sent to the remote computer and programmable GPU, where the request is deserialized, kernel code is compiled, and input data copied to the GPU memory on the remote computer. The GPU kernel function is then executed; [0054] lines 1-7 the request data, including grid and block size as well as kernel parameters, is retrieved from the request structure and stored. Each kernel parameter has already been pre-processed and given additional meta data to allow faster data deserialization. Each parameter is deserialized into a NumPy array or NumPy scalar type using the meta data.).

It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined Krishnamurthy and Ratering with Becker’s teaching of serializing and deserializing data associated with a kernel because serialization allows the kernel code and its input data to be sent over a Web service to a remote computer for execution (see Becker, Abstract: sent over a Web service to a remote computer equipped with a programmable GPU (Graphics Processing Unit) for execution…This is accomplished by incorporating a serialized request formed from the GPU kernel function code and input data set by using JavaScript.RTM. Object Notation (JSON) serialization. The request is then sent to the remote computer and programmable GPU, where the request is deserialized, kernel code is compiled, and input data copied to the GPU memory on the remote computer.).

As per claims 16 and 20, they are non-transitory machine-readable storage media and device claims of claim 2, so they are rejected for the same reasons as claim 2 above. 

Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Krishnamurthy and Ratering, as applied to claim 4 above, in view of Bernat et al. (WO 2018102414 A1 herein Bernat).

As per claim 5, Krishnamurthy and Ratering teach the device of claim 4. Krishnamurthy specifically teaches wherein to scale-out the workload to the one or more accelerator devices on the target accelerator sled (Col. 9 lines 33-41 The mobility manager 134 monitors performance metrics and/or energy metrics with respect to the workload to determine if these metrics are above or below a respective given threshold. If the workload mobility manager 134 determines that the given threshold is not being met or, alternatively, is being exceeded then the mobility manager 134 migrates at least a portion of the workload from .  

Krishnamurthy and Ratering fail to teach wherein to scale-out the workload further comprises to update a registry of managed nodes of a system.

However, Bernat teaches wherein to scale-out the workload further comprises to update a registry of managed nodes of a system ([0054] lines 7-11 The system 1210 may provide kernels to accelerate specific functions of the application 1234 and offload execution of the functions to the accelerator devices 1262 or 1266. The pod manager 1220 may track (e.g., via a database) which kernels are registered to which accelerator sleds 1260 and accelerator devices 1264, 1268; [0092] lines 2-3 once validated, register the kernel with an accelerator device).

It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined Krishnamurthy and Ratering with the teachings of Bernat because Bernat’s teaching of a database that registers kernels with accelerator devices provides the advantage of knowing when an accelerator device is available (see Bernat, [0093] lines 1-7 Once determined, the configuration component 1734 may then register the bit stream by assigning an available slot (e.g., a subset of circuitry or other logic units) of the .
	
Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Krishnamurthy and Ratering, as applied to claim 1 above, in view of Cao et al. (US 20190266006 A1 herein Cao).
Cao was cited in the previous office action.

As per claim 7, Krishnamurthy and Ratering teach the device of claim 1. Krishnamurthy specifically teaches wherein the circuitry is further to determine available accelerator devices of an accelerator sled (Col. 14 lines 65-67 The workload manager 118, at step 710, determines that a set of resources at either the server system 102 or the set of accelerator systems 104 is available.).

Krishnamurthy and Ratering fail to teach receive a notification of available accelerator device.

However, Cao teaches receive a notification of available accelerator device ([0047] lines 1-4 sends a notification message to the accelerator loading apparatus 200, and the notification message includes the identifier of the host 300, the identifier of the available accelerator).

identifier of the available accelerator).

Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Krishnamurthy, Ratering, and Cao, as applied to claim 7 above, in view of Li et al. (US 10929187 B2 herein Li).
Li was cited in the previous office action.

As per claim 8, Krishnamurthy, Ratering, and Cao teach the device of claim 7. Krishnamurthy specifically teaches available accelerator devices of an accelerator sled (Col. 14 lines 65-67 The workload manager 118, at step 710, determines that a set of resources at either the server system 102 or the set of accelerator systems 104 is available), completion of a workload by the accelerator sled, and one or more first accelerator devices that completed the workload on the accelerator sled (Col. 8 lines 35-37 The workload can complete within the relaxed batch window using the optimized set of accelerator resources.).
Additionally, Cao teaches wherein to receive the notification of available accelerator devices, the notification also specifying one or more first accelerator devices ([0047] lines 1-4 sends a notification message to the accelerator loading apparatus 200, and the notification message includes the identifier of the host 300, the identifier of the available accelerator).

	Krishnamurthy, Ratering, and Cao fail to teach comprises to receive the notification responsive to a completion of a workload by the accelerator.

	However, Li teaches comprises to receive the notification responsive to a completion of a workload by the accelerator (Col. 1 lines 23-25 Hardware accelerators such as an accelerator function unit (AFU) are used mainly to accelerate specific computing tasks; Col. 2 lines 32-34 after the task has been executed by the AFU, receiving a message indicating that the task is finished, wherein the message is transmitted by the AFU.).

It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined Krishnamurthy, Ratering, and Cao with the teachings of Li because Li’s teaching of sending a message indicating that a task is finished provides an indicator of whether the task is still running or completed (see Li, Col. 21 lines 40-43 In step S2208, the AFU 110 executes the above command packets. In step S2210, a complete flag is arranged for each command packet at the ROB to determine whether the execution of the command packet has been completed.).

Claims 9 and 10 are rejected under 35 U.S.C. 103 as being unpatentable over Krishnamurthy, Ratering, and Cao, as applied to claim 7 above, in view of Guim Bernat et al. (US 20180026868 A1 herein Guim Bernat).
Guim Bernat was cited in the previous office action.

As per claim 9, Krishnamurthy, Ratering, and Cao teach the device of claim 7. Krishnamurthy specifically teaches circuitry and an evaluation of resource usage of one or more second accelerator devices currently executing a second workload (Col. 2 lines 23-25 a second set of resources at the set of accelerator systems are monitored based on the set of high-throughput computing SLAs; Col. 8 lines 46-48 a plurality of workloads 302, 304 are allocated at the accelerator processors 116).

Krishnamurthy, Ratering, and Cao fail to teach wherein the circuitry is further to determine, as a function of an evaluation of resource usage of one or more second accelerator devices currently executing a second workload, whether a fragmenting situation is present in the one or more first accelerator devices.

However, Guim Bernat teaches wherein the circuitry is further to determine, as a function of an evaluation of resource usage of one or more second accelerator devices currently executing a second workload, whether a fragmenting situation is present in the one or more first accelerator devices ([0038] lines 4-11 This non-uniformity may be referred to as resource fragmentation: resources that should provide a certain performance (i.e., latency and bandwidth) may provide lower performance due their location in the data center. Fragmentation may appear as a heterogeneity effect, as illustrated and described below, or as a distance effect (e.g., compute sleds are far from the selected resources); [0068] lines 15-20 The illustrative data center 1100 additionally receives usage information for the various resources, predicts resource usage for different types of workloads based on past resource usage, and dynamically reallocates the resources based on this information; [0043] lines 7-15 The fragmentation optimizer 415 may .  

It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined Krishnamurthy, Ratering, and Cao with Guim Bernat’s teaching of detecting fragmentation based on resource usage because once fragmentation is detected it can be addressed so that system requirements are met (see Guim Bernat, [0043] lines 9-15 If the distance is outside the specified dynamic tolerated fragmentation, the fragmentation optimizer 415 may contact the system layer (OS or orchestrator 404) determine what action to take, and contact the SDI manager 402 in order to dynamically change the composite node definition in order to fulfill the requirements.).

As per claim 10, Krishnamurthy, Ratering, Cao, and Guim Bernat teach the device of claim 9. Guim Bernat specifically teaches wherein the circuitry is further to, upon a determination that the fragmentation situation is present, migrate a portion of the workload to one or more of the available accelerator devices of the accelerator sled ([0044] lines 4-11 Using the change a subset of the physical resources associated to the composite node. The SDI manager 402 may be responsible for negotiating with the compute sled 406 to increase the amount of the resource violating the tolerated fragmentation or change the actual pool being used to satisfy the sled; [0046] lines 14-19 This interface may allow the compute sled to require increasing the amount for a particular resource type (i.e., memory), and remove a particular pool serving a particular resource to the composite node and request to find a new one that satisfies the composite node requirements in terms of fragmentation; [0055] lines 7-12 Using the techniques set forth herein, monitoring logic 712 may determine that a fragmentation violation has occurred at 701, and receive information about the violation from monitoring logic 712 at 703. At 705, dynamic provisioner 714 may determine to perform a remapping of resources, for example, to maintain a DTF required by compute node 702; [0066] lines 16-19 Physical resources 1106 may include resources of multiple types, such as—for example—processors, co-processors, accelerators, field-programmable gate arrays (FPGAs)).

Claims 12 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Krishnamurthy and Ratering, as applied to claim 11 above, in view of Bernat, further in view of Muslim et al. (Efficient FPGA implementation of OpenCL High-Performance Computing applications via High-Level Synthesis herein Muslim), and further in view of Melkild (US 10701000 B1).
Melkild was cited in the previous office action.

As per claim 12, Krishnamurthy and Ratering teach the device of claim 11. Krishnamurthy specifically teaches wherein to migrate the workloads to the accelerator devices of the one or more types of accelerator devices comprises circuitry and accelerator kernels associated with the workload (Col. 9 lines 37-41 determines that the given threshold is not being met or, alternatively, is being exceeded then the mobility manager 134 migrates at least a portion of the workload from its current system (server 102 or accelerator 104) to the other system (server 102 or accelerator 104); Col. 4 lines 55-59 Each accelerator 104 also comprises one or more processors 116 that comprise a set of architectural registers (not shown) that defines the accelerator architecture. It should be noted that each of the accelerator systems 104 can comprise the same or different type of processor; Col. 6 lines 2-4 programming across a number of heterogeneous devices such as CPUs, GPUs, and accelerators; Col. 7 lines 17-21 The workload manager 118 schedules a portion 202, 204 of the accelerator workload (e.g., tasks) as one or more compute kernels 206, 208 at the accelerator processors 116 and a remaining portion 210, 212 of the workload as one or more compute kernels 214, 216; Col. 8 lines 53-55 Therefore, the workload manager 118, at T2, migrates the workload 304 from the accelerator 104 to a compute core (or accelerator) 310).
Additionally, Ratering teaches wherein to migrate the accelerator kernels to the accelerator devices of the one or more types of accelerator devices comprises the circuitry (abstract lines 4-6 one or more computing operations (as accelerator kernel) may be offloaded from a local processor to a virtual device that represents available resources of a cloud; [0021] line 1 handle the kernel offload; [0024] lines 4-8 the cloud systems may contain GPUs (Graphics Processing Units), accelerators, etc., in which case the device type would be CL_DEVICE_TYPE_GPU or CL_DEVICE_TYPE_ACCELERATOR).
 Bernat teaches generate, for each migrated accelerator kernel a bit stream compatible for a type of accelerator device to which each migrated accelerator kernel has been migrated; configure, for each migrated accelerator kernel, the bit stream on the accelerator device to which each migrated accelerator kernel has been migrated ([0054] lines 7-9 The system 1210 may provide kernels to accelerate specific functions of the application 1234 and offload (as migrate) execution of the functions to the accelerator devices 1262 or 1266; [0055] lines 1-3 provide a catalog of bit streams that can be encoded into the accelerator devices 1262 and 1264 as the kernels 1264 and 1268; [0056] lines 12-13 remotely configuring the bit stream onto a compatible accelerator device); and update a registry of managed nodes in a system ,wherein the updated registry is to register, for each migrated accelerator kernel, a notification to be sent by each migrated accelerator kernel, the registry maintained at the device ([0054] lines 9-11 The pod manager 1220 may track (e.g., via a database) which kernels are registered to which accelerator sleds 1260 and accelerator devices 1264, 1268; [0092] lines 1-3 The configuration component 1734 may validate the bit stream based on the metadata and security signature, and, once validated, register the kernel with an accelerator device (e.g., accelerator device 1262 or 1266); [00104] lines 9-11 The accelerator sled 1260 sends an acknowledgement to the requesting compute sled that the kernel is registered on the accelerator device).  

	Krishnamurthy, Ratering, and Bernat fail to teach initialize communication channels between each of the accelerator kernels and a specified interval for a heartbeat notification.

	However, Muslim teaches initialize communication channels between each of the accelerator kernels (Fig. 3(b); VI. Conclusion paragraph 2 lines 3-5 They mainly included pipelining work-items and using on-chip global memory buffers for inter-kernel communications).

It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined Krishnamurthy, Ratering, and Bernat with the teachings of Muslim because Muslim’s teaching of using on-chip global memory buffers for communicating among kernels allows for faster processing (see Muslim, VI. Conclusion paragraph 2 lines 3-6 They mainly included pipelining work-items and using on-chip global memory buffers for inter-kernel communications rather than using the traditional slower offchip DRAM-based global memory buffers).
	
Krishnamurthy, Ratering, Bernat, and Muslim fail to teach a specified interval for a heartbeat notification.

However, Melkild teaches a specified interval for a heartbeat notification (Col. 15 lines 32-42 Each VNFCI sends heartbeat notifications to its peer VNFCI at certain intervals…However, if a heartbeat notification or acknowledgement is not received from a peer VNFCI in a period of time controlled by a failure timer, then the heartbeat mode is transitioned into Failure mode).

It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have modified Krishnamurthy, Ratering, Bernat, and 

As per claim 14, Krishnamurthy, Ratering, Bernat, Muslim, and Melkild teach the device of claim 12. Bernat specifically teaches circuitry and sent by one of the migrated accelerator kernels per the updated registry ([0054] lines 7-11 The system 1210 may provide kernels to accelerate specific functions of the application 1234 and offload execution of the functions to the accelerator devices 1262 or 1266. The pod manager 1220 may track (e.g., via a database) which kernels are registered to which accelerator sleds 1260 and accelerator devices 1264, 1268; [0092] lines 2-3 once validated, register the kernel with an accelerator device; [0092] lines 1-3 The configuration component 1734 may validate the bit stream based on the metadata and security signature, and, once validated, register the kernel with an accelerator device (e.g., accelerator device 1262 or 1266); [00104] lines 9-11 The accelerator sled 1260 sends an acknowledgement to the requesting compute sled that the kernel is registered on the accelerator device).
Additionally, Melkild teaches upon a determination that the heartbeat notification is not sent, generate an alert indicating that the heartbeat notification was not received at the specified interval (Col. 15 lines 32-43 Each VNFCI sends heartbeat notifications to its peer VNFCI at certain intervals…However, if a heartbeat notification or acknowledgement is not received from a peer VNFCI in a period of time controlled by a failure timer, then the heartbeat mode is transitioned into Failure mode and a heartbeat failure notification is raised).
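For illustration only, the failure-timer behavior described in the cited passage of Melkild (heartbeats sent at intervals, with a failure notification raised when none is received within the timer period) might be sketched as follows. The class name, method names, peer identifiers, the "Normal" mode label, and the timeout value are assumptions made for this sketch; only the "Failure" mode label comes from the cited passage.

```python
class HeartbeatMonitor:
    """Tracks the last heartbeat per peer and flags peers whose timer has lapsed."""

    def __init__(self, failure_timeout):
        self.failure_timeout = failure_timeout  # seconds allowed between heartbeats
        self.last_seen = {}                     # peer id -> time of last heartbeat
        self.mode = {}                          # peer id -> "Normal" or "Failure"

    def record_heartbeat(self, peer, now):
        """Note that a heartbeat arrived from `peer` at time `now`."""
        self.last_seen[peer] = now
        self.mode[peer] = "Normal"

    def check(self, now):
        """Return an alert for each peer whose failure timer has newly expired."""
        alerts = []
        for peer, seen in self.last_seen.items():
            if self.mode[peer] != "Failure" and now - seen > self.failure_timeout:
                self.mode[peer] = "Failure"
                alerts.append(f"heartbeat failure: {peer}")
        return alerts


monitor = HeartbeatMonitor(failure_timeout=3.0)
monitor.record_heartbeat("kernel-a", now=0.0)
monitor.record_heartbeat("kernel-b", now=0.0)
monitor.record_heartbeat("kernel-a", now=2.0)  # kernel-b stays silent
alerts = monitor.check(now=4.0)
```

Timestamps are passed in explicitly so the sketch stays deterministic; a deployed monitor would presumably read a clock instead.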

Claim 21 is rejected under 35 U.S.C. 103 as being unpatentable over Krishnamurthy and Ratering, as applied to claim 1 above, in view of Perry et al. (US 7631284 B1 herein Perry).

As per claim 21, Krishnamurthy and Ratering teach the device of claim 1. Ratering specifically teaches the target accelerator device is a separate field programmable gate array (FPGA) ([0010] lines 1-4 In OpenCL, parallel compute kernels may be offloaded from a host (usually a CPU) to an accelerator device in the same system (e.g., a GPU, CPU or FPGA (Field-Programmable Gate Array)).).

Krishnamurthy and Ratering fail to teach wherein the source accelerator device is a field programmable gate array (FPGA).   

However, Perry teaches wherein the source accelerator device and the target accelerator device are separate field programmable gate arrays (FPGAs) (Col. 3 lines 20-26  While the present embodiment relates to the migration from an FPGA source device to an ASIC equivalent target device, it should be understood that device selector guide 10 may be used for any suitable migration, including for example, migrating from one type of programmable logic device to another type of programmable logic device; Col. 1 lines 13-16 A typical programmable logic device ("PLD") or field-programmable gate array ("FPGA") includes many logic elements ("LEs") of a fixed size. (For convenience herein, the term FPGA is used as a generic term for PLDs and FPGAs.)).

It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined Krishnamurthy and Ratering with the teachings of Perry because Perry’s teaching of a source accelerator device being a FPGA provides the advantages of the FPGA such as the fact that it is programmable, which makes it customizable.
	
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to HSING CHUN LIN whose telephone number is (571)272-8522.  The examiner can normally be reached on Mon - Fri 9AM-5PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is 
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Meng-Ai An can be reached on (571)272-3756.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/H.L./Examiner, Art Unit 2195
/MENG AI T AN/Supervisory Patent Examiner, Art Unit 2195