DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 1-23 are pending in this application.

Information Disclosure Statement
The IDSs filed on 06/21/2019, 10/27/2020, and 07/01/2022 have been considered.

Drawings
The drawings are objected to because they fail to comply with 37 CFR 1.84(q) which recites that “Lead lines are required for each reference character except for those which indicate the surface or cross section on which they are placed. Such a reference character must be underlined to make it clear that a lead line has not been left out by mistake.” Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. The figure or figure number of an amended drawing should not be labeled as “amended.” If a drawing figure is to be canceled, the appropriate figure must be removed from the replacement sheet, and where necessary, the remaining figures must be renumbered and appropriate changes made to the brief description of the several views of the drawings for consistency. Additional replacement sheets may be necessary to show the renumbering of the remaining figures. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.


Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 1-23 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
As per claim 1:
	Line 2 recites “an input interface to receive a combined work descriptor” and line 5 recites “an ingress queue to receive a work request” but it is unclear where the combined work descriptor and the work request are sent from. 
	Lines 5 and 8 recite “an ingress queue” and it is unclear if they refer to the same element. If so, line 8 should recite “the ingress queue”. 
	Lines 5, 7, 8, and 9 recite “a work request” and it is unclear if they refer to the same element. If so, lines 7, 8, and 9 should recite “the work request”.
	Lines 7 and 8 recite “an egress queue” and it is unclear if they refer to the same element. If so, line 8 should recite “the egress queue”.
	Lines 10-11 recite “logic to provide an identifier of a result data to a requesting entity that requested operations based on the combined work descriptor” and it is unclear what results are being provided (i.e., the results of performing a work request and another work request?).
	Lines 11-12 recite “wherein performance and availability of data between work requests occur independent from oversight by the requesting entity” but it is unclear what performance the limitation is referring to (i.e., performance of work requests?). Additionally, it is unclear what is meant by data between work requests.

As per claims 2, 3, 4, 5, 9, 10, and 11 (line numbers refer to claim 2):
	Lines 3 and 4 recite “a target accelerator” and it is unclear whether they refer to the same or different target accelerator recited in claim 1.

As per claims 2 and 22 (line numbers refer to claim 2):
	There are two recitations of “a first work request” and it is unclear whether they refer to the same element. 

As per claims 6, 7, 9-11, 17, and 18 (line numbers refer to claim 6):
	Line 6 recites “an egress queue” and it is unclear whether this refers to “an egress queue” in claim 1. 

As per claim 6:
	Line 2 recites “next accelerator” but it is unclear what “next” implies (i.e., after the target accelerator completes execution of a work request, the following accelerator that executes another work request is the next accelerator).
	Lines 2-3 recite “after completion of a work request” but it is unclear what is executing the work request. 

As per claims 7 and 17 (line numbers refer to claim 7):
	Lines 2-3 recite “assign a work request from an ingress queue to an egress queue based on quality of service (QoS) associated with the assigned work request” but it is unclear why it is necessary to know a quality of service associated with a work request to assign the work request from an ingress queue to an egress queue (i.e., is there a plurality of egress queues to select from?).

As per claims 7, 8, 17, and 18 (line numbers refer to claim 7):
	Line 6 recites “an ingress queue” and it is unclear whether this refers to “an ingress queue” in claim 1. 

As per claim 8:
	Lines 3-4 recite “provide load balance of the divided work request to distribute work requests to different accelerators” but it is unclear why the work requests are being distributed when the purpose is to load balance the divided work request (i.e., shouldn’t the divided work request be distributed in order to provide load balancing?).

As per claims 9-11 (line numbers refer to claim 9):
	Line 1 recites “selection of an egress queue” and it is unclear why there is a need to select an egress queue when there is only one egress queue (i.e., is there a plurality of egress queues?).
	Lines 3 and 4 recite “the entity that requested operations” but it is unclear if this refers to “a requesting entity that requested operations” recited in claim 1.
	Line 3 recites “the work scheduler” and it is unclear whether this refers to “the work scheduler apparatus” or “the scheduler”.

As per claim 11:
	Line 4 recites “the target accelerator” and it is unclear whether this refers to “a target accelerator” in line 2 of claim 11 or “a target accelerator” in claim 1. 

As per claims 12, 18, 19, and 23 (line numbers refer to claim 12):
	Lines 1-2 recite “an accelerator” and it is unclear whether this refers to “an accelerator” in claim 1.

As per claim 14:
	Line 2 recites “receiving a combined work descriptor” but it is unclear what is receiving the combined work descriptor. 
	Line 5 recites “allocating a work descriptor” but it is unclear what is doing the allocating step. 
	Line 9 recites “providing a result” but it is unclear what is providing a result and what receives the result.
	Line 5 recites “allocating a work descriptor associated with the combined work descriptor to an egress queue based on a scheduling policy” but it is unclear why a scheduling policy is needed to allocate a work descriptor to an egress queue when there is only one egress queue (ie. Are there a plurality of egress queues?).
	Lines 7-8 recite “receiving a queue entry in an ingress queue that identifies a next operation for an accelerator” but it is unclear what is meant by “next” (was there an operation that was already executed by the accelerator?).
	Line 9 recites “providing a result from processing” but it is unclear what processing this limitation is referring to (Is it processing of an operation on an accelerator?). 

As per claims 14 and 18 (line numbers refer to claim 14):
	Line 5 recites “allocating a work descriptor associated with the combined work descriptor to an egress queue based on a scheduling policy” but it is unclear why a scheduling policy is needed to allocate a work descriptor to an egress queue when there is only one egress queue (ie. Are there a plurality of egress queues?).

As per claim 18:
	Line 2 recites “a scheduling policy” and it is unclear whether this refers to “a scheduling policy” in claim 14. 

As per claim 21:
	Line 16 recites “an egress queue” twice and it is unclear if both instances refer to the same element. If so, the second instance of “an egress queue” should be changed to “the egress queue”.
	Line 16 recites “a scheduler logic is to determine an egress queue” and it is unclear why scheduling logic is needed to determine an egress queue (is there a plurality of egress queues?).
	Lines 18-19 recite “after execution by the accelerator” but it is unclear what is being executed.
	Lines 20-21 recite “the work scheduler is to indicate data is available from the sequence of work to the application” and it is unclear why the application needs to know that data is available when the application is the one that is requesting the sequence of work and not the one executing the sequence of work.

Claims 13, 15, 16, 20, and 23 are dependent claims of claims 1, 14, and 21 so they are rejected for the same reasons as claims 1, 14, and 21 above.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows:

Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-23 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (abstract idea) without significantly more.

As per claim 1, in step 1 of the 101 analysis, the examiner has determined that the claim is directed to a machine. Therefore, the claim is directed to one of the four statutory categories of invention.
In step 2A prong 1 of the 101 analysis, the examiner has determined that the claim recites a judicial exception. Specifically, the limitation of “a scheduler to assign a work request in an ingress queue to an egress queue” is a mental process. This is a scheduling step and it can be considered a mental process because one can mentally decide which egress queue to assign a work request to.
In step 2A prong 2 of the 101 analysis, the examiner has determined that the additional elements, alone or in combination do not integrate the judicial exceptions into a practical application for the following rationale:
	The limitations “an input interface to receive a combined work descriptor”, “an ingress queue to receive a work request based on the combined work descriptor for performance by an accelerator”, “an egress queue to store a work request assigned to a target accelerator”, and “logic to provide an identifier of a result data to a requesting entity that requested operations based on the combined work descriptor” represent insignificant, extra-solution activities. The term "extra-solution activity" can be understood as "activities incidental to the primary process or product that are merely a nominal or tangential addition to the claim" (MPEP 2106.05(g)). The examiner has determined that the limitations “an input interface to receive a combined work descriptor”, “an ingress queue to receive a work request based on the combined work descriptor for performance by an accelerator”, “an egress queue to store a work request assigned to a target accelerator”, and “logic to provide an identifier of a result data to a requesting entity that requested operations based on the combined work descriptor” are directed to mere data gathering activities which is a category of insignificant extra-solution activities (MPEP 2106.05(g)). 
	The limitations “the combined work descriptor associated with at least one processing operation, the at least one processing operation to be managed by the work scheduler apparatus”, “wherein a work request includes a reference to another work request”, and “wherein performance and availability of data between work requests occur independent from oversight by the requesting entity” merely describe attributes of the technological environment in which the abstract idea is operating. The courts have identified that generally linking the use of a judicial exception to a technological environment does not integrate a judicial exception into a practical application (MPEP 2106.04(d)(I)).

In step 2B of the 101 analysis, the examiner has determined that the additional elements, alone or in combination do not recite significantly more than the abstract ideas identified above for the following rationale:
	The limitations “an input interface to receive a combined work descriptor”, “an ingress queue to receive a work request based on the combined work descriptor for performance by an accelerator”, “an egress queue to store a work request assigned to a target accelerator”, and “logic to provide an identifier of a result data to a requesting entity that requested operations based on the combined work descriptor” represent insignificant, extra-solution activities. The limitations “an input interface to receive a combined work descriptor”, “an ingress queue to receive a work request based on the combined work descriptor for performance by an accelerator”, and “logic to provide an identifier of a result data to a requesting entity that requested operations based on the combined work descriptor” are well-understood, routine, or conventional because they are directed to "receiving or transmitting data" and “an egress queue to store a work request assigned to a target accelerator” is well-understood, routine, or conventional because it is directed to “storing and retrieving information in memory” (MPEP 2106.05(d)). These are additional elements that the courts have recognized as well understood, routine, or conventional (MPEP 2106.05(d)). The citation of court cases in the MPEP meets the Berkheimer evidentiary burden since citation of a court case in the MPEP is one of the 4 types of evidentiary support that can be used to prove that the additional elements are well-understood, routine, or conventional (see 125 USPQ2d 1649 Berkheimer v. HP, Inc.). Thus, the limitation does not amount to significantly more than the abstract idea. 
The limitations “the combined work descriptor associated with at least one processing operation, the at least one processing operation to be managed by the work scheduler apparatus”, “wherein a work request includes a reference to another work request”, and “wherein performance and availability of data between work requests occur independent from oversight by the requesting entity” merely describe attributes of the technological environment and therefore do not amount to significantly more than the exception itself (MPEP 2106.05(h)).

As per claim 2, the limitation “wherein the combined work descriptor is to refer to a first work request, the first work request to include a reference to a second work request to be performed by a target accelerator” describes attributes of the technological environment and therefore does not integrate the judicial exception into a practical application and does not amount to significantly more. The limitation “the work scheduler comprising a translator to translate a first work request to a format accepted by the target accelerator” is a mental process. Translation can involve translating from a virtual to a physical address, and that merely involves looking up a table to determine the correspondence between a virtual and a physical address.

As per claim 3, it further describes attributes of the technological environment and therefore does not integrate the judicial exceptions into a practical application and does not recite significantly more than the abstract idea.

As per claim 4, it describes an insignificant, extra-solution activity which is well-understood, routine, or conventional because it is directed to "receiving or transmitting data". Therefore, it does not integrate the judicial exceptions into a practical application and does not recite significantly more than the abstract idea.

As per claim 5, it describes an insignificant, extra-solution activity which is well-understood, routine, or conventional because it is directed to "receiving or transmitting data". Therefore, it does not integrate the judicial exceptions into a practical application and does not recite significantly more than the abstract idea.

As per claim 6, it describes an insignificant, extra-solution activity which is well-understood, routine, or conventional because it is directed to "receiving or transmitting data". Additionally, it further describes the abstract idea. Therefore, it does not integrate the judicial exceptions into a practical application and does not recite significantly more than the abstract idea.

As per claim 7, it further describes the abstract idea. Therefore, it does not integrate the judicial exceptions into a practical application and does not recite significantly more than the abstract idea.

As per claim 8, it describes mental processes. Therefore, it does not integrate the judicial exceptions into a practical application and does not recite significantly more than the abstract idea.

As per claim 9, it describes mental processes, the technological environment, and an insignificant extra-solution activity which is well-understood, routine, or conventional because it is directed to "receiving or transmitting data". Therefore, it does not integrate the judicial exceptions into a practical application and does not recite significantly more than the abstract idea.

As per claim 10, it describes mental processes, the technological environment, and an insignificant extra-solution activity which is well-understood, routine, or conventional because it is directed to "receiving or transmitting data". Therefore, it does not integrate the judicial exceptions into a practical application and does not recite significantly more than the abstract idea.

As per claim 11, it describes mental processes, the technological environment, and an insignificant extra-solution activity which is well-understood, routine, or conventional because it is directed to "receiving or transmitting data". Therefore, it does not integrate the judicial exceptions into a practical application and does not recite significantly more than the abstract idea.

As per claim 12, it further describes the technological environment so it does not integrate the judicial exceptions into a practical application and does not recite significantly more than the abstract idea.

As per claim 13, it further describes the technological environment so it does not integrate the judicial exceptions into a practical application and does not recite significantly more than the abstract idea.

As per claim 14, in step 1 of the 101 analysis, the examiner has determined that the claim is directed to a method. Therefore, the claim is directed to one of the four statutory categories of invention.
In step 2A prong 1 of the 101 analysis, the examiner has determined that the claim recites a judicial exception. Specifically, the limitation of “allocating a work descriptor associated with the combined work descriptor to an egress queue based on a scheduling policy specified by the combined work descriptor” is a mental process. This is a scheduling step and it can be considered a mental process because one can mentally assign a work descriptor to an egress queue.

In step 2A prong 2 of the 101 analysis, the examiner has determined that the additional elements, alone or in combination do not integrate the judicial exceptions into a practical application for the following rationale:
The limitations “receiving a combined work descriptor that identifies at least one work descriptor for performance by an accelerator” and “receiving a queue entry in an ingress queue that identifies a next operation for an accelerator; and providing a result from processing based on the combined work descriptor” represent insignificant, extra-solution activities. The term "extra-solution activity" can be understood as "activities incidental to the primary process or product that are merely a nominal or tangential addition to the claim" (MPEP 2106.05(g)). The examiner has determined that the limitations “receiving a combined work descriptor that identifies at least one work descriptor for performance by an accelerator” and “receiving a queue entry in an ingress queue that identifies a next operation for an accelerator; and providing a result from processing based on the combined work descriptor” are directed to mere data gathering activities which is a category of insignificant extra-solution activities (MPEP 2106.05(g)). 
The limitation “the combined work descriptor specifies a policy for managing work associated with the combined work descriptor” merely describes attributes of the technological environment in which the abstract idea is operating. The courts have identified that generally linking the use of a judicial exception to a technological environment does not integrate a judicial exception into a practical application (MPEP 2106.04(d)(I)).

In step 2B of the 101 analysis, the examiner has determined that the additional elements, alone or in combination do not recite significantly more than the abstract ideas identified above for the following rationale:
The limitations “receiving a combined work descriptor that identifies at least one work descriptor for performance by an accelerator” and “receiving a queue entry in an ingress queue that identifies a next operation for an accelerator; and providing a result from processing based on the combined work descriptor” represent insignificant, extra-solution activities. The limitations “receiving a combined work descriptor that identifies at least one work descriptor for performance by an accelerator” and “receiving a queue entry in an ingress queue that identifies a next operation for an accelerator; and providing a result from processing based on the combined work descriptor” are well-understood, routine, or conventional because they are directed to "receiving or transmitting data" (MPEP 2106.05(d)). These are additional elements that the courts have recognized as well understood, routine, or conventional (MPEP 2106.05(d)). The citation of court cases in the MPEP meets the Berkheimer evidentiary burden since citation of a court case in the MPEP is one of the 4 types of evidentiary support that can be used to prove that the additional elements are well-understood, routine, or conventional (see 125 USPQ2d 1649 Berkheimer v. HP, Inc.). Thus, the limitation does not amount to significantly more than the abstract idea. 
The limitation “the combined work descriptor specifies a policy for managing work associated with the combined work descriptor” merely describes attributes of the technological environment and therefore does not amount to significantly more than the exception itself (MPEP 2106.05(h)).

As per claim 15, the limitation “wherein the combined work descriptor refers to a first work request, the first work request to include a reference to a second work request to be performed by a target accelerator” describes attributes of the technological environment and therefore does not integrate the judicial exception into a practical application and does not amount to significantly more. The limitation “comprising translating the first work request to a format accepted by the target accelerator” is a mental process. Translation can involve translating from a virtual to a physical address, and that merely involves looking up a table to determine the correspondence between a virtual and a physical address.

As per claim 16, it further describes attributes of the technological environment and therefore does not integrate the judicial exceptions into a practical application and does not recite significantly more than the abstract idea.

As per claim 17, it further describes the abstract idea. Therefore, it does not integrate the judicial exceptions into a practical application and does not recite significantly more than the abstract idea.

As per claim 18, it describes mental processes. Therefore, it does not integrate the judicial exceptions into a practical application and does not recite significantly more than the abstract idea.

As per claim 19, it further describes the technological environment so it does not integrate the judicial exceptions into a practical application and does not recite significantly more than the abstract idea.

As per claim 20, it further describes the technological environment so it does not integrate the judicial exceptions into a practical application and does not recite significantly more than the abstract idea.

As per claim 21, in step 1 of the 101 analysis, the examiner has determined that the claim is directed to a system. Therefore, the claim is directed to one of the four statutory categories of invention.

In step 2A prong 1 of the 101 analysis, the examiner has determined that the claim recites a judicial exception. Specifically, the limitations “allocate the work descriptor to an ingress queue for execution by an accelerator” and “the scheduler logic is to determine an egress queue and position in an egress queue for the work descriptor based in part on a configuration” are mental processes. Allocating is a scheduling step and it can be considered a mental process because one can mentally assign a work descriptor to an ingress queue. Humans can use their judgement to select an egress queue and select a position in an egress queue.

In step 2A prong 2 of the 101 analysis, the examiner has determined that the additional elements, alone or in combination do not integrate the judicial exceptions into a practical application for the following rationale:
The limitations “provide the combined work descriptor to the work scheduler via the interconnect”, “the work scheduler is to access a work descriptor from the memory based on content of the combined work descriptor”, “the ingress queue is to receive another work descriptor after execution by the accelerator”, and “the work scheduler is to indicate data is available from the sequence of work to the application” represent insignificant, extra-solution activities. The term "extra-solution activity" can be understood as "activities incidental to the primary process or product that are merely a nominal or tangential addition to the claim" (MPEP 2106.05(g)). The examiner has determined that the limitations “provide the combined work descriptor to the work scheduler via the interconnect”, “the work scheduler is to access a work descriptor from the memory based on content of the combined work descriptor”, “the ingress queue is to receive another work descriptor after execution by the accelerator”, and “the work scheduler is to indicate data is available from the sequence of work to the application” are directed to mere data gathering activities which is a category of insignificant extra-solution activities (MPEP 2106.05(g)). 
The limitations “a core; a memory; a work scheduler; at least one accelerator; and an interconnect to communicatively couple the core, the memory, the work scheduler, and the at least one accelerator”, and “the work scheduler comprises a scheduler logic, ingress queues, egress queues, and a command translator” apply judicial exceptions on a generic computer. "Alappat's rationale that an otherwise ineligible algorithm or software could be made patent-eligible by merely adding a generic computer to the claim was superseded by the Supreme Court's Bilski and Alice Corp. decisions"; therefore, applying judicial exceptions on “a core; a memory; a work scheduler; at least one accelerator; and an interconnect to communicatively couple the core, the memory, the work scheduler, and the at least one accelerator”, and “the work scheduler comprises a scheduler logic, ingress queues, egress queues, and a command translator”, which are generic computer components, does not integrate the judicial exceptions into a practical application (MPEP 2106.05(b)).
The limitation “the core is to execute an application that is to request performance of a sequence of work based on a combined work descriptor” merely describes attributes of the technological environment in which the abstract idea is operating and describes an intended use. The courts have identified that generally linking the use of a judicial exception to a technological environment does not integrate a judicial exception into a practical application (MPEP 2106.04(d)(I)).

In step 2B of the 101 analysis, the examiner has determined that the additional elements, alone or in combination do not recite significantly more than the abstract ideas identified above for the following rationale:
The limitations “provide the combined work descriptor to the work scheduler via the interconnect”, “the work scheduler is to access a work descriptor from the memory based on content of the combined work descriptor”, “the ingress queue is to receive another work descriptor after execution by the accelerator”, and “the work scheduler is to indicate data is available from the sequence of work to the application” represent insignificant, extra-solution activities. The limitations “provide the combined work descriptor to the work scheduler via the interconnect”, “the ingress queue is to receive another work descriptor after execution by the accelerator”, and “the work scheduler is to indicate data is available from the sequence of work to the application” are well-understood, routine, or conventional because they are directed to "receiving or transmitting data" (MPEP 2106.05(d)). The limitation “the work scheduler is to access a work descriptor from the memory based on content of the combined work descriptor” is well-understood, routine, or conventional because it is directed to “storing and retrieving information in memory” (MPEP 2106.05(d)). These are additional elements that the courts have recognized as well understood, routine, or conventional (MPEP 2106.05(d)). The citation of court cases in the MPEP meets the Berkheimer evidentiary burden since citation of a court case in the MPEP is one of the 4 types of evidentiary support that can be used to prove that the additional elements are well-understood, routine, or conventional (see 125 USPQ2d 1649 Berkheimer v. HP, Inc.). Thus, the limitation does not amount to significantly more than the abstract idea. 
The limitations “a core; a memory; a work scheduler; at least one accelerator; and an interconnect to communicatively couple the core, the memory, the work scheduler, and the at least one accelerator”, and “the work scheduler comprises a scheduler logic, ingress queues, egress queues, and a command translator” apply judicial exceptions on a generic computer and therefore do not provide significantly more.
The limitation “the core is to execute an application that is to request performance of a sequence of work based on a combined work descriptor” merely describes attributes of the technological environment and therefore does not amount to significantly more than the exception itself (MPEP 2106.05(h)).

As per claim 22, the limitation “wherein the combined work descriptor is to refer to a first work request, the first work request to include a reference to a second work request to be performed by a target accelerator” describes attributes of the technological environment and therefore does not integrate the judicial exception into a practical application and does not amount to significantly more. The limitation “the command translator to translate a first work request to a format accepted by the target accelerator” is a mental process. Translation can involve translating from a virtual to a physical address, and that merely involves looking up a table to determine the correspondence between a virtual and a physical address.

As per claim 23, it further describes generic computing components so it does not integrate the judicial exceptions into a practical application and does not recite significantly more than the abstract idea.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 14, 16, 19, and 20 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Drysdale et al. (US 20180095750 A1 hereinafter Drysdale).

As per claim 14, Drysdale teaches a computer-implemented method comprising ([0095] a method includes offloading an operation from a hardware processor to a first hardware accelerator): 
receiving a combined work descriptor that identifies at least one work descriptor for performance by an accelerator and the combined work descriptor specifies a policy for managing work associated with the combined work descriptor ([0032] the work descriptor contains (e.g., all) the information on what the job is and (e.g., all) pointers to the data; [0039] The context pointer may point to the control and/or configuration information for the job specific operation to be performed on a specific device; [0079] a (e.g., compression, decompression, or other accelerator specific) job may be created for the accelerator, for example, by being offloaded from the processor…The job may have an associated context structure…the context structure may include one or more of: static context data, for example, job information (e.g., a work descriptor); [0033] device (e.g., accelerator); A work descriptor identifies the accelerator on which jobs should be placed, which is a policy for managing work.); 
allocating a work descriptor associated with the combined work descriptor to an egress queue based on a scheduling policy specified by the combined work descriptor ([0049] A processing device (e.g., core) 406 may receive a request (e.g., from software) to perform an operation and may offload the operation (e.g., thread) to hardware accelerator 1 402. Request(s) may be stored in each respective, optional accelerator command queue (403, 453); [0039] The context pointer may point to the control and/or configuration information for the job specific operation to be performed on a specific device; [0087] In one embodiment there may be a plurality of accelerators, for example, accelerators of different types, e.g., a compression accelerator, a decompression accelerator, an accelerator for each of specific compression and/or decompression algorithms; [0079] a (e.g., compression, decompression, or other accelerator specific) job may be created for the accelerator, for example, by being offloaded from the processor; [0032] the work descriptor contains (e.g., all) the information on what the job is and (e.g., all) pointers to the data; A work descriptor identifies the accelerator on which jobs should be placed, which is a scheduling policy. Each accelerator has a queue, so the scheduling policy indicates the queue on which work should be placed.); 
receiving a queue entry in an ingress queue that identifies a next operation for an accelerator ([0089] request may be added to (e.g., the end of) the request queue 508; [0089] next job from the request queue; [0089] When an accelerator becomes available, there may be an on-deck job ready to be immediately started on it from the on-deck jobs queue 510. In one embodiment, when there is space in the on-deck jobs queue 510, requests may be loaded thereto from the request queue 508.); and 
providing a result from processing based on the combined work descriptor ([0073] Accelerator 2 may send an interrupt to the processing device to indicate to the processing device that accelerator 2 has completed consuming output buffer 493 (as result), and the processing device can read this information from response descriptors 480 and 478; [0074] Accelerator 2 may send an interrupt to the processing device indicating to the processing device that accelerator 2 has completed filling an output buffer 460; [0105] An accelerator may operate on data until either an input buffer is emptied or an output buffer is filled (e.g., with the input and output data buffers returned for reuse (e.g., by a device) when each request completes); The request is a work descriptor, and once it has been processed, a notification is sent to the processing device, which is analogous to what is described in the specification of the instant application.).

As per claim 16, Drysdale teaches the method of claim 14, wherein the combined work descriptor is to refer to a first work request and the first work request is in a format accepted by a target accelerator ([0086] a (e.g., hardware) acceleration request manager to manage requests to the accelerator or accelerators; [0057] a processing device may decode and/or execute an instruction to cause a request (e.g., a command packet) to be sent to an accelerator; [0032] the work descriptor contains (e.g., all) the information on what the job is and (e.g., all) pointers to the data).

As per claim 19, Drysdale teaches the method of claim 14, wherein an accelerator comprising one or more of: field programmable gate arrays (FPGAs), graphics processor units (GPUs), artificial intelligence (AI) inference engines, image recognition, object detection, speech recognition, memory, storage, central processing units (CPUs), software executed by a hardware device, or network interface (Fig. 2A; [0036] Certain embodiments herein of accelerators and/or I/O devices include, but are not limited to, compression, machine learning (e.g., neural networks), cryptography, I/O fabrics, storage controllers, network interfaces, graphics; [0212] accelerators (such as, e.g., graphics accelerators or digital signal processing (DSP) units)).

As per claim 20, Drysdale teaches the method of claim 14, wherein the work request comprises a request to process data, decrypt data, encrypt data, store data, transfer data, parse data, copy data, perform an inference using data, or transform data ([0069] since data is available, the accelerator may read input buffer 409 and process that data to determine a result; [0033] An accelerator may couple to (e.g., on die with an accelerator or off die) one or more buffers to store data; [0034] Certain embodiments herein provide a highly scalable interface to enables accelerators (e.g., offload devices) to queue requests from one device to another, for example, with no processor (e.g., CPU) involvement in the individual requests or data transfers; [0030] Two non-limiting examples of operations are a compression operation and a decompression operation; [0033] buffers to store data; [0059] Accelerator 2 may look to each input buffer descriptor (e.g., buffer descriptor 0 (484)) and respective input buffer response descriptor (e.g., buffer descriptor 0 (496)) for a shared buffer (e.g., output/input buffer 0 (493)) of the shared buffers 491 to determine what data to operate on.).

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-6, 8, 9, 12, 13, 15, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Drysdale et al. (US 20180095750 A1 hereinafter Drysdale) in view of Goyal et al. (US 20200159568 A1 hereinafter Goyal).

As per claim 1, Drysdale teaches the invention substantially as claimed including a work scheduler apparatus comprising: an input interface to receive a combined work descriptor, the combined work descriptor associated with at least one processing operation, the at least one processing operation to be managed by the work scheduler apparatus ([0086] a (e.g., hardware) acceleration request manager to manage requests to the accelerator or accelerators… The number of job requests added to the request queue or returned (e.g., to the requestor (e.g., software)) may be defined by a counter (e.g., N.sub.IN and/or N.sub.OUT). The number of job requests added to the queue by the requestor (e.g., processing device or software) may be called the Submit Count; [0089] a request for an operation is received; [0058] A notification may be sent between an accelerator and a processor through a dedicated communication channel (e.g., line) and/or a queuing interface; [0032] the work descriptor contains (e.g., all) the information on what the job is and (e.g., all) pointers to the data); 
an ingress queue to receive a work request based on the combined work descriptor for performance by an accelerator; an egress queue to store a work request assigned to a target accelerator ([0069] Accelerator 1 (402) may receive a command packet from the processing device. In one embodiment, the command packet (e.g., request therein) is put into an accelerator command queue 403; [0049] A processing device (e.g., core) 406 may receive a request (e.g., from software) to perform an operation and may offload the operation (e.g., thread) to hardware accelerator 1 402. Request(s) may be stored in each respective, optional accelerator command queue (403, 453); [0089] request may be added to (e.g., the end of) the request queue 508…When an accelerator becomes available, there may be an on-deck job ready to be immediately started on it from the on-deck jobs queue 510. In one embodiment, when there is space in the on-deck jobs queue 510, requests may be loaded thereto from the request queue 508; [0039] The context pointer may point to the control and/or configuration information for the job specific operation to be performed on a specific device.); 
a scheduler to assign a work request in an ingress queue to an egress queue ([0089] that request may be added to (e.g., the end of) the request queue 508. When an accelerator becomes available, there may be an on-deck job ready to be immediately started on it from the on-deck jobs queue 510. In one embodiment, when there is space in the on-deck jobs queue 510, requests may be loaded thereto from the request queue 508; [0049] A processing device (e.g., core) 406 may receive a request (e.g., from software) to perform an operation and may offload the operation (e.g., thread) to hardware accelerator 1 402. Request(s) may be stored in each respective, optional accelerator command queue (403, 453); [0185] The scheduler unit(s) 1056 represents any number of different schedulers; [0039] The context pointer may point to the control and/or configuration information for the job specific operation to be performed on a specific device; [0034] device (e.g., accelerator)); and 
logic to provide an identifier of a result data to a requesting entity that requested operations based on the combined work descriptor, wherein performance and availability of data between work requests occur independent from oversight by the requesting entity ([0073] Accelerator 2 may send an interrupt to the processing device to indicate to the processing device that accelerator 2 has completed consuming output buffer 493, and the processing device can read this information from response descriptors 480 and 478 (as result data); [0034] a highly scalable interface to enables accelerators (e.g., offload devices) to queue requests from one device to another, for example, with no processor (e.g., CPU) (as requesting entity) involvement in the individual requests or data transfers; [0040] with two devices to offload work to in succession, a processor (e.g., CPU) may offload to a device and enable the direct communication between that device and another device through shared memory buffers and hardware (e.g., and a communication protocol), e.g., to eliminate the processor (e.g., CPU) as the intervener between the offload to successive devices; [0039] Certain embodiments herein provide for (e.g., general purpose) device to device chaining between (e.g., unrelated) devices because the communication channel between devices is defined and standard across the devices, e.g., as is the data transmitted across this channel. The information transmitted across this channel may be minimal, e.g., and only identifies the availability of data to work on; [0054] In one embodiment of sending a request to an accelerator, processing device 406 may allocate and fill some number of input buffers; [0037] this communication channel between devices may (e.g., only) then be used to carry information about the availability of data).

Drysdale fails to teach wherein a work request includes a reference to another work request.

However, Goyal teaches wherein a work request includes a reference to another work request ([0097] Each work unit may be represented by a fixed length data structure, or message, including an action value and one or more arguments…The other arguments of the work unit data structure may include a frame argument having a value acting as a pointer to a continuation work unit to invoke a subsequent work unit handler; [0107] the WU stack may represent a service chain of operations to be performed by one or more of processing cores 182; [0146] a group of software functions and/or accelerator operations may, when chained together for processing).

It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined Drysdale with the teachings of Goyal because Goyal’s teaching of chaining operations provides technical advantages (see Goyal [0175] the WU stack execution model described herein seamlessly blends hardware (e.g., accelerator nodes 462) and software functions (e.g., virtual processor nodes 461) to perform call chaining, pipelining, parallelization, and continuation processing. The WU stack enables standardization of a pipeline and service execution model. The WU stack also provides familiar call/return semantics for operations on streams of work units (e.g., packets), and enables optional bundling of state carried with a work unit (packet). Furthermore, the WU stack allows stream processing model and a more traditional computational model to be integrated in a two-dimensional execution model, as illustrated in FIG. 12A, thereby providing significant technical advantages during software development as well as execution at run-time.).

As per claim 2, Drysdale and Goyal teach the work scheduler apparatus of claim 1. Drysdale specifically teaches wherein the combined work descriptor is to refer to a first work request, the work scheduler comprising a translator to translate a first work request to a format accepted by a target accelerator ([0057] a processing device may decode and/or execute an instruction to cause a request (e.g., a command packet) to be sent to an accelerator; [0032] the work descriptor contains (e.g., all) the information on what the job is and (e.g., all) pointers to the data).
Additionally, Goyal teaches the first work request to include a reference to a second work request to be performed by a target accelerator (Fig. 12A; [0097] Each work unit may be represented by a fixed length data structure, or message, including an action value and one or more arguments…The other arguments of the work unit data structure may include a frame argument having a value acting as a pointer to a continuation work unit to invoke a subsequent work unit handler; [0107] the WU stack may represent a service chain of operations to be performed by one or more of processing cores 182; [0146] a group of software functions and/or accelerator operations may, when chained together for processing; [0175] the WU stack execution model described herein seamlessly blends hardware (e.g., accelerator nodes 462) and software functions (e.g., virtual processor nodes 461) to perform call chaining, pipelining, parallelization, and continuation processing. The WU stack enables standardization of a pipeline and service execution model. The WU stack also provides familiar call/return semantics for operations on streams of work units (e.g., packets), and enables optional bundling of state carried with a work unit (packet). Furthermore, the WU stack allows stream processing model and a more traditional computational model to be integrated in a two-dimensional execution model, as illustrated in FIG. 12A, thereby providing significant technical advantages during software development as well as execution at run-time).
	
As per claim 3, Drysdale and Goyal teach the work scheduler apparatus of claim 1. Drysdale specifically teaches wherein the combined work descriptor is to refer to a first work request and the first work request is in a format accepted by a target accelerator ([0086] a (e.g., hardware) acceleration request manager to manage requests to the accelerator or accelerators; [0057] a processing device may decode and/or execute an instruction to cause a request (e.g., a command packet) to be sent to an accelerator; [0032] the work descriptor contains (e.g., all) the information on what the job is and (e.g., all) pointers to the data).

As per claim 4, Drysdale and Goyal teach the work scheduler apparatus of claim 1. Drysdale specifically teaches wherein the work scheduler is to push work requests from the egress queue to a target accelerator ([0086] a (e.g., hardware) acceleration request manager to manage requests to the accelerator or accelerators; [0089] request may be added to (e.g., the end of) the request queue 508…When an accelerator becomes available, there may be an on-deck job ready to be immediately started on it from the on-deck jobs queue 510. In one embodiment, when there is space in the on-deck jobs queue 510, requests may be loaded thereto from the request queue 508; [0049] A processing device (e.g., core) 406 may receive a request (e.g., from software) to perform an operation and may offload the operation (e.g., thread) to hardware accelerator 1 402. Request(s) may be stored in each respective, optional accelerator command queue (403, 453)).

As per claim 5, Drysdale and Goyal teach the work scheduler apparatus of claim 1. Drysdale specifically teaches wherein a target accelerator is to pull a work request from the egress queue ([0089] request may be added to (e.g., the end of) the request queue 508…When an accelerator becomes available, there may be an on-deck job ready to be immediately started on it from the on-deck jobs queue 510. In one embodiment, when there is space in the on-deck jobs queue 510, requests may be loaded thereto from the request queue 508; [0049] A processing device (e.g., core) 406 may receive a request (e.g., from software) to perform an operation and may offload the operation (e.g., thread) to hardware accelerator 1 402. Request(s) may be stored in each respective, optional accelerator command queue (403, 453)).

As per claim 6, Drysdale and Goyal teach the work scheduler apparatus of claim 1. Drysdale specifically teaches wherein the work scheduler is to enqueue a work request to an egress queue to assign to a next accelerator after completion of a work request ([0089] request may be added to (e.g., the end of) the request queue 508…When an accelerator becomes available, there may be an on-deck job ready to be immediately started on it from the on-deck jobs queue 510. In one embodiment, when there is space in the on-deck jobs queue 510, requests may be loaded thereto from the request queue 508; [0046] when accelerator 0 receives input data from the processor (e.g., CPU) (e.g., or other device) it is to begin processing when it has both input buffers and output buffers available. In one embodiment, once accelerator 0 fills an output buffer (e.g., or encounters some other signal on the input indicating that it should transmit current completed data) it may mark the buffer as full/complete and send a command packet to accelerator 1 indicating that a buffer is ready for processing in the circular buffer descriptor array; [0049] A processing device (e.g., core) 406 may receive a request (e.g., from software) to perform an operation and may offload the operation (e.g., thread) to hardware accelerator 1 402. Request(s) may be stored in each respective, optional accelerator command queue (403, 453)).

As per claim 8, Drysdale and Goyal teach the work scheduler apparatus of claim 1. Drysdale specifically teaches work request in an ingress queue ([0089] that request may be added to (e.g., the end of) the request queue 508. When an accelerator becomes available, there may be an on-deck job ready to be immediately started on it from the on-deck jobs queue 510. In one embodiment, when there is space in the on-deck jobs queue 510, requests may be loaded thereto from the request queue 508) and distribute work requests to different accelerators that perform a function specified in the work request ([0087] In one embodiment there may be a plurality of accelerators, for example, accelerators of different types, e.g., a compression accelerator, a decompression accelerator, an accelerator for each of specific compression and/or decompression algorithms; [0079] a (e.g., compression, decompression, or other accelerator specific) job may be created for the accelerator, for example, by being offloaded from the processor.).
Additionally, Goyal teaches wherein the scheduler is to: divide a work request into multiple portions and provide load balance of the divided work request to distribute work requests to different accelerators that perform a function specified in the work request (Fig. 12A; [0076] stream processing may be divided into work units; [0096] Central cluster 158 may include a central dispatch unit responsible for work unit queuing and flow control, work unit and completion notification dispatch, and load balancing and processor selection from among processing cores of processing clusters 156 and/or central cluster 158; [0100] processing cluster 180 includes…accelerators 189A-189X; [0107] the service chain may be executed across multiple processing clusters 156).

As per claim 9, Drysdale and Goyal teach the work scheduler apparatus of claim 1. Drysdale specifically teaches wherein after selection of an egress queue by the scheduler and based on a target accelerator sharing physical memory space but not virtual memory spaces with the entity that requested operations, the work scheduler is to receive a pointer to data from the entity that requested operations and perform pointer translation (Fig. 3; [0049] A processing device (e.g., core) 406 may receive a request (e.g., from software) to perform an operation and may offload the operation (e.g., thread) to hardware accelerator 1 402. Request(s) may be stored in each respective, optional accelerator command queue (403, 453); [0032] the work descriptor contains (e.g., all) the information on what the job is and (e.g., all) pointers to the data; [0039] The context pointer may point to the control and/or configuration information for the job specific operation to be performed on a specific device. This context pointer may be provided during initialization (e.g., of a device) by the processor (e.g., CPU); [0184] an instruction translation lookaside buffer (TLB) 1036, which is coupled to an instruction fetch unit 1038, which is coupled to a decode unit 1040. The decode unit 1040 (or decoder or decoder unit) may decode instructions (e.g., macro-instructions); [0086] a hardware accelerator includes a (e.g., hardware) acceleration request manager to manage requests to the accelerator or accelerators; [0041] Hardware processor 300 (e.g., accelerator(s) and/or core(s) thereof) may be coupled to a data storage device 304 (e.g., memory).).

As per claim 12, Drysdale and Goyal teach the work scheduler apparatus of claim 1. Drysdale specifically teaches comprising at least two accelerators, an accelerator comprising one or more of: field programmable gate arrays (FPGAs), graphics processor units (GPUs), artificial intelligence (AI) inference engines, image recognition, object detection, speech recognition, memory, storage, central processing units (CPUs), software executed by a hardware device, or network interface (Fig. 2A; [0036] Certain embodiments herein of accelerators and/or I/O devices include, but are not limited to, compression, machine learning (e.g., neural networks), cryptography, I/O fabrics, storage controllers, network interfaces, graphics; [0212] accelerators (such as, e.g., graphics accelerators or digital signal processing (DSP) units)).

As per claim 13, Drysdale and Goyal teach the work scheduler apparatus of claim 1. Drysdale specifically teaches wherein the work request comprises a request to process data, decrypt data, encrypt data, store data, transfer data, parse data, copy data, perform an inference using data, or transform data ([0069] since data is available, the accelerator may read input buffer 409 and process that data to determine a result; [0033] An accelerator may couple to (e.g., on die with an accelerator or off die) one or more buffers to store data; [0034] Certain embodiments herein provide a highly scalable interface to enables accelerators (e.g., offload devices) to queue requests from one device to another, for example, with no processor (e.g., CPU) involvement in the individual requests or data transfers; [0030] Two non-limiting examples of operations are a compression operation and a decompression operation; [0033] buffers to store data; [0059] Accelerator 2 may look to each input buffer descriptor (e.g., buffer descriptor 0 (484)) and respective input buffer response descriptor (e.g., buffer descriptor 0 (496)) for a shared buffer (e.g., output/input buffer 0 (493)) of the shared buffers 491 to determine what data to operate on.).

As per claim 15, Drysdale teaches the method of claim 14, wherein the combined work descriptor is to refer to a first work request, comprising translating the first work request to a format accepted by the target accelerator ([0057] a processing device may decode and/or execute an instruction to cause a request (e.g., a command packet) to be sent to an accelerator; [0032] the work descriptor contains (e.g., all) the information on what the job is and (e.g., all) pointers to the data; [0184] an instruction translation lookaside buffer (TLB) 1036, which is coupled to an instruction fetch unit 1038, which is coupled to a decode unit 1040. The decode unit 1040 (or decoder or decoder unit) may decode instructions (e.g., macro-instructions)).

Drysdale fails to teach the first work request to include a reference to a second work request to be performed by a target accelerator.

However, Goyal teaches the first work request to include a reference to a second work request to be performed by a target accelerator (Fig. 12A; [0097] Each work unit may be represented by a fixed length data structure, or message, including an action value and one or more arguments…The other arguments of the work unit data structure may include a frame argument having a value acting as a pointer to a continuation work unit to invoke a subsequent work unit handler; [0107] the WU stack may represent a service chain of operations to be performed by one or more of processing cores 182; [0146] a group of software functions and/or accelerator operations may, when chained together for processing; [0175] the WU stack execution model described herein seamlessly blends hardware (e.g., accelerator nodes 462) and software functions (e.g., virtual processor nodes 461) to perform call chaining, pipelining, parallelization, and continuation processing. The WU stack enables standardization of a pipeline and service execution model. The WU stack also provides familiar call/return semantics for operations on streams of work units (e.g., packets), and enables optional bundling of state carried with a work unit (packet). Furthermore, the WU stack allows stream processing model and a more traditional computational model to be integrated in a two-dimensional execution model, as illustrated in FIG. 12A, thereby providing significant technical advantages during software development as well as execution at run-time).

It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined Drysdale with the teachings of Goyal because Goyal’s teaching of chaining operations provides technical advantages (see Goyal [0175] the WU stack execution model described herein seamlessly blends hardware (e.g., accelerator nodes 462) and software functions (e.g., virtual processor nodes 461) to perform call chaining, pipelining, parallelization, and continuation processing. The WU stack enables standardization of a pipeline and service execution model. The WU stack also provides familiar call/return semantics for operations on streams of work units (e.g., packets), and enables optional bundling of state carried with a work unit (packet). Furthermore, the WU stack allows stream processing model and a more traditional computational model to be integrated in a two-dimensional execution model, as illustrated in FIG. 12A, thereby providing significant technical advantages during software development as well as execution at run-time.).

As per claim 18, Drysdale teaches the method of claim 14, wherein allocating a work descriptor associated with the combined work descriptor to an egress queue based on a scheduling policy specified by the combined work descriptor comprises ([0049] A processing device (e.g., core) 406 may receive a request (e.g., from software) to perform an operation and may offload the operation (e.g., thread) to hardware accelerator 1 402. Request(s) may be stored in each respective, optional accelerator command queue (403, 453); [0032] the work descriptor contains (e.g., all) the information on what the job is and (e.g., all) pointers to the data; [0039] The context pointer may point to the control and/or configuration information for the job specific operation to be performed on a specific device. This context pointer may be provided during initialization (e.g., of a device) by the processor (e.g., CPU)).

Drysdale fails to teach providing load balancing of work requests in an ingress queue to an accelerator to distribute work requests to different accelerators that perform a function specified in the distributed work requests.

However, Goyal teaches providing load balancing of work requests in an ingress queue to an accelerator to distribute work requests to different accelerators that perform a function specified in the distributed work requests (Fig. 12A; [0076] stream processing may be divided into work units; [0096] Central cluster 158 may include a central dispatch unit responsible for work unit queuing and flow control, work unit and completion notification dispatch, and load balancing and processor selection from among processing cores of processing clusters 156 and/or central cluster 158; [0100] processing cluster 180 includes…accelerators 189A-189X; [0107] the service chain may be executed across multiple processing clusters 156).

It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined Drysdale with the teachings of Goyal because Goyal’s teaching of load balancing allows for even distribution of work across accelerators.
	

Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Drysdale and Goyal, as applied to claim 1 above, in view of Chan et al. (US 20200074363 A1 hereinafter Chan).

As per claim 7, Drysdale and Goyal teach the work scheduler apparatus of claim 1. Drysdale specifically teaches wherein the scheduler is to: assign a work request from an ingress queue to an egress queue ([0089] request may be added to (e.g., the end of) the request queue 508…When an accelerator becomes available, there may be an on-deck job ready to be immediately started on it from the on-deck jobs queue 510. In one embodiment, when there is space in the on-deck jobs queue 510, requests may be loaded thereto from the request queue 508; [0049] A processing device (e.g., core) 406 may receive a request (e.g., from software) to perform an operation and may offload the operation (e.g., thread) to hardware accelerator 1 402. Request(s) may be stored in each respective, optional accelerator command queue (403, 453)).

Drysdale and Goyal fail to teach assign a work request based on quality of service (QoS) associated with the assigned work request.

However, Chan teaches assign a work request based on quality of service (QoS) associated with the assigned work request ([0131] The routing rules may specify other criteria that indicate when a work item should be assigned to VIP queue 702, such as urgency or the subject matter associated with the represented request or issue; [0118] The routing rules may factor other criteria that helps organize the work items in a manner that can reduce the duration that each work item spends assigned to a queue.).

It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined Drysdale and Goyal with the teachings of Chan because Chan’s teaching of assigning a work request based on the urgency or quality of service of the work request allows urgent work requests to execute faster (see Chan, [0131] The routing rules may specify other criteria that indicate when a work item should be assigned to VIP queue 702, such as urgency or the subject matter associated with the represented request or issue; [0118] The routing rules may factor other criteria that helps organize the work items in a manner that can reduce the duration that each work item spends assigned to a queue.).
	
	
Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Drysdale and Goyal, as applied to claim 1 above, in view of Chinya et al. (US 20110072234 A1 hereinafter Chinya).

As per claim 10, Drysdale and Goyal teach the work scheduler apparatus of claim 1. Drysdale specifically teaches wherein after selection of an egress queue by the scheduler, the work scheduler is to receive a pointer to data from the entity that requested operations and perform pointer translation ([0049] A processing device (e.g., core) 406 may receive a request (e.g., from software) to perform an operation and may offload the operation (e.g., thread) to hardware accelerator 1 402. Request(s) may be stored in each respective, optional accelerator command queue (403, 453); [0032] the work descriptor contains (e.g., all) the information on what the job is and (e.g., all) pointers to the data; [0039] The context pointer may point to the control and/or configuration information for the job specific operation to be performed on a specific device. This context pointer may be provided during initialization (e.g., of a device) by the processor (e.g., CPU); [0184] an instruction translation lookaside buffer (TLB) 1036, which is coupled to an instruction fetch unit 1038, which is coupled to a decode unit 1040. The decode unit 1040 (or decoder or decoder unit) may decode instructions (e.g., macro-instructions); [0086] a hardware accelerator includes a (e.g., hardware) acceleration request manager to manage requests to the accelerator or accelerators.).

Drysdale and Goyal fail to teach wherein based on a target accelerator sharing virtual memory space with the entity that requested operations, the work scheduler is to receive a pointer to data.

However, Chinya teaches wherein based on a target accelerator sharing virtual memory space with the entity that requested operations, the work scheduler is to receive a pointer to data ([0011] a central processing unit (CPU) on a socket) to create and manage a fully shared virtual address space with accelerators; [0014] a shared virtual address space between cores on cache-coherent CPU sockets and accelerators; [0043] As seen each entry may include a page base address (PBA) field 410 which stores a PBA that points to a first address of a page stored in memory; [0021] Data can be shared on a demand basis based on access patterns to the shared data from the CPU; [0023] In order to construct a shared address space between the CPU and accelerator, a memory management unit may allow load/store accesses to the shared virtual address space to be sent to the remote memory based on the contents of page tables used to translate virtual-to-physical addresses; [0026] an access request from the CPU; [0041] allow applications to access memory on the accelerator, the CPU may analyze attribute bits so that it can route a load/store to a given virtual address to a remote physical memory location).

It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined Drysdale and Goyal with the teachings of Chinya because Chinya’s teaching of sharing virtual address space simplifies data sharing (see Chinya, [0022] the creation of a shared virtual address space between the CPU and accelerator cores in accordance with an embodiment of the present invention without needing explicit management of DMA operations greatly simplifies data sharing, as the entire application code and data can be placed in a common shared virtual address space without having to explicitly move the data by changing the application program, e.g., with a programmer's explicit orchestration of DMA operations.).

	
Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Drysdale and Goyal, as applied to claim 1 above, in view of Hwang et al. (WO2014178450A1 hereinafter Hwang).
The claim mappings are made with a translation of WO2014178450A1.
As per claim 11, Drysdale and Goyal teach the work scheduler apparatus of claim 1. Drysdale specifically teaches wherein after selection of an egress queue by the scheduler ([0049] A processing device (e.g., core) 406 may receive a request (e.g., from software) to perform an operation and may offload the operation (e.g., thread) to hardware accelerator 1 402. Request(s) may be stored in each respective, optional accelerator command queue (403, 453); [0039] The context pointer may point to the control and/or configuration information for the job specific operation to be performed on a specific device. This context pointer may be provided during initialization (e.g., of a device) by the processor (e.g., CPU)).

Drysdale and Goyal fail to teach based on a target accelerator not sharing virtual or physical memory space with the entity that requested operations, the work scheduler is to use a data mover to copy data to memory accessible to the target accelerator.

However, Hwang teaches based on a target accelerator not sharing virtual or physical memory space with the entity that requested operations, the work scheduler is to use a data mover to copy data to memory accessible to the target accelerator (Fig. 1; [68] That is, the cached data itself is also copied as well as the ownership of the data. Thus, the data shared with the GPU; [56] since the physically separated memory has a physically separated memory, the virtual memory address space used by the CPU and the memory address space used by the GPU have been developed to be used differently; claim 1 receiving an operation requested by the CPU).

It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined Drysdale and Goyal with the teachings of Hwang because Hwang’s teaching of separate memories allows for the CPUs and GPUs to access local memory faster.

	
Claim 17 is rejected under 35 U.S.C. 103 as being unpatentable over Drysdale, as applied to claim 14 above, and in view of Chan.

As per claim 17, Drysdale teaches the method of claim 14, wherein allocating a work descriptor associated with the combined work descriptor to an egress queue based on a scheduling policy specified by the combined work descriptor comprises assigning a work request from an ingress queue to an egress queue ([0049] A processing device (e.g., core) 406 may receive a request (e.g., from software) to perform an operation and may offload the operation (e.g., thread) to hardware accelerator 1 402. Request(s) may be stored in each respective, optional accelerator command queue (403, 453); [0032] the work descriptor contains (e.g., all) the information on what the job is and (e.g., all) pointers to the data; [0039] The context pointer may point to the control and/or configuration information for the job specific operation to be performed on a specific device. This context pointer may be provided during initialization (e.g., of a device) by the processor (e.g., CPU); [0089] that request may be added to (e.g., the end of) the request queue 508. When an accelerator becomes available, there may be an on-deck job ready to be immediately started on it from the on-deck jobs queue 510. In one embodiment, when there is space in the on-deck jobs queue 510, requests may be loaded thereto from the request queue 508).

Drysdale fails to teach assigning a work request based on quality of service (QoS) associated with the work request.

However, Chan teaches assigning a work request based on quality of service (QoS) associated with the work request ([0131] The routing rules may specify other criteria that indicate when a work item should be assigned to VIP queue 702, such as urgency or the subject matter associated with the represented request or issue; [0118] The routing rules may factor other criteria that helps organize the work items in a manner that can reduce the duration that each work item spends assigned to a queue.).

It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined Drysdale with the teachings of Chan because Chan’s teaching of assigning a work request based on the urgency or quality of service of the work request allows urgent work requests to execute faster (see Chan, [0131] The routing rules may specify other criteria that indicate when a work item should be assigned to VIP queue 702, such as urgency or the subject matter associated with the represented request or issue; [0118] The routing rules may factor other criteria that helps organize the work items in a manner that can reduce the duration that each work item spends assigned to a queue.).
	
Claims 21 and 23 are rejected under 35 U.S.C. 103 as being unpatentable over Drysdale in view of Krishnakumar et al. (US 20110035752 A1 hereinafter Krishnakumar).

As per claim 21, Drysdale teaches a system comprising: 
a core (Fig. 3, 305 processor core 0…N; [0029] A (e.g., hardware) processor (e.g., having one or more cores)); 
a memory (Fig. 3, 304 a data storage device; [0041] a data storage device  (e.g., memory));
a work scheduler ([0086] a (e.g., hardware) acceleration request manager to manage requests to the accelerator or accelerators); 
at least one accelerator (Fig. 3, 302 hardware accelerator 0…M); and 
an interconnect to communicatively couple the core, the memory, the work scheduler, and the at least one accelerator ([0042] Cores, accelerators, and data storage device 304 may communicate (e.g., be coupled) with each other; [0079] One embodiment of a hardware acceleration request manager may be discussed in reference to its interaction with the processor; [0086] One example of the architecture of a hardware accelerator includes a (e.g., hardware) acceleration request manager to manage requests to the accelerator or accelerators. This may be centralized or each accelerator may include its own acceleration request manager), wherein: 
the core is to execute an application that is to request performance of a sequence of work based on a combined work descriptor and provide the combined work descriptor to the work scheduler via the interconnect ([0029] software may request an operation and a hardware processor (e.g., a core or cores thereof) may perform the operation in response to the request; [0079] One embodiment of a hardware acceleration request manager may be discussed in reference to its interaction with the processor, for example, by looking at the application programming interface (API) between the software drivers (e.g., of a software program requesting an operation) and the accelerator hardware. Initially, a (e.g., compression, decompression, or other accelerator specific) job may be created for the accelerator, for example, by being offloaded from the processor. In one embodiment, the software, processor, and/or the accelerator may reference this job via some unique job identification (e.g., Job ID); [0052] a field for a job description (e.g., job identification (ID) or other work descriptor(s))), 
the work scheduler comprises a scheduler logic, ingress queues, egress queues, and a command translator (Fig. 5; [0057] a processing device may decode and/or execute an instruction to cause a request (e.g., a command packet) to be sent to an accelerator; [0086] One example of the architecture of a hardware accelerator includes a (e.g., hardware) acceleration request manager to manage requests to the accelerator or accelerators. This may be centralized or each accelerator may include its own acceleration request manager. In the following discussion, reference may be made to a single stream but this may be applicable to embodiments with multiple streams. The number of job requests added to the request queue; [0089] that request may be added to (e.g., the end of) the request queue 508. When an accelerator becomes available, there may be an on-deck job ready to be immediately started on it from the on-deck jobs queue 510. In one embodiment, when there is space in the on-deck jobs queue 510, requests may be loaded thereto from the request queue 508; [0049] A processing device (e.g., core) 406 may receive a request (e.g., from software) to perform an operation and may offload the operation (e.g., thread) to hardware accelerator 1 402. Request(s) may be stored in each respective, optional accelerator command queue (403, 453); [0039] The context pointer may point to the control and/or configuration information for the job specific operation to be performed on a specific device. This context pointer may be provided during initialization (e.g., of a device) by the processor (e.g., CPU); [0184] an instruction translation lookaside buffer (TLB) 1036, which is coupled to an instruction fetch unit 1038, which is coupled to a decode unit 1040. The decode unit 1040 (or decoder or decoder unit) may decode instructions (e.g., macro-instructions)), 
the work scheduler is to access a work descriptor from the memory based on content of the combined work descriptor and allocate the work descriptor to an ingress queue for execution by an accelerator ([0032] Embodiments of accelerators (e.g., offload devices) may use different queuing methods. In one embodiment, the queuing methods rely on writing complete work descriptors to shared queues; [0089] when a request for an operation is received (e.g., a request to perform an operation by the processor and/or to be offloaded to the accelerator, the Job ID may be effectively compared to the Job ID for the active jobs 512 and the on-deck jobs queue 510. In one embodiment, if there is a match, the appropriate state is updated, for example, otherwise that request may be added to (e.g., the end of) the request queue 508. When an accelerator becomes available, there may be an on-deck job ready to be immediately started on it from the on-deck jobs queue 510; [0086] a hardware accelerator includes a (e.g., hardware) acceleration request manager to manage requests to the accelerator or accelerators; [0086] The Job ID may be stored in a context structure, but may (e.g., also) be stored in separate memory), 
the scheduler logic is to determine an egress queue and position in an egress queue for the work descriptor based in part on a configuration, the ingress queue is to receive another work descriptor after execution by the accelerator ([0058] Command packet may be read (e.g., by the accelerator) and values updated, e.g., in accelerator command queue 403, and the request in the command packet may be placed at the end of the queue (e.g., in an accelerator's static random-access memory (SRAM); [0049] A processing device (e.g., core) 406 may receive a request (e.g., from software) to perform an operation and may offload the operation (e.g., thread) to hardware accelerator 1 402. Request(s) may be stored in each respective, optional accelerator command queue (403, 453); [0032] the work descriptor contains (e.g., all) the information on what the job is and (e.g., all) pointers to the data; [0039] The context pointer may point to the control and/or configuration information for the job specific operation to be performed on a specific device; [0089] that request may be added to (e.g., the end of) the request queue 508. When an accelerator becomes available, there may be an on-deck job ready to be immediately started on it from the on-deck jobs queue 510. In one embodiment, when there is space in the on-deck jobs queue 510, requests may be loaded thereto from the request queue 508; [0046] Once accelerator 1 finishes consuming an input buffer, it may generates and send a command packet back to accelerator 0 indicating that buffer is ready for it to use), and 
to indicate data is available from the sequence of work ([0057] The processing device may write a go command (e.g., via an interconnect or port, such as, but not limited to, a memory-mapped input/output (MMIO) port) to the accelerator to indicate that new data is available; [0046] this shared buffer passing back and forth continues for as long as data is available for accelerator 0 to process and send to accelerator 1.).

	Drysdale fails to teach the work scheduler is to indicate data is available to the application.

	However, Krishnakumar teaches the work scheduler is to indicate data is available to the application ([0065] scheduler 210 collects natively available trace data and transmits them to the separate application).

It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined Drysdale with the teachings of Krishnakumar to reduce costs (see Krishnakumar, [0010] The present invention enables the scheduling and execution of tasks without some of the costs).

As per claim 23, Drysdale and Krishnakumar teach the system of claim 21. Drysdale specifically teaches wherein an accelerator comprising one or more of: field programmable gate arrays (FPGAs), graphics processor units (GPUs), artificial intelligence (AI) inference engines, image recognition, object detection, speech recognition, memory, storage, central processing units (CPUs), software executed by a hardware device, or network interface (Fig. 2A; [0036] Certain embodiments herein of accelerators and/or I/O devices include, but are not limited to, compression, machine learning (e.g., neural networks), cryptography, I/O fabrics, storage controllers, network interfaces, graphics; [0212] accelerators (such as, e.g., graphics accelerators or digital signal processing (DSP) units)).

Claim 22 is rejected under 35 U.S.C. 103 as being unpatentable over Drysdale and Krishnakumar, as applied to claim 21 above, in view of Goyal.

As per claim 22, Drysdale and Krishnakumar teach the system of claim 21. Drysdale specifically teaches wherein the combined work descriptor is to refer to a first work request, the command translator to translate a first work request to a format accepted by the target accelerator ([0057] a processing device may decode and/or execute an instruction to cause a request (e.g., a command packet) to be sent to an accelerator; [0032] the work descriptor contains (e.g., all) the information on what the job is and (e.g., all) pointers to the data; [0184] an instruction translation lookaside buffer (TLB) 1036, which is coupled to an instruction fetch unit 1038, which is coupled to a decode unit 1040. The decode unit 1040 (or decoder or decoder unit) may decode instructions (e.g., macro-instructions)).

Drysdale and Krishnakumar fail to teach the first work request to include a reference to a second work request to be performed by a target accelerator.

However, Goyal teaches the first work request to include a reference to a second work request to be performed by a target accelerator (Fig. 12A; [0097] Each work unit may be represented by a fixed length data structure, or message, including an action value and one or more arguments…The other arguments of the work unit data structure may include a frame argument having a value acting as a pointer to a continuation work unit to invoke a subsequent work unit handler; [0107] the WU stack may represent a service chain of operations to be performed by one or more of processing cores 182; [0146] a group of software functions and/or accelerator operations may, when chained together for processing; [0175] the WU stack execution model described herein seamlessly blends hardware (e.g., accelerator nodes 462) and software functions (e.g., virtual processor nodes 461) to perform call chaining, pipelining, parallelization, and continuation processing. The WU stack enables standardization of a pipeline and service execution model. The WU stack also provides familiar call/return semantics for operations on streams of work units (e.g., packets), and enables optional bundling of state carried with a work unit (packet). Furthermore, the WU stack allows stream processing model and a more traditional computational model to be integrated in a two-dimensional execution model, as illustrated in FIG. 12A, thereby providing significant technical advantages during software development as well as execution at run-time).

It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined Drysdale and Krishnakumar with the teachings of Goyal because Goyal’s teaching of chaining operations provides technical advantages (see Goyal [0175] the WU stack execution model described herein seamlessly blends hardware (e.g., accelerator nodes 462) and software functions (e.g., virtual processor nodes 461) to perform call chaining, pipelining, parallelization, and continuation processing. The WU stack enables standardization of a pipeline and service execution model. The WU stack also provides familiar call/return semantics for operations on streams of work units (e.g., packets), and enables optional bundling of state carried with a work unit (packet). Furthermore, the WU stack allows stream processing model and a more traditional computational model to be integrated in a two-dimensional execution model, as illustrated in FIG. 12A, thereby providing significant technical advantages during software development as well as execution at run-time.).


Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to HSING CHUN LIN whose telephone number is (571)272-8522.  The examiner can normally be reached on Mon - Fri 9AM-5PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Meng-Ai An, can be reached on (571)272-3756.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/MENG AI T AN/Supervisory Patent Examiner, Art Unit 2195



/H.L./Examiner, Art Unit 2195