DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
  
Claims 20, 22-27, 29-34, 36-40, and 42-45 are pending in this office action and presented for examination. Claims 20, 22-25, 27, 29-32, 34, 36-40, and 42-45 are newly amended by the response received October 19, 2022.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 20, 22-25, 27, 29-32, 34, 36-38, 40, and 42-45 is/are rejected under 35 U.S.C. 103 as being unpatentable over Applicant Admitted Prior Art (AAPA) in view of Achilles et al. (Achilles) (US 20110307890) in view of Marshall et al. (Marshall) (US 6986022 B1) in view of Ashfield et al. (Ashfield) (US 20070294592 A1).
Consider claim 20, AAPA discloses a graphics processor ([0005], lines 1-4, modern graphics processors provide shared function (also referred to as "fixed function") pipeline and programmable execution units (EUs) or shaders pipeline for use by applications. However, current solutions are limited to using either EUs or shared function units ("SFUs", "shared functions", or simply "shaders")), the graphics processor comprising an execution resource to execute graphics processing operations ([00177], line 2, EU) and to invoke shared function units (SFUs) comprising hardware circuitry of the graphics processor that provides supplemental functionality for the execution resource of the graphics processor ([0005], line 4, SFUs; FIG. 8D, shared function unit 885; [00177], lines 5-7, an EU thread at instruction pipeline 881 assembles a message payload that contains message descriptor, input data, etc.; [00177], lines 9-10, the target SFU 885 receives the message and services accordingly; [00177], lines 10-11, SFU 865 sends output data to the EU); and messages between the execution resource and the SFU ([00177], lines 2-3, communication between EUs and shared pipelines is accomplished using packets of information called messages).
However, AAPA does not disclose an apparatus comprising: one or more processor hardware circuitry to: detect workloads for the aforementioned graphics processor; facilitate the aforementioned execution resource of the aforementioned graphics processor to place a status query message with a message gateway, the status query message to check on a workload distribution status of the aforementioned SFUs, wherein the message gateway is hosted to process the status query message from the aforementioned execution resource, and wherein the message gateway is to maintain a count to record a total number of the SFUs that are in use; determine, based on a response to the status query message from the message gateway, the workload distribution status of the aforementioned SFUs of the aforementioned graphics processor, wherein the workload distribution status indicates that either a workload of the aforementioned graphics processing operations can be distributed to the aforementioned SFUs responsive to the total number being less than a determined threshold or the workload can be distributed to the aforementioned execution resource responsive to the total number being greater than or equal to the determined threshold; and based on the workload distribution status of the aforementioned SFUs, determine distribution of the workload between the aforementioned SFUs and the aforementioned execution resource of the aforementioned graphics processor.
On the other hand, Achilles discloses an apparatus comprising: one or more processor hardware circuitry to: detect workloads for a graphics processor ([0032], lines 1-4, special-purpose accelerators are well-known devices used to provide an efficient method of offloading computationally intensive tasks from the general-purpose processor (e.g., CPU or microprocessor); [0032], lines 7-11, modern graphics processing units (GPUs) are very efficient at manipulating computer graphics, and their highly parallel structure makes them more effective than general-purpose CPUs for a range of complex algorithms); facilitate a processor to place a status query message with a message gateway, the status query message to check on a workload distribution status of an accelerator, wherein the message gateway is hosted to process the status query message from the processor ([0045], lines 1-9, in operation, the background monitoring task 52 (running in kernel space) periodically measures the status of the special purpose accelerator 58. One or more Boolean values are written to the shared memory 54. During runtime, the shim redirection layer 64 in the library function 62 reads the Boolean values in the shared memory in making a determination of whether to run the library call task on the general purpose processor (i.e. in software) or on the special purpose accelerator (i.e. in hardware); in other words, the recited status query message corresponds to the read request of the shared memory, and the message gateway corresponds to the shared memory), and wherein the message gateway is to maintain a value to record that the accelerator is in use ([0041], lines 1-6, the mechanism generates one or more Boolean values (i.e. true/false values) as a result of the comparison of the current queue status to the one or more thresholds. Boolean examples include "overutilized" or not, system is busy or idle, etc. The resulting Boolean results are stored in shared memory for access by the shim redirection layer; in other words, when there is one accelerator, a Boolean 0 to indicate that the accelerator is idle is a count that records that 0 accelerators are in use, and a Boolean 1 to indicate that the accelerator is busy is a count that records that 1 accelerator is in use); determine, based on a response to the status query message from the message gateway, the workload distribution status of the graphics processor, wherein the workload distribution status indicates that either a workload of graphics processing operations can be distributed to the graphics processor responsive to the value being less than a determined threshold or the workload can be distributed to the processor responsive to the value being greater than or equal to the determined threshold; and based on the workload distribution status of the accelerator, determine distribution of the workload between the accelerator and the processor ([0041], lines 1-6, the mechanism generates one or more Boolean values (i.e. true/false values) as a result of the comparison of the current queue status to the one or more thresholds. Boolean examples include "overutilized" or not, system is busy or idle, etc. The resulting Boolean results are stored in shared memory for access by the shim redirection layer; in other words, when there is one accelerator, a Boolean 0 to indicate that the accelerator is idle is a count that records that 0 accelerators are in use, and a Boolean 1 to indicate that the accelerator is busy is a count that records that 1 accelerator is in use; [0036], lines 8-14, the mechanism provides two new components: (1) a background monitoring process/thread for monitoring the status of the special purpose accelerator (i.e. queue status, etc.); and (2) a shim redirection layer at the head of the library function operative to determine whether to send a task to the special purpose accelerator (i.e. hardware execution) or the general purpose processor (i.e. software execution)).
Achilles’s teaching results in improvements in throughput, latency, and quality (Achilles, [0037], lines 1-3).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Achilles with the invention of AAPA in order to result in improvements in throughput, latency, and quality. Note that the teaching of Achilles which entails checking on a status of an accelerator to determine distribution of workloads between the accelerator and a processor, when applied to AAPA which discloses SFUs and the execution resource in particular, results in the overall subject matter of “facilitate the execution resource of the graphics processor to place a status query message with a message gateway, the status query message to check on a workload distribution status of the SFUs, wherein the message gateway is hosted to process the status query message from the execution resource, and wherein the message gateway is to maintain a value to record each SFU that is in use; determine, based on a response to the status query message from the message gateway, the workload distribution status of the SFUs of the graphics processor, wherein the workload distribution status indicates that either a workload of the graphics processing operations can be distributed to the SFUs responsive to the value being less than a determined threshold or the workload can be distributed to the execution resource responsive to the value being greater than or equal to the determined threshold; and based on the workload distribution status of the SFUs, determine distribution of the workload between the SFUs and the execution resource of the graphics processor.”
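For illustration only, the monitoring and redirection behavior attributed to Achilles above can be sketched as follows. This is a minimal sketch and is not taken from Achilles or from the application: the names `SharedStatus`, `monitor_update`, and `shim_dispatch`, the queue-depth input, and the direction of the threshold comparison are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class SharedStatus:
    """Hypothetical stand-in for the shared memory holding the Boolean value."""
    overutilized: bool = False


def monitor_update(shared: SharedStatus, queue_depth: int, threshold: int) -> None:
    """Background monitor: compare queue status to a threshold, store a Boolean.

    Treating 'queue_depth >= threshold' as overutilized is an assumption.
    """
    shared.overutilized = queue_depth >= threshold


def shim_dispatch(shared: SharedStatus) -> str:
    """Shim layer: read the stored Boolean and choose where the task runs."""
    return "processor" if shared.overutilized else "accelerator"


shared = SharedStatus()
monitor_update(shared, queue_depth=2, threshold=8)
first = shim_dispatch(shared)    # accelerator is not overutilized
monitor_update(shared, queue_depth=9, threshold=8)
second = shim_dispatch(shared)   # accelerator is overutilized
print(first, second)
```

In this sketch the read of `shared.overutilized` plays the role mapped above to the status query message, and the stored Boolean plays the role of the value maintained by the message gateway.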
However, the combination thus far does not disclose a count to record a total number of the SFUs that are in use, wherein the workload can be distributed to the SFUs responsive to the total number being less than a determined threshold, or the workload can be distributed to the execution resource responsive to the total number being greater than or equal to the determined threshold. 
On the other hand, Marshall discloses a count that is based on a total number of units of a resource that are in use (col. 2, lines 4-10, in a typical implementation of a semaphore, a counter located in a memory stores the number of units of a resource that are free. When a processor accesses the resource the counter is decremented, and when a processor finishes with the resource the counter is incremented. While the counter is at zero, processors simply "busy-wait" until the resource becomes free).
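For illustration only, the semaphore-style counter described in the cited passage of Marshall can be sketched as follows. This is a minimal sketch under stated assumptions: the class name `FreeUnitCounter` and its methods are hypothetical, and the sketch returns `False` when no unit is free rather than busy-waiting as the cited passage describes.

```python
class FreeUnitCounter:
    """Hypothetical counter storing the number of free units of a resource."""

    def __init__(self, total_units: int) -> None:
        self.free = total_units  # every unit starts out free

    def try_acquire(self) -> bool:
        """Decrement the counter when a unit is taken; fail if none is free."""
        if self.free == 0:
            return False  # a caller would busy-wait here in the cited passage
        self.free -= 1
        return True

    def release(self) -> None:
        """Increment the counter when a processor finishes with a unit."""
        self.free += 1


counter = FreeUnitCounter(total_units=2)
results = [counter.try_acquire() for _ in range(3)]  # third attempt finds no free unit
counter.release()
print(results, counter.free)
```

Note that the counter in this sketch records free units, so the number of units in use is only implied (total units minus the counter value).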
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Marshall with the combination of AAPA and Achilles, as this modification merely entails simple substitution of one known element (the teaching of the combination of AAPA and Achilles of using a value to determine whether an SFU is free to be used) for another (the teaching of Marshall of using a count that is based on a total number of units of a resource that are in use, to determine whether a unit of the resource is free to be used) to obtain predictable results (the combination of AAPA and Achilles, wherein a count that is based on a total number of the SFUs that are in use is used to determine whether an SFU is free to be used), which is an example of a rationale that may support a conclusion of obviousness, as per MPEP 2143. Note that the teaching of Marshall of using a count that is based on a total number of units of a resource that are in use, to determine whether a unit of the resource is free to be used, when applied to the invention of the combination of AAPA and Achilles wherein workload is distributed to SFUs or an execution resource based on whether an SFU is free to be used, results in the overall subject matter that the workload can be distributed to the SFUs responsive to the total number being greater than or equal to a determined threshold, or the workload can be distributed to the execution resource responsive to the total number being less than the determined threshold.
However, while the combination thus far entails a count that is based on a total number of the SFUs that are in use, the combination thus far does not entail a count that records a total number of the SFUs that are in use (because the count of the combination thus far records a total number of SFUs that are free, rather than a total number of SFUs that are busy), and thus does not entail that the workload can be distributed to the SFUs responsive to the total number being less than a determined threshold, or the workload can be distributed to the execution resource responsive to the total number being greater than or equal to the determined threshold.
On the other hand, Ashfield explicitly discloses the concept that a counter can be implemented in a number of ways, all of which are functionally equivalent; including counting up from zero to a maximum number and counting down from an initial maximum to zero ([0026], last 6 lines, as should be clear to a skilled person, a counter can be implemented in a number of ways, all of which are functionally equivalent; including counting up from zero to a maximum number and counting down from an initial maximum to zero).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Ashfield with the combination of AAPA, Achilles, and Marshall, as this modification merely entails simple substitution of one known element (the specific counter implementation of the combination of AAPA, Achilles, and Marshall, wherein the counter counts down from an initial maximum to zero) for another (a specific counter implementation of Ashfield, wherein the counter counts up from zero to a maximum number, which is explicitly disclosed as an alternative to a counter that counts down from an initial maximum to zero) to obtain predictable results (the combination of AAPA, Achilles, and Marshall, wherein a counter counts up from zero to a maximum number, to implement functionally equivalent behavior). Note that in the combination of AAPA, Achilles, and Marshall, the counter counts down from an initial maximum to zero as SFUs are allocated, such that the counter records a total number of SFUs that are free; therefore, the combination of AAPA, Achilles, Marshall, and Ashfield, in implementing functionally equivalent behavior, entails a counter counting up from zero to a maximum number as SFUs are allocated, such that the counter records a total number of SFUs that are in use, and such that the workload can be distributed to the SFUs responsive to the total number being less than a determined threshold, or the workload can be distributed to the execution resource responsive to the total number being greater than or equal to the determined threshold.
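For illustration only, the functional equivalence relied upon above can be sketched by comparing a routing decision made on an in-use count against the same decision made on a free count. This is a minimal sketch: the constants `TOTAL_SFUS` and `IN_USE_THRESHOLD` and both function names are hypothetical values and names chosen for the example, not taken from any cited reference.

```python
TOTAL_SFUS = 4        # hypothetical number of SFUs
IN_USE_THRESHOLD = 3  # hypothetical determined threshold on the in-use count


def route_by_in_use(in_use: int, threshold: int) -> str:
    """Count-up form: distribute to the SFUs while fewer than `threshold` are in use."""
    return "SFU" if in_use < threshold else "EU"


def route_by_free(free: int, free_threshold: int) -> str:
    """Count-down form: distribute to the SFUs while at least `free_threshold` are free."""
    return "SFU" if free >= free_threshold else "EU"


# For integer counts, in_use < T is the same condition as free >= total - T + 1.
FREE_THRESHOLD = TOTAL_SFUS - IN_USE_THRESHOLD + 1

decisions = [
    (n, route_by_in_use(n, IN_USE_THRESHOLD), route_by_free(TOTAL_SFUS - n, FREE_THRESHOLD))
    for n in range(TOTAL_SFUS + 1)
]
for in_use, up_form, down_form in decisions:
    print(in_use, up_form, down_form)
```

Under these assumptions both forms produce the same routing for every possible count; only the direction of the comparison and the threshold value differ.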

Consider claim 22, the overall combination entails the apparatus of claim 20 (see above), wherein the one or more processor hardware circuitry are further to facilitate the message gateway to reply with a status notification message indicating the workload distribution status of the SFUs (Achilles, [0045], lines 1-9, in operation, the background monitoring task 52 (running in kernel space) periodically measures the status of the special purpose accelerator 58. One or more Boolean values are written to the shared memory 54. During runtime, the shim redirection layer 64 in the library function 62 reads the Boolean values in the shared memory in making a determination of whether to run the library call task on the general purpose processor (i.e. in software) or on the special purpose accelerator (i.e. in hardware); in other words, the recited status notification message corresponds to the read data of the shared memory, and the message gateway corresponds to the shared memory; AAPA, [00177], lines 2-3, communication between EUs and shared pipelines is accomplished using packets of information called messages; see the citations in the rejection of the independent claim regarding how the overall combination teaches the workload distribution status limitation).

Consider claim 23, the overall combination entails the apparatus of claim 22 (see above), wherein the one or more processor hardware circuitry are further to facilitate the SFUs to process one or more of the workloads responsive to the workload distribution status indicating that the workload can be distributed to the SFUs (Achilles, [0041], lines 1-6, the mechanism generates one or more Boolean values (i.e. true/false values) as a result of the comparison of the current queue status to the one or more thresholds. Boolean examples include "overutilized" or not, system is busy or idle, etc. The resulting Boolean results are stored in shared memory for access by the shim redirection layer; AAPA, [0005], line 4, SFUs; see the citations in the rejection of the independent claim regarding how the overall combination teaches the workload distribution status limitation).

Consider claim 24, the overall combination entails the apparatus of claim 23 (see above), wherein the one or more processor hardware circuitry are further to direct the one or more of the workloads to the execution resource for processing responsive to the workload distribution status indicating that the workload can be distributed to the execution resource (Achilles, [0041], lines 1-6, the mechanism generates one or more Boolean values (i.e. true/false values) as a result of the comparison of the current queue status to the one or more thresholds. Boolean examples include "overutilized" or not, system is busy or idle, etc. The resulting Boolean results are stored in shared memory for access by the shim redirection layer; [0037], lines 1-3, the advantage provided by the accelerator utilization improvement mechanism over using accelerators on their own is in throughput, latency and quality; AAPA, [00177], line 2, EU; see the citations in the rejection of the independent claim regarding how the overall combination teaches the workload distribution status limitation).

Consider claim 25, the overall combination entails the apparatus of claim 20 (see above), wherein the one or more processor hardware circuitry are further to generate a workload message to be communicated to a message register file, and deliver the workload message to the SFUs via the message register file, and facilitate the SFUs to process the workload message, wherein the workload message comprises the workload or a request for processing the workload (AAPA, [00177], FIG. 8D illustrates a conventional transaction 880 sequence of message flow between EU and shared function pipeline. As illustrated, communication between EUs and shared pipelines is accomplished using packets of information called messages. Message transmission is requested via send instructions. There are four basic phases of a message's lifetime, such as: 1) message creation, where an EU thread at instruction pipeline 881 assembles a message payload that contains message descriptor, input data, etc., and put it into message register file (MRF) 883; 2) message delivery, where EU thread issues the message for delivery from MRF 883 to SFU 885 via send instructions; 3) message processing, where the target SFU 885 receives the message and services accordingly; and 4) writing back response, where once the processing has complete, SFU 865 sends output data to the EU thread's general register file (GRF) in response to the message.)
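For illustration only, the four phases of a message's lifetime quoted above (creation into the MRF, delivery via a send instruction, processing by the SFU, and write-back to the GRF) can be sketched as follows. This is a minimal sketch: every class and method name, and the choice of a summing function as the SFU's service, is a hypothetical assumption and does not describe actual hardware.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    descriptor: str
    data: list


class SumFunctionUnit:
    """Hypothetical SFU whose service is to sum the input data."""

    def service(self, message: Message) -> int:
        # Phase 3: message processing by the target SFU
        return sum(message.data)


@dataclass
class ExecutionUnitThread:
    mrf: list = field(default_factory=list)  # message register file
    grf: list = field(default_factory=list)  # general register file

    def create_message(self, descriptor: str, data: list) -> None:
        # Phase 1: assemble the payload and put it into the MRF
        self.mrf.append(Message(descriptor, data))

    def send(self, sfu: SumFunctionUnit) -> None:
        # Phase 2: deliver the oldest message from the MRF to the SFU
        message = self.mrf.pop(0)
        # Phase 4: the SFU's output is written back to the thread's GRF
        self.grf.append(sfu.service(message))


eu = ExecutionUnitThread()
eu.create_message("sum", [1, 2, 3])
eu.send(SumFunctionUnit())
print(eu.grf)
```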

Consider claim 27, AAPA discloses a graphics processor of a computing device ([0005], lines 1-4, modern graphics processors provide shared function (also referred to as "fixed function") pipeline and programmable execution units (EUs) or shaders pipeline for use by applications. However, current solutions are limited to using either EUs or shared function units ("SFUs", "shared functions", or simply "shaders")), the graphics processor comprising an execution resource to execute graphics processing operations ([00177], line 2, EU) and to invoke shared function units (SFUs) comprising hardware circuitry of the graphics processor that provides supplemental functionality for the execution resource of the graphics processor ([0005], line 4, SFUs; FIG. 8D, shared function unit 885; [00177], lines 5-7, an EU thread at instruction pipeline 881 assembles a message payload that contains message descriptor, input data, etc.; [00177], lines 9-10, the target SFU 885 receives the message and services accordingly; [00177], lines 10-11, SFU 865 sends output data to the EU); and messages between the execution resource and the SFU ([00177], lines 2-3, communication between EUs and shared pipelines is accomplished using packets of information called messages).
However, AAPA does not disclose a method comprising detecting workloads for the aforementioned graphics processor; facilitating the aforementioned execution resource of the aforementioned graphics processor to place a status query message with a message gateway, the status query message to check on a workload distribution status of the aforementioned SFUs, wherein the message gateway is hosted to process the status query message from the aforementioned execution resource, and wherein the message gateway is to maintain a count to record a total number of the SFUs that are in use; determining, based on a response to the status query message from the message gateway, the workload distribution status of the aforementioned SFUs of the aforementioned graphics processor, wherein the workload distribution status indicates that either a workload of the aforementioned graphics processing operations can be distributed to the aforementioned SFUs responsive to the total number being less than a determined threshold or the workload can be distributed to the aforementioned execution resource responsive to the total number being greater than or equal to the determined threshold; and based on the workload distribution status of the aforementioned SFUs, determining distribution of the workload between the aforementioned SFUs and the aforementioned execution resource of the aforementioned graphics processor.
On the other hand, Achilles discloses a method comprising detecting workloads for a graphics processor ([0032], lines 1-4, special-purpose accelerators are well-known devices used to provide an efficient method of offloading computationally intensive tasks from the general-purpose processor (e.g., CPU or microprocessor); [0032], lines 7-11, modern graphics processing units (GPUs) are very efficient at manipulating computer graphics, and their highly parallel structure makes them more effective than general-purpose CPUs for a range of complex algorithms); facilitating a processor to place a status query message with a message gateway, the status query message to check on a workload distribution status of an accelerator, wherein the message gateway is hosted to process the status query message from the processor ([0045], lines 1-9, in operation, the background monitoring task 52 (running in kernel space) periodically measures the status of the special purpose accelerator 58. One or more Boolean values are written to the shared memory 54. During runtime, the shim redirection layer 64 in the library function 62 reads the Boolean values in the shared memory in making a determination of whether to run the library call task on the general purpose processor (i.e. in software) or on the special purpose accelerator (i.e. in hardware); in other words, the recited status query message corresponds to the read request of the shared memory, and the message gateway corresponds to the shared memory), and wherein the message gateway is to maintain a value to record that the accelerator is in use ([0041], lines 1-6, the mechanism generates one or more Boolean values (i.e. true/false values) as a result of the comparison of the current queue status to the one or more thresholds. Boolean examples include "overutilized" or not, system is busy or idle, etc. The resulting Boolean results are stored in shared memory for access by the shim redirection layer; in other words, when there is one accelerator, a Boolean 0 to indicate that the accelerator is idle is a count that records that 0 accelerators are in use, and a Boolean 1 to indicate that the accelerator is busy is a count that records that 1 accelerator is in use); determining, based on a response to the status query message from the message gateway, the workload distribution status of the graphics processor, wherein the workload distribution status indicates that either a workload of graphics processing operations can be distributed to the graphics processor responsive to the value being less than a determined threshold or the workload can be distributed to the processor responsive to the value being greater than or equal to the determined threshold; and based on the workload distribution status of the accelerator, determining distribution of the workload between the accelerator and the processor ([0041], lines 1-6, the mechanism generates one or more Boolean values (i.e. true/false values) as a result of the comparison of the current queue status to the one or more thresholds. Boolean examples include "overutilized" or not, system is busy or idle, etc. The resulting Boolean results are stored in shared memory for access by the shim redirection layer; in other words, when there is one accelerator, a Boolean 0 to indicate that the accelerator is idle is a count that records that 0 accelerators are in use, and a Boolean 1 to indicate that the accelerator is busy is a count that records that 1 accelerator is in use; [0036], lines 8-14, the mechanism provides two new components: (1) a background monitoring process/thread for monitoring the status of the special purpose accelerator (i.e. queue status, etc.); and (2) a shim redirection layer at the head of the library function operative to determine whether to send a task to the special purpose accelerator (i.e. hardware execution) or the general purpose processor (i.e. software execution)).
Achilles’s teaching results in improvements in throughput, latency, and quality (Achilles, [0037], lines 1-3).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Achilles with the invention of AAPA in order to result in improvements in throughput, latency, and quality. Note that the teaching of Achilles which entails checking on a status of an accelerator to determine distribution of workloads between the accelerator and a processor, when applied to AAPA which discloses SFUs and the execution resource in particular, results in the overall subject matter of “facilitate the execution resource of the graphics processor to place a status query message with a message gateway, the status query message to check on a workload distribution status of the SFUs, wherein the message gateway is hosted to process the status query message from the execution resource, and wherein the message gateway is to maintain a value to record each SFU that is in use; determine, based on a response to the status query message from the message gateway, the workload distribution status of the SFUs of the graphics processor, wherein the workload distribution status indicates that either a workload of the graphics processing operations can be distributed to the SFUs responsive to the value being less than a determined threshold or the workload can be distributed to the execution resource responsive to the value being greater than or equal to the determined threshold; and based on the workload distribution status of the SFUs, determine distribution of the workload between the SFUs and the execution resource of the graphics processor.”
However, the combination thus far does not disclose a count to record a total number of the SFUs that are in use, wherein the workload can be distributed to the SFUs responsive to the total number being less than a determined threshold, or the workload can be distributed to the execution resource responsive to the total number being greater than or equal to the determined threshold. 
On the other hand, Marshall discloses a count that is based on a total number of units of a resource that are in use (col. 2, lines 4-10, in a typical implementation of a semaphore, a counter located in a memory stores the number of units of a resource that are free. When a processor accesses the resource the counter is decremented, and when a processor finishes with the resource the counter is incremented. While the counter is at zero, processors simply "busy-wait" until the resource becomes free).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Marshall with the combination of AAPA and Achilles, as this modification merely entails simple substitution of one known element (the teaching of the combination of AAPA and Achilles of using a value to determine whether an SFU is free to be used) for another (the teaching of Marshall of using a count that is based on a total number of units of a resource that are in use, to determine whether a unit of the resource is free to be used) to obtain predictable results (the combination of AAPA and Achilles, wherein a count that is based on a total number of the SFUs that are in use is used to determine whether an SFU is free to be used), which is an example of a rationale that may support a conclusion of obviousness, as per MPEP 2143. Note that the teaching of Marshall of using a count that is based on a total number of units of a resource that are in use, to determine whether a unit of the resource is free to be used, when applied to the invention of the combination of AAPA and Achilles wherein workload is distributed to SFUs or an execution resource based on whether an SFU is free to be used, results in the overall subject matter that the workload can be distributed to the SFUs responsive to the total number being greater than or equal to a determined threshold, or the workload can be distributed to the execution resource responsive to the total number being less than the determined threshold.
However, while the combination thus far entails a count that is based on a total number of the SFUs that are in use, the combination thus far does not entail a count that records a total number of the SFUs that are in use (because the count of the combination thus far records a total number of SFUs that are free, rather than a total number of SFUs that are busy), and thus does not entail that the workload can be distributed to the SFUs responsive to the total number being less than a determined threshold, or the workload can be distributed to the execution resource responsive to the total number being greater than or equal to the determined threshold.
On the other hand, Ashfield explicitly discloses the concept that a counter can be implemented in a number of ways, all of which are functionally equivalent; including counting up from zero to a maximum number and counting down from an initial maximum to zero ([0026], last 6 lines, as should be clear to a skilled person, a counter can be implemented in a number of ways, all of which are functionally equivalent; including counting up from zero to a maximum number and counting down from an initial maximum to zero).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Ashfield with the combination of AAPA, Achilles, and Marshall, as this modification merely entails simple substitution of one known element (the specific counter implementation of the combination of AAPA, Achilles, and Marshall, wherein the counter counts down from an initial maximum to zero) for another (a specific counter implementation of Ashfield, wherein the counter counts up from zero to a maximum number, which is explicitly disclosed as an alternative to a counter that counts down from an initial maximum to zero) to obtain predictable results (the combination of AAPA, Achilles, and Marshall, wherein a counter counts up from zero to a maximum number, to implement functionally equivalent behavior). Note that in the combination of AAPA, Achilles, and Marshall, the counter counts down from an initial maximum to zero as SFUs are allocated, such that the counter records a total number of SFUs that are free; therefore, the combination of AAPA, Achilles, Marshall, and Ashfield, in implementing functionally equivalent behavior, entails a counter counting up from zero to a maximum number as SFUs are allocated, such that the counter records a total number of SFUs that are in use, and such that the workload can be distributed to the SFUs responsive to the total number being less than a determined threshold, or the workload can be distributed to the execution resource responsive to the total number being greater than or equal to the determined threshold.

Consider claim 29, the overall combination entails the method of claim 27 (see above), further comprising facilitating the message gateway to reply with a status notification message indicating the workload distribution status of the SFUs (Achilles, [0045], lines 1-9, in operation, the background monitoring task 52 (running in kernel space) periodically measures the status of the special purpose accelerator 58. One or more Boolean values are written to the shared memory 54. During runtime, the shim redirection layer 64 in the library function 62 reads the Boolean values in the shared memory in making a determination of whether to run the library call task on the general purpose processor (i.e. in software) or on the special purpose accelerator (i.e. in hardware); in other words, the recited status notification message corresponds to the read data of the shared memory, and the message gateway corresponds to the shared memory; AAPA, [00177], lines 2-3, communication between EUs and shared pipelines is accomplished using packets of information called messages; see the citations in the rejection of the independent claim regarding how the overall combination teaches the workload distribution status limitation).
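For illustration only, the read of the shared memory relied upon above as the status notification can be sketched as follows. This is a minimal sketch under stated assumptions and is not Achilles's implementation: the dictionary standing in for the shared memory, the function names monitoring_task and shim_redirect, and the use of a greater-than-or-equal comparison against a single queue-depth threshold are all hypothetical.

```python
# Illustrative sketch only. The names and the comparison used here are
# assumptions and do not reproduce code from Achilles.

# Stands in for the shared memory that holds the Boolean status value.
shared_memory = {"overutilized": False}


def monitoring_task(queue_depth: int, threshold: int) -> None:
    """Background monitor: compare the queue status to a threshold and store a Boolean."""
    shared_memory["overutilized"] = queue_depth >= threshold


def shim_redirect() -> str:
    """Shim layer: read the Boolean and decide where the task should run."""
    return "software" if shared_memory["overutilized"] else "hardware"


if __name__ == "__main__":
    monitoring_task(queue_depth=5, threshold=4)
    print(shim_redirect())
    monitoring_task(queue_depth=1, threshold=4)
    print(shim_redirect())
```

In this sketch the value returned from the read plays the role of the status notification, and the stored Boolean plays the role of the workload distribution status.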

Consider claim 30, the overall combination entails the method of claim 29 (see above), further comprising facilitating the SFUs to process one or more of the workloads responsive to the workload distribution status indicating that the workload can be distributed to the SFUs (Achilles, [0041], lines 1-6, the mechanism generates one or more Boolean values (i.e. true/false values) as a result of the comparison of the current queue status to the one or more thresholds. Boolean examples include "overutilized" or not, system is busy or idle, etc. The resulting Boolean results are stored in shared memory for access by the shim redirection layer; AAPA, [0005], line 4, SFUs; see the citations in the rejection of the independent claim regarding how the overall combination teaches the workload distribution status limitation).

Consider claim 31, the overall combination entails the method of claim 30 (see above), further comprising directing the one or more of the workloads to the execution resource for processing responsive to the workload distribution status indicating that the workload can be distributed to the execution resource (Achilles, [0041], lines 1-6, the mechanism generates one or more Boolean values (i.e. true/false values) as a result of the comparison of the current queue status to the one or more thresholds. Boolean examples include "overutilized" or not, system is busy or idle, etc. The resulting Boolean results are stored in shared memory for access by the shim redirection layer; [0037], lines 1-3, the advantage provided by the accelerator utilization improvement mechanism over using accelerators on their own is in throughput, latency and quality; AAPA, [00177], line 2, EU; see the citations in the rejection of the independent claim regarding how the overall combination teaches the workload distribution status limitation).

Consider claim 32, the overall combination entails the method of claim 27 (see above), further comprising generating a workload message to be communicated to a message register file, delivering the workload message to the SFUs via the message register file, and facilitating the SFUs to process the workload message, wherein the workload message comprises the workload or a request for processing the workload (AAPA, [00177], FIG. 8D illustrates a conventional transaction 880 sequence of message flow between EU and shared function pipeline. As illustrated, communication between EUs and shared pipelines is accomplished using packets of information called messages. Message transmission is requested via send instructions. There are four basic phases of a message's lifetime, such as: 1) message creation, where an EU thread at instruction pipeline 881 assembles a message payload that contains message descriptor, input data, etc., and put it into message register file (MRF) 883; 2) message delivery, where EU thread issues the message for delivery from MRF 883 to SFU 885 via send instructions; 3) message processing, where the target SFU 885 receives the message and services accordingly; and 4) writing back response, where once the processing has complete, SFU 865 sends output data to the EU thread's general register file (GRF) in response to the message).
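For illustration only, the four phases of a message's lifetime quoted above (creation, delivery, processing, and writing back the response) can be sketched as follows. This is a minimal sketch and not the AAPA hardware flow itself: the Message class, the "sum" descriptor, the register name "r0", and the functions sfu_service and run_transaction are hypothetical stand-ins.

```python
# Illustrative sketch only. All names are hypothetical; the lists and
# dictionaries merely stand in for the MRF and GRF described in the AAPA.
from dataclasses import dataclass


@dataclass
class Message:
    descriptor: str   # tells the SFU which service is requested
    payload: list     # input data assembled by the EU thread


def sfu_service(msg: Message) -> int:
    """Phase 3: the target SFU receives the message and services it."""
    if msg.descriptor == "sum":
        return sum(msg.payload)
    raise ValueError("unsupported descriptor: " + msg.descriptor)


def run_transaction(values: list) -> dict:
    mrf = []   # stands in for the message register file
    grf = {}   # stands in for the EU thread's general register file

    # Phase 1: message creation; the EU thread places the message in the MRF.
    mrf.append(Message(descriptor="sum", payload=list(values)))

    # Phase 2: message delivery; the message is issued from the MRF to the SFU.
    msg = mrf.pop(0)

    # Phase 3: message processing by the SFU.
    result = sfu_service(msg)

    # Phase 4: the response is written back to the GRF.
    grf["r0"] = result
    return grf


if __name__ == "__main__":
    print(run_transaction([1, 2, 3]))
```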

Consider claim 34, AAPA discloses a graphics processor of a local computing device ([0005], lines 1-4, modern graphics processors provide shared function (also referred to as "fixed function") pipeline and programmable execution units (EUs) or shaders pipeline for use by applications. However, current solutions are limited to using either EUs or shared function units ("SFUs", "shared functions", or simply "shaders")), the graphics processor comprising an execution resource to execute graphics processing operations ([00177], line 2, EU) and to invoke shared function units (SFUs) comprising hardware circuitry of the graphics processor that provides supplemental functionality for the execution resource of the graphics processor ([0005], line 4, SFUs; FIG. 8D, shared function unit 885; [00177], lines 5-7, an EU thread at instruction pipeline 881 assembles a message payload that contains message descriptor, input data, etc.; [00177], lines 9-10, the target SFU 885 receives the message and services accordingly; [00177], lines 10-11, SFU 865 sends output data to the EU); and messages between the execution resource and the SFU ([00177], lines 2-3, communication between EUs and shared pipelines is accomplished using packets of information called messages).
However, AAPA does not disclose a method comprising detecting workloads for the aforementioned graphics processor; facilitating the aforementioned execution resource of the aforementioned graphics processor to place a status query message with a message gateway, the status query message to check on a workload distribution status of the aforementioned SFUs, wherein the message gateway is hosted to process the status query message from the aforementioned execution resource, and wherein the message gateway is to maintain a count to record a total number of the SFUs that are in use; determining, based on a response to the status query message from the message gateway, the workload distribution status of the aforementioned SFUs of the aforementioned graphics processor, wherein the workload distribution status indicates that either a workload of the aforementioned graphics processing operations can be distributed to the aforementioned SFUs responsive to the total number being less than a determined threshold or the workload can be distributed to the aforementioned execution resource responsive to the total number being greater than or equal to the determined threshold; and based on the workload distribution status of the aforementioned SFUs, determining distribution of the workload between the aforementioned SFUs and the aforementioned execution resource of the aforementioned graphics processor. AAPA also does not disclose a non-transitory machine-readable medium comprising instructions that when executed by the local computing device, cause the local computing device to perform the aforementioned operations.
On the other hand, Achilles discloses an apparatus comprising: one or more processor hardware circuitry to: detect workloads for a graphics processor ([0032], lines 1-4, special-purpose accelerators are well-known devices used to provide an efficient method of offloading computationally intensive tasks from the general-purpose processor (e.g., CPU or microprocessor); [0032], lines 7-11, modern graphics processing units (GPUs) are very efficient at manipulating computer graphics, and their highly parallel structure makes them more effective than general-purpose CPUs for a range of complex algorithms); facilitate a processor to place a status query message with a message gateway, the status query message to check on a workload distribution status of an accelerator, wherein the message gateway is hosted to process the status query message from the processor ([0045], lines 1-9, in operation, the background monitoring task 52 (running in kernel space) periodically measures the status of the special purpose accelerator 58. One or more Boolean values are written to the shared memory 54. During runtime, the shim redirection layer 64 in the library function 62 reads the Boolean values in the shared memory in making a determination of whether to run the library call task on the general purpose processor (i.e. in software) or on the special purpose accelerator (i.e. in hardware); in other words, the recited status query message corresponds to the read request of the shared memory, and the message gateway corresponds to the shared memory), and wherein the message gateway is to maintain a value to record the accelerator is in use ([0041], lines 1-6, the mechanism generates one or more Boolean values (i.e. true/false values) as a result of the comparison of the current queue status to the one or more thresholds. Boolean examples include "overutilized" or not, system is busy or idle, etc. The resulting Boolean results are stored in shared memory for access by the shim redirection layer; in other words, when there is one accelerator, a Boolean 0 to indicate that the accelerator is idle is a count that records that 0 accelerators are in use, and a Boolean 1 to indicate that the accelerator is busy is a count that records that 1 accelerator is in use); determine, based on a response to the status query message from the message gateway, the workload distribution status of the graphics processor, wherein the workload distribution status indicates that either a workload of graphics processing operations can be distributed to the graphics processor responsive to the value being less than a determined threshold or the workload can be distributed to the processor responsive to the value being greater than or equal to the determined threshold; and based on the workload distribution status of the accelerator, determine distribution of the workload between the accelerator and the processor ([0041], lines 1-6, the mechanism generates one or more Boolean values (i.e. true/false values) as a result of the comparison of the current queue status to the one or more thresholds. Boolean examples include "overutilized" or not, system is busy or idle, etc. The resulting Boolean results are stored in shared memory for access by the shim redirection layer; in other words, when there is one accelerator, a Boolean 0 to indicate that the accelerator is idle is a count that records that 0 accelerators are in use, and a Boolean 1 to indicate that the accelerator is busy is a count that records that 1 accelerator is in use; [0036], lines 8-14, the mechanism provides two new components: (1) a background monitoring process/thread for monitoring the status of the special purpose accelerator (i.e. queue status, etc.); and (2) a shim redirection layer at the head of the library function operative to determine whether to send a task to the special purpose accelerator (i.e. hardware execution) or the general purpose processor (i.e. software execution)). Achilles also discloses a non-transitory machine-readable medium comprising instructions that when executed by a local computing device, cause the local computing device to perform the aforementioned operations ([0018], lines 9-12, the present invention may take the form of a computer program product embodied in any tangible medium of expression having computer usable program code embodied in the medium).
Achilles’s teaching results in improvements in throughput, latency, and quality (Achilles, [0037], lines 1-3).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Achilles with the invention of AAPA in order to result in improvements in throughput, latency, and quality. Note that the teaching of Achilles which entails checking on a status of an accelerator to determine distribution of workloads between the accelerator and a processor, when applied to AAPA which discloses SFUs and the execution resource in particular, results in the overall subject matter of “facilitate the execution resource of the graphics processor to place a status query message with a message gateway, the status query message to check on a workload distribution status of the SFUs, wherein the message gateway is hosted to process the status query message from the execution resource, and wherein the message gateway is to maintain a value to record each SFU that is in use; determine, based on a response to the status query message from the message gateway, the workload distribution status of the SFUs of the graphics processor, wherein the workload distribution status indicates that either a workload of the graphics processing operations can be distributed to the SFUs responsive to the value being less than a determined threshold or the workload can be distributed to the execution resource responsive to the value being greater than or equal to the determined threshold; and based on the workload distribution status of the SFUs, determine distribution of the workload between the SFUs and the execution resource of the graphics processor.”
Furthermore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to implement the aforementioned combination of AAPA and Achilles using a non-transitory machine-readable medium, as further taught by Achilles, as this modification merely entails a combination of prior art elements according to known methods to yield predictable results, which is an exemplary rationale that may support a conclusion of obviousness, as per MPEP 2143.
However, the combination thus far does not disclose a count to record a total number of the SFUs that are in use, wherein the workload can be distributed to the SFUs responsive to the total number being less than a determined threshold, or the workload can be distributed to the execution resource responsive to the total number being greater than or equal to the determined threshold. 
On the other hand, Marshall discloses a count that is based on a total number of units of a resource that are in use (col. 2, lines 4-10, in a typical implementation of a semaphore, a counter located in a memory stores the number of units of a resource that are free. When a processor accesses the resource the counter is decremented, and when a processor finishes with the resource the counter is incremented. While the counter is at zero, processors simply "busy-wait" until the resource becomes free).
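For illustration only, the semaphore counter described in the cited passage of Marshall can be sketched as follows. This is a minimal sketch and not Marshall's implementation: the class name FreeUnitCounter and its methods are hypothetical, and the busy-wait is represented by a failed acquisition attempt rather than by an actual spin loop.

```python
# Illustrative sketch only. The class and method names are hypothetical
# and do not reproduce any code from Marshall.

class FreeUnitCounter:
    """Counter that stores the number of units of a resource that are free."""

    def __init__(self, units: int) -> None:
        self.free = units

    def try_acquire(self) -> bool:
        """Decrement on access; at zero the caller would have to wait."""
        if self.free == 0:
            return False   # caller would "busy-wait" and retry
        self.free -= 1
        return True

    def release(self) -> None:
        """Increment when a processor finishes with the resource."""
        self.free += 1


if __name__ == "__main__":
    counter = FreeUnitCounter(2)
    print(counter.try_acquire(), counter.try_acquire(), counter.try_acquire())
    counter.release()
    print(counter.free)
```

Note that the value held by such a counter is the number of free units, so the number of units in use is the initial total minus the counter value.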
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Marshall with the combination of AAPA and Achilles, as this modification merely entails simple substitution of one known element (the teaching of the combination of AAPA and Achilles of using a value to determine whether an SFU is free to be used) for another (the teaching of Marshall of using a count that is based on a total number of units of a resource that are in use, to determine whether a unit of the resource is free to be used) to obtain predictable results (the combination of AAPA and Achilles, wherein a count that is based on a total number of the SFUs that are in use is used to determine whether an SFU is free to be used), which is an example of a rationale that may support a conclusion of obviousness, as per MPEP 2143. Note that the teaching of Marshall of using a count that is based on a total number of units of a resource that are in use, to determine whether a unit of the resource is free to be used, when applied to the invention of the combination of AAPA and Achilles wherein workload is distributed to SFUs or an execution resource based on whether an SFU is free to be used, results in the overall subject matter that the workload can be distributed to the SFUs responsive to the total number being greater than or equal to a determined threshold, or the workload can be distributed to the execution resource responsive to the total number being less than the determined threshold.
However, while the combination thus far entails a count that is based on a total number of the SFUs that are in use, the combination thus far does not entail a count that records a total number of the SFUs that are in use (because the count of the combination thus far records a total number of SFUs that are free, rather than a total number of SFUs that are busy), and thus does not entail that the workload can be distributed to the SFUs responsive to the total number being less than a determined threshold, or the workload can be distributed to the execution resource responsive to the total number being greater than or equal to the determined threshold.
On the other hand, Ashfield explicitly discloses the concept that a counter can be implemented in a number of ways, all of which are functionally equivalent; including counting up from zero to a maximum number and counting down from an initial maximum to zero ([0026], last 6 lines, as should be clear to a skilled person, a counter can be implemented in a number of ways, all of which are functionally equivalent; including counting up from zero to a maximum number and counting down from an initial maximum to zero).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Ashfield with the combination of AAPA, Achilles, and Marshall, as this modification merely entails simple substitution of one known element (the specific counter implementation of the combination of AAPA, Achilles, and Marshall, wherein the counter counts down from an initial maximum to zero) for another (a specific counter implementation of Ashfield, wherein the counter counts up from zero to a maximum number, which is explicitly disclosed as an alternative to a counter that counts down from an initial maximum to zero) to obtain predictable results (the combination of AAPA, Achilles, and Marshall, wherein a counter counts up from zero to a maximum number, to implement functionally equivalent behavior). Note that in the combination of AAPA, Achilles, and Marshall, the counter counts down from an initial maximum to zero as SFUs are allocated, such that the counter records a total number of SFUs that are free; therefore, the combination of AAPA, Achilles, Marshall, and Ashfield, in implementing functionally equivalent behavior, entails a counter counting up from zero to a maximum number as SFUs are allocated, such that the counter records a total number of SFUs that are in use, and such that the workload can be distributed to the SFUs responsive to the total number being less than a determined threshold, or the workload can be distributed to the execution resource responsive to the total number being greater than or equal to the determined threshold.

Consider claim 36, the overall combination entails the non-transitory machine-readable medium of claim 34 (see above), wherein the operations further comprise facilitating the message gateway to reply with a status notification message indicating the workload distribution status of the SFUs (Achilles, [0045], lines 1-9, in operation, the background monitoring task 52 (running in kernel space) periodically measures the status of the special purpose accelerator 58. One or more Boolean values are written to the shared memory 54. During runtime, the shim redirection layer 64 in the library function 62 reads the Boolean values in the shared memory in making a determination of whether to run the library call task on the general purpose processor (i.e. in software) or on the special purpose accelerator (i.e. in hardware); in other words, the recited status notification message corresponds to the read data of the shared memory, and the message gateway corresponds to the shared memory; AAPA, [00177], lines 2-3, communication between EUs and shared pipelines is accomplished using packets of information called messages; see the citations in the rejection of the independent claim regarding how the overall combination teaches the workload distribution status limitation).

Consider claim 37, the overall combination entails the non-transitory machine-readable medium of claim 36 (see above), wherein the operations further comprise facilitating the SFUs to process one or more of the workloads responsive to the workload distribution status indicating that the workload can be distributed to the SFUs (Achilles, [0041], lines 1-6, the mechanism generates one or more Boolean values (i.e. true/false values) as a result of the comparison of the current queue status to the one or more thresholds. Boolean examples include "overutilized" or not, system is busy or idle, etc. The resulting Boolean results are stored in shared memory for access by the shim redirection layer; AAPA, [0005], line 4, SFUs; see the citations in the rejection of the independent claim regarding how the overall combination teaches the workload distribution status limitation).

Consider claim 38, the overall combination entails the non-transitory machine-readable medium of claim 37 (see above), wherein the operations further comprise directing the one or more of the workloads to the execution resource for processing responsive to the workload distribution status indicating that the workload can be distributed to the execution resource (Achilles, [0041], lines 1-6, the mechanism generates one or more Boolean values (i.e. true/false values) as a result of the comparison of the current queue status to the one or more thresholds. Boolean examples include "overutilized" or not, system is busy or idle, etc. The resulting Boolean results are stored in shared memory for access by the shim redirection layer; [0037], lines 1-3, the advantage provided by the accelerator utilization improvement mechanism over using accelerators on their own is in throughput, latency and quality; AAPA, [00177], line 2, EU; see the citations in the rejection of the independent claim regarding how the overall combination teaches the workload distribution status limitation).

Consider claim 40, AAPA discloses a graphics processor ([0005], lines 1-4, modern graphics processors provide shared function (also referred to as "fixed function") pipeline and programmable execution units (EUs) or shaders pipeline for use by applications. However, current solutions are limited to using either EUs or shared function units ("SFUs", "shared functions", or simply "shaders")), the graphics processor comprising an execution resource to execute graphics processing operations ([00177], line 2, EU) and to invoke shared function units (SFUs) comprising hardware circuitry of the graphics processor that provides supplemental functionality for the execution resource of the graphics processor ([0005], line 4, SFUs; FIG. 8D, shared function unit 885; [00177], lines 5-7, an EU thread at instruction pipeline 881 assembles a message payload that contains message descriptor, input data, etc.; [00177], lines 9-10, the target SFU 885 receives the message and services accordingly; [00177], lines 10-11, SFU 865 sends output data to the EU); and messages between the execution resource and the SFU ([00177], lines 2-3, communication between EUs and shared pipelines is accomplished using packets of information called messages).
However, AAPA does not disclose a system comprising: a memory; and one or more processor hardware circuitry communicably coupled to the memory, the one or more processor hardware circuitry to: detect workloads for the aforementioned graphics processor; facilitate the aforementioned execution resource of the aforementioned graphics processor to place a status query message with a message gateway, the status query message to check on a workload distribution status of the aforementioned SFUs, wherein the message gateway is hosted to process the status query message from the aforementioned execution resource, and wherein the message gateway is to maintain a count to record a total number of the SFUs that are in use; determine, based on a response to the status query message from the message gateway, the workload distribution status of the aforementioned SFUs of the aforementioned graphics processor, wherein the workload distribution status indicates that either a workload of the aforementioned graphics processing operations can be distributed to the aforementioned SFUs responsive to the total number being less than a determined threshold or the workload can be distributed to the aforementioned execution resource responsive to the total number being greater than or equal to the determined threshold; and based on the workload distribution status of the aforementioned SFUs, determine distribution of the workload between the aforementioned SFUs and the aforementioned execution resource of the aforementioned graphics processor.
On the other hand, Achilles discloses a system comprising a memory (Figure 1, ROM 18, RAM 20, data storage 21, Flash 16); and one or more processor hardware circuitry communicably coupled to the memory, the one or more processor hardware circuitry to: detect workloads for a graphics processor ([0032], lines 1-4, special-purpose accelerators are well-known devices used to provide an efficient method of offloading computationally intensive tasks from the general-purpose processor (e.g., CPU or microprocessor); [0032], lines 7-11, modern graphics processing units (GPUs) are very efficient at manipulating computer graphics, and their highly parallel structure makes them more effective than general-purpose CPUs for a range of complex algorithms); facilitate a processor to place a status query message with a message gateway, the status query message to check on a workload distribution status of an accelerator, wherein the message gateway is hosted to process the status query message from the processor ([0045], lines 1-9, in operation, the background monitoring task 52 (running in kernel space) periodically measures the status of the special purpose accelerator 58. One or more Boolean values are written to the shared memory 54. During runtime, the shim redirection layer 64 in the library function 62 reads the Boolean values in the shared memory in making a determination of whether to run the library call task on the general purpose processor (i.e. in software) or on the special purpose accelerator (i.e. in hardware); in other words, the recited status query message corresponds to the read request of the shared memory, and the message gateway corresponds to the shared memory), and wherein the message gateway is to maintain a value to record the accelerator is in use ([0041], lines 1-6, the mechanism generates one or more Boolean values (i.e. true/false values) as a result of the comparison of the current queue status to the one or more thresholds. Boolean examples include "overutilized" or not, system is busy or idle, etc. The resulting Boolean results are stored in shared memory for access by the shim redirection layer; in other words, when there is one accelerator, a Boolean 0 to indicate that the accelerator is idle is a count that records that 0 accelerators are in use, and a Boolean 1 to indicate that the accelerator is busy is a count that records that 1 accelerator is in use); determine, based on a response to the status query message from the message gateway, the workload distribution status of the graphics processor, wherein the workload distribution status indicates that either a workload of graphics processing operations can be distributed to the graphics processor responsive to the value being less than a determined threshold or the workload can be distributed to the processor responsive to the value being greater than or equal to the determined threshold; and based on the workload distribution status of the accelerator, determine distribution of the workload between the accelerator and the processor ([0041], lines 1-6, the mechanism generates one or more Boolean values (i.e. true/false values) as a result of the comparison of the current queue status to the one or more thresholds. Boolean examples include "overutilized" or not, system is busy or idle, etc. The resulting Boolean results are stored in shared memory for access by the shim redirection layer; in other words, when there is one accelerator, a Boolean 0 to indicate that the accelerator is idle is a count that records that 0 accelerators are in use, and a Boolean 1 to indicate that the accelerator is busy is a count that records that 1 accelerator is in use; [0036], lines 8-14, the mechanism provides two new components: (1) a background monitoring process/thread for monitoring the status of the special purpose accelerator (i.e. queue status, etc.); and (2) a shim redirection layer at the head of the library function operative to determine whether to send a task to the special purpose accelerator (i.e. hardware execution) or the general purpose processor (i.e. software execution)).
Achilles’s teaching results in improvements in throughput, latency, and quality (Achilles, [0037], lines 1-3).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Achilles with the invention of AAPA in order to result in improvements in throughput, latency, and quality. Note that the teaching of Achilles which entails checking on a status of an accelerator to determine distribution of workloads between the accelerator and a processor, when applied to AAPA which discloses SFUs and the execution resource in particular, results in the overall subject matter of “facilitate the execution resource of the graphics processor to place a status query message with a message gateway, the status query message to check on a workload distribution status of the SFUs, wherein the message gateway is hosted to process the status query message from the execution resource, and wherein the message gateway is to maintain a value to record each SFU that is in use; determine, based on a response to the status query message from the message gateway, the workload distribution status of the SFUs of the graphics processor, wherein the workload distribution status indicates that either a workload of the graphics processing operations can be distributed to the SFUs responsive to the value being less than a determined threshold or the workload can be distributed to the execution resource responsive to the value being greater than or equal to the determined threshold; and based on the workload distribution status of the SFUs, determine distribution of the workload between the SFUs and the execution resource of the graphics processor.”
However, the combination thus far does not disclose a count to record a total number of the SFUs that are in use, wherein the workload can be distributed to the SFUs responsive to the total number being less than a determined threshold, or the workload can be distributed to the execution resource responsive to the total number being greater than or equal to the determined threshold. 
On the other hand, Marshall discloses a count that is based on a total number of units of a resource that are in use (col. 2, lines 4-10, in a typical implementation of a semaphore, a counter located in a memory stores the number of units of a resource that are free. When a processor accesses the resource the counter is decremented, and when a processor finishes with the resource the counter is incremented. While the counter is at zero, processors simply "busy-wait" until the resource becomes free).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Marshall with the combination of AAPA and Achilles, as this modification merely entails simple substitution of one known element (the teaching of the combination of AAPA and Achilles of using a value to determine whether an SFU is free to be used) for another (the teaching of Marshall of using a count that is based on a total number of units of a resource that are in use, to determine whether a unit of the resource is free to be used) to obtain predictable results (the combination of AAPA and Achilles, wherein a count that is based on a total number of the SFUs that are in use is used to determine whether an SFU is free to be used), which is an example of a rationale that may support a conclusion of obviousness, as per MPEP 2143. Note that the teaching of Marshall of using a count that is based on a total number of units of a resource that are in use, to determine whether a unit of the resource is free to be used, when applied to the invention of the combination of AAPA and Achilles wherein workload is distributed to SFUs or an execution resource based on whether an SFU is free to be used, results in the overall subject matter that the workload can be distributed to the SFUs responsive to the total number being greater than or equal to a determined threshold, or the workload can be distributed to the execution resource responsive to the total number being less than the determined threshold.
However, while the combination thus far entails a count that is based on a total number of the SFUs that are in use, the combination thus far does not entail a count that records a total number of the SFUs that are in use (because the count of the combination thus far records a total number of SFUs that are free, rather than a total number of SFUs that are busy), and thus does not entail that the workload can be distributed to the SFUs responsive to the total number being less than a determined threshold, or the workload can be distributed to the execution resource responsive to the total number being greater than or equal to the determined threshold.
On the other hand, Ashfield explicitly discloses the concept that a counter can be implemented in a number of ways, all of which are functionally equivalent; including counting up from zero to a maximum number and counting down from an initial maximum to zero ([0026], last 6 lines, as should be clear to a skilled person, a counter can be implemented in a number of ways, all of which are functionally equivalent; including counting up from zero to a maximum number and counting down from an initial maximum to zero).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Ashfield with the combination of AAPA, Achilles, and Marshall, as this modification merely entails simple substitution of one known element (the specific counter implementation of the combination of AAPA, Achilles, and Marshall, wherein the counter counts down from an initial maximum to zero) for another (a specific counter implementation of Ashfield, wherein the counter counts up from zero to a maximum number, which is explicitly disclosed as an alternative to a counter that counts down from an initial maximum to zero) to obtain predictable results (the combination of AAPA, Achilles, and Marshall, wherein a counter counts up from zero to a maximum number, to implement functionally equivalent behavior). Note that in the combination of AAPA, Achilles, and Marshall, the counter counts down from an initial maximum to zero as SFUs are allocated, such that the counter records a total number of SFUs that are free; therefore, the combination of AAPA, Achilles, Marshall, and Ashfield, in implementing functionally equivalent behavior, entails a counter counting up from zero to a maximum number as SFUs are allocated, such that the counter records a total number of SFUs that are in use, and such that the workload can be distributed to the SFUs responsive to the total number being less than a determined threshold, or the workload can be distributed to the execution resource responsive to the total number being greater than or equal to the determined threshold.
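For illustration only, the functional equivalence relied upon above can be sketched as follows. The names TOTAL_SFUS, dispatch_by_free_count, and dispatch_by_in_use_count are hypothetical, and the specific threshold values (one free unit for the count-down form, the total number of SFUs for the count-up form) are assumptions chosen for the sketch rather than values taken from any cited reference.

```python
TOTAL_SFUS = 4  # hypothetical number of shared function units


def dispatch_by_free_count(free: int, threshold: int = 1) -> str:
    # Count-down form: the counter records SFUs that are free.
    # Distribute to the SFUs when the free count meets the threshold.
    return "SFU" if free >= threshold else "EU"


def dispatch_by_in_use_count(in_use: int, threshold: int = TOTAL_SFUS) -> str:
    # Count-up form: the counter records SFUs that are in use.
    # Distribute to the SFUs when the in-use count is below the threshold.
    return "SFU" if in_use < threshold else "EU"


# Compare both forms for every possible allocation state.
decisions = [
    (dispatch_by_free_count(TOTAL_SFUS - in_use), dispatch_by_in_use_count(in_use))
    for in_use in range(TOTAL_SFUS + 1)
]
```

Under these assumed thresholds, both forms select the same destination in every allocation state; only the direction of the comparison changes.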

Consider claim 42, the overall combination entails the system of claim 40 (see above), wherein the one or more processor hardware circuitry are further to facilitate the message gateway to reply with a status notification message indicating the workload distribution status of the SFUs (Achilles, [0045], lines 1-9, in operation, the background monitoring task 52 (running in kernel space) periodically measures the status of the special purpose accelerator 58. One or more Boolean values are written to the shared memory 54. During runtime, the shim redirection layer 64 in the library function 62 reads the Boolean values in the shared memory in making a determination of whether to run the library call task on the general purpose processor (i.e. in software) or on the special purpose accelerator (i.e. in hardware); in other words, the recited status notification message corresponds to the read data of the shared memory, and the message gateway corresponds to the shared memory; AAPA, [00177], lines 2-3, communication between EUs and shared pipelines is accomplished using packets of information called messages; see the citations in the rejection of the independent claim regarding how the overall combination teaches the workload distribution status limitation).

Consider claim 43, the overall combination entails the system of claim 42 (see above), wherein the one or more processor hardware circuitry are further to facilitate the SFUs to process one or more of the workloads responsive to the workload distribution status indicating that the workload can be distributed to the SFUs (Achilles, [0041], lines 1-6, the mechanism generates one or more Boolean values (i.e. true/false values) as a result of the comparison of the current queue status to the one or more thresholds. Boolean examples include "overutilized" or not, system is busy or idle, etc. The resulting Boolean results are stored in shared memory for access by the shim redirection layer; AAPA, [0005], line 4, SFUs; see the citations in the rejection of the independent claim regarding how the overall combination teaches the workload distribution status limitation).

Consider claim 44, the overall combination entails the system of claim 43 (see above), wherein the one or more processor hardware circuitry are further to direct the one or more of the workloads to the execution resource for processing responsive to the workload distribution status indicating that the workload can be distributed to the execution resource (Achilles, [0041], lines 1-6, the mechanism generates one or more Boolean values (i.e. true/false values) as a result of the comparison of the current queue status to the one or more thresholds. Boolean examples include "overutilized" or not, system is busy or idle, etc. The resulting Boolean results are stored in shared memory for access by the shim redirection layer; [0037], lines 1-3, the advantage provided by the accelerator utilization improvement mechanism over using accelerators on their own is in throughput, latency and quality; AAPA, [00177], line 2, EU; see the citations in the rejection of the independent claim regarding how the overall combination teaches the workload distribution status limitation).
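For illustration only, the mechanism of Achilles cited for claims 42-44 (a monitoring task writing a Boolean value to shared memory, and a redirection layer reading that value to choose a destination) can be sketched as follows. The dictionary shared_memory, the function names, the queue_depth parameter, and the direction of the threshold comparison are all assumptions made for the sketch and are not drawn from the reference.

```python
# Hypothetical stand-in for the shared memory holding the Boolean status.
shared_memory = {"overutilized": False}


def monitoring_task(queue_depth: int, threshold: int) -> None:
    # Compare the current queue status to a threshold and store the Boolean result.
    # Assumption: a deeper queue means the accelerator is overutilized.
    shared_memory["overutilized"] = queue_depth >= threshold


def shim_redirect() -> str:
    # Read the Boolean status and choose where the task is to run.
    if shared_memory["overutilized"]:
        return "general purpose processor"
    return "special purpose accelerator"


monitoring_task(queue_depth=2, threshold=8)
idle_target = shim_redirect()
monitoring_task(queue_depth=9, threshold=8)
busy_target = shim_redirect()
```

In the sketch, the value read from shared memory plays the role mapped above to the status notification, and the returned destination plays the role of the workload distribution decision.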

Consider claim 45, the overall combination entails the system of claim 40 (see above), wherein the one or more processor hardware circuitry are further to generate a workload message to be communicated to a message register file, and deliver the workload message to the SFUs via the message register file, and facilitate the SFUs to process the workload message, wherein the workload message comprises the workload or a request for processing the workload (AAPA, [00177], FIG. 8D illustrates a conventional transaction 880 sequence of message flow between EU and shared function pipeline. As illustrated, communication between EUs and shared pipelines is accomplished using packets of information called messages. Message transmission is requested via send instructions. There are four basic phases of a message's lifetime, such as: 1) message creation, where an EU thread at instruction pipeline 881 assembles a message payload that contains message descriptor, input data, etc., and put it into message register file (MRF) 883; 2) message delivery, where EU thread issues the message for delivery from MRF 883 to SFU 885 via send instructions; 3) message processing, where the target SFU 885 receives the message and services accordingly; and 4) writing back response, where once the processing has complete, SFU 865 sends output data to the EU thread's general register file (GRF) in response to the message.)
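For illustration only, the four phases of a message's lifetime quoted from AAPA above can be sketched as follows. The class and function names are hypothetical, the MRF and GRF are modeled as simple lists, and the summing operation performed by the SFU is an arbitrary placeholder rather than any function described in the application.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    descriptor: str
    data: list


@dataclass
class EUThread:
    mrf: list = field(default_factory=list)  # message register file (modeled as a list)
    grf: list = field(default_factory=list)  # general register file (modeled as a list)


def sfu_process(message: Message) -> int:
    # Placeholder service: the SFU sums the input data.
    return sum(message.data)


def run_transaction(thread: EUThread, descriptor: str, data: list) -> int:
    # 1) Message creation: assemble the payload and put it into the MRF.
    thread.mrf.append(Message(descriptor, data))
    # 2) Message delivery: the message is issued from the MRF to the SFU.
    message = thread.mrf.pop(0)
    # 3) Message processing: the target SFU services the message.
    output = sfu_process(message)
    # 4) Writing back the response: output data goes to the thread's GRF.
    thread.grf.append(output)
    return output


thread = EUThread()
output = run_transaction(thread, "sum", [1, 2, 3])
```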

Claims 26, 33, and 39 is/are rejected under 35 U.S.C. 103 as being unpatentable over AAPA, Achilles, Marshall, and Ashfield as applied to claims 20, 27, and 34 above, and further in view of Kim et al. (Kim) (US 20150112997 A1).
Consider claim 26, the combination thus far discloses the apparatus of claim 20 (see above), but does not explicitly disclose the graphics processor is co-located with an application processor on a common semiconductor package.
On the other hand, Kim explicitly discloses a graphics processor (Kim, [0046], line 29, the processor 120 may further include a graphics processor) is co-located with an application processor (Kim, [0046], lines 1-2, application processor) on a common semiconductor package (Kim, [0046], lines 3-5, the AP and the CP may be included in the processor 120 or may be respectively included in different IC packages; in other words, that which is included in the processor 120 is in a same IC package). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention for the graphics processor of the combination of AAPA, Achilles, Marshall, and Ashfield to be co-located with an application processor on a common semiconductor package, as taught by Kim, in order to improve performance (e.g., reduce latency between the application processor and the graphics processor) relative to the graphics processor being on a different semiconductor package than an application processor. Alternatively, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention for the graphics processor of the combination of AAPA, Achilles, Marshall, and Ashfield to be co-located with an application processor on a common semiconductor package, as taught by Kim, as this modification merely entails a combination of prior art elements (a graphics processor of the combination of AAPA, Achilles, Marshall, and Ashfield, and the application processor of Kim) according to known methods (Kim’s teaching of co-locating a graphics processor and an application processor on a common semiconductor package) to yield predictable results (a graphics processor being co-located with an application processor on a common semiconductor package), which is an exemplary rationale that may support a conclusion of obviousness, as per MPEP 2143.

Consider claim 33, the combination thus far discloses the method of claim 27 (see above), but does not explicitly disclose the graphics processor is co-located with an application processor on a common semiconductor package.
On the other hand, Kim explicitly discloses a graphics processor (Kim, [0046], line 29, the processor 120 may further include a graphics processor) is co-located with an application processor (Kim, [0046], lines 1-2, application processor) on a common semiconductor package (Kim, [0046], lines 3-5, the AP and the CP may be included in the processor 120 or may be respectively included in different IC packages; in other words, that which is included in the processor 120 is in a same IC package). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention for the graphics processor of the combination of AAPA, Achilles, Marshall, and Ashfield to be co-located with an application processor on a common semiconductor package, as taught by Kim, in order to improve performance (e.g., reduce latency between the application processor and the graphics processor) relative to the graphics processor being on a different semiconductor package than an application processor. Alternatively, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention for the graphics processor of the combination of AAPA, Achilles, Marshall, and Ashfield to be co-located with an application processor on a common semiconductor package, as taught by Kim, as this modification merely entails a combination of prior art elements (a graphics processor of the combination of AAPA, Achilles, Marshall, and Ashfield, and the application processor of Kim) according to known methods (Kim’s teaching of co-locating a graphics processor and an application processor on a common semiconductor package) to yield predictable results (a graphics processor being co-located with an application processor on a common semiconductor package), which is an exemplary rationale that may support a conclusion of obviousness, as per MPEP 2143.

Consider claim 39, the combination thus far discloses the non-transitory machine-readable medium of claim 34 (see above), wherein the operations further comprise: generating a workload message to be communicated to a message register file; delivering the workload message to the SFUs via the message register file; and facilitating the SFUs to process the workload message, wherein the workload message comprises the workload or a request for processing the workload (AAPA, [00177], FIG. 8D illustrates a conventional transaction 880 sequence of message flow between EU and shared function pipeline. As illustrated, communication between EUs and shared pipelines is accomplished using packets of information called messages. Message transmission is requested via send instructions. There are four basic phases of a message's lifetime, such as: 1) message creation, where an EU thread at instruction pipeline 881 assembles a message payload that contains message descriptor, input data, etc., and put it into message register file (MRF) 883; 2) message delivery, where EU thread issues the message for delivery from MRF 883 to SFU 885 via send instructions; 3) message processing, where the target SFU 885 receives the message and services accordingly; and 4) writing back response, where once the processing has complete, SFU 865 sends output data to the EU thread's general register file (GRF) in response to the message.)
However, the combination thus far does not explicitly disclose the graphics processor is co-located with an application processor on a common semiconductor package.
On the other hand, Kim explicitly discloses a graphics processor (Kim, [0046], line 29, the processor 120 may further include a graphics processor) is co-located with an application processor (Kim, [0046], lines 1-2, application processor) on a common semiconductor package (Kim, [0046], lines 3-5, the AP and the CP may be included in the processor 120 or may be respectively included in different IC packages; in other words, that which is included in the processor 120 is in a same IC package). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention for the graphics processor of the combination of AAPA, Achilles, Marshall, and Ashfield to be co-located with an application processor on a common semiconductor package, as taught by Kim, in order to improve performance (e.g., reduce latency between the application processor and the graphics processor) relative to the graphics processor being on a different semiconductor package than an application processor. Alternatively, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention for the graphics processor of the combination of AAPA, Achilles, Marshall, and Ashfield to be co-located with an application processor on a common semiconductor package, as taught by Kim, as this modification merely entails a combination of prior art elements (a graphics processor of the combination of AAPA, Achilles, Marshall, and Ashfield, and the application processor of Kim) according to known methods (Kim’s teaching of co-locating a graphics processor and an application processor on a common semiconductor package) to yield predictable results (a graphics processor being co-located with an application processor on a common semiconductor package), which is an exemplary rationale that may support a conclusion of obviousness, as per MPEP 2143.

Response to Arguments
Applicant on page 9 argues: “Without any concessions regarding the subject matter of this rejection, the claims have been amended to address the noted §112 claim, first and second paragraph, rejections.” Applicant on page 11 argues: “Accordingly, Applicants respectfully request the withdrawal of the 112 rejection of claims 20, 22-27, 29-34, 36-40, and 42-45.”
In view of the aforementioned amendments, the previously presented rejections under 35 USC 112 are withdrawn.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to KEITH E VICARY whose telephone number is (571)270-1314. The examiner can normally be reached Monday to Friday, 9:00 AM to 5:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jyoti Mehta, can be reached at (571)270-3995. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/KEITH E VICARY/            Primary Examiner, Art Unit 2182