DETAILED ACTION
Claims 1-20 are pending.
The office acknowledges the following papers:
Claims and remarks filed on 8/10/2022.

Withdrawn objections and rejections
The specification objection has been withdrawn.
The 35 U.S.C. 101 rejection for claim 20 has been withdrawn due to amendment.

New Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.

Claims 1-7, 9, 18, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Kesiraju et al. (U.S. 2022/0083343), in view of Krig (U.S. 2014/0136816), in view of Official Notice.
As per claim 1:
Kesiraju and Krig disclosed a system comprising: 
a memory (Kesiraju: Figure 11 element 158, paragraph 64); and 
a processing device operatively coupled to the memory (Kesiraju: Figure 11 element 152, paragraph 62), wherein the processing device comprises: 
a vector arithmetic logic unit comprising a plurality of arithmetic logic units (ALUs) (Kesiraju: Figure 10 elements 126-128, paragraphs 59-60)(The coprocessor PEs include MAC circuitry for execution of vector MAC operations. Official notice is given that vector execution elements can include logical execution circuitry for the advantage of performing logical and permutation operations on vector operands. Thus, it would have been obvious to one of ordinary skill in the art that the coprocessor PEs (i.e. ALUs) include MAC, logical, and permutation circuitry.); and 
a first processor core operatively coupled to the vector arithmetic logic unit (Kesiraju: Figure 6 elements 10A-N and 14, paragraphs 47-48), the processing device to:
identify one or more first requested ALU operations in a first ALU operation queue (Kesiraju: Figure 10 elements 116 and 122, paragraph 61)(The scheduler identifies read operations in the queue to schedule.), wherein the first queued ALU operations are generated from one or more first vector instructions received from a first processor core (Kesiraju: Figures 1, 6, and 10 elements 10A-N, 112-116, and 122, paragraphs 58-61)(The instruction buffer receives coprocessor instructions from processors. The decoder generates decoded vector operations that are added to the Op Queue.);
identify one or more first ALU operations from the first requested ALU operations, wherein the first ALU operations are identified in view of one or more allocation criteria (Krig: Figures 4B and 5 elements 404, 422-424, and 502, paragraphs 20, 37, and 57)(Kesiraju: Figures 1, 6, and 10 elements 12, 60, 122, and 126, paragraphs 23, 48, and 60-61)(Krig disclosed dynamically configuring the width of a processor at execution time. Kesiraju disclosed implementing multiple coprocessor pipelines for concurrent coprocessor execution. Kesiraju disclosed using up to all PEs for coprocessor vector instructions, based on the vector operand sizes. The combination allows for dynamically allocating a number of PEs for a coprocessor instruction based on the vector operand sizes. Ready operations in the Op Queue (i.e. identifying and allocation criteria) are allocated a number of PEs based on vector operand sizes of the coprocessor instruction.), wherein the allocation criteria specify that a particular ALU operation is included in the first ALU operations if the particular ALU operation has been deferred from a previous clock cycle (Kesiraju: Figure 10 elements 116 and 122, paragraphs 60-61)(The scheduler selects ready operations for execution, including operations that were previously ready in previous clock cycles.); and
execute, using a first subset of the plurality of ALUs, the first ALU operations, wherein the vector arithmetic logic unit executes the first ALU operations in parallel with one or more second ALU operations specified by a second vector instruction received from a second processor core (Krig: Figures 4B and 5 elements 404, 422-424, and 502-506, paragraphs 20, 37, and 57)(Kesiraju: Figures 1, 6, and 10 elements 12, 60, 122, and 126, paragraphs 23, 48, and 60-61)(Krig disclosed dynamically configuring the width of a processor at execution time. Kesiraju disclosed implementing multiple coprocessor pipelines for concurrent coprocessor execution. Kesiraju disclosed using up to all PEs for coprocessor vector instructions, based on the vector operand sizes. The combination allows for dynamically allocating a number of PEs for a coprocessor instruction based on the vector operand sizes. The combination allows for concurrent vector instruction execution for instructions from multiple processors.).
The advantage of dynamically allocating SIMD execution lanes to SIMD instructions at runtime is that all available SIMD lanes can be allocated for parallel instruction execution, which results in improved performance. Thus, it would have been obvious to one of ordinary skill in the art at the time of the effective filing date to implement dynamic allocation of SIMD execution lanes to different threads with vector instructions for the above advantage.
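For illustration only, the combination relied upon for claim 1 (allocating a number of PEs per coprocessor instruction based on vector operand size, with instructions from multiple processors executing concurrently) can be sketched as follows. This is a minimal model under stated assumptions and is not drawn from Kesiraju or Krig; the function name `allocate_lanes`, the request format, and the in-order, contiguous-lane policy are all hypothetical.

```python
def allocate_lanes(total_lanes, requests):
    """Grant each request a contiguous range of lanes, in arrival order.

    requests: list of (core_id, vector_length) pairs (hypothetical format).
    Returns a dict mapping core_id to the range of lane indices granted
    for the current cycle. A request may receive fewer lanes than its
    vector length if the array is nearly full.
    """
    grants = {}
    next_lane = 0
    for core_id, vector_length in requests:
        granted = min(vector_length, total_lanes - next_lane)
        grants[core_id] = range(next_lane, next_lane + granted)
        next_lane += granted
    return grants


# Two cores share an 8-lane array: a 4-element and a 3-element operation
# occupy disjoint lane subsets in the same cycle.
grants = allocate_lanes(8, [("core0", 4), ("core1", 3)])
```

In this sketch the two operations occupy disjoint subsets of the array, which corresponds to the first and second subsets of ALUs discussed above.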
As per claim 2:
Kesiraju and Krig disclosed the system of claim 1, wherein to execute the first ALU operations in parallel with the second ALU operations, the processing device is further to: 
execute, using a second subset of the plurality of ALUs, the second ALU operations in a same clock cycle as the first ALU operations (Krig: Figures 4B and 5 elements 404, 422-424, and 502-506, paragraphs 20, 37, and 57)(Kesiraju: Figures 1, 6, and 10 elements 12, 60, 122, and 126, paragraphs 23, 48, and 60-61)(The combination allows for dynamically allocating a number of PEs for a coprocessor instruction based on the vector operand sizes. The combination allows for concurrent vector instruction execution for instructions from multiple processors. Official notice is given that instructions executing concurrently are performed in the same clock cycle for the advantage of increased performance and synchronized execution. Thus, it would have been obvious to one of ordinary skill in the art that execution of concurrent thread vector instructions occurs in the same clock cycle.).
As per claim 3:
Kesiraju and Krig disclosed the system of claim 2, wherein the processing device executes the first subset of ALU operations in a particular clock cycle of the vector arithmetic logic unit, and to execute the second subset of ALU operations in the same clock cycle as the first subset of ALU operations, the processing device is further to: 
execute the second subset of ALU operations in the particular clock cycle (Krig: Figures 4B and 5 elements 404, 422-424, and 502-506, paragraphs 20, 37, and 57)(Kesiraju: Figures 1, 6, and 10 elements 12, 60, 122, and 126, paragraphs 23, 48, and 60-61)(The combination allows for dynamically allocating a number of PEs for a coprocessor instruction based on the vector operand sizes. The combination allows for concurrent vector instruction execution for instructions from multiple processors. In view of the above official notice, the vector instructions are executed in the same clock cycle.).
As per claim 4:
Kesiraju and Krig disclosed the system of claim 2, wherein the one or more first vector instructions specify at least one first input vector having a first vector length, the second vector instruction specifies at least one second input vector having a second vector length (Kesiraju: Figures 6 and 10 elements 14, 18A-N, and 120, paragraph 60)(The coprocessor vector instructions specify source input vectors with operand sizes.), and the processing device further comprises the second processor core, wherein the second processor core is operatively coupled to the vector arithmetic logic unit, and the processing device is further to: 
identify the second subset of the plurality of ALUs in view of the second vector length and the allocation criteria (Krig: Figures 4B and 5 elements 404, 422-424, and 502, paragraphs 20, 37, and 57)(Kesiraju: Figures 1, 6, and 10 elements 12, 60, 122, and 126, paragraphs 23, 48, and 60-61)(Krig disclosed dynamically configuring the width of a processor at execution time. Kesiraju disclosed implementing multiple coprocessor pipelines for concurrent coprocessor execution. Kesiraju disclosed using up to all PEs for coprocessor vector instructions, based on the vector operand sizes. The combination allows for dynamically allocating a number of PEs for a coprocessor instruction based on the vector operand sizes. Kesiraju also disclosed a coprocessor arbiter that selects coprocessor instructions to input into the coprocessor based on priorities (e.g. allocation criteria). Kesiraju disclosed a scheduler to select ready vector instructions (e.g. allocation criteria).); and 
execute, using the second subset of the plurality of ALUs, the one or more second ALU operations specified by the second vector instruction (Krig: Figures 4B and 5 elements 404, 422-424, and 502-506, paragraphs 20, 37, and 57)(Kesiraju: Figures 1, 6, and 10 elements 12, 60, 122, and 126, paragraphs 23, 48, and 60-61)(The combination allows for concurrent vector instruction execution for instructions from multiple processors. The second vector instruction concurrently executed is executed using a subset of the coprocessor PE array.).
As per claim 5:
Kesiraju and Krig disclosed the system of claim 4, wherein the first subset of the ALUs executes the first ALU operations in a particular clock cycle, and the second subset of the ALUs executes the second ALU operations in the particular clock cycle (Krig: Figures 4B and 5 elements 404, 422-424, and 502-506, paragraphs 20, 37, and 57)(Kesiraju: Figures 1, 6, and 10 elements 12, 60, 122, and 126, paragraphs 23, 48, and 60-61)(The combination allows for dynamically allocating a number of PEs for a coprocessor instruction based on the vector operand sizes. The combination allows for concurrent vector instruction execution for instructions from multiple processors. In view of the above official notice, the vector instructions are executed in the same clock cycle.).
As per claim 6:
Kesiraju and Krig disclosed the system of claim 1, wherein the allocation criteria is further based on a number of deferred ALU operations that have previously been executed (Krig: Figures 4B and 5 elements 404, 422-424, and 502-506, paragraphs 20, 37, and 57)(Kesiraju: Figure 10 elements 116, 122, 126, and 130, paragraphs 60-61)(The combination allows for dynamically allocating a number of PEs for a coprocessor instruction based on the vector operand sizes. The combination allows for concurrent vector instruction execution for instructions from multiple processors. The scheduler selects ready operations for execution, including operations that were previously ready in previous clock cycles. Official notice is given that loops can be used in instruction execution for the advantage of reducing code size. Thus, it would have been obvious to one of ordinary skill in the art that the scheduler can select ready operations within loops that have been previously executed in previous loop iterations.).
As per claim 7:
Kesiraju and Krig disclosed the system of claim 6, wherein the first ALU operations cause the first subset of the ALUs to generate a first output vector in view of a first input vector specified by the one of more first vector instructions (Kesiraju: Figure 10 element 130, paragraph 59), and the processing device is further to: 
provide the first output vector to the first processor core (Kesiraju: Figure 10 element 130, paragraph 59)(Official notice is given that coprocessors return execution results to main processors for the advantage of storage and/or further processing. Thus, it would have been obvious to one of ordinary skill in the art at the time of the effective filing date to implement returning of coprocessor execution results to the main processor.); and 
provide the second output vector to the second processor core (Kesiraju: Figure 10 element 130, paragraph 59)(In view of the above official notice, coprocessor execution results are returned to the primary processors.).
As per claim 9:
Kesiraju and Krig disclosed the system of claim 1, wherein the first subset of the plurality of ALUs comprises one or more ALUs that are available to execute the first subset of ALU operations (Krig: Figures 4B and 5 elements 404, 422-424, and 502-506, paragraphs 20, 37, and 57)(Kesiraju: Figure 10 elements 116, 122, 126, and 130, paragraphs 60-61)(The combination allows for dynamically allocating a number of PEs for a coprocessor instruction based on the vector operand sizes. The combination allows for concurrent vector instruction execution for instructions from multiple processors. The selected PEs are available for execution.).
As per claim 18:
Kesiraju and Krig disclosed the system of claim 1, wherein the first vector instruction is performed on a first number of elements of a first input vector, wherein the first number of elements corresponds to a number of ALUs in the first subset of the plurality of ALUs, and the first vector instruction is performed on each element of the first input vector by a corresponding one of the first subset of the ALUs (Krig: Figures 4B and 5 elements 404, 422-424, and 502-506, paragraphs 20, 37, and 57)(Kesiraju: Figures 1, 6, and 10 elements 12, 60, 122, and 126, paragraphs 23, 48, and 60-61)(The combination allows for dynamically allocating a number of PEs for a coprocessor instruction based on the vector operand sizes. The combination allows for concurrent vector instruction execution for instructions from multiple processors. Each vector instruction is executed using a number of PEs based on the vector operand size and number of data elements.).
As per claim 20:
Claim 20 essentially recites the same limitations of claim 1. Therefore, claim 20 is rejected for the same reasons as claim 1.

Claims 8, 12-17, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Kesiraju et al. (U.S. 2022/0083343), in view of Krig (U.S. 2014/0136816), in view of Official Notice, further in view of Greathouse et al. (U.S. 2016/0085551).
As per claim 8:
Kesiraju and Krig disclosed the system of claim 1, wherein a first vector length is received from a program executing on the first processor core (Kesiraju: Figure 6 elements 10A-N and 60, paragraph 47)(The coprocessor receives vector instructions from the processors.).
Kesiraju and Krig failed to teach the first vector length is greater than a number of ALUs in the first subset of the plurality of ALUs, and the processing device is further to: determine, in view of a difference between the first vector length and the number of ALUs in the first subset of the plurality of ALUs, a first allocated vector length; and provide the first allocated vector length to the program.
However, Greathouse combined with Kesiraju and Krig disclosed the first vector length is greater than a number of ALUs in the first subset of the plurality of ALUs (Greathouse: Figure 4 elements 406, 412, and 418, paragraphs 30-32)(Krig: Figures 4B and 5 elements 404, 422-424, and 502, paragraphs 20, 37, and 57)(Kesiraju: Figure 10 element 120, paragraph 60)(Greathouse disclosed scheduling wide SIMD operations over multiple clock cycles when not enough ALUs are available for execution. Krig disclosed dynamically configuring the width of a processor at execution time. The combination allows for vector operations with greater widths than the execution array to be executed over multiple clock cycles.), and the processing device is further to: 
determine, in view of a difference between the first vector length and the number of ALUs in the first subset of the plurality of ALUs, a first allocated vector length (Greathouse: Figure 4 elements 406, 412, and 418, paragraphs 30-32)(Krig: Figures 4B and 5 elements 404, 422-424, and 502, paragraphs 20, 37, and 57)(Kesiraju: Figure 10 element 120, paragraph 60)(Greathouse disclosed scheduling wide SIMD operations over multiple clock cycles when not enough ALUs are available for execution. Krig disclosed dynamically configuring the width of a processor at execution time. The combination allows for vector operations with greater widths than the execution array to be executed over multiple clock cycles. An allocation of PEs for the vector operation executed over multiple clock cycles is determined and provided for execution.); and 
provide the first allocated vector length to the program (Greathouse: Figure 4 elements 406, 412, and 418, paragraphs 30-32)(Krig: Figures 4B and 5 elements 404, 422-424, and 502, paragraphs 20, 37, and 57)(Kesiraju: Figure 10 element 120, paragraph 60)(Greathouse disclosed scheduling wide SIMD operations over multiple clock cycles when not enough ALUs are available for execution. Krig disclosed dynamically configuring the width of a processor at execution time. The combination allows for vector operations with greater widths than the execution array to be executed over multiple clock cycles. An allocation of PEs for the vector operation executed over multiple clock cycles is determined and provided for execution.).
The advantage of executing wide vector operations over multiple clock cycles is that execution hardware can be reduced, which reduces processor costs. Thus, it would have been obvious to one of ordinary skill in the art at the time of the effective filing date to implement multi-cycle execution of large vector operations in Kesiraju for the above advantage.
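For illustration only, multi-cycle execution of a vector operation wider than the available ALUs can be sketched as follows. This is a simplified model under stated assumptions, not the scheduling logic of Greathouse; the function name `split_into_cycles` and the assumption that the same number of lanes is available in every cycle are hypothetical.

```python
def split_into_cycles(vector_length, available_lanes):
    """Return the number of elements processed in each successive cycle.

    Assumes (hypothetically) that `available_lanes` lanes are free in
    every cycle until the operation completes.
    """
    if available_lanes <= 0:
        raise ValueError("available_lanes must be positive")
    widths = []
    remaining = vector_length
    while remaining > 0:
        width = min(remaining, available_lanes)
        widths.append(width)
        remaining -= width
    return widths


# A 10-element operation on 4 available lanes takes three cycles.
per_cycle = split_into_cycles(10, 4)
```

In this sketch the width used in the first cycle is smaller than the requested vector length whenever the request exceeds the available lanes.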
As per claim 12:
Kesiraju and Krig disclosed the system of claim 1, wherein the processing device is further to:
identify the first subset of the ALUs in view of a first vector length specified by the one or more first vector instructions and one or more allocation criteria, and to identify the first subset of the ALUs (Krig: Figures 4B and 5 elements 404, 422-424, and 502, paragraphs 20, 37, and 57)(Kesiraju: Figures 1, 6, and 10 elements 12, 60, 122, and 126, paragraphs 23, 48, and 60-61)(The combination allows for dynamically allocating a number of PEs for a coprocessor instruction based on the vector operand sizes. Kesiraju also disclosed a coprocessor arbiter that selects coprocessor instructions to input into the coprocessor based on priorities (e.g. allocation criteria). Kesiraju disclosed a scheduler to select ready vector instructions (e.g. allocation criteria).). 
Kesiraju and Krig failed to teach the processing device is further to: determine whether a sum of the first vector length and a second vector length of a second input vector specified by the second vector instruction received from the second processor core is greater than a total number of ALUs of the vector arithmetic logic unit; and responsive to determining that the sum is greater than a total number of ALUs, set the number of ALUs in the first subset to a value less than the first vector length.
However, Greathouse combined with Kesiraju and Krig disclosed the processing device is further to: 
determine whether a sum of the first vector length and a second vector length of a second input vector specified by the second vector instruction received from the second processor core is greater than a total number of ALUs of the vector arithmetic logic unit (Greathouse: Figure 4 elements 406, 412, and 418, paragraphs 30-32)(Krig: Figures 4B and 5 elements 404, 422-424, and 502-506, paragraphs 20, 37, and 57)(Kesiraju: Figure 10 element 120, paragraph 60)(Greathouse disclosed scheduling wide SIMD operations over multiple clock cycles when not enough ALUs are available for execution. Krig disclosed dynamically configuring the width of a processor at execution time. The combination allows for vector operations with greater widths than the execution array to be executed over multiple clock cycles. The combination allows for concurrent vector instruction execution for instructions from multiple processors, including vector instructions executed over multiple clock cycles.); and 
responsive to determining that the sum is greater than the total number of ALUs, set the number of ALUs in the first subset to a value less than the first vector length (Greathouse: Figure 4 elements 406, 412, and 418, paragraphs 30-32)(Krig: Figures 4B and 5 elements 404, 422-424, and 502, paragraphs 20, 37, and 57)(Kesiraju: Figure 10 element 120, paragraph 60)(Greathouse disclosed scheduling wide SIMD operations over multiple clock cycles when not enough ALUs are available for execution. Krig disclosed dynamically configuring the width of a processor at execution time. The combination allows for vector operations with greater widths than the execution array to be executed over multiple clock cycles. An allocation of PEs for the vector operation executed over multiple clock cycles is determined and provided for execution.).
The advantage of executing wide vector operations over multiple clock cycles is that execution hardware can be reduced, which reduces processor costs. Thus, it would have been obvious to one of ordinary skill in the art at the time of the effective filing date to implement multi-cycle execution of large vector operations in Kesiraju for the above advantage.
As per claim 13:
Kesiraju, Krig, and Greathouse disclosed the system of claim 12, wherein the allocation criteria comprise a Quality of Service associated with the first vector instruction, the total number of ALUs of the vector arithmetic logic unit, and the first and second vector lengths  (Kesiraju: Figures 6-7 and 10 elements 16A-N, 60-64, 70, and 116-122, paragraphs 47-50 and 60-61)(Kesiraju disclosed assigning threads priority values (i.e. QoS) for a weighted round robin arbiter to send coprocessing instructions from processors to the coprocessor. The priority values are associated with coprocessor instructions within each thread. Coprocessor vector instructions are scheduled for execution based on their thread priority values and sent into the coprocessor for processing.), and 
wherein the value less than the first vector length is determined using a resource allocation model in view of the Quality of Service associated with the first vector instruction (Greathouse: Figure 4 elements 406, 412, and 418, paragraphs 30-32)(Krig: Figures 4B and 5 elements 404, 422-424, and 502, paragraphs 20, 37, and 57)(Kesiraju: Figures 7 and 10 elements 70 and 120, paragraphs 50 and 60)(Greathouse disclosed scheduling wide SIMD operations over multiple clock cycles when not enough ALUs are available for execution. Krig disclosed dynamically configuring the width of a processor at execution time. The combination allows for vector operations with greater widths than the execution array to be executed over multiple clock cycles. An allocation of PEs for the vector operation executed over multiple clock cycles is determined and provided for execution based on thread priority.).
As per claim 14:
Kesiraju, Krig, and Greathouse disclosed the system of claim 12, wherein at least one of the first ALU operations is not allocated to an ALU, and the processing device is further to: 
defer the at least one of the first ALU operations to a subsequent clock cycle (Greathouse: Figure 4 elements 406, 412, and 418, paragraphs 30-32)(Krig: Figures 4B and 5 elements 404, 422-424, and 502, paragraphs 20, 37, and 57)(Kesiraju: Figure 10 element 120, paragraph 60)(The combination allows for vector operations with greater widths than the execution array to be executed over multiple clock cycles. Vector operations not executed in the first clock cycle are deferred.).
As per claim 15:
Kesiraju, Krig, and Greathouse disclosed the system of claim 14, wherein to defer the at least one of the first ALU operations to the subsequent clock cycle, the processing device is further to include the at least one of the first ALU operations in the first ALU operation queue (Greathouse: Figure 4 elements 406, 412, and 418, paragraphs 30-32)(Krig: Figures 4B and 5 elements 404, 422-424, and 502, paragraphs 20, 37, and 57)(Kesiraju: Figure 10 elements 116 and 120, paragraphs 60-61)(The combination allows for vector operations with greater widths than the execution array to be executed over multiple clock cycles. Vector operations not executed in the first clock cycle are deferred. It would have been obvious to one of ordinary skill in the art that such operations can be held in the Op Queue until finished for the advantage of buffering the remaining operations and not losing them within the processing system.).
As per claim 16:
Kesiraju, Krig, and Greathouse disclosed the system of claim 12, wherein responsive to determining that the sum of the first vector length and the second vector length is greater than the total number of ALUs, the processing device is further to set a number of ALUs in a second subset of the plurality of ALUs to a value less than the second vector length (Greathouse: Figure 4 elements 406, 412, and 418, paragraphs 30-32)(Krig: Figures 4B and 5 elements 404, 422-424, and 502-506, paragraphs 20, 37, and 57)(Kesiraju: Figure 10 element 120, paragraph 60)(Greathouse disclosed scheduling wide SIMD operations over multiple clock cycles when not enough ALUs are available for execution. Krig disclosed dynamically configuring the width of a processor at execution time. The combination allows for vector operations with greater widths than the execution array to be executed over multiple clock cycles. The combination allows for concurrent vector instruction execution for instructions from multiple processors, including vector instructions executed over multiple clock cycles.).
As per claim 17:
Kesiraju, Krig, and Greathouse disclosed the system of claim 16, wherein the allocation criteria comprise a Quality of Service associated with the second vector instruction (Kesiraju: Figures 6-7 and 10 elements 16A-N, 60-64, 70, and 116-122, paragraphs 47-50 and 60-61)(Kesiraju disclosed assigning threads priority values (i.e. QoS) for a weighted round robin arbiter to send coprocessing instructions from processors to the coprocessor. The priority values are associated with coprocessor instructions within each thread. Coprocessor vector instructions are scheduled for execution based on their thread priority values and sent into the coprocessor for processing.), and the value less than the second vector length is determined using a resource allocation model in view of the Quality of Service associated with the second vector instruction (Greathouse: Figure 4 elements 406, 412, and 418, paragraphs 30-32)(Krig: Figures 4B and 5 elements 404, 422-424, and 502, paragraphs 20, 37, and 57)(Kesiraju: Figures 7 and 10 elements 70 and 120, paragraphs 50 and 60)(Greathouse disclosed scheduling wide SIMD operations over multiple clock cycles when not enough ALUs are available for execution. Krig disclosed dynamically configuring the width of a processor at execution time. The combination allows for vector operations with greater widths than the execution array to be executed over multiple clock cycles. An allocation of PEs for the vector operation executed over multiple clock cycles is determined and provided for execution based on thread priority.).
As per claim 19:
Claim 19 essentially recites the same limitations of claim 1. Claim 19 additionally recites the following limitations:
determining that the plurality of ALUs is insufficient to execute the first requested ALU operations (Greathouse: Figure 4 elements 406, 412, and 418, paragraphs 30-32)(Krig: Figures 4B and 5 elements 404, 422-424, and 502, paragraphs 20, 37, and 57)(Kesiraju: Figure 10 element 120, paragraph 60)(Greathouse disclosed scheduling wide SIMD operations over multiple clock cycles when not enough ALUs are available for execution. Krig disclosed dynamically configuring the width of a processor at execution time. The combination allows for vector operations with greater widths than the execution array to be executed over multiple clock cycles. An allocation of PEs for the vector operation executed over multiple clock cycles is determined and provided for execution.).
The advantage of executing wide vector operations over multiple clock cycles is that execution hardware can be reduced, which reduces processor costs. Thus, it would have been obvious to one of ordinary skill in the art at the time of the effective filing date to implement multi-cycle execution of large vector operations in Kesiraju for the above advantage.

Claims 10-11 are rejected under 35 U.S.C. 103 as being unpatentable over Kesiraju et al. (U.S. 2022/0083343), in view of Krig (U.S. 2014/0136816), in view of Official Notice, further in view of Oshiyama et al. (U.S. 2020/0183684).
As per claim 10:
Kesiraju and Krig disclosed the system of claim 1.
Kesiraju and Krig failed to teach wherein the allocation criteria comprise a specified Quality of Service associated with each of the first requested ALU operations, each specified Quality of Service is one of low, medium, or high, and the allocation criteria further specify that the particular ALU operation is included prior to including another ALU operation having a medium or low quality of service in the first subset if the particular ALU operation has been deferred.
However, Oshiyama combined with Kesiraju and Krig disclosed wherein the allocation criteria comprise a specified Quality of Service associated with each of the first requested ALU operations, each specified Quality of Service is one of low, medium, or high, and the allocation criteria further specify that the particular ALU operation is included prior to including another ALU operation having a medium or low quality of service in the first subset if the particular ALU operation has been deferred (Oshiyama: Figures 6-8 elements 272-280, paragraphs 74-75, 81, and 87)(Kesiraju: Figure 10 elements 116 and 122, paragraph 61)(Kesiraju disclosed a coprocessor arbiter with priorities, but not a coprocessor scheduler with priorities. Oshiyama disclosed an instruction scheduler selecting highest priority ready instructions to output to execution resources. The combination allows for dispatching ready highest priority instructions to the PEs for execution. The variable priority levels read upon the low/medium/high QoS. Ready instructions with the same priority can be deferred multiple clock cycles prior to selection based on PE availability.).
The advantage of using instruction priorities for instruction scheduling is that the highest priority ready instructions are selected first for execution. Thus, it would have been obvious to one of ordinary skill in the art at the time of the effective filing date to implement priority scheduling in Kesiraju for the above advantage.
As per claim 11:
Kesiraju, Krig, and Oshiyama disclosed the system of claim 10, wherein the allocation criteria further specify that the particular ALU operation having a medium quality of service is included prior to including another ALU operation having a medium quality of service if fewer medium quality of service ALU operations have been executed than deferred operations (Oshiyama: Figures 6-8 elements 272-280, paragraphs 74-75, 81, and 87)(Kesiraju: Figure 10 elements 116 and 122, paragraph 61)(Kesiraju disclosed a coprocessor arbiter with priorities, but not a coprocessor scheduler with priorities. Oshiyama disclosed an instruction scheduler selecting highest priority ready instructions to output to execution resources. The combination allows for dispatching ready highest priority instructions to the PEs for execution. Ready instructions with the same priority can be deferred multiple clock cycles prior to selection based on PE availability. This allows for a deferred medium priority ready instruction to be selected over a medium priority not-ready instruction.).
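For illustration only, the priority-based selection of ready instructions relied upon for claims 10-11 can be sketched as follows. This is a simplified model under stated assumptions and is not the scheduler of Oshiyama or Kesiraju; the `Op` record, the `PRIORITY` encoding, the `cycles_deferred` counter, and the tie-break favoring longer-deferred operations are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical encoding: a lower number is selected first.
PRIORITY = {"high": 0, "medium": 1, "low": 2}


@dataclass
class Op:
    name: str
    qos: str
    ready: bool
    cycles_deferred: int = 0


def select_ops(queue, free_pes):
    """Pick up to `free_pes` ready ops, highest priority first.

    Among ops of equal priority, the one deferred longest is chosen first
    (an assumed tie-break). Ready ops that are not selected have their
    deferral count incremented; not-ready ops are never selected.
    """
    ready = [op for op in queue if op.ready]
    ready.sort(key=lambda op: (PRIORITY[op.qos], -op.cycles_deferred))
    for op in ready[free_pes:]:
        op.cycles_deferred += 1
    return ready[:free_pes]


queue = [
    Op("a", "medium", ready=True),
    Op("b", "medium", ready=True, cycles_deferred=2),
    Op("c", "high", ready=True),
    Op("d", "medium", ready=False, cycles_deferred=5),
]
picked = select_ops(queue, free_pes=2)
```

In this sketch the high-priority operation is selected first, and the deferred medium-priority ready operation is selected ahead of both the undeferred one and the not-ready one.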

Response to Arguments
The arguments presented by Applicant in the response received on 8/10/2022 are not considered persuasive.
Applicant argues for claim 1:
“As an example, the combination of Kesiraju and Krig does not teach or suggest identifying one or more first ALU operations from the first requested ALU operations, wherein the first ALU operations are identified in view of one or more allocation criteria, wherein the allocation criteria specify that a particular ALU operation is included in the first ALU operations if the particular ALU operation has been deferred from a previous clock cycle, as claim 1 recites. Applicant has found no mention in Kesiraju, Krig, or Greathouse of allocation criteria that is based on whether a particular ALU operation has been deferred from a previous clock cycle, much less allocation criteria specifying that a particular ALU operation is included in the first ALU operations if the particular ALU operation has been deferred from a previous clock cycle, as claim 1 recites.”  

This argument is not found to be persuasive for the following reason. The scheduler of Kesiraju allows for selection of ready operations for execution. It would have been obvious to one of ordinary skill in the art that, with a plurality of processors sharing a single coprocessor, ready coprocessor operations can fill up the Op Queue of Kesiraju. When ready coprocessor operations fill the Op Queue to the point that they cannot all be scheduled immediately, these ready operations are deferred for one or more clock cycles until sufficient PEs are available for execution. Selection of such deferred ready operations reads upon the newly claimed allocation criteria.
Applicant argues regarding the official notices taken:
“In addition, it would not be appropriate for the examiner to take official notice of facts without citing a prior art reference where the facts asserted to be well known are not capable of instant and unquestionable demonstration as being well-known. For example, assertions of technical facts in the areas of esoteric technology or specific knowledge of the prior art must always be supported by citation to some reference work recognized as standard in the pertinent art. In re Ahlert, 424 F.2d at 1091, 165 USPQ at 420-21. See also In re Grose, 592 F.2d 1161, 1167-68, 201 USPQ 57, 63 (CCPA 1979) ("[W]hen the PTO seeks to rely upon a chemical theory, in establishing a prima facie case of obviousness, it must provide evidentiary support for the existence and meaning of that theory."); In re Eynde, 480 F.2d 1364, 1370, 178 USPQ 470, 474 (CCPA 1973) ("[W]e reject the notion that judicial or administrative notice may be taken of the state of the art. The facts constituting the state of the art are normally subject to the possibility of rational disagreement among reasonable men and are not amenable to the taking of such notice."). MPEP 2144.03.
…
It is respectfully submitted that the above assertions are not capable of such instant and unquestionable demonstration as to defy dispute, let alone determine whether such "facts can be instantly and unquestionably demonstrated" as required by MPEP §2144.03. It is respectfully submitted that such "considerations" are not capable of instant and unquestionable demonstration as to defy dispute. For these reasons, the Official Notice in the rejection of Claims 1, 2, 7, 19, and 20 appears to be taken improperly. Applicant submits that these claim limitations are not capable of instant and unquestionable demonstration as being well-known in the art and respectfully requests that the Examiner submit documentary evidence in accordance with the requirements of MPEP 2144.03.”  

This argument is not found to be persuasive for the following reason. MPEP 2144.03 C states “To adequately traverse such a finding, an applicant must specifically point out the supposed errors in the examiner’s action, which would include stating why the noticed fact is not considered to be common knowledge or well-known in the art … A general allegation that the claims define a patentable invention without any reference to the examiner’s assertion of official notice would be inadequate.” Applicant’s response does not state why the noticed facts are not considered well-known in the art. Thus, the official notices taken are maintained.
The facts officially noticed, namely vector logical operations, concurrent execution, and coprocessors returning execution results, are all very well-known in the art. These concepts are capable of instant and unquestionable demonstration. As an example, the CPC subclass G06F9/30036 (SIMD/Vector operation) contains thousands of references that likely mention SIMD/vector logical operations. The examiner has previously worked on dozens of patent applications that solely concern various types of SIMD/vector logical operations.

	Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
The following is text cited from 37 CFR 1.111(c): In amending in reply to a rejection of claims in an application or patent under reexamination, the applicant or patent owner must clearly point out the patentable novelty which he or she thinks the claims present in view of the state of the art disclosed by the references cited or the objections made. The applicant or patent owner must also show how the amendments avoid such references or objections.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JACOB A. PETRANEK whose telephone number is (571)272-5988.  The examiner can normally be reached on M-F 8:00-4:30.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Aimee Li can be reached on (571) 272-4169.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/JACOB PETRANEK/Primary Examiner, Art Unit 2183