DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 1, 3-11, and 13-20 are presented for examination.

Continued Examination under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on May 9, 2022 has been entered.

Claim Rejections - 35 USC § 112
The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.
Claims 11 and 13-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim 11 recites the limitations “the maximum number”, “the reference elements”, and “the computation circuit”.  There is insufficient antecedent basis for these limitations in the claim.  Note also that “a maximum number” appears to be recited after the first recitation of “the maximum number”.  For purposes of examination, these will be construed as referring to the same thing.
All claims dependent on a claim rejected hereunder are also rejected for being dependent on a rejected base claim.

Claim Rejections - 35 USC § 103
Claims 1, 3-5, 11, and 13-15 are rejected under 35 U.S.C. 103 as being unpatentable over Desai et al. (US 20030014457) (“Desai”) in view of Muralimanohar et al. (US 20190034201) (“Muralimanohar”) and further in view of Sazegari (US 6901422) (“Sazegari”).
Regarding claim 1, Desai teaches “[a]n apparatus for neural network processing, comprising: 
a computation circuit configured to perform operations between two vectors in accordance with one or more instructions (vector registers are partitioned into four 32-bit operands, eight 16-bit operands, sixteen 8-bit operands, and so forth; vector arithmetic operations including addition, subtraction, multiplication, division, etc. may be performed on operands of various bit-lengths in a single clock cycle – Desai, paragraph 17; see also Fig. 1 [two vector registers]) …; [and]
a data input/output (I/O) circuit configured to:
receive … data formatted in a first vector and a second vector (vector registers are partitioned into various size slices depending on the precision of the operands [so data would be received in these registers] – Desai, paragraph 17; see also Fig. 1), 
wherein the first vector includes multiple first elements and each of the multiple first elements is formatted as multiple bits (vector registers are partitioned into four 32-bit operands, eight 16-bit operands, sixteen 8-bit operands, and so forth; vector data consist of several related elements of data, a group of which forms a vector – Desai, paragraph 17 [such that each operand is an element formatted as multiple bits, e.g., 32 bits, 16 bits, etc.]), and 
wherein the second vector includes multiple second elements and each of the multiple second elements is formatted as multiple bits (vector registers are partitioned into four 32-bit operands, eight 16-bit operands, sixteen 8-bit operands, and so forth; vector data consist of several related elements of data, a group of which forms a vector – Desai, paragraph 17 [such that each operand is an element formatted as multiple bits, e.g., 32 bits, 16 bits, etc.])….”
Desai appears not to disclose explicitly the further limitations of the claim.  However, Muralimanohar discloses “receiv[ing] neural network data formatted in a first vector and a second vector (memristor crossbar arrays store input weights for convolutional neural networks – Muralimanohar, paragraph 8; dot product is performed between the input vector [first vector] and a stored vector [second vector] stored in the memory array – id. at paragraph 43)….”
Muralimanohar and the instant application are both in the field of hardware processing of vectors and are analogous.  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Desai to perform the operations on neural network data, as disclosed by Muralimanohar, and an ordinary artisan could reasonably have expected to do so successfully.  Doing so would provide an optimized architecture in which to perform neural network operations involving large numbers of linear algebra calculations in parallel.  See Muralimanohar, paragraphs 1, 8.
Neither Desai nor Muralimanohar appears to disclose explicitly the further limitations of the claim.  However, Sazegari discloses that “a maximum capability of the computation circuit is to process … two vectors each having a maximum number of reference elements (principle of calculating the product of two matrices using partial products can be extended to matrices of greater sizes; such matrices may be so large that their rows cannot be contained within a single vector register; in such cases, the product of the two matrices may be determined by treating each matrix as a series of smaller matrices [i.e., the computation circuit may only process two matrices divided up into vector registers each having at most the maximum number of elements that will fit in the register] – Sazegari, col. 11, ll. 23-42; see also Figs. 10-11); ... [and the system]
determine[s] that at least one of a count of the first elements or a count of the second elements is greater than the maximum number of reference elements to determine that at least one of the first vector and the second vector exceeds the maximum capability of the computation circuit (principle of calculating the product of two matrices using partial products can be extended to matrices of greater sizes; such matrices may be so large that their rows cannot be contained within a single vector register; in such cases, the product of the two matrices may be determined by treating each matrix as a series of smaller matrices [maximum capability/maximum number of reference elements = maximum number of elements of each row that can be contained within a single vector register, which for a sufficiently large matrix is smaller than the row dimension of the matrix [count of the first/second elements]] – Sazegari, col. 11, ll. 23-42; see also Figs. 10-11); and [contains]
a data adjustment circuit (vector processor 102 of Fig. 1 of Sazegari) configured to:
divide the first elements of the first vector into one or more first segments and divide the second elements of the second vector into one or more second segments based on the determination that at least one of the first vector and the second vector exceeds the maximum capability of the computation circuit (principle of calculating the product of two matrices using partial products can be extended to matrices of greater sizes; such matrices may be so large that their rows cannot be contained within a single vector register; in such cases, the product of the two matrices may be determined by treating each matrix as a series of smaller matrices; in this manner, for example, any large matrix may be divided into 4 x 4 submatrices – Sazegari, col. 11, ll. 23-42; see also Fig. 11 (showing a 4 x 4 matrix being divided into four 2 x 2 submatrices, each of which is stored in two vector registers [so here, for instance, the first four-element row vector is divided into two two-element row vectors based on the determination that the vector register holds no more than two elements])), wherein a count of elements in each of the first segments and the second segments is equal to or less than the maximum number of the reference elements (principle of calculating the product of two matrices using partial products can be extended to matrices of greater sizes; such matrices may be so large that their rows cannot be contained within a single vector register; in such cases, the product of the two matrices may be determined by treating each matrix as a series of smaller matrices; in this manner, for example, any large matrix may be divided into 4 x 4 submatrices – Sazegari, col. 11, ll. 23-42; see also Fig. 11 (showing that, for instance, elements x00 and x01 fit inside vector register 1102 – i.e., the number of elements in the first two-element segment of the first row is equal to the maximum number 2 of elements each vector register can hold)); and
transmit the one or more first segments and the one or more second segments to the computation circuit (Sazegari Fig. 11A, for example, shows the duplicated contents of vector registers 1102 and 1104 being dot multiplied with the contents of the vector registers 1118, 1120, 1122, and 1124 and the results being summed to add to the vector registers of submatrix I of resultant matrix Z [computation circuit = circuit for performing the adding and multiplying functions]; see also col. 13, l. 62-col. 14, l. 21 (detailing the mechanics of the operations between the vector registers)), wherein the computation circuit is configured to respectively and sequentially perform the operations between the one or more first segments and the one or more second segments (Sazegari col. 13, l. 62-col. 15, l. 12 and Figs. 11A-D detail a process by which vector register contents corresponding to segments of matrix X are dot multiplied by their counterparts in matrix Y to generate results for placement in the vector registers of matrix Z; note that, for instance, in Fig. 11A, the resultant vector registers 1134-1140 are designated with “a” to indicate that they do not represent the final values of the vector registers of matrix Z and that the operations shown in Fig. 11B must also be performed for the final result to be obtained (i.e., the operations occur sequentially)).”
Sazegari and the instant application both relate to the performance of operations on vector segments and are analogous.  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Desai and Muralimanohar to divide the vectors into segments based on the maximum capability of the circuit and perform operations on the segments, as disclosed by Sazegari, and an ordinary artisan could reasonably have expected to do so successfully.  Doing so would allow the system to perform calculations on arbitrarily large vectors without regard to memory limitations on the number of vector elements that can be stored at once.  See Sazegari, col. 11, ll. 23-43.
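
For illustration only, the segmentation reasoning mapped above for claim 1 can be sketched in Python.  This is a minimal sketch under stated assumptions, not the method of any cited reference: the names MAX_ELEMENTS, split_into_segments, and segmented_dot are hypothetical, the capacity of four elements is an arbitrary value, and the dot product is chosen merely as one example of an operation between two vectors.

```python
from typing import List, Sequence

MAX_ELEMENTS = 4  # assumed capacity of the computation circuit (hypothetical value)


def split_into_segments(vec: Sequence[float], max_elements: int) -> List[List[float]]:
    """Divide vec into consecutive segments of at most max_elements elements."""
    if max_elements <= 0:
        raise ValueError("max_elements must be positive")
    return [list(vec[i:i + max_elements]) for i in range(0, len(vec), max_elements)]


def segmented_dot(a: Sequence[float], b: Sequence[float],
                  max_elements: int = MAX_ELEMENTS) -> float:
    """Dot product computed one segment pair at a time."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    # Segment only when the element count exceeds the assumed capacity.
    if len(a) > max_elements:
        segs_a = split_into_segments(a, max_elements)
        segs_b = split_into_segments(b, max_elements)
    else:
        segs_a, segs_b = [list(a)], [list(b)]
    total = 0.0
    # Each pair of segments is processed in turn, i.e., sequentially.
    for seg_a, seg_b in zip(segs_a, segs_b):
        total += sum(x * y for x, y in zip(seg_a, seg_b))
    return total
```

In this sketch, a six-element vector with an assumed capacity of four is divided into one four-element segment and one two-element segment, so that no segment exceeds the capacity.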

Regarding claim 3, Desai, as modified by Muralimanohar and Sazegari, teaches that “the data adjustment circuit is configured to transmit one of the first segments and one of the second segments as a pair to the computation circuit each time (vector registers are partitioned into four 32-bit operands, eight 16-bit operands, sixteen 8-bit operands, and so forth; vector arithmetic operations including addition, subtraction, multiplication, division, etc. may be performed on operands of various bit-lengths in a single clock cycle [implying that two of the operands are transmitted to the arithmetic operator in the same cycle] – Desai, paragraph 17; see also Fig. 1 [two arrows going toward the vector arithmetic operation]).”

Regarding claim 4, Desai, as modified by Muralimanohar and Sazegari, teaches that “the computation circuit includes at least one of one or more addition processors, one or more subtraction processors, one or more logical conjunction processors, or one or more dot product processors (vector arithmetic operations including addition, subtraction, multiplication, division, etc. are performed on operands of various bit lengths in a single cycle – Desai, paragraph 17).”

Regarding claim 5, Desai, as modified by Muralimanohar and Sazegari, teaches that “each of the first elements and the second elements is a value represented in a predetermined number of bits (vector registers are partitioned into four 32-bit operands, eight 16-bit operands, sixteen 8-bit operands, and so forth – Desai, paragraph 17).” 
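
For illustration only, the register partitioning relied upon from Desai, paragraph 17, can be sketched in Python.  This is a minimal sketch under stated assumptions: the 128-bit register width is inferred from the recited operand counts (4 x 32 = 8 x 16 = 16 x 8) and is not quoted from the reference, and the names REGISTER_BITS and partition_register are hypothetical.

```python
from typing import List

REGISTER_BITS = 128  # assumption inferred from 4 x 32 = 8 x 16 = 16 x 8


def partition_register(value: int, operand_bits: int) -> List[int]:
    """Split a register value into equal-width operands, lowest-order first."""
    if operand_bits <= 0 or REGISTER_BITS % operand_bits != 0:
        raise ValueError("operand width must evenly divide the register width")
    if not 0 <= value < (1 << REGISTER_BITS):
        raise ValueError("value does not fit in the register")
    mask = (1 << operand_bits) - 1
    count = REGISTER_BITS // operand_bits
    # Each operand is a fixed, predetermined number of bits wide.
    return [(value >> (i * operand_bits)) & mask for i in range(count)]
```

Each element returned by this sketch is a value represented in the same predetermined number of bits, whichever operand width is selected.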

Regarding claim 11, Desai teaches “[a] method for neural network processing, comprising: 
receiving, by a data I/O circuit, … data formatted in a first vector and a second vector (vector registers are partitioned into various size slices depending on the precision of the operands [so data would be received in these registers] – Desai, paragraph 17; see also Fig. 1 [two vector registers]),
wherein the first vector includes multiple first elements and each of the multiple first elements is formatted as multiple bits (vector registers are partitioned into four 32-bit operands, eight 16-bit operands, sixteen 8-bit operands, and so forth; vector data consist of several related elements of data, a group of which forms a vector – Desai, paragraph 17 [such that each operand is an element formatted as multiple bits, e.g., 32 bits, 16 bits, etc.]), and 
wherein the second vector includes multiple second elements and each of the second elements is formatted as multiple bits (vector registers are partitioned into four 32-bit operands, eight 16-bit operands, sixteen 8-bit operands, and so forth; vector data consist of several related elements of data, a group of which forms a vector – Desai, paragraph 17 [such that each operand is an element formatted as multiple bits, e.g., 32 bits, 16 bits, etc.])….”
Desai appears not to disclose explicitly the further limitations of the claim.  However, Muralimanohar discloses “receiving, by a data I/O circuit, neural network data formatted in a first vector and a second vector (memristor crossbar arrays store input weights for convolutional neural networks – Muralimanohar, paragraph 8; dot product is performed between the input vector [first vector] and a stored vector [second vector] stored in the memory array – id. at paragraph 43)….”
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Desai to make the input data neural network data, as disclosed by Muralimanohar, and an ordinary artisan could reasonably have expected to do so successfully.  Doing so would provide an optimized architecture in which to perform neural network operations involving large numbers of linear algebra calculations in parallel.  See Muralimanohar, paragraphs 1, 8.
Neither Desai nor Muralimanohar appears to disclose explicitly the further limitations of the claim.  However, Sazegari discloses “determining, by a data I/O circuit, that at least one of a count of the first elements or a count of the second elements is greater than a threshold count to determine that at least one of the first vector and the second vector exceeds the maximum capability of the computation circuit (principle of calculating the product of two matrices using partial products can be extended to matrices of greater sizes; such matrices may be so large that their rows cannot be contained within a single vector register; in such cases, the product of the two matrices may be determined by treating each matrix as a series of smaller matrices [maximum capability/threshold count = maximum number of elements of each row that can be contained within a single vector register, which for a sufficiently large matrix is smaller than the row dimension of the matrix [count of the first/second elements]] – Sazegari, col. 11, ll. 23-42; see also Figs. 10-11); 
dividing, by a data adjustment circuit (vector processor 102 of Sazegari Fig. 1), the first elements of the first vector into one or more first segments and dividing the second elements of the second vector into one or more second segments based on the determination that at least one of the first vector and the second vector exceeds the maximum capability of the computation circuit (principle of calculating the product of two matrices using partial products can be extended to matrices of greater sizes; such matrices may be so large that their rows cannot be contained within a single vector register; in such cases, the product of the two matrices may be determined by treating each matrix as a series of smaller matrices; in this manner, for example, any large matrix may be divided into 4 x 4 submatrices – Sazegari, col. 11, ll. 23-42; see also Fig. 11 (showing a 4 x 4 matrix being divided into four 2 x 2 submatrices, each of which is stored in two vector registers [so here, for instance, the first four-element row vector is divided into two two-element row vectors based on the determination that the vector register holds no more than two elements])), wherein a count of elements in each of the first segments and the second segments is equal to or less than the maximum number of the reference elements (principle of calculating the product of two matrices using partial products can be extended to matrices of greater sizes; such matrices may be so large that their rows cannot be contained within a single vector register; in such cases, the product of the two matrices may be determined by treating each matrix as a series of smaller matrices; in this manner, for example, any large matrix may be divided into 4 x 4 submatrices – Sazegari, col. 11, ll. 23-42; see also Fig. 11 (showing that, for instance, elements x00 and x01 fit inside vector register 1102 – i.e., the number of elements in the first two-element segment of the first row is equal to the maximum number 2 of elements each vector register can hold)); and
transmitting, by the data adjustment circuit, the one or more first segments and the one or more second segments to a computation circuit (Sazegari Fig. 11A, for example, shows the duplicated contents of vector registers 1102 and 1104 being dot multiplied with the contents of the vector registers 1118, 1120, 1122, and 1124 and the results being summed to add to the vector registers of submatrix I of resultant matrix Z [computation circuit = circuit for performing the adding and multiplying functions; operations = addition and multiplication]; see also col. 13, l. 62-col. 14, l. 21 (detailing the mechanics of the operations between the vector registers)), 
wherein the computation circuit is configured to perform operations between two vectors in accordance with one or more instructions (Sazegari Fig. 11A, for example, shows the duplicated contents of vector registers 1102 and 1104 being dot multiplied with the contents of the vector registers 1118, 1120, 1122, and 1124 and the results being summed to add to the vector registers of submatrix I of resultant matrix Z [computation circuit = circuit for performing the adding and multiplying functions; operations = addition and multiplication]; see also col. 13, l. 62-col. 14, l. 21 (detailing the mechanics of the operations between the vector registers)),
wherein the maximum capability of the computation circuit is to process the two vectors having [the] maximum number of reference elements (principle of calculating the product of two matrices using partial products can be extended to matrices of greater sizes; such matrices may be so large that their rows cannot be contained within a single vector register; in such cases, the product of the two matrices may be determined by treating each matrix as a series of smaller matrices [i.e., the computation circuit may only process two matrices divided up into vector registers each having at most the maximum number of elements that will fit in the register] – Sazegari, col. 11, ll. 23-42; see also Figs. 10-11), and
wherein the maximum number of the reference elements is equal to the threshold count (principle of calculating the product of two matrices using partial products can be extended to matrices of greater sizes; such matrices may be so large that their rows cannot be contained within a single vector register; in such cases, the product of the two matrices may be determined by treating each matrix as a series of smaller matrices [threshold count = maximum number of elements of each row that can be contained within a single vector register] – Sazegari, col. 11, ll. 23-42; see also Figs. 10-11); and
respectively and sequentially performing, by the computation circuit, the operations between the one or more first segments and the one or more second segments (Sazegari col. 13, l. 62-col. 15, l. 12 and Figs. 11A-D detail a process by which vector register contents corresponding to segments of matrix X are dot multiplied by their counterparts in matrix Y to generate results for placement in the vector registers of matrix Z; note that, for instance, in Fig. 11A, the resultant vector registers 1134-1140 are designated with “a” to indicate that they do not represent the final values of the vector registers of matrix Z and that the operations shown in Fig. 11B must also be performed for the final result to be obtained (i.e., the operations occur sequentially)).”
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Desai and Muralimanohar to divide the vectors into segments based on the maximum capability of the circuit and perform operations on the segments, as disclosed by Sazegari, and an ordinary artisan could reasonably have expected to do so successfully.  Doing so would allow the system to perform calculations on arbitrarily large vectors without regard to memory limitations on the number of vector elements that can be stored at once.  See Sazegari, col. 11, ll. 23-43.
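
For illustration only, the partial-product principle relied upon from Sazegari (treating a large matrix as a series of smaller matrices and accumulating intermediate results, as with the 2 x 2 submatrices of a 4 x 4 matrix) can be sketched in Python.  This is a minimal sketch under stated assumptions and does not reproduce the register-level mechanics of Sazegari Figs. 11A-D; the name block_matmul and the default block size of 2 are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def block_matmul(x: Matrix, y: Matrix, block: int = 2) -> Matrix:
    """Multiply square matrices by accumulating partial products of sub-blocks."""
    n = len(x)
    if block <= 0 or n % block != 0:
        raise ValueError("block size must be positive and divide the dimension")
    if len(y) != n or any(len(r) != n for r in x) or any(len(r) != n for r in y):
        raise ValueError("both matrices must be square and of equal size")
    z = [[0.0] * n for _ in range(n)]
    for bi in range(0, n, block):          # block row of the result
        for bj in range(0, n, block):      # block column of the result
            for bk in range(0, n, block):  # one partial product per pass
                for i in range(bi, bi + block):
                    for j in range(bj, bj + block):
                        # Intermediate values are accumulated across passes;
                        # the entry is final only after the last bk pass.
                        z[i][j] += sum(x[i][k] * y[k][j]
                                       for k in range(bk, bk + block))
    return z
```

In this sketch, no single pass handles more than the assumed block size of elements from any row, and each result entry is complete only after all passes have run in sequence.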

Regarding claim 13, Desai, as modified by Muralimanohar and Sazegari, teaches that “the transmitting includes transmitting one of the first segments and one of the second segments as a pair to the computation circuit each time (vector registers are partitioned into four 32-bit operands, eight 16-bit operands, sixteen 8-bit operands, and so forth; vector arithmetic operations including addition, subtraction, multiplication, division, etc. may be performed on operands of various bit-lengths in a single clock cycle [implying that two of the operands are transmitted to the arithmetic operator in the same cycle] – Desai, paragraph 17; see also Fig. 1 [two arrows going toward the vector arithmetic operation]).”

Regarding claim 14, Desai, as modified by Muralimanohar and Sazegari, teaches that “the computation circuit includes at least one of one or more vector addition processors, one or more vector subtraction processors, one or more logical conjunction processors, or one or more dot product processors (vector arithmetic operations including addition, subtraction, multiplication, division, etc. are performed on operands of various bit lengths in a single cycle – Desai, paragraph 17).”

Regarding claim 15, Desai, as modified by Muralimanohar and Sazegari, teaches that “each of the first elements and the second elements is a value represented in a predetermined number of bits (vector registers are partitioned into four 32-bit operands, eight 16-bit operands, sixteen 8-bit operands, and so forth – Desai, paragraph 17).” 

Claims 6-10 and 16-20 are rejected under 35 U.S.C. 103 as being unpatentable over Desai in view of Muralimanohar and Sazegari and further in view of Ahsan et al. (US 10331583) (“Ahsan”).
Regarding claim 6, Desai, as modified by Muralimanohar, Sazegari, and Ahsan, discloses “an instruction obtaining circuit configured to obtain the one or more instructions from an instruction storage device (instruction prefetcher fetches instructions from memory and feeds them to an instruction decoder – Ahsan, col. 21, ll. 5-21).”
Ahsan and the instant application are both in the field of computer architecture and are analogous.  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Desai, Muralimanohar, and Sazegari to include a circuit for obtaining instructions, as taught by Ahsan, and an ordinary artisan could reasonably have expected to do so successfully.  Doing so would allow the system to store the instructions remotely from the processor, thereby compartmentalizing the memory and processing functions and reducing the computational burden on both the memory and the processor.  See Ahsan, col. 21, ll. 5-21.

Regarding claim 7, Desai, as modified by Muralimanohar, Sazegari, and Ahsan, teaches “a decoding circuit configured to decode each of the one or more instructions into respective one or more micro-instructions (instruction decoder decodes or interprets instructions by decoding a received instruction into micro-instructions (also called uops) – Ahsan, col. 21, ll. 5-21).”
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Desai, Muralimanohar, and Sazegari to include a circuit for decoding instructions, as taught by Ahsan, and an ordinary artisan could reasonably have expected to do so successfully.  Doing so would allow the system to process instructions in smaller portions, thereby reducing the computational burden per operation.  See Ahsan, col. 21, ll. 5-21.

Regarding claim 8, Desai, as modified by Muralimanohar, Sazegari, and Ahsan, teaches “an instruction queue circuit configured to store the one or more micro-instructions (trace cache takes decoded uops and assembles them into program ordered sequences or traces in the uop queue for execution – Ahsan, col. 21, ll. 5-21).”
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Desai, Muralimanohar, and Sazegari to include a circuit for queueing instructions, as taught by Ahsan, and an ordinary artisan could reasonably have expected to do so successfully.  Doing so would allow the micro-instructions to be fed into the processor sequentially for execution, thereby ensuring good program flow control.  See Ahsan, col. 21, ll. 5-21.

Regarding claim 9, Desai, as modified by Muralimanohar, Sazegari, and Ahsan, teaches “a dependency processing circuit configured to determine whether at least one of the one or more instructions has a dependency relationship with a previously received instruction (uop schedulers [dependency processing circuits] determine when a uop is ready to execute based on the readiness of their dependent input register operand sources and the availability of the execution resources the uops need to complete their execution – Ahsan, col. 21, ll. 40-61; see also id. at col. 22, ll. 49-67 (uop schedulers dispatch dependent operations before the parent load has finished scheduling); Fig. 8).”
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Desai, Muralimanohar, and Sazegari to include a circuit for dependency processing, as taught by Ahsan, and an ordinary artisan could reasonably have expected to do so successfully.  Doing so would allow for better program flow control by ensuring that instructions do not execute prior to the instructions on which they are dependent.  See Ahsan, col. 21, ll. 40-61 and col. 22, ll. 49-67.

Regarding claim 10, Desai, as modified by Muralimanohar, Sazegari, and Ahsan, teaches “a storage queue circuit configured to store the one or more instructions while the dependency processing circuit is determining an existence of the dependency relationship (instruction decoder decodes or interprets instructions by decoding a received instruction into micro-instructions (also called uops) – Ahsan, col. 21, ll. 5-21; uop schedulers [dependency processing circuits] determine when a uop is ready to execute based on the readiness of their dependent input register operand sources and the availability of the execution resources the uops need to complete their execution – Ahsan, col. 21, ll. 40-61 [so the uop scheduler stores the data while a dependency relationship is being determined]).”
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Desai, Muralimanohar, and Sazegari to include a circuit for storage queueing, as taught by Ahsan, and an ordinary artisan could reasonably have expected to do so successfully.  Doing so would allow for better program flow control by ensuring that instructions do not execute prior to the instructions on which they are dependent.  See Ahsan, col. 21, ll. 40-61 and col. 22, ll. 49-67.
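
For illustration only, the dependency reasoning applied above to claims 9 and 10 (a uop is held until the register operands it depends on are ready) can be sketched in Python.  This is a deliberately simplified sketch under stated assumptions and is not the scheduler of Ahsan: it ignores the availability of execution resources, and the names Uop and schedule are hypothetical.

```python
from collections import deque
from typing import Deque, List, NamedTuple, Set, Tuple


class Uop(NamedTuple):
    name: str
    sources: Tuple[str, ...]  # registers the uop reads
    dest: str                 # register the uop writes


def schedule(uops: List[Uop], ready_registers: Set[str]) -> List[str]:
    """Issue each uop only once all of its source registers are ready."""
    ready = set(ready_registers)
    waiting: Deque[Uop] = deque(uops)  # queue holding uops awaiting issue
    issued: List[str] = []
    while waiting:
        progressed = False
        for _ in range(len(waiting)):
            uop = waiting.popleft()
            if all(src in ready for src in uop.sources):
                issued.append(uop.name)
                ready.add(uop.dest)  # result becomes available to dependents
                progressed = True
            else:
                waiting.append(uop)  # stays queued until its inputs are ready
        if not progressed:
            raise RuntimeError("unresolvable dependency among queued uops")
    return issued
```

In this sketch, a uop whose source register is produced by an earlier uop remains in the queue while the dependency is unresolved and issues only after the producing uop has issued.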

Regarding claim 16, Desai, as modified by Muralimanohar, Sazegari, and Ahsan, discloses “obtaining, by an instruction obtaining circuit, the one or more instructions from an instruction storage device (instruction prefetcher [instruction obtaining circuit] fetches instructions from memory [instruction storage device] and feeds them to an instruction decoder – Ahsan, col. 21, ll. 5-21).”
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Desai, Muralimanohar, and Sazegari to include a circuit for obtaining instructions, as taught by Ahsan, and an ordinary artisan could reasonably have expected to do so successfully.  Doing so would allow the system to store the instructions remotely from the processor, thereby compartmentalizing the memory and processing functions and reducing the computational burden on both the memory and the processor.  See Ahsan, col. 21, ll. 5-21.

Regarding claim 17, Desai, as modified by Muralimanohar, Sazegari, and Ahsan, teaches “decoding, by a decoding circuit, each of the one or more instructions into respective one or more micro-instructions (instruction decoder [decoding circuit] decodes or interprets instructions by decoding a received instruction into micro-instructions (also called uops) – Ahsan, col. 21, ll. 5-21).”
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Desai, Muralimanohar, and Sazegari to include a circuit for decoding instructions, as taught by Ahsan, and an ordinary artisan could reasonably have expected to do so successfully.  Doing so would allow the system to process instructions in smaller portions, thereby reducing the computational burden per operation.  See Ahsan, col. 21, ll. 5-21.

Regarding claim 18, Desai, as modified by Muralimanohar, Sazegari, and Ahsan, teaches “storing, by an instruction queue circuit, the one or more micro-instructions (trace cache takes decoded uops and assembles them into program ordered sequences or traces in the uop queue for execution – Ahsan, col. 21, ll. 5-21).”
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Desai, Muralimanohar, and Sazegari to include a circuit for queueing instructions, as taught by Ahsan, and an ordinary artisan could reasonably have expected to do so successfully.  Doing so would allow the micro-instructions to be fed into the processor sequentially for execution, thereby ensuring good program flow control.  See Ahsan, col. 21, ll. 5-21.

Regarding claim 19, Desai, as modified by Muralimanohar, Sazegari, and Ahsan, teaches “determining, by a dependency processing circuit, whether at least one of the one or more instructions has a dependency relationship with a previously received instruction (uop schedulers [dependency processing circuits] determine when a uop is ready to execute based on the readiness of their dependent input register operand sources and the availability of the execution resources the uops need to complete their execution – Ahsan, col. 21, ll. 40-61; see also id. at col. 22, ll. 49-67 (uop schedulers dispatch dependent operations before the parent load has finished scheduling); Fig. 8).”
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Desai, Muralimanohar, and Sazegari to include a circuit for dependency processing, as taught by Ahsan, and an ordinary artisan could reasonably have expected to do so successfully.  Doing so would allow for better program flow control by ensuring that instructions do not execute prior to the instructions on which they are dependent.  See Ahsan, col. 21, ll. 40-61 and col. 22, ll. 49-67.

Regarding claim 20, Desai, as modified by Muralimanohar, Sazegari, and Ahsan, teaches “storing, by a storage queue circuit, the one or more instructions while the dependency processing circuit is determining an existence of the dependency relationship (instruction decoder decodes or interprets instructions by decoding a received instruction into micro-instructions (also called uops) – Ahsan, col. 21, ll. 5-21; uop schedulers [dependency processing circuits] determine when a uop is ready to execute based on the readiness of their dependent input register operand sources and the availability of the execution resources the uops need to complete their execution – Ahsan, col. 21, ll. 40-61 [so the uop scheduler stores the data while a dependency relationship is being determined]).”
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Desai, Muralimanohar, and Sazegari to include a circuit for storage queueing, as taught by Ahsan, and an ordinary artisan could reasonably have expected to do so successfully.  Doing so would allow for better program flow control by ensuring that instructions do not execute prior to the instructions on which they are dependent.  See Ahsan, col. 21, ll. 40-61 and col. 22, ll. 49-67.

Response to Arguments
Applicant's arguments filed March 4, 2022 (“Remarks”) have been fully considered but, to the extent not rendered moot by the modification of the ground of rejection, are not persuasive.
Applicant’s arguments were largely responded to in the Advisory Action of March 18, 2022 (“Advisory Action”), and those remarks, to the extent not rendered moot by the modification of the ground of rejection, are incorporated into this response by this reference.  Applicant’s argument that the multi-bit elements are not divided into segments in Desai, Remarks at 10, is rendered moot by the introduction of Sazegari into the rejection.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to RYAN C VAUGHN whose telephone number is (571)272-4849.  The examiner can normally be reached on M-R 7:50a-5:50p ET.
Examiner interviews are available via telephone, in person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kamran Afshar, can be reached at 571-272-7796.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/RYAN C VAUGHN/
Examiner, Art Unit 2125