DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 1, 3-11, and 13-20 are presented for examination.

Continued Examination under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on April 16, 2021 has been entered.
 	
Information Disclosure Statement
The information disclosure statement filed April 16, 2021 fails to comply with 37 CFR 1.98(a)(3)(i) because it does not include a concise explanation of the relevance, as it is presently understood by the individual designated in 37 CFR 1.56(c) most knowledgeable about the content of the information, of each reference listed that is not in the English language.  It has been placed in the application file, but the information referred to therein has not been considered.

Claim Rejections - 35 USC § 112
The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.
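Claims 11 and 13-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.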
Claim 11 recites the limitation "the maximum capability" in line 8.  There is insufficient antecedent basis for this limitation in the claim.  Examiner recommends changing this recitation to “a maximum capability” and the subsequent recitation of “a maximum capability” in the sixth line from the bottom to “the maximum capability.”
Claims 13-20 are rejected for dependency on claim 11.

Claim Rejections - 35 USC § 103
Claims 1, 3-5, 11, and 13-15 are rejected under 35 U.S.C. 103 as being unpatentable over Desai et al. (US 20030014457) (“Desai”) in view of Muralimanohar et al. (US 20190034201) (“Muralimanohar”) and further in view of Gary, “Matrix-Vector Multiplication Using Digital Partitioning for More Accurate Optical Computing,” in 31.29 Applied Optics 6205-11 (1992) (“Gary”).
Regarding claim 1, Desai teaches “[a]n apparatus for neural network processing (intended use language with no patentable weight), comprising: 
a computation circuit configured to perform operations between two vectors in accordance with one or more instructions (vector registers are partitioned into four 32-bit operands, eight 16-bit operands, sixteen 8-bit operands, and so forth; vector arithmetic operations including addition, subtraction, multiplication, division, etc. may be performed on operands of various bit-lengths in a single clock cycle – Desai, paragraph 17; see also Fig. 1 [two vector registers]) …
 a data input/output (I/O) circuit configured to:
receive … data formatted in a first vector and a second vector (vector registers are partitioned into various size slices depending on the precision of the operands [so data would be received in these registers] – Desai, paragraph 17; see also Fig. 1), 
wherein the first vector includes multiple first elements (vector registers are partitioned into four 32-bit operands, eight 16-bit operands, sixteen 8-bit operands, and so forth – Desai, paragraph 17), and 
wherein the second vector includes multiple second elements (vector registers are partitioned into four 32-bit operands, eight 16-bit operands, sixteen 8-bit operands, and so forth – Desai, paragraph 17), and 
determine that at least one of a count of the first elements or a count of the second elements is greater than [a] maximum number of … reference elements (128-bit ALU is partitioned into units to perform identical and parallel operations by ALU elements on operands; vector registers are partitioned into four 32-bit operands, eight 16-bit operands, sixteen 8-bit operands, and so forth [element maximum number = 128 bits; reference element count = 32 bits, 16 bits, etc.] – Desai, paragraph 17)…; and 
a data adjustment circuit configured to: 
respectively divide the first vector and the second vector into one or more first segments and one or more second segments based on the determination (vector registers of 128-bit ALU are partitioned into four 32-bit operands [segments], eight 16-bit operands, sixteen 8-bit operands, and so forth – Desai, paragraph 17 [ALU determines that, for instance, 128 > 32 and partitions accordingly]), wherein a count of elements in each of the first segments and the second segments is equal to or less than the maximum number of the reference elements (vector registers are partitioned into four 32-bit operands, eight 16-bit operands, sixteen 8-bit operands, and so forth [segment element count = 32 bits, 16 bits, etc.; reference element maximum number = also 32 bits, 16 bits, etc.] – Desai, paragraph 17), and 
transmit the one or more first segments and the one or more second segments to the computation circuit (vector arithmetic operations are performed on operands of various bit-lengths in a single cycle – Desai, paragraph 17 [operands from the vector register are transferred to the arithmetic circuit – see Fig. 1]), wherein the computation circuit is configured to respectively and sequentially perform the operations between the one or more first segments and the one or more second segments (vector registers are partitioned into four 32-bit operands, eight 16-bit operands, sixteen 8-bit operands, and so forth – Desai, paragraph 17; vector arithmetic operations including addition, subtraction, multiplication, division, etc. may be performed on operands of various bit-lengths in a single clock cycle, though the vector processor design if pipelined for higher clock rates may perform the operations within a single instruction cycle (which would typically comprise multiple clock cycles) – id.; see also Fig. 1).”
Muralimanohar discloses “receiv[ing] neural network data formatted in a first vector and a second vector (memristor crossbar arrays store input weights for convolutional neural networks – Muralimanohar, paragraph 8; dot product is performed between the input vector [first vector] and a stored vector [second vector] stored in the memory array – id. at paragraph 43)….”
Desai and Muralimanohar are both in the field of hardware processing of vectors and are analogous.  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Desai to make the input data neural network data, as disclosed by Muralimanohar.  In so doing, an ordinary artisan as of the effective filing date would merely be applying the known technique of splitting vectors into segments and conducting arithmetic operations on the segments to the domain of neural network processing, with the predictable result that operations are carried out on more easily digestible segments. See KSR Int’l. Co. v. Teleflex Inc., 550 U.S. 398, 127 S. Ct. 1727, 167 L. Ed. 2d 705 (2007).
Gary discloses that “a maximum capability of the computation circuit is to process … two vectors each having a maximum number of reference elements (in the case of optical linear algebra processors, the maximum dynamic range [as of 1992] is limited to 8 to 10 bits [so that the maximum capability of the processor is to process 8 to 10 bits/reference elements] – Gary, Introduction; in a scheme for digital partitioning of matrix-vector products, the input vector and the matrix are each divided into two partitions, one with the most significant bits and one with the least significant bits; combinations of these partitions produce four product vectors weighted according to the value of the operands; the weighted vectors are added to produce the final result – id. at p. 6206, third paragraph under “Digital Partitioning” [note that a vector is merely an n x 1 or a 1 x n matrix, so that the method applies equally well to vector-vector multiplication]); [and] ...
determin[ing] that at least one of the first vector and the second vector exceeds the maximum capability of the computation circuit (for scalar products, to form the product of two 8-bit numbers a and b (which yields a 16-significant-bit result), when multipliers are used that can handle no more than an 8-bit product, the product must be partitioned into subproducts each of which consists of 8 or fewer bits; digital partitioning of matrix-vector products is analogous, so that the input vector and the matrix are each divided into two partitions, one with the most significant bits and one with the least significant bits – Gary, p. 6206, second and third paragraphs under “Digital Partitioning” [so the system has determined that the original vector had more bits than the system could handle])….”
Desai, Muralimanohar, and Gary all relate to the hardware processing of vectors and are analogous.  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Desai and Muralimanohar to perform the processing on vectors having a maximum number of elements, as disclosed by Gary, and an ordinary artisan could reasonably expect to do so successfully.  Doing so would increase the accuracy and speed of the vector calculations in systems with processing limitations.  See Gary, Introduction.
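For illustration only, the divide-and-sequentially-operate behavior addressed in the claim 1 mapping above can be sketched as follows. This is a minimal sketch and is not taken from Desai, Muralimanohar, Gary, or the application; the names MAX_ELEMENTS, split_into_segments, and segmented_elementwise, and the limit of four elements, are hypothetical.

```python
from typing import Callable, List, Sequence

# Hypothetical limit: the most elements the computation step accepts per operand.
MAX_ELEMENTS = 4


def split_into_segments(vector: Sequence[int], max_elements: int) -> List[List[int]]:
    """Divide a vector into consecutive segments of at most max_elements entries."""
    return [list(vector[i:i + max_elements]) for i in range(0, len(vector), max_elements)]


def segmented_elementwise(
    first: Sequence[int],
    second: Sequence[int],
    op: Callable[[int, int], int],
    max_elements: int = MAX_ELEMENTS,
) -> List[int]:
    """Apply op element by element, segmenting the operands when they are too long."""
    if len(first) != len(second):
        raise ValueError("operand vectors must have the same number of elements")

    # Determination step: does the element count exceed the assumed maximum?
    if len(first) <= max_elements:
        return [op(a, b) for a, b in zip(first, second)]

    # Division step, followed by sequential processing of one segment pair at a time.
    result: List[int] = []
    first_segments = split_into_segments(first, max_elements)
    second_segments = split_into_segments(second, max_elements)
    for seg_a, seg_b in zip(first_segments, second_segments):
        result.extend(op(a, b) for a, b in zip(seg_a, seg_b))
    return result
```

Under these assumptions, a ten-element operand is divided into segments of four, four, and two elements, and each segment pair is processed in turn.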

Regarding claim 3, Desai, as modified by Muralimanohar and Gary, teaches that “the data adjustment circuit is configured to transmit one of the first segments and one of the second segments as a pair to the computation circuit each time (vector registers are partitioned into four 32-bit operands, eight 16-bit operands, sixteen 8-bit operands, and so forth; vector arithmetic operations including addition, subtraction, multiplication, division, etc. may be performed on operands of various bit-lengths in a single clock cycle [implying that two of the operands are transmitted to the arithmetic operator in the same cycle] – Desai, paragraph 17; see also Fig. 1 [two arrows going toward the vector arithmetic operation]).”

Regarding claim 4, Desai, as modified by Muralimanohar and Gary, teaches that “the computation circuit includes at least one of one or more addition processors, one or more subtraction processors, one or more logical conjunction processors, or one or more dot product processors (vector arithmetic operations including addition, subtraction, multiplication, division, etc. are performed on operands of various bit lengths in a single cycle – Desai, paragraph 17).”

Regarding claim 5, Desai, as modified by Muralimanohar and Gary, teaches that “each of the first elements and the second elements is a value represented in a predetermined number of bits (vector registers are partitioned into four 32-bit operands, eight 16-bit operands, sixteen 8-bit operands, and so forth [so that each operand/segment can be viewed as containing 32, 16, 8, etc. 1-bit elements] – Desai, paragraph 17).” 
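For illustration only, the register partitioning for which Desai, paragraph 17, is cited (a 128-bit register treated as four 32-bit, eight 16-bit, or sixteen 8-bit operands) can be sketched as follows. This is a minimal sketch under stated assumptions, not Desai's implementation: the function names are hypothetical, operand 0 is assumed to occupy the least significant bits, and each operand is assumed to wrap on overflow without carrying into its neighbor.

```python
from typing import List

REGISTER_BITS = 128  # register width discussed in the cited paragraph


def partition_register(value: int, lane_bits: int, register_bits: int = REGISTER_BITS) -> List[int]:
    """Split a register value into operands of lane_bits each, lowest bits first."""
    if register_bits % lane_bits != 0:
        raise ValueError("operand width must divide the register width")
    mask = (1 << lane_bits) - 1
    return [(value >> shift) & mask for shift in range(0, register_bits, lane_bits)]


def lanewise_add(a: int, b: int, lane_bits: int, register_bits: int = REGISTER_BITS) -> int:
    """Add two registers operand by operand; carries do not cross operand boundaries."""
    mask = (1 << lane_bits) - 1
    lanes_a = partition_register(a, lane_bits, register_bits)
    lanes_b = partition_register(b, lane_bits, register_bits)
    result = 0
    for index, (x, y) in enumerate(zip(lanes_a, lanes_b)):
        result |= ((x + y) & mask) << (index * lane_bits)
    return result
```

With a 32-bit operand width the register yields four operands; with 16-bit and 8-bit widths it yields eight and sixteen, matching the counts recited in the citation.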

Regarding claim 11, Desai teaches “[a] method for neural network processing (intended use language with no patentable weight), comprising: 
receiving, by a data I/O circuit, … data formatted in a first vector and a second vector (vector registers are partitioned into various size slices depending on the precision of the operands [so data would be received in these registers] – Desai, paragraph 17; see also Fig. 1 [two vector registers]),
wherein the first vector includes multiple first elements (vector registers are partitioned into four 32-bit operands, eight 16-bit operands, sixteen 8-bit operands, and so forth – Desai, paragraph 17), and 
wherein the second vector includes multiple second elements (vector registers are partitioned into four 32-bit operands, eight 16-bit operands, sixteen 8-bit operands, and so forth – Desai, paragraph 17); 
… (128-bit ALU is partitioned into units to perform identical and parallel operations by ALU elements on operands; vector registers are partitioned into four 32-bit operands, eight 16-bit operands, sixteen 8-bit operands, and so forth [element count = 128 bits; threshold count = 32 bits, 16 bits, etc.] – Desai, paragraph 17) …; 
respectively dividing, by a data adjustment circuit, the first vector and the second vector into one or more first segments and one or more second segments based on the determination (vector registers of 128-bit ALU are partitioned into four 32-bit operands [segments], eight 16-bit operands, sixteen 8-bit operands, and so forth – Desai, paragraph 17 [ALU determines that, for instance, 128 > 32 and partitions accordingly]), wherein a count of elements in each of the first segments and the second segments is equal to or less than the maximum number of the reference elements (vector registers are partitioned into four 32-bit operands, eight 16-bit operands, sixteen 8-bit operands, and so forth [segment element count = 32 bits, 16 bits, etc.; reference element maximum number = also 32 bits, 16 bits, etc.] – Desai, paragraph 17);
transmitting, by the data adjustment circuit, the one or more first segments and the one or more second segments to a computation circuit (vector arithmetic operations are performed on operands of various bit-lengths in a single cycle – Desai, paragraph 17 [operands from the vector register are transferred to the arithmetic circuit – see Fig. 1]), 
wherein the computation circuit is configured to perform operations between two vectors in accordance with one or more instructions (vector registers are partitioned into four 32-bit operands, eight 16-bit operands, sixteen 8-bit operands, and so forth; vector arithmetic operations including addition, subtraction, multiplication, division, etc. may be performed on operands of various bit-lengths in a single clock cycle – Desai, paragraph 17; see also Fig. 1), … and 
wherein the maximum number of the reference elements is equal to the threshold count (128-bit ALU is partitioned into units to perform identical and parallel operations by ALU elements on operands; vector registers are partitioned into four 32-bit operands, eight 16-bit operands, sixteen 8-bit operands, and so forth [reference element maximum number = 32 bits, 16 bits, etc.; threshold count = also 32 bits, 16 bits, etc.] – Desai, paragraph 17); and 
respectively and sequentially performing, by the computation circuit, the operations between the one or more first segments and the one or more second segments (vector registers are partitioned into four 32-bit operands, eight 16-bit operands, sixteen 8-bit operands, and so forth – Desai, paragraph 17; vector arithmetic operations including addition, subtraction, multiplication, division, etc. may be performed on operands of various bit-lengths in a single clock cycle, though the vector processor design if pipelined for higher clock rates may perform the operations within a single instruction cycle (which would typically comprise multiple clock cycles) – id.; see also Fig. 1).”
Muralimanohar discloses “receiving, by a data I/O circuit, neural network data formatted in a first vector and a second vector (memristor crossbar arrays store input weights for convolutional neural networks – Muralimanohar, paragraph 8; dot product is performed between the input vector [first vector] and a stored vector [second vector] stored in the memory array – id. at paragraph 43)….”
Desai and Muralimanohar are both in the field of hardware processing of vectors and are analogous.  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Desai to make the input data neural network data, as disclosed by Muralimanohar.  In so doing, an ordinary artisan as of the effective filing date would merely be applying the known technique of splitting vectors into segments and conducting arithmetic operations on the segments to the domain of neural network processing, with the predictable result that operations are carried out on more easily digestible segments. See KSR Int’l. Co. v. Teleflex Inc., 550 U.S. 398, 127 S. Ct. 1727, 167 L. Ed. 2d 705 (2007).
Gary discloses “determin[ing] that at least one of the first vector and the second vector exceeds [a] maximum capability of the computation circuit (for scalar products, to form the product of two 8-bit numbers a and b (which yields a 16-significant-bit result), when multipliers are used that can handle no more than an 8-bit product, the product must be partitioned into subproducts each of which consists of 8 or fewer bits; digital partitioning of matrix-vector products is analogous, so that the input vector and the matrix are each divided into two partitions, one with the most significant bits and one with the least significant bits – Gary, p. 6206, second and third paragraphs under “Digital Partitioning” [so the system has determined that the original vector had more bits than the system could handle]); [and] …
[the] maximum capability of the computation circuit is to process … two vectors having a maximum number of reference elements (in the case of optical linear algebra processors, the maximum dynamic range [as of 1992] is limited to 8 to 10 bits [so that the maximum capability of the processor is to process 8 to 10 bits/reference elements] – Gary, Introduction; in a scheme for digital partitioning of matrix-vector products, the input vector and the matrix are each divided into two partitions, one with the most significant bits and one with the least significant bits; combinations of these partitions produce four product vectors weighted according to the value of the operands; the weighted vectors are added to produce the final result – id. at p. 6206, third paragraph under “Digital Partitioning” [note that a vector is merely an n x 1 or a 1 x n matrix, so that the method applies equally well to vector-vector multiplication])….”
Desai, Muralimanohar, and Gary all relate to the hardware processing of vectors and are analogous.  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Desai and Muralimanohar to perform the processing on vectors having a maximum number of elements, as disclosed by Gary, and an ordinary artisan could reasonably expect to do so successfully.  Doing so would increase the accuracy and speed of the vector calculations in systems with processing limitations.  See Gary, Introduction.
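For illustration only, the digital partitioning for which Gary is cited (operands divided into a most-significant-bit partition and a least-significant-bit partition, four weighted partial products, and a final sum) can be sketched as follows. This is a minimal sketch of the arithmetic identity only, not Gary's optical implementation; the 4-bit split point and all function names are assumptions.

```python
from typing import List, Sequence, Tuple


def split_bits(value: int, low_bits: int = 4) -> Tuple[int, int]:
    """Return (most significant part, least significant part) of a non-negative value."""
    return value >> low_bits, value & ((1 << low_bits) - 1)


def partitioned_product(a: int, b: int, low_bits: int = 4) -> int:
    """Form a * b from four subproducts of the partitioned operands."""
    a_hi, a_lo = split_bits(a, low_bits)
    b_hi, b_lo = split_bits(b, low_bits)
    # Each subproduct of two 4-bit parts is at most 15 * 15 = 225, i.e. 8 or fewer bits.
    return ((a_hi * b_hi) << (2 * low_bits)) + ((a_hi * b_lo + a_lo * b_hi) << low_bits) + a_lo * b_lo


def matvec(matrix: Sequence[Sequence[int]], vector: Sequence[int]) -> List[int]:
    """Plain matrix-vector product used for each pair of partitions."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def partitioned_matvec(matrix: Sequence[Sequence[int]], vector: Sequence[int], low_bits: int = 4) -> List[int]:
    """Matrix-vector product assembled from four weighted product vectors."""
    m_hi = [[split_bits(m, low_bits)[0] for m in row] for row in matrix]
    m_lo = [[split_bits(m, low_bits)[1] for m in row] for row in matrix]
    x_hi = [split_bits(x, low_bits)[0] for x in vector]
    x_lo = [split_bits(x, low_bits)[1] for x in vector]
    hh, hl = matvec(m_hi, x_hi), matvec(m_hi, x_lo)
    lh, ll = matvec(m_lo, x_hi), matvec(m_lo, x_lo)
    # Weight each product vector by the significance of its partitions, then add.
    return [(a << (2 * low_bits)) + ((b + c) << low_bits) + d for a, b, c, d in zip(hh, hl, lh, ll)]
```

The weights (2^8, 2^4, 2^4, and 1 for a 4-bit split) follow from writing each operand as 16 times its most significant part plus its least significant part.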

Regarding claim 13, Desai, as modified by Muralimanohar and Gary, teaches that “the transmitting includes transmitting one of the first segments and one of the second segments as a pair to the computation circuit each time (vector registers are partitioned into four 32-bit operands, eight 16-bit operands, sixteen 8-bit operands, and so forth; vector arithmetic operations including addition, subtraction, multiplication, division, etc. may be performed on operands of various bit-lengths in a single clock cycle [implying that two of the operands are transmitted to the arithmetic operator in the same cycle] – Desai, paragraph 17; see also Fig. 1 [two arrows going toward the vector arithmetic operation]).”

Regarding claim 14, Desai, as modified by Muralimanohar and Gary, teaches that “the computation circuit includes at least one of one or more vector addition processors, one or more vector subtraction processors, one or more logical conjunction processors, or one or more dot product processors (vector arithmetic operations including addition, subtraction, multiplication, division, etc. are performed on operands of various bit lengths in a single cycle – Desai, paragraph 17).”

Regarding claim 15, Desai, as modified by Muralimanohar and Gary, teaches that “each of the first elements and the second elements is a value represented in a predetermined number of bits (vector registers are partitioned into four 32-bit operands, eight 16-bit operands, sixteen 8-bit operands, and so forth [so that each operand/segment can be viewed as containing 32, 16, 8, etc. 1-bit elements] – Desai, paragraph 17).” 

Claims 6-10 and 16-20 are rejected under 35 U.S.C. 103 as being unpatentable over Desai in view of Muralimanohar and Gary and further in view of Ahsan et al. (US 10331583) (“Ahsan”).
Regarding claim 6, Desai, as modified by Muralimanohar, Gary, and Ahsan, discloses “an instruction obtaining circuit configured to obtain the one or more instructions from an instruction storage device (instruction prefetcher fetches instructions from memory and feeds them to an instruction decoder – Ahsan, col. 21, ll. 5-21).”
Desai, Muralimanohar, Gary, and Ahsan are all in the field of computer architecture and are analogous.  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Desai, Muralimanohar, and Gary to include a circuit for obtaining instructions, as taught by Ahsan.  The circuit as taught by Ahsan would not function any differently in the context of vector processing as it would in other contexts, and an ordinary artisan before the effective filing date would recognize that adding an instruction obtaining circuit would yield predictable results.  See KSR Int’l. Co. v. Teleflex Inc., 550 U.S. 398, 127 S. Ct. 1727, 167 L. Ed. 2d 705 (2007).

Regarding claim 7, Desai, as modified by Muralimanohar, Gary, and Ahsan, teaches “a decoding circuit configured to decode each of the one or more instructions into respective one or more micro-instructions (instruction decoder decodes or interprets instructions by decoding a received instruction into micro-instructions (also called uops) – Ahsan, col. 21, ll. 5-21).”
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Desai, Muralimanohar, and Gary to include a circuit for decoding instructions, as taught by Ahsan.  The circuit as taught by Ahsan would not function any differently in the context of vector processing as it would in other contexts, and an ordinary artisan before the effective filing date would recognize that adding an instruction decoding circuit would yield predictable results.  See KSR Int’l. Co. v. Teleflex Inc., 550 U.S. 398, 127 S. Ct. 1727, 167 L. Ed. 2d 705 (2007).

Regarding claim 8, Desai, as modified by Muralimanohar, Gary, and Ahsan, teaches “an instruction queue circuit configured to store the one or more micro-instructions (trace cache takes decoded uops and assembles them into program ordered sequences or traces in the uop queue for execution – Ahsan, col. 21, ll. 5-21).”
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Desai, Muralimanohar, and Gary to include a circuit for queueing instructions, as taught by Ahsan.  The circuit as taught by Ahsan would not function any differently in the context of vector processing as it would in other contexts, and an ordinary artisan before the effective filing date would recognize that adding an instruction queue circuit would yield predictable results.  See KSR Int’l. Co. v. Teleflex Inc., 550 U.S. 398, 127 S. Ct. 1727, 167 L. Ed. 2d 705 (2007).

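Regarding claim 9, Desai, as modified by Muralimanohar, Gary, and Ahsan, teaches “a dependency processing circuit configured to determine whether at least one of the one or more instructions has a dependency relationship with a previously received instruction (uop schedulers [dependency processing circuits] determine when a uop is ready to execute based on the readiness of their dependent input register operand sources and the availability of the execution resources the uops need to complete their execution – Ahsan, col. 21, ll. 40-61; see also id. at col. 22, ll. 49-67 (uops schedulers dispatch dependent operations before the parent load has finished scheduling); Fig. 8).”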
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Desai, Muralimanohar, and Gary to include a circuit for dependency processing, as taught by Ahsan.  The circuit as taught by Ahsan would not function any differently in the context of vector processing as it would in other contexts, and an ordinary artisan before the effective filing date would recognize that adding a dependency processing circuit would yield predictable results.  See KSR Int’l. Co. v. Teleflex Inc., 550 U.S. 398, 127 S. Ct. 1727, 167 L. Ed. 2d 705 (2007).

Regarding claim 10, Desai, as modified by Muralimanohar, Gary, and Ahsan, teaches “a storage queue circuit configured to store the one or more instructions while the dependency processing circuit is determining an existence of the dependency relationship (instruction decoder decodes or interprets instructions by decoding a received instruction into micro-instructions (also called uops) – Ahsan, col. 21, ll. 5-21; uop schedulers [dependency processing circuits] determine when a uop is ready to execute based on the readiness of their dependent input register operand sources and the availability of the execution resources the uops need to complete their execution – Ahsan, col. 21, ll. 40-61 [so the uop scheduler stores the data while a dependency relationship is being determined]).”
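It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Desai, Muralimanohar, and Gary to include a circuit for storage queueing, as taught by Ahsan.  The circuit as taught by Ahsan would not function any differently in the context of vector processing as it would in other contexts, and an ordinary artisan before the effective filing date would recognize that adding a storage queue circuit would yield predictable results.  See KSR Int’l. Co. v. Teleflex Inc., 550 U.S. 398, 127 S. Ct. 1727, 167 L. Ed. 2d 705 (2007).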

Regarding claim 16, Desai, as modified by Muralimanohar, Gary, and Ahsan, discloses “obtaining, by an instruction obtaining circuit, the one or more instructions from an instruction storage device (instruction prefetcher fetches instructions from memory and feeds them to an instruction decoder – Ahsan, col. 21, ll. 5-21).”
Desai, Muralimanohar, Gary, and Ahsan are all in the field of computer architecture and are analogous.  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Desai, Muralimanohar, and Gary to include a circuit for obtaining instructions, as taught by Ahsan.  The circuit as taught by Ahsan would not function any differently in the context of vector processing as it would in other contexts, and an ordinary artisan before the effective filing date would recognize that adding an instruction obtaining circuit would yield predictable results.  See KSR Int’l. Co. v. Teleflex Inc., 550 U.S. 398, 127 S. Ct. 1727, 167 L. Ed. 2d 705 (2007).

Regarding claim 17, Desai, as modified by Muralimanohar, Gary, and Ahsan, teaches “decoding, by a decoding circuit, each of the one or more instructions into respective one or more micro-instructions (instruction decoder decodes or interprets instructions by decoding a received instruction into micro-instructions (also called uops) – Ahsan, col. 21, ll. 5-21).”
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Desai, Muralimanohar, and Gary to include a circuit for decoding instructions, as taught by Ahsan.  The circuit as taught by Ahsan would not function any differently in the context of vector processing as it would in other contexts, and an ordinary artisan before the effective filing date would recognize that adding an instruction decoding circuit would yield predictable results.  See KSR Int’l. Co. v. Teleflex Inc., 550 U.S. 398, 127 S. Ct. 1727, 167 L. Ed. 2d 705 (2007).

Regarding claim 18, Desai, as modified by Muralimanohar, Gary, and Ahsan, teaches “storing, by an instruction queue circuit, the one or more micro-instructions (trace cache takes decoded uops and assembles them into program ordered sequences or traces in the uop queue for execution – Ahsan, col. 21, ll. 5-21).”
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Desai, Muralimanohar, and Gary to include a circuit for queueing instructions, as taught by Ahsan.  The circuit as taught by Ahsan would not function any differently in the context of vector processing as it would in other contexts, and an ordinary artisan before the effective filing date would recognize that adding an instruction queue circuit would yield predictable results.  See KSR Int’l. Co. v. Teleflex Inc., 550 U.S. 398, 127 S. Ct. 1727, 167 L. Ed. 2d 705 (2007).

Regarding claim 19, Desai, as modified by Muralimanohar, Gary, and Ahsan, teaches “determining, by a dependency processing circuit, whether at least one of the one or more instructions has a dependency relationship with a previously received instruction (uop schedulers [dependency processing circuits] determine when a uop is ready to execute based on the readiness of their dependent input register operand sources and the availability of the execution resources the uops need to complete their execution – Ahsan, col. 21, ll. 40-61; see also id. at col. 22, ll. 49-67 (uops schedulers dispatch dependent operations before the parent load has finished scheduling); Fig. 8).”
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Desai, Muralimanohar, and Gary to include a circuit for dependency processing, as taught by Ahsan.  The circuit as taught by Ahsan would not function any differently in the context of vector processing as it would in other contexts, and an ordinary artisan before the effective filing date would recognize that adding a dependency processing circuit would yield predictable results.  See KSR Int’l. Co. v. Teleflex Inc., 550 U.S. 398, 127 S. Ct. 1727, 167 L. Ed. 2d 705 (2007).
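For illustration only, the readiness determination for which Ahsan's uop schedulers are cited (a uop is ready when its input register operand sources are ready and the execution resource it needs is available) can be sketched as follows. This is a simplified, non-speculative sketch and is not Ahsan's design; the MicroOp fields, the function names, and the one-uop-per-unit-per-cycle rule are all assumptions.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass
class MicroOp:
    name: str
    sources: Set[str]  # registers this uop reads
    dest: str          # register this uop writes
    unit: str          # execution resource this uop needs


def ready_to_dispatch(uop: MicroOp, ready_registers: Set[str], free_units: Set[str]) -> bool:
    """A uop is ready when every source register is ready and its unit is free."""
    return uop.sources <= ready_registers and uop.unit in free_units


def schedule(uops: Iterable[MicroOp], initially_ready: Iterable[str], units: Iterable[str]) -> List[List[str]]:
    """Return, cycle by cycle, the names of the uops dispatched in that cycle."""
    ready = set(initially_ready)
    pending = list(uops)
    cycles: List[List[str]] = []
    while pending:
        free = set(units)  # assumed: each unit accepts one uop per cycle
        issued: List[MicroOp] = []
        for uop in pending:
            if ready_to_dispatch(uop, ready, free):
                issued.append(uop)
                free.discard(uop.unit)
        if not issued:
            raise RuntimeError("no pending uop can ever become ready")
        # Results become visible to dependent uops only in the following cycle.
        for uop in issued:
            ready.add(uop.dest)
            pending.remove(uop)
        cycles.append([uop.name for uop in issued])
    return cycles
```

In this sketch a uop that depends on an earlier uop's result waits one cycle, while an independent uop may dispatch alongside the earlier one if its unit is free.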

Regarding claim 20, Desai, as modified by Muralimanohar, Ahsan, and Gary, teaches “storing, by a storage queue circuit, the one or more instructions while the dependency processing circuit is determining an existence of the dependency relationship (instruction decoder decodes or interprets instructions by decoding a received instruction into micro-instructions (also called uops) – Ahsan, col. 21, ll. 5-21; uop schedulers [dependency processing circuits] determine when a uop is ready to execute based on the readiness of their dependent input register operand sources and the availability of the execution resources the uops need to complete their execution – Ahsan, col. 21, ll. 40-61 [so the uop scheduler stores the data while a dependency relationship is being determined]).”
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Desai, Muralimanohar, and Gary to include a circuit for storage queueing, as taught by Ahsan.  The circuit as taught by Ahsan would not function any differently in the context of vector processing as it would in other contexts, and an ordinary artisan before the effective filing date would recognize that adding a storage queue circuit would yield predictable results.  See KSR Int’l. Co. v. Teleflex Inc., 550 U.S. 398, 127 S. Ct. 1727, 167 L. Ed. 2d 705 (2007).

Response to Arguments
Applicant's arguments filed April 16, 2021 (“Remarks”) have been fully considered but they are not persuasive.
Applicant first argues that Gary “discloses partitioning each of the elements in a vector, rather than partitioning the elements into two or more groups of elements.”  Remarks at 10 (emphasis deleted).  As support for this argument, Applicant argues that the “elements” must be synonymous with the entries of the vector, whereas Gary, according to Applicant, partitions the input vector according to the significance of the digits.  Id. at 10-11.  However, critically, the claims never define the term “elements,” and moreover, the specification describes each element as a “value represented in a predetermined number of bits.”  (Emphasis added.)  Gary p. 6206, second paragraph under “Digital Partitioning,” indicates that “[t]o form the product of two 8-bit numbers a and b (which yields a 16-significant-bit result), when we use multipliers that can handle no more than an 8-bit product, we must partition the product into subproducts, each of which consists of 8 or fewer bits.”  The next paragraph indicates that partitioning matrix-vector products is similar insofar as the input vector and the matrix are divided into two partitions, one with the most significant bits and one with the least significant bits.  A bit, by the definition above, may be a type of “element,” because it is a “value represented in a predetermined number of bits” (namely, one bit).  Insofar as the number of most significant bits is predetermined by the capability of the underlying processor, and insofar as the specification itself defines an “element” as any number that can be represented in a predetermined number of bits, Gary clearly teaches that the “maximum capability” of the computation circuit is to process vectors “each having a maximum number of reference elements,” namely the maximum number of significant bits the system can handle at any one time.
Applicant then argues that Gary is not combinable with Desai because “considering the teaching of Gary, Desai still does not show a determination that an input/received vector exceeds the capability of an ALU.  Rather, the operands in Desai are never shown to be greater than 128 bits.”  Remarks at 11-12.  Whatever point Applicant is attempting to make with this remark, it surely is not a point about the alleged combinability of Desai with Gary.  Examiner has not suggested that the ALU of Desai has a “maximum capability;” it is for precisely that reason that Gary was introduced.  To the extent that Applicant’s argument is intelligible to Examiner, it seems to be that because none of the vectors input to the ALU of Desai happen to exceed its capacity, there is no need to alter it to partition the vectors in the event that the vector input exceeds the capacity of the ALU.  But the fact that the vectors input to the system of Desai do not happen to exceed its capacity does not mean that an ordinary artisan would not find it advantageous to introduce a mechanism for partitioning the vectors in the event that an input vector does exceed its capacity.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to RYAN C VAUGHN whose telephone number is (571)272-4849.  The examiner can normally be reached on M-R 7a-5:30p ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kamran Afshar, can be reached on 571-272-7796.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


/R.C.V./             Examiner, Art Unit 2125

/KAMRAN AFSHAR/             Supervisory Patent Examiner, Art Unit 2125