DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 1, 4, 8, and 14 have been amended.
Claims 1-20 have been examined.
The § 112 rejections in the previous Office Action have been addressed and are withdrawn.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on December 28, 2021 has been entered.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-4, 6-17, 19, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over US Publication No. 2015/0086134 by Hameed et al. (hereinafter referred to as “Hameed”) in view of US Publication No. 2009/0089540 by Hansen et al. (hereinafter referred to as “Hansen”) in view of US Publication No. 2005/0125636 by Ford et al. (hereinafter referred to as “Ford”) in view of US Publication No. 2015/0261534 by Uliel et al. (hereinafter referred to as “Uliel”). 
Regarding claims 1, 8, and 14, taking claim 1 as representative, Hameed discloses:
a system, comprising: a processor configured to perform a convolution on a vector comprising a plurality of data lanes of multiple data bits, wherein to perform the convolution on the vector the processor is configured to execute a plurality of vector instructions, wherein each of the plurality of vector instructions is configured to perform an operation on a plurality of data lanes of multiple data bits, and wherein the plurality of vector instructions are configured to (Hameed discloses, at ¶ [0045], a convolution processor configured to perform convolution on a vector, which discloses a plurality of data lanes of multiple data bits. As disclosed at ¶ [0059], the convolution involves executing a plurality of instructions, which are vector instructions by virtue of operating on vectors, as disclosed at, e.g., ¶ [0044], which discloses a register having 32 eight-bit values, i.e., a vector register. Hameed also discloses, at Figure 5, pseudocode to implement convolution, which discloses non-transitory computer readable storage medium storing instructions to implement the convolution operations.):
load into respective different vector registers of the processor: three or more source vectors comprising: a center vector comprising a contiguous plurality of data lanes of the vector starting with a first data lane of the vector; a left vector comprising a contiguous plurality of data lanes of the vector starting one data lane prior to the first data lane of the vector; and a right vector comprising a contiguous plurality of data lanes of the vector starting one data lane after the first data lane of the vector; and a kernel vector comprising a plurality of weighting values... (Hameed discloses, at ¶ [0044] and Figure 3, loading shifted subsets of input data into respective registers where the subsets are shifted by 1 with respect to one another (e.g., 26A, 26B, and 26C correspond to the claimed left, center, and right vectors, respectively) and loading data into a coefficient (weighting) register, where any three values are center, left, and right values, respectively.); and 
subsequent to the loading, generate one or more output vectors...of the three or more source vectors loaded into the respective different vector registers wherein to generate a particular output vector of the one or more output vectors a particular vector instruction of the plurality of vector instructions is configured to scale... a vector comprising respective lanes of a source vector of the three or more source vectors...each scaled by a weighting value of the kernel vector selected according to the particular output vector and the source vector. (Hameed discloses, at ¶ [0044], generating an output vector comprising input data scaled by the coefficient data, where the center vector, e.g., 26B, is scaled, the left vector, e.g., 26A, is scaled, and the right vector, e.g., 26C, is scaled. The scaling operations correspond to the selection of the input vectors, which discloses the weighting values are selected according to the source vector and output vector.).
Hameed does not explicitly disclose the aforementioned weighting vectors include a center weighting value, a left weighting value, and a right weighting value, that the aforementioned one or more output vectors respectively comprise weighted sums, that the aforementioned particular vector instruction is configured to add, to a vector accumulator, the aforementioned vector having lanes scaled by a weighting value, and that the aforementioned vector is specified as an operand of the particular vector instruction.  
However, in the same field of endeavor (e.g., convolution operations) Hansen discloses:
a weighting vector including a center weighting value and a left weighting value (Hansen discloses, at ¶¶ [0545]-[0548], a scale add extract instruction that uses two values as scalar multipliers to two vector multiplicands. The scalar multipliers are stored as consecutive elements of a register, i.e., rb. This discloses a center and left weighting value.), 
one or more output vectors respectively comprise weighted sums and a vector instruction to add, to a vector accumulator, a scaled vector (Hansen discloses, at ¶¶ [0545]-[0548], the output of the scale add extract instruction is the accumulated sum of the scaled vectors.).
Hansen does not explicitly disclose scaling a third vector by a third weighting value, but doing so is implicit in the disclosure of the instruction. That is, it would be obvious to utilize Hansen’s instruction to scale further registers because an instruction that could only be used on one set of data would be prohibitively expensive in terms of design and implementation. 
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify Hameed’s architecture to include the particular convolution operation taught by Hansen because all of the claimed elements were known in the prior art and one skilled in the art could have combined the elements as claimed by known methods with no change in their respective functions and the combination would have yielded predictable results to one of ordinary skill in the art at the time of filing. That is, performing convolution operations on shifted data is well-known, as evidenced by the pervasive attention to shifting data present in Hansen.
Also, in the same field of endeavor (e.g., vector operations) Ford discloses:
a right weighting value in the kernel vector (Ford discloses, at ¶ [0248], that each scalar in a vector of scalars, which includes three values, is selected and then used, as disclosed at ¶ [0252], in respective vector multiplication operations.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify Hameed’s vector instructions to include a vector of selectable scalars used to multiply vectors because “Being able to select one scalar operand for a SIMD operation is particular efficient in situations involving matrices of data elements. Different scalar operands can be written to the SIMD register file 20 and then readily selected for different vector-by-scalar operations without the need to re-write data elements or move data elements around.” See, e.g., Ford, ¶ [0245].
Also, in the same field of endeavor (e.g., SIMD operations) Uliel discloses:
vector registers of the processor are configured to be accessed as operands of individual vector instructions (Uliel discloses, at ¶ [0045], vector instructions specifying vector registers.). 
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify Hameed’s vector instructions to explicitly identify vector registers, as disclosed by Uliel, because there are a finite number of ways to specify the locations in which data is stored, and one of ordinary skill in the art could have pursued the known potential options with a reasonable expectation of success.
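As a rough illustration of the arrangement recited in claim 1, the following sketch models the left, center, and right source vectors and the scale-and-accumulate step in plain Python. The function names (`scale_accumulate`, `convolve3`), the list-based "registers," and the kernel ordering (left, center, right) are assumptions made for illustration; the sketch is not drawn from Hameed, Hansen, Ford, or Uliel and does not represent any reference's actual instruction set.

```python
def scale_accumulate(acc, src, kernel, lane):
    """Hypothetical vector instruction: acc[i] += src[i] * kernel[lane]."""
    weight = kernel[lane]  # weighting value selected from the kernel vector
    return [a + s * weight for a, s in zip(acc, src)]


def convolve3(data, first, width, kernel):
    """Three-tap convolution over `width` lanes starting at lane `first`."""
    # Source vectors, each offset by one lane from its neighbor.
    left = data[first - 1:first - 1 + width]
    center = data[first:first + width]
    right = data[first + 1:first + 1 + width]

    acc = [0] * width  # vector accumulator
    # Assumed kernel layout: lane 0 = left, lane 1 = center, lane 2 = right.
    for lane, src in enumerate((left, center, right)):
        acc = scale_accumulate(acc, src, kernel, lane)
    return acc


data = [1, 2, 3, 4, 5, 6]
kernel = [1, 2, 1]
result = convolve3(data, first=1, width=4, kernel=kernel)
print(result)  # [8, 12, 16, 20]
```

Each call to `scale_accumulate` corresponds to one "particular vector instruction" in the sense of the claim: it names a source vector as an operand, selects one weighting value, and adds the scaled lanes to the accumulator.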

Regarding claims 2, 9, and 15, taking claim 2 as representative, Hameed, as modified, discloses the features of claim 1, as discussed above. Hameed also discloses:
wherein to load the left and right vectors the plurality of vector instructions are configured to: load a previous vector immediately prior to the vector; load a next vector immediately subsequent to the vector (Hameed discloses, at ¶ [0044] and Figure 5, that one vector utilizes 64 multipliers, and covers, e.g., elements p0 through p18. Producing output for all 32 input elements discloses loading multiple vectors, i.e., previous and next vectors, such that the vector starting at p0 and ending at p18 is a previous vector, the vector starting at p4 and ending at p22 is the vector, and the vector starting at p8 and ending at p26 is the next vector.).
Hameed does not explicitly disclose executing a vector extraction instruction to generate the left vector using the previous vector and the vector, and executing another vector extraction instruction to generate the right vector using the vector and the next vector.
However, in the same field of endeavor (e.g., SIMD operations) Uliel discloses:
execute a vector extraction instruction to generate the left vector using the previous vector and the vector (Uliel discloses, at ¶ [0071], a packed inter-element shift merge left logic that is implemented as an instruction, as indicated at ¶ [0033].); and
execute another vector extraction instruction to generate the right vector using the vector and the next vector (Uliel discloses, at ¶ [0059], a packed inter-element shift merge right logic that is implemented as an instruction, as indicated at ¶ [0033].).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have combined Hameed’s disclosure directed to performing convolution on packed data with Uliel’s manipulation of packed data. One of ordinary skill in the art would have been motivated to make such a combination because all of the claimed elements were known in the prior art and one skilled in the art could have combined the elements as claimed by known methods with no change in their respective functions and the combination would have yielded predictable results (e.g., enabling greater control over selection of data on which to perform convolution functions) to one of ordinary skill in the art at the time of the invention.

Regarding claims 3, 10, and 16, taking claim 3 as representative, Hameed, as modified, discloses the features of claim 2, as discussed above. Hameed does not explicitly disclose the aforementioned vector extraction instruction is configured to concatenate a first vector register with a second vector register to produce an intermediate result, shift the intermediate result left a number of lanes, and generate an output vector comprising an upper portion of the intermediate result.
However, in the same field of endeavor (e.g., SIMD operations) Uliel discloses:
concatenate a first vector register with a second vector register to produce an intermediate result (Uliel discloses, at ¶ [0072], concatenating the vectors to produce an intermediate result.); 
shift the intermediate result left a number of lanes (Uliel discloses, at ¶ [0072], shifting the intermediate result left a specified number of positions.); and
generate an output vector comprising an upper portion of the intermediate result (Uliel discloses, at ¶ [0073], generating an output consisting of the most significant bits of the intermediate result.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have combined Hameed’s disclosure directed to performing convolution on packed data with Uliel’s manipulation of packed data. One of ordinary skill in the art would have been motivated to make such a combination because all of the claimed elements were known in the prior art and one skilled in the art could have combined the elements as claimed by known methods with no change in their respective functions and the combination would have yielded predictable results (e.g., enabling greater control over selection of data on which to perform convolution functions) to one of ordinary skill in the art at the time of the invention.
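The concatenate, shift, and truncate sequence discussed for claims 2 and 3 can be sketched as follows. This is a minimal model under stated assumptions: registers are Python lists, index 0 is treated as the uppermost lane, and `vector_extract` is a hypothetical name rather than an instruction taken from Uliel.

```python
def vector_extract(first, second, shift):
    """Concatenate two registers, shift left by `shift` lanes, keep the upper portion."""
    if not 0 <= shift <= len(second):
        raise ValueError("shift must be between 0 and the register width")
    intermediate = first + second      # concatenation (first is the upper half)
    shifted = intermediate[shift:]     # shifting left discards the uppermost lanes
    return shifted[:len(first)]        # upper portion, one register wide


prev = [0, 1, 2, 3]
cur = [4, 5, 6, 7]
nxt = [8, 9, 10, 11]

# Left vector: starts one lane before the first lane of `cur`.
left = vector_extract(prev, cur, len(cur) - 1)
# Right vector: starts one lane after the first lane of `cur`.
right = vector_extract(cur, nxt, 1)

print(left)   # [3, 4, 5, 6]
print(right)  # [5, 6, 7, 8]
```

Under this lane ordering, the left vector is produced from the previous vector and the vector, and the right vector from the vector and the next vector, which mirrors the two extraction steps recited in claim 2.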

Regarding claim 4, Hameed, as modified, discloses the features of claim 1, as discussed above. Hameed also discloses:
wherein to generate the particular output vector, the plurality of vector instructions comprise a vector scaling instruction for each one of the three or more of source vectors, wherein each one of the three or more source vectors is configured to: load a weighting value for the convolution from a specified lane of a register containing the kernel vector; multiply…by the weighting value to generate a scaled vector; and add the scaled vector to the particular one of the one or more the output vectors (Hameed discloses, at ¶ [0044], that each of the plurality of input registers is multiplied, which discloses an instruction to do so, by coefficient (weighting) values, which discloses loading the coefficient values, combined (added), and delivered to the output.).  
Hameed does not explicitly disclose each of the plurality of lanes specified as an operand of the vector scaling instruction is multiplied by the aforementioned respective weighting value.
However, in the same field of endeavor (e.g., vector operations) Ford discloses:
each of a plurality of data lanes of a vector register specified in the configured operand is multiplied by a respective value (Ford discloses, at ¶ [0252], multiplying a vector by a scalar.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify Hameed’s vector instructions to include multiplying a vector by a scalar because there is a finite number of ways of scaling vectors, such as multiplying each element of a vector by individual coefficient values, as disclosed by Hameed, and multiplying each element of a vector by a single value, as disclosed by Ford. While nothing in Hameed precludes each of the coefficient values being identical, that particular scenario is not explicitly disclosed. One of ordinary skill in the art could have pursued the known potential options with a reasonable expectation of success.
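The two scaling options contrasted above can be illustrated with a short sketch. Both function names and the sample values are hypothetical; the sketch only shows that multiplying each lane by an individual coefficient reduces to multiplying by a single selected scalar when all coefficients are identical, and it makes no claim about how either reference implements the operation.

```python
def multiply_by_coefficients(vec, coeffs):
    """Per-lane scaling: each lane has its own coefficient."""
    return [v * c for v, c in zip(vec, coeffs)]


def multiply_by_scalar(vec, scalars, lane):
    """Vector-by-scalar scaling: one scalar is selected from a register of scalars."""
    selected = scalars[lane]
    return [v * selected for v in vec]


vec = [1, 2, 3, 4]
scalars = [10, 20, 30]

per_lane = multiply_by_coefficients(vec, [20, 20, 20, 20])
by_scalar = multiply_by_scalar(vec, scalars, lane=1)

print(per_lane == by_scalar)  # True
```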

Regarding claims 6, 12, and 19, taking claim 6 as representative, Hameed, as modified, discloses the features of claim 1, as discussed above. Hameed also discloses:
wherein the convolution is a multi-dimensional convolution, and wherein the one or more output vectors comprises a plurality of output vectors (Hameed discloses, at ¶¶ [0046]-[0047], two dimensional convolution, which produces two dimensional output.).

Regarding claims 11 and 17, taking claim 11 as representative, Hameed, as modified, discloses the features of claim 8, as discussed above. Hameed also discloses:
generating the particular output vector comprises executing for each one of the three or more source vectors a vector scaling instruction, wherein executing the vector scaling instruction comprises: loading a respective weighting value from a specified lane of the register containing the kernel vector; generating a scaled vector comprising respective data lanes of a vector register specified in the configured operand by the weighting value; adding the scaled vector to the particular one of the one or more output vectors. (Hameed discloses, at ¶ [0044], that each of the plurality of input registers is multiplied, which discloses an instruction to do so, by coefficient (weighting) values, which discloses loading the coefficient values, combined (added), and delivered to the output.).  

Claims 5 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Hameed in view of Hansen in view of Ford in view of Uliel in view of US Publication No. 2018/0004518 by Plotnikov et al. (hereinafter referred to as “Plotnikov”). 
Regarding claims 5 and 18, taking claim 5 as representative, Hameed, as modified, discloses the features of claim 1, as discussed above. Hameed also discloses:
the convolution on the vector…and wherein to load the three or more source vectors the plurality of vector instructions are configured to: load a previous vector immediately prior to the vector; load a next vector immediately subsequent to the vector (Hameed discloses, at ¶ [0044] and Figure 5, that one vector utilizes 64 multipliers, and covers, e.g., elements p0 through p18. Producing output for all 32 input elements discloses loading multiple vectors, i.e., previous and next vectors, such that the vector starting at p0 and ending at p18 is a previous vector, the vector starting at p4 and ending at p22 is the vector, and the vector starting at p8 and ending at p26 is the next vector.).
Hameed does not explicitly disclose that the aforementioned convolution implements a stride value, wherein a value of data lanes to omit in the convolution for each data lane to include is based on the stride value, execute vector extraction instructions to generate first and second intermediate vectors using the previous vector, the vector and the next vector; execute a vector shuffle instruction to load the left vector using the first and second intermediate vectors; and execute vector shuffle instructions to load the center vector and right vector using the vector and next vector.
However, in the same field of endeavor (e.g., SIMD operations) Uliel discloses:
execute vector extraction instructions to generate first and second intermediate vectors using the previous vector, the vector and the next vector (Uliel discloses, at ¶ [0073], vector extraction instructions that generate outputs consisting of the most significant bits of concatenated, shifted, and truncated registers. The results are intermediate results when subsequent operations are performed on the results.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have combined Hameed’s disclosure directed to performing convolution on packed data with Uliel’s manipulation of packed data. One of ordinary skill in the art would have been motivated to make such a combination because all of the claimed elements were known in the prior art and one skilled in the art could have combined the elements as claimed by known methods with no change in their respective functions and the combination would have yielded predictable results (e.g., enabling greater control over selection of data on which to perform convolution functions) to one of ordinary skill in the art at the time of the invention.
Also in the same field of endeavor (e.g., vector operations) Plotnikov discloses:
stride value, wherein a value of data lanes to omit…for each data lane to include is based on the stride value (Plotnikov discloses, at ¶ [0080], an instruction that specifies a stride value that indicates which elements (lanes) are omitted and which are included.); 
execute a vector shuffle instruction to load a vector using multiple vectors (Plotnikov discloses, at ¶ [0085], loading a destination vector using multiple source vectors.); and
execute vector shuffle instructions to load vectors using multiple vectors (Plotnikov discloses, at ¶ [0085], loading a destination vector using multiple source vectors.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify Hameed’s architecture to include the use of a stride value to selectively load elements from multiple source vectors into a destination vector because all of the elements of the claims are known. The only difference is the combination of the “old elements” into a single device, and one skilled in the art could have combined the elements as claimed by known methods with no change in their respective functions and the combination would have yielded predictable results (e.g., flexible selection of input values based on a stride value) to one of ordinary skill in the art at the time of the invention.
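The strided loading recited in claim 5 can be sketched as follows for a stride of 2, in which one lane is omitted for each lane included. The helper names (`extract`, `strided_shuffle`), the list-based registers, and the lane ordering are all assumptions for illustration; the sketch is not taken from Plotnikov or Uliel.

```python
def extract(first, second, shift):
    """Hypothetical extraction: concatenate, shift left by `shift` lanes, keep one register."""
    return (first + second)[shift:shift + len(first)]


def strided_shuffle(first, second, start, stride, width):
    """Hypothetical shuffle: pick every `stride`-th lane from two source registers."""
    pool = first + second
    return [pool[start + i * stride] for i in range(width)]


prev = [0, 1, 2, 3]
cur = [4, 5, 6, 7]
nxt = [8, 9, 10, 11]
stride, width = 2, 4

# Center and right vectors come directly from the vector and the next vector.
center = strided_shuffle(cur, nxt, 0, stride, width)
right = strided_shuffle(cur, nxt, 1, stride, width)

# The left vector needs lanes reaching back into the previous vector, so two
# intermediate vectors are extracted first and then shuffled.
inter1 = extract(prev, cur, 3)
inter2 = extract(cur, nxt, 3)
left = strided_shuffle(inter1, inter2, 0, stride, width)

print(left)    # [3, 5, 7, 9]
print(center)  # [4, 6, 8, 10]
print(right)   # [5, 7, 9, 11]
```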

Claims 7, 13, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Hameed in view of Hansen in view of Ford in view of Uliel in view of US Publication No. 2018/0307980 by Barik et al. (hereinafter referred to as “Barik”). 
Regarding claims 7, 13, and 20, taking claim 7 as representative, Hameed, as modified, discloses the features of claim 1, as discussed above. Hameed does not explicitly disclose the aforementioned convolution is performed as part of a convolutional neural network.
However, in the same field of endeavor (e.g., convolution operations) Barik discloses:
a convolutional neural network (Barik discloses, at ¶ [0153], a convolutional neural network.). 
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to implement Hameed’s architecture using the convolutional neural network taught by Barik because doing so can improve performance by optimizing training of neural networks. See Barik, ¶ [0141].

Response to Arguments
On pages 12-13 of the response filed December 28, 2022 (“response”), the Applicant argues, “Claim 1 features vector instructions configured to generate output vectors that include weighted sums of at least three registers of a processor previously loaded with vectors. A particular output register is generated by executing a portion of the vector instructions, where each one of portion of the vector instructions adds to a vector accumulator individual lanes of a source vector, specified as an operand, scaled by a weighting value of the kernel vector selected according to the particular output vector and the source vector. Applicant respectfully submits that the combination of references fails to disclose the above recited feature of amended claim 1.” In support of this position, the Applicant argues, “In the rejection, the Action recognizes that Hameed and Hansen do not disclose the above feature and instead cites the newly added reference Ford paragraphs [0248] and [0252]. Ford paragraph [0248] discusses various aspects of FIG. 34 which "illustrates schematically logic arranged to perform a vector-by-scalar operation of an embodiment." (Ford paragraph [0245]). While the vector-by-scalar operation of Ford discloses a register Dm from which a scalar Dm[1] may be selected, as shown in FIG. 34 element 510, and subsequently used to scale individual lanes of vector Dn at step 520 to produce a vector Dd, this vector-by-scalar operation of Ford does not disclose adding the scaled vector Dd to a vector accumulator, as recited in the claim. Applicant notes that the vector accumulator feature has been newly added to clarify the previous summing feature of the claim and is supported pervasively throughout the Specification including at least in descriptions of FIGs 5 - 11. 
As the Action recognizes that Hameed and Hansen do not disclose the above feature and Uliel is not cited for and does not disclose the feature, Applicant respectfully submits that the combination of references therefore fails to teach or suggest a plurality of vector instructions are configured to generate one or more output vectors respectively comprising weighted sums of the three or more source vectors loaded into the respective different vector registers, wherein to generate a particular output vector of the one or more output vectors a particular vector instruction of the plurality of vector instructions is configured to scale and add, to a vector accumulator, a vector comprising respective lanes of a source vector of the three or more source vectors, specified as an operand of the particular vector instruction, each scaled by a weighting value of the kernel vector selected according to the particular output vector and the source vector, as claimed.”
Though the Applicant’s arguments have been fully considered, the Examiner respectfully disagrees. As an initial matter, the Applicant mischaracterizes the previous office action. The previous office action conceded that Hansen and Hameed do not explicitly disclose that the weighting values in the kernel vector comprise a center weighting value, a left weighting value, and a right weighting value. It was for these features that Ford was cited. Specifically, it was for the right weighting value. The Examiner did not and does not rely on Ford for any of the limitations in the generating output vectors clause. 
Instead, the office action cites Hameed as generating output vectors comprising scaled vectors. See, e.g., Hameed’s Figure 3 and related description, which show multiplying shifted vectors by coefficient values, i.e., generating scaled vectors. Hansen discloses accumulating (summing) scaled vectors. See, e.g., ¶¶ [0545]-[0548]. The Examiner maintains that this disclosure teaches the newly added clarifications that summing scaled vectors comprises adding a scaled vector to a vector accumulator. Accordingly, the Applicant’s arguments are deemed unpersuasive. 

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHAWN DOMAN whose telephone number is (571)270-5677.  The examiner can normally be reached on Monday through Friday 8:30am-6pm Eastern Time.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jyoti Mehta, can be reached on 571-270-3995.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/SHAWN DOMAN/Primary Examiner, Art Unit 2183