DETAILED ACTION
This is in response to the application filed on August 5, 2020 in which claims 1 – 15, 17, 19, 22, and 25 – 27 are presented for examination.
Status of Claims
Claims 1 – 15, 17, 19, 22, and 25 – 27 are pending, of which claims 1, 26, and 27 are in independent form.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on October 6, 2022 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1 – 15, 17, 19, 22, and 25 – 27 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The claims recite a mathematical equation for multiplication and accumulation/summation. This judicial exception is not integrated into a practical application because the claims do not include additional elements that are sufficient to amount to significantly more. The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception because the claimed processor-implemented steps of obtaining of data, performing a sum of products, and outputting results are well-understood, routine, and conventional computer steps.
As in MPEP 2106.04(a)(2):
The Court’s rationale for identifying these "mathematical concepts" as judicial exceptions is that a ‘‘mathematical formula as such is not accorded the protection of our patent laws,’’ Diehr, 450 U.S. at 191, 209 USPQ at 15 (citing Benson, 409 U.S. 63, 175 USPQ 673), and thus ‘‘the discovery of [a mathematical formula] cannot support a patent unless there is some other inventive concept in its application.’’
C. Mathematical calculations
A claim that recites a mathematical calculation, when the claim is given its broadest reasonable interpretation in light of the specification, will be considered as falling within the "mathematical concepts" grouping. A mathematical calculation is a mathematical operation (such as multiplication) or an act of calculating using mathematical methods to determine a variable or number, e.g., performing an arithmetic operation such as exponentiation. There is no particular word or set of words that indicates a claim recites a mathematical calculation. That is, a claim does not have to recite the word "calculating" in order to be considered a mathematical calculation. For example, a step of "determining" a variable or number using mathematical methods or "performing" a mathematical operation may also be considered mathematical calculations when the broadest reasonable interpretation of the claim in light of the specification encompasses a mathematical calculation.
Further, as in MPEP 2106.05(f)(2), use of a computer or other machinery in its ordinary capacity for economic or other tasks (e.g., to receive, store, or transmit data) or simply adding a general purpose computer or computer components after the fact to an abstract idea (e.g., a fundamental economic practice or mathematical equation) does not integrate a judicial exception into a practical application or provide significantly more. See Affinity Labs v. DirecTV, 838 F.3d 1253, 1262, 120 USPQ2d 1201, 1207 (Fed. Cir. 2016) (cellular telephone); TLI Communications LLC v. AV Auto, LLC, 823 F.3d 607, 613, 118 USPQ2d 1744, 1748 (Fed. Cir. 2016) (computer server and telephone unit).

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 2, 14, 15, 19, and 25 – 27 are rejected under 35 U.S.C. 103 as being unpatentable over Cheung et al., U.S. Patent 8,959,136 (hereinafter referred to as Cheung) in view of Demjanenko, U.S. Patent Application 2004/0073773 (hereinafter referred to as Demjanenko).

Referring to claim 1, Cheung discloses “A processor-implemented method for data manipulation” (Abstract processor architecture) “comprising: obtaining a first left group comprising” “data” (Fig. 5 receiving ar) “and a first right group of” “data for computation using a processor” (Fig. 5 receiving aj); “obtaining a second left group comprising” “data” (Fig. 5 receiving br) “and a second right group of” “data” (Fig. 5 receiving bj); “performing a sum of products between the first left and right groups and the second left and right groups” (Fig. 5 multiplying at 512/514/516/518 and adding at 523/524 adders), “wherein the sum of products is performed on” “integer data” (column 5 lines 27 – 30 each data input signal may represent input data using any appropriate number representation), “and wherein: a first result is based on a summation of” “values that are products of the first group's left” data “and the second group's left” data (Fig. 5 multiplier 512 result is arbr); “and a second result is based on the summation of” “values that are products of the first group's left” data “and the second group's right” data (Fig. 5 multiplier 518 result is arbj); “and outputting the first result and the second result” “based on the performing” (Fig. 5 results are output).
	Cheung does not appear to explicitly disclose “eight bytes of data” per group of data, the sum of products is performed on bytes of “8-bit data,” results being based on a summation of “eight values,” and outputting “four-byte results.”
	However, Demjanenko discloses another processor-implemented method ([0117] processor family) that supports “eight bytes of data” per group of data, the sum of products is performed on bytes of “8-bit data,” results being based on a summation of “eight values” ([0108] vector mode supports 8, 16, or 32-bit data sizes, with optional 64 bit in future versions. 8 bytes = 64 bits, and as in [0108] 8-bit data. Table 1 at [0114] up to 64 bit, data type integer. [0117] TOVEN processor family supports 8, 16, and 32-bit sizes as signed/unsigned integer. [0144] VALU responsible for Multiply and Accumulate).
	Demjanenko does not appear to explicitly disclose outputting “four-byte results.”  However, Demjanenko discloses that the VALU is responsible for Multiply and Accumulate, and that the VRC rounds or saturates and reduces or increases precision ([0144]).
	It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to use the rounding/reducing precision feature of Demjanenko’s VRC so that the results are output as smaller byte results (e.g., four-byte results).  As is understood in the art, this reduced precision saves space.
Cheung and Demjanenko are analogous art because they are from the same field of endeavor, which is processor-based multiply and accumulate methods.
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Cheung and Demjanenko before him or her, to modify the teachings of Cheung to include the teachings of Demjanenko so that the system supports “eight bytes of data” per group of data, the sum of products is performed on bytes of “8-bit data,” and the results are based on a summation of “eight values.”
The motivation for doing so would have been to provide a means for different data sizes.  As explained by Demjanenko at [0108], data sizes commonly change over time with advancements in technology.
Therefore, it would have been obvious to combine Demjanenko with Cheung to obtain the invention as specified in the instant claim.

	As per claim 2, Cheung discloses “outputting a third result and a fourth result, wherein the third result is based on a summation of” “products of the first group's right eight bytes and the second group's left eight bytes” (Fig. 5 multiplier 516 result is ajbr), “and the fourth result is based on a summation of” “products of the first group's right eight bytes and the second group's right eight bytes” (Fig. 5 multiplier 514 result is ajbj).
	As above, Cheung does not appear to explicitly disclose a summation of “eight values that are products.”
However, Demjanenko discloses another processor-implemented method ([0117] processor family) that supports eight bytes of data per group of data, the sum of products is performed on bytes of 8-bit data, and results being based on a summation of “eight values” ([0108] vector mode supports 8, 16, or 32-bit data sizes, with optional 64 bit in future versions. 8 bytes = 64 bits, and as in [0108] 8-bit data. Table 1 at [0114] up to 64 bit, data type integer. [0117] TOVEN processor family supports 8, 16, and 32-bit sizes as signed/unsigned integer. [0144] VALU responsible for Multiply and Accumulate).
Cheung and Demjanenko are analogous art because they are from the same field of endeavor, which is processor-based multiply and accumulate methods.
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Cheung and Demjanenko before him or her, to modify the teachings of Cheung to include the teachings of Demjanenko so that the system supports “eight bytes of data” per group of data, the sum of products is performed on bytes of “8-bit data,” and the results are based on a summation of “eight values.”
The motivation for doing so would have been to provide a means for different data sizes.  As explained by Demjanenko at [0108], data sizes commonly change over time with advancements in technology.
Therefore, it would have been obvious to combine Demjanenko with Cheung to obtain the invention as specified in the instant claim.
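For illustration only, the data flow described by the claim language quoted above for claims 1 and 2 can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Cheung, Demjanenko, or the application: the function names, the signed interpretation of the 8-bit values, and the sample values in the usage note are hypothetical.

```python
# Illustrative sketch only. Function names and the signed 8-bit
# interpretation are assumptions, not details from the cited references.

def check_group(group):
    """Raise ValueError unless the group holds eight signed 8-bit integers."""
    if len(group) != 8 or any(not -128 <= v <= 127 for v in group):
        raise ValueError("each group must hold eight signed 8-bit values")


def dot8(x, y):
    """Summation of eight values, each a product of two 8-bit values."""
    return sum(a * b for a, b in zip(x, y))


def four_sums_of_products(first_left, first_right, second_left, second_right):
    """Return the first through fourth results as recited in claims 1 and 2."""
    for group in (first_left, first_right, second_left, second_right):
        check_group(group)
    # Each sum is at most 8 * 128 * 128 = 131072 in magnitude,
    # so every result fits in a signed four-byte (32-bit) word.
    return (
        dot8(first_left, second_left),    # first result
        dot8(first_left, second_right),   # second result
        dot8(first_right, second_left),   # third result
        dot8(first_right, second_right),  # fourth result
    )
```

Under these assumptions, calling four_sums_of_products([1]*8, [2]*8, [3]*8, [4]*8) returns (24, 32, 48, 64).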

	As per claims 14 and 15, Cheung discloses “the performing and the outputting comprise a reduction operation” “wherein the reduction operation is a dot-product calculation” (column 1 lines 42 - 52 computing a dot product vector).
	Also, Demjanenko discloses “the performing and the outputting comprise a reduction operation” “wherein the reduction operation is a dot-product calculation” ([0144] result conversion VRC rounds or saturates, converts formats, and reduces or increases precision and [0314] dot-product).
	As per claim 19, Demjanenko discloses “at least one of the first group and the second group represent image data” ([0112] image and other applications, [0469]).

	As per claim 25, neither Cheung nor Demjanenko appears to explicitly disclose “the first left group, the first right group, the second left group, and the second right group each include more than eight bytes of data.”
	However, Demjanenko discloses support for 8, 16, or 32-bit data sizes but “optional 64 bit in future versions” ([0108]).
	Therefore, it would have been obvious to one of ordinary skill in the art to expand the system of Cheung/Demjanenko so that “the first left group, the first right group, the second left group, and the second right group each include more than eight bytes of data.”
As above, the motivation for doing so would have been to provide a means for different data sizes.  As explained by Demjanenko at [0108], data sizes commonly change over time with advancements in technology.
Therefore, it would have been obvious to combine Demjanenko with Cheung to obtain the invention as specified in the instant claim.

Referring to claim 26, claim 26 recites limitations corresponding to those of claim 1.  Therefore, the rejection of claim 1 applies to claim 26.
	Further, Cheung discloses “A computer program product embodied in a non-transitory computer readable medium for data manipulation, the computer program product comprising code which causes one or more processors to perform operations of” claim 1 (column 1 lines 53 – 56 a machine-readable data storage medium encoded with software for performing the method, are also provided).

Referring to claim 27, claim 27 recites limitations corresponding to those of claim 1.  Therefore, the rejection of claim 1 applies to claim 27.
	Further, Cheung discloses “A computer system for data manipulation comprising: a memory which stores instructions” (column 1 lines 53 – 56 a machine-readable data storage medium encoded with software for performing the method, are also provided); “one or more processors attached to the memory wherein the one or more processors, when executing the instructions which are stored, are configured to” carry out the steps of claim 1 (Fig. 7 processor 901, memory 902.  column 1 lines 53 – 56 a machine-readable data storage medium encoded with software for performing the method, are also provided).

Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Cheung in view of Demjanenko, as applied to the claims above, and further in view of Tanaka et al., U.S. Patent 6,148,101 (hereinafter referred to as Tanaka).

As per claim 3, Cheung discloses “the first, second, third, and fourth results” (Fig. 5).
Neither Cheung nor Demjanenko appears to explicitly disclose “summing the first, second, third, and fourth results into four accumulation registers, wherein the first result is added to the first accumulation register, the second result is added to the second accumulation register, the third result is added to the third accumulation register, and the fourth result is added to the fourth accumulation register.”
However, Tanaka discloses multiple accumulation registers for individually storing multiple results (Fig. 4 has 2 acc registers, Fig. 10 has 3 acc registers).
While Tanaka only shows embodiments with 2 and 3 acc registers, it would have been obvious to one of ordinary skill in the art to combine the multiple individual acc register approach of Tanaka with the system of Cheung/Demjanenko so that each of the four results is stored in a separate accumulation register.
Cheung, Demjanenko, and Tanaka are analogous art because they are from the same field of endeavor, which is processor-based multiply and accumulate methods.
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Cheung, Demjanenko, and Tanaka before him or her, to modify the teachings of Cheung and Demjanenko to include the teachings of Tanaka so that the first result is added to the first accumulation register, the second result is added to the second accumulation register, the third result is added to the third accumulation register, and the fourth result is added to the fourth accumulation register.
The motivation for doing so would have been to accumulate the results of certain data with easy access, not requiring offsets of a storage area (each register separate and distinct).  Also, as seen in Tanaka, the acc registers are extremely local to the multiplier and adder circuitry for fast access.
Therefore, it would have been obvious to combine Tanaka with Cheung and Demjanenko to obtain the invention as specified in the instant claim.

Claims 4 – 6 are rejected under 35 U.S.C. 103 as being unpatentable over Cheung in view of Demjanenko, further in view of Tanaka, and further in view of Covey, U.S. Patent 5,218,564 (hereinafter referred to as Covey).

As per claims 4 – 6, as above, it would have been obvious to combine Tanaka with Cheung/Demjanenko so that the system includes “four accumulation registers.”
Neither Cheung nor Demjanenko nor Tanaka appears to explicitly disclose “the four accumulation registers are initialized to zero,” “at least one of the four accumulation registers contains a result from a previous sum of products operation,” and “iterating the obtaining, the performing, and the outputting to complete a tensor operation on greater than 32 bytes of input data.”
However, Covey discloses another multiply/accumulate execution system (column 5 lines 62 – 66) wherein “the” “accumulation registers are initialized to zero,” (column 5 lines 62 – 66 multiply/accumulate operation and storing result in a result register. Column 7 lines 10 – 11 the accumulating register having been cleared initially) “at least one of the” “accumulation registers contains a result from a previous sum of products operation,” (column 7 lines 39 – 47 rather than clearing out the result in the multiplier’s result register of one multiplication prior to the start of another, the result is kept in the result register and all of the subsequent partial products generated during the course of the next multiplication are added to it) and “iterating the obtaining, the performing, and the outputting to complete a tensor operation on greater than 32 bytes of input data” (column 2 lines 11 – 19 a microinstruction sequence for performing a series of repetitive math operations to sample or condition data, iterative multiply and accumulate steps).
Cheung, Demjanenko, Tanaka, and Covey are analogous art because they are from the same field of endeavor, which is processor-based multiply and accumulate methods.
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Cheung, Demjanenko, Tanaka, and Covey before him or her, to modify the teachings of Cheung, Demjanenko, and Tanaka to include the teachings of Covey so that “the four accumulation registers are initialized to zero,” “at least one of the four accumulation registers contains a result from a previous sum of products operation,” and “iterating the obtaining, the performing, and the outputting to complete a tensor operation on greater than 32 bytes of input data.”
The motivation for doing so would have been to handle a first multiply/accumulate instruction as well as subsequent multiply/accumulate instructions (as stated by Covey in column 7 lines 40 – 47) as well as providing an iterative multiply and accumulate in order to filter data (as stated by Covey in column 2 lines 11 - 19).
Therefore, it would have been obvious to combine Covey with Cheung, Demjanenko, and Tanaka to obtain the invention as specified in the instant claim.
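For illustration only, the accumulation behavior recited in claims 3 – 6 as quoted above can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Tanaka, Covey, or the application: the function names and the representation of each iteration as a tuple of four eight-value groups (32 bytes of input per iteration) are hypothetical.

```python
# Illustrative sketch only. Names and the per-iteration tuple layout are
# assumptions, not details from the cited references.

def dot8(x, y):
    """Summation of eight products of paired values."""
    return sum(a * b for a, b in zip(x, y))


def accumulate(steps):
    """Accumulate four sums of products over successive iterations.

    steps: iterable of (first_left, first_right, second_left, second_right),
    each group holding eight 8-bit values, so each step consumes 32 bytes.
    """
    acc = [0, 0, 0, 0]  # four accumulation registers, initialized to zero
    for first_left, first_right, second_left, second_right in steps:
        # Each register keeps its result from the previous iteration
        # and the new sum of products is added to it.
        acc[0] += dot8(first_left, second_left)    # first result
        acc[1] += dot8(first_left, second_right)   # second result
        acc[2] += dot8(first_right, second_left)   # third result
        acc[3] += dot8(first_right, second_right)  # fourth result
    return acc
```

Under these assumptions, two iterations over the same hypothetical groups ([1]*8, [2]*8, [3]*8, [4]*8) process 64 bytes of input and return [48, 64, 96, 128].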

Claims 7 – 11 are rejected under 35 U.S.C. 103 as being unpatentable over Cheung in view of Demjanenko, further in view of Mansell et al., WIPO Publication WO 2019/002811 (hereinafter referred to as Mansell).

As per claims 7 – 11, neither Cheung nor Demjanenko appears to explicitly disclose “the first result, the second result, the third result, and the fourth result comprise a 2x2 word matrix,” “first result is contained in element (0, 0) of the word matrix,” “second result is contained in element (0, 1) of the word matrix,” “third result is contained in element (1, 0) of the word matrix,” and “fourth result is contained in element (1, 1) of the word matrix.”
However, Mansell discloses another multiplication method wherein “the first result, the second result, the third result, and the fourth result comprise a 2x2 word matrix,” “first result is contained in element (0, 0) of the word matrix,” “second result is contained in element (0, 1) of the word matrix,” “third result is contained in element (1, 0) of the word matrix,” and “fourth result is contained in element (1, 1) of the word matrix” (Fig. 5).
Cheung, Demjanenko, and Mansell are analogous art because they are from the same field of endeavor, which is processor-based multiply and accumulate methods.
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Cheung, Demjanenko, and Mansell before him or her, to modify the teachings of Cheung and Demjanenko to include the teachings of Mansell so that “the first result, the second result, the third result, and the fourth result comprise a 2x2 word matrix,” “first result is contained in element (0, 0) of the word matrix,” “second result is contained in element (0, 1) of the word matrix,” “third result is contained in element (1, 0) of the word matrix,” and “fourth result is contained in element (1, 1) of the word matrix.”
The motivation for doing so would have been to provide a means for matrix multiplication, since source data may be viewed as a 2x8 matrix and an 8x2 matrix (as seen in Fig. 5 of Mansell).  This provides for the efficient handling of larger amounts of data.
Therefore, it would have been obvious to combine Mansell with Cheung and Demjanenko to obtain the invention as specified in the instant claim.
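For illustration only, the 2x8 by 8x2 matrix view mentioned above, with its 2x2 word matrix result, can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Mansell or the application: the function name is hypothetical, and treating the first left and right groups as the rows of the 2x8 matrix and the second left and right groups as the columns of the 8x2 matrix is an assumed reading.

```python
# Illustrative sketch only. The function name and the row/column
# assignment of the groups are assumptions.

def matmul_2x8_8x2(a_rows, b_cols):
    """Multiply a 2x8 matrix by an 8x2 matrix, returning a 2x2 matrix.

    a_rows: two rows of eight values (assumed: first left, first right).
    b_cols: two columns of eight values (assumed: second left, second right).
    """
    return [
        [sum(a * b for a, b in zip(a_rows[i], b_cols[j])) for j in range(2)]
        for i in range(2)
    ]
```

Under these assumptions, element (0, 0) holds the first result, element (0, 1) the second, element (1, 0) the third, and element (1, 1) the fourth. For the hypothetical inputs [[1]*8, [2]*8] and [[3]*8, [4]*8], the function returns [[24, 32], [48, 64]].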

As per claim 12, neither Cheung nor Demjanenko appears to explicitly disclose “using the word matrix as input for a subsequent sum of products operation.”
However, Mansell discloses “using the word matrix as input for a subsequent sum of products operation” (page 34 line 8 – page 35 line 10 iterative process).
Cheung, Demjanenko, and Mansell are analogous art because they are from the same field of endeavor, which is processor-based multiply and accumulate methods.
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Cheung, Demjanenko, and Mansell before him or her, to modify the teachings of Cheung and Demjanenko to include the teachings of Mansell so that the word matrix is used as input for a subsequent sum of products operation.
The motivation for doing so would have been to provide a means for reusing the same execution logic, thus saving cost.
Therefore, it would have been obvious to combine Mansell with Cheung and Demjanenko to obtain the invention as specified in the instant claim.

As per claim 13, Demjanenko discloses “the input for a subsequent sum of products operation is converted to another format before the using” ([0144] within the result write path is a Vector Result Conversion (VRC) which rounds or saturates, converts formats).

	As per claim 22, neither Cheung nor Demjanenko appears to explicitly disclose “the obtaining a first group, the obtaining a second group, the performing, and the outputting are used to perform a 2x8*8x2 matrix multiplication.”
	However, Mansell discloses “the obtaining a first group, the obtaining a second group, the performing, and the outputting are used to perform a 2x8*8x2 matrix multiplication” (page 34 line 8 – page 35 line 10 iterative process).
Cheung, Demjanenko, and Mansell are analogous art because they are from the same field of endeavor, which is processor-based multiply and accumulate methods.
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Cheung, Demjanenko, and Mansell before him or her, to modify the teachings of Cheung and Demjanenko to include the teachings of Mansell so that the obtaining a first group, the obtaining a second group, the performing, and the outputting are used to perform a 2x8*8x2 matrix multiplication.
The motivation for doing so would have been to allow for a trade-off between the number of data elements represented by a given register content and the corresponding size of each data element, which the programmer using the instructions of the present techniques can balance depending on the computational context in which the instructions are being used (as stated by Mansell in the paragraph beginning on line 10 of page 24).
Therefore, it would have been obvious to combine Mansell with Cheung and Demjanenko to obtain the invention as specified in the instant claim.

Allowable Subject Matter
Claim 17 is objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Machine Translation of Japanese Patent Application JP 2021535393 A teaches matrix multiplication with 2x2 result matrix.
U.S. Patent Application 20020169813 and Patent 7072929 teach multiplying different groups of data.
U.S. Patent Application 20040117422 and Patent 7395298 teach multiply-add operations on packed data.
U.S. Patent Application 20210349690 and Patent 11455143 teach a dot product engine.
U.S. Patent Application 20180322382 teaches neural networks and multiply-add. 
U.S. Patent Application 20190332355 teaches rounding in a multiplier accumulator.
U.S. Patent 11256504 teaches multiplying different groups of data.


Contact Information
Any inquiry concerning this communication or earlier communications from the examiner should be directed to STEVEN G SNYDER whose telephone number is (571)270-1971.  The examiner can normally be reached on M-F 8:00am-4:30pm (flexible).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.  
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Henry Tsai, can be reached on 571-272-4176.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/STEVEN G SNYDER/
Primary Examiner, Art Unit 2184