DETAILED ACTION
This is in response to the application filed on September 9, 2021 in which claims 1 – 23 are presented for examination.
Status of Claims
Claims 1 – 23 are pending, of which claims 1, 10, and 18 are in independent form.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on September 19, 2022 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.

Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.
Claims 1 – 23 are provisionally rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1, 2, 5, 6, 8 – 10, 13, 14, 17 – 22, 25, 27, 28, 30, 31, 34, and 35 of copending Application No. 17/834427 (reference application). Although the claims at issue are not identical, they are not patentably distinct from each other because the claims of 17/834427 are broader.
This is a provisional nonstatutory double patenting rejection because the patentably indistinct claims have not in fact been patented.

17/834427
17/471126
1. A machine-readable medium having stored thereon an application programming interface (API), which if performed by one or more processors, causes the one or more processors to at least:
select a general matrix-to-matrix multiply (GEMM) implementation to be performed from among a plurality of GEMM implementations based, at least in part, on a parameter received by the API, wherein the API uses additional parameters to indicate:
a number of rows of a matrix;
a number of columns of the matrix;
a leading dimension of the matrix;
a transform operation to be performed; and
an output location.

2. The machine-readable medium of claim 1, wherein the output location identifies a buffer in which to store a result of the GEMM implementation to be performed.
1. A multi-threaded processor comprising:
a plurality of cores comprising matrix multiplication logic to perform operations to accelerate a general matrix-to-matrix multiply (GEMM) implementation selected
from a plurality of GEMM implementations in a library accessible through a GEMM application programming interface (API) call, wherein each of the plurality of
cores comprises:

a front end to fetch instructions to be executed to perform the GEMM implementation;
an instruction cache to cache the instructions;
a decoder to decode the instructions;
an L1 cache to store data;
an L2 cache to store data;
memory to store a result of the GEMM implementation;

and wherein the GEMM implementation is to transform an input matrix using one or more parameters indicated to the GEMM implementation, where the one or
more parameters indicate:
a number of rows of an input matrix;
a number of columns of the input matrix;
one or more scaling factors;
a leading dimension of the input matrix;
a transform operation to be performed; and
a result buffer.

5. The machine-readable medium of claim 1, wherein the API, if performed by the one or more processors, causes the one or more processors to perform the GEMM implementation by performing a set of instructions from a software library that comprises the API.

6. The machine-readable medium of claim 1, wherein the transform operation is to be performed on the matrix and the matrix comprises input data to the GEMM implementation.

1. A multi-threaded processor comprising:
a plurality of cores comprising matrix multiplication logic to perform operations to accelerate a general matrix-to-matrix multiply (GEMM) implementation selected
from a plurality of GEMM implementations in a library accessible through a GEMM application programming interface (API) call, wherein each of the plurality of
cores comprises:

a front end to fetch instructions to be executed to perform the GEMM implementation;
an instruction cache to cache the instructions;
a decoder to decode the instructions;
an L1 cache to store data;
an L2 cache to store data;
memory to store a result of the GEMM implementation;

and wherein the GEMM implementation is to transform an input matrix using one or more parameters indicated to the GEMM implementation, where the one or
more parameters indicate:
a number of rows of an input matrix;
a number of columns of the input matrix;
one or more scaling factors;
a leading dimension of the input matrix;
a transform operation to be performed; and
a result buffer.

8. The machine-readable medium of claim 1, wherein the matrix comprises floating-point data.

8. The multi-threaded processor of claim 1, wherein the input matrix comprises full-precision 32-bit floating point data.
9. The multi-threaded processor of claim 1, wherein the input matrix comprises half-precision 16-bit floating point data.
9. The machine-readable medium of claim 1, further comprising instructions that, if performed by the one or more processors, cause the one or more processors to perform the GEMM implementation in response to the API.

1. A multi-threaded processor comprising:
a plurality of cores comprising matrix multiplication logic to perform operations to accelerate a general matrix-to-matrix multiply (GEMM) implementation selected
from a plurality of GEMM implementations in a library accessible through a GEMM application programming interface (API) call, wherein each of the plurality of
cores comprises:

10. A method comprising:
selecting, in response to an application programming interface (API), a general matrix-to-matrix multiply (GEMM) implementation to be performed from among a plurality of GEMM implementations based, at least in part, on a parameter received by the API, wherein the API uses additional parameters to indicate:
a number of rows of a matrix;
a number of columns of the matrix;
a leading dimension of the matrix;
a transform operation to be performed; and
an output location.
18. A computer-implemented method comprising:
selecting a general matrix-to-matrix multiply (GEMM) implementation from a plurality of GEMM implementations in a library accessible through a GEMM application programming interface (API) call; and
transforming, by the GEMM implementation, an input matrix using one or more parameters indicated to the GEMM implementation, where the one or more parameters indicate:
a number of rows of an input matrix;
a number of columns of the input matrix;
one or more scaling factors;
a leading dimension of the input matrix;
a transform operation to be performed; and
a result buffer.
13. The method of claim 10, further comprising performing the transform operation in response to the API, where the transform operation is to transform data of the matrix and the parameter is to indicate the matrix.

18. A computer-implemented method comprising:
selecting a general matrix-to-matrix multiply (GEMM) implementation from a plurality of GEMM implementations in a library accessible through a GEMM application programming interface (API) call; and

transforming, by the GEMM implementation, an input matrix using one or more parameters indicated to the GEMM implementation, where the one or more parameters indicate:
a number of rows of an input matrix;
a number of columns of the input matrix;
one or more scaling factors;
a leading dimension of the input matrix;
a transform operation to be performed; and
a result buffer.
14. The method of claim 10, further comprising performing the GEMM implementation on the matrix, where the matrix comprises floating-point data.
18. 
transforming, by the GEMM implementation, an input matrix

8. The multi-threaded processor of claim 1, wherein the input matrix comprises full-precision 32-bit floating point data.
9. The multi-threaded processor of claim 1, wherein the input matrix comprises half-precision 16-bit floating point data.



18. The method of claim 10, further comprising performing the GEMM implementation in response to the API.

17. The method of claim 10, wherein the output location identifies a buffer in which to store a result of the GEMM implementation to be performed.
18. A computer-implemented method comprising:
selecting a general matrix-to-matrix multiply (GEMM) implementation from a plurality of GEMM implementations in a library accessible through a GEMM application programming interface (API) call; and

transforming, by the GEMM implementation, an input matrix using one or more parameters indicated to the GEMM implementation, where the one or more parameters indicate:
a number of rows of an input matrix;
a number of columns of the input matrix;
one or more scaling factors;
a leading dimension of the input matrix;
a transform operation to be performed; and
a result buffer.
19. A system comprising:
memory;
one or more processors; and
wherein the memory comprises an application programming interface (API) to select a general matrix-to-matrix multiply (GEMM) implementation to be performed by the one or more processors from among a plurality of GEMM implementations based, at least in part, on a parameter received by the API, wherein the API uses additional parameters to indicate:
a number of rows of a matrix;
a number of columns of the matrix;
a leading dimension of the matrix;
a transform operation to be performed; and
an output location.

22. The system of claim 19, wherein the output location is to indicate one or more buffers to store a result of performing the GEMM implementation.

10. A multi-threaded processor comprising:
a plurality of cores comprising matrix multiplication logic to perform operations to accelerate a general matrix-to-matrix multiply (GEMM) implementation selected
from a plurality of GEMM implementations in a library accessible through a GEMM application programming interface (API) call, wherein the GEMM implementation
is to transform an input matrix using one or more parameters indicated to the GEMM implementation and the one or more parameters indicate:
a number of rows of an input matrix;
a number of columns of the input matrix;
one or more scaling factors;
a leading dimension of the input matrix;
a transform operation to be performed; and
a result buffer.

20. The system of claim 19, wherein the GEMM implementation is to be performed based, at least in part, on the matrix and the matrix comprises floating-point data.

21. The system of claim 19, wherein the parameter is to indicate the matrix and the matrix comprises one or more sets of floating-point data.

8. The multi-threaded processor of claim 1, wherein the input matrix comprises full-precision 32-bit floating point data.
9. The multi-threaded processor of claim 1, wherein the input matrix comprises half-precision 16-bit floating point data.

10. the one or more parameters indicate:
a number of rows of an input matrix;
a number of columns of the input matrix;
one or more scaling factors;
a leading dimension of the input matrix;
a transform operation to be performed; and
a result buffer.
25. The system of claim 19, wherein the memory further comprises a library and the library comprises instructions to be performed by the one or more processors in response to the API.
10. A multi-threaded processor comprising:
a plurality of cores comprising matrix multiplication logic to perform operations to accelerate a general matrix-to-matrix multiply (GEMM) implementation selected
from a plurality of GEMM implementations in a library accessible through a GEMM application programming interface (API) call, wherein the GEMM implementation
is to transform an input matrix using one or more parameters indicated to the GEMM implementation and the one or more parameters indicate:
a number of rows of an input matrix;
a number of columns of the input matrix;
one or more scaling factors;
a leading dimension of the input matrix;
a transform operation to be performed; and
a result buffer.
27. The system of claim 19, wherein the one or more processors are to perform the GEMM implementation in response to the API.

10. A multi-threaded processor comprising:
a plurality of cores comprising matrix multiplication logic to perform operations to accelerate a general matrix-to-matrix multiply (GEMM) implementation selected
from a plurality of GEMM implementations in a library accessible through a GEMM application programming interface (API) call, wherein the GEMM implementation
is to transform an input matrix using one or more parameters indicated to the GEMM implementation and the one or more parameters indicate:
a number of rows of an input matrix;
a number of columns of the input matrix;
one or more scaling factors;
a leading dimension of the input matrix;
a transform operation to be performed; and
a result buffer.
28. A processor comprising:
one or more circuits to perform an application programming interface (API) to select a general matrix-to-matrix multiply (GEMM) implementation to be performed from among a plurality of GEMM implementations based, at least in part, on a parameter received by the API, wherein the API uses additional parameters to indicate:
a number of rows of a matrix;
a number of columns of the matrix;
a leading dimension of the matrix;
a transform operation to be performed; and
an output location.

31. The processor of claim 28, wherein the output location comprises data to indicate one or more buffers.
10. A multi-threaded processor comprising:
a plurality of cores comprising matrix multiplication logic to perform operations to accelerate a general matrix-to-matrix multiply (GEMM) implementation selected
from a plurality of GEMM implementations in a library accessible through a GEMM application programming interface (API) call, wherein the GEMM implementation
is to transform an input matrix using one or more parameters indicated to the GEMM implementation and the one or more parameters indicate:
a number of rows of an input matrix;
a number of columns of the input matrix;
one or more scaling factors;
a leading dimension of the input matrix;
a transform operation to be performed; and
a result buffer.
30. The processor of claim 28, wherein the parameter is to indicate the matrix to the API by the parameter and the matrix comprises floating-point data.



31. The processor of claim 28, wherein the output location comprises data to indicate one or more buffers.
10. A multi-threaded processor comprising:
a plurality of cores comprising matrix multiplication logic to perform operations to accelerate a general matrix-to-matrix multiply (GEMM) implementation selected
from a plurality of GEMM implementations in a library accessible through a GEMM application programming interface (API) call, wherein the GEMM implementation
is to transform an input matrix using one or more parameters indicated to the GEMM implementation and the one or more parameters indicate:
a number of rows of an input matrix;
a number of columns of the input matrix;
one or more scaling factors;
a leading dimension of the input matrix;
a transform operation to be performed; and
a result buffer.
34. The processor of claim 28, wherein the one or more circuits are to perform the GEMM implementation in response to the API.

10. A multi-threaded processor comprising:
a plurality of cores comprising matrix multiplication logic to perform operations to accelerate a general matrix-to-matrix multiply (GEMM) implementation selected
from a plurality of GEMM implementations in a library accessible through a GEMM application programming interface (API) call, wherein the GEMM implementation is to transform an input matrix using one or more parameters indicated to the GEMM implementation and the one or more parameters indicate:
a number of rows of an input matrix;
a number of columns of the input matrix;
one or more scaling factors;
a leading dimension of the input matrix;
a transform operation to be performed; and
a result buffer.
35. The processor of claim 28, wherein the one or more circuits are to perform one or more instructions of a library in response to the API, where the one or more instructions are to cause the one or more circuits to select the GEMM implementation.

10. A multi-threaded processor comprising:
a plurality of cores comprising matrix multiplication logic to perform operations to accelerate a general matrix-to-matrix multiply (GEMM) implementation selected
from a plurality of GEMM implementations in a library accessible through a GEMM application programming interface (API) call, wherein the GEMM implementation is to transform an input matrix using one or more parameters indicated to the GEMM implementation and the one or more parameters indicate:
a number of rows of an input matrix;
a number of columns of the input matrix;
one or more scaling factors;
a leading dimension of the input matrix;
a transform operation to be performed; and
a result buffer.


In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1 – 9, 11 – 17, 20, and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Ekanadham et al., U.S. Patent Application 2016/0188385 (hereinafter referred to as Ekanadham), in view of Sankaran et al., WIPO Publication WO 2018/125250 (hereinafter referred to as Sankaran), further in view of ‘Guide and Reference’ from cenapad.unicamp.br, archived on October 27, 2007 (hereinafter referred to as ‘Guide and Reference II’).

Referring to claim 1, Ekanadham discloses “A multi-threaded processor comprising: a plurality of cores comprising matrix multiplication logic to perform operations to accelerate a general matrix-to-matrix multiply (GEMM) implementation selected from a plurality of GEMM implementations in a library accessible through a GEMM application programming interface (API) call” ([0039] SMT multi-threading and levels of cores, [0055] multiplying matrices, [0059] several implementations of matrix multiply.  [0064] In order to evaluate the applicability of an optimized implementation of a GPI graph analytics operator to a GPI graph analytics operator call, as well as its performance, each implementation of a GPI function also has metadata attached to it. [0047] 3. The Library API 603, for obtaining: [0048] a. Available optimized implementations of Graph API functions for the available functional capabilities, resource levels, and representation/attributes of the arguments: [0049] b. Type-casting functions to change the representation of the arguments of Graph API functions to match with optimized implementations: [0050] c. Analyze functions to determine the attribute values of the arguments of Graph API, when needed and not available from the metadata associated with the arguments), “instructions to be executed to perform the GEMM implementation,” “the instructions; an L1 cache to store data; an L2 cache to store data; memory” (Fig. 14 program 1440 stored in memory 1428, [0039] cache hierarchies, [0093] caches, memories, flash storage); “and wherein the GEMM implementation is to transform an input matrix using one or more parameters indicated to the GEMM implementation” ([0055] multiplying matrices and [0059] several implementations of matrix multiply.  [0064] each implementation of a GPI function also has metadata attached to it. [0047] 3. The Library API 603, for obtaining: [0048] a. Available optimized implementations of Graph API functions for the available functional capabilities, resource levels, and representation/attributes of the arguments).
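For illustration only, the general idea of choosing one implementation from several on the basis of attached metadata can be sketched as follows. This is a minimal sketch and is not taken from Ekanadham or from the claims: the names `Implementation`, `select_implementation`, `REGISTRY`, and the cost figures are all hypothetical, and the applicability and cost rules are invented purely to make the example runnable.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Implementation:
    """One candidate routine plus the metadata used to evaluate it (hypothetical)."""
    name: str
    is_applicable: Callable[[Dict], bool]    # can this routine serve the call at all?
    estimated_cost: Callable[[Dict], float]  # rough relative cost; lower is better


def select_implementation(candidates: List[Implementation], call: Dict) -> Implementation:
    """Keep the applicable candidates and return the one with the lowest estimated cost."""
    applicable = [c for c in candidates if c.is_applicable(call)]
    if not applicable:
        raise LookupError("no implementation is applicable to this call")
    return min(applicable, key=lambda c: c.estimated_cost(call))


def flops(call: Dict) -> float:
    """Nominal operation count of an m x k by k x n multiply."""
    return 2.0 * call["m"] * call["n"] * call["k"]


# Invented registry: the thresholds and cost factors below are assumptions.
REGISTRY = [
    Implementation("naive", lambda call: True, flops),
    Implementation(
        "blocked",
        lambda call: min(call["m"], call["n"], call["k"]) >= 64,
        lambda call: 0.25 * flops(call),
    ),
    Implementation(
        "half_precision",
        lambda call: call["dtype"] == "float16",
        lambda call: 0.1 * flops(call),
    ),
]
```

Under these assumed rules, a call such as `select_implementation(REGISTRY, {"m": 128, "n": 128, "k": 128, "dtype": "float32"})` would return the hypothetical "blocked" entry, while a small `float32` call would fall back to "naive".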
Ekanadham does not appear to explicitly disclose “each of the plurality of cores comprises: a front end to fetch instructions to be executed to perform the GEMM implementation; an instruction cache to cache the instructions; a decoder to decode the instructions” and “memory to store a result of the GEMM implementation.”
However, processors commonly include front ends for fetching and decoding.  Processors also commonly store results.  For example, Sankaran discloses “each of the plurality of cores comprises: a front end to fetch instructions;” “an instruction cache to cache the instructions; a decoder to decode the instructions” and “memory to store a result” (Fig. 126B core 12690 with front end unit 12630 including instruction cache 12634, instruction fetch 12638, decode unit 12640, and physical register file 12658.  [1047] the physical register file unit 12658 performs the write back/memory write stage).
Ekanadham and Sankaran are analogous art because they are from the same field of endeavor, which is processors executing instructions.
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Ekanadham and Sankaran before him or her, to modify the teachings of Ekanadham to include the teachings of Sankaran so that the cores include a front end to fetch instructions, an instruction cache, an instruction decoder, and a memory to store results.
The motivation for doing so would have been to provide for cores with pipelined execution in order to reduce cycles per instruction.
	Neither Ekanadham nor Sankaran appears to explicitly disclose “where the one or more parameters indicate: a number of rows of an input matrix; a number of columns of the input matrix; one or more scaling factors; a leading dimension of the input matrix; a transform operation to be performed; and a result buffer.”
However, Guide and Reference II discloses “where the one or more parameters indicate: a number of rows of an input matrix; a number of columns of the input matrix; one or more scaling factors; a leading dimension of the input matrix; a transform operation to be performed; and a result buffer” (‘Matrix-Vector Subprograms’ beginning on page 95, pages 96 – 98 sgemv | dgemv | cgemv | zgemv (transa, m, n, alpha, a, lda, x, incx, beta, y, incy). These GEM implementations use parameters transa (the transform), m (the number of rows), n (the number of columns), alpha (scaling constant), lda (leading dimension), beta (scaling constant), y (the result of the computation)).
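To make the roles of the cited parameters concrete, the following is a minimal sketch of a gemv-style routine computing y ← alpha·op(A)·x + beta·y. It is an illustrative reimplementation under stated assumptions, not the library code of Guide and Reference II: the matrix is assumed to be stored column-major in a flat list so that element (i, j) sits at `a[i + j*lda]`, only the "N" and "T" transforms are handled, and only positive strides `incx` and `incy` are supported.

```python
def gemv(transa, m, n, alpha, a, lda, x, incx, beta, y, incy):
    """Sketch of y <- alpha * op(A) * x + beta * y, updating y in place.

    transa : "N" (use A) or "T" (use A transposed)   -- the transform
    m, n   : number of rows and columns of A
    alpha, beta : scaling factors
    a, lda : flat column-major storage of A and its leading dimension
    x, incx : input vector and its stride
    y, incy : result buffer and its stride
    """
    op = transa.upper()
    if op not in ("N", "T"):
        raise ValueError("this sketch handles only transa = 'N' or 'T'")
    if lda < max(1, m):
        raise ValueError("lda must be at least the number of rows m")
    if incx <= 0 or incy <= 0:
        raise ValueError("this sketch handles only positive strides")

    transpose = op == "T"
    # op(A) is m x n for "N" and n x m for "T".
    out_len, in_len = (n, m) if transpose else (m, n)

    for r in range(out_len):
        acc = 0.0
        for c in range(in_len):
            # Row/column indices into the stored (untransposed) matrix A.
            i, j = (c, r) if transpose else (r, c)
            acc += a[i + j * lda] * x[c * incx]
        y[r * incy] = alpha * acc + beta * y[r * incy]
    return y
```

For example, with the 2 x 3 matrix [[1, 2, 3], [4, 5, 6]] stored as `a = [1, 4, 2, 5, 3, 6]` and `lda = 2`, the call `gemv("N", 2, 3, 2.0, a, 2, [1, 1, 1], 1, 1.0, [10, 20], 1)` scales the row sums 6 and 15 by alpha = 2 and adds beta times the prior contents of the result buffer.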
Ekanadham, Sankaran, and Guide and Reference II are analogous art because they are from the same field of endeavor, which is processor instruction execution (Ekanadham and Guide and Reference II both detail matrix multiplication).
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Ekanadham, Sankaran, and Guide and Reference II before him or her, to modify the teachings of Ekanadham and Sankaran to include the teachings of Guide and Reference II so that the one or more parameters indicate: a number of rows of an input matrix; a number of columns of the input matrix; one or more scaling factors; a leading dimension of the input matrix; a transform operation to be performed; and a result buffer.
The motivation for doing so would have been to provide a means for relaying the necessary information to the execution units.  This information allows for different possible computations (as stated by Guide and Reference II on page 99 under ‘Function’).
Therefore, it would have been obvious to combine Guide and Reference II with Ekanadham and Sankaran to obtain the invention as specified in the instant claim.

	
	As per claim 2, Sankaran discloses “the L1 cache comprises at least 24 kilobytes (KB) of storage” ([0904] - [0907] need 32-64 KB cache).

	As per claim 3, Sankaran discloses “an interface bus to connect the plurality of cores” ([0124] interconnects 11915 couple the cores to each other).

	As per claim 4, Sankaran discloses “the front end dispatches the MMA instruction to one or more cores of the plurality of cores” (Fig. 126B scheduler and execution units).

	As per claim 5, Sankaran discloses “the decoder decodes the instructions into one or more microoperations” (Fig. 126B decode unit 12640 + [1044]).

	As per claims 6 – 9, Sankaran discloses “the input matrix comprises 32-bit integer data,” “the input matrix comprises 16-bit integer data,” “the input matrix comprises full-precision 32-bit floating point data,” and “the input matrix comprises half-precision 16-bit floating point data” ([0898] 32-bit data types for matrix computations, [1054] vector unit is a 16-wide VPU which executes one or more of integer, single precision float, and double-precision float instructions. [0939] 32 or 64-bit values for vectors).

As per claim 11, neither Ekanadham nor Guide and Reference II appears to explicitly disclose “a front end to fetch instructions to be executed to perform the GEMM implementation.”
However, processors commonly include front ends for fetching and decoding.  Processors also commonly store results.  For example, Sankaran discloses “a front end to fetch instructions to be executed” (Fig. 126B core 12690 with front end unit 12630 including instruction cache 12634, instruction fetch 12638, decode unit 12640, and physical register file 12658.  [1047] the physical register file unit 12658 performs the write back/memory write stage).
Ekanadham, Guide and Reference II, and Sankaran are analogous art because they are from the same field of endeavor, which is processors executing instructions.
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Ekanadham, Guide and Reference II, and Sankaran before him or her, to modify the teachings of Ekanadham and Guide and Reference II to include the teachings of Sankaran so that the cores include a front end to fetch instructions, an instruction cache, an instruction decoder, and a memory to store results.
The motivation for doing so would have been to provide for cores with pipelined execution in order to reduce cycles per instruction.
Therefore, it would have been obvious to combine Sankaran with Ekanadham and Guide and Reference II to obtain the invention as specified in the instant claim.

As per claims 12 – 15, neither Ekanadham nor Guide and Reference II appears to explicitly disclose “the input matrix comprises 32-bit integer data,” “the input matrix comprises 16-bit integer data,” “the input matrix comprises full-precision 32-bit floating point data,” and “the input matrix comprises half-precision 16-bit floating point data.”
However, Sankaran discloses “the input matrix comprises 32-bit integer data,” “the input matrix comprises 16-bit integer data,” “the input matrix comprises full-precision 32-bit floating point data,” and “the input matrix comprises half-precision 16-bit floating point data” ([0898] 32-bit data types for matrix computations, [1054] vector unit is a 16-wide VPU which executes one or more of integer, single precision float, and double-precision float instructions. [0939] 32 or 64-bit values for vectors).
Ekanadham, Guide and Reference II, and Sankaran are analogous art because they are from the same field of endeavor, which is processors executing instructions.
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Ekanadham, Guide and Reference II, and Sankaran before him or her, to modify the teachings of Ekanadham and Guide and Reference II to include the teachings of Sankaran so that “the input matrix comprises 32-bit integer data,” “the input matrix comprises 16-bit integer data,” “the input matrix comprises full-precision 32-bit floating point data,” and “the input matrix comprises half-precision 16-bit floating point data.”
The motivation for doing so would have been to provide for the capability to handle different types and lengths of data.
Therefore, it would have been obvious to combine Sankaran with Ekanadham and Guide and Reference II to obtain the invention as specified in the instant claim.

As per claim 16, Ekanadham discloses “the plurality of cores” “accelerate the GEMM implementation” ([0039] SMT multi-threading and levels of cores, [0055] multiplying matrices, [0059] several implementations of matrix multiply.  [0064] In order to evaluate the applicability of an optimized implementation of a GPI graph analytics operator to a GPI graph analytics operator call, as well as its performance, each implementation of a GPI function also has metadata attached to it).
Neither Ekanadham nor Guide and Reference II appears to explicitly disclose “the plurality of cores further comprise a floating point unit (FPU) to accelerate the GEMM implementation.”
However, floating point units were commonly known in the art at the time of Applicant’s filing.  For example, Sankaran discloses a multi-core system (Fig. 69) comprising “a floating point unit (FPU)” ([0288] FPU).
Ekanadham, Guide and Reference II, and Sankaran are analogous art because they are from the same field of endeavor, which is processors executing instructions.
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Ekanadham, Guide and Reference II, and Sankaran before him or her, to modify the teachings of Ekanadham and Guide and Reference II to include the teachings of Sankaran so that a floating point unit is included.
The motivation for doing so would have been to provide for the capability to handle different types of data efficiently.
Therefore, it would have been obvious to combine Sankaran with Ekanadham and Guide and Reference II to obtain the invention as specified in the instant claim.

As per claim 17, neither Ekanadham nor Guide and Reference II appears to explicitly disclose “the system further comprises: a system bus to connect the multi-threaded processor to one or more peripheral devices; and one or more dynamic random access memory (DRAM) devices.”
However, Sankaran discloses “the system further comprises: a system bus to connect the multi-threaded processor to one or more peripheral devices; and one or more dynamic random access memory (DRAM) devices” ([0220] - [0221] interfacing with peripherals. Fig. 30 host memory bus 3011, [0408] host memory 3007 may comprise a DRAM).
Ekanadham, Guide and Reference II, and Sankaran are analogous art because they are from the same field of endeavor, which is processors executing instructions.
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Ekanadham, Guide and Reference II, and Sankaran before him or her, to modify the teachings of Ekanadham and Guide and Reference II to include the teachings of Sankaran so that the processor is connected to one or more peripheral devices and one or more dynamic random access memory (DRAM) devices.
The motivation for doing so would have been to provide for the capability to store data in an external memory, as well as to provide for the capability for external I/O for allowing user interaction.
Therefore, it would have been obvious to combine Sankaran with Ekanadham and Guide and Reference II to obtain the invention as specified in the instant claim.

Note, claim 20 recites the corresponding limitations of claim 14.  Therefore, the rejection of claim 14 applies to claim 20.

Note, claim 21 recites the corresponding limitations of claim 15.  Therefore, the rejection of claim 15 applies to claim 21.

Claims 10, 18, 19, 22, and 23 are rejected under 35 U.S.C. 103 as being unpatentable over Ekanadham in view of Guide and Reference II.

Referring to claim 10, Ekanadham discloses “A multi-threaded processor comprising: a plurality of cores” ([0039] SMT multi-threading and levels of cores) “comprising matrix multiplication logic to perform operations to accelerate a general matrix-to-matrix multiply (GEMM) implementation” ([0055] multiplying matrices, [0059] several implementations of matrix multiply) “selected from a plurality of GEMM implementations in a library accessible through a GEMM application programming interface (API) call, wherein the GEMM implementation is to transform an input matrix using one or more parameters indicated to the GEMM implementation” ([0064] In order to evaluate the applicability of an optimized implementation of a GPI graph analytics operator to a GPI graph analytics operator call, as well as its performance, each implementation of a GPI function also has metadata attached to it. [0047] 3. The Library API 603, for obtaining: [0048] a. Available optimized implementations of Graph API functions for the available functional capabilities, resource levels, and representation/attributes of the arguments: [0049] b. Type-casting functions to change the representation of the arguments of Graph API functions to match with optimized implementations: [0050] c. Analyze functions to determine the attribute values of the arguments of Graph API, when needed and not available from the metadata associated with the arguments).
Ekanadham does not appear to explicitly disclose “the one or more parameters indicate: a number of rows of an input matrix; a number of columns of the input matrix; one or more scaling factors; a leading dimension of the input matrix; a transform operation to be performed; and a result buffer.”
However, Guide and Reference II discloses “the one or more parameters indicate: a number of rows of an input matrix; a number of columns of the input matrix; one or more scaling factors; a leading dimension of the input matrix; a transform operation to be performed; and a result buffer” (‘Matrix-Vector Subprograms’ beginning on page 95, pages 96 – 98 sgemv | dgemv | cgemv | zgemv (transa, m, n, alpha, a, lda, x, incx, beta, y, incy). These GEM implementations use parameters transa (the transform), m (the number of rows), n (the number of columns), alpha (scaling constant), lda (leading dimension), beta (scaling constant), y (the result of the computation)).
Ekanadham and Guide and Reference II are analogous art because they are from the same field of endeavor, which is matrix multiplication.
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Ekanadham and Guide and Reference II before him or her, to modify the teachings of Ekanadham to include the teachings of Guide and Reference II so that the one or more parameters indicate: a number of rows of an input matrix; a number of columns of the input matrix; one or more scaling factors; a leading dimension of the input matrix; a transform operation to be performed; and a result buffer.
The motivation for doing so would have been to provide a means for relaying the necessary information to the execution units.  This information allows for different possible computations (as stated by Guide and Reference II on page 99 under ‘Function’).
Therefore, it would have been obvious to combine Guide and Reference II with Ekanadham to obtain the invention as specified in the instant claim.
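
For illustration only, the roles of the parameters cited from Guide and Reference II in the rejection of claim 10 can be sketched as follows. This is a minimal sketch, assuming column-major storage and the conventional matrix-vector semantics y := alpha*op(A)*x + beta*y. The function name gemv_sketch is hypothetical; the code is not taken from Guide and Reference II, handles only real data with positive increments, and is not relied upon in the rejection.

```python
def gemv_sketch(transa, m, n, alpha, a, lda, x, incx, beta, y, incy):
    """Illustrative y := alpha*op(A)*x + beta*y (hypothetical helper, not the cited routine).

    transa : 'N' uses A as stored, 'T' uses its transpose (the transform)
    m, n   : number of rows and columns of A
    alpha, beta : scaling factors
    a, lda : flat column-major storage of A and its leading dimension
    x, y   : input vector and result buffer, with strides incx and incy
    """
    if transa not in ("N", "T"):
        raise ValueError("this sketch handles only 'N' and 'T'")
    if lda < max(1, m) or incx < 1 or incy < 1:
        raise ValueError("assumes lda >= m and positive increments")

    # op(A) is m x n for 'N' and n x m for 'T'.
    len_y, len_x = (m, n) if transa == "N" else (n, m)

    for i in range(len_y):
        acc = 0.0
        for j in range(len_x):
            # Column-major: element (row r, column c) lives at a[r + c*lda].
            r, c = (i, j) if transa == "N" else (j, i)
            acc += a[r + c * lda] * x[j * incx]
        # The result buffer y is scaled by beta and updated in place.
        y[i * incy] = alpha * acc + beta * y[i * incy]
    return y


# A 3 x 2 matrix [[1, 2], [3, 4], [5, 6]] stored with lda = 4 (one padding row).
a = [1, 3, 5, 0, 2, 4, 6, 0]
result = gemv_sketch("N", 3, 2, 2.0, a, 4, [1, 1], 1, 3.0, [1, 1, 1], 1)
```

In this sketch the leading dimension lda lets the matrix occupy a larger allocation than its logical m rows, which is why it is passed separately from m and n.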

Referring to claim 18, claim 18 recites the corresponding limitations of claim 10.  Therefore, the rejection of claim 10 applies to claim 18.

Note, claim 19 recites the corresponding limitations of claim 10.  Therefore, the rejection of claim 10 applies to claim 19.

Note, claim 22 recites the corresponding limitations of claim 10.  Therefore, the rejection of claim 10 applies to claim 22.

Note, claim 23 recites the corresponding limitations of claim 11.  Therefore, the rejection of claim 11 applies to claim 23.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
WIPO Publication WO 2021076425 teaches user configurable settings for matrix multiplication, user using an API to specify configuration.
WIPO Publication WO 2019085655 teaches a matrix conversion method.
European Patent Applications EP 3343460 A1, EP 3343392 A1, EP 3343391 A1, EP 3343390 A1, EP 3343383 A1 teach a matrix multiplication method and APIs that read and update graph data.
Chinese Patent Application CN 113240570 A teaches using a matrix multiplication API and performing a GEMM operation.
U.S. Patent Application 20140289445 teaches an accelerator for GEMM.
U.S. Patent Applications 20180189234, 20180189239, 20180189638, 20180189675 teach a matrix multiplication method and APIs that read and update graph data.
U.S. Patent Application 20210048991 teaches an API exposing matrix multiply and accumulate to efficiently use tensor cores.
U.S. Patent Application 20190278593 teaches GEMM matrix multiplication, accepting an input program and configuration file via an API.
U.S. Patent Application 20210048991 teaches performing matrix multiplication on integer or floating point data.
U.S. Patents 9400700, 9772890, and 9778967 are granted patents to Ekanadham, with similar teachings.


Contact Information
Any inquiry concerning this communication or earlier communications from the examiner should be directed to STEVEN G SNYDER whose telephone number is (571)270-1971.  The examiner can normally be reached on M-F 8:00am-4:30pm (flexible).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.  
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Henry Tsai, can be reached on 571-272-4176.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/STEVEN G SNYDER/Primary Examiner, Art Unit 2184