DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 1, 16, 21, and 26 have been amended.
Claims 12, 13, 18, and 25 have been cancelled.
Claims 1-11, 14-17, 19-24, and 26-28 have been examined.
The § 112 rejections in the previous Office Action have been addressed and are withdrawn.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on July 30, 2021 has been entered.

Information Disclosure Statement
The Applicant's submission of the Information Disclosure Statements dated May 17, 2021, June 22, 2021, and July 30, 2021 is acknowledged by the Examiner and the cited references have been considered in the examination of the claims now pending. Copies of the PTOL-1449s initialed and dated by the Examiner are attached to the instant office action. 

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 1-11, 14-17, 19-24, and 26-28 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor, or for pre-AIA the Applicant, regards as the invention.
Claim 1 recites, at lines 25-26, “the processing resource.” There is insufficient antecedent basis for this limitation in the claim. For purposes of examination, this limitation is interpreted as, “the at least one processing resource.” Claims 16, 21, and 26 include similar language and are similarly rejected.
Claims 2-11, 14, 15, 17, 19, 20, 22-24, 27, and 28 are rejected as depending from rejected base claims and failing to cure the indefiniteness of those base claims.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-11, 16, 17, 21-23, and 26 are rejected under 35 U.S.C. 103 as being unpatentable over US Publication No. 2019/0114534 by Teng et al. (hereinafter referred to as “Teng”) in view of US Publication No. 2019/0361954 by Page et al. (hereinafter referred to as “Page”) in view of US Publication No. 2021/0012197 by Simonyan et al. (previously cited by the Examiner in the action dated February 17, 2021 and hereinafter referred to as “Simonyan”). 
Regarding claim 1, Teng discloses:
a general purpose graphics processor comprising: a compute cluster including multiple processing resources coupled with a cache memory, at least one processing resource including a matrix accelerator, the matrix accelerator configured to perform a dot product operation on multiple elements of a…first matrix and a second matrix in response to a sparse dot product instruction…(Teng discloses, at ¶ [0068], a processing system that includes a cluster of processing circuitry including a GPU, which discloses that the system is a general purpose graphics processor, including neural network accelerators, which discloses matrix accelerators. Teng also discloses, at ¶ [0044], a host and accelerator coupled via a RAM, which is used as a cache memory. Teng discloses, at ¶ [0023], performing inner product operations, which are dot product operations and are understood to be performed in response to a dot product instruction and which involve multiple elements of a first and second matrix.); 
wherein the…[first matrix] is to be stored to the cache memory… (Teng discloses, at ¶ [0051], storing weight matrices to the RAM.); and 
wherein in response to the sparse dot product instruction, the at least one processing resource is configured to: load the…[first matrix] from the cache memory into a memory within the at least one processing resource (Teng discloses, at ¶ [0043], the accelerator loading the weights from the RAM to a cache, which is memory within the accelerator. As noted above, performing dot product operations, including loading input and storing output, are performed in response to dot product instructions.); 
load the second matrix from the cache memory into the memory within the at least one processing resource (Teng discloses, at ¶ [0052], reading input data (i.e., activation matrices) from the RAM, and, at ¶ [0043], storing them in FIFOs, which is memory within the accelerator.); 
perform the dot product operation on elements from the…[first matrix] and selected elements of the second matrix, wherein the selected elements of the second matrix correspond with …values of the…first matrix stored within the…[first matrix]…and are selected by the processing resource (Teng discloses, at ¶ [0043], performing matrix multiplication operations, which discloses selecting the elements and performing dot product operations.); and 
write output of the dot product operation to the memory within the at least one processing resource (Teng discloses, at ¶ [0042], an output of the operations is sent to FIFOs, which is memory within the accelerator.).
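Teng does not explicitly disclose the aforementioned first matrix is sparse, the sparse first matrix has a structured sparsity in which elements of the sparse first matrix have been pruned to a predetermined pattern, elements of the sparse first matrix are compacted, based on the structured sparsity, into a compressed representation including a set of elements, the set of elements including at least one non-zero value element and an indication of the at least one non-zero value element, the sparse dot product instruction is to cause the matrix accelerator to skip computations associated with input including a zero value element, storing, loading, and operating on the non-zero elements of the compressed representation based on the indication of the at least one non-zero value.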
However, in the same field of endeavor (e.g., matrix multiplication) Page discloses: 
the aforementioned first matrix is sparse and elements of the sparse first matrix are compacted into a compressed representation including a set of elements, the set of elements including at least one non-zero value element and an indication of the at least one non-zero value element, skipping computations associated with input including a zero value element, storing, loading, and operating on the non-zero elements of the compressed representation based on the indication of the at least one non-zero value (Page discloses, at ¶ [0030], storing non-zero elements and additional information to indicate their locations. Page also discloses, at ¶ [0037], loading and operating on the non-zero elements, which discloses skipping zero value elements.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify Teng’s matrix operations to utilize a compressed format for sparse matrices, as disclosed by Page, in order to improve performance by eliminating unproductive operations, e.g., multiplying by zero. See Page, ¶ [0004].
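For illustration only, the compressed-format concept relied upon above (non-zero elements stored together with an indication of their locations, so that multiplications by zero are never issued) can be sketched as follows. This is a minimal sketch and is not taken from Teng or Page; the function names compress and sparse_dot and the sample data are hypothetical.

```python
# Illustrative sketch only; names and data are assumptions, not from the cited references.

def compress(row):
    """Return the non-zero values of a row and the positions they came from."""
    values = [v for v in row if v != 0]
    indices = [i for i, v in enumerate(row) if v != 0]
    return values, indices


def sparse_dot(values, indices, dense):
    """Dot product that only touches dense elements paired with a stored non-zero value."""
    return sum(v * dense[i] for v, i in zip(values, indices))


weights = [0, 3, 0, 0, 2, 0, 0, 1]      # sparse first matrix (one row)
activations = [5, 1, 7, 2, 4, 9, 6, 8]  # second matrix (one column)

values, indices = compress(weights)
result = sparse_dot(values, indices, activations)

# Reference result computed without skipping anything.
dense_result = sum(w * a for w, a in zip(weights, activations))
```

In this sketch only three multiplications are performed instead of eight, and the result matches the uncompressed computation.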
Also in the same field of endeavor (e.g., matrix multiplication) Simonyan discloses: 
the sparse first matrix has a structured sparsity in which elements of the sparse first matrix have been pruned to a predetermined pattern (Simonyan discloses, at ¶ [0042], clearing values of blocks according to a predefined sparsity pattern, which discloses structured sparsity and pruning to a predetermined pattern.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify Teng’s matrix operations to utilize a structured sparsity, as disclosed by Simonyan, in order to improve performance by reducing data storage requirements. See Simonyan, ¶ [0009].
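For illustration only, pruning to a predetermined pattern can be sketched as below. The keep-mask PATTERN (positions 0 and 2 of every group of four) is a hypothetical choice made for this sketch and is not the pattern of Simonyan or of the claims; the function names are likewise assumptions.

```python
# Illustrative sketch only; the mask and names are assumptions, not from the cited references.

PATTERN = (True, False, True, False)  # hypothetical predetermined keep-mask per group of four


def prune_to_pattern(row, pattern=PATTERN):
    """Zero every element whose position in its group is not kept by the pattern."""
    return [v if pattern[i % len(pattern)] else 0 for i, v in enumerate(row)]


def compact(row, pattern=PATTERN):
    """Store only the kept positions; the fixed pattern tells a reader where they belong."""
    return [v for i, v in enumerate(row) if pattern[i % len(pattern)]]


row = [4, 9, 1, 7, 3, 8, 6, 2]
pruned = prune_to_pattern(row)
packed = compact(pruned)
```

Because the pattern is fixed in advance, the packed form in this sketch holds half as many elements as the original row.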

Regarding claim 2, Teng, as modified, discloses the elements of claim 1, as discussed above. Teng also discloses:
(Teng discloses, at Figure 4, that the RAM is level 2 cache memory, as the RAM is farther from the processing elements than another memory.).

Regarding claim 3, Teng, as modified, discloses the elements of claim 1, as discussed above. Teng also discloses:
wherein the memory within the at least one processing resource includes a level one (L1) cache memory (Teng discloses, at Figure 3, that the memory within the accelerator is L1 cache, as the memory is closer to the processing elements than the RAM.).

Regarding claim 4, Teng, as modified, discloses the elements of claim 1, as discussed above. Teng also discloses:
wherein the memory…includes a shared memory (Teng discloses, at ¶ [0045], a shared memory.).
Teng does not explicitly disclose the aforementioned shared memory is within the at least one processing resource.
However, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify Teng such that the shared memory is within the accelerator. Doing so merely entails a combination of prior art elements (cited above) according to known methods to yield predictable results, which is an exemplary rationale to support a conclusion of obviousness, as per MPEP § 2143.

Regarding claim 5, Teng, as modified, discloses the elements of claim 1, as discussed above. Teng also discloses:
wherein the memory within the at least one processing resource includes a register file (Teng discloses, at ¶ [0044], a set of registers, which is a register file.).

Regarding claim 6, Teng, as modified, discloses the elements of claim 1, as discussed above. Teng also discloses:
wherein the memory within the at least one processing resource includes a memory within the matrix accelerator (Teng discloses, at ¶ [0034], the accelerator includes memory.).

Regarding claim 7, Teng, as modified, discloses the elements of claim 1, as discussed above. Teng also discloses:
wherein the sparse first matrix includes weight data associated with a neural network (Teng discloses, at ¶ [0043], the first matrix is a weight matrix.).

Regarding claim 8, Teng, as modified, discloses the elements of claim 7, as discussed above. Teng also discloses:
wherein the second matrix includes input activation data associated with the neural network (Teng discloses, at ¶ [0043], the second matrix includes activation data.).

Regarding claim 9, Teng, as modified, discloses the elements of claim 8, as discussed above. Teng also discloses:
wherein the output of the dot product operation includes output activation data associated with the neural network (Teng discloses, at ¶ [0003], the output is activation data.).

Regarding claim 10, Teng, as modified, discloses the elements of claim 9, as discussed above. Teng also discloses:
wherein the output of the dot product operation is a dense matrix (Teng discloses, at ¶ [0042], generating output data from matrix multiplication. The input matrices are interpreted as dense matrices, as there is no disclosure related to sparse matrices. The output of a dot product on dense input matrices is also a dense matrix.).

Regarding claim 11, Teng, as modified, discloses the elements of claim 1, as discussed above. Teng also discloses:
wherein the matrix accelerator includes a systolic array of processing elements (Teng discloses, at ¶ [0054], using a systolic array.).

Regarding claim 16, Teng discloses:
a method comprising: on a general-purpose graphics processor: performing a dot product operation on multiple elements of a…first matrix and a second matrix in response to a sparse dot product instruction, the dot product operation performed via a compute cluster including multiple processing resources coupled with a cache memory, at least one processing resource including a matrix accelerator…(Teng discloses, at ¶ [0068], a processing system that includes a cluster of processing circuitry including a GPU, which discloses that the system is a general purpose graphics processor, including neural network accelerators, which discloses matrix accelerators. Teng also discloses, at ¶ [0044], a host and accelerator coupled via a RAM, which is used as a cache memory. Teng discloses, at ¶ [0023], performing inner product operations, which are dot product operations and are understood to be performed in response to a dot product instruction and which involve multiple elements of a first and second matrix.);
storing the… [first matrix] to the cache memory…(Teng discloses, at ¶ [0051], storing weight matrices to the RAM.); and 
via the at least one processing resource and in response to the sparse dot product instruction: loading the…[first matrix] from the cache memory into a memory within the at least one processing resource (Teng discloses, at ¶ [0043], the accelerator loading the weights from the RAM to a cache, which is memory within the accelerator. As noted above, performing dot product operations, including loading input and storing output, are performed in response to dot product instructions.);  
loading the second matrix from the cache memory into the memory within the at least one processing resource (Teng discloses, at ¶ [0052], reading input data (i.e., activation matrices) from the RAM, and, at ¶ [0043], storing them in FIFOs, which is memory within the accelerator.);  
 (Teng discloses, at ¶ [0043], performing matrix multiplication operations, which discloses selecting the elements and performing dot product operations.); and 
writing output of the dot product operation to the memory within the at least one processing resource (Teng discloses, at ¶ [0042], an output of the operations is sent to FIFOs, which is memory within the accelerator.).
Teng does not explicitly disclose the aforementioned first matrix is sparse, the sparse first matrix has a structured sparsity in which elements of the sparse first matrix have been pruned to a predetermined pattern, elements of the sparse first matrix are compacted into a compressed representation including a set of elements, the set of elements including at least one non-zero value element and an indication of the at least one non-zero value element, the sparse dot product instruction is to cause the matrix accelerator to skip computations associated with input including a zero value element, storing, loading, and operating on the non-zero elements of the compressed representation based on the indication of the at least one non-zero value.
However, in the same field of endeavor (e.g., matrix multiplication) Page discloses: 
the aforementioned first matrix is sparse and elements of the sparse first matrix are compacted into a compressed representation including a set of elements, the set of elements including at least one non-zero value element and an indication of the at least one non-zero value element, skipping computations associated with input including a zero value element, storing, loading, and operating on the non-zero elements of the compressed representation based on the indication of the at least one non-zero value (Page discloses, at ¶ [0030], storing non-zero elements and additional information to indicate their locations. Page also discloses, at ¶ [0037], loading and operating on the non-zero elements, which discloses skipping zero value elements.).
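It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify Teng’s matrix operations to utilize a compressed format for sparse matrices, as disclosed by Page, in order to improve performance by eliminating unproductive operations, e.g., multiplying by zero. See Page, ¶ [0004].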
Also in the same field of endeavor (e.g., matrix multiplication) Simonyan discloses: 
the sparse first matrix has a structured sparsity in which elements of the sparse first matrix have been pruned to a predetermined pattern (Simonyan discloses, at ¶ [0042], clearing values of blocks according to a predefined sparsity pattern, which discloses structured sparsity and pruning to a predetermined pattern.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify Teng’s matrix operations to utilize a structured sparsity, as disclosed by Simonyan, in order to improve performance by reducing data storage requirements. See Simonyan, ¶ [0009].

Regarding claim 17, Teng, as modified, discloses the elements of claim 16, as discussed above. Teng does not explicitly disclose wherein the method further comprises compacting the elements of the sparse first matrix into the compressed representation within a memory of the at least one processing resource.
However, in the same field of endeavor (e.g., matrix multiplication) Page discloses:
compacting the elements of the sparse first matrix into the compressed representation within a memory of the at least one processing resource (Page discloses, at ¶ [0030], the compressed representation is stored in memory.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify Teng’s matrix operations to utilize a compressed format for sparse matrices, as disclosed by Page, in order to improve performance by eliminating unproductive operations, e.g., multiplying by zero. See Page, ¶ [0004].

Regarding claim 21, Teng discloses:
a data processing system comprising: a memory device; and a general purpose graphics processor comprising a compute cluster including multiple processing resources coupled with a cache (Teng discloses, at ¶ [0068], a processing system that includes a cluster of processing circuitry including a GPU, which discloses that the system is a general purpose graphics processor, including neural network accelerators, which discloses matrix accelerators. Teng discloses, at ¶ [0054], using a systolic array. Teng also discloses, at ¶ [0044], a host and accelerator coupled via a RAM, which is used as a cache memory. Teng discloses, at ¶ [0023], performing inner product operations, which are dot product operations and are understood to be performed in response to a dot product instruction and which involve multiple elements of a first and second matrix.); 
wherein the … [first matrix] is to be stored to the cache memory…(Teng discloses, at ¶ [0051], storing weight matrices to the RAM.); and 
wherein in response to the sparse dot product instruction, the at least one processing resource is configured to: load the … [first matrix] from the cache memory into a memory within the at least one processing resource (Teng discloses, at ¶ [0043], the accelerator loading the weights from the RAM to a cache, which is memory within the accelerator. As noted above, performing dot product operations, including loading input and storing output, are performed in response to dot product instructions.);   
load the second matrix from the cache memory into the memory within the at least one processing resource (Teng discloses, at ¶ [0052], reading input data (i.e., activation matrices) from the RAM, and, at ¶ [0043], storing them in FIFOs, which is memory within the accelerator.);   
perform the dot product operation on elements from the … [first matrix] and selected elements of the second matrix, wherein the selected elements of the second matrix correspond with…values of the…first matrix stored within the … [first matrix]… (Teng discloses, at ¶ [0043], performing matrix multiplication operations, which discloses selecting the elements and performing dot product operations.);  and 
write output of the dot product operation to the memory within the at least one processing resource (Teng discloses, at ¶ [0042], an output of the operations is sent to FIFOs, which is memory within the accelerator.).

However, in the same field of endeavor (e.g., matrix multiplication) Page discloses: 
the aforementioned first matrix is sparse and elements of the sparse first matrix are compacted into a compressed representation including a set of elements, the set of elements including at least one non-zero value element and an indication of the at least one non-zero value element, skipping computations associated with input including a zero value element, storing, loading, and operating on the non-zero elements of the compressed representation based on the indication of the at least one non-zero value (Page discloses, at ¶ [0030], storing non-zero elements and additional information to indicate their locations. Page also discloses, at ¶ [0037], loading and operating on the non-zero elements, which discloses skipping zero value elements.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify Teng’s matrix operations to utilize a compressed format for sparse matrices, as disclosed by Page, in order to improve performance by eliminating unproductive operations, e.g., multiplying by zero. See Page, ¶ [0004].
Also in the same field of endeavor (e.g., matrix multiplication) Simonyan discloses: 
the sparse first matrix has a structured sparsity in which elements of the sparse first matrix have been pruned to a predetermined pattern (Simonyan discloses, at ¶ [0042], clearing values of blocks according to a predefined sparsity pattern, which discloses structured sparsity and pruning to a predetermined pattern.).
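It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify Teng’s matrix operations to utilize a structured sparsity, as disclosed by Simonyan, in order to improve performance by reducing data storage requirements. See Simonyan, ¶ [0009].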

Regarding claim 22, Teng, as modified, discloses the elements of claim 21, as discussed above. Teng also discloses:
wherein the cache memory is a level two (L2) cache memory or a level one (L1) cache memory and the memory within the at least one processing resource includes a shared memory, a register file, or a memory within the matrix accelerator (Teng discloses, at Figure 4, that the RAM is level 2 cache memory, as the RAM is farther from the processing elements than another memory. Teng also discloses, at ¶ [0044], a set of registers, which is a register file.).

Regarding claim 23, Teng, as modified, discloses the elements of claim 21, as discussed above. Teng also discloses:
wherein the sparse first matrix includes weight data associated with a neural network and the second matrix includes input activation data associated with the neural network (Teng discloses, at ¶ [0043], the first matrix is a weight matrix. Teng discloses, at ¶ [0043], the second matrix includes activation data.).

Regarding claim 26, Teng discloses:
a general purpose graphics processor comprising: a compute cluster including multiple processing resources coupled with a cache memory, at least one processing resource including a matrix accelerator, wherein the matrix accelerator includes a systolic array of processing elements and the matrix accelerator configured to: (Teng discloses, at ¶ [0068], a processing system that includes a cluster of processing circuitry including a GPU, including neural network accelerators, which discloses matrix accelerators. Teng discloses, at ¶ [0054], using a systolic array. Teng also discloses, at ¶ [0044], a host and accelerator coupled via a RAM, which is used as a cache memory.);
perform a dot product operation on multiple elements of a…first matrix and a second matrix in response to a sparse dot product instruction…, wherein the …dot product operation is performed on (Teng discloses, at ¶ [0023], performing inner product operations, which are dot product operations and are understood to be performed in response to a dot product instruction and which involve selecting and operating on multiple elements of a first and second matrix.); and 
write output of the dot product operation to the memory within the at least one processing resource (Teng discloses, at ¶ [0042], an output of the operations is sent to FIFOs, which is memory within the accelerator.).
Teng does not explicitly disclose the aforementioned first matrix is sparse, the sparse first matrix has a structured sparsity in which elements of the sparse first matrix have been pruned to a predetermined pattern, the sparse dot product instruction is to cause the matrix accelerator to skip computations associated with input including a zero value element, elements of the sparse first matrix are compacted into a compressed representation including a set of elements, the set of elements including at least one non-zero value element and an indication of the at least one non-zero value element, storing, loading, and operating on the non-zero elements of the compressed representation based on the indication of the at least one non-zero value.
However, in the same field of endeavor (e.g., matrix multiplication) Page discloses: 
the aforementioned first matrix is sparse, skipping computations associated with input including a zero value element, elements of the sparse first matrix are compacted into a compressed representation including a set of elements, the set of elements including at least one non-zero value element and an indication of the at least one non-zero value element, storing, loading, and operating on the non-zero elements of the compressed representation based on the indication of the at least one non-zero value (Page discloses, at ¶ [0030], storing non-zero elements and additional information to indicate their locations. Page also discloses, at ¶ [0037], loading and operating on the non-zero elements, which discloses skipping zero value elements.).
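It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify Teng’s matrix operations to utilize a compressed format for sparse matrices, as disclosed by Page, in order to improve performance by eliminating unproductive operations, e.g., multiplying by zero. See Page, ¶ [0004].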
Also in the same field of endeavor (e.g., matrix multiplication) Simonyan discloses: 
the sparse first matrix has a structured sparsity in which elements of the sparse first matrix have been pruned to a predetermined pattern (Simonyan discloses, at ¶ [0042], clearing values of blocks according to a predefined sparsity pattern, which discloses structured sparsity and pruning to a predetermined pattern.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify Teng’s matrix operations to utilize a structured sparsity, as disclosed by Simonyan, in order to improve performance by reducing data storage requirements. See Simonyan, ¶ [0009].

Claims 14, 15, 19, 20, 24, 27, and 28 are rejected under 35 U.S.C. 103 as being unpatentable over Teng in view of Page in view of Simonyan in view of US Publication No. 2020/0061811 by Iqbal et al. (hereinafter referred to as “Iqbal”). 
Regarding claim 14, Teng, as modified, discloses the elements of claim 1, as discussed above. Teng does not explicitly disclose wherein the dot product operation is an 8-bit integer dot product operation.
However, in the same field of endeavor (e.g., graphics processors) Iqbal discloses:
wherein the dot product operation is an 8-bit integer dot product operation (Iqbal discloses 8-bit integer matrix operations, which includes dot product operations.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify Teng’s matrix operation to utilize 8-bit operations, as disclosed by Iqbal because doing so merely entails a combination of prior art elements (cited above) according to known methods to yield predictable results, which is an exemplary rationale to support a conclusion of obviousness, as per MPEP § 2143.
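For illustration only, an 8-bit integer dot product can be sketched as below. This is a minimal sketch and is not taken from Iqbal or Teng; the function name int8_dot, the signed range, and the use of an unbounded Python integer as the accumulator are assumptions made for this example.

```python
# Illustrative sketch only; names and the signed 8-bit range are assumptions.

INT8_MIN, INT8_MAX = -128, 127


def int8_dot(a, b):
    """Dot product of two equal-length vectors whose elements must fit in signed 8 bits."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    for x in (*a, *b):
        if not INT8_MIN <= x <= INT8_MAX:
            raise ValueError(f"{x} does not fit in a signed 8-bit integer")
    # Products of 8-bit values need a wider accumulator; a Python int is used here.
    return sum(x * y for x, y in zip(a, b))


total = int8_dot([127, -128, 5], [2, 3, -4])
```

The inputs are confined to 8 bits while the accumulated result (here -150) is allowed to exceed that range.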

Regarding claim 15, Teng, as modified, discloses the elements of claim 14, as discussed above. Teng does not explicitly disclose wherein the sparse first matrix includes 8-bit integer elements.
However, in the same field of endeavor (e.g., graphics processors) Iqbal discloses:
wherein the sparse first matrix includes 8-bit integer elements (Iqbal discloses 8-bit integer matrix operations, which discloses 8-bit integer elements.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify Teng’s matrix operation to utilize 8-bit elements, as disclosed by Iqbal because doing so merely entails a combination of prior art elements (cited above) according to known methods to yield predictable results, which is an exemplary rationale to support a conclusion of obviousness, as per MPEP § 2143.

Regarding claim 19, Teng, as modified, discloses the elements of claim 16, as discussed above. Teng does not explicitly disclose wherein the dot product operation is an 8-bit integer dot product operation.
However, in the same field of endeavor (e.g., graphics processors) Iqbal discloses:
wherein the dot product operation is an 8-bit integer dot product operation (Iqbal discloses 8-bit integer matrix operations, which includes dot product operations.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify Teng’s matrix operation to utilize 8-bit operations, as disclosed by Iqbal because doing so merely entails a combination of prior art elements (cited above) according to known methods to yield predictable results, which is an exemplary rationale to support a conclusion of obviousness, as per MPEP § 2143.

Regarding claim 20, Teng, as modified, discloses the elements of claim 19, as discussed above. Teng does not explicitly disclose wherein the sparse first matrix includes 8-bit integer elements.
However, in the same field of endeavor (e.g., graphics processors) Iqbal discloses:
wherein the sparse first matrix includes 8-bit integer elements (Iqbal discloses 8-bit integer matrix operations, which discloses 8-bit integer elements.).


Regarding claim 24, Teng, as modified, discloses the elements of claim 21, as discussed above.
Teng does not explicitly disclose wherein the dot product operation is an 8-bit integer dot product operation and the sparse first matrix includes 8-bit integer elements.
However, in the same field of endeavor (e.g., graphics processors) Iqbal discloses:
wherein the dot product operation is an 8-bit integer dot product operation and the sparse first matrix includes 8-bit integer elements (Iqbal discloses 8-bit integer matrix operations, which includes dot product operations. Iqbal discloses 8-bit integer matrix operations, which discloses 8-bit integer elements.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify Teng’s matrix operation to utilize 8-bit operations and elements, as disclosed by Iqbal because doing so merely entails a combination of prior art elements (cited above) according to known methods to yield predictable results, which is an exemplary rationale to support a conclusion of obviousness, as per MPEP § 2143.

Regarding claim 27, Teng, as modified, discloses the elements of claim 26, as discussed above. Teng does not explicitly disclose wherein the dot product operation is an 8-bit integer dot product operation.
However, in the same field of endeavor (e.g., graphics processors) Iqbal discloses:
wherein the dot product operation is an 8-bit integer dot product operation (Iqbal discloses 8-bit integer matrix operations, which include dot product operations.).


Regarding claim 28, Teng, as modified, discloses the elements of claim 27, as discussed above. Teng does not explicitly disclose wherein the sparse first matrix includes 8-bit integer elements.
However, in the same field of endeavor (e.g., graphics processors) Iqbal discloses:
wherein the sparse first matrix includes 8-bit integer elements (Iqbal discloses 8-bit integer matrix operations, which discloses 8-bit integer elements.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify Teng’s matrix operation to utilize 8-bit elements, as disclosed by Iqbal, because doing so merely entails a combination of prior art elements (cited above) according to known methods to yield predictable results, which is an exemplary rationale to support a conclusion of obviousness, as per MPEP § 2143.

Response to Arguments
On pages 10-11 of the response filed July 30, 2021 (“response”), the Applicant argues, “Page specifically and explicitly mentions "a programmable logic circuit, such as a FPGA," but does not mention "a general purpose graphics processor" as claimed in each of the pending claims. As Primary reference Teng, in [0068] discloses "a CPU and field programmable gate array (FPGA) circuitry 926 and graphics processing unit (GPU) 928 operating as neural network accelerators." Thus, while one skilled in the art may look to Page as to how to modify the FPGA of Teng, no motivation would exist to instead apply those techniques to the GPU of Teng. As Teng explicitly includes an FPGA and a GPU, and Page is directed towards an FPGA-based hardware implementation, no teaching or suggestion is provided by the combination of Teng and Page that would render the pending claims obvious. Even if such motivation 
Though fully considered, the Examiner respectfully disagrees. While Page discloses an FPGA implementation, the particular choice of hardware is only one example. Page explicitly recites, e.g., at paragraph 51, "Whilst an FPGA implementation is currently seen as beneficial, other hardware and/or software embodiments can be considered." Therefore, an attempt to characterize Page as being directed to techniques suitable exclusively for FPGAs is unpersuasive. Page is cited as teaching a particular mechanism for processing sparse matrices, and is not cited for or limited to any particular hardware implementation. Similarly, Teng discloses a couple of examples of types of accelerators, e.g., FPGAs and GPUs, and discloses that certain data may be more effectively processed on one or the other type. See, e.g., paragraph 68 et seq. However, Teng also explicitly discloses, e.g., at paragraph 51, that "the disclosed approaches are not limited to any specific hardware platforms." Therefore, one would have been motivated to modify the GPU disclosed by Teng to utilize the teachings of Page directed to sparse matrices because the advantages provided by Page's teachings would be realized by incorporation into the system of Teng. Accordingly, the Applicant's arguments are deemed unpersuasive.

On page 11 of the response the Applicant argues, “Teng, even in combination with Page, does not explicitly teach "a sparse dot product instruction," even if inner product operations are described, as no specific instruction is disclosed to perform a sparse dot product operation. Instead, Teng, Par. [0048] discloses "a group of per-layer instructions" that "specifies processing of a respective layer of the neural network" and Page does not disclose any specific instruction to perform the described techniques.”
Though fully considered, the Examiner respectfully disagrees. As noted, Teng discloses dot product instructions, but is silent regarding sparse data. However, Page discloses sparse data. The instructions of Teng, when utilized to process sparse data, as disclosed by Page, disclose a dot product instruction that processes sparse data, i.e., a sparse dot product instruction. Accordingly, the Applicant's arguments are deemed unpersuasive.

On pages 11-12 of the response the Applicant argues, “the combination of Teng and Page fails to teach or suggest "wherein the selected elements of the second matrix correspond with non-zero values of the sparse first matrix stored within the compressed representation and are selected by the processing resource based on the indication of the at least one non-zero value." Pages 4-5 of the Office Action cite to Page as teaching "operating on the non-zero elements of the compressed representation based on the indication of the at least one non-zero value." However, Page does not explicitly state, "the selected elements of the second matrix... are selected by the processing resource based on the indication of the at least one non-zero value." Page, par. [0035] merely describes an "associating step" in which "each of a plurality of the non-zero values of a row of the matrix is associated with a respective memory block that stores an element of the vector having an index corresponding with a respective index of the non-zero value," without describing any specific hardware structure that performs such step.”
Though fully considered, the Examiner respectfully disagrees. Teng discloses processing resources that perform dot product operations. Performing the operations includes selecting the values which will be operated on. Teng does not disclose that the selecting is based on indications of non-zero elements. However, Page discloses such selecting. When coupled with the hardware for which Teng is cited, the combination teaches all elements of the limitation in question.

On pages 12-13 of the response the Applicant argues, “As to the rejection of previous claims 12, 18, and 25, which are now included in independent claims 1, 16, 21, and 26, page 21 of the Office Action states, "Under a broadest reasonable interpretation (BRI), words of the claim must be given their plain meaning, unless such meaning is inconsistent with the specification" (emphasis added). Applicant respectfully submits that the BRI applied in the Office Action is inconsistent with the specification (see, e.g., par. [0398]-[0401] of the specification regarding "comparisons between unstructured sparsity and block sparsity within training data for a neural network."). For this reason, applicant respectfully requests the withdrawal of the finality of the present Office Action. However, to clarify the claim limitations, the claims are further amended to state, "wherein the sparse first matrix has a structured sparsity in which elements of the sparse first matrix have been pruned to a predetermined pattern," and "elements of the sparse first matrix are compacted, based on the structured sparsity, into a compressed representation 
These remarks have been fully considered and, in light of the claim amendments presented in the response, are deemed persuasive, in part. Please see above for new grounds of rejection of the amended claims. Specifically, Simonyan, as cited above, explicitly discloses using structured sparsity, e.g., block sparsity, and the well-known advantages of doing so. See, e.g., Simonyan ¶¶ [0007]-[0009] and [0042]. Simonyan also discloses that the structured sparsity is achieved by clearing, i.e., pruning, certain values in a matrix. Id. Therefore, the combination of references discloses all elements of the amended claims.
The Examiner notes, regarding the maxpool function disclosed by Teng, which was previously cited as disclosing generating structured sparsity, that the function operates by clearing or pruning all elements in a particular predefined block, except for the maximum value. See, e.g., the Krueger and Popescu references cited as pertinent below. Thus, the features related to structured sparsity are arguably disclosed by the previous rejection. However, these features are not explicitly spelled out by Teng. Therefore, in order to expedite prosecution, the Examiner has cited Simonyan, which makes the disclosure explicit.

Conclusion
The following prior art made of record and not relied upon is considered pertinent to Applicant’s disclosure.
US 20200050830 by Krueger discloses maxpool and relu.
US 20170344822 by Popescu discloses that maxpool is pruning.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHAWN DOMAN whose telephone number is (571)270-5677.  The examiner can normally be reached on Monday through Friday 8:30am-6pm Eastern Time.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/SHAWN DOMAN/Primary Examiner, Art Unit 2183