DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 1, 6, 16, 21, 22, and 26 have been amended.
Claims 29-32 have been added.
Claims 1-11, 14-17, 19-24, and 26-32 have been examined.
The § 112 rejections in the previous Office Action have been addressed and are withdrawn.

Information Disclosure Statement
The applicant's submission of the Information Disclosure Statement dated January 18, 2022 is acknowledged by the examiner and the cited references have been considered in the examination of the claims now pending. A copy of the PTOL-1449 initialed and dated by the examiner is attached to the instant office action.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 1-11, 14-17, 19-24, and 26-32 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor, or for pre-AIA the applicant, regards as the invention.
Claim 1 recites, at lines 25-26, “a plurality of elements of the second matrix stored in a memory within the matrix accelerator.” The number of instances of the plurality of elements of the second matrix required by the claim is indefinite. The second matrix, which 
Claims 2-11, 14, 15, 17, 19, 20, 22-24, and 27-32 are rejected as depending from rejected base claims and failing to cure the indefiniteness of those base claims.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-11, 16, 17, 21-23, 26, and 29-32 are rejected under 35 U.S.C. 103 as being unpatentable over US Publication No. 2019/0114534 by Teng et al. (hereinafter referred to as “Teng”) in view of US Publication No. 2019/0361954 by Page et al. (hereinafter referred to as “Page”) in view of US Publication No. 2021/0012197 by Simonyan et al. (previously cited by the Examiner in the action dated February 17, 2021 and hereinafter referred to as “Simonyan”). 
Regarding claim 1, Teng discloses:
a general purpose graphics processor comprising: a compute cluster including multiple processing resources coupled with a cache memory, at least one processing resource including a matrix accelerator, the matrix accelerator configured to perform a dot product operation on multiple elements of a…first matrix and a second matrix in response to a sparse dot product instruction…(Teng discloses, at ¶ [0068], a processing system that includes a cluster of processing circuitry including a GPU, which discloses that the system is a general purpose graphics processor, including neural network accelerators, which discloses matrix accelerators. Teng also discloses, at ¶ [0044], a host and accelerator coupled via a RAM, which is used as a cache memory. Teng discloses, at ¶ [0023], performing inner product operations, which are dot product operations and are understood to be performed in response to a dot product instruction and which involve multiple elements of a first and second matrix.); 
wherein the…[first matrix] is to be stored to the cache memory… (Teng discloses, at ¶ [0051], storing weight matrices to the RAM.); and 
wherein in response to the sparse dot product instruction, the at least one processing resource is configured to: load the…[first matrix] from the cache memory into a memory within the at least one processing resource (Teng discloses, at ¶ [0043], the accelerator loading the weights from the RAM to a cache, which is memory within the accelerator. As noted above, performing dot product operations, including loading input and storing output, are performed in response to dot product instructions.); 
load the second matrix from the cache memory into the memory within the at least one processing resource (Teng discloses, at ¶ [0052], reading input data (i.e., activation matrices) from the RAM, and, at ¶ [0043], storing them in FIFOs, which is memory within the accelerator.); 
perform the dot product operation via the matrix accelerator on elements from the…[first matrix] and selected elements of the second matrix, wherein the selected elements of the second matrix correspond with…values of the…first matrix stored within the…[first matrix]…and are selected by the matrix accelerator from a plurality of elements of the second matrix stored in a memory within the matrix accelerator (Teng discloses, at ¶ [0043], performing matrix multiplication operations, which discloses selecting the elements and performing dot product operations. As discussed above, the elements of the second matrix are stored in memory within the matrix accelerator.); and 
write output of the dot product operation to the memory within the at least one processing resource (Teng discloses, at ¶ [0042], an output of the operations is sent to FIFOs, which is memory within the accelerator.).
Teng does not explicitly disclose the aforementioned first matrix is sparse, the sparse first matrix has a structured sparsity in which elements of the sparse first matrix have been pruned to a predetermined pattern, elements of the sparse first matrix are compacted, based on the structured sparsity, into a compressed representation including a set of elements, the set of elements including at least one non-zero value element and an indication of the at least one non-zero value element, the sparse dot product instruction is to cause the matrix accelerator to skip computations associated with input including a zero value element, storing, loading, and operating on the non-zero elements of the compressed representation based on the indication of the at least one non-zero value.
However, in the same field of endeavor (e.g., matrix multiplication) Page discloses: 
the aforementioned first matrix is sparse and elements of the sparse first matrix are compacted into a compressed representation including a set of elements, the set of elements including at least one non-zero value element and an indication of the at least one non-zero value element, skipping computations associated with input including a zero value element, storing, loading, and operating on the non-zero elements of the compressed representation based on the indication of the at least one non-zero value (Page discloses, at ¶ [0030], storing non-zero elements and additional information to indicate their locations. Page also discloses, at ¶ [0037], loading and operating on the non-zero elements, which discloses skipping zero value elements.).
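It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify Teng’s matrix operations to utilize a compressed format for sparse matrices, as disclosed by Page, in order to improve performance by eliminating unproductive operations, e.g., multiplying by zero. See Page, ¶ [0004].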
Also in the same field of endeavor (e.g., matrix multiplication) Simonyan discloses: 
the sparse first matrix has a structured sparsity in which elements of the sparse first matrix have been pruned to a predetermined pattern (Simonyan discloses, at ¶ [0042], clearing values of blocks according to a predefined sparsity pattern, which discloses structured sparsity and pruning to a predetermined pattern.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify Teng’s matrix operations to utilize a structured sparsity, as disclosed by Simonyan, in order to improve performance by reducing data storage requirements. See Simonyan, ¶ [0009].
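For illustration only, the limitations mapped above to Page (a compressed representation holding the non-zero values together with an indication of their positions, and a dot product that skips zero-valued inputs by selecting only the corresponding elements of the second matrix) can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Teng, Page, Simonyan, or the application; the function names `compact_row` and `sparse_dot` and the index-list encoding are hypothetical.

```python
def compact_row(row):
    """Compact a row into (values, indices), keeping only non-zero elements.

    The index list is an assumed form of the "indication" of each
    non-zero element; other encodings are possible.
    """
    values = [v for v in row if v != 0]
    indices = [i for i, v in enumerate(row) if v != 0]
    return values, indices


def sparse_dot(values, indices, column):
    """Dot product that touches only the elements of `column` selected by `indices`.

    Multiplications involving zero-valued elements of the original row
    are never performed.
    """
    return sum(v * column[i] for v, i in zip(values, indices))


weights = [0, 3, 0, 2]        # one row of the sparse first matrix
activations = [5, 7, 11, 13]  # one column of the second matrix

values, indices = compact_row(weights)
result = sparse_dot(values, indices, activations)
```

Here only two multiplications are performed (3 × 7 and 2 × 13), and the result matches the dense dot product of the uncompressed row with the same column.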

Regarding claim 2, Teng, as modified, discloses the elements of claim 1, as discussed above. Teng also discloses:
wherein the cache memory is a level two (L2) cache memory (Teng discloses, at Figure 4, that the RAM is level 2 cache memory, as the RAM is farther from the processing elements than another memory.).

Regarding claim 3, Teng, as modified, discloses the elements of claim 1, as discussed above. Teng also discloses:
wherein the memory within the at least one processing resource includes a level one (L1) cache memory (Teng discloses, at Figure 3, that the memory within the accelerator is L1 cache, as the memory is closer to the processing elements than the RAM.).

Regarding claim 4, Teng, as modified, discloses the elements of claim 1, as discussed above. Teng also discloses:
wherein the memory…includes a shared memory (Teng discloses, at ¶ [0045], a shared memory.).

Teng does not explicitly disclose that the shared memory is within the at least one processing resource. However, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify Teng such that the shared memory is within the accelerator. Doing so merely entails a combination of prior art elements (cited above) according to known methods to yield predictable results, which is an exemplary rationale to support a conclusion of obviousness, as per MPEP § 2143.

Regarding claim 5, Teng, as modified, discloses the elements of claim 1, as discussed above. Teng also discloses:
wherein the memory within the at least one processing resource includes a register file (Teng discloses, at ¶ [0044], a set of registers, which is a register file.).

Regarding claim 6, Teng, as modified, discloses the elements of claim 1, as discussed above. Teng also discloses:
wherein the memory within the at least one processing resource includes the memory within the matrix accelerator (Teng discloses, at ¶ [0034], the accelerator includes memory.).

Regarding claim 7, Teng, as modified, discloses the elements of claim 1, as discussed above. Teng also discloses:
wherein the sparse first matrix includes weight data associated with a neural network (Teng discloses, at ¶ [0043], the first matrix is a weight matrix.).

Regarding claim 8, Teng, as modified, discloses the elements of claim 7, as discussed above. Teng also discloses:
wherein the second matrix includes input activation data associated with the neural network (Teng discloses, at ¶ [0043], the second matrix includes activation data.).

Regarding claim 9, Teng, as modified, discloses the elements of claim 8, as discussed above. Teng also discloses:
wherein the output of the dot product operation includes output activation data associated with the neural network (Teng discloses, at ¶ [0003], the output is activation data.).

Regarding claim 10, Teng, as modified, discloses the elements of claim 9, as discussed above. Teng also discloses:
wherein the output of the dot product operation is a dense matrix (Teng discloses, at ¶ [0042], generating output data from matrix multiplication. The input matrices are interpreted as dense matrices, as there is no disclosure related to sparse matrices. The output of a dot product on dense input matrices is also a dense matrix.).

Regarding claim 11, Teng, as modified, discloses the elements of claim 1, as discussed above. Teng also discloses:
wherein the matrix accelerator includes a systolic array of processing elements (Teng discloses, at ¶ [0054], using a systolic array.).

Regarding claim 16, Teng discloses:
a method comprising: on a general-purpose graphics processor: performing a dot product operation on multiple elements of a…first matrix and a second matrix in response to a sparse dot product instruction, the dot product operation performed via a compute cluster including multiple processing resources coupled with a cache memory, at least one processing resource including a matrix accelerator…(Teng discloses, at ¶ [0068], a processing system that includes a cluster of processing circuitry including a GPU, which discloses that the system is a general purpose graphics processor, including neural network accelerators, which discloses matrix accelerators. Teng also discloses, at ¶ [0044], a host and accelerator coupled via a RAM, which is used as a cache memory. Teng discloses, at ¶ [0023], performing inner product operations, which are dot product operations and are understood to be performed in response to a dot product instruction and which involve multiple elements of a first and second matrix.);
storing the… [first matrix] to the cache memory…(Teng discloses, at ¶ [0051], storing weight matrices to the RAM.); and 
via the at least one processing resource and in response to the sparse dot product instruction: loading the…[first matrix] from the cache memory into a memory within the at least one processing resource (Teng discloses, at ¶ [0043], the accelerator loading the weights from the RAM to a cache, which is memory within the accelerator. As noted above, performing dot product operations, including loading input and storing output, are performed in response to dot product instructions.);  
loading the second matrix from the cache memory into the memory within the at least one processing resource (Teng discloses, at ¶ [0052], reading input data (i.e., activation matrices) from the RAM, and, at ¶ [0043], storing them in FIFOs, which is memory within the accelerator.);  
performing the dot product operation via the matrix accelerator on elements from the… [first matrix] and selected elements of the second matrix, wherein the selected elements of the second matrix correspond with…values of the…first matrix stored within the … [first matrix]… and are selected by the matrix accelerator from a plurality of elements of the second matrix stored in a memory within the matrix accelerator (Teng discloses, at ¶ [0043], performing matrix multiplication operations, which discloses selecting the elements and performing dot product operations. As discussed above, the elements of the second matrix are stored in memory within the matrix accelerator.); and 
writing output of the dot product operation to the memory within the at least one processing resource (Teng discloses, at ¶ [0042], an output of the operations is sent to FIFOs, which is memory within the accelerator.).
Teng does not explicitly disclose the aforementioned first matrix is sparse, the sparse first matrix has a structured sparsity in which elements of the sparse first matrix have been pruned to a predetermined pattern, elements of the sparse first matrix are compacted into a compressed representation including a set of elements, the set of elements including at least one non-zero value element and an indication of the at least one non-zero value element, the sparse dot product instruction is to cause the matrix accelerator to skip computations associated with input including a zero value element, storing, loading, and operating on the non-zero elements of the compressed representation based on the indication of the at least one non-zero value.
However, in the same field of endeavor (e.g., matrix multiplication) Page discloses: 
the aforementioned first matrix is sparse and elements of the sparse first matrix are compacted into a compressed representation including a set of elements, the set of elements including at least one non-zero value element and an indication of the at least one non-zero value element, skipping computations associated with input including a zero value element, storing, loading, and operating on the non-zero elements of the compressed representation based on the indication of the at least one non-zero value (Page discloses, at ¶ [0030], storing non-zero elements and additional information to indicate their locations. Page also discloses, at ¶ [0037], loading and operating on the non-zero elements, which discloses skipping zero value elements.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify Teng’s matrix operations to utilize a compressed format for sparse matrices, as disclosed by Page, in order to improve performance by eliminating unproductive operations, e.g., multiplying by zero. See Page, ¶ [0004].
Also in the same field of endeavor (e.g., matrix multiplication) Simonyan discloses: 
the sparse first matrix has a structured sparsity in which elements of the sparse first matrix have been pruned to a predetermined pattern (Simonyan discloses, at ¶ [0042], clearing values of blocks according to a predefined sparsity pattern, which discloses structured sparsity and pruning to a predetermined pattern.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify Teng’s matrix operations to utilize a structured sparsity, as disclosed by Simonyan, in order to improve performance by reducing data storage requirements. See Simonyan, ¶ [0009].

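Regarding claim 17, Teng, as modified, discloses the elements of claim 16, as discussed above. Teng does not explicitly disclose wherein the method further comprises compacting the elements of the sparse first matrix into the compressed representation within a memory of the at least one processing resource.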
However, in the same field of endeavor (e.g., matrix multiplication) Page discloses:
compacting the elements of the sparse first matrix into the compressed representation within a memory of the at least one processing resource (Page discloses, at ¶ [0030], the compressed representation is stored in memory.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify Teng’s matrix operations to utilize a compressed format for sparse matrices, as disclosed by Page, in order to improve performance by eliminating unproductive operations, e.g., multiplying by zero. See Page, ¶ [0004].

Regarding claim 21, Teng discloses:
a data processing system comprising: a memory device; and a general purpose graphics processor comprising a compute cluster including multiple processing resources coupled with a cache memory, at least one processing resource including a matrix accelerator wherein the matrix accelerator includes a systolic array of processing elements, the matrix accelerator configured to perform a dot product operation on multiple elements of a…first matrix and a second matrix in response to a sparse dot product instruction…(Teng discloses, at ¶ [0068], a processing system that includes a cluster of processing circuitry including a GPU, which discloses that the system is a general purpose graphics processor, including neural network accelerators, which discloses matrix accelerators. Teng discloses, at ¶ [0054], using a systolic array. Teng also discloses, at ¶ [0044], a host and accelerator coupled via a RAM, which is used as a cache memory. Teng discloses, at ¶ [0023], performing inner product operations, which are dot product operations and are understood to be performed in response to a dot product instruction and which involve multiple elements of a first and second matrix.); 
wherein the … [first matrix] is to be stored to the cache memory…(Teng discloses, at ¶ [0051], storing weight matrices to the RAM.); and 
wherein in response to the sparse dot product instruction, the at least one processing resource is configured to: load the … [first matrix] from the cache memory into a memory within the at least one processing resource (Teng discloses, at ¶ [0043], the accelerator loading the weights from the RAM to a cache, which is memory within the accelerator. As noted above, performing dot product operations, including loading input and storing output, are performed in response to dot product instructions.);   
load the second matrix from the cache memory into the memory within the at least one processing resource (Teng discloses, at ¶ [0052], reading input data (i.e., activation matrices) from the RAM, and, at ¶ [0043], storing them in FIFOs, which is memory within the accelerator.);   
perform the dot product operation via the matrix accelerator on elements from the … [first matrix] and selected elements of the second matrix, wherein the selected elements of the second matrix correspond with…values of the…first matrix stored within the … [first matrix]… and are selected by the matrix accelerator from a plurality of elements of the second matrix stored in a memory within the matrix accelerator (Teng discloses, at ¶ [0043], performing matrix multiplication operations, which discloses selecting the elements and performing dot product operations. As discussed above, the elements of the second matrix are stored in memory within the matrix accelerator.); and 
write output of the dot product operation to the memory within the at least one processing resource (Teng discloses, at ¶ [0042], an output of the operations is sent to FIFOs, which is memory within the accelerator.).
Teng does not explicitly disclose the aforementioned first matrix is sparse, the sparse first matrix has a structured sparsity in which elements of the sparse first matrix have been pruned to a predetermined pattern, elements of the sparse first matrix are compacted into a compressed representation including a set of elements, the set of elements including at least one non-zero value element and an indication of the at least one non-zero value element, the sparse dot product instruction is to cause the matrix accelerator to skip computations associated with input including a zero value element, storing, loading, and operating on the non-zero elements of the compressed representation based on the indication of the at least one non-zero value.
However, in the same field of endeavor (e.g., matrix multiplication) Page discloses: 
the aforementioned first matrix is sparse and elements of the sparse first matrix are compacted into a compressed representation including a set of elements, the set of elements including at least one non-zero value element and an indication of the at least one non-zero value element, skipping computations associated with input including a zero value element, storing, loading, and operating on the non-zero elements of the compressed representation based on the indication of the at least one non-zero value (Page discloses, at ¶ [0030], storing non-zero elements and additional information to indicate their locations. Page also discloses, at ¶ [0037], loading and operating on the non-zero elements, which discloses skipping zero value elements.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify Teng’s matrix operations to utilize a compressed format for sparse matrices, as disclosed by Page, in order to improve performance by eliminating unproductive operations, e.g., multiplying by zero. See Page, ¶ [0004].
Also in the same field of endeavor (e.g., matrix multiplication) Simonyan discloses: 
the sparse first matrix has a structured sparsity in which elements of the sparse first matrix have been pruned to a predetermined pattern (Simonyan discloses, at ¶ [0042], clearing values of blocks according to a predefined sparsity pattern, which discloses structured sparsity and pruning to a predetermined pattern.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify Teng’s matrix operations to utilize a structured sparsity, as disclosed by Simonyan, in order to improve performance by reducing data storage requirements. See Simonyan, ¶ [0009].

Regarding claim 22, Teng, as modified, discloses the elements of claim 21, as discussed above. Teng also discloses:
wherein the cache memory is a level two (L2) cache memory or a level one (L1) cache memory and the memory within the at least one processing resource includes a shared memory, a register file, or a memory within the matrix accelerator (Teng discloses, at Figure 4, that the RAM is level 2 cache memory, as the RAM is farther from the processing elements than another memory. Teng discloses, at ¶ [0044], a set of registers, which is a register file.).

Regarding claim 23, Teng, as modified, discloses the elements of claim 21, as discussed above. Teng also discloses:
wherein the sparse first matrix includes weight data associated with a neural network and the second matrix includes input activation data associated with the neural network (Teng discloses, at ¶ [0043], the first matrix is a weight matrix. Teng discloses, at ¶ [0043], the second matrix includes activation data.).

Regarding claim 26, Teng discloses:
a general purpose graphics processor comprising: a compute cluster including multiple processing resources coupled with a cache memory, at least one processing resource including a matrix accelerator, wherein the matrix accelerator includes a systolic array of processing elements and the matrix accelerator is configured to: (Teng discloses, at ¶ [0068], a processing system that includes a cluster of processing circuitry including a GPU, including neural network accelerators, which discloses matrix accelerators. Teng discloses, at ¶ [0054], using a systolic array. Teng also discloses, at ¶ [0044], a host and accelerator coupled via a RAM, which is used as a cache memory.);
perform a dot product operation on multiple elements of a…first matrix and a second matrix in response to a sparse dot product instruction…, wherein the dot product operation is performed on elements from the…[first matrix] and selected elements of the second matrix in response to the sparse dot product instruction, the selected elements of the second matrix correspond with…values of the…first matrix stored within the …[first matrix]... and are selected by the matrix accelerator from a plurality of elements of the second matrix stored in a memory within the matrix accelerator, (Teng discloses, at ¶ [0023], performing inner product operations, which are dot product operations and are understood to be performed in response to a dot product instruction and which involve selecting and operating on multiple elements of a first and second matrix.); and 
write output of the dot product operation to the memory within the at least one processing resource (Teng discloses, at ¶ [0042], an output of the operations is sent to FIFOs, which is memory within the accelerator.).
Teng does not explicitly disclose the aforementioned first matrix is sparse, the sparse first matrix has a structured sparsity in which elements of the sparse first matrix have been pruned to a predetermined pattern, the sparse dot product instruction is to cause the matrix accelerator to skip computations associated with input including a zero value element, elements of the sparse first matrix are compacted into a compressed representation including a set of elements, the set of elements including at least one non-zero value element and an indication of the at least one non-zero value element in a bitstream, storing, loading, and operating on the non-zero elements of the compressed representation based on the indication of the location of at least one non-zero value within the bitstream.

However, in the same field of endeavor (e.g., matrix multiplication) Page discloses: 
the aforementioned first matrix is sparse, skipping computations associated with input including a zero value element, elements of the sparse first matrix are compacted into a compressed representation including a set of elements, the set of elements including at least one non-zero value element and an indication of the at least one non-zero value element in a bitstream, storing, loading, and operating on the non-zero elements of the compressed representation based on the indication of the location of at least one non-zero value within the bitstream (Page discloses, at ¶ [0030], storing non-zero elements and additional information to indicate their locations in the matrix, which is a bitstream. Page also discloses, at ¶ [0037], loading and operating on the non-zero elements, which discloses skipping zero value elements.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify Teng’s matrix operations to utilize a compressed format for sparse matrices, as disclosed by Page, in order to improve performance by eliminating unproductive operations, e.g., multiplying by zero. See Page, ¶ [0004].
Also in the same field of endeavor (e.g., matrix multiplication) Simonyan discloses: 
the sparse first matrix has a structured sparsity in which elements of the sparse first matrix have been pruned to a predetermined pattern (Simonyan discloses, at ¶ [0042], clearing values of blocks according to a predefined sparsity pattern, which discloses structured sparsity and pruning to a predetermined pattern.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify Teng’s matrix operations to utilize a structured sparsity, as disclosed by Simonyan, in order to improve performance by reducing data storage requirements. See Simonyan, ¶ [0009].


Regarding claim 29, Teng, as modified, discloses the elements of claim 26, as discussed above. Teng does not explicitly disclose wherein the predetermined pattern includes at least one zero value and one non-zero value.
However, in the same field of endeavor (e.g., matrix multiplication) Simonyan discloses: 
the predetermined pattern includes at least one zero value and one non-zero value (Simonyan discloses, at ¶ [0043], that the matrix resulting from the clearing has a specified sparsity. By definition, a sparse matrix has one or more zero values and one or more non-zero values.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify Teng’s matrix operations to utilize a structured sparsity, as disclosed by Simonyan, in order to improve performance by reducing data storage requirements. See Simonyan, ¶ [0009].
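For illustration only, pruning to a predetermined pattern that includes at least one cleared position and at least one retained position can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Simonyan or the application; the function name `prune_to_pattern` and the particular keep/clear mask are hypothetical.

```python
def prune_to_pattern(row, pattern):
    """Clear every element whose position in the repeating mask is 0.

    `pattern` is a fixed keep/clear mask (1 = keep, 0 = clear) chosen in
    advance and tiled across the row, so the locations of the forced
    zeros are known before any data is seen.
    """
    return [v if pattern[i % len(pattern)] else 0 for i, v in enumerate(row)]


pattern = [1, 0, 1, 0]           # assumed mask: keep, clear, keep, clear
row = [4, 9, 6, 1, 8, 2, 7, 3]   # example weights before pruning
pruned = prune_to_pattern(row, pattern)
```

Because the mask is fixed in advance, the positions of the retained elements follow from the pattern rather than from the data values.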

Regarding claim 30, Teng, as modified, discloses the elements of claim 1, as discussed above. Teng does not explicitly disclose wherein the predetermined pattern includes at least one zero value and one non-zero value.
However, in the same field of endeavor (e.g., matrix multiplication) Simonyan discloses: 
the predetermined pattern includes at least one zero value and one non-zero value (Simonyan discloses, at ¶ [0043], that the matrix resulting from the clearing has a specified sparsity. By definition, a sparse matrix has one or more zero values and one or more non-zero values.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify Teng’s matrix operations to utilize a structured sparsity, as disclosed by Simonyan, in order to improve performance by reducing data storage requirements. See Simonyan, ¶ [0009].

Regarding claim 31, Teng, as modified, discloses the elements of claim 16, as discussed above. Teng does not explicitly disclose wherein the predetermined pattern includes at least one zero value and one non-zero value.
However, in the same field of endeavor (e.g., matrix multiplication) Simonyan discloses: 
the predetermined pattern includes at least one zero value and one non-zero value (Simonyan discloses, at ¶ [0043], that the matrix resulting from the clearing has a specified sparsity. By definition, a sparse matrix has one or more zero values and one or more non-zero values.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify Teng’s matrix operations to utilize a structured sparsity, as disclosed by Simonyan, in order to improve performance by reducing data storage requirements. See Simonyan, ¶ [0009].

Regarding claim 32, Teng, as modified, discloses the elements of claim 21, as discussed above. Teng does not explicitly disclose wherein the predetermined pattern includes at least one zero value and one non-zero value.
However, in the same field of endeavor (e.g., matrix multiplication) Simonyan discloses: 
the predetermined pattern includes at least one zero value and one non-zero value (Simonyan discloses, at ¶ [0043], that the matrix resulting from the clearing has a specified sparsity. By definition, a sparse matrix has one or more zero values and one or more non-zero values.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify Teng’s matrix operations to utilize a structured sparsity, as disclosed by Simonyan, in order to improve performance by reducing data storage requirements. See Simonyan, ¶ [0009].

Claims 14, 15, 19, 20, 24, 27, and 28 are rejected under 35 U.S.C. 103 as being unpatentable over Teng in view of Page in view of Simonyan in view of US Publication No. 2020/0061811 by Iqbal et al. (hereinafter referred to as “Iqbal”). 
Regarding claim 14, Teng, as modified, discloses the elements of claim 1, as discussed above. Teng does not explicitly disclose wherein the dot product operation is an 8-bit integer dot product operation.
However, in the same field of endeavor (e.g., graphics processors) Iqbal discloses:
wherein the dot product operation is an 8-bit integer dot product operation (Iqbal discloses 8-bit integer matrix operations, which includes dot product operations.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify Teng’s matrix operation to utilize 8-bit operations, as disclosed by Iqbal because doing so merely entails a combination of prior art elements (cited above) according to known methods to yield predictable results, which is an exemplary rationale to support a conclusion of obviousness, as per MPEP § 2143.

Regarding claim 15, Teng, as modified, discloses the elements of claim 14, as discussed above. Teng does not explicitly disclose wherein the sparse first matrix includes 8-bit integer elements.
However, in the same field of endeavor (e.g., graphics processors) Iqbal discloses:
wherein the sparse first matrix includes 8-bit integer elements (Iqbal discloses 8-bit integer matrix operations, which discloses 8-bit integer elements.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify Teng’s matrix operation to utilize 8-bit elements, as disclosed by Iqbal because doing so merely entails a combination of prior art elements (cited above) according to known methods to yield predictable results, which is an exemplary rationale to support a conclusion of obviousness, as per MPEP § 2143.
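By way of illustration only, an 8-bit integer dot product over 8-bit integer elements can be sketched as follows. The function name `int8_dot`, the range check, and the accumulation into a wider integer are assumptions made for this sketch; they are not drawn from Iqbal, Teng, or the claims.

```python
# Illustrative sketch only: a dot product whose operands are signed 8-bit integers.
# The names and the wide accumulator are hypothetical, not taken from any cited reference.
INT8_MIN, INT8_MAX = -128, 127


def int8_dot(a, b):
    """Multiply-accumulate two equal-length sequences of signed 8-bit integers."""
    if len(a) != len(b):
        raise ValueError("operands must have the same length")
    acc = 0  # Python ints are unbounded, so this models an accumulator wider than 8 bits
    for x, y in zip(a, b):
        if not (INT8_MIN <= x <= INT8_MAX and INT8_MIN <= y <= INT8_MAX):
            raise ValueError("element outside the signed 8-bit range")
        acc += x * y
    return acc
```

The products of 8-bit operands generally do not fit in 8 bits, which is why the sketch accumulates into a wider value.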

Regarding claim 19, Teng, as modified, discloses the elements of claim 16, as discussed above. Teng does not explicitly disclose wherein the dot product operation is an 8-bit integer dot product operation.
However, in the same field of endeavor (e.g., graphics processors) Iqbal discloses:
wherein the dot product operation is an 8-bit integer dot product operation (Iqbal discloses 8-bit integer matrix operations, which includes dot product operations.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify Teng’s matrix operation to utilize 8-bit operations, as disclosed by Iqbal because doing so merely entails a combination of prior art elements (cited above) according to known methods to yield predictable results, which is an exemplary rationale to support a conclusion of obviousness, as per MPEP § 2143.

Regarding claim 20, Teng, as modified, discloses the elements of claim 19, as discussed above. Teng does not explicitly disclose wherein the sparse first matrix includes 8-bit integer elements.
However, in the same field of endeavor (e.g., graphics processors) Iqbal discloses:
wherein the sparse first matrix includes 8-bit integer elements (Iqbal discloses 8-bit integer matrix operations, which discloses 8-bit integer elements.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify Teng’s matrix operation to utilize 8-bit elements, as disclosed by Iqbal because doing so merely entails a combination of prior art elements (cited above) according to known methods to yield predictable results, which is an exemplary rationale to support a conclusion of obviousness, as per MPEP § 2143.

Regarding claim 24, Teng, as modified, discloses the elements of claim 21, as discussed above. Teng does not explicitly disclose wherein the dot product operation is an 8-bit integer dot product operation and the sparse first matrix includes 8-bit integer elements.
However, in the same field of endeavor (e.g., graphics processors) Iqbal discloses:
wherein the dot product operation is an 8-bit integer dot product operation and the sparse first matrix includes 8-bit integer elements (Iqbal discloses 8-bit integer matrix operations, which include dot product operations and which disclose 8-bit integer elements.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify Teng’s matrix operation to utilize 8-bit operations and elements, as disclosed by Iqbal because doing so merely entails a combination of prior art elements (cited above) according to known methods to yield predictable results, which is an exemplary rationale to support a conclusion of obviousness, as per MPEP § 2143.

Regarding claim 27, Teng, as modified, discloses the elements of claim 26, as discussed above. Teng does not explicitly disclose wherein the dot product operation is an 8-bit integer dot product operation.
However, in the same field of endeavor (e.g., graphics processors) Iqbal discloses:
wherein the dot product operation is an 8-bit integer dot product operation (Iqbal discloses 8-bit integer matrix operations, which includes dot product operations.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify Teng’s matrix operation to utilize 8-bit operations, as disclosed by Iqbal because doing so merely entails a combination of prior art elements (cited above) according to known methods to yield predictable results, which is an exemplary rationale to support a conclusion of obviousness, as per MPEP § 2143.

Regarding claim 28, Teng, as modified, discloses the elements of claim 27, as discussed above. Teng does not explicitly disclose wherein the sparse first matrix includes 8-bit integer elements.
However, in the same field of endeavor (e.g., graphics processors) Iqbal discloses:
wherein the sparse first matrix includes 8-bit integer elements (Iqbal discloses 8-bit integer matrix operations, which discloses 8-bit integer elements.).
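It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify Teng’s matrix operation to utilize 8-bit elements, as disclosed by Iqbal because doing so merely entails a combination of prior art elements (cited above) according to known methods to yield predictable results, which is an exemplary rationale to support a conclusion of obviousness, as per MPEP § 2143.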

Response to Arguments
On page 10 of the response filed February 15, 2022 (“response”), the Applicant argues, “Teng does not disclose "the selected elements of the second matrix... are selected by the matrix accelerator from a plurality of elements of the second matrix stored in a memory within the matrix accelerator."”
Though fully considered, the Examiner respectfully disagrees. As discussed above, Teng discloses, e.g., at ¶ [0043], retrieving input activation matrices from FIFOs 358. These FIFOs disclose memory within the matrix accelerator. See, e.g., ¶ [0041]. Accordingly, the Applicant’s remarks are deemed unpersuasive. 

On page 10 of the response the Applicant argues, “Page also fails to teach "the selected elements of the second matrix... are selected by the matrix accelerator from a plurality of elements of the second matrix stored in a memory within the matrix accelerator."”
Though fully considered, the Examiner respectfully disagrees. As Teng is cited as disclosing these features, rather than Page, the Applicant’s arguments are deemed unpersuasive. 

On pages 10-11 of the response the Applicant argues, “The combination of Teng, Page, and Simonyan does not contemplate the selection of elements of the second matrix, "from a plurality of elements of the second matrix stored in a memory within the matrix accelerator." Teng does not teach a sparse dot product, so has no corresponding teaching regarding selection from a plurality of elements of the second matrix to multiply against a non-zero value of a compressed representation. Page is cited for sparse matrix operations, but does not teach to select "from a plurality of elements of the second matrix stored in a memory within the matrix accelerator." Instead, Page, par. 35 teaches to associate non-zero 
Though fully considered, the Examiner respectfully disagrees. Teng discloses selecting all elements of a matrix, i.e., each element sequentially. This necessarily involves some indication of which element is next. Therefore, Teng discloses selecting based on indications, i.e., indications of which element is next. However, Teng does not disclose that the indications are of non-zero values. That is, Teng’s indications are just which element is next, regardless of whether the next element has a zero value or not. 
Page discloses that it is well-known that multiplying each element of a sparse matrix, where a large percentage of the entries have a zero value, is inefficient. See, e.g., Page, ¶ [0007]. Therefore, Page provides a manner to identify, i.e., indicate, which elements are non-zero. See, e.g., ¶ [0030], which discloses information to indicate the location of non-zero values. 
Given Teng’s disclosure of selecting elements using information that indicates the location of the next element, and Page’s disclosure of generating information that indicates the location of elements having non-zero values, the Examiner maintains that it would have been obvious to modify Teng to utilize Page’s information identifying the location of non-zero elements in order to improve efficiency of matrix operations by omitting calculations when one of the elements has a zero value. As noted, in sparse matrices, the percentage of zero-valued elements is high, so there is a significant benefit to be obtained by skipping those essentially trivial calculations. The modification of Teng to include Page’s identifying information is not affected by and does not require modification of Page’s multipliers. Accordingly, the Applicant’s remarks are deemed unpersuasive. 
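The efficiency rationale above can be illustrated with a minimal sketch. The helper names (`compress`, `dense_dot`, `sparse_dot`) and the index-list format are hypothetical and are not taken from Teng or Page; the sketch shows only that using indications of the non-zero locations to skip zero-valued elements leaves the result unchanged while reducing the number of multiplications.

```python
# Illustrative sketch only: zero-skipping in a dot product.
# All names and the (values, indices) format are assumptions, not any reference's design.

def compress(row):
    """Return the non-zero values of `row` and the positions at which they occur."""
    indices = [i for i, v in enumerate(row) if v != 0]
    values = [row[i] for i in indices]
    return values, indices


def dense_dot(a, b):
    """Multiply every pair of elements, including pairs where an element is zero."""
    return sum(x * y for x, y in zip(a, b))


def sparse_dot(values, indices, b):
    """Multiply only the non-zero values against the elements of `b` they select."""
    return sum(v * b[i] for v, i in zip(values, indices))


sparse_row = [0, 3, 0, 0, 5, 0, 0, 2]
other = [1, 2, 3, 4, 5, 6, 7, 8]
values, indices = compress(sparse_row)
result = sparse_dot(values, indices, other)
```

In this example the dense form performs eight multiplications and the compressed form performs three, and both produce the same value.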

On page 11 of the response the Applicant argues, “Moreover, the association technique taught by Page is not suitable for implementation within the matrix accelerator and during performance of the claimed dot product operation (e.g., "in response to the sparse dot product instruction"), due to the pre-
Though fully considered, the Examiner respectfully disagrees. The Examiner notes that Page is not cited for the associating step 120, as described at ¶ [0035]. Instead, Page is cited as disclosing storing non-zero elements and information that indicates the locations of the non-zero elements, as described, e.g., at ¶ [0030] regarding step 100. The Examiner maintains that these features disclose the claim limitations directed to compacting elements, including at least one non-zero element, and including an indication of the at least one non-zero element. The elements of Page related to memory organization and manipulation are not relied upon in rejecting the claims. Therefore, whether or not it would have been obvious to modify Teng to include such elements is not relevant. Accordingly, the Applicant’s arguments are deemed unpersuasive. 

On page 12 of the response the Applicant argues, “Even if some interpretation of the teachings of Page could be developed to address the elements of the amended claim, the teachings of Page cannot be properly combined with the teachings of Teng and Simonyan. The Office Action on page 19 states that "an attempt to characterize Page as being directed to techniques suitable exclusively for FPGAs is unpersuasive" because "Page is cited as teaching a particular mechanism for processing sparse matrices, and is not cited for or limited to any particular hardware implementation." Applicant respectfully disagrees. While Page states "other hardware and/or software embodiments can be considered," the techniques of Page are explicitly described as being adapted for implementation in an FPGA. Page, par. 50 states "[b]y using the distributed memories found in modern FPGAs to store multiple copies of vector elements and pre-processing of the matrix to store indices into these memories, circuits using the presented method perform 100 times better than existing sparse matrix times vector implementation, and over two times better than state of the art dense matrix vector circuits." Page, par. 52 goes on to state, "[e]xisting memory devices with two read ports do not typically come in a configuration that would allow efficient use all the resources of the FPGA. Consequently, in the implementation described above, each 
Though fully considered, the Examiner respectfully disagrees. As has already been discussed, Page explicitly discloses, e.g., at ¶ [0051], “Whilst an FPGA implementation is currently seen as beneficial, other hardware and/or software embodiments can be considered, with different memory and/or processor configurations.” The Applicant argues that the overall method of Page can be beneficially implemented on an FPGA. However, as discussed above, Page is cited not for the overall method, including all the specific design choices and optimizations selected by Page. Instead, Page is selected to supply the claim elements missing in Teng’s disclosure. That is, Teng does not disclose compressing non-zero values in a sparse matrix and providing an indication of the location of the non-zero values so that the zero values can be skipped. 
The concept of skipping elements with zero values is known, as evidenced by Page. Skipping elements with zero values can have benefits regardless of the particular hardware implementation. This is also disclosed by Page, as evidenced by the explicit disavowal of limitation to any specific hardware implementation. Therefore, the Applicant’s repeated argument that zero-skipping is limited to FPGAs because it is described in the context of an FPGA is unpersuasive. 
The advantages that follow from FPGA implementation are not directly related to the teachings of Page that are relied upon for determining that the claims are obvious. That is, those advantages are related to the memory management and organization. As discussed above, these are not relied upon in teaching elements of Applicant’s claims. Instead, it is the identification and consolidation of non-zero elements that is relied upon. In addition to ¶ [0051], Page states in several other places that the teachings are not limited to FPGAs. For example, ¶ [0028] states, “This implementation focuses on an FPGA system, but alternative systems using a similar or equivalent configuration can be considered.” And ¶ [0029] states, “These steps may be implemented on a computing system such as shown in FIG. 1 as either hardware (such as programmed logic) or software (implementing the processor arrangement through one or more microprocessors, for instance), although they may alternatively be implemented in software in a different type of computing system.” 
The Examiner maintains that modifying 

On pages 12-13 of the response the Applicant argues, “Applicant respectfully submits that the assertion that one would not have been motivated to modify the GPU disclosed by Teng to implement the teachings of Page (as opposed to implementing those teachings within the FPGA) in light of how Teng characterizes the benefits of an FPGA in comparison with a GPU. Teng, par. 69-70 states: The FPGA circuitry 926 is beneficial as a neural network accelerator for a neural network in which the computation is distributed into many layers, and the computational requirements of each layer are insufficient to keep the GPU 928 well utilized. Custom, on-chip memory can help to ensure data locality when transitioning between small layers and thereby significantly accelerate computations. The GPU 928 is beneficial as a neural network accelerator for a neural network in which [] each [layer's] computational requirements would keep the GPU busy and less reliant on data transfers between memory and processor elements of the GPU. Thus, Teng describes differential benefits of a GPU and an FPGA, but does not otherwise mention the use of a GPU. Where Teng par. 46 states that "[t]he disclosed approaches are not limited to any specific hardware platforms," Teng then explicitly states "for purposes of providing a frame of reference to those skilled in the art, the neural network accelerator can be implemented on a KINTEX® ULTRASCALE™ 115 device, which is available from Xilinx, Inc.," which, as known by those skilled in the art, is an FPGA. No example of how to implement the teachings of Teng in a GPU are provided. Thus, Teng appears to effectively be stating that the teachings are not limited to any particular FPGA platform. Furthermore, based on the discussion of the strengths of FPGA and GPU implementations of a neural network accelerator in Teng, par. 69-70, there is not any indication to one skilled in the art that Teng would prefer to implement sparse matrix functionality into a GPU instead of enhancing the FPGA design that is disclosed. To modify the teachings of the FPGU design of Page and the FPGA design of Teng to enable implementation in the GPU of Teng goes well beyond what would have been obvious to a person having ordinary skill in the art at the time of the earliest priority date applicable to the instant claims.”

As noted by the Applicant, Teng discloses that whether to use an FPGA or GPU is essentially a design choice dependent on the type of neural network being implemented. However, there is no suggestion that only an FPGA based accelerator, and not a GPU based accelerator, would benefit from zero skipping, as taught by Page. Accordingly, the Applicant’s arguments are deemed unpersuasive.

On page 14 of the response the Applicant presents similar arguments with regard to claim 26 as those presented above.
Though fully considered, the Examiner respectfully disagrees. The reasons set forth in the remarks and rejections presented above are applicable to these arguments as well.

On page 15 of the response the Applicant argues, “the cited combination of references fails to teach or suggest, at the least, "wherein the predetermined pattern includes at least one zero value and at least one non-zero value," as Simonyan, which is cited for structured sparsity, is limited to contiguous non-zero values and does not support a predetermined pattern that includes "at least one zero value."”
Though fully considered, the Examiner respectfully disagrees. As noted above, Simonyan discloses generating a result matrix that has a specified sparsity. See, e.g., ¶ [0042]. While Simonyan may generate patterns having contiguous blocks of non-zero values, the pattern is still sparse, i.e., includes zero values. Therefore, the Applicant’s arguments are deemed unpersuasive. 
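The point that a pattern with contiguous non-zero values can still include zero values can be illustrated with a minimal sketch. The pattern shown, the block contents, and the helper names are hypothetical examples chosen for illustration; they are not taken from Simonyan or from the claims.

```python
# Illustrative sketch only: a sparsity pattern with a contiguous run of kept positions.
# The pattern and names are assumptions made for this example.

def has_zero_and_nonzero(pattern):
    """True if the pattern has at least one zero entry and at least one non-zero entry."""
    return any(p == 0 for p in pattern) and any(p != 0 for p in pattern)


def apply_pattern(block, pattern):
    """Keep elements where the pattern is non-zero and clear the remaining elements."""
    return [x if keep else 0 for x, keep in zip(block, pattern)]


contiguous_pattern = [1, 1, 0, 0]  # non-zero positions are contiguous
block = [7, -4, 9, 2]
masked = apply_pattern(block, contiguous_pattern)
```

Here the kept positions are contiguous, yet the resulting block still contains both zero and non-zero values.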

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHAWN DOMAN whose telephone number is (571)270-5677.  The examiner can normally be reached on Monday through Friday 8:30am-6pm Eastern Time.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jyoti Mehta, can be reached on 571-270-3995.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/SHAWN DOMAN/
Primary Examiner, Art Unit 2183