DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 1-24 have been examined.

Information Disclosure Statement
The Applicant's submission of the Information Disclosure Statements dated October 9, 2021 and January 7, 2022 is acknowledged by the examiner and the cited references have, except where otherwise indicated, been considered in the examination of the claims now pending. One reference is not considered because Applicant has failed to comply with 37 CFR 1.98(b)(5), which requires that relevant pages be identified. Citing references that are dozens or hundreds of pages long without identifying relevant portions hinders the Office's ability to effectively determine the relevance of the reference. Although a concise explanation of the relevance of the information is not required for English language information, applicants are encouraged to provide a concise explanation of why the English-language information is being submitted and how it is understood to be relevant. Concise explanations (especially those which point out the relevant pages and lines) are helpful to the Office, particularly where documents are lengthy and complex and applicant is aware of a section that is highly relevant to patentability or where a large number of documents are submitted and applicant is aware that one or more are highly relevant to patentability. Copies of the PTOL-1449s initialed and dated by the Examiner are attached to the instant office action.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-5, 7-13, 15-21, 23, and 24 are rejected under 35 U.S.C. 103 as being unpatentable over US Publication No. 2004/0111587 by Nair et al. (hereinafter referred to as “Nair”) in view of US Publication No. 2017/0293659 by Huang (hereinafter referred to as “Huang”) in view of US Patent No. 8,924,455 by Barman et al. (hereinafter referred to as “Barman”). 
Regarding claims 1, 9, and 17, taking claim 1 as representative, Nair discloses:
a processor comprising: fetch circuitry to fetch an instruction comprising fields to specify locations of a first matrix, a second matrix, a third matrix, and an opcode indicating execution circuitry is to multiply and accumulate matching non-zero (NZ) elements of the first matrix and the second matrix with corresponding elements of the third matrix (Nair discloses, at ¶ [0021], a processor that fetches instructions, which discloses fetch circuitry. Nair also discloses, at ¶ [0062], an instruction that specifies three matrix operands and an opcode. As disclosed at ¶ [0090] (Table 1) the opcode can indicate multiply and accumulate of, as disclosed at ¶ [0089], corresponding elements, which includes non-zero elements.); 
decode circuitry to decode the fetched instruction (Nair discloses, at ¶ [0024], decoding instructions, which discloses decode circuitry.); and 
the execution circuitry to execute the decoded instruction as per the opcode… to multiply and accumulate matching NZ elements of the first matrix and the second matrix with corresponding elements of the third matrix (Nair discloses, at ¶ [0090] (Table 1), executing a multiply and accumulate instruction, which discloses execution circuitry, that multiplies a first and second matrix and accumulates the result with a third (destination) matrix. Nair also discloses, at ¶ [0089], the matrix instructions operate on corresponding (matching) elements, which includes non-zero elements.).
Nair does not explicitly disclose that the execution circuitry to execute the instruction is to generate NZ bitmasks for the first matrix and the second matrix, broadcast NZ elements from each row of the first matrix to a corresponding row of a two-dimensional grid of processing engines and from each column of the second matrix to a corresponding column of the two-dimensional grid of processing engines, that the multiplying and accumulating of the NZ elements is based on the NZ bitmasks, and wherein each processing engine comprises a buffer, and is to store a broadcast NZ element in its buffer for use in a subsequent cycle in response to the NZ bitmasks indicating a matching NZ element will arrive in the subsequent cycle, and not store a broadcast NZ element in its buffer in response to the NZ bitmasks indicating a matching NZ element will not arrive.
However, in the same field of endeavor (e.g., matrix operations) Huang discloses:
generating NZ bitmasks (Huang discloses, at ¶ [0109], generating bitmasks that identify non-zero elements.);
broadcasting NZ elements from each row of the first matrix… and from each column of the second matrix…and multiplying and accumulating of the NZ elements is based on the NZ bitmasks (Huang discloses, at ¶¶ [0134]-[0135] and Figure 12, broadcasting rows and vectors into fifos of an array of processing units. The values that are stored are the compressed values, which means they are stored as a result of the non-zero bit masks indicating that matching elements will arrive, to be used for performing multiplication and addition based on this indication provided by the NZ bitmasks.);
wherein each processing engine comprises a buffer, and is to store a broadcast NZ element in its buffer for use in a subsequent cycle in response to the NZ bitmasks indicating a matching NZ element will arrive in the subsequent cycle, and not store a broadcast NZ element in its buffer in response to the NZ bitmasks indicating a matching NZ element will not arrive (Huang discloses, at ¶¶ [0134]-[0135] and Figure 12, storing rows and vectors into fifos (buffers) of an array of processing units. The values that are stored are the compressed values, which means the stored values are stored as a result of the non-zero bit masks indicating that matching elements will arrive; when the non-zero bit masks do not indicate that matching elements will arrive, values are not stored. Each value in the fifo, except the first, will be used in subsequent cycles.); and
non-transitory computer-readable storage media (Huang discloses, at ¶ [0103], non-transitory computer-readable storage media.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify Nair’s matrix multiply and accumulate instruction to include Huang’s generation of non-zero bit masks and storing of non-zero elements in order to improve performance by reducing the number of computations needed to calculate a result. See Huang, ¶ [0121].
Also, in the same field of endeavor (e.g., matrix operations) Barman discloses:
(Barman discloses, at col. 3, lines 59-62, each processing cell in an MxL (two-dimensional) array of processing cells receives elements from corresponding rows and columns of input matrices.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify Nair’s matrix multiply and accumulate instruction to utilize a two-dimensional array of processing cells, as disclosed by Barman, in order to improve performance by increasing parallelism to achieve high throughput. See Barman, col. 3, lines 2-4.
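
For illustration only, the operation recited in representative claim 1 can be sketched in software as follows. This is a simplified sketch under stated assumptions and is not a characterization of the circuitry of Nair, Huang, or Barman: the function names `nz_bitmask` and `sparse_mac` are hypothetical, matrices are plain lists of lists, and the grid of processing engines is modeled as a loop over output positions (i, j) rather than as hardware with per-engine buffers.

```python
def nz_bitmask(values):
    """Return an integer whose bit k is set when values[k] is non-zero."""
    mask = 0
    for k, v in enumerate(values):
        if v != 0:
            mask |= 1 << k
    return mask


def sparse_mac(a, b, c):
    """Accumulate a @ b into c, multiplying only matching non-zero pairs.

    a is M x K, b is K x N, c is M x N (lists of lists); c is updated in place.
    Each (i, j) pair stands in for one processing engine of an M x N grid.
    """
    m, k_dim, n = len(a), len(b), len(b[0])
    # One NZ bitmask per row of the first matrix and per column of the second.
    row_masks = [nz_bitmask(a[i]) for i in range(m)]
    col_masks = [nz_bitmask([b[k][j] for k in range(k_dim)]) for j in range(n)]
    for i in range(m):
        for j in range(n):
            # A product is needed only where both operands are non-zero.
            match = row_masks[i] & col_masks[j]
            for k in range(k_dim):
                if (match >> k) & 1:
                    c[i][j] += a[i][k] * b[k][j]
    return c
```

In this sketch, positions where the bitwise AND of the two masks is zero are skipped entirely, which corresponds to the reduction in computations relied upon in the rationale above.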

Regarding claims 2, 10, and 18, taking claim 2 as representative, Nair, as modified, discloses the elements of claim 1, as discussed above. Nair also discloses:
wherein each of the first matrix, second matrix, and third matrix are located in a corresponding single two-dimensional tile register in a matrix operations accelerator (Nair discloses, at ¶ [0037], the matrix data processor, i.e., matrix operations accelerator, utilizes packed data contained in the span of a register set, i.e., single two-dimensional tile register.).

Regarding claims 3, 11, and 19, taking claim 3 as representative, Nair, as modified, discloses the elements of claim 1, as discussed above. Nair does not explicitly disclose the execution circuitry is to execute the decoded instruction to broadcast a first set of NZ elements from a first row of the first matrix to both a first processing engine and a second processing engine in a first row of the two-dimensional grid of processing engines, and broadcast a second set of NZ elements from a first column of the second matrix to both the first processing engine and a third processing engine in a first column of the two-dimensional grid of processing engines.
However, in the same field of endeavor (e.g., matrix operations) Barman discloses:
broadcasting a first set of NZ elements from a first row of the first matrix to both a first processing engine and a second processing engine in a first row of the two-dimensional grid of processing engines, and broadcasting a second set of NZ elements from a first column of the second matrix to both the first processing engine and a third processing engine in a first column of the two-dimensional grid of processing engines (Barman discloses, at col. 6, lines 42-45 and Figure 9, broadcasting a row of an input matrix into a row of the systolic array, which discloses first and second processing engines, and broadcasting a column of a second input matrix into a column of the systolic array, which discloses first and third processing engines.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify Nair’s matrix multiply and accumulate instruction to utilize a two-dimensional array of processing cells, as disclosed by Barman, in order to improve performance by increasing parallelism to achieve high throughput. See Barman, col. 3, lines 2-4. 

Regarding claims 4, 12, and 20, taking claim 4 as representative, Nair, as modified, discloses the elements of claim 1, as discussed above. Nair also discloses:
wherein the first matrix has M rows by K columns, the second matrix has K rows by N columns, the third matrix has M rows by N columns…and wherein the instruction is further to specify K, M, and N (Nair discloses, at ¶ [0050], the instruction specifies the number of rows and columns for each of the matrices.).
Nair does not explicitly disclose wherein the two dimensional grid of processing engines has M rows by N columns.
However, in the same field of endeavor (e.g., matrix operations) Barman discloses:
a two dimensional grid of processing engines having M rows by N columns (Barman discloses, at col. 3, lines 10-42, a systolic array having a number of MAC units (processing elements) that depends on the size of the matrices being multiplied.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify Nair’s matrix multiply and accumulate instruction to utilize a two-dimensional array of processing cells, one for each element of the output matrix, as disclosed by Barman, in order to improve performance by increasing parallelism to achieve high throughput. See Barman, col. 3, lines 2-4.

Regarding claims 5, 13, and 21, taking claim 5 as representative, Nair, as modified, discloses the elements of claim 1, as discussed above. Nair also discloses:
wherein the first matrix has M rows by K columns, the second matrix has K rows by N columns, the third matrix has M rows by N columns…and wherein K, M, and N are configured in a configuration register in the processor before the instruction is fetched (Nair discloses, at ¶ [0063], specifying the matrix parameters in a configuration register. As these parameters are used when the instruction is executed, the parameters are stored prior to fetching the instruction.).
Nair does not explicitly disclose wherein the two dimensional grid of processing engines has M rows by N columns.
However, in the same field of endeavor (e.g., matrix operations) Barman discloses:
a two dimensional grid of processing engines having M rows by N columns (Barman discloses, at col. 3, lines 10-42, a systolic array having a number of MAC units (processing elements) that depends on the size of the matrices being multiplied.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify Nair’s matrix multiply and accumulate instruction to utilize a two-dimensional array of processing cells, one for each element of the output matrix, as disclosed by Barman, in order to improve performance by increasing parallelism to achieve high throughput. See Barman, col. 3, lines 2-4.

Regarding claims 7, 15, and 23, taking claim 7 as representative, Nair, as modified, discloses the elements of claim 1, as discussed above. Nair does not explicitly disclose wherein at least one of the first matrix and the second matrix is a sparse matrix containing a plurality of zero-valued elements.
However, in the same field of endeavor (e.g., matrix operations) Huang discloses:
wherein at least one of the first matrix and the second matrix is a sparse matrix containing a plurality of zero-valued elements (Huang discloses, at ¶ [0105], sparse matrices.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify Nair’s matrix multiply and accumulate instruction to include See Huang, ¶ [0121].

Regarding claims 8, 16, and 24, taking claim 8 as representative, Nair, as modified, discloses the elements of claim 7, as discussed above. Nair does not explicitly disclose wherein the sparse matrix has been stored in memory in compressed format before fetching the instruction, the compressed format to pack NZ elements together and indicate a logical matrix position of each NZ element in a header.
However, in the same field of endeavor (e.g., matrix operations) Huang discloses:
wherein the sparse matrix has been stored in memory in compressed format before fetching the instruction, the compressed format to pack NZ elements together and indicate a logical matrix position of each NZ element in a header (Huang discloses, at ¶ [0108], storing non-zero elements in compressed format and, at ¶ [0111], storing the compression information in a header.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify Nair’s matrix multiply and accumulate instruction to include Huang’s compression because doing so improves efficiency by saving storage space. See Huang, ¶ [0105].
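
For illustration only, a compressed format of the kind recited in representative claim 8 can be sketched as follows. The sketch assumes, hypothetically, that the header is a row-major bitmap with one bit per logical matrix position; the actual header layout of Huang is not reproduced here, and the names `compress` and `decompress` are illustrative.

```python
def compress(matrix):
    """Pack the non-zero elements together and record their positions.

    Returns (header, packed): header holds one bit per logical position in
    row-major order, and packed holds the non-zero values in that same order.
    """
    header = []
    packed = []
    for row in matrix:
        for v in row:
            header.append(1 if v != 0 else 0)
            if v != 0:
                packed.append(v)
    return header, packed


def decompress(header, packed, cols):
    """Rebuild the logical matrix from the header bits and packed values."""
    values = iter(packed)
    flat = [next(values) if bit else 0 for bit in header]
    return [flat[r:r + cols] for r in range(0, len(flat), cols)]
```

Only the non-zero values occupy storage in `packed`, which is the storage saving relied upon in the rationale above.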

Claims 6, 14, and 22 are rejected under 35 U.S.C. 103 as being unpatentable over Nair in view of Huang in view of Barman in view of US Publication No. 2014/0059322 by Ould-Ahmed-Vall et al. (hereinafter referred to as “Ould”).
Regarding claims 6, 14, and 22, taking claim 6 as representative, Nair, as modified, discloses the elements of claim 1, as discussed above. Nair does not explicitly disclose wherein the instruction is further to specify a writemask to indicate, for each element of the third matrix, whether the element is to be updated or is to be masked, the instruction further to specify whether masked elements are to be zeroed, setting their values to zero, or merged, leaving their values unchanged.
However, in the same field of endeavor (e.g., vector operations) Ould discloses:
wherein the instruction is further to specify a writemask to indicate, for each element of the third matrix, whether the element is to be updated or is to be masked, the instruction further to specify whether masked elements are to be zeroed, setting their values to zero, or merged, leaving their values unchanged (Ould discloses, at ¶ [0066], a mask (writemask) that specifies whether a destination will be updated and whether the destination will be zeroed, merged, or retain its old value.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify Nair’s matrix multiply and accumulate instruction to include Ould’s writemasking because doing so provides “significant benefits over current techniques which have the disadvantage of increased instruction count resulting from memory access operations.” See Ould, ¶ [0075].
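
For illustration only, the zeroing and merging behaviors recited in representative claim 6 can be sketched as follows. The function name `apply_writemask` and its parameters are hypothetical and are not drawn from Ould; the mask is assumed to hold one flag per element of the destination (third) matrix.

```python
def apply_writemask(dest, result, mask, zeroing):
    """Write result into a copy of dest under control of a per-element mask.

    Where the mask flag is set, the element is updated with the new result.
    Where it is clear, the element is masked: set to zero when zeroing is
    True, or left at its old value (merged) when zeroing is False.
    """
    out = []
    for d_row, r_row, m_row in zip(dest, result, mask):
        out.append([
            r if m else (0 if zeroing else d)
            for d, r, m in zip(d_row, r_row, m_row)
        ])
    return out
```

For example, with a destination row [9, 9], a result row [1, 2], and a mask row [1, 0], zeroing yields [1, 0] and merging yields [1, 9].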

Response to Arguments
On page 8 of the response filed January 7, 2022 (“response”), the Applicant argues, “As noted in MPEP 707.07(d), "[t]he examiner should, as a part of the first Office action on the merits, identify any claims which he or she judges, as presently recited, to be allowable and/or should suggest any way in which he or she considers that rejected claims may be amended to make them allowable." (emphasis added). Applicant respectfully requests suggestions regarding allowability including claim amendment suggestions. Providing this prescribed guidance is in the interest of compact prosecution.”
Though fully considered, the Examiner respectfully disagrees. It is not at present apparent which part of the application could serve as a basis for a new, allowable claim.

On pages 11-12 of the response the Applicant argues, “Huang of the alleged combination does not disclose "wherein each processing engine ... is to not store a broadcast NZ element in its buffer in response to the NZ bitmasks indicating a matching NZ element will not arrive" as in Applicant's independent claims 1, 9, and 17 (emphasis added). Figure 12 of Huang of the alleged combination illustrates a "masks ops and skip" 1231 downstream of the fifos. If Huang was not storing a broadcast non-zero (NZ) element in its fifos, it is unclear why there would be "skip" 1231.”
Though fully considered, the Examiner respectfully disagrees. Huang explicitly discloses that what is stored into the FIFOs is a compressed buffer. That is, the values determined to correspond to non-zero values are stored, whereas those values that are determined to not correspond to non-zero values are not stored.

On page 12 of the response the Applicant argues, “If this is Official Notice, any allegation of Official notice is hereby traversed. Applicant submits that these allegations are not "capable of instant and unquestionable demonstration as being well-known" and are thus improper (see, e.g., MPEP §2144.03). "It is never appropriate to rely solely on common knowledge in the art without evidentiary support in the record as the principal evidence upon which a rejection was based." See Zurko, 258 F.3d at 1386, 59 USPQ2d at 1697; Ahlert, 424 F.2d at 1092, 165 USPQ 421. For example, the Applicant submits that the allegation on page 4 of the Office action that Huang of the alleged combination discloses "wherein each processing engine ... is to store a broadcast NZ element in its buffer for use in a subsequent cycle in response to the NZ bitmasks indicating a matching NZ element will arrive in the subsequent cycle, and not store a broadcast NZ element in its buffer in response to the NZ bitmasks indicating a matching NZ element will not arrive" (as in Applicant's independent claims 1, 9, and 17) from Figure 12 and paragraphs [0134]-[0135] of Huang is not "capable of instant and unquestionable demonstration as being well-known".”
Though fully considered, the Examiner respectfully disagrees. As the Examiner is not relying on official notice, the Applicant’s arguments are moot.

On page 12 of the response the Applicant argues, “Further, Huang of the alleged combination does not disclose "wherein each processing engine ... is to store a broadcast NZ element in its buffer for use in a subsequent cycle in response to the NZ bitmasks indicating a matching NZ element will arrive in the subsequent cycle, and not store a broadcast NZ element in its buffer in response to the NZ bitmasks indicating a matching NZ element will not arrive" as in Applicant's independent claims 1, 9, and 17 (emphasis added).”
Though fully considered, the Examiner respectfully disagrees. These arguments are essentially the same as those addressed above. The response above applies similarly here. Accordingly, the Applicant’s arguments are deemed unpersuasive.

On page 12 of the response the Applicant argues, “If the Office maintains the rejection of the claims, the Applicant requests the Examiner include an explicit analysis for any alleged obviousness (e.g., identifying any exemplary rationale used from MPEP §2143 and clearly articulating how the references would have been combined to disclose the Applicant's claims).”
Though fully considered, the Examiner respectfully disagrees. The Examiner notes that rationales for combining the references have been provided. See, e.g., pages 4-5 of the previous office action and above. Accordingly, the Applicant’s arguments are deemed unpersuasive.

On page 13 of the response the Applicant argues, “Because the Applicant has demonstrated the patentability of all pending independent claims, the Applicant respectfully submits that all pending claims are allowable. The Applicant's silence with respect to the dependent claims should not be construed as an admission by the Applicant that the Applicant is complicit with the Examiner's rejection of these claims. Because the Applicant has demonstrated the patentability of the independent claims, the Applicant need not substantively address the theories of rejection applied to the dependent claims.”
Though fully considered, the Examiner respectfully disagrees. The reasons set forth in the remarks and rejections presented above, including those regarding the independent claims, are applicable to these claims.

Conclusion
THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHAWN DOMAN whose telephone number is (571)270-5677.  The examiner can normally be reached on Monday through Friday 8:30am-6pm Eastern Time.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jyoti Mehta can be reached on 571-270-3995.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/SHAWN DOMAN/
Primary Examiner, Art Unit 2183