DETAILED ACTION
Claims 1-20 are pending.
The office acknowledges the following papers:
Claims and remarks filed on 4/21/2022.

Allowable Subject Matter
Claims 4 and 14 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Withdrawn objections and rejections
The 35 U.S.C. 112(a) rejections for claims 4 and 14 have been withdrawn.

New Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.

Claims 1-3, 5, 7, 9-13, 15, 17, and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Hagog et al. (U.S. 2014/0013075), in view of Ginzburg et al. (U.S. 2011/0153707), in view of Chung et al. (U.S. 10,167,800).
As per claim 1:
Hagog disclosed a processor comprising: 
a plurality of vector registers (Hagog: Figure 9 element 910, paragraph 142);
fetch circuitry to fetch a single instruction specifying a horizontal tile operation (Hagog: Figures 2 and 10B elements 201 and 1038, paragraphs 45 and 155)(The vector packed horizontal add/sub operation is fetched within the processor. The vector operation adds/subtracts data elements horizontally across the source vector register (i.e. horizontal tile operation).);
decode circuitry to decode the fetched instruction into a decoded instruction (Hagog: Figures 2 and 10B elements 203 and 1040, paragraphs 46 and 155); and 
execution circuitry to respond to the decoded instruction (Hagog: Figures 1-2 and 10B elements 207 and 1062, paragraphs 38, 48, and 156).
Hagog failed to teach a matrix operations accelerator comprising a single two-dimensional tile register, separate from the plurality of vector registers, to store a single two-dimensional M by N source matrix; K groups of elements of the single two-dimensional M by N source matrix, the single two-dimensional tile register to store the single two-dimensional M by N source matrix comprising the K groups of elements, and locations of K destinations; execution circuitry to respond to the decoded instruction to cause the matrix operations accelerator to generate K results, each result to be generated by performing the specified horizontal tile operation across every element of a corresponding group of the K groups of elements from the single two-dimensional tile register and write each generated result to a corresponding location of the K specified destination locations.
However, Chung and Ginzburg combined with Hagog disclosed a matrix operations accelerator (Chung: Figure 4 element 440, column 8 lines 43-67 continued to column 9 lines 1-3)(Ginzburg: Figure 4 element 136, paragraph 35)(Hagog: Figure 10B element 1060, paragraph 156)(The combination implements the matrix-vector unit (i.e. matrix operations accelerator) into Hagog for matrix processing.) comprising a single two-dimensional tile register, separate from the plurality of vector registers, to store a single two-dimensional M by N source matrix (Chung: Figure 4 elements 430, 442, and 452, column 8 lines 43-67 continued to column 9 lines 1-6)(Hagog: Figures 9 and 10B elements 910 and 1058, paragraphs 142 and 156)(The combination implements the matrix-vector unit (i.e. matrix operations accelerator) into Hagog for matrix processing. The matrix-vector unit includes a separate matrix register file from the vector register file present in Hagog. Individual registers within the matrix register file store single matrices (i.e. single two-dimensional tile register). Chung defines matrices as a 2D set of scalar elements.);
K groups of elements of the single two-dimensional M by N source matrix, the single two-dimensional tile register to store the single two-dimensional M by N source matrix comprising the K groups of elements, and locations of K destinations (Chung: Figure 4 elements 440 and 442, column 8 lines 43-67 continued to column 9 lines 1-6)(Ginzburg: Figure 3 elements 310-330, paragraphs 31-32)(Hagog: Figure 1, paragraph 38)(Ginzburg disclosed vector load and gather operations that load matrices from memory into registers by row and column layout. The combination allows for loading matrices from memory into the matrix register file. The combination allows for the vector packed horizontal add/subtract operations to execute on matrix data in the matrix vector unit using a single source matrix register. The source matrix register stores a 4x4 matrix with 4 groups of rows/columns. The combination allows for the instruction to specify a destination register to store execution results of the 4 groups. Chung defines matrices as a 2D set of scalar elements.);
execution circuitry to respond to the decoded instruction to cause the matrix operations accelerator to generate K results (Chung: Figure 4 elements 440 and 442, column 8 lines 43-67 continued to column 9 lines 1-3)(Ginzburg: Figure 3 elements 310-330, paragraphs 31-32)(Hagog: Figures 1-2 and 10B elements 207 and 1062, paragraphs 38, 48, and 156)(The combination allows for the vector packed horizontal add/subtract operations to execute on matrix data in the matrix vector unit. The vector packed horizontal add/subtract operation generates four results.), each result to be generated by performing the specified horizontal tile operation across every element of a corresponding group of the K groups of elements from the single two-dimensional tile register and write each generated result to a corresponding location of the K specified destination locations (Chung: Figure 4 elements 440 and 442, column 8 lines 43-67 continued to column 9 lines 1-3)(Ginzburg: Figure 3 elements 310-330, paragraphs 31-32)(Hagog: Figures 1-2 element 207, paragraphs 38 and 48)(The combination allows for the vector packed horizontal add/subtract operations to execute on matrix data from a single source matrix register in the matrix register file of the matrix vector unit. The combination allows for the instruction to specify a destination matrix register to store execution results of the 4 groups.).
The advantage of the vector load and gather instructions of Ginzburg is that matrices can be loaded from memory in row/column ordering for matrix processing operations. Thus, it would have been obvious to one of ordinary skill in the art at the time of the earliest effective filing date to implement the vector load and gather instructions of Ginzburg into Hagog to allow for matrix processing operations.
The advantage of the matrix vector unit with a separate matrix register file is that larger data sets can be stored for faster parallel processing. Thus, it would have been obvious to one of ordinary skill in the art at the time of the earliest effective filing date to implement the matrix vector unit and matrix register file of Chung into Hagog for the advantage of faster parallel processing.
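For orientation only, the data flow recited in claim 1 can be sketched in a few lines of Python. This is an illustrative sketch and is not code from Hagog, Ginzburg, Chung, or the application: it assumes row-wise grouping (K = M = 4) and addition as the horizontal operation, and the names `horizontal_tile_add`, `tile`, and `dest` are hypothetical.

```python
def horizontal_tile_add(tile):
    """Reduce each row group of a 2D tile to one scalar (assumed row-wise grouping)."""
    # One result per group; here each of the K groups is one row of the M x N tile.
    return [sum(row) for row in tile]

# Hypothetical 4x4 source matrix held in a single tile register.
tile = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]

# K = 4 results, written to K destination locations (modeled here as list slots).
dest = horizontal_tile_add(tile)
print(dest)  # [10, 26, 42, 58]
```

Each output slot corresponds to one group, so the number of results equals the number of groups.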
As per claim 2:
Hagog, Ginzburg, and Chung disclosed the processor of claim 1, wherein the horizontal tile operation is selectable between add, add-squares, multiply, maximum, minimum, logical AND, logical OR, and logical XOR (Hagog: Figures 1-2 element 201, paragraphs 38 and 45).
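As a rough illustration of what "selectable between" the listed operations could look like, the sketch below maps each operation named in claim 2 to a reduction over one group of elements. The table `HORIZONTAL_OPS`, the function `horizontal_op`, and the string keys are hypothetical and are not taken from any cited reference or from the application.

```python
from functools import reduce
import operator

# Hypothetical selector table covering the operations named in claim 2.
HORIZONTAL_OPS = {
    "add": lambda g: sum(g),
    "add_squares": lambda g: sum(x * x for x in g),
    "multiply": lambda g: reduce(operator.mul, g),
    "maximum": max,
    "minimum": min,
    "and": lambda g: reduce(operator.and_, g),
    "or": lambda g: reduce(operator.or_, g),
    "xor": lambda g: reduce(operator.xor, g),
}

def horizontal_op(op_name, group):
    """Apply the selected reduction across every element of one group."""
    return HORIZONTAL_OPS[op_name](group)

group = [3, 5, 6]
print(horizontal_op("add", group))          # 14
print(horizontal_op("add_squares", group))  # 70
print(horizontal_op("xor", group))          # 0
```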
As per claim 3:
Hagog, Ginzburg, and Chung disclosed the processor of claim 1, wherein each of the K generated results is a scalar value and each of the K specified destination locations is in a second single two-dimensional tile register, that is separate from the plurality of vector registers, of the matrix operations accelerator (Chung: Figure 4 elements 440 and 442, column 8 lines 43-67 continued to column 9 lines 1-3)(Ginzburg: Figure 3 elements 310-330, paragraphs 31-32)(Hagog: Figures 1-2 and 9 elements 207 and 910, paragraphs 38, 48, and 142)(The combination allows for the vector packed horizontal add/subtract operations to execute on matrix data from the matrix register file in the matrix vector unit. The combination allows for the instruction to specify a destination matrix register to store execution results of the 4 groups. Each individual execution result is a scalar value. The destination matrix register is separate from the vector register file of Hagog.).
As per claim 5:
Hagog, Ginzburg, and Chung disclosed the processor of claim 1, wherein each of the K generated results is a scalar value and each of the K specified destination locations is a register (Hagog: Figure 1, paragraphs 38 and 48)(Each data lane execution result is a scalar value written to a data element of the destination register.).
As per claim 7:
Hagog, Ginzburg, and Chung disclosed the processor of claim 1, wherein the single instruction further specifies M, N, an element size, and a data format of each of the elements of the specified single two-dimensional M by N source matrix (Chung: Figure 4 elements 440 and 442, column 8 lines 43-67 continued to column 9 lines 1-6)(Ginzburg: Figure 3 elements 310-330, paragraphs 31-32)(Hagog: Figures 3 and 8A elements 303-305, paragraphs 35, 43, 52-53, and 143)(The VPHADDSUB instruction encoding allows for specifying the data element size via a prefix. The VPHADDSUB instruction encoding also allows for floating-point data elements. The vector encoding format allows for specifying floating-point or integer data elements. The combination allows for the vector packed horizontal add/subtract operations to execute on matrix data in the matrix vector unit. The source vector register stores a 4x4 matrix with 4 groups of rows/columns. Thus, the data element size indicates the number of data elements in source registers and indirectly indicates the number of M and N (i.e. 4x4).).
As per claim 9:
Hagog, Ginzburg, and Chung disclosed the processor of claim 1, wherein the execution circuitry, in response to the decoded instruction, is to operate on J of the K groups of elements, where J is less than K (Hagog: Figures 1-2 element 207, paragraphs 32 and 48)(A writemask register allows for execution on fewer than all data elements. It would have been obvious to one of ordinary skill in the art that a writemask can be set so that one or more of the data lanes perform no operation/update of data within packed registers.).
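The effect of a writemask on the number of groups processed can be sketched as follows. This is only an illustration under stated assumptions: one mask bit per row group, addition as the operation, and masked-off destinations left unchanged. The names `masked_horizontal_add` and `group_mask` are hypothetical and do not come from Hagog.

```python
def masked_horizontal_add(tile, group_mask, dest):
    """Sum only the row groups whose mask bit is set; leave other destinations unchanged."""
    for k, (row, enabled) in enumerate(zip(tile, group_mask)):
        if enabled:
            dest[k] = sum(row)
    return dest

tile = [[1, 2], [3, 4], [5, 6], [7, 8]]  # K = 4 row groups
mask = [True, False, True, False]        # J = 2 groups processed, J < K
dest = masked_horizontal_add(tile, mask, [0, 0, 0, 0])
print(dest)  # [3, 0, 11, 0]
```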
As per claim 10:
Hagog, Ginzburg, and Chung disclosed the processor of claim 1, wherein the specified horizontal tile operation is non-commutative, and wherein the execution circuitry is to perform the specified horizontal tile operation on elements of each group in a predetermined order (Ginzburg: Figures 5-6, paragraphs 36-37)(Hagog: Figure 1, paragraph 38)(Ginzburg disclosed vector multiply-add operations. The combination allows for Hagog to additionally perform a multiplication with add operation on a second vector source register. Multiply-add operations are non-commutative as the result changes based on performing the multiply or the add operation first.).
The advantage of the vector MAC instruction of Ginzburg is that the 2D matrix operation can be performed through a single instruction, instead of a sequence of instructions, which has the benefit of reduced code size. Thus, it would have been obvious to one of ordinary skill in the art at the time of the earliest effective filing date to implement the vector 2D matrix instruction of Ginzburg into the processor of Hagog.
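Why a predetermined order matters for a non-commutative operation can be shown with a small sketch. Subtraction is used here purely as a stand-in for a non-commutative operation; it is not the multiply-add operation of Ginzburg, and the left-to-right fold and the name `ordered_reduce` are assumptions made for illustration.

```python
from functools import reduce
import operator

def ordered_reduce(op, group):
    """Fold a group strictly left to right (one assumed 'predetermined order')."""
    return reduce(op, group)

group = [10, 3, 2]
print(ordered_reduce(operator.sub, group))                  # 5
print(ordered_reduce(operator.sub, list(reversed(group))))  # -11
```

The same elements yield different results when the traversal order changes, whereas a commutative and associative operation such as integer addition would give 15 in both cases.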
As per claim 11:
Claim 11 essentially recites the same limitations of claim 1. Therefore, claim 11 is rejected for the same reasons as claim 1.
As per claim 12:
The additional limitation(s) of claim 12 basically recite the additional limitation(s) of claim 2. Therefore, claim 12 is rejected for the same reason(s) as claim 2.
As per claim 13:
The additional limitation(s) of claim 13 basically recite the additional limitation(s) of claim 3. Therefore, claim 13 is rejected for the same reason(s) as claim 3.
As per claim 15:
The additional limitation(s) of claim 15 basically recite the additional limitation(s) of claim 5. Therefore, claim 15 is rejected for the same reason(s) as claim 5.
As per claim 17:
The additional limitation(s) of claim 17 basically recite the additional limitation(s) of claim 7. Therefore, claim 17 is rejected for the same reason(s) as claim 7.
As per claim 19:
The additional limitation(s) of claim 19 basically recite the additional limitation(s) of claim 9. Therefore, claim 19 is rejected for the same reason(s) as claim 9.
As per claim 20:
The additional limitation(s) of claim 20 basically recite the additional limitation(s) of claim 10. Therefore, claim 20 is rejected for the same reason(s) as claim 10.

Claims 6, 8, 16, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Hagog et al. (U.S. 2014/0013075), in view of Ginzburg et al. (U.S. 2011/0153707), in view of Chung et al. (U.S. 10,167,800), further in view of Official Notice.
As per claim 6:
Hagog, Ginzburg, and Chung disclosed the processor of claim 1, wherein each of the K generated results is a scalar value plus metadata, and each of the K specified destination locations is one register to store the scalar value and one register, separate from the one register to store the scalar value, to store the metadata (Hagog: Figure 1, paragraphs 38 and 48)(Each data lane execution result is a scalar value written to a data element of the destination register. Official notice is given that signed scalar values include a sign bit (i.e. metadata) within the scalar value for the advantage of processing positive and negative data. Thus, it would have been obvious to one of ordinary skill in the art to implement sign bits within the destination data elements. It would have been obvious to one of ordinary skill in the art that data values and sign bits can be stored separately. In addition, according to “In re Japikse” (181 F.2d 1019, 86 USPQ 70 (CCPA 1950)), shifting the location of parts doesn’t give patentability over prior art.).
As per claim 8:
Hagog, Ginzburg, and Chung disclosed the processor of claim 1, wherein the execution circuitry, when generating each of the K results, is to perform the specified horizontal tile operation across every element of the corresponding group of the K groups of elements in a single clock cycle (Hagog: Figures 1-2 and 10A elements 207 and 1016, paragraphs 48 and 153)(Official notice is given that pipeline stage lengths are completed in a single clock cycle for the advantage of increased performance. Thus, it would have been obvious to one of ordinary skill in the art that the execution stage for the vector horizontal add instruction takes a single clock cycle.).
As per claim 16:
The additional limitation(s) of claim 16 basically recite the additional limitation(s) of claim 6. Therefore, claim 16 is rejected for the same reason(s) as claim 6.
As per claim 18:
The additional limitation(s) of claim 18 basically recite the additional limitation(s) of claim 8. Therefore, claim 18 is rejected for the same reason(s) as claim 8.

Response to Arguments
The arguments presented by Applicant in the response received on 4/21/2022 are considered partially persuasive.
Applicant argues:
“As noted in MPEP 707.07(d), "[t]he examiner should, as a part of the first Office action on the merits, identify any claims which he or she judges, as presently recited, to be allowable and/or should suggest any way in which he or she considers that rejected claims may be amended to make them allowable." Applicant respectfully requests suggestions regarding allowability including claim amendment suggestions. Providing this prescribed guidance is in the interest of compact prosecution.”

In view of applicant’s persuasive remarks, claims 4 and 14 would now be allowable if rewritten in independent form.
In addition, the examiner notes that making minor amendments to claims 7 and 17 to indicate each claimed element is an instruction field of the single instruction would likely overcome the combination. This is due to the rejection relying upon the data element size field to implicitly indicate the number of rows and columns of the source matrix data.
Applicant argues for claims 4 and 14:
“For example, the Applicant's published application at paragraph [0187] (with the heading of "'Groups to Process") recites "Disclosed embodiments perform the TILEHOP instructions on groups of elements within a matrix (tile). A group of elements may consist of all elements within a matrix (tile), or a matrix (tile) may be partitioned row-wise, column-wise, or into sub-tiles. Other partitions are possible, but not described for sake of simplicity." (emphasis added). 
The Applicant's published application at paragraph [0222] recites "As shown, instruction 2400 further includes several additional, optional fields: order of operations 2408, which groups to process' 2410, data element size 2412, data element format 2414, M (number of rows) 2416, N (number of columns), K (number of groups of elements) 2420, and group size 2422." (emphasis added). 
The Applicant's published application at paragraph [0191] (with the heading of "Groups: Row-Wise") recites "In some embodiments, disclosed embodiments process a TILEHOP instruction specifying row-wise groups of elements within a matrix (tile)." (emphasis added). Further, the Applicant's published application at paragraph [0217] recites "FIG. 23B is exemplary pseudocode describing an embodiment of a processor executing a TILEHOP instruction specifying a row-wise horizontal ADD operation" (emphasis added). 
The Applicant's published application at paragraph [0195] (with the heading of "Groups: Column-Wise") recites "In some embodiments, disclosed embodiments process a TILEHOP instruction specifying column-wise groups of elements within a matrix (tile). Each column of the matrix (tile) is sometimes treated as a group." (emphasis added). Further, the Applicant's published application at paragraph [0218] recites "FIG. 23C is exemplary pseudocode describing an embodiment of a processor executing a TILEHOP instruction specifying a column-wise horizontal ADD operation." (emphasis added).”

This argument is found to be persuasive for the following reason. The examiner agrees that these paragraphs provide sufficient written description support for field 2410 in figure 24 being the claimed field. Thus, the 35 U.S.C. 112(a) rejections for claims 4 and 14 have been withdrawn.
Applicant argues for claims 1 and 11:
“As another example, page 13 of the Office action alleges that "The combination further adds the matrix register file of Chung to map to the 'two-dimensional tile register' to store the 2D matrices loaded from memory by Ginzburg". However, the alleged combination (e.g., Chung of the alleged combination) does not each or suggest "a matrix operations accelerator comprising a single two-dimensional tile register, separate from the plurality of vector registers, to store a single two-dimensional M by N source matrix" as recited, inter alia, in Applicant's amended independent claim 1 (and similarly in amended independent claim 11) (emphasis added). For example, column 10, lines 17-21 of Chung of the alleged combination merely recite "The fourth parameter may be the size of the matrix register file (NRF SIZE), which stores a given number of HWVECELEMSxHWVECELEMS matrices in an on-chip memory corresponding to the NFU (e.g., fast on-chip BRAM (see description later)." and column 10, lines 43-46 of Chung of the alleged combination merely recites "First, a matrix register file may be used to store MRF SIZE HWVECELEMSxHWVECSELEMS matrices in a series of fast on-chip random access memories (e.g., [block random access memories] in an FPGA).””  

This argument is not found to be persuasive for the following reason. The combination implements the matrix-vector unit of Chung into the processor of Hagog to perform matrix operations. The matrix-vector unit includes a matrix register file, which includes individual registers to hold matrix data. Thus, an individual matrix register within the matrix register file reads upon the amended claim limitations.
Applicant argues regarding the official notices taken:
“Also, any allegations of Official notice are hereby traversed. The Office action on page 14 alleges that "Applicant's response hasn't included why the noticed fact isn't considered well-known in the art." That is incorrect. Applicant's previous response specifically alleged the following (and maintains that traversal herein): 

Applicant submits that these allegations are not "capable of instant and unquestionable demonstration as being well-known" and are thus improper (see, e.g., MPEP §2144.03). "It is never appropriate to rely solely on common knowledge in the art without evidentiary support in the record as the principal evidence upon which a rejection was based." See Zurko, 258 F.3d at 1386, 59 USPQ2d at 1697; Ahlert, 424 F.2d at 1092, 165 USPQ 421. For example, the Applicant submits that the allegation that a sign bit is metadata is not "capable of instant and unquestionable demonstration as being well-known" in reference to the Official notice for claims 6 and 16. As another example, the Applicant submits that the allegation that pipeline stage lengths are completed in a single clock cycle is not "capable of instant and unquestionable demonstration as being well-known" in reference to the Official notice for claims 8 and 18. Additionally, Applicant's claims 8 and 18 do not recite a pipeline stage. (Emphasis added.)

The MPEP offers no examples of "specifically pointing out the supposed errors", other than in MPEP §2144.03(C) stating "A general allegation that the claims define a patentable invention without any reference to the examiner's assertion of official notice would be inadequate." (emphasis added). Applicant clearly made reference to, and specifically traversed, the underlined allegations of Official notice in the above quote from the previous response.”  

This argument is not found to be persuasive for the following reason. MPEP 2144.03 C states “To adequately traverse such a finding, an applicant must specifically point out the supposed errors in the examiner’s action, which would include stating why the noticed fact is not considered to be common knowledge or well-known in the art … A general allegation that the claims define a patentable invention without any reference to the examiner’s assertion of official notice would be inadequate.” Applicant’s response hasn’t included why the noticed fact isn’t considered well-known in the art. Thus, the official notices taken are maintained.
In this instance, sign bits attached to data values (e.g. scalar values) are overwhelmingly known to one of ordinary skill in the art. Additionally, pipeline stages that take a single clock cycle are overwhelmingly known to one of ordinary skill in the art. Both can likely be found in most introductory college computer architecture textbooks.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
The following is text cited from 37 CFR 1.111(c): In amending in reply to a rejection of claims in an application or patent under reexamination, the applicant or patent owner must clearly point out the patentable novelty which he or she thinks the claims present in view of the state of the art disclosed by the references cited or the objections made. The applicant or patent owner must also show how the amendments avoid such references or objections.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JACOB A. PETRANEK whose telephone number is (571)272-5988.  The examiner can normally be reached on M-F 8:00-4:30.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Aimee Li can be reached on (571) 272-4169.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/JACOB PETRANEK/Primary Examiner, Art Unit 2183