DETAILED ACTION

Status of Application
Claims 1-20 are pending in the present application.
The Double Patenting rejection is withdrawn based on the terminal disclaimer filed 11/29/2021.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 11/29/2021 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.


Response to Arguments
Applicant’s arguments with respect to claim(s) 1, 9, and 15 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Terminal Disclaimer
The terminal disclaimer filed on 11/29/2021 disclaiming the terminal portion of any patent granted on this application which would extend beyond the expiration date of U.S. Patent No. 10,776,110 B2 has been reviewed and is accepted.  The terminal disclaimer has been recorded.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 5, 9, 15, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Meixner et al (hereinafter Meixner), U.S. Publication No. 2018/0005347 A1, in view of Juffa et al (hereinafter Juffa), U.S. Publication No. 2007/0271325 A1.
	Referring to claims 1, 9, and 15, taking claim 1 as exemplary, Meixner discloses a processor comprising:
front end circuitry [figs. 9a, 10] to schedule matrix operations responsive to a matrix multiplication instruction [paragraph 95, a RISC-like instruction set whose supported arithmetic instruction opcodes include any workable combination of the following…5) MAD (multiply operands A and B and add C to resultant)];
a plurality of lanes to perform parallel execution of the matrix operations, wherein a lane comprises an arithmetic logic unit [paragraph 121, In a further embodiment, the VLIW format includes both an ALU opcode that directs a mathematical function performed by each execution lane's ALU; paragraph 110, “arrays of execution lanes operate in unison to simultaneously process the image data”] to multiply a block of a first matrix with a block of a second matrix to generate a product [paragraph 182, With 
Meixner does not explicitly disclose broadcast circuitry to broadcast one or more invariant matrix blocks including the block of the first matrix to at least one of different registers of the lane and different registers across different lanes.
However, Juffa discloses broadcast circuitry to broadcast one or more invariant matrix blocks to at least one of different registers within the lane and different registers across different lanes [paragraph 6, fig. 1C; performing the multiplication of two matrices in such a way that in a given step of the matrix multiplication, a group of T vector lanes share one of the two source operands to their respective multiply-add operations. This is exploited by the inclusion of an operand broadcast mechanism within the multi-
One of ordinary skill in the art before the effective filing date of the claimed invention would have clearly recognized that it is quite advantageous for the processor of Meixner to reduce memory bandwidth requirements for matrix multiplication. It is for this reason one of ordinary skill in the art would have been motivated to implement broadcast circuitry to broadcast one or more invariant matrix blocks including the block of the first matrix to at least one of different registers of the lane and different registers across different lanes.
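The memory bandwidth rationale stated above can be illustrated with a minimal software sketch. It is not taken from Meixner or Juffa and does not model either reference's hardware; the names blocked_matmul, get_block, and matmul_block, and the counter a_block_loads, are hypothetical and chosen only for illustration. The sketch assumes each column block of the second matrix is handled by one lane, so that a block of the first matrix is fetched once and reused by every lane.

```python
# Illustrative sketch only: all names here are hypothetical, not from any cited reference.

def get_block(m, row, col, size):
    """Return the sub-block of m starting at (row, col), at most size x size."""
    return [r[col:col + size] for r in m[row:row + size]]


def matmul_block(a, b):
    """Multiply two small dense blocks given as lists of rows."""
    inner = len(b)
    return [[sum(a[r][t] * b[t][c] for t in range(inner))
             for c in range(len(b[0]))]
            for r in range(len(a))]


def blocked_matmul(a, b, block=2):
    """Compute a x b block by block, counting fetches of blocks of a.

    Assumption: each column block j of b stands in for one lane. The block
    of a is fetched once per (i, t) pair and shared by all lanes.
    """
    n, k, m = len(a), len(b), len(b[0])
    c = [[0] * m for _ in range(n)]
    a_block_loads = 0
    for i in range(0, n, block):
        for t in range(0, k, block):
            a_blk = get_block(a, i, t, block)  # fetched once, then shared
            a_block_loads += 1
            for j in range(0, m, block):       # one iteration per lane
                product = matmul_block(a_blk, get_block(b, t, j, block))
                for r, row in enumerate(product):
                    for col, value in enumerate(row):
                        c[i + r][j + col] += value  # accumulate into result
    return c, a_block_loads


if __name__ == "__main__":
    result, loads = blocked_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]], block=1)
    print(result, loads)
```

In this sketch, with 1 x 1 blocks and two lanes, the block of the first matrix is fetched four times in total; fetching it separately for each lane would take eight fetches.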
Referring to claims 5 and 20, taking claim 5 as exemplary, the modified Meixner discloses the processor of claim 1, wherein each lane further comprises multipliers to multiply matrix blocks of the first and second matrices, and wherein one or more adders and accumulators to accumulate results from the multiplication [Meixner, paragraph 182, With the data in matrices A and B being realigned from the shearing algorithms, as observed in FIG. 20b, a multiply operation is performed where each execution lane multiplies the A and B values in its corresponding two dimensional shift register space; “The resultant of the multiplication is stored in local R2 space. Null values may be loaded as an initial condition into R3 space and the resultant of the .
Claims 2, 10, and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Meixner, in view of Juffa, as applied to claims 1, 9, and 15 above, and further in view of Coon et al (hereinafter Coon), U.S. Patent No. 8,108,625 B1.
Referring to claims 2, 10, and 16, taking claim 2 as exemplary, the modified Meixner does not explicitly disclose the processor of claim 1, wherein the one or more invariant matrix blocks are accessed by multiple threads within the lane or across multiple lanes.
However, Coon discloses wherein the one or more invariant matrix blocks are accessed by multiple threads within the lane [col. 7, lines 22-26, each thread in a different portion of its assigned lane; hence multiple threads within the lane] or across multiple lanes, in order to provide low latency and support for multiple parallel access operations [col. 2, lines 1-3].
One of ordinary skill in the art before the effective filing date of the claimed invention would have clearly recognized that it is quite advantageous for the processor of the modified Meixner to provide low latency and support for multiple parallel access operations. It is for this reason one of ordinary skill in the art would have been motivated to implement wherein the one or more invariant matrix blocks are accessed by multiple threads within the lane or across multiple lanes.
Claims 3, 11, and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Meixner, in view of Juffa, as applied to claims 1, 9, and 15 above, and further in view of Wang et al (hereinafter Wang), U.S. Publication No. 2016/0233850 A1.
Referring to claims 3, 11, and 17, taking claim 3 as exemplary, the modified Meixner does not explicitly disclose the processor of claim 1, wherein the broadcast circuitry comprises a data cache coupled to all of the lanes, and wherein the data cache is to transmit the same data to all of the lanes.
However, Wang discloses wherein the broadcast circuitry comprises a data cache coupled to all of the lanes, and wherein the data cache is to transmit the same data to all of the lanes [paragraphs 11, 73, reads the data from the coefficient buffer broadcast device 30 as operands for multiplying operation by the vector multiplier unit 401; The coefficient buffer broadcast device 30 is configured to cache the filter coefficients as read from the multi-granularity filter coefficient storage unit 102, and broadcast the cached data], in order to provide reduced number of accesses and improved data usage efficiency [paragraph 27].
One of ordinary skill in the art before the effective filing date of the claimed invention would have clearly recognized that it is quite advantageous for the processor of the modified Meixner to provide reduced number of accesses and improved data usage efficiency. It is for this reason one of ordinary skill in the art would have been motivated to implement wherein the broadcast circuitry comprises a data cache coupled to all of the lanes, and wherein the data cache is to transmit the same data to all of the lanes.
Claims 7, 13, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Meixner, in view of Juffa, as applied to claims 1, 9, and 15 above, and further in view of Ginzburg et al (hereinafter Ginzburg), U.S. Publication No. 2011/0153707 A1.
Referring to claims 7, 13, and 19, taking claim 7 as exemplary, the modified Meixner does not explicitly disclose the processor of claim 1, wherein the matrix multiplication instruction specifies memory locations from which the first, second, and third matrices are accessed.
However, Ginzburg discloses wherein the matrix multiplication instruction specifies memory locations from which the first, second, and third matrices are accessed [paragraph 28, For example, for a 2D matrix multiply-add operation, the instruction includes SRC1, SRC2, SRC3, and DEST register addresses. SRC1 is the address of the first source register. SRC2 is the address of the second source register. SRC3 is the address of the third source register. DEST is the address of the destination register where the result data is stored. In some implementations, the storage location referenced by SRC 1 is also used to store the result data and is referred to as SRC1/DEST], in order to provide a single instruction resulting in significant performance advantages [paragraph 3].
One of ordinary skill in the art before the effective filing date of the claimed invention would have clearly recognized that it is quite advantageous for the processor of the modified Meixner to provide a single instruction resulting in significant performance advantages. It is for this reason one of ordinary skill in the art would have been motivated to implement wherein the matrix multiplication instruction specifies memory locations from which the first, second, and third matrices are accessed.
Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Meixner, in view of Juffa, as applied to claim 1 above, and further in view of Hansen et al (hereinafter Hansen), U.S. Patent No. 6,295,599 B1.
Referring to claim 8, the modified Meixner does not explicitly disclose the processor of claim 1, wherein matrix multiplication instruction specifies sizes of the blocks of the matrices to perform the matrix operations.
However, Hansen discloses wherein matrix multiplication instruction specifies sizes of the blocks of the matrices to perform the matrix operations [col. 10, lines 15-26, The Wide Multiply Matrix instructions use a wide operand to specify a matrix of values of width up to 64 bits (one half of register file and data path width) and depth of up to 128 bits/symbol size; The width and depth of the matrix can be selected by specifying the size and shape of the wide operand as described above], in order to provide improved performance and efficient handling of operands of greater width than system memory or general purpose register [col. 2, lines 44-57].
One of ordinary skill in the art before the effective filing date of the claimed invention would have clearly recognized that it is quite advantageous for the processor of the modified Meixner to provide improved performance and efficient handling of operands of greater width than system memory or general purpose register. It is for this reason one of ordinary skill in the art would have been motivated to implement wherein matrix multiplication instruction specifies sizes of the blocks of the matrices to perform the matrix operations.
Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over Meixner, in view of Juffa, as applied to claim 9 above, and further in view of Sideris et al (hereinafter Sideris), U.S. Publication No. 2015/0301826 A1.
Referring to claim 14, the modified Meixner does not explicitly disclose the method of claim 9, wherein the matrix multiplication instruction comprises a macroinstruction, and wherein the macroinstruction is decoded into sets of microoperations to be executed in parallel either within the lane or across different lanes.
However, Sideris discloses wherein the matrix multiplication instruction comprises a macroinstruction, and wherein the macroinstruction is decoded into sets of microoperations to be executed in parallel either within the lane or across different lanes [paragraphs 6, 53, micro-operations to be executed in parallel], in order to provide improved performance and/or reduced energy consumption [paragraph 4].
One of ordinary skill in the art before the effective filing date of the claimed invention would have clearly recognized that it is quite advantageous for the method of the modified Meixner to provide improved performance and/or reduced energy consumption. It is for this reason one of ordinary skill in the art would have been motivated to implement wherein the matrix multiplication instruction comprises a macroinstruction, and wherein the macroinstruction is decoded into sets of microoperations to be executed in parallel either within the lane or across different lanes.

Allowable Subject Matter
Claims 4, 6, 12, and 18 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
The following is a statement of reasons for the indication of allowable subject matter: The prior art of record taken alone or in combination fails to teach and/or fairly suggest the broadcast circuitry is to broadcast invariant matrix blocks of the first matrix to different registers within the lane and to broadcast invariant matrix blocks of the second matrix to different registers across different lanes, in combination with other recited limitations in claim 4.
The prior art of record taken alone or in combination fails to teach and/or fairly suggest wherein interconnections between the multipliers and registers are reconfigured in response to different matrix dimensions of the first, second, or third matrices, in combination with other recited limitations in claim 6.
The prior art of record taken alone or in combination fails to teach and/or fairly suggest wherein the broadcasting comprises broadcasting invariant matrix blocks of the first matrix to different tile registers within the lane and to broadcast invariant matrix blocks of the second matrix to different tile registers across different lanes, in combination with other recited limitations in claim 12.
The prior art of record taken alone or in combination fails to teach and/or fairly suggest wherein the broadcasting comprises broadcasting invariant matrix blocks of the first matrix to different registers within the lane and to broadcast invariant matrix blocks of the second matrix to different tile registers across different lanes, in combination with other recited limitations in claim 18.


Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 



Any inquiry concerning this communication or earlier communications from the examiner should be directed to FARLEY J ABAD whose telephone number is (571)270-3425. The examiner can normally be reached M-Th 6:30 - 3:00 PM; Fri 7:30 - 4:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Idriss Alrobaye, can be reached at (571) 270-1023. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/Farley Abad/Primary Examiner, Art Unit 2181