DETAILED ACTION

Status of Application
Claims 1-11 are pending in the present application.

Response to Arguments
Applicant’s arguments with respect to claim(s) 1 and 6 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
With regard to claims 2-4, 7-9, and 11, applicant has not provided specific arguments as to how the claims define a patentable invention by specifically pointing out how the language of the claims patentably distinguishes them from the references. Claims 2-4, 7-9, and 11 remain rejected.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Objections
Claims 5 and 10 are objected to because of the following informalities: Claims 5 and 10 recite “wherein the elements of the rows the A matrix”; however, this limitation is missing the word “of” and should read “wherein the elements of the rows of the A matrix”. Appropriate correction is required.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-4, 6-9, and 11 are rejected under 35 U.S.C. 103 as being unpatentable over Ould-Ahmed-Vall et al. (hereinafter Ould-Ahmed-Vall), U.S. Publication No. 2020/0210188 A1, in view of Mimar, U.S. Patent No. 7,873,812 B1.
	Referring to claim 1, Ould-Ahmed-Vall discloses a method comprising:
receiving, by a processor [fig. 13], elements of rows of an m x n matrix (A matrix) and elements of columns of a n x p matrix (B matrix) [paragraphs 28, 183, 188-192; figs. 21A, 21B; execution of a row-wise matrix (tile) permute instruction can be performed where the processor receives elements of the source matrix shown in fig. 21A, “where M and N may be indicated by fields of the instruction, and may be selected from a wide range of values, including 2, 4, 8, 16, 32, etc.”; in this example, the source matrix for row-wise permutation can be a 4 x 8 matrix A (m x n); execution of a column-wise matrix (tile) permute instruction can be performed where the processor receives elements of a different source matrix shown in fig. 21B, where the dimensions of the source matrix shown in fig. 21B can also be selected from a wide range of values, including 2, 4, 8, 16, 32, etc.; in this example, the source matrix for column-wise permutation can be an 8 x 4 matrix B (n x p)];

Ould-Ahmed-Vall does not explicitly disclose performing, by the processor in response to a vector matrix multiply instruction, multiplying the A matrix and the B matrix to generate elements of an m x p matrix (R matrix); and
storing the elements of the R matrix in a storage location specified by the vector matrix multiply instruction.
However, Mimar discloses performing, by the processor in response to a vector matrix multiply instruction [col. 8, lines 4-6, instruction of special interest for matrix multiplication is vector-multiply instruction; fig. 6, VMUL instruction], multiplying the A matrix and the B matrix to generate elements of an m x p matrix (R matrix) [figs. 8-9, multiplication of a 4 x 4 matrix (m x n) and a 4 x 1 matrix (n x p) = 4 x 1 matrix (m x p); the examiner notes that the dimensions of the source matrices are not limited and that Mimar discloses the notation “m-by-n matrix and n-by-p matrix” (see col. 5, lines 48-50)]; and

One of ordinary skill in the art before the effective filing date of the claimed invention would have recognized that it would be advantageous for the method of Ould-Ahmed-Vall to provide flexibility while efficiently performing matrix multiplications. For this reason, one of ordinary skill in the art would have been motivated to implement performing, by the processor in response to a vector matrix multiply instruction, multiplying the A matrix and the B matrix to generate elements of an m x p matrix (R matrix); and storing the elements of the R matrix in a storage location specified by the vector matrix multiply instruction.
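For illustration only, the dimensional relationship relied upon above (an m x n matrix multiplied by an n x p matrix yields an m x p matrix) can be sketched in software. The following is a minimal sketch and is not taken from either reference; the function name `mat_mul` and the operand values are hypothetical, and the example merely uses the 4 x 4 by 4 x 1 dimensions noted with respect to figs. 8-9 of Mimar.

```python
def mat_mul(a, b):
    """Multiply an m x n matrix by an n x p matrix, returning an m x p matrix.

    Matrices are plain lists of rows. This is an illustrative software model
    only; it does not represent the hardware of any cited reference.
    """
    m, n = len(a), len(a[0])
    if len(b) != n:
        raise ValueError("inner dimensions must agree: A is m x n, B must be n x p")
    p = len(b[0])
    # R[i][j] is the dot product of row i of A and column j of B.
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(p)]
            for i in range(m)]


# Hypothetical operands: a 4 x 4 matrix (m x n) and a 4 x 1 matrix (n x p).
a = [[1, 0, 0, 0],
     [0, 2, 0, 0],
     [0, 0, 3, 0],
     [0, 0, 0, 4]]
b = [[1], [1], [1], [1]]

r = mat_mul(a, b)  # 4 x 1 result (m x p)
print(r)
```

The result has one row per row of A and one column per column of B, which is the m x p shape recited for the R matrix.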
Referring to claims 2 and 7, taking claim 2 as exemplary, the modified Ould-Ahmed-Vall discloses the method of claim 1, wherein m = 4, n = 8, and p = 4 [Ould-Ahmed-Vall, paragraphs 28, 183, 188-192; figs. 21A, 21B; execution of a row-wise matrix (tile) permute instruction can be performed where the processor receives elements of the source matrix shown in fig. 21A, “where M and N may be indicated by fields of the instruction, and may be selected from a wide range of values, including 2, 4, 8, 16, 32, etc.”; in this example, the source matrix for row-wise permutation can be a 4 x 8 matrix A (m x n); execution of a column-wise matrix (tile) permute instruction can be performed where the processor receives elements of a different source matrix shown in fig. 21B, where the dimensions of the source matrix shown in fig. 21B can also be selected from a wide range of values, including 2, 4, 8, 16, 32, etc.; in this example, the source matrix for column-wise permutation can be an 8 x 4 matrix B (n x p)].
Referring to claims 3 and 8, taking claim 3 as exemplary, the modified Ould-Ahmed-Vall discloses the method of claim 1, wherein multiplying further comprises generating the elements of the R matrix using vector multiplication units [Mimar, fig. 2, OP elements 251] comprised in a vector data path [Mimar, fig. 2] of the processor and configured to perform vector multiply operations [Mimar, fig. 6, VMUL instruction for vector matrix multiply], wherein each vector multiplication unit comprises a slice multiply component [Mimar, fig. 4, unit 251 comprises ALU and MULTIPLIER component] for each slice of the vector data path [Mimar, fig. 2, in the case of fig. 2, each element of register 210 proceeds through the data path in a slice (N-1 elements would equal N-1 slices)], and wherein each slice multiply component generates a respective element of the R matrix [Mimar, fig. 2, result of multiply operation in 251 is eventually sent to destination register].
Referring to claims 4 and 9, taking claim 4 as exemplary, the modified Ould-Ahmed-Vall discloses the method of claim 3, further comprising:
mapping the elements of the rows of the A matrix and the elements of columns of the B matrix to each slice multiplication component based on the respective element of the R matrix to be generated by the slice multiplication component [Mimar, fig. 2, see Vector Element Mapping Logic 230 and 240; the examiner notes that the elements of the source matrices are represented as elements in the source registers, e.g. 230 and 240 mapping elements of source matrix A (210) and elements of source matrix B (220) to each component 251].
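For illustration only, the mapping limitation discussed above can be modeled in software as follows. This is a minimal sketch under stated assumptions and does not represent the circuitry of Mimar or Ould-Ahmed-Vall; the names `map_operands`, `slice_multiply`, and `vector_matrix_multiply` and the operand values are hypothetical. The sketch assumes one slice multiply component per element of the R matrix and uses the m = 4, n = 8, p = 4 dimensions of claim 2.

```python
def map_operands(a, b):
    """Map, for each element R[i][j], row i of A and column j of B to one slice.

    Returns a dict keyed by the (i, j) position of the R element that the
    slice is assumed to generate. Hypothetical model, not a cited design.
    """
    m, p = len(a), len(b[0])
    columns = [list(col) for col in zip(*b)]  # columns of B as lists
    return {(i, j): (a[i], columns[j]) for i in range(m) for j in range(p)}


def slice_multiply(row, col):
    """Model of one slice: multiply paired elements and accumulate the sum."""
    return sum(x * y for x, y in zip(row, col))


def vector_matrix_multiply(a, b):
    """Generate each element of R from the operands mapped to its slice."""
    m, p = len(a), len(b[0])
    r = [[0] * p for _ in range(m)]
    for (i, j), (row, col) in map_operands(a, b).items():
        r[i][j] = slice_multiply(row, col)
    return r


# Hypothetical operands: A is 4 x 8 (m x n), B is 8 x 4 (n x p).
A = [[i * 8 + k for k in range(8)] for i in range(4)]
B = [[1 if k % 4 == j else 0 for j in range(4)] for k in range(8)]

mapping = map_operands(A, B)
R = vector_matrix_multiply(A, B)  # 4 x 4 result (m x p)
print(len(mapping), R)
```

In this model, the mapping is determined by which element of the R matrix a slice is to generate, so a 4 x 4 result uses sixteen (row, column) operand pairs.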
Referring to claim 6, Ould-Ahmed-Vall discloses a processor [fig. 13] comprising:

Ould-Ahmed-Vall does not explicitly disclose an instruction decoder configured to decode a vector matrix multiply instruction; 

However, Mimar discloses an instruction decoder [col. 10, lines 55-59, “Instruction Decode”] configured to decode a vector matrix multiply instruction [col. 8, lines 4-6, instruction of special interest for matrix multiplication is vector-multiply instruction; fig. 6, VMUL instruction]; 
and vector matrix multiplication logic [figs. 2 and 4, computational unit 251] configured to multiply, responsive to the vector matrix multiply instruction, the A matrix and the B matrix to generate elements of an m x p matrix (R matrix) [figs. 8-9, multiplication of a 4 x 4 matrix (m x n) and a 4 x 1 matrix (n x p) = 4 x 1 matrix (m x p)], in order to provide the flexibility while efficiently performing matrix multiplications [col. 4, lines 53-60].
One of ordinary skill in the art before the effective filing date of the claimed invention would have recognized that it would be advantageous for the processor of Ould-Ahmed-Vall to provide flexibility while efficiently performing matrix multiplications. For this reason, one of ordinary skill in the art would have been motivated to implement an instruction decoder configured to decode a vector matrix multiply instruction; and vector matrix multiplication logic configured to multiply, responsive to the vector matrix multiply instruction, the A matrix and the B matrix to generate elements of an m x p matrix (R matrix).
Referring to claim 11, the modified Ould-Ahmed-Vall discloses the processor of claim 6, wherein the processor is a digital signal processor (DSP) [Ould-Ahmed-Vall, paragraph 330].

Allowable Subject Matter
Claims 5 and 10 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
The following is a statement of reasons for the indication of allowable subject matter:  The prior art of record taken alone or in combination fails to teach and/or fairly suggest wherein the elements of the rows of the A matrix and the elements of the columns of the B matrix are provided by a streaming engine, and wherein the mapping is performed by the permute network coupled between the streaming engine and the vector multiplication components, in combination with other recited limitations in claim 5.
The prior art of record taken alone or in combination fails to teach and/or fairly suggest wherein the elements of the rows of the A matrix and the elements of the columns of the B matrix are provided by a streaming engine, and wherein the elements of the rows of the A matrix and the elements of the columns of the B matrix are mapped by the permute network coupled between the streaming engine and the vector multiplication units, in combination with other recited limitations in claim 10.



Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
Tang et al., U.S. Publication No. 2007/0089019 A1, discloses “An SxS Benes network and sorting Benes network can be used to cyclically permute any input of dimensions” [paragraph 122].
Woo et al., U.S. Patent No. 9,959,247 B1, discloses permuting in a matrix-vector processor [Abstract].

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to FARLEY J ABAD whose telephone number is (571)270-
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Idriss Alrobaye, can be reached at (571) 270-1023. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/Farley Abad/Primary Examiner, Art Unit 2181