Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
Examiner notes the entry of the following papers:
Amended claims and specification filed 3/15/2021
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Drawings
The drawings are acceptable for the purposes of examination.


Specification
The specification has not been checked to the extent necessary to determine the presence of all possible minor errors. Applicant's cooperation is requested in correcting any errors the applicant may become aware of in the specification.
Claim Objections
In claim 21 lines 2-3, “is configured is configured” should be replaced with “is configured”.
Claim Rejections - 35 USC § 112(b)
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claim 6 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.
Claim 6 recites “a plurality of second operand values from the second operand register file”. It is not clear whether these are the same as or different from the “one or more second operand values of the second plurality of operand values from the second operand register file” recited in claim 1. Further, claim 6 recites “the second input distribution circuit selectively routes different ones of the plurality of second operand values to at least two of the plurality of SIMD engines as the second input operand values”. It is not clear whether this means that there are a 1st second operand value and a 2nd second operand value, and that the 2nd second operand value is routed to a different engine.
Prior art rejections
Some of the claims have been rejected under Chen (See 35 USC 102 rejections). All the claims have been rejected under Mantor and Chung (See 35 USC 103 rejections).
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.


Claims 1, 6, 8, 17 and 18 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Chen et al (US 2019/0171448 A1, herein Chen).

Regarding Claim 1, Chen teaches a computer system comprising:
a plurality of single instruction, multiple data (SIMD) engines (Fig. 3 DOT4 engines, paragraph 2);
a plurality of output register sets, each coupled to a corresponding one of the plurality of SIMD engines (Fig. 3 304 bank 0 and bank 1 a set of registers in each bank as an output register set, Paragraph 33, all components are coupled);
a first operand register file that stores a first plurality of operand values, wherein each of the first plurality of operand values includes a plurality of operand words (Fig. 3 304, Paragraph 31, each element as a word, a set of accumulation values provided to each of the 8 units 330 as a first plurality of operand values);
a second operand register file that stores a second plurality of operand values, wherein each of the second plurality of operand values includes a plurality of operand words (Fig. 3, register file 308, Fig. 2, each row of the matrix A as an operand value, each element as a word);
a first input distribution circuit coupled to receive a first operand value of the first plurality of operand values from the first operand register file, wherein the first operand value includes a first plurality of operand words, wherein the first input distribution circuit selectively routes one or more of the first plurality of operand words of the first operand value to create a plurality of first input operand values, each having a plurality of first input operand words, wherein each of the plurality of first input operand values is routed to a corresponding one of the plurality of SIMD engines (Fig. 3, Paragraph 32, the source muxes 310 as the 1st distribution circuit; the accumulation values as the first operand value, the accumulation values to the 8 units 330 as the first input operand values); and
a second input distribution circuit coupled to receive one or more second operand values of the second plurality of operand values from the second operand register file, wherein the one or more second operand values include a second plurality of operand words, wherein the second input distribution circuit selectively routes one or more of the second plurality of operand words of the one or more second operand values to create a plurality of second input operand values, each having a plurality of second input operand words, wherein each of the second input operand values is routed to a corresponding one of the plurality of SIMD engines (Fig. 3, Fig. 14, Paragraph 34, the register rotation crossbar, the double buffer and the replication crossbar as the input distribution circuit, the second input operand values as the values input to each of the 8 units 330, Fig. 2, values of matrix A as the second operand value).
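For illustration only, the word routing recited in claim 1 can be sketched in Python. This is a minimal sketch under stated assumptions and does not describe the hardware of Chen or of the application: the function name `distribute`, the routing tables, the two-engine configuration, and the sample operand words are all hypothetical. Each input distribution circuit is modeled as selecting operand words by index to build one input operand value per SIMD engine.

```python
def distribute(operand_words, routing):
    """Model an input distribution circuit (hypothetical helper).

    operand_words: the operand words of one operand value read from a register file.
    routing: one index pattern per SIMD engine; pattern[k] names the source word
             that becomes word k of that engine's input operand value.
    Returns a list holding one input operand value (a list of words) per engine.
    """
    return [[operand_words[i] for i in pattern] for pattern in routing]


# Sample operand values, each made of four operand words (assumed data).
first_operand_value = [1, 2, 3, 4]
second_operand_value = [5, 6, 7, 8]

# Hypothetical routing tables for two SIMD engines.
first_routing = [[0, 1, 2, 3], [3, 2, 1, 0]]
second_routing = [[0, 0, 1, 1], [2, 2, 3, 3]]

# One first input operand value and one second input operand value per engine.
first_inputs = distribute(first_operand_value, first_routing)
second_inputs = distribute(second_operand_value, second_routing)
```

In this sketch, changing a routing table changes which words each engine receives without changing the operand value stored in the register file, which is the sense in which the routing is selective.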

Regarding Claim 6, Chen teaches the computer system of claim 1, wherein the second input distribution circuit receives a plurality of second operand values from the second operand register file, wherein the second input distribution circuit selectively routes different ones of the plurality of second operand values to at least two of the plurality of SIMD engines as the second input operand values (Figs. 3-5; different values are routed to the DOT4 engines. The values routed to the DOT4 engines are read from the register file; these are the plurality of second operand values. Alternatively, the words read from the second register file serve as the second operand values, and different words are routed to the SIMD engines).

Regarding Claim 8, Chen teaches the computer system of claim 1, wherein the second input distribution circuit includes a first plurality of second operand buffers, each of the first plurality of second operand buffers configured to store one or more of the second operand values from the second operand register file (Fig. 3 Double buffers store the values from the second operand register file).

Regarding Claim 17, Chen teaches the computer system of claim 1, wherein each of the plurality of output register sets (Fig. 3 304, bank 0 and bank 1 each as a register set) is configured to provide an accumulation value to the corresponding SIMD engine and store an accumulation value provided by the corresponding SIMD engine (Fig. 3, Paragraph 32, at some point, an output register set stores an accumulation value and provides to a corresponding SIMD engine).

Regarding Claim 18, Chen teaches the computer system of claim 16, wherein each of the plurality of output register sets is independently addressed (Fig. 3 Each of the banks is independently addressed to store the accumulation value).
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 4, 6, 8, 17-19, 21 and 22 are rejected under 35 U.S.C. 103 as being unpatentable over Chen et al (US 2018/0144435 A1, herein Mantor) and further in view of Chung et al (US 2013/0067203 A1, herein Chung).

Regarding Claim 1, Mantor teaches a computer system comprising:
a plurality of single instruction, multiple data (SIMD) engines (Fig. 2 SIMD unit 224, engines 244 and 246, Paragraph 28; Alternatively, Fig. 4 418 and 420, paragraph 43);

a first operand register file that stores a first plurality of operand values (Fig. 4 410A);
a second operand register file that stores a second plurality of operand values (Fig. 4 410C);
a first input distribution circuit (The multiplexers connected to the output of 410A, the flipflops and the routing to the FMAs) coupled to receive a first operand value of the first plurality of operand values from the first operand register file (Fig. 4), wherein the first input distribution circuit selectively routes the first operand value to create a plurality of first input operand values, wherein each of the plurality of first input operand values is routed to a corresponding one of the plurality of SIMD engines (Fig. 4); and
a second input distribution circuit (The multiplexers connected to the output of 410C, the flipflops and the routing to the FMAs) coupled to receive one or more second operand values of the second plurality of operand values from the second operand register file (Fig. 4), wherein the second input distribution circuit selectively routes the one or more second operand values to create a plurality of second input operand values, wherein each of the second input operand values is routed to a corresponding one of the plurality of SIMD engines (Fig. 4).
Mantor does not explicitly teach that each of the first plurality of operand values includes a plurality of operand words. Mantor does not explicitly teach that the first operand value includes a first plurality of operand words. Mantor does not explicitly 
Chung teaches an operand value comprising a plurality of operand words (Figs 1, 2A, 6).
It would be obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, with the teachings of Mantor and Chung before them, to implement the operand values as comprising a plurality of operand words. This would result in each of the first plurality of operand values including a plurality of operand words. This would also result in the first operand value including a first plurality of operand words. This would also result in each of the second plurality of operand values including a plurality of operand words. This would also result in the one or more second operand values including a second plurality of operand words and each of the one or more second input operand values having a plurality of second input operand words. This would also result in the first input distribution circuit selectively routing one or more of the first plurality of operand words of the first operand value and that each of the first 
One of ordinary skill in the art would be motivated to do so as it would allow processing on the multiple operand words in a single operand value at a time. Also, this would be merely the simple substitution of one known element for another (substituting operand value with operand value comprising words) and the rationale may support a conclusion of obviousness  (See KSR Int'l Co. v. Teleflex Inc., 550 U.S. 398, 415-421, 82 USPQ2d 1385, 1395-97 (2007)).

Regarding Claim 4, Mantor and Chung teach the computer system of claim 1.
The combination thus far does not teach that the first input distribution circuit selectively routes the first plurality of operand words of the first operand value such that each of the plurality of first input operand values comprises a repeated one of the first plurality of operand words of the first operand value.
Chung teaches swizzling data elements and the swizzled data elements being an input to a processing unit (Fig. 6). Chung also teaches swizzling such that the swizzled output comprises a repeated operand element of an input value (Fig. 4). 
It would be obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, with the teachings of Mantor and Chung before them, to provide swizzled inputs to the engines of Mantor. This would result in selectively routing the first plurality of operand words of the first operand value such that each of the plurality of 
One of ordinary skill in the art would be motivated to do so as this would allow for providing different patterns of inputs without requiring an instruction to perform the swizzling. Also, this would merely be the application of a known technique to a known device ready for improvement to yield predictable results, and the rationale may support a conclusion of obviousness  (See KSR Int'l Co. v. Teleflex Inc., 550 U.S. 398, 415-421, 82 USPQ2d 1385, 1395-97 (2007)).
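For illustration only, the kind of swizzle discussed for claim 4, in which each input operand value consists of one operand word repeated, can be sketched in Python. This is a minimal sketch under stated assumptions, not the circuit of Chung or Mantor: the function name `broadcast_swizzle`, the wrap-around choice of source word, and the sample data are hypothetical.

```python
def broadcast_swizzle(operand_value, num_engines):
    """Return one input operand value per engine (hypothetical helper).

    Engine e receives word (e mod width) of the operand value, repeated
    across every word position of its input operand value.
    """
    width = len(operand_value)
    return [[operand_value[e % width]] * width for e in range(num_engines)]


# Sample first operand value with four operand words (assumed data).
first_operand_value = [10, 20, 30, 40]

# Four engines: engine 0 gets word 0 repeated, engine 1 gets word 1 repeated, etc.
first_input_operand_values = broadcast_swizzle(first_operand_value, 4)
```

Because the repetition is done by the routing itself, no separate instruction is modeled for rearranging the words before they reach an engine.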

Regarding Claim 6, Mantor and Chung teach the computer system of claim 1, wherein the second input distribution circuit receives a plurality of second operand values from the second operand register file, wherein the second input distribution circuit selectively routes different ones of the plurality of second operand values to at least two of the plurality of SIMD engines as the second input operand values (Mantor Fig. 4, At different times, different values would be received from the register file. These would be the different second operand values. Alternatively, the different words in an operand as the second operand values and different ones of the words are routed to at least two of the SIMD engines).

Regarding Claim 8, Mantor and Chung teach the computer system of claim 1, wherein the second input distribution circuit includes a first plurality of second operand buffers, each of the first plurality of second operand buffers configured to store one of 

Regarding Claim 17, Mantor and Chung teach the computer system of claim 1.
The combination thus far does not explicitly teach that each of the plurality of output register sets is configured to provide an accumulation value to the corresponding SIMD engine and store an accumulation value provided by the corresponding SIMD engine.
Mantor teaches providing an accumulation value from an output register set to a corresponding SIMD engine (Fig. 4, each FMA is provided with an accumulation value) and storing an accumulation value provided by a SIMD engine (Fig. 4, the accumulation value from the FMA is stored as an output; Fig. 4, 410A-B and 416A-B as a register set, 410C-D and 416C-D as a register set).
It would be obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, with the teachings of Mantor and Chung before them, to implement each of the output register sets (Mantor Fig. 4, 410A-B and 416A-B as a set, 410C-D and 416C-D as a register set) being configured to provide an accumulation value to the corresponding SIMD engine and store an accumulation value provided by the corresponding SIMD engine. This would merely be choosing from a finite number of identified, predictable solutions, with a reasonable expectation of success (See KSR Int'l Co. v. Teleflex Inc., 550 U.S. 398, 415-421, 82 USPQ2d 1385, 1395-97 (2007)).

Regarding Claim 18, Mantor and Chung teach the computer system of claim 1, wherein each of the output register sets (Mantor Fig. 4, 410A-B and 416A-B as a set, 

Regarding Claim 19, Mantor and Chung teach the computer system of claim 1 wherein each of the plurality of SIMD engines generates a corresponding plurality of product values (Mantor Fig. 4, a FMA unit generates a product value).
The combination thus far does not explicitly teach that each of the plurality of SIMD engines is configured to multiply the plurality of first input operand words of the first input operand value received from the first input distribution circuit with the plurality of second input operand words of the second input operand value received from the second input distribution circuit.
Mantor teaches a SIMD engine performing multiple multiplication operations (Fig. 4, Paragraph 43). Mantor teaches a SIMD engine receiving the plurality of first input operand words of the first input operand value from the first distribution circuit and the plurality of second input operand words of the second input operand value from the second input distribution circuit (Fig. 4).
It would be obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to implement each of the SIMD engines being configured to multiply the plurality of first input operand words of the first input operand value received from the first input distribution circuit with the plurality of second input operand words of the second input operand value received from the second input distribution circuit. This would merely be choosing from a finite number of identified, predictable solutions (inputs from the first distribution circuit or the second distribution circuit or both), with a reasonable expectation of success (See KSR Int'l Co. v. Teleflex Inc., 550 U.S. 398, 415-421, 82 USPQ2d 1385, 1395-97 (2007)).

Regarding Claim 21, Mantor and Chung teach the computer system of claim 20. 
The combination thus far does not explicitly teach that each of the plurality of output register sets is configured is configured to provide a corresponding plurality of accumulation values to the corresponding one of the SIMD engines, wherein each of the plurality of SIMD engines is configured to add the received accumulation values to the corresponding generated product values, whereby each of the plurality of SIMD engines generates a corresponding plurality of updated accumulation values.
The combination thus far teaches providing a plurality of accumulation values from an output register set to a corresponding SIMD engine (Mantor Fig. 4, each FMA is provided with accumulation values) and storing accumulation values provided by a SIMD engine (Fig. 4, the accumulation values from the FMA are stored as an output; Fig. 4, 410A-B and 416A-B as a register set, 410C-D and 416C-D as a register set).
It would be obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, with the teachings of Mantor and Chung before them, to implement each of the output register sets (Mantor Fig. 4, 410A-B and 416A-B as a set, 410C-D and 416C-D as a register set) being configured to provide a corresponding plurality of accumulation values to the corresponding one of the SIMD engines, wherein each of the SIMD engines is configured to add the received accumulation values to the corresponding generated product values (Mantor Fig. 4 FMA operation), whereby each of the SIMD engines generates a corresponding plurality of updated accumulation values (See KSR Int'l Co. v. Teleflex Inc., 550 U.S. 398, 415-421, 82 USPQ2d 1385, 1395-97 (2007)).

Regarding Claim 22, Mantor and Chung teach the computer system of claim 21.
The combination thus far does not explicitly teach that each of the plurality of output register sets is configured to receive and store the corresponding plurality of updated accumulation values from the corresponding one of the plurality of SIMD engines.
The combination thus far teaches an output register set storing an accumulation value provided by a SIMD engine (Mantor Fig. 4, the accumulation value from the FMA is stored as an output).
It would be obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to implement each of the plurality of output register sets (Mantor Fig. 4, 410A-B and 416A-B as a set, 410C-D and 416C-D as a register set) receiving and storing the corresponding plurality of updated accumulation values from the corresponding one of the plurality of SIMD engines. This would be choosing from a finite number of identified, predictable solutions (the output set to store the output result to), with a reasonable expectation of success  (See KSR Int'l Co. v. Teleflex Inc., 550 U.S. 398, 415-421, 82 USPQ2d 1385, 1395-97 (2007)).
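For illustration only, the multiply, accumulate, and store-back behavior discussed for claims 19, 21 and 22 can be sketched in Python. This is a minimal sketch under stated assumptions, not the FMA datapath of Mantor: the function name `simd_engine_step`, the list-based output register sets, the two-engine configuration, and the sample data are hypothetical.

```python
def simd_engine_step(first_words, second_words, accumulators):
    """One step of a modeled SIMD engine (hypothetical helper).

    Multiplies the first and second input operand words lane by lane and adds
    each product to the accumulation value received for that lane. Returns the
    updated accumulation values.
    """
    if not (len(first_words) == len(second_words) == len(accumulators)):
        raise ValueError("all inputs must have the same number of words")
    return [acc + a * b for a, b, acc in zip(first_words, second_words, accumulators)]


# One output register set per engine, initialised to zero (assumed data).
output_register_sets = [[0, 0, 0, 0], [0, 0, 0, 0]]

# Input operand values already routed to each of two engines (assumed data).
first_inputs = [[1, 1, 1, 1], [2, 2, 2, 2]]
second_inputs = [[5, 6, 7, 8], [5, 6, 7, 8]]

# Each engine reads its own register set, updates it, and the result is stored back.
for engine, (a, b) in enumerate(zip(first_inputs, second_inputs)):
    output_register_sets[engine] = simd_engine_step(a, b, output_register_sets[engine])
```

Each engine in the sketch touches only its own entry of `output_register_sets`, so the sets are read and written independently of one another.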
Response to Arguments
The Applicant’s arguments, filed 3/15/2021, have been fully considered.
The Applicant argues, on page 12, that Chen does not teach that each of the accumulation VGPR banks 0 and 1 is coupled to a corresponding one of the 32 DOT4 engines as recited in claim 1. However, all components are coupled (See Chen Fig. 3). “Corresponding” means to be associated with, and Fig. 3 shows an association between the banks and the DOT4 engines. Thus, Chen teaches a plurality of output register sets each coupled to a corresponding one of the plurality of SIMD engines.
The Applicant further argues that the same element of Chen is being used to teach two separate elements: a plurality of output register sets and a first operand register file. The Applicant is not addressing the rejection. The rejection clearly mapped a set of registers in each of the banks 0 and 1 as an output register set, and the register file was mapped to 304. Fig. 3 shows that these 2 elements are different. The claim requires 2 different elements, not that the output register sets be separate from the first register file.
The Applicant further argues that Chen does not teach that the source multiplexers 310 receive a first operand value from the vector register file 304, and then selectively route one or more words of this first operand value to create a plurality of first input operand values. However, the Applicant is not providing any details. The Applicant merely cites parts of the rejection but does not explain what part of the limitation is not taught and how it is not taught in view of the mappings. Absent these details, the argument is not persuasive.
The Applicant further argues that Chen does not teach that each of these 32 accumulation values is stored in a corresponding output register set that is independently accessed. However, the claim does not recite “corresponding output register set”. The claim requires that each output register set be independently 
The Applicant’s argument, on page 15, that the rejection is prima facie inadequate because the examiner does not show which element of Fig. 4 corresponds with “a plurality of output register sets” is not persuasive. The Applicant is not addressing the rejection. The rejection clearly explained that output data is stored in the registers, and a set of registers in which output data is stored was mapped to an output register set. The Applicant further argues that each of the VGPRs could not represent output register sets. However, the Applicant provides no details and merely makes a statement. Absent specific details, the argument is not persuasive.
The Applicant’s argument that Mantor does not teach that each of the VGPRs is coupled to a corresponding one of the FMA units is not persuasive. All components in Fig. 4 are coupled; thus, each VGPR is coupled to a corresponding FMA unit.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Jyoti Mehta whose telephone number is (571)270-3995.  The examiner can normally be reached on Monday-Friday 8 am-4 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Aimee Li, can be reached on (571) 272-4169.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.