DETAILED ACTION
Claims 1-10 are pending.
The office acknowledges the following papers:
Claims, specification, and remarks filed on 12/1/2021.

	Withdrawn objections and rejections
The specification objections have been withdrawn due to amendment.

Maintained Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.

Claims 1-10 are rejected under 35 U.S.C. 103 as being unpatentable over Chen et al. (U.S. 2019/0171448), in view of Boettcher et al. (U.S. 2015/0254076), in view of Official Notice.
As per claim 1:
Chen and Boettcher disclosed a computing device comprising: 

a set of registers able to supply data of operand type to the inputs of said arithmetic logic units and able to be supplied with data from the outputs of said arithmetic logic units (Chen: Figure 3 elements 302-308, paragraphs 31-33)(The ACC VGPR and VGPR (i.e. set of registers) supply the FMA and DOT4x4 units with source operand data. Both register files receive execution results from the set of execution units.); 
a memory (Chen: Figures 1, 4-5, and 14 elements 130 and 1430, paragraphs 23, 32, and 59)(Final matrix results are written to memory.); 
a memory interface by way of which data are transmitted and routed between the registers and the memory (Boettcher: Figure 1 elements 6 and 14, paragraphs 72-73)(Chen: Figures 1 and 3 elements 130 and 308, paragraphs 23 and 25)(Boettcher disclosed a vector LSU to transfer operands between registers and cache/memory. The combination implements a vector LSU in Chen to transfer data between the VGPR and the memory device. Official notice is given that memory interfaces are used between registers and memory for the advantage of ensuring the correct memory addresses are loaded from/stored to. Thus, it would have been obvious to one of ordinary skill in the art to implement a memory interface within the combination between the memory device of Chen and the added vector LSU.); 
a control unit configured so as to control the arithmetic logic units in accordance with a processing chain microarchitecture such that the arithmetic logic units perform computing operations in parallel with one another (Boettcher: Figure 1 elements 20-24, paragraph 73)(Chen: Figures 3-5 elements 324 and 330A-H, paragraphs 36 and 39-
said control operations generating: 
at least one cycle i including both implementing at least one first computing operation by way of an arithmetic logic unit and downloading a first dataset from the memory to at least one register (Boettcher: Figure 1 elements 6 and 14, paragraphs 72-73)(Chen: Figures 1-5 elements 130, 308, and 330A-H, paragraphs 23, 25, 30, 33, 36, and 39-41)(Boettcher disclosed a vector LSU to transfer operands between registers and cache/memory. The combination implements a vector LSU in Chen to transfer data between the VGPR and the memory device. The combination includes execution of vector load instructions prior to cycle 0 to load the source matrices of Chen into the VGPR. Chen disclosed in figure 4 cycles 0-1, source matrices A and B are read from the 
at least one cycle ii, following the at least one cycle i, including implementing a second computing operation by way of an arithmetic logic unit, for which second computing operation at least part of the first dataset forms at least one operand (Chen: Figures 2-5 elements 330A-H, paragraphs 28, 30, 33, 36, and 39-41)(Chen disclosed in figures 4-5 cycles 2-8, source matrix data are used for matrix DOT-product execution. The same source matrices A and B are used for these cycles of execution in addition to being used in cycle 1.).
The advantage of implementing decoding circuitry, issuing circuitry, and a vector LSU is that vector instructions can be detected and correctly executed, and data transfers can take place between registers and memory. Thus, it would have been obvious to one of ordinary skill in the art at the time of the effective filing date to implement the vector components of Boettcher into the processing system of Chen for the above advantages.
As per claim 2:
Chen and Boettcher disclosed the device as claimed in claim 1, wherein the control unit is furthermore configured, prior to controlling the arithmetic units and the memory access operations, so as to implement an identification algorithm for identifying the first dataset to be downloaded during the at least one cycle i on the basis of the second computing operation to be implemented during the at least one cycle ii (Boettcher: Figure 1 elements 14 and 20-24, paragraphs 72-73)(Chen: Figures 1 and 3 elements 130, 304, and 308, paragraphs 23 and 25)(The combination implements the control elements and vector LSU in Chen to execute vector load/store operations that transfer data between the VGPR and the memory device. Decoding vector load 
As per claim 3:
Chen and Boettcher disclosed the device as claimed in claim 1, wherein the control unit is configured so as to implement two cycles i separate from one another, such that two first datasets separate from one another are downloaded to at least one register (Boettcher: Figure 1 elements 6 and 14, paragraphs 72-73)(Chen: Figures 1-5 elements 130, 308, and 330A-H, paragraphs 23, 25, 30, 33, 36, and 39-41)(The combination includes execution of vector load instructions prior to cycle 0 to load the source matrices of Chen into the VGPR for a first matrix operation. The combination also includes execution of vector load instructions prior to cycle 8 to load the next source matrices of Chen into the VGPR for a second matrix operation.), at least part of each of the two first datasets forming an operand for the second computing operation of the at least one cycle ii (Chen: Figures 2-5 elements 330A-H, paragraphs 30, 33, 36, and 39-41)(Chen disclosed in figures 4-5 cycles 2-8, source matrix data are used for matrix DOT-product execution. The same source matrices A and B are used for these cycles of execution in addition to being used in cycle 1 for a first matrix operation. A second matrix operation using the second source matrices is processed in at least cycles 9-10.).
As per claim 4:
Chen and Boettcher disclosed the device as claimed in claim 1, wherein the control unit is configured so as to implement a plurality of cycles ii separate from one another, and such that the part of the first dataset forming at least one operand for the second computing operation of a cycle ii is different from one cycle ii to another cycle ii of the plurality (Boettcher: Figure 1 elements 14 and 20-24, paragraphs 72-73)(Chen: 
As per claim 5:
Chen and Boettcher disclosed the device as claimed in claim 1, wherein the control unit is configured so as to perform at least two iterations of a series of at least one cycle i, and one cycle ii, said two iterations being at least partly superimposed such that at least one cycle ii of the first iteration forms a cycle i of the following iteration (Boettcher: Figure 1 elements 6 and 14, paragraphs 72-73)(Chen: Figures 2-5 elements 330A-H, paragraphs 30, 33, 36, and 39-41)(Chen disclosed in figures 4-5 cycles 2-8 (cycle ii), source matrix data are used for matrix DOT-product execution. The same source matrices A and B are used for these cycles of execution in addition to being used in cycle 1. Cycle 8 reads source register data from the VGPR into the double buffer for processing of a subsequent matrix operation. The combination includes execution of vector load instructions prior to cycle 8 to load the source matrices of Chen into the VGPR for the subsequent matrix operation. Both of these operations are within cycles 2-8 that make up the at least one cycle ii.).
As per claim 6:

As per claim 7:
Chen and Boettcher disclosed the device as claimed in claim 1 wherein the control unit is furthermore designed to control the memory access operations by way of the memory interface (Boettcher: Figure 1 elements 14 and 20-24, paragraphs 72-73)(Chen: Figures 1 and 3 elements 130, 304, and 308, paragraphs 23 and 25)(Boettcher disclosed a decoder and issue queue (i.e. control unit) to generate micro-operations and control information for controlling issuance and execution of vector operations. Boettcher disclosed a vector LSU to transfer operands between registers and cache/memory. The combination implements the control elements and vector LSU in Chen to execute vector load/store operations that transfer data between the VGPR and the memory device.), such that said control operations generate: 

during a cycle ii, the implementation of a plurality of second computing operations by a plurality of arithmetic logic units, the grouping of the data per dataset to be downloaded being selected so as to match a distribution of the assignments of the computing operations to each of the arithmetic logic units of the plurality, such that said arithmetic logic units have synchronized, asynchronous or mixed operation (Chen: Figures 2-5 elements 330A-H, paragraphs 30, 33, 36, and 39-41)(Chen disclosed in figures 4-5 cycles 2-8, source matrix data are used for matrix DOT-product execution. The same source matrices A and B are used for these cycles of execution in addition to being used in cycle 1. Matrix source data is assigned to DOT4x4 execution units for synchronous processing.).
As per claim 8:
Claim 8 essentially recites the same limitations of claim 1. Therefore, claim 8 is rejected for the same reasons as claim 1.
As per claim 9:
Claim 9 essentially recites the same limitations of claim 8. Claim 9 additionally recites the following limitations:
A non-transitory computer program comprising instructions for implementing the method as claimed in claim 8 when this program is executed by a processor (Boettcher: 
As per claim 10:
Claim 10 essentially recites the same limitations of claim 8. Claim 10 additionally recites the following limitations:
A non-transient computer-readable recording medium on which there is recorded a program for implementing the method as claimed in claim 8 when this program is executed by a processor (Boettcher: Figure 1)(Chen: Figure 3)(Official notice is given that hardware can be implemented as executable software for the advantage of reduced costs. Thus, it would have been obvious to implement the hardware of Boettcher and Chen as executable software stored in memory for emulated execution.).

Response to Arguments
The arguments presented by Applicant in the response received on 12/1/2021 are not considered persuasive.
Applicant argues regarding claim 1:
“Page 5 of the Office Action contends that Figures 4-5 of Chen disclose the implementation of dependent and successive computing operations. In other words, the input operands of a second cycle ii are the output operands of a first cycle i, as required by claim 1.”  

This argument is not found to be persuasive for the following reason. In response to applicant's argument that the references fail to show certain features of applicant’s invention, it is noted that the features upon which applicant relies (i.e. input operands of In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993). 
Instead, the claims recite that as part of cycle i, a first dataset is downloaded from memory to a register. Additionally, the second computing operation of cycle ii uses at least part of the downloaded first dataset as at least one operand. Thus, the claims make no mention that the operands used in cycle ii are execution results from cycle i.
Applicant argues regarding claim 1:
“Claim 1 clearly requires that the calculations are made successively in cycles i and ii and that the registers loading of input operands of the second calculation is made during the previous cycle i while Chen only discloses that the calculations are successive.”  

This argument is not found to be persuasive for the following reason. The claims do not require that cycle i and cycle ii each be a single clock cycle, nor that they take place in immediately successive clock cycles. Paragraph 61 of the specification gives an example in which cycles i and ii each span multiple clock cycles and are not in adjacent clock cycles. As such, Chen, in loading data into the vector registers in a clock cycle prior to cycle 0, teaches the claimed cycle i. Additionally, Chen, in using the loaded data in the vector registers for further matrix processing in clock cycles 2-8, teaches the claimed cycle ii.
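For illustration only, the ordering relied upon in this response can be sketched as a simple check. The sketch is not taken from Chen, Boettcher, or the specification; the function name and the use of clock -1 to stand for "a clock cycle prior to 0" are assumptions made solely for this example. It shows that a cycle ii spanning clocks 2-8 follows a cycle i without either cycle being a single clock or the two being adjacent.

```python
# Illustrative sketch only. The function name and the choice of clock -1 to
# represent "a clock cycle prior to 0" are hypothetical, not from the record.

def cycle_ii_follows_cycle_i(cycle_i_clocks, cycle_ii_clocks):
    """Return True when every clock of cycle ii is later than every clock of cycle i.

    Neither cycle is required to be a single clock, and the two cycles are
    not required to be adjacent.
    """
    if not cycle_i_clocks or not cycle_ii_clocks:
        raise ValueError("each cycle must span at least one clock")
    return min(cycle_ii_clocks) > max(cycle_i_clocks)


# Assumed placeholder: the load into the vector registers occurs before clock 0.
cycle_i = [-1]
# Clocks 2-8, in which the loaded data are used for further matrix processing.
cycle_ii = list(range(2, 9))

ordering_holds = cycle_ii_follows_cycle_i(cycle_i, cycle_ii)
gap_in_clocks = min(cycle_ii) - max(cycle_i)  # greater than 1: not adjacent
```

Under these assumptions the ordering check succeeds even though cycle ii is seven clocks long and begins several clocks after the load.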
Applicant argues regarding claim 1:
“In particular, paragraph [0072] of Boettcher, which is relied upon in page 5 of the Office Action, states (with emphasis): 

"the processing circuitry 4 also has a vector scan unit 12a for performing vector scan operations" 

and 

"some of the processing units may have bypass (forwarding) paths for routing an output operand from one of the functional units 12 to the input port of the same functional unit 12 or a different functional unit 12 so that the generated operand can be processed more quickly in a further processing step than if it had to be written back to the register file 6 and read out via the input bus 8." 

In other words, Boettcher explicitly teaches - and this is a key point of the method of Boettcher -- that, if improved performance ("more quickly") is desired, some operands should not be written in register files. This teaching of Boettcher is in direct contradiction to the features of claim 1 as explained above.”  

This argument is not found to be persuasive for the following reason. The citation regarding execution result forwarding is unrelated to the citation of the vector LSU being used to move operands between the register file and memory. Boettcher makes no mention of not writing loaded data to the vector register file, nor of directly writing execution results to memory. Additionally, even in instances where forwarding is used, the execution results are still written to the register file. The forwarded results can be selected for execution in the next clock cycle to avoid the register read stage for that particular source operand.
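For illustration only, the relationship between forwarding and register write-back described above can be sketched with a toy model. The class and register names below are hypothetical and are not drawn from Boettcher; the model also assumes, as a simplification, that write-back occurs in the same step as execution. It shows a result being forwarded to the next operation while still being written to the register file.

```python
# Illustrative toy model only. TinyPipeline and the register names are
# hypothetical; write-back in the same step as execution is a simplification.
import operator


class TinyPipeline:
    def __init__(self):
        self.register_file = {}
        self.bypass = {}  # result of the immediately preceding operation

    def execute(self, dest, fn, *sources):
        # Prefer the forwarded value when one exists; otherwise read the register file.
        forwarded = [s for s in sources if s in self.bypass]
        operands = [
            self.bypass[s] if s in self.bypass else self.register_file[s]
            for s in sources
        ]
        result = fn(*operands)
        self.register_file[dest] = result  # the result is still written back
        self.bypass = {dest: result}       # and is also available to the next operation
        return result, forwarded


pipe = TinyPipeline()
pipe.register_file.update({"v0": 2, "v1": 3})  # stands in for operands loaded from memory

first_result, first_forwarded = pipe.execute("v2", operator.add, "v0", "v1")
second_result, second_forwarded = pipe.execute("v3", operator.mul, "v2", "v0")
```

In this model the second operation takes `v2` from the bypass path, yet `v2` remains present in the register file, so forwarding skips only the register read and not the write.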
	
	Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any 
The following is text cited from 37 CFR 1.111(c): In amending in reply to a rejection of claims in an application or patent under reexamination, the applicant or patent owner must clearly point out the patentable novelty which he or she thinks the claims present in view of the state of the art disclosed by the references cited or the objections made. The applicant or patent owner must also show how the amendments avoid such references or objections.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JACOB A. PETRANEK whose telephone number is (571)272-5988.  The examiner can normally be reached on M-F 8:00-4:30.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Aimee Li, can be reached on (571) 272-4169.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/JACOB PETRANEK/Primary Examiner, Art Unit 2183