DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Drawings
Figures 1-4 should be designated by a legend such as --Prior Art-- because only that which is old is illustrated. See specification [0026-0029], which describes these figures as prior art.  See MPEP § 608.02(g).  Corrected drawings in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. The replacement sheet(s) should be labeled “Replacement Sheet” in the page header (as per 37 CFR 1.84(c)) so as not to obstruct any portion of the drawing figures. If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.

Specification
The abstract of the disclosure is objected to because it comprises language that is not clear and concise.  Such language is “the present invention relates to”, the “operation circuit may perform”, and “the operation accelerator may be configured” (emphasis added). Correction is required.  See MPEP § 608.01(b).I.C.

Claim Objections
Claims 2, 11, and 15-16 are objected to because of the following informalities.
Claim 2 line 1 recites “the adder circuit”.  This limitation lacks antecedent basis. 
Claim 11 line 2, and claim 16 lines 1-2 recite “the operation accelerator is applied to the convolutional neural network” and “the operation circuit is applied to the convolutional neural network” respectively.  The limitation “the convolutional neural network” lacks antecedent basis.  
Claim 15 recites an “added circuit” in line 1.  This appears to be a typographical error and should possibly recite an “adder circuit”.
Appropriate correction is required.

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
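The three prongs above operate as a conjunction: a limitation is interpreted under 35 U.S.C. 112(f) only when (A), (B), and (C) are all met. The following is a minimal illustrative sketch of that logic, not a statement of law. The class name `Limitation`, its field names, and the function name `meets_three_prong_test` are hypothetical and are introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Limitation:
    # All field names are hypothetical, chosen only to mirror prongs (A)-(C).
    uses_means_or_placeholder: bool      # prong (A)
    modified_by_function: bool           # prong (B)
    recites_sufficient_structure: bool   # negation of prong (C)


def meets_three_prong_test(lim: Limitation) -> bool:
    """Return True when prongs (A), (B), and (C) are all satisfied."""
    return (
        lim.uses_means_or_placeholder
        and lim.modified_by_function
        and not lim.recites_sufficient_structure
    )


# A generic "unit configured to ..." with no structure recited in the claim.
generic_unit = Limitation(True, True, False)
# The same placeholder, but the claim itself recites sufficient structure.
structured_unit = Limitation(True, True, True)
```

Under these assumptions, `generic_unit` satisfies the test and `structured_unit` does not, because sufficient recited structure defeats prong (C).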
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.  Such claim limitations are: “operation groups”, “operation blocks”, “operation units”, “vector calculation unit”, and “storage unit”.  The term “unit” has been interpreted to be a generic placeholder.  See MPEP § 2181.I.A.  The terms “group” and “block” are being interpreted in a manner similar to “unit”, as generic placeholders.
Because this/these claim limitations are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
The limitation “operation unit” is being interpreted as in figures 14 and 15, and specification [0015], [0017], [0074-0075] to comprise a storage unit, a multiplying circuit connected to the storage unit and input and further input and output connections as in figure 14, or a plurality of storage units, a multiplying circuit, a first selection circuit connected to the plurality of storage units, and a second selection circuit connected to the plurality of storage units and the multiplying circuit, and input and output connections as in figure 15.
The limitation “storage unit(s)” is further interpreted to comprise a register, a random access memory (RAM), static RAM, flash memory, or another readable and writable memory. See [0055].
The limitation “operation group” is being interpreted to comprise NxK operation units arranged as in figure 6 and including input and output connections. See [0058].
The limitation “operation block” is being interpreted to comprise N operation units arranged as in figure 6 and including input and output connections. See [0058].
The limitation “vector calculation unit” is being interpreted to comprise M*K operation units. See [0110].
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 2, 6-10, and 14-18 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor, or for pre-AIA  the applicant regards as the invention.

Claim 2 lines 2-4 recite “the adder tree”.  It is unclear what adder tree this refers to.  Lines 2-3 recite “the adder circuit comprises M*K adder trees, one adder tree is corresponding to one operation block”.  It is unclear whether “the adder tree” is the “M*K adder trees” collectively, or one of the adder trees that corresponds to one operation block, and if so, which one of those adder trees it is.
Claim 6 line 3, claim 7 lines 5, 6, 9-10, claim 9 lines 3, 5, 7, claim 10 lines 3, 4,  and claim 19 lines 3-4, recite “the controller”.  This limitation lacks antecedent basis. Claims 8-9 inherit the same deficiency as claim 7 by reason of dependence.
Claim 10 line 8 recites “the source data of the first matrix”.  This limitation lacks antecedent basis.  It is unclear if this is the same as the first matrix or different.  For purposes of examination, Examiner interprets these as the same.
Claim 14 line 8 recites “the weight matrix”.  This limitation lacks antecedent basis. It is unclear to what matrix “the weight matrix” refers.  Claims 15-18 inherit the same deficiency as claim 14 by reason of dependence.
Claim 16 lines 2-3 recite “a weight matrix”.  It is unclear whether this weight matrix is the same weight matrix recited in claim 14 or a different weight matrix.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 14-18 are rejected under 35 U.S.C. 103 as being unpatentable over W.M Jose et al., Algorithm-oriented design of efficient many-core architectures applied to dense matrix multiplication, Analog Integr Circ Sig Process (2015), (hereinafter “Jose”) in view of J. Zhang et al., Improving the Performance of OpenCL-based FPGA Accelerator for Convolutional Neural Network, FPGA ’17, ACM, Feb 2017 (hereinafter “Zhang”).

Regarding claim 14, Jose teaches the following:
 the operation circuit reads a first matrix from a memory and respectively sends the M row vectors in the first matrix to the M operation groups, wherein the first matrix is an M*N matrix (Fig 2, buffer in DMA for memory, section 3, A for first matrix, Fig 4, section 4 for read/send from memory);
the operation circuit reads a second matrix from a memory and respectively writes the K column vectors of the weight matrix into the K operation blocks of each operation group, wherein the second matrix is an N*K matrix (Fig 2, buffer in DMA for memory, section 3, B for second matrix, Fig 4, section 4 for read/write a second matrix from memory); and
the operation circuit performs a matrix multiplication calculation of the first matrix and the second matrix within one clock cycle (Section 3 figure 1, section 4 second paragraph FPMA result every cycle for a matrix multiplication calculation of the first matrix and the second matrix within one clock cycle, the FPMA result for a matrix multiplication calculation).
Jose discloses reading a first and second matrix from memory but does not explicitly disclose distinct first and second memories.  However, in the same field of endeavor, Zhang discloses an accelerator for a convolutional neural network that performs matrix multiplication (abstract, section 4.1 third para.).  Zhang further discloses first and second memories storing first and second matrix data (Fig 4).  It would have been obvious to one of ordinary skill in the art before the effective filing date to configure Jose’s DMA to comprise first and second memories within the DMA as disclosed by Zhang.  As recognized by Zhang, using on-chip memories allows opportunities to reuse data from each on-chip memory (Section 4.1 third column).
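The data flow recited in claim 14 (row vectors of the first matrix sent to the M operation groups, column vectors of the second matrix written into the K operation blocks of each group) can be sketched in software as follows. This is a minimal sketch under stated assumptions, not Jose's, Zhang's, or applicant's implementation: the function name `grouped_matmul` is hypothetical, and the parallelism of the hardware is modeled here by sequential loops.

```python
def grouped_matmul(first, second):
    """Multiply an M*N matrix by an N*K matrix using the group/block layout.

    Operation group m receives row vector m of the first matrix.
    Operation block k of every group holds column vector k of the second matrix.
    Each of the N operation units in a block forms one product.
    """
    M, N = len(first), len(first[0])
    K = len(second[0])
    assert len(second) == N, "inner dimensions must match"

    # Column vectors of the second matrix, one per operation block.
    columns = [[second[n][k] for n in range(N)] for k in range(K)]

    result = [[0] * K for _ in range(M)]
    multiplications = 0
    for m in range(M):                 # one operation group per row vector
        row = first[m]
        for k in range(K):             # one operation block per column vector
            products = [row[n] * columns[k][n] for n in range(N)]  # N units
            multiplications += N
            result[m][k] = sum(products)
    return result, multiplications


A = [[1, 2, 3], [4, 5, 6]]            # M = 2, N = 3
B = [[7, 8], [9, 10], [11, 12]]       # N = 3, K = 2
C, count = grouped_matmul(A, B)
```

For M = 2, N = 3, K = 2 the sketch performs M*N*K = 12 multiplications, which is the count that claim 1 recites as being performed in one clock cycle.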

Regarding claim 15, in addition to the teachings addressed in the claim 14 analysis, Jose teaches the following:
the adder circuit adds calculation results of operation units belonging to a same operation block to obtain a calculation result of each operation block (fig 3, section 3, section 4 second paragraph).

Regarding claim 16, Jose teaches the claim 14 limitations.  Jose further discloses application to compute intensive applications such as scientific computing, but does not explicitly disclose application to a convolutional neural network.  However, in the same field of endeavor, Zhang discloses an accelerator that performs matrix multiplication for a convolutional neural network with multiplication of weights by input feature maps (abstract, section 4.1 third paragraph, section 5.1 first paragraph).  It would have been obvious to one of ordinary skill in the art before the effective filing date to apply the matrix multiplication to a convolutional neural network using the weights disclosed by Zhang as the second matrix, and the feature map disclosed by Zhang as the input matrix.  As recognized by Zhang, the convolutional neural network is one of the most widely used computation intensive scientific applications (Zhang Abstract, Introduction).

Regarding claim 17, in addition to the teachings addressed in the claim 14 analysis, Jose teaches the following:
wherein M = N = K (figure 1, section 3 second paragraph).

Regarding claim 18, in addition to the teachings addressed in the claim 14 analysis, Jose teaches the following:
	wherein any two parameters M, N, and K are equal, but not equal to the third parameter (figure 1).

Claims 1-3, 5-6, and 10-13 are rejected under 35 U.S.C. 103 as being unpatentable over Jose in view of Zhang in view of US 20190347125 A1 Sankaran et al. (hereinafter “Sankaran”).

Regarding claims 1 and 3, Jose teaches the following:
a memory, configured to store a first matrix, wherein the first matrix is an M*N matrix (Fig 2, buffer in DMA for memory and section 3 second paragraph for stored in memory, section 3, A for first matrix); 
a memory, configured to store a second matrix, wherein the second matrix is an N*K matrix (Fig 2, buffer in DMA for memory and section 3 second paragraph for stored in memory, section 3, B for second matrix, section 3 also desc); and 
an operation circuit connected to the first memory and the second memory, wherein the operation circuit comprises a matrix multiplying circuit; the matrix multiplying circuit comprises M operation groups, each operation group comprises K operation blocks, each operation block comprises N operation circuits, each operation circuit receives two pieces of data respectively from the first memory and the second memory, and the operation circuit multiplies the two pieces of data, so that the operation accelerator can perform M*N*K times of multiplication in one clock cycle; M, N, and K are integers greater than 0 (Fig 1, Fig 2 full circuit of cores for group M, section 3 multiple partitioned blocks for K operation block, and fig 2 each core for N operation circuits, section 4 second paragraph FPMA result every cycle for a matrix multiplication calculation of the first matrix and the second matrix within one clock cycle).
Jose discloses a memory, configured to store a first matrix, and a memory, configured to store a second matrix, but does not explicitly disclose distinct first and second memories.  However, in the same field of endeavor, Zhang discloses an accelerator for a convolutional neural network that performs matrix multiplication (abstract, section 4.1 third para.).  Zhang further discloses first and second memories storing first and second matrix data (Fig 4).  It would have been obvious to one of ordinary skill in the art before the effective filing date to configure Jose’s DMA to comprise first and second memories within the DMA as disclosed by Zhang.  As recognized by Zhang, using on-chip memories allows opportunities to reuse data from each on-chip memory (Section 4.1 third column).
Furthermore, Jose discloses N operation circuits, but does not explicitly disclose N operation units as interpreted under 35 USC 112(f), which includes at least a storage unit, a multiplying circuit connected to the storage unit and input and further input and output connections as in figure 14, wherein the storage unit is further interpreted to comprise a register, a random access memory (RAM), static RAM, flash memory, or another readable and writable memory.  
However, in the same field of endeavor, Sankaran discloses a plurality of processing elements that perform matrix multiplication operations and include matrix multiplication circuitry (fig 69-690x, fig 71, fig 74-740x, fig 75).  Sankaran further discloses that the processing element includes both matrix multiplication circuitry and a RAM that stores one data input to the multiplication circuitry, and an input that reads another input to the multiplication circuitry directly from memory (Fig 75a,b, [0906-0908]).  It would have been obvious to one of ordinary skill in the art before the effective filing date to substitute the processing elements of Sankaran for the N operation circuits of Jose to result in the N operation units claimed.  The substitution would have been obvious in order to achieve efficiencies by placing irregularly accessed data in on-chip RAMs (Sankaran [0904]).
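The operation unit as interpreted above with reference to figure 14 (a storage unit connected to a multiplying circuit, with one operand stored and the other received as an input) can be pictured with the following minimal sketch. The class `OperationUnit`, the function `build_array`, and the method names are hypothetical; they are not taken from Jose, Zhang, Sankaran, or the application, and which operand is held in storage is an assumption made for illustration.

```python
class OperationUnit:
    """Hypothetical software model of one operation unit (figure 14 reading)."""

    def __init__(self):
        self.stored = 0  # models the storage unit (e.g., a register)

    def load(self, value):
        """Write one operand (assumed: a second-matrix element) into storage."""
        self.stored = value

    def multiply(self, received):
        """Multiply the received operand by the stored operand."""
        return received * self.stored


def build_array(M, K, N):
    """M operation groups, each with K operation blocks of N operation units."""
    return [[[OperationUnit() for _ in range(N)] for _ in range(K)]
            for _ in range(M)]


array = build_array(M=2, K=3, N=4)
unit_count = sum(len(block) for group in array for block in group)

unit = array[0][0][0]
unit.load(5)
product = unit.multiply(3)
```

With M = 2, K = 3, and N = 4 the array holds M*K*N = 24 units, one per multiplication.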

Regarding claim 2, Jose in view of Zhang in view of Sankaran teaches the claim 1 limitations.  Jose further discloses the core architecture wherein each core comprises an add unit (FPMA) (section 4, fig 3), but is silent with respect to the specific structure of the add unit.  However, in the same field of endeavor, Sankaran discloses:
the adder circuit comprises M*K adder trees, one adder tree is corresponding to one operation block, the adder tree is connected to N operation units in the corresponding operation block, and the adder tree is configured to add calculation results of the N operation units connected to the adder tree (fig 110, [1055-1079]).
It would have been obvious to one of ordinary skill in the art before the effective filing date to implement the adder circuitry using the adder tree disclosed by Sankaran.  As disclosed by Sankaran, use of adder tree circuitry is known to one of ordinary skill in the art in performing matrix multiplications ([1055]).  It is obvious to one of ordinary skill in the art to use a known technique to improve similar devices in the same way. MPEP 2141.III.(C).
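The adder tree function recited in claim 2 (adding the calculation results of the N operation units of one operation block) can be sketched as a level-by-level pairwise reduction. This is a minimal sketch assuming a binary tree; the function name `adder_tree` is hypothetical, and the sketch does not reproduce the specific circuitry of Sankaran fig 110.

```python
def adder_tree(values):
    """Sum values by pairwise addition, level by level, as an adder tree would.

    Returns the total and the number of adder levels used.
    """
    level = list(values)
    if not level:
        return 0, 0
    depth = 0
    while len(level) > 1:
        # Add neighbouring pairs; each pass models one level of adders.
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:          # an odd element passes through unchanged
            nxt.append(level[-1])
        level = nxt
        depth += 1
    return level[0], depth


products = [3, 1, 4, 1, 5, 9, 2, 6]   # results of N = 8 operation units
total, levels = adder_tree(products)
```

For N = 8 inputs the reduction completes in 3 levels. Under the claim 2 arrangement there would be one such tree per operation block, M*K trees in total.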

Regarding claim 5, in addition to the teachings addressed in the claim 1 analysis, Jose teaches vertical and horizontal buses that cross the system, but does not explicitly disclose the first memory is connected to the operation circuit using a first bus, and the second memory is connected to the operation circuit using the second bus.  However in the same field of endeavor Zhang discloses:
wherein the first memory is connected to the operation circuit by using a first bus, and a bit width of the first bus is Wi*N*M; the second memory is connected to the operation circuit by using a second bus, and a bit width of the second bus is Wi*N; and Wi is a maximum bit width that is of input data and that is allowed by the operation unit (Figure 4 red and blue bus lines for first and second bus respectively, section 5.1 for bit widths inclusive of Wi*N, and Wi*N*M respectively).
The motivation to combine provided with respect to the claim 1 analysis applies equally to claim 5.  It would furthermore have been obvious to one of ordinary skill in the art before the effective filing date to choose the bus widths as disclosed by Zhang, wherein the first memory is connected to the operation circuit by using a first bus, and a bit width of the first bus is Wi*N*M; the second memory is connected to the operation circuit by using a second bus, and a bit width of the second bus is Wi*N; and Wi is a maximum bit width that is of input data and that is allowed by the operation unit, to ease the design interface between compute units and the shared buffer.
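The bus widths recited in claim 5 follow directly from the parameters Wi, N, and M. The following minimal sketch works one example; the function name `bus_widths` is hypothetical, and the values Wi = 8, N = 16, M = 16 are assumptions chosen only for illustration, not values taken from the application or from Zhang.

```python
def bus_widths(Wi, N, M):
    """Return (first bus width, second bus width) in bits, per claim 5."""
    first_bus = Wi * N * M    # first memory to operation circuit
    second_bus = Wi * N       # second memory to operation circuit
    return first_bus, second_bus


# Assumed example values: 8-bit inputs, N = 16, M = 16.
first_bus, second_bus = bus_widths(Wi=8, N=16, M=16)
```

With these assumed values the first bus is 2048 bits wide and the second bus is 128 bits wide, so the first bus is M times wider than the second.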

Regarding claim 6, in addition to the teachings addressed in the claim 1 analysis, Jose teaches:
wherein the operation accelerator further comprises a storage unit access controller connected to the memory (fig 2), wherein 
the storage unit access controller is configured to: obtain the first matrix and the second matrix, save the first matrix  and save the second matrix to the memory (fig 4, section 5).
Jose discloses the storage access controller obtains the first and second matrix and saves to the memory, but does not explicitly disclose saving to first and second memories respectively. Jose further does not explicitly disclose a controller, wherein the controller controls the storage unit access controller.
However, in the same field of endeavor, Zhang discloses an accelerator for a convolutional neural network that performs matrix multiplication (abstract, section 4.1 third para.).  Zhang further discloses first and second memories storing first and second matrix data (Fig 4).  Zhang further discloses a DDR controller, which controls reading and writing of data to the compute units (Section 2.1).  It would have been obvious to one of ordinary skill in the art before the effective filing date to configure Jose’s storage access controller in accordance with Zhang’s controller, which controls data reads and writes to first and second storage units.  The motivation to combine provided with respect to claim 1 applies equally to claim 6.  Furthermore, Zhang discloses a controller that controls the storage access controller (section 5.1 first paragraph, fig 2 PCI-e controller).  It would have been obvious to one of ordinary skill in the art before the effective filing date to use Zhang’s PCI-e controller to control storage unit access to achieve the benefit of enhancing code portability and programmability while using OpenCL (abstract).

Regarding claim 10, in addition to the teachings addressed in the claim 1 analysis, Jose teaches:
the operation accelerator further comprises: 
an instruction fetch buffer, configured to store an instruction used by the controller (fig 4 Instruction memory); and 
a bus interface unit connected to the instruction fetch buffer, the storage unit access controller, and an external memory, used by the instruction fetch buffer to obtain the instruction from the external memory, and further used by the storage unit access controller to obtain at least one of the source data of the first matrix, the first matrix, and the second matrix from the external memory (Section 4 vertical and horizontal buses that cross the system, fig 4, fig 5, DMA control for storage unit access controller that uses the bus interface unit to obtain source data, section 5 paragraphs 1-3 for instructions obtained from external memory).
Jose is silent with respect to the instruction buffer being connected to the controller.  However, in the same field of endeavor, Zhang discloses a controller that controls the storage access (section 5.1 first paragraph, fig 2 PCI-e controller).  It would have been obvious to one of ordinary skill in the art before the effective filing date to connect Zhang’s PCI-e controller to the instruction fetch buffer to achieve the benefit of enhancing code portability and programmability while using OpenCL for execution of instructions (abstract).

Regarding claim 11, Jose in view of Zhang in view of Sankaran teaches the claim 1 limitations.  Jose further discloses application to compute intensive applications such as scientific computing, but does not explicitly disclose application to a convolutional neural network.  However, in the same field of endeavor, Zhang discloses an accelerator that performs matrix multiplication for a convolutional neural network with multiplication of weights by input feature maps (abstract, section 4.1 third paragraph, section 5.1 first paragraph).  It would have been obvious to one of ordinary skill in the art before the effective filing date to apply the matrix multiplication to a convolutional neural network using the weights disclosed by Zhang as the second matrix, and the feature map disclosed by Zhang as the input matrix.  As recognized by Zhang, the convolutional neural network is one of the most widely used computation intensive scientific applications (Zhang Abstract, Introduction).

Regarding claim 12, in addition to the teachings addressed in the claim 1 analysis, Jose teaches the following:
wherein M = N = K (figure 1, section 3 second paragraph).

Regarding claim 13, in addition to the teachings addressed in the claim 1 analysis, Jose teaches the following:
	wherein any two parameters M, N, and K are equal, but not equal to the third parameter (figure 1).

Allowable Subject Matter
Claim 4 is objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Claims 7-9 would be allowable if rewritten to overcome the respective rejection under 35 USC 112(b), and if rewritten in independent form including all of the limitations of the base claim and any intervening claims.  The following is a statement of reasons for the indication of allowable subject matter.  
Applicant claims apparatus and methods for matrix multiplication.  The apparatus as in claim 1 comprises a first memory, a second memory, an operation circuit, and a controller.  The first and second memories store first and second matrixes respectively.  The operation circuit comprises a matrix multiplying circuit.  The matrix multiplying circuit comprises M operation groups, each operation group comprises K operation blocks, each operation block comprises N operation units that are configured to receive two pieces of data, one each respectively from the first memory and the second memory.  The matrix multiplying circuit is further configured to multiply the two pieces of data so that the operation accelerator can perform M*N*K times of multiplication in one clock cycle; M, N, and K are integers greater than 0.
A reason for indication of allowable subject matter includes the limitations in combination with the remaining limitations where as in claim 4:
the operation unit comprises a plurality of storage units, a multiplying circuit, a first selection circuit connected to the plurality of storage units, and a second selection circuit connected to the plurality of storage units and the multiplying circuit, wherein the plurality of storage units are configured to store data; 
the first selection circuit is configured to: before the multiplying circuit performs a multiplication operation, select, from the plurality of storage units, a storage unit for storing data used when the multiplying circuit performs the multiplication operation; 
the second selection circuit is configured to: when the multiplying circuit performs the multiplication operation, select a storage unit for storing data used when the multiplying circuit performs the multiplication operation; and 
the multiplying circuit is configured to calculate a product of received data and the data stored in the storage unit selected by the second selection circuit.

A further reason for indication of allowable subject matter includes the limitations in combination with the remaining limitations where as in claim 7:
wherein the operation accelerator further comprises: 
a third memory, configured to store source data of the first matrix;
 a storage unit access controller connected to the first memory, the second memory, the third memory, and the controller, wherein the storage unit access controller is configured to: obtain, under control of the controller, the source data of the first matrix and the second matrix, save the source data of the first matrix to the third memory, and save the second matrix to the second memory; and
 a vector calculation unit connected to the first memory, the third memory, and the controller, wherein the vector calculation unit is configured to: convert, under control of the controller, the source data of the first matrix into the first matrix, and save the first matrix to the first memory.

Jose is the closest prior art found.  Jose discloses claimed subject matter in accordance with the claim mappings above.  Jose does not, however explicitly disclose as in claim 4: the operation unit comprises a plurality of storage units, a multiplying circuit, a first selection circuit connected to the plurality of storage units, and a second selection circuit connected to the plurality of storage units and the multiplying circuit, wherein the plurality of storage units are configured to store data; the first selection circuit is configured to: before the multiplying circuit performs a multiplication operation, select, from the plurality of storage units, a storage unit for storing data used when the multiplying circuit performs the multiplication operation; the second selection circuit is configured to: when the multiplying circuit performs the multiplication operation, select a storage unit for storing data used when the multiplying circuit performs the multiplication operation; and the multiplying circuit is configured to calculate a product of received data and the data stored in the storage unit selected by the second selection circuit.  Jose further does not explicitly disclose as in claim 7: a third memory, configured to store source data of the first matrix; a storage unit access controller connected to the first memory, the second memory, the third memory, and the controller, wherein the storage unit access controller is configured to: obtain, under control of the controller, the source data of the first matrix and the second matrix, save the source data of the first matrix to the third memory, and save the second matrix to the second memory; and a vector calculation unit connected to the first memory, the third memory, and the controller, wherein the vector calculation unit is configured to: convert, under control of the controller, the source data of the first matrix into the first matrix, and save the first matrix to the first memory.

Zhang discloses an accelerator for a convolutional neural network that performs matrix multiplication (abstract, section 4.1 third para.).  Zhang further discloses an array of compute units each comprising an array of processing elements that perform matrix multiplications in parallel (section 2.2, figure 2).  Zhang does not, however, explicitly disclose as in claim 4: a first selection circuit connected to the plurality of storage units, and a second selection circuit connected to the plurality of storage units and the multiplying circuit, the first selection circuit is configured to: before the multiplying circuit performs a multiplication operation, select, from the plurality of storage units, a storage unit for storing data used when the multiplying circuit performs the multiplication operation; the second selection circuit is configured to: when the multiplying circuit performs the multiplication operation, select a storage unit for storing data used when the multiplying circuit performs the multiplication operation.  Zhang further does not explicitly disclose as in claim 7: save the source data of the first matrix to the third memory, and save the second matrix to the second memory; and a vector calculation unit connected to the first memory, the third memory, and the controller, wherein the vector calculation unit is configured to: convert, under control of the controller, the source data of the first matrix into the first matrix, and save the first matrix to the first memory.

Sankaran discloses the claimed invention as in the above claim mappings. Sankaran further discloses selection circuitry in the PE (Fig 74).  Sankaran further discloses a data management unit with a PE scheduler (fig 74-6905). Sankaran does not, however, explicitly disclose as in claim 4: the first selection circuit is configured to: before the multiplying circuit performs a multiplication operation, select, from the plurality of storage units, a storage unit for storing data used when the multiplying circuit performs the multiplication operation.  Sankaran further does not explicitly disclose as in claim 7: save the source data of the first matrix to the third memory, and save the second matrix to the second memory; and a vector calculation unit connected to the first memory, the third memory, and the controller, wherein the vector calculation unit is configured to: convert, under control of the controller, the source data of the first matrix into the first matrix, and save the first matrix to the first memory.

US 20220391209 A1 Fowers et al., (hereinafter “Fowers”), discloses hardware and methods for neural network processing wherein the hardware includes a matrix vector unit that includes matrix multiplication (fig 8-10, [0051]).  Fowers further discloses a first selection circuit connected to a plurality of storage units (fig 5A-504), and a second selection circuit connected to a plurality of storage units and the multiplying circuit and selecting storage data used when multiplying at matrix vector multiplier (Fig 5A-506, 530).  Fowers does not, however, explicitly disclose as in claim 4: the first selection circuit is configured to: before the multiplying circuit performs a multiplication operation, select, from the plurality of storage units, a storage unit for storing data used when the multiplying circuit performs the multiplication operation.  Fowers further does not explicitly disclose as in claim 7: save the source data of the first matrix to the third memory, and save the second matrix to the second memory; and a vector calculation unit connected to the first memory, the third memory, and the controller, wherein the vector calculation unit is configured to: convert, under control of the controller, the source data of the first matrix into the first matrix, and save the first matrix to the first memory.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to EMILY E LAROCQUE whose telephone number is (469)295-9289.  The examiner can normally be reached on 10:00am - 12:00pm, 2:00pm - 8:00pm ET M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor Jyoti Mehta can be reached on 571-270-3995.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/EMILY E LAROCQUE/Primary Examiner, Art Unit 2182