DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Remarks
Claim 1 line 7 recites “an interface couplable to a memory…” Examiner suggests that Applicant amend the word “couplable” to “coupled”.

Drawings
The drawings are objected to under 37 CFR 1.83(a).  The drawings must show every feature of the invention specified in the claims.  Therefore, the "memory controller" recited in claims 1 and 18 and the "set of connectors" recited in claim 2 must be shown or the feature(s) canceled from the claim(s).  No new matter should be entered.
Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. The figure or figure number of an amended drawing should not be labeled as “amended.” If a drawing figure is to be canceled, the appropriate figure must be removed from the replacement sheet, and where necessary, the remaining figures must be renumbered and appropriate changes made to the brief description of the several views of the drawings for consistency. Additional replacement sheets may be necessary to show the renumbering of the remaining figures. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.

Claim Objections
Claims 3-12 and 18-20 are objected to because of the following informalities:
Claim 3 line 1 "the processing units includes" should be "the processing units include". Dependent claims are objected to for inheriting the same deficiencies of the claims on which they depend.
Claim 18 line 9 "the set of computing logic" should be "a set of computing logic". Dependent claims are objected to for inheriting the same deficiencies of the claims on which they depend.
Appropriate correction is required.

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f), is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f):
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f). The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f), is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f). The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f), is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f), except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f), except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f), because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.  Such claim limitation(s) is/are:
"a first unit configured to perform computations…" in claim 3. Figure 2 illustrates the matrix-matrix unit that provides structure for a first unit to perform computations on two matrix operands [0047-0060] with figure 3 further illustrating the structure of each matrix-vector unit of Fig. 2 and figure 4 illustrating structure of each vector-vector unit shown in Fig. 3.
"a plurality of second units configured to perform…" in claim 4. figure 2 illustrates a plurality of matrix-vector units included in the first unit, figure 3 further illustrates the structure of each matrix-vector unit or second unit to perform computations on the first matrix and respectively the plurality of vectors of the second matrix [0047-0060].
"a plurality of third units … configured to operate in parallel" in claim 5. Figure 3 illustrates a plurality of vector-vector units and figure 4 illustrates structure of each vector-vector unit to perform computation on a common vector and respectively a plurality of vectors of the first matrix [0047-0060].	
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f), it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f), applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f).

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1-2 and 13-19 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Huang (US 20190180170).
Regarding claim 1, Huang teaches a device (Huang, figure 5 neural network processing engine), comprising: 
a set of computing logic having processing units configured to perform at least computations on matrix operands (Huang, figure 5, processing engine array comprises a plurality of processing engines 511 [i.e. a set of computing logic having processing units] [0077] to execute parallel matrix multiplication); 
random access memory coupled to the set of computing logic and configured to store instructions executable by the set of computing logic and store matrices of an Artificial Neural Network (Huang, figure 5 memory subsystem 504 [i.e. random access memory] coupled to the processing engine array [i.e. the set of computing logic]. [0075] memory banks 514 can be implemented using SRAM. [0085-0086] input data 550 and weight 506 can be stored in the memory banks 514, [0072, 0093] memory banks in memory subsystem are used to store weights and state values, and states can include … instructions for the processing engine array, and weights of neural network can also be stored in the memory subsystem 504 for the processing engine array to perform matrix multiplication [0077]); and
an interface couplable to a memory controller external to the device and distinct from the set of computing logic, the interface configured to facilitate access to the random access memory by the memory controller (Huang, figures 5 and 8, chip interconnect [i.e. an interface] coupled to the DRAM controller [0118] DRAM controller can also be referred to as memory controller. As shown in figures 5 and 8, the DRAM controller is external to the neural network processing engine and distinct from the processing engine array. Figures 5 and 8 illustrate the chip interconnect is coupled to the DRAM controller and the memory subsystem); 
wherein in response to an indication provided in the random access memory, the set of computing logic is configured to execute the instructions to apply first input that is stored in the random access memory to the Artificial Neural Network, generate first output from the Artificial Neural Network, and store the first output in the random access memory (Huang, [0072, 0093] the states and weights are stored in the memory subsystem and the states can include instructions for processing engine array, the state can include input data, wherein the input data [0086] can include an indication of the task to perform (e.g. image processing, speech recognition, machine translation, etc.). [0088] when in operation (e.g. computing result for a set of input data), the processing engine array reads state values 608 from the memory subsystem 604 to apply on the weights of the neural network and generate computation results and store the computation results into memory subsystem).

Regarding claim 2, Huang teaches the device of claim 1, further comprising:
an integrated circuit package configured to enclose the device (Huang, [0071] the neural network processing engine is an integrated circuit [i.e. an integrated circuit package] that can be included in a neural network processor); and 
a set of connectors configured to couple the interface to the memory controller located outside of the integrated circuit package (Huang, figure 8 a set of arrows [i.e. a set of connectors] coupling the chip interconnect to the DRAM controller, which is located outside of the neural network processing engine).

Regarding claim 13, Huang teaches a method, comprising: 
accessing random access memory of a computing device using an interface of the computing device to a memory controller (Huang, figures 5 and 8, chip interconnect [i.e. an interface] coupled to the DRAM controller [0118] DRAM controller can also be referred to as memory controller. Figures 5 and 8 illustrate the chip interconnect is coupled to the DRAM controller and the memory subsystem [i.e. random access memory] of neural network processing engine [i.e. a computing device]), the computing device having processing units configured to perform at least computations on matrix operands (Huang, figure 5, the neural network processing engine comprises a plurality of processing engines 511 [processing unit][0077] to execute parallel matrix multiplication); 
writing, through the interface into the random access memory, instructions executable by the processing units; writing, through the interface into the random access memory, matrices of an Artificial Neural Network; writing, through the interface into the random access memory, first input to the Artificial Neural Network (Huang, [0072, 0093] memory banks in memory subsystem are used to store weights and state values, and states can include partial sums determined by the processing engine array, a current layer of the neural network that is being operated on, and/or instructions for the processing engine array and weights of neural network can also be stored in the memory subsystem 504 for the processing engine array to perform matrix multiplication [0077]); 
providing, in the random access memory, an indication that causes the processing units to start execution of the instructions, wherein the processing units execute the instructions to combine the first input with the matrices of the Artificial Neural Network to generate first output from the Artificial Neural Network and store the first output in the random access memory (Huang, [0085-0086] input data can include indication of the task to perform and as shown in figure 5 the input data provided into memory subsystem. Thus upon receiving the indication, [0088] when in operation (e.g. computing result for a set of input data), the processing engine array reads state values 608 from the memory subsystem 604 to apply on the weights of the neural network and generate computation results and store the computation results into memory subsystem); and 
reading, through the interface, the first output from the random access memory (Huang, [0076-0077] the result is stored in the memory subsystem, and can be read for the next round of computation, for example, can be input to another processing engine via chip interconnect).
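For illustration only, the sequence of steps recited in claim 13 (write instructions, matrices, and first input into the random access memory; provide an indication; execute; read the first output) can be sketched as follows. This is a minimal sketch under stated assumptions: the dictionary standing in for the random access memory, the key names, the two-opcode instruction format, and the functions `host_write`, `device_step`, and `host_read` are hypothetical and are taken neither from the application nor from Huang.

```python
# Hypothetical model: a dict stands in for the shared random access memory.
# Key names and the ("matmul"/"relu") instruction format are assumptions.

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

def host_write(ram, instructions, matrices, first_input):
    """Memory-controller side: write everything, then set the indication."""
    ram["instructions"] = instructions
    ram["matrices"] = matrices
    ram["input"] = first_input
    ram["start"] = True  # the "indication provided in the random access memory"

def device_step(ram):
    """Computing-logic side: run only if the indication is present."""
    if not ram.get("start"):
        return False
    x = ram["input"]
    for opcode, index in ram["instructions"]:
        if opcode == "matmul":
            x = matvec(ram["matrices"][index], x)
        elif opcode == "relu":
            x = [max(value, 0) for value in x]
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    ram["output"] = x      # store the output back in the same memory
    ram["start"] = False   # clear the indication once the run completes
    return True

def host_read(ram):
    """Memory-controller side: read the output through the interface."""
    return ram["output"]
```

In this sketch the only coupling between the two sides is the shared memory itself, which is the point of the "indication" limitation: the computing logic starts work because of a value found in memory rather than because of a direct call from the host.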

Regarding claim 14, Huang teaches the method of claim 13, wherein the computing device is an integrated circuit device enclosed within an integrated circuit package; the integrated circuit device has a Deep Learning Accelerator having the processing units, a control unit and local memory (Huang, figure 5 [0071] the neural network processing engine is an integrated circuit that can be included in a neural network processor, the neural network processing engine [i.e. the integrated circuit device] comprises a processing engine array having a plurality of processing engines [i.e. a deep learning accelerator having the processing units], [0076] memory subsystem can include control logic [i.e. a control unit], a result buffer); the processing units include at least a matrix-matrix unit configured to execute an instruction having two matrix operands (Huang, figure 5, the plurality of processing engines [i.e. the processing units] make up a processing engine array [i.e. a matrix-matrix unit] to perform matrix multiplication of weights and state values [0077, 0088]. Figure 3 illustrates a weight matrix and an input feature map matrix); the matrix-matrix unit includes a plurality of matrix-vector units configured to operate in parallel (Huang, figure 5, processing engine array [i.e. matrix-matrix unit] comprises a plurality of columns of processing engines [i.e. a plurality of matrix-vector units] performing in parallel [0048, 0077]); each of the matrix-vector units includes a plurality of vector-vector units configured to operate in parallel (Huang, each pair of columns of the PE array [i.e. each of the matrix-vector units] comprises 2 columns of the PE array [i.e. a plurality of vector-vector units] performing in parallel); and each of the vector-vector units includes a plurality of multiply-accumulate units configured to operate in parallel (Huang, each column of the PE array [i.e. each of the vector-vector units] includes a plurality of elements, which are multiplier-accumulator elements and operate in parallel).
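As an aid to reading the nested limitations of claim 14 (matrix-matrix unit, matrix-vector units, vector-vector units, multiply-accumulate units), the decomposition can be sketched in software as shown below. This is an illustrative sketch only: the function names are hypothetical, the loops run sequentially where the claimed units operate in parallel, and it is not a representation of Huang's processing engine array.

```python
# Hypothetical software analogue of the claimed unit hierarchy.
# Each function corresponds to one level; hardware would run the
# inner calls in parallel, whereas this sketch runs them in sequence.

def multiply_accumulate(accumulator, a, b):
    """One multiply-accumulate (MAC) step."""
    return accumulator + a * b

def vector_vector(u, v):
    """Vector-vector level: dot product built from MAC steps."""
    accumulator = 0
    for a, b in zip(u, v):
        accumulator = multiply_accumulate(accumulator, a, b)
    return accumulator

def matrix_vector(matrix, vector):
    """Matrix-vector level: one vector-vector operation per matrix row,
    all sharing the same (common) vector operand."""
    return [vector_vector(row, vector) for row in matrix]

def matrix_matrix(first, second):
    """Matrix-matrix level: one matrix-vector operation per column of
    the second matrix, each applied to the whole first matrix."""
    columns = list(zip(*second))
    result_columns = [matrix_vector(first, column) for column in columns]
    # Transpose the per-column results back into row-major form.
    return [list(row) for row in zip(*result_columns)]
```

For example, `matrix_matrix([[1, 2], [3, 4]], [[5, 6], [7, 8]])` evaluates to `[[19, 22], [43, 50]]`, the ordinary matrix product, with each output column produced by one matrix-vector call.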

Regarding claim 15, Huang teaches the method of claim 14, further comprising: converting a description of the Artificial Neural Network into the instructions and the matrices (Huang, [0034-0035, 0046-0048] Training occurs offline and before the neural network is put into operation, training sample sets can be large. Once trained, a neural network includes the weights determined during the training and a set of instructions [i.e. instructions and the matrices] describing the computation to be executed at each layer or node of the network, thus the training data sample for neural network to perform tasks and [0034] the underlying program for the neural network (e.g. organization of nodes into layers, the connections between the nodes of each layer, and the computation executed by each node) further provide a description of the ANN).

Regarding claim 16, Huang teaches the method of claim 15, further comprising: 
writing, through the interface into the random access memory, second input to Artificial Neural Network, during a time period in which the Deep Learning Accelerator executes the instructions to generate the first output from the first input according to the matrices of the Artificial Neural Network (Huang, [0085] input data can arrive over chip interconnect and can be stored in the memory banks along with the weights, [0074] the memory banks 514 can be independently accessible, wherein the weights and states can be read at the same time that intermediate results are written to the memory subsystem. [0094] on every clock cycle, the weights can be read out and intermediate results can be stored as state values. Thus, new input data is written to the memory subsystem while the processing engine array executes operations, so that results and input data can be read and written at the same time. Figures 7A-7C [0108] storing the data for the second neural network can occur during computation of a result of first input data, and storing the weights of second input data can start concurrently with receipt of second input data, the second input data is received while the engine 702 is in the process of computing a result for the first input data, and receiving second input data triggers computation of a result for second input data); and 
providing, after the first output is stored in the random access memory, an indication in the random access memory to cause the processing units to start execution of the instructions to generate second output from the second input (Huang [0085-0086] the input data can be stored in the memory banks of the memory subsystem, and the input data can include indication of the task to perform. Thus upon receiving the indication, [0088] when in operation (e.g. computing result for a set of input data), the processing engine array reads state values 608 from the memory subsystem 604 to apply on the weights of the neural network and generate computation results and store the computation results into memory subsystem. [0107-0108] receiving the second input data triggers computation of a result for the second input data).

Regarding claim 17, Huang teaches the method of claim 16, wherein the reading of the first output from the random access memory is performed within a time period in which the Deep Learning Accelerator executes the instructions to generate the second output from the second input according to the matrices of the Artificial Neural Network (Huang, [0076] the control logic can move data between memory banks 514, for example, to move intermediate results from the memory banks 514 to which the intermediate results are written, to the memory banks 514 from which the intermediate results will be read for the next round of computation. [0074] the states can be read at the same time that intermediate results are written to the memory subsystem, thus reading would be performed within a time period for generating the results. [0108]).
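The timing relationships recited in claims 16 and 17 (second input written while the first output is being generated; first output read while the second output is being generated) amount to a three-stage overlap of write, compute, and read. A minimal sketch of that overlap follows, assuming discrete time periods and a hypothetical `run_pipeline` function; the period model and the event labels are assumptions made for illustration and do not describe the timing of Huang's hardware.

```python
# Hypothetical discrete-time model of overlapped write / compute / read.
# In period t: input t is written, input t-1 is computed on, and the
# output for input t-2 is read. Names and labels are assumptions.

def run_pipeline(inputs, compute):
    pending_inputs = {}    # inputs written to memory but not yet consumed
    pending_outputs = {}   # outputs stored in memory but not yet read
    outputs = []
    schedule = []
    count = len(inputs)
    for period in range(count + 2):
        events = []
        if 0 <= period - 1 < count:
            # The accelerator works on the input written one period earlier.
            value = pending_inputs.pop(period - 1)
            pending_outputs[period - 1] = compute(value)
            events.append(f"compute {period - 1}")
        if period < count:
            # The host writes the next input during the same period.
            pending_inputs[period] = inputs[period]
            events.append(f"write {period}")
        if 0 <= period - 2 < count:
            # The host reads the output that was stored one period earlier.
            outputs.append(pending_outputs.pop(period - 2))
            events.append(f"read {period - 2}")
        schedule.append(events)
    return outputs, schedule
```

With three inputs, period 1 of the returned schedule contains both `compute 0` and `write 1` (the claim 16 overlap), and period 2 contains both `compute 1` and `read 0` (the claim 17 overlap).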

Regarding claim 18, Huang teaches an apparatus (Huang, figure 5 neural network processing engine), comprising: 
random access memory (figure 5 memory subsystem [i.e. random access memory]); 
a Field-Programmable Gate Array (FPGA) or Application Specific Integrated circuit (ASIC) ([0071] neural network processing engine is an integrated circuit that can be included in a neural network processor [i.e. ASIC]) having: 
a memory interface to access the random access memory (figure 5 the data path arrows [i.e. a memory interface] from result buffer to memory subsystem); 
a control unit ([0076] memory subsystem can include control logic); and 
at least one processing unit configured to operate on two matrix operands of an instruction executed in the FPGA or ASIC ([0077] the processing engine array [i.e. at least one processing unit] is the computation matrix of the neural processing engine, for example, matrix multiplication); and
a connection between the random access memory and the set of computing logic (figure 5 illustrates data path arrows [i.e. a connection] coupling the memory subsystem and the processing engine array); and 
an interface to a memory controller (figure 8 a chip interconnect [i.e. an interface]), the interface configured to facilitate access to the random access memory by the memory controller (figures 5 and 8, the chip interconnect is coupled to the DRAM controller and the memory subsystem of the neural network processing engine); 
wherein in response to an indication provided in the random access memory, the FPGA or ASIC configured to execute instructions stored in the random access memory ([0072, 0093] the states and weights are stored in the memory subsystem and the states can include instructions for processing engine array, the state can include input data, wherein the input data [0086] can include an indication of the task to perform (e.g. image processing, speech recognition, machine translation, etc.)); and 
wherein execution of the instructions by the FPGA or ASIC is configured to generate first output from an Artificial Neural Network based on first input stored in the random access memory and at least one matrix of Artificial Neural Network stored in the random access memory ([0088] when in operation (e.g. computing result for a set of input data), the processing engine array reads state values 608 from the memory subsystem 604 to apply on the weights of the neural network and generate computation results and store the computation results into memory subsystem).

Regarding claim 19, Huang teaches the apparatus of claim 18, further comprising: 
an integrated circuit package enclosing the apparatus ([0071] the neural network processing engine is an integrated circuit that can be included in a neural network processor); 
wherein the FPGA or ASIC is formed on a first integrated circuit die ([0144] an array of processing engines on a die).

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claim 20 is rejected under 35 U.S.C. 103 as being unpatentable over Huang in view of Muralimanohar (US 20190042411).

Regarding claim 20, Huang teaches the apparatus of claim 19; however, Huang does not teach wherein the random access memory includes non-volatile memory formed on a second integrated circuit die; and the connection includes Through-Silicon Vias (TSVs) between the first integrated circuit die and the second integrated circuit die.
However, Muralimanohar teaches the random access memory includes non-volatile memory formed on a second integrated circuit die (Muralimanohar, figure 3 [0028-0030] illustrates a memory 302 [i.e. random access memory], wherein the memory 302 may be non-volatile. Figure 4 [0038-0040] illustrates the memory is on dies 402 and 404 [i.e. one or more second integrated circuit dies]); and 
the connection includes Through-Silicon Vias (TSVs) between the first integrated circuit die and the second integrated circuit die (Muralimanohar, [0030] the logical engine 304 may be connected to memory via a data bus having high bandwidth [i.e. the connection], such as a Through Silicon Via. Figure 4 illustrates logical engine 304 on die 406 [i.e. first integrated circuit die] and memory on 402 and 404 [second integrated circuit die] connected using TSV 410).
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to modify Huang’s neural network processing engine and memory subsystem to be on different dies interconnected by a relatively high-bandwidth connection, such as TSVs, as disclosed by Muralimanohar figures 3-4. This modification would have been obvious because both references disclose apparatus for performing matrix operations and, as recognized by Muralimanohar [0043], having memory storing data on a different die from the logical engine 304 would free up space for more computation logic on a processing die 406, and providing the memory on separate dies 402 and 404 may also reduce the footprint of the stack 400.

Allowable Subject Matter
Claims 3-12 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten to overcome objections and in independent form including all of the limitations of the base claim and any intervening claims.
The following is a statement of reasons for the indication of allowable subject matter: 
	Applicant claims a device, comprising: a set of computing logic having processing units configured to perform at least computations on matrix operands; random access memory coupled to the set of computing logic and configured to store instructions executable by the set of computing logic and store matrices of an Artificial Neural Network; and an interface couplable to a memory controller external to the device and distinct from the set of computing logic, the interface configured to facilitate access to the random access memory by the memory controller; wherein in response to an indication provided in the random access memory, the set of computing logic is configured to execute the instructions to apply first input that is stored in the random access memory to the Artificial Neural Network, generate first output from the Artificial Neural Network, and store the first output in the random access memory. Applicant further claims wherein the processing units includes a first unit configured to perform computations on two matrix operands of an instruction as recited in claim 3.
	The primary reason for the indication of allowable subject matter is the limitation, in combination with all other limitations of the claim, of the specific structure of a first unit configured to perform computations on two matrix operands of an instruction.

Huang – US 20190180170
	Huang teaches an integrated circuit for a neural network processing system that includes at least a neural network processing engine 502 as shown in figure 5, wherein engine 502 further includes a memory subsystem 504 that stores instructions and matrices for processing engine array 510 to execute. Huang also teaches that the memory subsystem includes a plurality of banks that can be accessed independently, wherein weights and states can be read at the same time that intermediate results are written to the memory subsystem. Huang further discloses that storing the weights for a second neural network can occur during computation of a result for the first input data, that storing the weights can occur concurrently with receiving input data, and that receipt of the second input data triggers computation of a result for the second input data. However, Huang does not disclose the specific structure of a first unit configured to perform computations on two matrix operands of an instruction.
Zejda – US 10515135
Zejda discloses a device, as shown in figure 3, that includes a compute array to perform matrix operations. In figure 3, compute array 362 has a systolic array structure of compute cores 602 [i.e. a set of computing logic having processing units] to perform matrix multiplication of matrix A and matrix B as shown in figure 6 (column 10 lines 22-49), and system memory 216 and RAM 226 [i.e. random access memory] are coupled to compute array 362 [i.e. the set of computing logic]. Per column 5 lines 29-34, system memory 216 stores executable instructions. Per column 7 lines 43-56, compute array 362 performs matrix operations for implementing a neural network, and weight matrices may be read from RAM 226 [i.e. store matrices of an artificial neural network]. However, Zejda does not explicitly disclose the specific structure of a first unit configured to perform computations on two matrix operands of an instruction.
Olsen – US 10387122 
Olsen teaches an example of a systolic matrix multiplier as illustrated in figure 1, wherein accelerator 106 includes memory 146 to store instructions and matrices and local memories 125 and 130 to receive the matrices to be processed by systolic array 100. Olsen further teaches that the data rate exiting the systolic array 100 may be equal to the data flow of the matrix operand input. However, Olsen does not disclose the specific structure of a first unit configured to perform computations on two matrix operands of an instruction.
Kim – US 20190303198
Kim discloses a system that includes a computing module, which includes a plurality of processing units. Each processing unit has an implementation as shown in figure 2 and includes a function unit having a plurality of computation modules to perform matrix operations. System 100 in figure 1 includes storages 120 and 170 to store input data and instructions to be performed by the computing module, and a command processor to provide commands to the computing module. However, Kim does not explicitly disclose the specific structure of a first unit configured to perform computations on two matrix operands of an instruction.
Therefore, none of the closest found prior art discloses the limitation as required in claim 3. Thus claims 3-12 would be allowable if rewritten to overcome objections and in independent form including all of the limitations of the base claim and any intervening claims.


Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to HUY DUONG whose telephone number is (571)272-2764. The examiner can normally be reached Mon-Friday 7:30-5:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jyoti Mehta can be reached on 571-270-3995. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/HUY DUONG/ Examiner, Art Unit 2182 (571)272-2764

/JYOTI MEHTA/ Supervisory Patent Examiner, Art Unit 2182