DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
EXAMINER’S AMENDMENT
An examiner’s amendment to the record appears below. Should the changes and/or additions be unacceptable to applicant, an amendment may be filed as provided by 37 CFR 1.312. To ensure consideration of such an amendment, it MUST be submitted no later than the payment of the issue fee.
Authorization for this examiner’s amendment was given via email by Letao Qin on 06/24/2021.
The application has been amended as follows: 

1.	(Currently Amended) A computation device, comprising:
a controller unit, and 
a conversion unit, 
wherein the controller unit is configured to: 
obtain a data conversion instruction and[[/or]] one or more operation instructions, wherein the data conversion instruction comprises an opcode field and an opcode, wherein the opcode is configured to indicate information of a function of the data conversion instruction, and the opcode field comprises information of a decimal point position, a flag bit indicating a data type of [[the]]a first input data, and an identifier of data type conversion;

obtain the first input data; and

transmit the opcode and the opcode field of the data conversion instruction and the first input data to the conversion unit; and

wherein the conversion unit is configured to convert the first input data into a second input 

2.	(Currently Amended) The computation device of claim 1, wherein the obtaining the data conversion instruction and[[/or]] one or more operation instructions by the controller unit includes:
obtaining, by the controller unit, a computation instruction, and parsing, by the controller unit, the computation instruction to obtain [[a]]the data conversion instruction and[[/or]] the one or more operation instructions.

3.	(Currently Amended) The computation device of claim 2, wherein the computation device is configured to perform a machine learning computation, and further includes an operation unit, wherein:
the controller unit is further configured to transmit the one or more
the conversion unit is further configured to transmit the second input data to the operation unit, and
the operation unit is configured to operate on the second input data according to the one or more 

4.	(Original) The computation device of claim 3, wherein the machine learning computation includes an artificial neural network operation; the first input data includes an input neuron and a weight; and the computation result is an output neuron.

5.	(Currently Amended) The computation device of claim [[3]]4, wherein the operation unit includes a primary processing circuit and a plurality of secondary processing circuits, wherein:
the primary processing circuit is configured to perform pre-processing on the second input data and to transmit data and the one or more 
the plurality of secondary processing circuits is configured to perform intermediate operations to obtain a plurality of intermediate results according to the second input data and the one or more operation instructions transmitted from the primary processing circuit, and transmit the plurality of intermediate results to the primary processing circuit, and
the primary processing circuit is further configured to perform post-processing on the plurality of intermediate results to obtain the computation result of the computation instruction.

6.	(Currently Amended) The computation device of claim 5, further comprising a storage unit and a direct memory access (DMA) unit, wherein:
the storage unit includes [[any]]a combination of a register and a cache,
the cache includes a scratch pad cache and is configured to store the first input data, 
the register is configured to store scalar data in the first input data, and
the DMA unit is configured to read data from the storage unit or store data into the storage unit.

7.	(Currently Amended) The computation device of claim 5, wherein the controller unit includes an instruction cache unit, an instruction processing unit, and a storage queue unit, wherein:
the instruction cache unit is configured to store the computation instruction associated with an artificial neural network operation,
the instruction processing unit is configured to parse the computation instruction to obtain the data conversion instruction and the one or more operation instructions, and to parse the data conversion instruction to obtain the opcode and the opcode field of the data conversion instruction, and
the storage queue unit is configured to store an instruction queue, the instruction queue including a plurality of operation 

8.	(Currently Amended) The computation device of claim 7, wherein the controller unit further includes:
a dependency relationship processing unit configured to: 
determine whether there exists an associated relationship between a first operation instruction and a zeroth operation instruction before the first operation instruction,
cache the first operation instruction in the instruction cache unit based on a determination that there exists an associated relationship between the first operation instruction and the zeroth operation instruction, and
extract the first operation instruction from the instruction cache unit to the operation unit, when an execution of the zeroth operation instruction is completed,
wherein determining whether there exists an associated relationship between [[a]]the first operation instruction and [[a]]the zeroth operation instruction before the first operation instruction by the dependency relationship processing unit includes:
extracting a first storage address interval of data required in the first operation instruction according to the first operation instruction, 
extracting a zeroth storage address interval of data required in the zeroth operation instruction according to the zeroth operation instruction,
determining that there exists an associated relationship between the first operation instruction and the zeroth operation instruction, when an overlapped region exists between the first storage address interval and the zeroth storage address interval, and
determining that there does not exist an associated relationship between the first operation instruction and the zeroth operation instruction, when no overlapped region exists between the first storage address interval and the zeroth storage address interval.
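
The address-interval test recited in claim 8 can be illustrated with a short sketch. This is a hedged illustration only, not the applicant's implementation: the names `AddressInterval`, `has_associated_relationship`, and `may_issue` are hypothetical, and the half-open interval convention (inclusive start, exclusive end) is an assumption that the claim does not specify.

```python
from typing import NamedTuple


class AddressInterval(NamedTuple):
    """Storage address interval of the data an instruction requires."""
    start: int  # first address, inclusive (assumed convention)
    end: int    # one past the last address, exclusive (assumed convention)


def has_associated_relationship(first: AddressInterval,
                                zeroth: AddressInterval) -> bool:
    """Return True when the two intervals share at least one address."""
    return first.start < zeroth.end and zeroth.start < first.end


def may_issue(first: AddressInterval,
              zeroth: AddressInterval,
              zeroth_completed: bool) -> bool:
    """The first instruction is held back only while it depends on a
    zeroth instruction that has not yet finished executing."""
    return zeroth_completed or not has_associated_relationship(first, zeroth)
```

Under these assumptions, intervals [0, 16) and [8, 24) overlap, so the first operation instruction would stay cached until the zeroth completes, whereas [0, 16) and [16, 32) do not overlap and the first operation instruction could be issued immediately.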

9.	(Currently Amended) The computation device of claim 3, wherein when the first input data is fixed-point data, the operation unit further includes:
a decimal point position of the first input data, wherein the one or more intermediate results are obtained according to the first input data.

10.	(Original) The computation device of claim 9, wherein the operation unit further includes:
a data cache unit configured to cache one or more intermediate results.

11.	(Currently Amended) The computation device of claim [[4]]5, wherein the operation unit includes a tree module, wherein:
the tree module includes a root port coupled with the primary processing circuit and a plurality of branch ports coupled with the plurality of secondary processing circuits, and
the tree module is configured to forward data and the one or more operation instructions transmitted among the primary processing circuit and the plurality of secondary processing circuits, and
the tree module is an n-tree structure, the n being an integer greater than or equal to two.

12.	(Currently Amended) The computation device of claim [[4]]5, wherein the operation unit further includes a branch processing circuit, wherein: 
the primary processing circuit is is broadcast data and the weight[[s are]]is distribution data, divide the distribution data into a plurality of data blocks, and transmit at least one of the plurality of data blocks, the broadcast data, and at least one of the one or more operation instructions to the branch processing circuit,
the branch processing circuit is configured to forward the data blocks, the broadcast data, and the at least one of the one or more
the plurality of secondary processing circuits is configured to perform operations on  one or more operation instructions to obtain [[a]]the plurality of intermediate results, and to transmit the plurality of intermediate results to the branch processing circuit, and
the primary processing circuit is further configured to perform post-processing on the plurality of intermediate results received from the branch processing circuit to obtain [[a]]the computation result of the computation instruction, and to send the computation result of the computation instruction to the controller unit.

13.	(Currently Amended) The computation device of claim [[4]]5, wherein the plurality of secondary processing circuits is distributed in an array, wherein:
each of the plurality of secondary processing circuits is coupled with adjacent other secondary processing circuits, and the primary processing circuit is coupled with K secondary processing circuits of the plurality of secondary processing circuits,
the K secondary processing circuits include n secondary processing circuits in [[the]]a first row, n secondary processing circuits in [[the]]an mth row, and m secondary processing circuits in [[the]]a first column, and the K secondary processing circuits are configured to forward data and instructions transmitted among the primary processing circuit and the plurality of secondary processing circuits,
the primary processing circuit is further configured to determine that the input neuron[[s are]]is broadcast data, the weight[[s are]]is distribution data, divide the distribution data into a plurality of data blocks, and transmit at least one of the plurality of data blocks and at least one of the one or more 
the K secondary processing circuits are configured to convert the data transmitted among the primary processing circuit and the plurality of secondary processing circuits,
the plurality of secondary processing circuits is configured to perform operations on the plurality of data blocks received according to the one or more operation instructions to obtain [[a]]the plurality of intermediate results, and to 
the primary processing circuit is configured to process the plurality of intermediate results received from the K secondary processing circuits to obtain the computation result of the computation instruction, and to send the computation result of the computation instruction to the controller unit.
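
The array arrangement recited in claim 13 can be visualized with a small sketch that enumerates which grid positions would hold the K secondary processing circuits. This is an illustration under stated assumptions: the array is taken to be m rows by n columns with zero-based (row, column) coordinates, and the function name `directly_coupled_positions` is hypothetical. The claim does not state whether the corner positions shared by a row and the first column are counted once or twice; the sketch counts each position once.

```python
def directly_coupled_positions(m: int, n: int) -> set[tuple[int, int]]:
    """Positions of secondary circuits in the first row, the mth row,
    and the first column of an m-by-n array (zero-based coordinates)."""
    positions = set()
    for col in range(n):
        positions.add((0, col))      # first row
        positions.add((m - 1, col))  # mth (last) row
    for row in range(m):
        positions.add((row, 0))      # first column
    return positions
```

For a 3-by-4 array this yields nine distinct positions, since the two corner circuits in the first column also belong to the first and mth rows.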


14.	(Currently Amended) The computation device of claim [[11]]13, wherein:
the primary processing circuit is secondary processing circuits to obtain [[a]]the computation result of the computation instruction, or
the primary processing circuit is configured to perform a combined ranking processing and an activation processing on the plurality of intermediate results received from the plurality of secondary processing circuits to obtain the computation result of the computation instruction.

15.	(Currently Amended) The computation device of claim 11, wherein the primary processing circuit includes a combination of an activation processing circuit and an addition processing circuit, wherein:
the activation processing circuit is configured to perform an activation operation on data in the primary processing circuit, and
the addition processing circuit is configured to perform an addition operation or an accumulation operation, and
the plurality of secondary processing circuits includes:
multiplication processing circuits configured to perform a multiplication operation on [[the]] data blocks received to obtain product results, and
accumulation processing circuits configured to perform an accumulation operation on the product results to obtain the plurality of intermediate results.
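
The division of work recited in claims 5 and 15 can be sketched in software as follows. This is a minimal illustration, not the claimed hardware: it assumes the input neurons are broadcast to every secondary processing circuit, that each data block is one row of weights, and that the activation is a ReLU. The function names are hypothetical.

```python
def relu(value: float) -> float:
    """Example activation; the claims do not fix a particular function."""
    return value if value > 0.0 else 0.0


def secondary_circuit(broadcast_data: list[float],
                      data_block: list[float]) -> float:
    """Multiply the broadcast data by one data block element-wise, then
    accumulate the products into a single intermediate result."""
    products = [x * w for x, w in zip(broadcast_data, data_block)]
    return sum(products)


def primary_circuit(input_neurons: list[float],
                    weights: list[list[float]]) -> list[float]:
    """Distribute one weight row per secondary circuit, collect the
    intermediate results, and apply the activation as post-processing."""
    intermediate_results = [
        secondary_circuit(input_neurons, block) for block in weights
    ]
    return [relu(result) for result in intermediate_results]
```

With input neurons [1.0, 2.0] and weight rows [1.0, 1.0] and [-1.0, -1.0], the intermediate results are 3.0 and -3.0, and the output neurons after activation are 3.0 and 0.0.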

16.	(Currently Amended) A machine learning operation device, comprising one or more computation devices each according to claim 3, wherein the one or more computation devices [[is]]are configured to obtain data to be processed and control information from other processing devices, to perform a specified machine learning computation, and to transmit an execution result to the other processing devices through I/O interfaces, wherein:
when the machine learning operation device includes a plurality of the computation devices, the plurality of computation devices is configured to couple and transmit data with each other through a specific structure, and
the plurality of computation devices is configured to:
interconnect and to transmit data through a fast external device interconnection PCIE (peripheral component interface express) bus to support larger-scale machine learning computations,
share the same one control system or have respective control systems,
share the same one memory or have respective memories, and
deploy an interconnection manner of any arbitrary interconnection topology.

17.	(Original) A combination processing device, comprising the machine learning operation device of claim 16, universal interconnection interfaces, a storage device, and other processing devices, wherein:
the machine learning operation device is configured to interact with the other processing devices to jointly perform user-specified computing operations, and
the storage device is configured to couple with the machine learning operation device and the other processing devices respectively for storing data of the machine learning operation device and the other processing devices.

18.	(Currently Amended) A neural network chip, comprising 

19.	(Original) An electronic device, comprising the neural network chip of claim 18.

a storage device, an interface device, a control device, and the neural network chip of claim 18, wherein:
the neural network chip is respectively coupled with the storage device, the control device, and the interface device,
the storage device is configured to store data,
the interface device is configured to implement data transmission between the neural network chip and external devices, 
the control device is configured to monitor a status of the neural network chip,
wherein the storage device includes a plurality of groups of storage units, each group of the plurality of groups of [[the]] storage units being coupled with the neural network chip through a bus, and [[the]]each storage unit being a double data rate (DDR) synchronous dynamic random access memory (SDRAM),
the neural network chip includes a DDR controller for controlling data transmission and data storage of each 
the interface device is a standard PCIE interface.

21.	(Currently Amended) A method for performing a machine learning computation, comprising:
obtaining, by [[the]]a controller unit, [[the]]a data conversion instruction and [[the]]a plurality of operation instructions, wherein the data conversion instruction comprises an opcode field and an opcode, wherein the opcode is configured to indicate information of a function of the data conversion instruction, the opcode field comprises information of a decimal point position, a flag bit indicating a data type of [[the]]a first input data, and a data type conversion, and
converting, by [[the]]a conversion unit, the first input data into second input data according to the data conversion instruction, the second input data being fixed-point data.

22.	(Original) The method of claim 21, wherein the controller unit is configured to obtain the data conversion instruction and the plurality of operation instructions, wherein obtaining the data conversion instruction and the plurality of operation instructions includes:


23.	(Currently Amended) The method of claim 22, wherein the method is configured to perform a machine learning computation, and further includes:
operating on the second input data according to the plurality of operation instructions to obtain a result of the computation instruction.

24.	(Currently Amended) The method of claim 23, wherein:
the machine learning computation includes [[the]]an artificial neural network operation,
the first input data includes an input neuron and a weight, and
the 

25.	(Currently Amended) The method of claim 23, wherein converting the first input data into the second input data according to the data conversion instruction by the conversion unit includes:
parsing the computation instruction to obtain information of the decimal point position, wherein the flag bit indicates [[a]]the data type of the first input data, and the data type conversion,
determining the data type of the first input data according to the flag bit indicating the data type of the first input data, and 
converting the first input data into the second input data according to the decimal point position and the data type conversion, the data type of the first input data [[is]]being inconsistent with [[that]]a data type of the second input data.


26.	(Currently Amended) The method of claim 23, wherein:
when the first input data and the second input data are both fixed-point data, the decimal point position of the first input data is inconsistent with [[that]]a decimal point position of the second input data.

27.	(Currently Amended) The method of claim 26, wherein when the first input data is fixed-point data, the method further includes:
deriving a decimal point position of at least one intermediate result according to the decimal point position of the first input data, wherein the at least one intermediate result[[s are]]is obtained by operating according to the first input data.

Reasons for Allowance
Claims 1-27 are allowed.
The following is an examiner’s statement of reasons for allowance: 
The known prior art of record, taken alone or in combination, was not found to teach, in combination with other limitations in the claims, a data conversion instruction that comprises an opcode and opcode field, where the opcode indicates the function of the data conversion instruction, and the opcode field indicates a decimal point position, a flag bit indicating a data type of input data, and an identifier of data type conversion, and wherein a conversion unit converts the input data according to the opcode and the opcode field into fixed-point data. In particular, while the prior art was found to generally teach instructions for data conversion, the prior art was not found to teach a data conversion instruction with the specific opcode and opcode fields used to indicate the specific information for converting input data into fixed-point data as described in claim 1 (and similarly claim 21).  
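
For illustration only, the instruction format discussed above can be sketched as follows. The sketch is not taken from the application: the 32-bit word layout, the field widths, the constant `FLOAT_TO_FIXED`, the 16-bit saturating output, and the reading of the decimal point position as a count of fractional bits are all assumptions, and the names `ConversionInstruction`, `decode`, and `convert` are hypothetical.

```python
from dataclasses import dataclass

FLOAT_TO_FIXED = 0  # assumed encoding of the conversion identifier


@dataclass(frozen=True)
class ConversionInstruction:
    opcode: int           # indicates the function of the instruction
    point_position: int   # decimal point position (assumed: fractional bits)
    input_is_fixed: bool  # flag bit: data type of the first input data
    conversion_id: int    # identifier of data type conversion


def decode(word: int) -> ConversionInstruction:
    """Split an assumed 32-bit word into the opcode and opcode-field parts."""
    return ConversionInstruction(
        opcode=(word >> 24) & 0xFF,
        point_position=(word >> 16) & 0xFF,
        input_is_fixed=bool((word >> 15) & 0x1),
        conversion_id=(word >> 13) & 0x3,
    )


def convert(instr: ConversionInstruction, value: float,
            bit_width: int = 16) -> int:
    """Convert floating-point first input data into fixed-point second
    input data, saturating to the assumed signed bit width."""
    if instr.input_is_fixed or instr.conversion_id != FLOAT_TO_FIXED:
        raise NotImplementedError("only float-to-fixed is sketched here")
    scaled = round(value * 2.0 ** instr.point_position)
    lowest = -(1 << (bit_width - 1))
    highest = (1 << (bit_width - 1)) - 1
    return max(lowest, min(highest, scaled))
```

Under these assumptions, a word carrying a decimal point position of 8 converts the value 0.1 to the integer 26, which represents 26/256, and values outside the representable range saturate rather than wrap.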
The following prior art was found to be of closest relevance:
CN107608715 teaches a matrix instruction that has an opcode and operation domain field, where the operation domain field indicates a data size of a register number of data with a specified size (page 4 of Google translation, provided in IDS dated 03/17/2020), but does not teach the operation domain field including a flag bit indicating a data type 
US 7,236,995 (hereinafter, Hinds) teaches a format conversion instruction that converts between fixed-point and floating-point and includes a control field specifying a decimal point (col 3 lines 5-15). However, Hinds does not teach the control field including a flag bit indicating a data type of the input and an identifier of data type conversion. 
US 7,242,414 (hereinafter, Thekkath) teaches a format conversion instruction with a COP1 field that instructs the processor to perform a specific action and the instruction also specifies an input data location and the format of the input data (col 28 lines 15-37). However, Thekkath does not teach the format field being indicated by a flag bit, and further does not teach the same field indicating a decimal point position and an identifier of data type conversion.
US 2018/0157464 (hereinafter, Lutz) teaches a convert and accumulate instruction with an opcode, a source register field that identifies a floating point operand, and a second register field that functions as the destination register and contains a fixed-point value ([0058]-[0059]). However, Lutz does not teach the convert and accumulate instruction comprising information of a decimal point position or a flag bit indicating that the source register is a floating point in the same field.
US 10,224,954 (hereinafter, Madduri) teaches a convert instruction that converts floating point data to a fixed-point representation and includes fields for an opcode, a source identifier, and a destination identifier (col 4 lines 54-60). However, Madduri does not teach the convert instruction including an opcode field indicating a decimal point position, a flag bit indicating a data type of input data, and an identifier of data type conversion. 


Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to KASIM ALLI whose telephone number is (571)270-1476. The examiner can normally be reached Monday - Friday, 9am - 5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Aimee Li, can be reached at (571)272-4169. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.







/William B Partridge/Primary Examiner, Art Unit 2183