DETAILED ACTION
Claims 1-24 are pending.
The office acknowledges the following papers:
Drawings filed on 2/5/2020,
IDS filed on 5/18/2020 and 3/17/2020.

Priority
The effective filing date for the subject matter defined in the pending claims in this application is 2/13/2018.

Drawings
The Examiner contends that the drawings submitted on 2/5/2020 are acceptable for examination proceedings. 

Specification
The disclosure is objected to because of the following informalities:
The lengthy specification has not been checked to the extent necessary to determine the presence of all possible minor errors. The Applicant’s cooperation is requested in correcting any errors of which the Applicant may become aware.
The title of the invention is not descriptive. A new title is required that is clearly indicative of the invention to which the claims are directed. The following title is suggested: “Computing Device with a Conversion Unit to Convert data values between various sizes of Fixed-Point and Floating-Point data”.
Appropriate correction is required.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.

Claim 1 is rejected under 35 U.S.C. 103 as being unpatentable over Liu et al. (“Cambricon: An Instruction Set Architecture for Neural Networks” 2016 IEEE), in view of Hinds (U.S. 2007/0220076).
As per claim 1:
Liu and Hinds disclosed a computation device, comprising: 
a storage unit (Liu: Section II)(The scratchpad memory reads upon the storage unit.), 
a controller unit (Liu: Figure 8, Section IV)(The decoder reads upon the controller unit.), and 
a conversion unit (Liu: Figure 8, Section IV)(Hinds: Figures 5 and 7, paragraphs 95 and 113-114)(Hinds disclosed circuitry to convert fixed-point data to floating-point 
the controller unit is configured to: 
obtain one or more operation instructions (Liu: Figure 8, Section IV)(Hinds: Figure 3B, paragraphs 64-81, table 1)(Liu disclosed a fetch stage to fetch instructions to be decoded. Hinds disclosed a plurality of instructions to convert different sizes of fixed-point data to different sizes of floating-point data and vice versa. The combination implements the instructions of Hinds into the processor of Liu.), wherein the operation instruction is a fixed-point format operation instruction (Hinds: Paragraphs 64-81, table 1), and an opcode field of the fixed-point format operation instruction comprises a first address of first input data, a first address of output data, and a decimal point position (Liu: Figures 2, 4, and 6, Sections II and III)(Hinds: Paragraphs 15, 64-65, and 95)(Hinds disclosed scalar fixed-point conversion instructions including a source and destination register field and a value specifying the locations of the decimal point (i.e. decimal point position). Liu disclosed vector instructions referencing a scratchpad memory for source and destination operands. The combination implements the fixed-point conversion instructions of Hinds as vector operations in Liu using the scratchpad memory for operand storage.); 
parse the operation instruction to obtain the first address of the first input data, the first address of the output data, and the decimal point position, and obtain the first input data from the storage unit according to the first address of the first input data (Liu: Figure 8, Section IV)(Hinds: Paragraphs 15, 64-65, and 
transmit the first input data and the decimal point to the conversion unit (Liu: Figure 8, section IV)(Hinds: Figure 5, paragraphs 15, 64-65, and 95)(The combination implements the fixed-point conversion instructions of Hinds as vector operations in Liu using the scratchpad memory for operand storage. The source data and location of the decimal point of the conversion instructions are read during the register read stage of Liu. This data is transmitted to the conversion logic for execution in the execution stage of Liu.); and 
the conversion unit is configured to convert the first input data into a second input data (Liu: Figure 8, section IV)(Hinds: Figure 5, paragraphs 15, 64-65, and 95)(The combination implements the fixed-point conversion instructions of Hinds as vector operations in Liu using the scratchpad memory for operand storage. The fixed-point source data and location of the decimal point of the conversion instructions are transmitted to the conversion logic for execution in the execution stage of Liu, which produces a floating-point result.).
The advantage of implementing fixed-point data formats is that certain data sets can be stored more efficiently (Hinds: Paragraph 16). The advantage of implementing . 
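For illustration only, the conversion mapped above (a fixed-point source vector read from a scratchpad, a decimal point position, and a floating-point result written back) can be sketched as follows. This is a minimal sketch under stated assumptions, not the circuitry of Liu or Hinds: the 16-bit two's-complement width, the reading of the decimal point position as a count of fractional bits, and the names ConvertInstruction, fixed_to_float, and execute_convert are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ConvertInstruction:
    """Hypothetical instruction fields: addresses, point position, and length."""
    in_addr: int          # first address of the input data
    out_addr: int         # first address of the output data
    point_position: int   # assumed: number of fractional bits
    length: int           # assumed: number of vector elements


def fixed_to_float(raw: int, point_position: int, bit_width: int = 16) -> float:
    """Interpret a two's-complement word as fixed-point and return a float."""
    raw &= (1 << bit_width) - 1
    if raw & (1 << (bit_width - 1)):   # sign bit set: sign-extend
        raw -= 1 << bit_width
    return raw / (1 << point_position)


def execute_convert(instr: ConvertInstruction, scratchpad: list) -> None:
    """Read the source vector, convert each element, and store the result."""
    src = scratchpad[instr.in_addr:instr.in_addr + instr.length]
    result = [fixed_to_float(x, instr.point_position) for x in src]
    scratchpad[instr.out_addr:instr.out_addr + instr.length] = result


scratchpad = [0x0180, 0xFF80, 0, 0]
execute_convert(ConvertInstruction(in_addr=0, out_addr=2,
                                   point_position=8, length=2), scratchpad)
print(scratchpad[2:])  # [1.5, -0.5]
```

With 8 fractional bits, the word 0x0180 (384) represents 384/256 = 1.5, and 0xFF80 represents -128/256 = -0.5.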

Claims 2-5 and 10-11 are rejected under 35 U.S.C. 103 as being unpatentable over Liu et al. (“Cambricon: An Instruction Set Architecture for Neural Networks” 2016 IEEE), in view of Hinds (U.S. 2007/0220076), further in view of Lutz et al. (U.S. 2016/0124710).
As per claim 2:
Liu and Hinds disclosed the computation device of claim 1.
Liu and Hinds failed to teach wherein obtaining the one or more operation instructions by the controller unit includes: obtaining, by the controller unit, a computation instruction, and parsing the computation instruction to obtain the one or more operation instructions.
However, Lutz combined with Liu and Hinds disclosed wherein obtaining the one or more operation instructions by the controller unit includes: 
obtaining, by the controller unit, a computation instruction, and parsing the computation instruction to obtain the one or more operation instructions (Lutz: Figure 13 element 60, paragraphs 241 and 366-370)(Liu: Figure 8, Section IV)(Hinds: Figure 5, paragraphs 15, 64-65, and 95)(Lutz disclosed receiving program instructions (e.g. 
The advantage of implementing macro-instructions is that program code can be condensed for storage, which reduces storage costs and increases fetching efficiency. Thus, it would have been obvious to one of ordinary skill in the art to implement the micro-operation generating circuitry of Lutz in Liu to produce micro-operations for execution from stored macro-instructions for the above advantages.
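As a rough illustration of the macro-instruction argument above, the following sketch expands one stored macro-instruction into several micro-operations. It is an assumption-laden toy, not the micro-operation generating circuitry of Lutz: the table contents, the opcode names (CVT_MAC, FIX2FLT, VMAC), and the rule that later micro-operations read the destination of the first are all hypothetical.

```python
from typing import List, NamedTuple


class MicroOp(NamedTuple):
    opcode: str
    src: int   # source address
    dst: int   # destination address


# Hypothetical expansion table: one macro-instruction name maps to an
# ordered list of micro-operation opcodes.
MACRO_TABLE = {
    "CVT_MAC": ["FIX2FLT", "VMAC"],
    "CVT": ["FIX2FLT"],
}


def expand(macro: str, src: int, dst: int) -> List[MicroOp]:
    """Parse a macro-instruction into its micro-operations.

    The first micro-op reads src; each later one reads the previous result,
    which this sketch assumes is kept at dst.
    """
    ops = []
    current_src = src
    for opcode in MACRO_TABLE[macro]:
        ops.append(MicroOp(opcode, current_src, dst))
        current_src = dst
    return ops


print(expand("CVT_MAC", 0, 16))
```

Only one macro-instruction is stored and fetched here, while two operations are issued for execution, which is the storage and fetch saving the rationale relies on.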
As per claim 3:
Liu, Hinds, and Lutz disclosed the computation device of claim 2, wherein the opcode field of the fixed-point format operation instruction further includes the length of the first input data, and the controller unit is further configured to parse the operation instruction to obtain the length of the first input data (Liu: Figures 2 and 4, section II, table 1)(Hinds: Paragraphs 15, 64-65, and 95)(The combination implements the fixed-point conversion instructions of Hinds as vector operations in Liu using the scratchpad memory for operand storage. The vector operations of Liu include vector size fields indicating the total size/width (i.e. length) of the source vector.); and 
wherein obtaining the first input data from the storage unit according to the first address of the first input data by the controller unit includes: 
obtaining, by the controller unit, the first input data from the storage unit according to the first input data and the length of the first input data (Liu: Figure 8, Section IV)(Hinds: Paragraphs 15, 64-65, and 95)(The combination 
As per claim 4:
Liu, Hinds, and Lutz disclosed the computation device of claim 3, wherein the computation device is configured to execute a machine learning computation (Liu: Section I), and further includes an operation unit, and wherein: 
the controller unit is further configured to transmit the one or more operation instructions to the operation unit (Liu: Figure 8, Section IV, table 1)(The decoder reads upon the controller unit and transmits instructions to the execution circuitry in the execution stage.),
the conversion unit is further configured to transmit the second input data to the operation unit (Liu: Figure 8, section IV)(Hinds: Figure 5, paragraphs 15, 64-65, and 95)(The combination implements the fixed-point conversion instructions of Hinds as vector operations in Liu using the scratchpad memory for operand storage. The fixed-point source data and location of the decimal point of the conversion instructions is transmitted to the conversion logic for execution in the execution stage of Liu, which produces a floating-point result. Conversion execution results are further processed by 
the operation unit is configured to operate the second input data according to the one or more operation instructions to obtain a computation result of the computation instruction, and store the computation result into a storage space corresponding to the first address of the output data in the storage unit (Liu: Figure 8, section IV)(Hinds: Figure 5, paragraphs 15, 64-65, and 95)(The combination implements the fixed-point conversion instructions of Hinds as vector operations in Liu using the scratchpad memory for operand storage. Conversion execution results are stored in the scratchpad memory. Subsequent dependent instructions can read the conversion results for further processing and store the result in the scratchpad memory.).
As per claim 5:
Liu, Hinds, and Lutz disclosed the computation device of claim 4, wherein: 
the machine learning computation includes an artificial neural network operation, the first input data includes an input neuron and a weight, and the computation result is an output neuron (Liu: Figure 8, section III, table 1)(Liu disclosed matrix instructions performing neural network operations with input neurons and weights, which produce output neuron results.).
As per claim 10:
Liu, Hinds, and Lutz disclosed the computation device of claim 4, wherein when the first input data is fixed-point data, the operation unit further includes: 
a derivation unit configured to derive a decimal point position of one or more intermediate results according to the decimal point position of the first input data, wherein the one or more intermediate results are obtained by computing according to the first 
As per claim 11:
Liu, Hinds, and Lutz disclosed the computation device of claim 10, wherein the operation unit further includes: 
a data cache unit configured to cache the one or more intermediate results (Liu: Figure 8, section IV)(The L1 cache is accessed by vector load/store operations.).
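For context on the derivation unit recited in claim 10, the sketch below shows one common way the decimal point position of an intermediate result follows from the positions of the inputs. It assumes the decimal point position counts fractional bits and uses plain signed Python integers as raw values; the function names are hypothetical, and neither the claim nor the cited references are asserted to use this exact rule.

```python
def fixed_multiply(a_raw: int, p_a: int, b_raw: int, p_b: int):
    """Multiply two fixed-point values given as (raw integer, fractional bits).

    The raw product carries p_a + p_b fractional bits, so the point position
    of the intermediate result is derived from the input positions alone.
    """
    return a_raw * b_raw, p_a + p_b


def to_float(raw: int, point_position: int) -> float:
    """Convert a (raw, fractional bits) pair to a float for inspection."""
    return raw / (1 << point_position)


# 1.5 with 8 fractional bits is 384; 0.25 with 4 fractional bits is 4.
raw, pos = fixed_multiply(384, 8, 4, 4)
print(raw, pos, to_float(raw, pos))  # 1536 12 0.375
```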

Claims 6-7 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Liu et al. (“Cambricon: An Instruction Set Architecture for Neural Networks” 2016 IEEE), in view of Hinds (U.S. 2007/0220076), in view of Lutz et al. (U.S. 2016/0124710), further in view of Barman et al. (U.S. 8,924,455).
As per claim 6:
Liu, Hinds, and Lutz disclosed the computation device of claim 4.
Liu, Hinds, and Lutz failed to teach wherein the operation unit includes a primary processing circuit and a plurality of secondary processing circuits, and wherein: the primary processing circuit is configured to perform pre-processing on the second input data and to transmit data and the plurality of operation instructions between the plurality of secondary processing circuits and the primary processing circuit, the plurality of secondary processing circuits is configured to perform an intermediate 
However, Barman combined with Liu, Hinds, and Lutz disclosed wherein the operation unit includes a primary processing circuit and a plurality of secondary processing circuits, and wherein: 
the primary processing circuit is configured to perform pre-processing on the second input data and to transmit data and the plurality of operation instructions between the plurality of secondary processing circuits and the primary processing circuit (Barman: Figure 15 element 1502-1506 and 1520-1522, column 1 lines 27-54 and column 8 lines 42-61)(Liu: Figure 8, section IV, table 1)(Liu disclosed complex matrix operations and a generic matrix functional unit. Barman disclosed specific pre-processing, systolic array elements, and post-processing circuits that make up a multiplication unit for matrix multiply-accumulate operations. The combination implements the multiplication unit of Barman into the matrix-functional unit of Liu. The pre-processing circuitry prepares matrix data and transmits it to the systolic array for execution of a given matrix instruction.), 
the plurality of secondary processing circuits is configured to perform an intermediate operation to obtain a plurality of intermediate results according to the second input data and the plurality of operation instructions transmitted from the primary processing circuit, and to transmit the plurality of intermediate results to the primary 
the primary processing circuit is further configured to perform post-processing on the plurality of intermediate results to obtain the computation result of the computation instruction (Barman: Figure 15 element 1502-1506 and 1520-1522, column 1 lines 27-54 and column 8 lines 42-61)(Liu: Figure 8, section IV, table 1)(Liu disclosed complex matrix operations and a generic matrix functional unit. Barman disclosed specific pre-processing, systolic array elements, and post-processing circuits that make up a multiplication unit for matrix multiply-accumulate operations. The combination implements the multiplication unit of Barman into the matrix-functional unit of Liu. The post-processing circuitry receives systolic array outputs and generates a final matrix result of the matrix multiply-accumulate operation.).
Liu disclosed multiple matrix operations and a generic matrix functional unit, but doesn’t show a detailed implementation of how the matrix instructions are executed using specific processing circuitry. One of ordinary skill in the art would have been motivated by this lack of teaching to find the Barman reference that shows specific 
As per claim 7:
Liu, Hinds, Lutz, and Barman disclosed the computation device of claim 6, further comprising a direct memory access (DMA) unit, wherein: 
the storage unit includes any combination of a register and a cache (Liu: Figure 8, section II, table 1), 
the cache includes a scratch pad cache and is configured to store the first input data (Liu: Figure 8, section II, table 1)(Vector operands are stored in an on-chip scratchpad memory.), and 
the register is configured to store scalar data in the first input data (Liu: Figure 8, section II, table 1)(Scalar operands are stored in general-purpose registers.), and 
the DMA unit is configured to read data from the storage unit or store data in the storage unit (Liu: Figure 8, section IV).
As per claim 14:
Liu, Hinds, and Lutz disclosed the computation device of claim 4.
Liu, Hinds, and Lutz failed to teach wherein the plurality of secondary processing circuits is distributed in an array, wherein: each secondary processing circuit is coupled with adjacent other secondary processing circuits, and the primary processing circuit is coupled with K secondary processing circuits of the plurality of secondary processing circuits, the K secondary processing circuits include n secondary processing circuits in 
However, Barman combined with Liu, Hinds, and Lutz disclosed wherein the plurality of secondary processing circuits is distributed in an array (Barman: Figure 15 element 1502-1506 and 1520-1522, column 1 lines 55-67 continued to column 2 lines 1-11 and column 8 lines 42-61)(Liu: Figure 8, section IV, table 1)(Liu disclosed complex matrix operations and a generic matrix functional unit. Barman disclosed specific pre-processing, systolic array elements, and post-processing circuits that make up a 
each secondary processing circuit is coupled with adjacent other secondary processing circuits, and the primary processing circuit is coupled with K secondary processing circuits of the plurality of secondary processing circuits (Barman: Figure 15 element 1502-1506 and 1520-1522, column 1 lines 55-67 continued to column 2 lines 1-11 and column 8 lines 42-61)(Liu: Figure 8, section IV, table 1)(The combination implements the multiplication unit of Barman into the matrix-functional unit of Liu. The systolic array elements (i.e. secondary processing circuits) are a distributed array of processing circuitry. Each systolic array portion is coupled with another systolic array portion, as well as coupled to the pre-processing and post-processing circuitry.),
the K secondary processing circuits include n secondary processing circuits in the first row, n secondary processing circuits in the mth row, and m secondary processing circuits in the first column, and the K secondary processing circuits are configured to forward data and instructions transmitted between the primary processing circuit and the plurality of secondary processing circuits (Barman: Figure 15 element 1502-1506 and 1520-1522, column 1 lines 55-67 continued to column 2 lines 1-11 and column 8 lines 42-61)(Liu: Figure 8, section IV, table 1)(The combination implements the multiplication unit of Barman into the matrix-functional unit of Liu. The systolic array elements (i.e. secondary processing 
and wherein the primary processing circuit is further configured to: 
determine that the input neurons are broadcast data, the weights are distribution data (Barman: Figures 6 and 15 element 1502-1506 and 1520-1522, column 1 lines 27-54 and column 8 lines 42-61)(Liu: Figure 8, sections III and IV, table 1)(The combination implements the multiplication unit of Barman into the matrix-functional unit of Liu. Liu disclosed matrix MAC operations using input neurons and weights to calculate output neurons. The prepared matrix data of input neurons and weights is broadcast and distributed to different parts of the systolic array.),
divide the distribution data into a plurality of data blocks (Barman: Figures 6 and 15 element 1502-1506 and 1520-1522, column 1 lines 27-54 and column 8 lines 42-61)(Liu: Figure 8, section IV, table 1)(The combination implements the multiplication unit of Barman into the matrix-functional unit of Liu. The pre-processing circuitry prepares matrix data and transmits it to the systolic array for execution of a given matrix instruction. The prepared matrix data is divided into portions to output to different parts of the systolic array.), and 
transmit at least one of the plurality of data blocks and at least one of the plurality of operation instructions to the K secondary processing circuits (Barman: Figure 15 element 1502-1506 and 1520-1522, column 1 lines 27-54 and column 8 
the K secondary processing circuits are configured to convert the data transmitted between the primary processing circuit and the plurality of secondary processing circuits (Barman: Figure 15 element 1502-1506 and 1520-1522, column 1 lines 55-67 continued to column 2 lines 1-11 and column 8 lines 42-61)(Liu: Figure 8, section IV, table 1)(The combination implements the multiplication unit of Barman into the matrix-functional unit of Liu. The systolic array elements (i.e. secondary processing circuits) execute the matrix multiply-accumulate instruction and produce intermediate results that are sent to the post-processing circuitry (i.e. convert the data transmitted).),
the plurality of secondary processing circuits is configured to perform operations on the data blocks according to the plurality of operation instructions to obtain a plurality of intermediate results, and to transmit the plurality of intermediate results to the K secondary processing circuits (Barman: Figure 15 element 1502-1506 and 1520-1522, column 1 lines 55-67 continued to column 2 lines 1-11 and column 8 lines 42-61)(Liu: Figure 8, section IV, table 1)(The combination implements the multiplication unit of Barman into the matrix-functional unit of Liu. The systolic array elements (i.e. secondary processing circuits) execute the matrix multiply-accumulate instruction and produce intermediate results that are sent to the post-processing circuitry.), and 
the primary processing circuit is configured to process the plurality of intermediate results received from the K secondary processing circuits to obtain the computation 
Liu disclosed multiple matrix operations and a generic matrix functional unit, but doesn’t show a detailed implementation of how the matrix instructions are executed using specific processing circuitry. One of ordinary skill in the art would have been motivated by this lack of teaching to find the Barman reference that shows specific circuitry for executing matrix multiply-accumulation operations. Thus, it would have been obvious to one of ordinary skill in the art to implement the matrix multiply-accumulate circuitry of Barman into Liu for the advantage of showing specific circuitry to execute the instructions of the instruction set of Liu.
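The division of labor discussed for claims 6 and 14 (input neurons broadcast to every secondary circuit, weights divided into blocks and distributed, per-block multiply-accumulate, and a final combination by the primary circuit) can be sketched in software as below. This is a simplified model under assumptions: splitting the weight matrix by contiguous rows and the function names are hypothetical, and the sketch does not model the systolic data flow of Barman.

```python
from typing import List

Matrix = List[List[float]]


def split_rows(weights: Matrix, num_blocks: int) -> List[Matrix]:
    """Pre-processing: divide the distribution data (weights) into row blocks."""
    size = -(-len(weights) // num_blocks)   # ceiling division
    return [weights[i:i + size] for i in range(0, len(weights), size)]


def secondary_mac(block: Matrix, neurons: List[float]) -> List[float]:
    """One secondary circuit: multiply-accumulate its block against the
    broadcast input neurons, giving intermediate results."""
    return [sum(w * x for w, x in zip(row, neurons)) for row in block]


def primary_compute(weights: Matrix, neurons: List[float],
                    num_secondaries: int) -> List[float]:
    """Primary circuit: distribute blocks, broadcast neurons, then gather
    the intermediate results into the final output neurons."""
    blocks = split_rows(weights, num_secondaries)
    partials = [secondary_mac(block, neurons) for block in blocks]
    return [y for part in partials for y in part]   # post-processing


print(primary_compute([[1, 2], [3, 4], [5, 6]], [1, 1], num_secondaries=2))
# [3, 7, 11]
```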

Claims 8, 12-13, and 15-16 are rejected under 35 U.S.C. 103 as being unpatentable over Liu et al. (“Cambricon: An Instruction Set Architecture for Neural Networks” 2016 IEEE), in view of Hinds (U.S. 2007/0220076), in view of Lutz et al. (U.S. 2016/0124710), in view of Barman et al. (U.S. 8,924,455), further in view of Official Notice.
As per claim 8:
Liu, Hinds, Lutz, and Barman disclosed the computation device of claim 6, wherein the controller unit includes an instruction cache unit, an instruction processing unit, and a storage queue unit, wherein:
the instruction cache unit is configured to store the computation instruction associated with the artificial neural network operation (Liu: Figure 8, section IV, table 1)(Official notice is given that instructions can be stored in instruction caches for the advantage of faster fetching times. Thus, it would have been obvious to one of ordinary skill in the art to implement an instruction cache to store the various Cambricon instructions of a program under execution.),  
the instruction processing unit is configured to parse the computation instruction to obtain the data conversion instruction and the plurality of operation instructions, and to parse the data conversion instruction to obtain the opcode and the opcode field of the data conversion instruction (Lutz: Figure 13 element 60, paragraphs 241 and 366-370)(Liu: Figure 8, Section IV)(Hinds: Figure 5, paragraphs 15, 64-65, 82, and 95)(Lutz disclosed receiving program instructions (e.g. macro-instructions) and generating a plurality of micro-instructions from them. The combination implements macro-instructions into the processing system of Liu. The decoder and micro-operation generating circuitry produce decoded micro-operation conversion operations. The conversion instructions include opcodes, source, destination, and decimal point fields that are decoded.), and 
the storage queue unit is configured to store an instruction queue, the instruction queue including a plurality of operation instructions or computation instructions, wherein the plurality of operation instructions or computation instructions is to be executed in a 
As per claim 12:
Liu, Hinds, and Lutz disclosed the computation device of claim 4.
Liu, Hinds, and Lutz failed to teach wherein the operation unit includes a tree module, wherein: the tree module includes a root port coupled with the primary processing circuit and a plurality of branch ports coupled with the plurality of secondary processing circuits, and the tree module is configured to forward data and the plurality of operation instructions transmitted between the primary processing circuit and the plurality of secondary processing circuits, wherein the tree module is an n-tree structure, the n being an integer greater than or equal to two.
However, Barman combined with Liu, Hinds, and Lutz disclosed wherein the operation unit includes a tree module, wherein: 
the tree module includes a root port coupled with the primary processing circuit and a plurality of branch ports coupled with the plurality of secondary processing circuits (Barman: Figure 15 element 1502-1506 and 1520-1522, column 1 lines 27-54 and column 8 lines 42-61)(Liu: Figure 8, section IV, table 1)(Liu disclosed complex matrix operations and a generic matrix functional unit. Barman disclosed specific pre-processing, systolic array elements, and post-processing circuits that make up a multiplication unit for matrix multiply-accumulate operations. The combination implements the multiplication unit of Barman into the matrix-functional unit of Liu. Official notice is given that processor arrays can be implemented using tree topologies for the advantage of reducing connection costs between processing elements. Thus, it would 
the tree module is configured to forward data and the plurality of operation instructions transmitted between the primary processing circuit and the plurality of secondary processing circuits, wherein the tree module is an n-tree structure, the n being an integer greater than or equal to two (Barman: Figure 15 element 1502-1506 and 1520-1522, column 1 lines 27-54 and column 8 lines 42-61)(Liu: Figure 8, section IV, table 1)(The combination implements the multiplication unit of Barman into the matrix-functional unit of Liu. In view of the above official notice, data output to the systolic array from the pre-processing circuitry filters through multiple nodes in a tree structure.).
Liu disclosed multiple matrix operations and a generic matrix functional unit, but doesn’t show a detailed implementation of how the matrix instructions are executed using specific processing circuitry. One of ordinary skill in the art would have been motivated by this lack of teaching to find the Barman reference that shows specific circuitry for executing matrix multiply-accumulation operations. Thus, it would have been obvious to one of ordinary skill in the art to implement the matrix multiply-accumulate circuitry of Barman into Liu for the advantage of showing specific circuitry to execute the instructions of the instruction set of Liu.
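To make the tree topology relied upon in the official notice concrete, the sketch below models forwarding a payload from a root port through an n-ary tree until every secondary circuit holds a copy. It is a hypothetical model only: the function name and the hop count as a measure of tree depth are assumptions, and no cited reference is asserted to implement forwarding this way.

```python
from typing import List, Tuple


def tree_broadcast(payload: str, num_leaves: int, n: int = 2) -> Tuple[List[str], int]:
    """Forward payload from the root through an n-ary tree.

    Returns the copies held at the branch ports and the number of
    forwarding levels (hops) that were needed.
    """
    if n < 2:
        raise ValueError("an n-tree needs n >= 2")
    frontier = [payload]          # the root port holds a single copy
    hops = 0
    while len(frontier) < num_leaves:
        # Every node forwards its copy to up to n children.
        frontier = [item for item in frontier for _ in range(n)][:num_leaves]
        hops += 1
    return frontier, hops


copies, hops = tree_broadcast("block0", num_leaves=8, n=2)
print(len(copies), hops)  # 8 3
```

With a binary tree, eight secondary circuits are reached in three forwarding levels, so the primary circuit needs only one root connection rather than eight direct links.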
As per claim 13:
Liu, Hinds, and Lutz disclosed the computation device of claim 4.
Liu, Hinds, and Lutz failed to teach wherein the operation unit further includes a 
However, Barman combined with Liu, Hinds, and Lutz disclosed wherein the operation unit further includes a branch processing circuit, wherein: 
the primary processing circuit is configured to: 
determine that the input neurons are broadcast data and the weights are distribution data (Barman: Figures 6 and 15 element 1502-1506 and 1520-1522, column 1 lines 27-54 and column 8 lines 42-61)(Liu: Figure 8, sections III and IV, table 1)(The combination implements the multiplication unit of Barman into the matrix-functional unit of Liu. Liu disclosed matrix MAC operations using input 
divide the distribution data into a plurality of data blocks (Barman: Figures 6 and 15 element 1502-1506 and 1520-1522, column 1 lines 27-54 and column 8 lines 42-61)(Liu: Figure 8, section IV, table 1)(The combination implements the multiplication unit of Barman into the matrix-functional unit of Liu. The pre-processing circuitry prepares matrix data and transmits it to the systolic array for execution of a given matrix instruction. The prepared matrix data is divided into portions to output to different parts of the systolic array.), and
transmit at least one of the plurality of data blocks, the broadcast data, and at least one of the plurality of operation instructions to the branch processing circuits (Barman: Figure 15 element 1502-1506 and 1520-1522, column 1 lines 27-54 and column 8 lines 42-61)(Liu: Figure 8, section IV, table 1)(The combination implements the multiplication unit of Barman into the matrix-functional unit of Liu. The pre-processing circuitry prepares matrix data and transmits it to the systolic array for execution of a given matrix instruction. Official notice is given that processor arrays can be implemented using tree topologies for the advantage of reducing connection costs between processing elements. Thus, it would have been obvious to one of ordinary skill in the art to implement the pre-processing, systolic arrays, and post-processing in Liu using a tree topology. In view of the official notice, the data output from the pre-processing circuitry is sent to the node connections (i.e. branch processing circuitry).),

the plurality of secondary processing circuits is configured to perform operations on the data blocks received and the broadcast data received according to the plurality of operation instructions to obtain a plurality of intermediate results, and to transmit the plurality of intermediate results to the plurality of branch processing circuits (Barman: Figure 15 element 1502-1506 and 1520-1522, column 1 lines 55-67 continued to column 2 lines 1-11 and column 8 lines 42-61)(Liu: Figure 8, section IV, table 1)(The combination implements the multiplication unit of Barman into the matrix-functional unit of Liu. The systolic array elements (i.e. secondary processing circuits) execute the matrix multiply-accumulate instruction and produce intermediate results that are sent to the post-processing circuitry.), and 
the primary processing circuit is further configured to perform post-processing on the plurality of intermediate results received from the branch processing circuits to obtain a computation result of the computation instruction, and to send the computation result of the computation instruction to the controller unit (Barman: Figure 15 element 1502-1506 
Liu disclosed multiple matrix operations and a generic matrix functional unit, but doesn’t show a detailed implementation of how the matrix instructions are executed using specific processing circuitry. One of ordinary skill in the art would have been motivated by this lack of teaching to find the Barman reference that shows specific circuitry for executing matrix multiply-accumulation operations. Thus, it would have been obvious to one of ordinary skill in the art to implement the matrix multiply-accumulate circuitry of Barman into Liu for the advantage of showing specific circuitry to execute the instructions of the instruction set of Liu.
As per claim 15:
Liu, Hinds, Lutz, and Barman disclosed the computation device of claim 12, wherein: 
the primary processing circuit is configured to perform a combined ranking processing on the plurality of intermediate results received from the plurality of processing circuits to obtain a computation result of the computation instruction (Barman: Figure 15 element 1502-1506 and 1520-1522, column 1 lines 27-54 and column 8 lines 42-61)(Liu: Figure 8, section IV, table 1)(Liu disclosed complex matrix operations and a 
the primary processing circuit is configured to perform a combined ranking processing and an activation processing on the plurality of intermediate results received from the plurality of processing circuits to obtain the computation result of the computation instruction.
As per claim 16:
Liu, Hinds, Lutz, and Barman disclosed the computation device of claim 12, wherein the primary processing circuit includes one or any combination of an activation processing circuit and an addition processing circuit, wherein: 
the activation processing circuit is configured to perform an activation operation on data in the primary processing circuit (Barman: Figure 15 element 1502-1506 and 1520-1522, column 1 lines 27-54 and column 8 lines 42-61)(Liu: Figure 8, section IV, table 1)(Liu disclosed complex matrix operations and a generic matrix functional unit. Barman disclosed specific pre-processing, systolic array elements, and post-processing circuits that make up a multiplication unit for matrix multiply-accumulate operations. The combination implements the multiplication unit of Barman into the matrix-functional unit of Liu. The pre-processing circuitry prepares matrix data (i.e. activation operation) and transmits it to the systolic array for execution of a given matrix instruction.), and 

the plurality of secondary processing circuits includes: 
a multiplication processing circuit configured to perform a multiplication operation on the data blocks received to obtain a product result (Barman: Figures 2, 4, and 15 element 1508, column 3 lines 58-67 continued to column 4 lines 1-3 and column 8 lines 42-61)(Liu: Figure 8, section IV, table 1)(The combination implements the multiplication unit of Barman into the matrix-functional unit of Liu. The systolic array elements execute the matrix multiply-accumulate instruction using multiply-accumulate circuitry.), and 
an accumulation processing circuit configured to perform an accumulation operation on the product results to obtain the plurality of intermediate results (Barman: Figures 2, 4, and 15 element 1508, column 3 lines 58-67 continued to column 4 lines 1-3 and column 8 lines 42-61)(Liu: Figure 8, section IV, table 1)(The combination implements the multiplication unit of Barman into the matrix-functional unit of Liu. The systolic array elements execute the matrix multiply-accumulate instruction using multiply-accumulate circuitry).
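For illustration only, the multiplication and accumulation limitations mapped above can be sketched in software. The following Python sketch is not taken from Liu or Barman. The names `secondary_mac` and `matrix_vector`, and the assumption that each secondary circuit receives one matrix row as its data block, are hypothetical and chosen only to show product results being accumulated into intermediate results.

```python
from typing import List


def secondary_mac(row_block: List[float], vector: List[float]) -> float:
    """Hypothetical secondary circuit: multiply element pairs, then accumulate."""
    if len(row_block) != len(vector):
        raise ValueError("data block and vector lengths differ")
    # Multiplication step: one product result per element pair.
    products = [a * b for a, b in zip(row_block, vector)]
    # Accumulation step: sum the product results into one intermediate result.
    acc = 0.0
    for p in products:
        acc += p
    return acc


def matrix_vector(matrix: List[List[float]], vector: List[float]) -> List[float]:
    """Assumed split: each row is one data block sent to one secondary circuit."""
    return [secondary_mac(row, vector) for row in matrix]
```

Under these assumptions, `matrix_vector([[1, 2], [3, 4]], [5, 6])` yields one intermediate result per row, which a primary circuit would then collect.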

Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Liu et al. (“Cambricon: An Instruction Set Architecture for Neural Networks” 2016 IEEE), in view of Hinds (U.S. 2007/0220076), in view of Lutz et al. (U.S. 2016/0124710), in view of Barman et al. (U.S. 8,924,455), in view of Official Notice, further in view of Leibholz (U.S. 2002/0138714).
As per claim 9:

Liu, Hinds, Lutz, and Barman failed to teach wherein the controller unit further includes: a dependency relationship processing unit configured to: determine whether there exists an associated relationship between a first operation instruction and a zeroth operation instruction before the first operation instruction, cache the first operation instruction in the instruction cache unit based on a determination that there exists an associated relationship between the first operation instruction and the zeroth operation instruction, and extract the first operation instruction from the instruction cache unit to the operation unit, when an execution of the zeroth operation instruction is completed, wherein determining whether there exists an associated relationship between a first operation instruction and a zeroth operation instruction before the first operation instruction by the dependency relationship processing unit includes: extracting a first storage address interval of data required in the first operation instruction according to the first operation instruction, extracting a zeroth storage address interval of data required in the zeroth operation instruction according to the zeroth operation instruction, determining that there exists an associated relationship between the first operation instruction and the zeroth operation instruction, when an overlapped region exists between the first storage address interval and the zeroth storage address interval, and determining that there does not exist an associated relationship between the first operation instruction and the zeroth operation instruction when no overlapped region exists between the first storage address interval and the zeroth storage address interval.
However, Leibholz combined with Liu, Hinds, Lutz, and Barman disclosed wherein the controller unit further includes: 
a dependency relationship processing unit configured to: 
determine whether there exists an associated relationship between a first operation instruction and a zeroth operation instruction before the first operation instruction (Leibholz: Figure 1 element 105, paragraphs 33 and 40)(Liu: Figure 8)(Leibholz disclosed a dependency checker to detect dependent operations. The combination adds the dependency checker to the accelerator of Liu.), 
cache the first operation instruction in the instruction cache unit based on a determination that there exists an associated relationship between the first operation instruction and the zeroth operation instruction (Leibholz: Figure 1 element 105, paragraphs 33 and 40)(Liu: Figure 8)(Leibholz disclosed a dependency checker to detect dependent operations. The combination adds the dependency checker to the accelerator of Liu. Official notice is given that dependent instructions can be fetched into an instruction cache multiple times through program execution for the advantage of faster fetching times. Thus, it would have been obvious to one of ordinary skill in the art that detected instructions with dependencies are later added to an instruction cache.), and 
extract the first operation instruction from the instruction cache unit to the operation unit, when an execution of the zeroth operation instruction is completed (Leibholz: Figure 1 element 105, paragraphs 33 and 40)(Liu: Figure 8, sections II and IV)(Liu disclosed using an on-chip scratchpad in place of a vector register file. Leibholz disclosed a dependency checker that compares source and destination registers to detect dependencies. The combination implements a dependency check in Liu by comparing vector input and vector output addresses of the 
wherein determining whether there exists an associated relationship between a first operation instruction and a zeroth operation instruction before the first operation instruction by the dependency relationship processing unit includes: 
extracting a first storage address interval of data required in the first operation instruction according to the first operation instruction (Leibholz: Figure 1 element 105, paragraphs 33 and 40)(Liu: Figure 8, sections II and IV)(Liu disclosed using an on-chip scratchpad in place of a vector register file. Leibholz disclosed a dependency checker that compares source and destination registers to detect dependencies. The combination implements a dependency check in Liu by comparing vector input and vector output addresses of the scratchpad.), 
extracting a zeroth storage address interval of data required in the zeroth operation instruction according to the zeroth operation instruction (Leibholz: Figure 1 element 105, paragraphs 33 and 40)(Liu: Figure 8, sections II and IV)(Liu disclosed using an on-chip scratchpad in place of a vector register file. Leibholz disclosed a dependency checker that compares source and destination registers to detect dependencies. The combination implements a dependency check in Liu by comparing vector input and vector output addresses of the scratchpad.),
 determining that there exists an associated relationship between the first operation instruction and the zeroth operation instruction, when an overlapped region exists between the first storage address interval and the zeroth storage address interval (Leibholz: Figure 1 element 105, paragraphs 33 and 40)(Liu: 
determining that there does not exist an associated relationship between the first operation instruction and the zeroth operation instruction when no overlapped region exists between the first storage address interval and the zeroth storage address interval (Leibholz: Figure 1 element 105, paragraphs 33 and 40)(Liu: Figure 8, sections II and IV)(Liu disclosed using an on-chip scratchpad in place of a vector register file. Leibholz disclosed a dependency checker that compares source and destination registers to detect dependencies. The combination implements a dependency check in Liu by comparing vector input and vector output addresses of the scratchpad. The absence of a match or overlap between the compared addresses indicates that no dependency exists.).
The advantage of dependency checking is that instructions dependent upon certain data are stalled so that the correct data is accessed for execution. Thus, it would have been obvious to one of ordinary skill in the art at the time of the effective filing date to implement the dependency checker of Leibholz into the accelerator of Liu to ensure correct program execution for dependent operations.
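For illustration only, the address-interval comparison recited in claim 9 can be sketched as follows. This Python sketch is not taken from Leibholz or Liu. The `Interval` type, the inclusive address bounds, and the function names `overlaps` and `can_issue` are hypothetical assumptions used to show how an overlap test could gate issue of the first operation instruction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Hypothetical storage address interval; both bounds are inclusive."""
    start: int
    end: int


def overlaps(first: Interval, zeroth: Interval) -> bool:
    """True when the two intervals share at least one address."""
    return first.start <= zeroth.end and zeroth.start <= first.end


def can_issue(first: Interval, zeroth: Interval, zeroth_done: bool) -> bool:
    """Assumed policy: hold the first instruction while it depends on an
    unfinished zeroth instruction; otherwise let it issue."""
    if overlaps(first, zeroth):
        return zeroth_done  # dependent: wait until the zeroth instruction completes
    return True  # no overlapped region, so no associated relationship
```

Under these assumptions, intervals 0-15 and 8-23 overlap, so the first instruction is held until the zeroth completes, while intervals 0-7 and 8-15 do not overlap and the first instruction may issue immediately.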

Claim 18 is rejected under 35 U.S.C. 103 as being unpatentable over Liu et al. .
As per claim 18:
Claim 18 essentially recites the same limitations of claim 1. Claim 18 additionally recites the following limitations:
wherein the second input data is fixed-point data (Liu: Figure 8, section IV)(Hinds: Figure 7, paragraphs 15, 64-65, and 113-115)(The combination implements the floating-point conversion instructions of Hinds as vector operations in Liu using the scratchpad memory for operand storage. Official notice is given that vector compress and expand instructions can be used to enlarge or shrink data element sizes for the advantage of quickly transforming data elements to needed sizes. Thus, it would have been obvious to one of ordinary skill in the art to implement vector compress and expand instructions in Liu using fixed-point elements as in Hinds.).
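For illustration only, a conversion that produces fixed-point second input data can be sketched as follows. This Python sketch is not the conversion instruction of Hinds. The 16-bit signed width, the saturation behavior, the use of Python's `round` (which rounds ties to even), and the names `float_to_fixed` and `fixed_to_float` are all assumptions made for the example.

```python
def float_to_fixed(value: float, frac_bits: int, total_bits: int = 16) -> int:
    """Convert a float to a signed fixed-point integer with `frac_bits`
    fractional bits, saturating to the assumed `total_bits` width."""
    scaled = round(value * (1 << frac_bits))  # ties round to even in Python
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    return max(lo, min(hi, scaled))  # clamp instead of wrapping on overflow


def fixed_to_float(raw: int, frac_bits: int) -> float:
    """Interpret a fixed-point integer as a float for checking purposes."""
    return raw / (1 << frac_bits)
```

With 8 fractional bits, the value 1.5 is stored as the integer 384, and values outside the representable range are clamped to the nearest limit.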

Claims 17 and 19-24 are rejected under 35 U.S.C. 103 as being unpatentable over Liu et al. (“Cambricon: An Instruction Set Architecture for Neural Networks” 2016 IEEE), in view of Hinds (U.S. 2007/0220076), in view of Official Notice, further in view of Lutz et al. (U.S. 2016/0124710).
As per claim 17:
Claim 17 essentially recites the same limitations of claim 4. Claim 17 additionally recites the following limitations:
when the machine learning operation device includes a plurality of the computation devices, the plurality of computation devices is configured to couple and 
the plurality of computation devices is configured to: 
interconnect and transmit data through a fast external device interconnection PCIE (peripheral component interface express) bus to support larger-scale machine learning computations (Liu: Figure 8, section IV)(Official notice is given that processing nodes can be connected together by PCIE buses for the advantage of faster data connections. Thus, it would have been obvious to one of ordinary skill in the art to implement PCIE buses to connect together a plurality of accelerator nodes.), 
share the same one control system or have respective control systems (Liu: Figure 8, section IV)(In view of the above official notice, each accelerator has its own control system.), 
share the same one memory or have respective memories (Liu: Figure 8, section IV)(In view of the above official notice, each accelerator has its own memory.), and 
deploy an interconnection manner of any arbitrary interconnection topology (Liu: Figure 8, section IV)(In view of the above official notice, each accelerator is connected with other accelerators via I/O interfaces.).
As per claim 19:
The additional limitation(s) of claim 19 basically recite the additional limitation(s) of claim 2. Therefore, claim 19 is rejected for the same reason(s) as claim 2.
As per claim 20:
The additional limitation(s) of claim 20 basically recite the additional limitation(s) of claim 3. Therefore, claim 20 is rejected for the same reason(s) as claim 3.
As per claim 21:
The additional limitation(s) of claim 21 basically recite the additional limitation(s) of claim 4. Therefore, claim 21 is rejected for the same reason(s) as claim 4.
As per claim 22:
The additional limitation(s) of claim 22 basically recite the additional limitation(s) of claim 5. Therefore, claim 22 is rejected for the same reason(s) as claim 5.
As per claim 23:
Liu, Hinds, and Lutz disclosed the method of claim 20, wherein when the first input data and the second input data are both fixed-point data, the decimal point position of the first input data is inconsistent with that of the second input data (Liu: Figure 8, section IV)(Hinds: Figure 7, paragraphs 15, 64-65, and 113-115)(The combination implements the floating-point conversion instructions of Hinds as vector operations in Liu using the scratchpad memory for operand storage. In view of the above official notice, both the source and destination data elements are fixed-point elements and the decimal location is inconsistent with the newly expanded/compressed locations.).
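For illustration only, fixed-point data whose decimal point positions are inconsistent can be brought to a common position before use. This Python sketch is not taken from Liu, Hinds, or Lutz. The names `realign` and `add_fixed`, and the choice to align to the larger number of fractional bits, are hypothetical assumptions.

```python
from typing import Tuple


def realign(raw: int, from_frac_bits: int, to_frac_bits: int) -> int:
    """Move the decimal point of a fixed-point integer.

    Shifting left adds fractional bits exactly; shifting right drops the
    low-order bits (an arithmetic shift, so it rounds toward negative infinity).
    """
    shift = to_frac_bits - from_frac_bits
    if shift >= 0:
        return raw << shift
    return raw >> -shift


def add_fixed(a_raw: int, a_frac: int, b_raw: int, b_frac: int) -> Tuple[int, int]:
    """Add two fixed-point values after aligning both to the finer position."""
    frac = max(a_frac, b_frac)
    total = realign(a_raw, a_frac, frac) + realign(b_raw, b_frac, frac)
    return total, frac
```

Under these assumptions, 1.5 stored with 8 fractional bits (384) and 1.5 stored with 4 fractional bits (24) add to 768 with 8 fractional bits, which represents 3.0.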
As per claim 24:
The additional limitation(s) of claim 24 basically recite the additional limitation(s) of claim 10. Therefore, claim 24 is rejected for the same reason(s) as claim 10.

Conclusion
The following is text cited from 37 CFR 1.111(c): In amending in reply to a rejection of claims in an application or patent under reexamination, the applicant or patent owner must clearly point out the patentable novelty which he or she thinks the claims present in view of the state of the art disclosed by the references cited or the objections made. The applicant or patent owner must also show how the amendments avoid such references or objections.
The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure.  
Sudharsanan et al. (U.S. 6,671,796), taught instructions to convert fixed-point operands to floating-point operands.
Madduri et al. (U.S. 2019/0199370), taught instructions to convert floating-point operands to fixed-point operands.
Chen et al. (U.S. 2019/0122094), taught converting floating-point operands to fixed-point operands.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JACOB A. PETRANEK whose telephone number is (571)272-5988.  The examiner can normally be reached on M-F 8:00-4:30.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Aimee Li, can be reached on (571) 272-4169.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/JACOB PETRANEK/Primary Examiner, Art Unit 2183