DETAILED ACTION
Claims 1-24 are pending.
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 3/21/2022 has been entered.
The office acknowledges the following papers:
Claims and remarks filed on 3/21/2022,
IDS filed on 10/14/2022.

Specification
The disclosure is objected to because of the following informalities:
The lengthy specification has not been checked to the extent necessary to determine the presence of all possible minor errors. The Applicant’s cooperation is requested in correcting any errors of which the Applicant may become aware.
The title of the invention is not descriptive. A new title is required that is clearly indicative of the invention to which the claims are directed. The following title is suggested: “Computing Device with a Conversion Unit to Convert Data Values Between Various Sizes of Fixed-Point and Floating-Point Data”.
Appropriate correction is required.

New Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.

Claim 1 is rejected under 35 U.S.C. 103 as being unpatentable over Liu et al. (“Cambricon: An Instruction Set Architecture for Neural Networks” 2016 IEEE), in view of Hinds (U.S. 2007/0220076).
As per claim 1:
Liu and Hinds disclosed a computation device, comprising: 
a storage unit (Liu: Section II)(The scratchpad memory reads upon the storage unit.), 
a controller unit (Liu: Figure 8, Section IV)(The decoder reads upon the controller unit.), and 
an operation unit (Liu: Figure 8, Section IV)(The scalar, vector, and matrix units each read upon the operation unit.); and
a conversion unit (Hinds: Figures 5 and 7, paragraphs 95 and 113-114)(Liu: Figure 8, Section IV)(Hinds disclosed circuitry to convert fixed-point data to floating-point data and vice-versa. The combination implements the conversion circuitry of Hinds into Liu.); wherein: 
the controller unit is configured to: 
obtain one or more operation instructions (Liu: Figure 8, Section IV)(Hinds: Figure 3B, paragraphs 64-81, table 1)(Liu disclosed a fetch stage to fetch instructions to be decoded. Hinds disclosed a plurality of instructions to convert different sizes of fixed-point data to different sizes of floating-point data and vice versa. The combination implements the instructions of Hinds into the processor of Liu.), wherein each of the operation instruction is a fixed-point format operation instruction, each fixed-point format operation instruction comprises an opcode and a opcode field (Hinds: Paragraphs 64-82, table 1)(Hinds disclosed scalar fixed-point conversion instructions that use source/destination fixed-point operands. Each fixed-point conversion instruction includes an opcode field specifying an opcode.), and the opcode field of each fixed-point format operation instruction comprises a first address of first input data, a first address of output data, and a decimal point position (Liu: Figures 2, 4, and 6, Sections II and III)(Hinds: Paragraphs 15, 64-65, and 95)(Hinds disclosed scalar fixed-point conversion instructions each including a source register field, a destination register field, an opcode field, and a value specifying the locations of the decimal point (i.e. decimal point position). Liu disclosed vector instructions referencing a scratchpad memory for source and destination operands. The combination implements the fixed-point conversion instructions of Hinds as vector operations in Liu using the scratchpad memory for operand storage.); 
parse the fixed-point format operation instruction to obtain the opcode, the first address of the first input data, the first address of the output data, and the decimal point position, and obtain the first input data from the storage unit according to the first address of the first input data (Liu: Figure 8, Section IV)(Hinds: Paragraphs 15, 64-65, and 95)(The combination implements the fixed-point conversion instructions of Hinds as vector operations in Liu using the scratchpad memory for operand storage. The decode stage in Liu decodes (i.e. parses) the added fixed-point conversion instructions of Hinds. Decoding the fixed-point conversion instructions determines the source data address in the scratchpad memory, the destination data address in the scratchpad memory, the location of the decimal point, and the conversion variant specified by the specific/extended opcode portions. The source data of the conversion instructions is read during the register read stage of Liu.); 
transmit the first input data and the decimal point position to the conversion unit (Liu: Figure 8, section IV)(Hinds: Figure 5, paragraphs 15, 64-65, and 95)(The combination implements the fixed-point conversion instructions of Hinds as vector operations in Liu using the scratchpad memory for operand storage. The source data and location of the decimal point of the conversion instructions are read during the register read stage of Liu. This data is transmitted to the conversion logic for execution in the execution stage of Liu.);
wherein the conversion unit is configured to convert the first input data into a second input data based on the decimal point position (Liu: Figure 8, section IV)(Hinds: Figure 5, paragraphs 15, 64-65, and 95)(The combination implements the fixed-point conversion instructions of Hinds as vector operations in Liu using the scratchpad memory for operand storage. The fixed-point source data and location of the decimal point of the conversion instructions are transmitted to the conversion logic for execution in the execution stage of Liu, which produces a floating-point result.) and transmit the second input data to the operation unit (Liu: Figure 8, section IV)(Hinds: Figure 5, paragraphs 15, 64-65, and 95)(The combination implements the fixed-point conversion instructions of Hinds as vector operations in Liu using the scratchpad memory for operand storage. The fixed-point source data and location of the decimal point of the conversion instructions are transmitted to the conversion logic for execution in the execution stage of Liu, which produces a floating-point result. Conversion execution results are further processed by instructions reading converted data via the scratchpad memory. The converted data is sent to an execution unit for processing.); and 
transmit the opcode and the second input data to the operation unit (Liu: Figure 8, section IV)(Hinds: Figure 5, paragraphs 15, 64-65, and 95)(The combination implements the fixed-point conversion instructions of Hinds as vector operations in Liu using the scratchpad memory for operand storage. The fixed-point source data and location of the decimal point of the conversion instructions are transmitted to the conversion logic for execution in the execution stage of Liu, which produces a floating-point result. Conversion execution results are further processed by subsequent dependent instructions reading converted data via the scratchpad memory. The converted data and function (i.e. opcode) are sent to an execution unit for processing.).
The advantage of implementing fixed-point data formats is that certain data sets can be stored more efficiently (Hinds: Paragraph 16). The advantage of implementing floating-point execution units is that fixed-point data formats can be executed more efficiently than on integer units (Hinds: Paragraph 16). The advantage of implementing scalar conversion operations as vector conversion operations is that more data can be processed in parallel, which improves performance. Thus, it would have been obvious to one of ordinary skill in the art at the time of the effective filing date to implement the scalar conversion instructions of Hinds as vector conversion instructions in Liu for the above advantages. 
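For illustration only, the conversion relied upon in the combination (fixed-point source data and a decimal point position producing a floating-point result, applied element-wise as a vector operation) can be sketched as follows. The sketch assumes that the decimal point position denotes the number of fractional bits; the function names are hypothetical and are not taken from Liu or Hinds.

```python
def fixed_to_float(raw: int, point: int) -> float:
    """Interpret a signed integer as fixed-point data.

    Assumption: `point` is the number of fractional bits (point >= 0).
    """
    return raw / (1 << point)


def float_to_fixed(value: float, point: int) -> int:
    """Inverse direction; rounds to the nearest representable value."""
    return round(value * (1 << point))


def convert_vector(raw_values, point):
    """Element-wise conversion, standing in for a vector conversion operation."""
    return [fixed_to_float(raw, point) for raw in raw_values]
```

With 8 fractional bits, the raw value 384 corresponds to 1.5, since 384 / 256 = 1.5.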

Claims 2-5 and 10-11 are rejected under 35 U.S.C. 103 as being unpatentable over Liu et al. (“Cambricon: An Instruction Set Architecture for Neural Networks” 2016 IEEE), in view of Hinds (U.S. 2007/0220076), further in view of Lutz et al. (U.S. 2016/0124710).
As per claim 2:
Liu and Hinds disclosed the computation device of claim 1.
Liu and Hinds failed to teach wherein obtaining the one or more operation instructions by the controller unit includes: obtaining, by the controller unit, a computation instruction, and parsing the computation instruction to obtain the one or more operation instructions.
However, Lutz combined with Liu and Hinds disclosed wherein obtaining the one or more operation instructions by the controller unit includes: 
obtaining, by the controller unit, a computation instruction, and parsing the computation instruction to obtain the one or more operation instructions (Lutz: Figure 13 element 60, paragraphs 241 and 366-370)(Liu: Figure 8, Section IV)(Hinds: Figure 5, paragraphs 15, 64-65, and 95)(Lutz disclosed receiving program instructions (e.g. macro-instructions) and generating a plurality of micro-instructions from them. The combination implements macro-instructions into the processing system of Liu. The decoder and micro-operation generating circuitry produce decoded micro-operation conversion operations.).
The advantage of implementing macro-instructions is that program code can be condensed for storage, which reduces storage costs and increases fetching efficiency. Thus, it would have been obvious to one of ordinary skill in the art at the time of the effective filing date to implement the micro-operation generating circuitry of Lutz in Liu to produce micro-operations for execution from stored macro-instructions for the above advantages.
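For illustration only, generating a plurality of micro-operations from a single stored macro-instruction can be sketched as follows. The mnemonics, the expansion table, and the field layout are hypothetical and are not taken from Lutz, Liu, or Hinds.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class MicroOp:
    opcode: str  # hypothetical mnemonic
    src: int     # source address
    dst: int     # destination address


# Hypothetical expansion table: macro mnemonic -> ordered micro-op mnemonics.
EXPANSIONS = {"CVT_MAC": ["FIX2FLT", "MAC"]}


def expand(macro: str, src: int, dst: int) -> List[MicroOp]:
    """Parse one macro-instruction into its ordered micro-operations."""
    ops = []
    current_src = src
    for mnemonic in EXPANSIONS[macro]:
        ops.append(MicroOp(mnemonic, current_src, dst))
        current_src = dst  # later micro-ops read the previous result
    return ops
```

In this sketch one stored instruction yields a conversion micro-operation followed by a dependent arithmetic micro-operation, so only the single macro-instruction needs to be stored and fetched.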
As per claim 3:
Liu, Hinds, and Lutz disclosed the computation device of claim 2, wherein the opcode field of the fixed-point format operation instruction further includes the length of the first input data, and the controller unit is further configured to parse the fixed-point format operation instruction to obtain the length of the first input data (Liu: Figures 2 and 4, section II, table 1)(Hinds: Paragraphs 15, 64-65, and 95)(The combination implements the fixed-point conversion instructions of Hinds as vector operations in Liu using the scratchpad memory for operand storage. The vector operations of Liu include vector size fields indicating the total size/width (i.e. length) of the source vector.); and 
wherein obtaining the first input data from the storage unit according to the first address of the first input data by the controller unit includes: 
obtaining, by the controller unit, the first input data from the storage unit according to the first input data and the length of the first input data (Liu: Figure 8, Section IV)(Hinds: Paragraphs 15, 64-65, and 95)(The combination implements the fixed-point conversion instructions of Hinds as vector operations in Liu using the scratchpad memory for operand storage. The vector operations of Liu include vector size fields indicating the total size/width (i.e. length) of the source vector. Decoding the fixed-point conversion instructions determines the source data address in the scratchpad memory, the destination data address in the scratchpad memory, and the location of the decimal point, as well as the vector size. The source data of the conversion instructions is read during the register read stage of Liu according to the vector input address and vector size.).
As per claim 4:
Liu, Hinds, and Lutz disclosed the computation device of claim 3, wherein the computation device is configured to execute a machine learning computation (Liu: Section I), and wherein: 
the operation unit is configured to operate the second input data according to the one or more operation instructions to obtain a computation result of the computation instruction, and store the computation result into a storage space corresponding to the first address of the output data in the storage unit (Liu: Figure 8, section IV)(Hinds: Figure 5, paragraphs 15, 64-65, and 95)(The combination implements the fixed-point conversion instructions of Hinds as vector operations in Liu using the scratchpad memory for operand storage. Conversion execution results are stored in the scratchpad memory. Subsequent dependent instructions can read the conversion results for further processing and store the result in the scratchpad memory.).
As per claim 5:
Liu, Hinds, and Lutz disclosed the computation device of claim 4, wherein: 
the machine learning computation includes an artificial neural network operation, the first input data includes an input neuron and a weight, and the computation result is an output neuron (Liu: Figure 8, section III, table 1)(Liu disclosed matrix instructions performing neural network operations with input neurons and weights, which produces output neuron results.).
As per claim 10:
Liu, Hinds, and Lutz disclosed the computation device of claim 4, wherein when the first input data is fixed-point data, the operation unit further includes: 
a derivation unit configured to derive a decimal point position of one or more intermediate results according to the decimal point position of the first input data, wherein the one or more intermediate results are obtained by computing according to the first input data (Liu: Table 1)(Hinds: Figures 5 and 8 elements 100 and 705, paragraphs 95 and 123)(The combination implements the fixed-point conversion instructions of Hinds as vector operations in Liu using the scratchpad memory for operand storage. Vector/matrix instructions referencing fixed-point vector data use the location of the decimal point for calculating fixed-point execution results.).
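For illustration only, deriving the decimal point position of an intermediate result from the decimal point positions of the inputs can be sketched with ordinary fixed-point arithmetic. The sketch assumes that the decimal point position is the number of fractional bits; the function names are hypothetical and the code is not taken from Liu or Hinds.

```python
def fixed_mul(raw_a: int, point_a: int, raw_b: int, point_b: int):
    """Product of two fixed-point values; fractional bit counts add."""
    return raw_a * raw_b, point_a + point_b


def align(raw: int, point: int, target_point: int) -> int:
    """Rescale a raw value to a different number of fractional bits."""
    shift = target_point - point
    return raw << shift if shift >= 0 else raw >> -shift


def fixed_add(raw_a: int, point_a: int, raw_b: int, point_b: int):
    """Sum of two fixed-point values, aligned to the finer position."""
    point = max(point_a, point_b)
    total = align(raw_a, point_a, point) + align(raw_b, point_b, point)
    return total, point
```

For example, multiplying 1.5 (raw 384, 8 fractional bits) by 0.5 (raw 8, 4 fractional bits) gives raw 3072 with 12 fractional bits, i.e. 0.75.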
As per claim 11:
Liu, Hinds, and Lutz disclosed the computation device of claim 10, wherein the operation unit further includes: 
a data cache unit configured to cache the one or more intermediate results (Liu: Figure 8, section IV)(The L1 cache is accessed by vector load/store operations.).

Claims 6-7 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Liu et al. (“Cambricon: An Instruction Set Architecture for Neural Networks” 2016 IEEE), in view of Hinds (U.S. 2007/0220076), in view of Lutz et al. (U.S. 2016/0124710), further in view of Barman et al. (U.S. 8,924,455).
As per claim 6:
Liu, Hinds, and Lutz disclosed the computation device of claim 4.
Liu, Hinds, and Lutz failed to teach wherein the operation unit includes a primary processing circuit and a plurality of secondary processing circuits, and wherein: the primary processing circuit is configured to perform pre-processing on the second input data and to transmit the second input data and the plurality of operation instructions between the plurality of secondary processing circuits and the primary processing circuit, the plurality of secondary processing circuits is configured to perform an intermediate operation to obtain a plurality of intermediate results according to the second input data and the plurality of operation instructions transmitted from the primary processing circuit, and to transmit the plurality of intermediate results to the primary processing circuit, and the primary processing circuit is further configured to perform post-processing on the plurality of intermediate results to obtain the computation result of the computation instruction.
However, Barman combined with Liu, Hinds, and Lutz disclosed wherein the operation unit includes a primary processing circuit and a plurality of secondary processing circuits, and wherein: 
the primary processing circuit is configured to perform pre-processing on the second input data and to transmit the second input data and the plurality of operation instructions between the plurality of secondary processing circuits and the primary processing circuit (Barman: Figure 15 element 1502-1506 and 1520-1522, column 1 lines 27-54 and column 8 lines 42-61)(Liu: Figure 8, section IV, table 1)(Liu disclosed complex matrix operations and a generic matrix functional unit. Barman disclosed specific pre-processing, systolic array elements, and post-processing circuits that make up a multiplication unit for matrix multiply-accumulate operations. The combination implements the multiplication unit of Barman into the matrix-functional unit of Liu. The pre-processing circuitry prepares matrix data and transmits it to the systolic array for execution of a given matrix instruction.), 
the plurality of secondary processing circuits is configured to perform an intermediate operation to obtain a plurality of intermediate results according to the second input data and the plurality of operation instructions transmitted from the primary processing circuit, and to transmit the plurality of intermediate results to the primary processing circuit (Barman: Figure 15 element 1502-1506 and 1520-1522, column 1 lines 55-67 continued to column 2 lines 1-11 and column 8 lines 42-61)(Liu: Figure 8, section IV, table 1)(Liu disclosed complex matrix operations and a generic matrix functional unit. Barman disclosed specific pre-processing, systolic array elements, and post-processing circuits that make up a multiplication unit for matrix multiply-accumulate operations. The combination implements the multiplication unit of Barman into the matrix-functional unit of Liu. The systolic array elements (i.e. secondary processing circuits) execute the matrix multiply-accumulate instruction and produce intermediate results that are sent to the post-processing circuitry.), and 
the primary processing circuit is further configured to perform post-processing on the plurality of intermediate results to obtain the computation result of the computation instruction (Barman: Figure 15 element 1502-1506 and 1520-1522, column 1 lines 27-54 and column 8 lines 42-61)(Liu: Figure 8, section IV, table 1)(Liu disclosed complex matrix operations and a generic matrix functional unit. Barman disclosed specific pre-processing, systolic array elements, and post-processing circuits that make up a multiplication unit for matrix multiply-accumulate operations. The combination implements the multiplication unit of Barman into the matrix-functional unit of Liu. The post-processing circuitry receives systolic array outputs and generates a final matrix result of the matrix multiply-accumulate operation.).
Liu disclosed multiple matrix operations and a generic matrix functional unit, but does not show a detailed implementation of how the matrix instructions are executed using specific processing circuitry. One of ordinary skill in the art would have been motivated by this lack of teaching to find the Barman reference that shows specific circuitry for executing matrix multiply-accumulation operations. Thus, it would have been obvious to one of ordinary skill in the art at the time of the effective filing date to implement the matrix multiply-accumulate circuitry of Barman into Liu for the advantage of showing specific circuitry to execute the instructions of the instruction set of Liu.
As per claim 7:
Liu, Hinds, Lutz, and Barman disclosed the computation device of claim 6, further comprising a direct memory access (DMA) unit, wherein: 
the storage unit includes any combination of a register and a cache (Liu: Figure 8, section II, table 1), 
the cache includes a scratch pad cache and is configured to store the first input data (Liu: Figure 8, section II, table 1)(Vector operands are stored in an on-chip scratchpad memory.), and 
the register is configured to store scalar data in the first input data (Liu: Figure 8, section II, table 1)(Scalar operands are stored in general-purpose registers.), and 
the DMA unit is configured to read data from the storage unit or store data in the storage unit (Liu: Figure 8, section IV).
As per claim 14:
Liu, Hinds, and Lutz disclosed the computation device of claim 4.
Liu, Hinds, and Lutz failed to teach wherein the plurality of secondary processing circuits is distributed in an array, wherein: each secondary processing circuit is coupled with adjacent other secondary processing circuits, and the primary processing circuit is coupled with K secondary processing circuits of the plurality of secondary processing circuits, the K secondary processing circuits include n secondary processing circuits in the first row, n secondary processing circuits in the mth row, and m secondary processing circuits in the first column, and the K secondary processing circuits are configured to forward data and instructions transmitted between the primary processing circuit and the plurality of secondary processing circuits, and wherein the primary processing circuit is further configured to: determine that the input neurons are broadcast data, the weights are distribution data, divide the distribution data into a plurality of data blocks, and transmit at least one of the plurality of data blocks and at least one of the plurality of operation instructions to the K secondary processing circuits, the K secondary processing circuits are configured to convert the data transmitted between the primary processing circuit and the plurality of secondary processing circuits, the plurality of secondary processing circuits is configured to perform operations on the data blocks according to the plurality of operation instructions to obtain a plurality of intermediate results, and to transmit the plurality of intermediate results to the K secondary processing circuits, and the primary processing circuit is configured to process the plurality of intermediate results received from the K secondary processing circuits to obtain the computation result of the computation instruction, and to send the computation result of the computation instruction to the controller unit.
However, Barman combined with Liu, Hinds, and Lutz disclosed wherein the plurality of secondary processing circuits is distributed in an array (Barman: Figure 15 element 1502-1506 and 1520-1522, column 1 lines 55-67 continued to column 2 lines 1-11 and column 8 lines 42-61)(Liu: Figure 8, section IV, table 1)(Liu disclosed complex matrix operations and a generic matrix functional unit. Barman disclosed specific pre-processing, systolic array elements, and post-processing circuits that make up a multiplication unit for matrix multiply-accumulate operations. The combination implements the multiplication unit of Barman into the matrix-functional unit of Liu. The systolic array elements (i.e. secondary processing circuits) form a distributed array of processing circuitry.), wherein: 
each secondary processing circuit is coupled with adjacent other secondary processing circuits, and the primary processing circuit is coupled with K secondary processing circuits of the plurality of secondary processing circuits (Barman: Figure 15 element 1502-1506 and 1520-1522, column 1 lines 55-67 continued to column 2 lines 1-11 and column 8 lines 42-61)(Liu: Figure 8, section IV, table 1)(The combination implements the multiplication unit of Barman into the matrix-functional unit of Liu. The systolic array elements (i.e. secondary processing circuits) form a distributed array of processing circuitry. Each systolic array portion is coupled with another systolic array portion, as well as coupled to the pre-processing and post-processing circuitry.),
the K secondary processing circuits include n secondary processing circuits in the first row, n secondary processing circuits in the mth row, and m secondary processing circuits in the first column, and the K secondary processing circuits are configured to forward data and instructions transmitted between the primary processing circuit and the plurality of secondary processing circuits (Barman: Figure 15 element 1502-1506 and 1520-1522, column 1 lines 55-67 continued to column 2 lines 1-11 and column 8 lines 42-61)(Liu: Figure 8, section IV, table 1)(The combination implements the multiplication unit of Barman into the matrix-functional unit of Liu. The systolic array elements (i.e. secondary processing circuits) form a distributed array of processing circuitry. Each systolic array portion includes rows and columns of processing circuitry. Data output from the pre-processing circuitry for processing is forwarded to the systolic array processing elements.), 
and wherein the primary processing circuit is further configured to: 
determine that the input neurons are broadcast data, the weights are distribution data (Barman: Figures 6 and 15 element 1502-1506 and 1520-1522, column 1 lines 27-54 and column 8 lines 42-61)(Liu: Figure 8, sections III and IV, table 1)(The combination implements the multiplication unit of Barman into the matrix-functional unit of Liu. Liu disclosed matrix MAC operations using input neurons and weights to calculate output neurons. The prepared matrix data of input neurons and weights is broadcast and distributed to different parts of the systolic array.),
divide the distribution data into a plurality of data blocks (Barman: Figures 6 and 15 element 1502-1506 and 1520-1522, column 1 lines 27-54 and column 8 lines 42-61)(Liu: Figure 8, section IV, table 1)(The combination implements the multiplication unit of Barman into the matrix-functional unit of Liu. The pre-processing circuitry prepares matrix data and transmits it to the systolic array for execution of a given matrix instruction. The prepared matrix data is divided into portions to output to different parts of the systolic array.), and 
transmit at least one of the plurality of data blocks and at least one of the plurality of operation instructions to the K secondary processing circuits (Barman: Figure 15 element 1502-1506 and 1520-1522, column 1 lines 27-54 and column 8 lines 42-61)(Liu: Figure 8, section IV, table 1)(The combination implements the multiplication unit of Barman into the matrix-functional unit of Liu. The pre-processing circuitry prepares matrix data and transmits it to the systolic array for execution of a given matrix instruction.),
the K secondary processing circuits are configured to convert the data transmitted between the primary processing circuit and the plurality of secondary processing circuits (Barman: Figure 15 element 1502-1506 and 1520-1522, column 1 lines 55-67 continued to column 2 lines 1-11 and column 8 lines 42-61)(Liu: Figure 8, section IV, table 1)(The combination implements the multiplication unit of Barman into the matrix-functional unit of Liu. The systolic array elements (i.e. secondary processing circuits) execute the matrix multiply-accumulate instruction and produce intermediate results that are sent to the post-processing circuitry (i.e. convert the data transmitted).),
the plurality of secondary processing circuits is configured to perform operations on the data blocks according to the plurality of operation instructions to obtain a plurality of intermediate results, and to transmit the plurality of intermediate results to the K secondary processing circuits (Barman: Figure 15 element 1502-1506 and 1520-1522, column 1 lines 55-67 continued to column 2 lines 1-11 and column 8 lines 42-61)(Liu: Figure 8, section IV, table 1)(The combination implements the multiplication unit of Barman into the matrix-functional unit of Liu. The systolic array elements (i.e. secondary processing circuits) execute the matrix multiply-accumulate instruction and produce intermediate results that are sent to the post-processing circuitry.), and 
the primary processing circuit is configured to process the plurality of intermediate results received from the K secondary processing circuits to obtain the computation result of the computation instruction, and to send the computation result of the computation instruction to the controller unit (Barman: Figure 15 element 1502-1506 and 1520-1522, column 1 lines 27-54 and column 8 lines 42-61)(Liu: Figure 8, section IV, table 1)(Liu disclosed complex matrix operations and a generic matrix functional unit. Barman disclosed specific pre-processing, systolic array elements, and post-processing circuits that make up a multiplication unit for matrix multiply-accumulate operations. The combination implements the multiplication unit of Barman into the matrix-functional unit of Liu. The post-processing circuitry receives systolic array outputs and generates a final matrix result of the matrix multiply-accumulate operation.).
Liu disclosed multiple matrix operations and a generic matrix functional unit, but does not show a detailed implementation of how the matrix instructions are executed using specific processing circuitry. One of ordinary skill in the art would have been motivated by this lack of teaching to find the Barman reference that shows specific circuitry for executing matrix multiply-accumulation operations. Thus, it would have been obvious to one of ordinary skill in the art at the time of the effective filing date to implement the matrix multiply-accumulate circuitry of Barman into Liu for the advantage of showing specific circuitry to execute the instructions of the instruction set of Liu.
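For illustration only, the data flow mapped above (input neurons broadcast, weights divided into data blocks and distributed, intermediate results returned for post-processing) can be sketched as follows. The sketch is a sequential stand-in that does not model the systolic timing or interconnect of Barman; all function names and the block-splitting policy are hypothetical.

```python
def split_blocks(weight_rows, k):
    """Divide the weight rows into at most k contiguous data blocks."""
    size = max(1, -(-len(weight_rows) // k))  # ceiling division
    return [weight_rows[i:i + size] for i in range(0, len(weight_rows), size)]


def secondary(inputs, weight_block):
    """One secondary circuit: multiply-accumulate per weight row."""
    return [sum(x * w for x, w in zip(inputs, row)) for row in weight_block]


def primary(inputs, weight_rows, k):
    """Primary circuit: distribute blocks, broadcast inputs, gather results."""
    blocks = split_blocks(weight_rows, k)                   # pre-processing
    partials = [secondary(inputs, blk) for blk in blocks]   # intermediate results
    return [y for part in partials for y in part]           # post-processing
```

Here each weight row produces one output neuron, so the gathered result has one entry per row regardless of how the rows were divided among the secondary circuits.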

Claims 8, 12-13, and 15-16 are rejected under 35 U.S.C. 103 as being unpatentable over Liu et al. (“Cambricon: An Instruction Set Architecture for Neural Networks” 2016 IEEE), in view of Hinds (U.S. 2007/0220076), in view of Lutz et al. (U.S. 2016/0124710), in view of Barman et al. (U.S. 8,924,455), further in view of Official Notice.
As per claim 8:
Liu, Hinds, Lutz, and Barman disclosed the computation device of claim 6, wherein the controller unit includes an instruction cache unit, an instruction processing unit, and a storage queue unit, wherein:
the instruction cache unit is configured to store the computation instruction associated with the artificial neural network operation (Liu: Figure 8, section IV, table 1)(Official notice is given that instructions can be stored in instruction caches for the advantage of faster fetching times. Thus, it would have been obvious to one of ordinary skill in the art to implement an instruction cache to store the various Cambricon instructions of a program under execution.),  
the instruction processing unit is configured to parse the computation instruction to obtain the data conversion instruction and the plurality of operation instructions, and to parse the data conversion instruction to obtain the opcode and the opcode field of the data conversion instruction (Lutz: Figure 13 element 60, paragraphs 241 and 366-370)(Liu: Figure 8, Section IV)(Hinds: Figure 5, paragraphs 15, 64-65, 82, and 95)(Lutz disclosed receiving program instructions (e.g. macro-instructions) and generating a plurality of micro-instructions from them. The combination implements macro-instructions into the processing system of Liu. The decoder and micro-operation generating circuitry produce decoded micro-operation conversion operations. The conversion instructions include opcodes, source, destination, and decimal point fields that are decoded.), and 
the storage queue unit is configured to store an instruction queue, the instruction queue including a plurality of operation instructions or computation instructions, wherein the plurality of operation instructions or computation instructions is to be executed in a sequence (Liu: Figure 8, section IV)(Decoded instructions are stored in an issue queue prior to execution.).
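For illustration only, the decoding and queuing described above can be sketched as follows. The 32-bit field layout, the names `ConversionFields` and `decode_conversion`, and the example instruction word are assumptions made for this sketch; they are not taken from Liu, Hinds, or Lutz.

```python
from collections import deque
from typing import NamedTuple


class ConversionFields(NamedTuple):
    opcode: int
    src: int      # source operand field
    dst: int      # destination operand field
    point: int    # decimal point position field


def decode_conversion(word: int) -> ConversionFields:
    """Split a hypothetical 32-bit conversion instruction into its fields.

    Assumed layout (illustrative only): bits 31-24 opcode, 23-16 source,
    15-8 destination, 7-0 decimal point position.
    """
    return ConversionFields(
        opcode=(word >> 24) & 0xFF,
        src=(word >> 16) & 0xFF,
        dst=(word >> 8) & 0xFF,
        point=word & 0xFF,
    )


# Decoded instructions wait in a first-in, first-out queue and are
# issued in the order in which they were decoded.
issue_queue = deque()
issue_queue.append(decode_conversion(0x12030405))
next_to_issue = issue_queue.popleft()
```

The first-in, first-out queue is one simple way to model instructions that are "to be executed in a sequence"; an actual issue queue may reorder independent instructions.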
As per claim 12:
Liu, Hinds, and Lutz disclosed the computation device of claim 4.
Liu, Hinds, and Lutz failed to teach wherein the operation unit includes a tree module, wherein: the tree module includes a root port coupled with the primary processing circuit and a plurality of branch ports coupled with the plurality of secondary processing circuits, and the tree module is configured to forward data and the plurality of operation instructions transmitted between the primary processing circuit and the plurality of secondary processing circuits, wherein the tree module is an n-tree structure, the n being an integer greater than or equal to two.
However, Barman combined with Liu, Hinds, and Lutz disclosed wherein the operation unit includes a tree module, wherein: 
the tree module includes a root port coupled with the primary processing circuit and a plurality of branch ports coupled with the plurality of secondary processing circuits (Barman: Figure 15 element 1502-1506 and 1520-1522, column 1 lines 27-54 and column 8 lines 42-61)(Liu: Figure 8, section IV, table 1)(Liu disclosed complex matrix operations and a generic matrix functional unit. Barman disclosed specific pre-processing, systolic array elements, and post-processing circuits that make up a multiplication unit for matrix multiply-accumulate operations. The combination implements the multiplication unit of Barman into the matrix-functional unit of Liu. Official notice is given that processor arrays can be implemented using tree topologies for the advantage of reducing connection costs between processing elements. Thus, it would have been obvious to one of ordinary skill in the art to implement the pre-processing, systolic arrays, and post-processing in Liu using a tree topology. The pre-processing circuitry reads upon the root and a plurality of node connections exist between the pre-processing/post-processing circuits and the systolic array processing elements.), and 
the tree module is configured to forward data and the plurality of operation instructions transmitted between the primary processing circuit and the plurality of secondary processing circuits, wherein the tree module is an n-tree structure, the n being an integer greater than or equal to two (Barman: Figure 15 element 1502-1506 and 1520-1522, column 1 lines 27-54 and column 8 lines 42-61)(Liu: Figure 8, section IV, table 1)(The combination implements the multiplication unit of Barman into the matrix-functional unit of Liu. In view of the above official notice, data output to the systolic array from the pre-processing circuitry filters through multiple nodes in a tree structure.).
Liu disclosed multiple matrix operations and a generic matrix functional unit, but doesn’t show a detailed implementation of how the matrix instructions are executed using specific processing circuitry. One of ordinary skill in the art would have been motivated by this lack of teaching to find the Barman reference that shows specific circuitry for executing matrix multiply-accumulation operations. Thus, it would have been obvious to one of ordinary skill in the art at the time of the effective filing date to implement the matrix multiply-accumulate circuitry of Barman into Liu for the advantage of showing specific circuitry to execute the instructions of the instruction set of Liu.
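For illustration only, forwarding data from a root port to a plurality of processing elements through an n-tree structure can be sketched as follows. The function name `tree_forward`, the grouping strategy, and the element labels are assumptions made for this sketch and do not appear in Barman or Liu.

```python
def tree_forward(payload, leaves, n=2):
    """Forward `payload` from the root to every leaf through an n-ary tree.

    Each tree node splits the leaves it is responsible for into at most
    n groups and hands one group to each child, until every group holds
    a single leaf. Returns (deliveries, levels), where `deliveries` maps
    each leaf to the payload it received and `levels` is the number of
    forwarding levels between the root and the leaves.
    """
    if n < 2:
        raise ValueError("n must be an integer greater than or equal to two")
    groups = [list(leaves)]          # the root is responsible for all leaves
    levels = 0
    while any(len(group) > 1 for group in groups):
        next_groups = []
        for group in groups:
            if len(group) <= 1:
                next_groups.append(group)
                continue
            size = -(-len(group) // n)   # ceiling division: at most n children
            next_groups.extend(
                group[i:i + size] for i in range(0, len(group), size)
            )
        groups = next_groups
        levels += 1
    deliveries = {group[0]: payload for group in groups if group}
    return deliveries, levels


deliveries, levels = tree_forward("block", ["pe0", "pe1", "pe2", "pe3"], n=2)
```

With four leaves and n equal to two, the payload passes through two forwarding levels; with n equal to four, a single level suffices.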
As per claim 13:
Liu, Hinds, and Lutz disclosed the computation device of claim 4.
Liu, Hinds, and Lutz failed to teach wherein the operation unit further includes a branch processing circuit, wherein: the primary processing circuit is configured to: determine that the input neurons are broadcast data and the weights are distribution data, divide the distribution data into a plurality of data blocks, and transmit at least one of the plurality of data blocks, the broadcast data, and at least one of the plurality of operation instructions to the branch processing circuits, the branch processing circuit is configured to forward the data blocks, the broadcast data, and the plurality of operation instructions transmitted between the primary processing circuit and the plurality of secondary processing circuits, the plurality of secondary processing circuits is configured to perform operations on the data blocks received and the broadcast data received according to the plurality of operation instructions to obtain a plurality of intermediate results, and to transmit the plurality of intermediate results to the plurality of branch processing circuits, and the primary processing circuit is further configured to perform post-processing on the plurality of intermediate results received from the branch processing circuits to obtain a computation result of the computation instruction, and to send the computation result of the computation instruction to the controller unit.
However, Barman combined with Liu, Hinds, and Lutz disclosed wherein the operation unit further includes a branch processing circuit, wherein: 
the primary processing circuit is configured to: 
determine that the input neurons are broadcast data and the weights are distribution data (Barman: Figures 6 and 15 element 1502-1506 and 1520-1522, column 1 lines 27-54 and column 8 lines 42-61)(Liu: Figure 8, sections III and IV, table 1)(The combination implements the multiplication unit of Barman into the matrix-functional unit of Liu. Liu disclosed matrix MAC operations using input neurons and weights to calculate output neurons. The prepared matrix data of input neurons and weights is broadcasted and distributed to different parts of the systolic array.),
divide the distribution data into a plurality of data blocks (Barman: Figures 6 and 15 element 1502-1506 and 1520-1522, column 1 lines 27-54 and column 8 lines 42-61)(Liu: Figure 8, section IV, table 1)(The combination implements the multiplication unit of Barman into the matrix-functional unit of Liu. The pre-processing circuitry prepares matrix data and transmits it to the systolic array for execution of a given matrix instruction. The prepared matrix data is divided into portions to output to different parts of the systolic array.), and
transmit at least one of the plurality of data blocks, the broadcast data, and at least one of the plurality of operation instructions to the branch processing circuits (Barman: Figure 15 element 1502-1506 and 1520-1522, column 1 lines 27-54 and column 8 lines 42-61)(Liu: Figure 8, section IV, table 1)(The combination implements the multiplication unit of Barman into the matrix-functional unit of Liu. The pre-processing circuitry prepares matrix data and transmits it to the systolic array for execution of a given matrix instruction. Official notice is given that processor arrays can be implemented using tree topologies for the advantage of reducing connection costs between processing elements. Thus, it would have been obvious to one of ordinary skill in the art to implement the pre-processing, systolic arrays, and post-processing in Liu using a tree topology. In view of the official notice, the data output from the pre-processing circuitry is sent to the node connections (i.e. branch processing circuitry).),
the branch processing circuit is configured to forward the data blocks, the broadcast data, and the plurality of operation instructions transmitted between the primary processing circuit and the plurality of secondary processing circuits (Barman: Figure 15 element 1502-1506 and 1520-1522, column 1 lines 27-54 and column 8 lines 42-61)(Liu: Figure 8, section IV, table 1)(The combination implements the multiplication unit of Barman into the matrix-functional unit of Liu. In view of the above official notice, a plurality of node connections (i.e. branch processing circuitry) exist between the pre-processing/post-processing circuits and the systolic array processing elements that receive and forward data.),
the plurality of secondary processing circuits is configured to perform operations on the data blocks received and the broadcast data received according to the plurality of operation instructions to obtain a plurality of intermediate results, and to transmit the plurality of intermediate results to the plurality of branch processing circuits (Barman: Figure 15 element 1502-1506 and 1520-1522, column 1 lines 55-67 continued to column 2 lines 1-11 and column 8 lines 42-61)(Liu: Figure 8, section IV, table 1)(The combination implements the multiplication unit of Barman into the matrix-functional unit of Liu. The systolic array elements (i.e. secondary processing circuits) execute the matrix multiply-accumulate instruction and produce intermediate results that are sent to the post-processing circuitry.), and 
the primary processing circuit is further configured to perform post-processing on the plurality of intermediate results received from the branch processing circuits to obtain a computation result of the computation instruction, and to send the computation result of the computation instruction to the controller unit (Barman: Figure 15 element 1502-1506 and 1520-1522, column 1 lines 27-54 and column 8 lines 42-61)(Liu: Figure 8, section IV, table 1)(Liu disclosed complex matrix operations and a generic matrix functional unit. Barman disclosed specific pre-processing, systolic array elements, and post-processing circuits that make up a multiplication unit for matrix multiply-accumulate operations. The combination implements the multiplication unit of Barman into the matrix-functional unit of Liu. The post-processing circuitry receives systolic array outputs and generates a final matrix result of the matrix multiply-accumulate operation.).
Liu disclosed multiple matrix operations and a generic matrix functional unit, but doesn’t show a detailed implementation of how the matrix instructions are executed using specific processing circuitry. One of ordinary skill in the art would have been motivated by this lack of teaching to find the Barman reference that shows specific circuitry for executing matrix multiply-accumulation operations. Thus, it would have been obvious to one of ordinary skill in the art at the time of the effective filing date to implement the matrix multiply-accumulate circuitry of Barman into Liu for the advantage of showing specific circuitry to execute the instructions of the instruction set of Liu.
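For illustration only, the division of labor mapped above (input neurons as broadcast data, weights as distribution data divided into data blocks, intermediate results combined by post-processing) can be sketched as a matrix-vector product. The function name `primary_secondary_matvec` and the row-wise blocking are assumptions made for this sketch; they are not the circuitry of Barman or Liu.

```python
def primary_secondary_matvec(weights, neurons, num_secondary):
    """Compute weights @ neurons using a primary/secondary split.

    weights: list of rows (the distribution data)
    neurons: list of input neuron values (the broadcast data)
    num_secondary: number of secondary processing circuits (>= 1)
    """
    if num_secondary < 1:
        raise ValueError("at least one secondary circuit is required")
    if not weights:
        return []

    # Primary circuit: divide the distribution data into row blocks.
    block = -(-len(weights) // num_secondary)   # ceiling division
    blocks = [weights[i:i + block] for i in range(0, len(weights), block)]

    # Secondary circuits: each receives one block plus the full broadcast
    # vector and produces intermediate results by multiply-accumulate.
    intermediate = [
        [sum(w * x for w, x in zip(row, neurons)) for row in blk]
        for blk in blocks
    ]

    # Primary circuit post-processing: concatenate the intermediate
    # results in block order to form the final output vector.
    return [value for part in intermediate for value in part]


outputs = primary_secondary_matvec([[1, 2], [3, 4], [5, 6]], [1, 1], 2)
```

Here the post-processing is a simple in-order concatenation; other post-processing, such as accumulation or activation, would be applied at the same point.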
As per claim 15:
Liu, Hinds, Lutz, and Barman disclosed the computation device of claim 12, wherein: 
the primary processing circuit is configured to perform a combined ranking processing on the plurality of intermediate results received from the plurality of processing circuits to obtain a computation result of the computation instruction (Barman: Figure 15 element 1502-1506 and 1520-1522, column 1 lines 27-54 and column 8 lines 42-61)(Liu: Figure 8, section IV, table 1)(Liu disclosed complex matrix operations and a generic matrix functional unit. Barman disclosed specific pre-processing, systolic array elements, and post-processing circuits that make up a multiplication unit for matrix multiply-accumulate operations. The combination implements the multiplication unit of Barman into the matrix-functional unit of Liu. The post-processing circuitry receives systolic array outputs and generates a final matrix result of the matrix multiply-accumulate operation. The received results are ranked based on column data being processed.), or 
the primary processing circuit is configured to perform a combined ranking processing and an activation processing on the plurality of intermediate results received from the plurality of processing circuits to obtain the computation result of the computation instruction.
As per claim 16:
Liu, Hinds, Lutz, and Barman disclosed the computation device of claim 12, wherein the primary processing circuit includes one or any combination of an activation processing circuit and an addition processing circuit, wherein: 
the activation processing circuit is configured to perform an activation operation on data in the primary processing circuit (Barman: Figure 15 element 1502-1506 and 1520-1522, column 1 lines 27-54 and column 8 lines 42-61)(Liu: Figure 8, section IV, table 1)(Liu disclosed complex matrix operations and a generic matrix functional unit. Barman disclosed specific pre-processing, systolic array elements, and post-processing circuits that make up a multiplication unit for matrix multiply-accumulate operations. The combination implements the multiplication unit of Barman into the matrix-functional unit of Liu. The pre-processing circuitry prepares matrix data (i.e. activation operation) and transmits it to the systolic array for execution of a given matrix instruction.), and 
the addition processing circuit is configured to perform an addition operation or an accumulation operation, 
the plurality of secondary processing circuit includes: 
a multiplication processing circuit configured to perform a multiplication operation on the data blocks received to obtain a product result (Barman: Figures 2, 4, and 15 element 1508, column 3 lines 58-67 continued to column 4 lines 1-3 and column 8 lines 42-61)(Liu: Figure 8, section IV, table 1)(The combination implements the multiplication unit of Barman into the matrix-functional unit of Liu. The systolic array elements execute the matrix multiply-accumulate instruction using multiply-accumulate circuitry.), and 
an accumulation processing circuit configured to perform an accumulation operation on the product results to obtain the plurality of intermediate results (Barman: Figures 2, 4, and 15 element 1508, column 3 lines 58-67 continued to column 4 lines 1-3 and column 8 lines 42-61)(Liu: Figure 8, section IV, table 1)(The combination implements the multiplication unit of Barman into the matrix-functional unit of Liu. The systolic array elements execute the matrix multiply-accumulate instruction using multiply-accumulate circuitry.).

Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Liu et al. (“Cambricon: An Instruction Set Architecture for Neural Networks” 2016 IEEE), in view of Hinds (U.S. 2007/0220076), in view of Lutz et al. (U.S. 2016/0124710), in view of Barman et al. (U.S. 8,924,455), in view of Official Notice, further in view of Leibholz (U.S. 2002/0138714).
As per claim 9:
Liu, Hinds, Lutz, and Barman disclosed the computation device of claim 8.
Liu, Hinds, Lutz, and Barman failed to teach wherein the controller unit further includes: a dependency relationship processing unit configured to: determine whether there exists an associated relationship between a first operation instruction and a zeroth operation instruction before the first operation instruction, cache the first operation instruction in the instruction cache unit based on a determination that there exists an associated relationship between the first operation instruction and the zeroth operation instruction, and extract the first operation instruction from the instruction cache unit to the operation unit, when an execution of the zeroth operation instruction is completed, wherein determining whether there exists an associated relationship between a first operation instruction and a zeroth operation instruction before the first operation instruction by the dependency relationship processing unit includes: extracting a first storage address interval of data required in the first operation instruction according to the first operation instruction, extracting a zeroth storage address interval of data required in the zeroth operation instruction according to the zeroth operation instruction, determining that there exists an associated relationship between the first operation instruction and the zeroth operation instruction, when an overlapped region exists between the first storage address interval and the zeroth storage address interval, and determining that there does not exist an associated relationship between the first operation instruction and the zeroth operation instruction when no overlapped region exists between the first storage address interval and the zeroth storage address interval.
However, Leibholz combined with Liu, Hinds, Lutz, and Barman disclosed wherein the controller unit further includes: 
a dependency relationship processing unit configured to: 
determine whether there exists an associated relationship between a first operation instruction and a zeroth operation instruction before the first operation instruction (Leibholz: Figure 1 element 105, paragraphs 33 and 40)(Liu: Figure 8)(Leibholz disclosed a dependency checker to detect dependent operations. The combination adds the dependency checker to the accelerator of Liu.), 
cache the first operation instruction in the instruction cache unit based on a determination that there exists an associated relationship between the first operation instruction and the zeroth operation instruction (Leibholz: Figure 1 element 105, paragraphs 33 and 40)(Liu: Figure 8)(Leibholz disclosed a dependency checker to detect dependent operations. The combination adds the dependency checker to the accelerator of Liu. Official notice is given that dependent instructions can be fetched into an instruction cache multiple times through program execution for the advantage of faster fetching times. Thus, it would have been obvious to one of ordinary skill in the art that detected instructions with dependencies are later added to an instruction cache.), and 
extract the first operation instruction from the instruction cache unit to the operation unit, when an execution of the zeroth operation instruction is completed (Leibholz: Figure 1 element 105, paragraphs 33 and 40)(Liu: Figure 8, sections II and IV)(Liu disclosed using an on-chip scratchpad in place of a vector register file. Leibholz disclosed a dependency checker that compares source and destination registers to detect dependencies. The combination implements a dependency check in Liu by comparing vector input and vector output addresses of the scratchpad. The combination issues dependent instructions to the vector/matrix units when all source data is ready.),
wherein determining whether there exists an associated relationship between a first operation instruction and a zeroth operation instruction before the first operation instruction by the dependency relationship processing unit includes: 
extracting a first storage address interval of data required in the first operation instruction according to the first operation instruction (Leibholz: Figure 1 element 105, paragraphs 33 and 40)(Liu: Figure 8, sections II and IV)(Liu disclosed using an on-chip scratchpad in place of a vector register file. Leibholz disclosed a dependency checker that compares source and destination registers to detect dependencies. The combination implements a dependency check in Liu by comparing vector input and vector output addresses of the scratchpad.), 
extracting a zeroth storage address interval of data required in the zeroth operation instruction according to the zeroth operation instruction (Leibholz: Figure 1 element 105, paragraphs 33 and 40)(Liu: Figure 8, sections II and IV)(Liu disclosed using an on-chip scratchpad in place of a vector register file. Leibholz disclosed a dependency checker that compares source and destination registers to detect dependencies. The combination implements a dependency check in Liu by comparing vector input and vector output addresses of the scratchpad.),
determining that there exists an associated relationship between the first operation instruction and the zeroth operation instruction, when an overlapped region exists between the first storage address interval and the zeroth storage address interval (Leibholz: Figure 1 element 105, paragraphs 33 and 40)(Liu: Figure 8, sections II and IV)(Liu disclosed using an on-chip scratchpad in place of a vector register file. Leibholz disclosed a dependency checker that compares source and destination registers to detect dependencies. The combination implements a dependency check in Liu by comparing vector input and vector output addresses of the scratchpad. A match or overlap of the compared addresses determines a dependency exists.), and 
determining that there does not exist an associated relationship between the first operation instruction and the zeroth operation instruction when no overlapped region exists between the first storage address interval and the zeroth storage address interval (Leibholz: Figure 1 element 105, paragraphs 33 and 40)(Liu: Figure 8, sections II and IV)(Liu disclosed using an on-chip scratchpad in place of a vector register file. Leibholz disclosed a dependency checker that compares source and destination registers to detect dependencies. The combination implements a dependency check in Liu by comparing vector input and vector output addresses of the scratchpad. A lack of a match or overlap of the compared addresses determines a dependency doesn’t exist.).
The advantage of dependency checking is that instructions dependent upon certain data are stalled so that the correct data is accessed for execution. Thus, it would have been obvious to one of ordinary skill in the art at the time of the effective filing date to implement the dependency checker of Leibholz into the accelerator of Liu to ensure correct program execution for dependent operations.
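For illustration only, the address-interval comparison discussed above can be sketched as follows. The function names, the half-open interval convention, and the example addresses are assumptions made for this sketch; they are not taken from Leibholz or Liu.

```python
def intervals_overlap(first, zeroth):
    """Return True when two half-open address intervals [start, end) overlap."""
    first_start, first_end = first
    zeroth_start, zeroth_end = zeroth
    return first_start < zeroth_end and zeroth_start < first_end


def must_wait(first, zeroth, zeroth_completed):
    """Decide whether the first (later) instruction must be held back.

    The later instruction is held only while an overlap exists and the
    earlier (zeroth) instruction has not yet completed execution.
    """
    return intervals_overlap(first, zeroth) and not zeroth_completed


# Scratchpad addresses 0-15 versus 8-23 overlap, so a dependency exists.
dependent = intervals_overlap((0, 16), (8, 24))
# Addresses 0-15 versus 16-31 do not overlap, so no dependency exists.
independent = not intervals_overlap((0, 16), (16, 32))
```

Under the half-open convention, two intervals that merely touch at an endpoint are treated as non-overlapping.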

Claim 18 is rejected under 35 U.S.C. 103 as being unpatentable over Liu et al. (“Cambricon: An Instruction Set Architecture for Neural Networks” 2016 IEEE), in view of Hinds (U.S. 2007/0220076), in view of Official Notice.
As per claim 18:
Claim 18 essentially recites the same limitations of claim 1. Claim 18 additionally recites the following limitations:
wherein the second input data is fixed-point data (Liu: Figure 8, section IV)(Hinds: Figure 7, paragraphs 15, 64-65, and 113-115)(The combination implements the floating-point conversion instructions of Hinds as vector operations in Liu using the scratchpad memory for operand storage. Official notice is given that vector compress and expand instructions can be used to enlarge or shrink data element sizes for the advantage of quickly transforming data elements to needed sizes. Thus, it would have been obvious to one of ordinary skill in the art to implement vector compress and expand instructions in Liu using fixed-point elements as in Hinds.).
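For illustration only, conversion to fixed-point data using a decimal point position, and conversion between two fixed-point formats whose decimal point positions differ, can be sketched as follows. The function names, the 16-bit width, and the saturating behavior are assumptions made for this sketch; they are not the instruction semantics of Hinds.

```python
def float_to_fixed(value, point, bits=16):
    """Convert a float to a signed fixed-point integer.

    `point` is the number of fractional bits (the decimal point position).
    Results outside the signed `bits`-wide range are saturated.
    """
    scaled = round(value * (1 << point))
    lowest, highest = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lowest, min(highest, scaled))


def fixed_to_float(raw, point):
    """Interpret a fixed-point integer with `point` fractional bits."""
    return raw / (1 << point)


def change_point(raw, src_point, dst_point):
    """Move the decimal point of a fixed-point value.

    Shifting left adds fractional bits; shifting right drops them
    (an arithmetic shift, which rounds toward negative infinity).
    """
    shift = dst_point - src_point
    return raw << shift if shift >= 0 else raw >> -shift


raw_q8 = float_to_fixed(1.5, point=8)      # 1.5 with 8 fractional bits
raw_q4 = change_point(raw_q8, 8, 4)        # same value with 4 fractional bits
```

The same numeric value is thus represented by different raw integers depending on the decimal point position, which is why that position must accompany the data.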

Claims 17 and 19-24 are rejected under 35 U.S.C. 103 as being unpatentable over Liu et al. (“Cambricon: An Instruction Set Architecture for Neural Networks” 2016 IEEE), in view of Hinds (U.S. 2007/0220076), in view of Official Notice, further in view of Lutz et al. (U.S. 2016/0124710).
As per claim 17:
Claim 17 essentially recites the same limitations of claim 4. Claim 17 additionally recites the following limitations:
when the machine learning operation device includes a plurality of the computation devices, the plurality of computation devices is configured to couple and transmit data with each other through a specific structure (Liu: Figure 8, section IV)(Official notice is given that homogeneous processing nodes can be implemented for the advantage of executing larger sets of data at once. Thus, it would have been obvious to one of ordinary skill in the art to implement a plurality of accelerator nodes connected together by I/O interfaces for the above advantage.), and 
the plurality of computation devices is configured to: 
interconnect and transmit data through a fast external device interconnection PCIE (peripheral component interface express) bus to support larger-scale machine learning computations (Liu: Figure 8, section IV)(Official notice is given that processing nodes can be connected together by PCIE buses for the advantage of faster data connections. Thus, it would have been obvious to one of ordinary skill in the art to implement PCIE buses to connect together a plurality of accelerator nodes.), 
share the same one control system or have respective control systems (Liu: Figure 8, section IV)(In view of the above official notice, each accelerator has its own control system.), 
share the same one memory or have respective memories (Liu: Figure 8, section IV)(In view of the above official notice, each accelerator has its own memory.), and 
deploy an interconnection manner of any arbitrary interconnection topology (Liu: Figure 8, section IV)(In view of the above official notice, each accelerator is connected with other accelerators via I/O interfaces.).
As per claim 19:
The additional limitation(s) of claim 19 basically recite the additional limitation(s) of claim 2. Therefore, claim 19 is rejected for the same reason(s) as claim 2.
As per claim 20:
The additional limitation(s) of claim 20 basically recite the additional limitation(s) of claim 3. Therefore, claim 20 is rejected for the same reason(s) as claim 3.
As per claim 21:
The additional limitation(s) of claim 21 basically recite the additional limitation(s) of claim 4. Therefore, claim 21 is rejected for the same reason(s) as claim 4.
As per claim 22:
The additional limitation(s) of claim 22 basically recite the additional limitation(s) of claim 5. Therefore, claim 22 is rejected for the same reason(s) as claim 5.
As per claim 23:
Liu, Hinds, and Lutz disclosed the method of claim 20, wherein when the first input data and the second input data are both fixed-point data, the decimal point position of the first input data is inconsistent with that of the second input data (Liu: Figure 8, section IV)(Hinds: Figure 7, paragraphs 15, 64-65, and 113-115)(The combination implements the floating-point conversion instructions of Hinds as vector operations in Liu using the scratchpad memory for operand storage. In view of the above official notice, both the source and destination data elements are fixed-point elements and the decimal location is inconsistent with the newly expanded/compressed locations.).
As per claim 24:
The additional limitation(s) of claim 24 basically recite the additional limitation(s) of claim 10. Therefore, claim 24 is rejected for the same reason(s) as claim 10.

Response to Arguments
The arguments presented by Applicant in the response received on 3/21/2022 are not considered persuasive.
Applicant argues for claim 1:
“The format conversion instruction disclosed by Hinds is different than the recited fixed-point format operation instruction. Hind's format conversion instruction includes operand Fm, which is the location of the decimal point, and operand Fd, which specified the source and destination register. In MIPS architecture, the destination register is used to store the result of the operation, i.e., the result of the format conversion. 
In contrast, the recited fixed-point format operation instruction "comprises a first address of first input data, a first address of output data, and a decimal point position." The output data is NOT the conversion result. The first address of the output data is used to store the output data of the operation instruction. The result generated by the conversion unit is the second input data, which, along with the opcode, is transmitted to the operation unit.”

This argument is not found to be persuasive for the following reason. The applicant is correct that Hinds’ format conversion instructions specify source and destination operands. The applicant is also correct that the destination register stores a result of a given operation. However, the destination register field within the conversion instructions specifies a register number (i.e. address) within a register file where the result is to be stored. Thus, the destination register field reads upon the claimed limitation.
Applicant argues for claim 1:
“Furthermore, neither Liu nor Hinds teaches or suggests the recited controller unit. The recited controller unit parses a fixed-point format operation instruction to obtain an opcode, the (first) address of the input data, the (first) address of the output data and a decimal point position. The controller unit then retrieves the input data and sends the input data along with the decimal point position to the conversion unit. Upon receiving the result from the conversion unit, i.e., the second input data, the controller unit transmits the opcode and the second input date to an operation unit.”
 
This argument is not found to be persuasive for the following reason. Liu disclosed a decoder for decoding operations. The added fixed-point conversion instructions of Hinds are processed by the decoder in Liu. Decoding the fixed-point conversion instruction obtains the opcode, source/destination registers, and the location of the decimal point for the fixed-point conversion instruction. Thus, the combination reads upon the claimed limitation.
Applicant argues for claim 1:
“The examiner's position seems to be that because Hinds teaches format conversion, which reads on the recited conversion unit, it would be obvious for a person skilled in the art to modify Liu's instruction sets to call on Hinds' format conversion instructions. However, the examiner has failed to establish a prima facie case of obviousness. The examiner did not provide a reasoned explanation as how the two references can even be combined. The examiner also has not provided any rationale as why a person skilled in the art would be motivated to combine the two references.”  

This argument is not found to be persuasive for the following reason. In response to applicant’s argument that there is no teaching, suggestion, or motivation to combine the references, the examiner recognizes that obviousness may be established by combining or modifying the teachings of the prior art to produce the claimed invention where there is some teaching, suggestion, or motivation to do so found either in the references themselves or in the knowledge generally available to one of ordinary skill in the art.  See In re Fine, 837 F.2d 1071, 5 USPQ2d 1596 (Fed. Cir. 1988), In re Jones, 958 F.2d 347, 21 USPQ2d 1941 (Fed. Cir. 1992), and KSR International Co. v. Teleflex, Inc., 550 U.S. 398, 82 USPQ2d 1385 (2007).  In this case, the advantages of implementing fixed-point data formats were given within the Hinds reference itself.
Applicant argues for claim 1:
“As shown above, Liu's MMV instruction takes a matrix and a vector as input data. Each register used in the MMV instruction specifies a scratchpad memory address that stores the vectors and matrices involved in the multiplication. For example, RegO specifies the base scratchpad memory address of the vector output. See Liu, page 396, col 2, line 1- 9. 
On the other hand, Hinds' format conversion instructions use a source register and a destination register to hold the input floating-point/fixed-point data and the converted floating- point/fixed-point data. It is baffling as how and why a person skilled in the art would or could fit Hinds' format conversion instructions into Liu's ISA. 
In short, there is no known technique or simple approach to incorporate Hinds' apparatus into Liu's novel instruction set. There is no teaching, suggestions, or motivation in the prior art that would have led a person skilled in the art to modify Liu or combine Lin with Hinds to arrive at the claimed device of claim 1. Claim 1 is not obvious over Liu in view of Hinds.”

This argument is not found to be persuasive for the following reason. Applicant is correct that Hinds alone references scalar registers while Liu references scratchpad memory addresses holding vectors and matrices. Liu in figure 2 shows an example format of a vector load operation, where the destination address is encoded as register 0. This allows for the on-chip scratchpad memory to be directly addressed by an encoded field within the vector operation. The combination would allow for vector format conversion instructions to be encoded in a similar way such that the source and destination addresses are encoded directly by the instruction (e.g. 6 bits pointing to a location within the scratchpad). Thus, the combination reads upon the claimed limitations.

Conclusion
The following is text cited from 37 CFR 1.111(c): In amending in reply to a rejection of claims in an application or patent under reexamination, the applicant or patent owner must clearly point out the patentable novelty which he or she thinks the claims present in view of the state of the art disclosed by the references cited or the objections made. The applicant or patent owner must also show how the amendments avoid such references or objections.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JACOB A. PETRANEK whose telephone number is (571)272-5988.  The examiner can normally be reached on M-F 8:00-4:30.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Aimee Li can be reached on (571) 272-4169.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/JACOB PETRANEK/Primary Examiner, Art Unit 2183