DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments
In response to applicant’s argument regarding 35 U.S.C. 103, pages 17-18, “Neither Mills, Langhammer, nor Divakar teach or suggest multiplier-accumulator circuits of the pipeline connected in series to form a ring architecture. Briefly, the application, in Figure 2C, illustrate an exemplary embodiment of a ring architecture (see “rotate current Y”) wherein “the output of the accumulator of a first MAC circuit (e.g. MAC 1) is input into the accumulator of a second MAC circuit (e.g. MAC 2) and the output of a third MAC circuit (e.g. MAC n) is input into the accumulator of the first MAC circuit (MAC 1). That is, the MACs of the pipeline are connected in series to form a ring – i.e., a closed loop.”
Examiner respectfully disagrees because it is noted that the features upon which applicant relies (i.e., figure 2C … MAC n is input into the accumulator of the first MAC circuit (MAC1)) are not recited in the rejected claim(s).  Although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims.  See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993). The claim merely recites the plurality of multiplier-accumulator circuits … connected in series to form a ring architecture.
Applicant further asserted on pages 19-20, “the MACs of the specialized processing blocks 600A-600D do not form a ring architecture – i.e. a closed loop … however, the output of adder 604 is not input into the processing blocks 600A-600C… adder 604 of processing block 600D to post-process that final dot product by accumulating the final dot product – neither (i) forms a ring architecture of the MACs NOR (ii) transforms the cascade architecture into a ring architecture”.
Examiner respectfully disagrees because the claim merely recites the plurality of multiplier-accumulator circuits … connected in series to form a ring architecture. Langhammer teaches this limitation as illustrated in figure 6: a plurality of processing blocks 600 are connected in series and form a ring architecture. For example, MN+OP of 600D is fed into 600C to generate IJ+KL+MN+OP, which is fed into 600B to generate AB+CD+EF+GH+IJ+KL+MN+OP, which is fed back into 600D. Such a connection forms a closed loop [i.e. a ring architecture].
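For illustration only, the closed-loop connectivity described in the passage quoted from applicant's remarks (the accumulator output of each MAC feeding the accumulator of the next, and the last MAC feeding the first) can be modeled with the short sketch below. The function name ring_mac_step and the sample values are hypothetical; the sketch is not taken from the application or from any cited reference and does not bear on claim interpretation.

```python
def ring_mac_step(acc, data, weights):
    """One cycle of a hypothetical ring of MACs.

    MAC i adds its own product to the accumulator value arriving from
    MAC i-1. MAC 0 receives from the last MAC, which closes the loop.
    """
    n = len(acc)
    return [acc[(i - 1) % n] + data[i] * weights[i] for i in range(n)]


# Three MACs, each holding one data value and one weight (sample values).
data = [1.0, 2.0, 3.0]
weights = [0.5, 0.5, 0.5]
acc = [0.0, 0.0, 0.0]

# After n cycles, every accumulator has visited every MAC once.
for _ in range(len(acc)):
    acc = ring_mac_step(acc, data, weights)

print(acc)  # [3.0, 3.0, 3.0]
```

In this model each partial sum rotates through every MAC exactly once per n cycles, so each accumulator ends with the full sum of products.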

In response to applicant’s argument regarding 35 U.S.C. 103, page 21, “Neither Mills, Langhammer nor Divakar teach or suggest an accumulator of the multiplier accumulator circuits of the pipeline generating floating point data format including a precision which is fixed and non-configurable after IC manufacture. … for example, Langhammer teaches a floating point adder circuit wherein the precision is programmable after IC manufacture, based on, for example, application (Langhammer 0026, 0033, and 0046)… (Divakar 0028, 0038, 0044, and 0050-0053).”
Examiner respectfully disagrees because, for example, Langhammer [0033] states that processing circuitry 120 may be adaptable to efficiently implement … different precision, and [0046] also uses the term “may operate”. Accordingly, processing circuitry 120 does not necessarily perform different operations, nor is it necessarily programmable to perform different operations. In fact, paragraph [0113] describes that the method and apparatus described may be incorporated into other integrated circuits, such as application specific integrated circuits (ASICs). One of ordinary skill in the art would recognize that an ASIC is designed and manufactured for one specific application and cannot be reprogrammed or modified after it is produced; thus, for an ASIC, the precision is fixed and non-configurable.
Applicant further asserted on pages 21-22, “there is no reason for one skilled in the art to modify Mills, Langhammer, and Divakar to implement an accumulator … generating sum data, having the floating point data format including a precision which is fixed and non-configurable after manufacture of the integrated circuit.”
Examiner respectfully disagrees because incorporation into an application specific integrated circuit (ASIC) would achieve a higher level of efficiency and performance, since an ASIC would have the exact number of components/gates for the intended application, thereby eliminating wasted energy and space.

Claim Objections
Claims 4, 9-14 are objected to because of the following informalities:
Claim 4, line 2 recites "the precision of the floating point…", which should be "the second precision of the floating point…" as antecedently recited in claim 1.
Claim 9, line 5: "the first input data" should be "first input data" because there is a lack of antecedent basis.
Claim 9, line 6: "the floating point data format" should be "a floating point data format" because there is a lack of antecedent basis.
Claim 14, line 12 recites “multiply the first input data, having a floating point data format including the first precision”, which should be “multiply the first input data, having the floating point data format including the first precision” as antecedently recited.
Dependent claims are also objected to for inheriting the same deficiencies of the claims on which they depend.
Appropriate correction is required.

Claim Rejections - 35 USC § 112(b)
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.



Claim 5 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim 5, line 8 recites “to convert the filter weight data to the filter weight data having the floating point data format including the first precision”. It is unclear whether “the filter weight data” has the first precision, as antecedently recited in claim 1, lines 6-7, or a different precision, because the claim recites converting the filter weight data (which should be in a different precision) to the filter weight data having … the first precision. For examination purposes, Examiner interprets the limitation as “to convert filter weight data having a first format to the filter weight data having the floating point data format including the first precision”.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 15-21 are addressed before claims 1-14.
Claims 15-20 are rejected under 35 U.S.C. 103 as being unpatentable over Mills (US 20190340489) in view of Langhammer (US 20180300105) and Divakar (US 20200097799).

Regarding claim 15, Mills teaches an integrated circuit (Mills, figure 3, neural processor circuit 218) comprising: first memory to store data (Mills, figure 3, [0049] data buffer 318 [i.e. first memory] is embodied as a memory that stores input data [i.e. data]); second memory to store filter weight data (Mills, figure 3, [0048] the kernel data [i.e. filter weight data] are fetched from system memory 230 [i.e. second memory]; [0022] the engine circuit performs neural network operations using kernel data in a fixed-point precision or a floating-point precision). Mills further teaches a plurality of neural engines that include a plurality of multiply-add units (see figure 4), wherein the multipliers are coupled to the memory (Mills, the MAD circuits are connected to memory). Mills does not teach first conversion circuitry, coupled to the first memory, to receive and convert the data from the first memory to first input data having a floating point data format including a first precision; second conversion circuitry, coupled to the second memory, to receive and convert the filter weight data from the second memory to filter weight data having the floating point data format including the first precision; a multiplier-accumulator execution pipeline, coupled to an output of the first conversion circuitry and an output of the second conversion circuitry, wherein the multiplier-accumulator execution pipeline includes a plurality of multiplier-accumulator circuits to, in operation, perform multiply and accumulate operations, wherein each multiplier-accumulator circuit includes: a multiplier to multiply the first input data, having the floating point data format including the first precision, by the filter weight data, having the floating point data format including the first precision, and generate product data having the floating point data format, and output the product data having the floating point data format including a second precision, and an accumulator, coupled to the multiplier of the associated multiplier-accumulator circuit to add second input data and the product data output by the associated multiplier to generate sum data having the floating point data format including the second precision, wherein the second precision of the sum data is fixed and non-configurable after manufacture of the integrated circuit; and wherein, the plurality of multiplier-accumulator circuits of the multiplier-accumulator execution pipeline are connected in series to form a ring architecture and, in operation, perform a plurality of concatenated multiply and accumulate operations.
However, Langhammer teaches a multiplier-accumulator execution pipeline (Langhammer, figure 6 illustrates 4 special processing blocks. Figure 2 illustrates the implementation of a special processing block that has a plurality of registers to hold data between the combinational circuits. [0095] describes that the special processing block in figure 2 may implement each of the special processing blocks in figure 6. In other words, there are registers to hold data, which are cascaded to the next processing blocks; thus figure 6 illustrates pipelining), wherein the multiplier-accumulator execution pipeline includes a plurality of multiplier-accumulator circuits to, in operation, perform multiply and accumulate operations (Langhammer, figure 6 illustrates at least 4 multiplier-accumulator circuits 600A-600D [i.e. a plurality of multiplier-accumulator circuits] to perform multiply and accumulate operations), wherein each multiplier-accumulator circuit (Langhammer, each of 600A-600D, for example 600A) includes: 
a multiplier to multiply the first input data, having the floating point data format including the first precision, by data, having the floating point data format including the first precision, and generate product data having the floating point data format including a second precision (Langhammer, figure 6, multipliers 601, 602, adder 603, and cast function 650 make up a multiplier that multiplies A and B and outputs a product of AB+CD as a second floating point number [i.e. product data having the floating point data format including a second precision]. [0046-0047] the multiplier circuit may operate on floating point numbers of a first floating-point precision [i.e. the floating point format including the first precision] and adder 204 (in this case would be 604) operates on a second floating point number precision [i.e. second precision]), and 
an accumulator, coupled to the multiplier of the associated multiplier-accumulator circuit (Langhammer, figure 6 adder 604 of 600A), to add second input data and the product data output by the associated multiplier to generate sum data having the floating point data format including the second precision, wherein the second precision of the sum data is fixed and non-configurable after manufacture of the integrated circuit (Langhammer, the adder 604 adds EF+GH [i.e. second input data] and the product AB+CD [i.e. the product data] to generate an output AB+CD+EF+GH [i.e. sum data]. [0046-0047] the multiplier circuit may operate on floating point numbers of a first floating-point precision [i.e. the floating point format including the first precision] and adder 204 (in this case would be 604) operates on a second floating point number precision [i.e. second precision]. [0113] the method and apparatus may be incorporated into other integrated circuits, such as an application specific integrated circuit (ASIC). An ASIC is designed and manufactured for one specific application and cannot be reprogrammed or modified after it is produced; thus for an ASIC, the precision is fixed and non-configurable); and wherein, the plurality of multiplier-accumulator circuits of the multiplier-accumulator execution pipeline are connected in series to form a ring architecture and, in operation, perform a plurality of concatenated multiply and accumulate operations (Langhammer, figure 6, the processing blocks 600A-600D are cascaded [i.e. connected in series] and the output of 600B is fed back into 600D via signal 679 [i.e. a ring architecture] and, as shown in signal 679, the partial products are being concatenated).
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to modify the MAC 404 of Mills in figure 4 to perform multiply and accumulate operations as disclosed in figure 6 of Langhammer and to incorporate it into an application specific integrated circuit (ASIC). This modification would have been obvious because both references disclose a system for performing multiply and accumulate operations for different data formats such as floating point and fixed-point, and as recognized by Langhammer, having a plurality of multiplier-accumulator execution pipelines would simplify routing and improve speed [0092]. Furthermore, pipelining would reduce the critical path because pipelining introduces latches on the data path, which allows higher clock frequencies or sampling rates to be used in the circuit. Furthermore, incorporation into an ASIC would achieve a higher level of efficiency and performance because an ASIC would have the exact number of components/gates for the intended application, thereby eliminating wasted energy and space.
As modified, the combined system of Mills in view of Langhammer discloses first memory and second memory to store data and filter weight data and a multiplier-accumulator execution pipeline to operate on different data formats. However, the combined system of Mills in view of Langhammer does not teach first conversion circuitry, coupled to the first memory, to receive and convert the data from the first memory to first input data having a first floating point data format; second conversion circuitry, coupled to the second memory, to receive and convert the filter weight data from the second memory to filter weight data format having the first floating point data format.
Divakar discloses a system for performing multiplication on different data types and teaches first conversion circuitry configured to receive and convert the data to first input data having a floating point data format including a first precision (Divakar, figure 4D, [0040] format conversion 485a converts operand A in a format (for example in figure 4B, one of the data types could be INT8) into a floating point format for use in floating point multiplier 310); second conversion circuitry, to receive and convert the data to the data format having the floating point data format including the first precision (Divakar, figure 4D, [0040] format conversion 485b converts operand B in a format (for example in figure 4B, one of the data types could be INT8) into a floating point format for use in floating point multiplier 310).
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to modify Mills’ system as shown in figure 3 to include format conversions 485a and 485b of Divakar to convert kernel data and input data before performing the multiply and accumulate operations. This modification would have been obvious because Mills discloses a system for performing multiply and accumulate operations that supports floating point numbers and fixed point numbers, and Divakar also discloses a system for performing multiply operations that includes conversion between fixed-point and floating point numbers. Furthermore, as recognized by Divakar [0028], having format conversions for different data types increases performance and power efficiency, and would optimize compute area on the chip and allow the design of improved neural networks that may be optimized based on the combination of data formats used as inputs in various layers.
As modified, the combined system of Mills in view of Langhammer and Divakar discloses a system that includes first memory and second memory to store data and filter weight data, and first conversion circuitry and second conversion circuitry to convert the data and the filter weight data into a first floating point format to be executed in a multiplier-accumulator execution pipeline.
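The data flow summarized above (convert stored data and filter weights to a first floating point precision, multiply, then accumulate at a second precision) can be pictured with the minimal sketch below. It is an illustrative assumption only: the helper names to_fp16, to_fp32, and mac_pipeline are hypothetical, 16-bit and 32-bit IEEE 754 formats are chosen merely as example first and second precisions, and nothing here is drawn from Mills, Langhammer, or Divakar.

```python
import struct


def to_fp16(x):
    """Round a value to IEEE 754 half precision (assumed first precision)."""
    return struct.unpack("<e", struct.pack("<e", float(x)))[0]


def to_fp32(x):
    """Round a value to IEEE 754 single precision (assumed second precision)."""
    return struct.unpack("<f", struct.pack("<f", float(x)))[0]


def mac_pipeline(int8_data, weights):
    """Convert both operands to fp16, multiply, and accumulate in fp32."""
    acc = 0.0
    for d, w in zip(int8_data, weights):
        a = to_fp16(d)             # stand-in for first conversion circuitry
        b = to_fp16(w)             # stand-in for second conversion circuitry
        product = to_fp32(a * b)   # product carried at the wider precision
        acc = to_fp32(acc + product)
    return acc


# Sample INT8 data and weights that are exactly representable in fp16.
result = mac_pipeline([3, -2, 5], [0.5, 0.25, 2.0])
print(result)  # 11.0
```

Every INT8 value is exactly representable in half precision, so the conversion step in this sketch loses no information for the data operand.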

Regarding claim 16, the combined system of Mills in view of Langhammer and Divakar discloses the integrated circuit of claim 15 wherein:
each multiplier-accumulator circuit of the plurality of multiplier-accumulator circuits is connected to two multiplier-accumulator circuits of the plurality of multiplier-accumulator circuits of the ring architecture (Langhammer, figure 6 [0094] the processing blocks are cascaded in a chain and form the ring architecture) including: 
connected to a first multiplier-accumulator circuit to receive the second input data, having the floating point data format including the second precision, therefrom (Langhammer, figure 6, 600C [i.e. first multiplier-accumulator circuit] receives MN+OP [i.e. second input data]. [0046-0047] the multiplier circuit may operate on floating point numbers of a first floating-point precision [i.e. the floating point format including the first precision] and adder 204 (in this case would be 604) operates on a second floating point number precision [i.e. second precision]), and connected to a second multiplier-accumulator circuit to output the sum data, having the floating point data format including the second precision, thereto (Langhammer, figure 6, 600B [i.e. second multiplier-accumulator circuit] outputs AB+CD+EF+GH+IJ+KL+MN+OP. [0046-0047] the multiplier circuit may operate on floating point numbers of a first floating-point precision [i.e. the floating point format including the first precision] and adder 204 (in this case would be 604) operates on a second floating point number precision [i.e. second precision]).

Regarding claim 17, the combined system of Mills in view of Langhammer and Divakar discloses the integrated circuit of claim 16 wherein: the second input data is the sum data, which is output by the accumulator of the connected first multiplier-accumulator circuit (Langhammer, figure 6, 600B receives AB+CD+EF+GH or IJ+KL+MN+OP [i.e. the second input data], which is the sum data output of 600A, AB+CD+EF+GH, or the output of 600C).

Regarding claim 18, the combined system of Mills in view of Langhammer and Divakar discloses the integrated circuit of claim 17 wherein: the second precision of the product data is fixed and non-configurable after manufacture of the integrated circuit (Langhammer, [0113] the method and apparatus may be incorporated into other integrated circuits, such as an application specific integrated circuit (ASIC). An ASIC is designed and manufactured for one specific application and cannot be reprogrammed or modified after it is produced; thus for an ASIC, the precision of the product is fixed and non-configurable).

Regarding claim 19, the combined system of Mills in view of Langhammer and Divakar discloses the integrated circuit of claim 15 wherein: the data stored in the first memory includes the floating point data format having a third precision (Mills, figures 3-4, data buffer stores floating point numbers [0083], MAC 404 receives floating point numbers), and the first conversion circuitry receives the data having the floating point data format including the third precision, and converts the data to the floating point data format having the first precision, wherein the third precision is a higher precision than the first precision (Divakar, figure 4D, format conversion 485a is provided to allow a given operand to be converted into any one of the multiple different data formats prior to being fed into a floating point multiplier; [0045] the floating point format may have a precision, such as 32 bits, and be converted to 16 bits via converter 485a).
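As a generic illustration of converting floating point data from a higher precision to a lower precision (for example 32 bits to 16 bits), the sketch below narrows a single-precision value to half precision and compares the rounding error of each. The function name narrow_fp32_to_fp16 and the sample value 0.1 are assumptions for illustration; the sketch does not reproduce the converter of any cited reference.

```python
import struct


def narrow_fp32_to_fp16(x):
    """Return x rounded to fp32, and that fp32 value narrowed to fp16."""
    as_fp32 = struct.unpack("<f", struct.pack("<f", x))[0]
    as_fp16 = struct.unpack("<e", struct.pack("<e", as_fp32))[0]
    return as_fp32, as_fp16


fp32_value, fp16_value = narrow_fp32_to_fp16(0.1)

# Distance of each representation from the original value.
error_fp32 = abs(fp32_value - 0.1)
error_fp16 = abs(fp16_value - 0.1)

print(error_fp16 > error_fp32)  # True: the narrower format rounds more coarsely
```

Values such as 1.5 that need only a few significand bits pass through both formats unchanged, so the narrowing is lossy only for values that need more bits than the 16-bit format provides.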

Regarding claim 20, the combined system of Mills in view of Langhammer and Divakar discloses the integrated circuit of claim 19 wherein: the first precision of the floating point data format is 16 bit and the second precision of the floating point format is 24 bit or 32 bit (Langhammer, [0046-0047] if the first floating point precision (performed by multiplier circuits) is half-precision, the second floating point precision (performed by adder 604) is double-precision).  

Claim 21 is rejected under 35 U.S.C. 103 as being unpatentable over Mills in view of Langhammer and Divakar as applied to claim 19 above, and further in view of Sano (US-20170315778).

Regarding claim 21, the combined system of Mills in view of Langhammer and Divakar discloses the integrated circuit of claim 19, but does not teach the first conversion circuitry and the second conversion circuitry each includes an adder. However, Sano teaches the first conversion circuitry and the second conversion circuitry each includes an adder (Sano, abstract, claim 1, conversion arithmetic circuit includes an adder to convert integer to floating point format or floating point format to integer).
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to substitute Divakar’s format conversion with the conversion circuit that includes an adder as disclosed in Sano. This modification would have been obvious because both Divakar and Sano disclose format conversion between integer or fixed-point and floating point, but Divakar does not disclose the structure of the format conversion, and Sano does. Furthermore, this modification would have been obvious because the substitution of one known element for another would have yielded a predictable result to one of ordinary skill in the art, namely performing data conversion. See MPEP 2141 III(B), simple substitution of one known element for another to obtain predictable results.

Claims 1-4 and 8-13 are rejected under 35 U.S.C. 103 as being unpatentable over Mills in view of Langhammer.
Claims 9-13 are addressed before claims 1-4 and 8.
Regarding claim 9, Mills teaches an integrated circuit (Mills, figure 3, neural processor circuit 218) comprising: a plurality of neural engines coupled to a first memory to store data (Mills, figure 3, system memory stores filter weight data and the buffer stores input data); however, Mills does not teach a multiplier-accumulator execution pipeline, coupled to first memory, including a plurality of multiplier-accumulator circuits to, in operation, perform multiply and accumulate operations, wherein each multiplier-accumulator circuit includes: a multiplier, coupled to the first memory, to multiply first input data, having a first floating point data format, by filter weight data, having the first floating point data format, and to generate and output product data having a second floating point data format, and an accumulator, coupled to the multiplier of the associated multiplier-accumulator circuit, to add second input data and the product data output by the associated multiplier to generate sum data; and wherein, the plurality of multiplier-accumulator circuits of the multiplier-accumulator execution pipeline are connected in series to form a ring architecture and, in operation, perform a plurality of concatenated multiply and accumulate operations.
However, Langhammer teaches a multiplier-accumulator execution pipeline (Langhammer, figure 6 illustrates 4 special processing blocks. Figure 2 illustrates the implementation of a special processing block that has a plurality of registers to hold data between the combinational circuits. [0095] describes that the special processing block in figure 2 may implement each of the special processing blocks in figure 6. In other words, there are registers to hold data, which are cascaded to the next processing blocks; thus figure 6 illustrates pipelining), including a plurality of multiplier-accumulator circuits to, in operation, perform multiply and accumulate operations (Langhammer, figure 6 illustrates at least 4 multiplier-accumulator circuits 600A-600D [i.e. a plurality of multiplier-accumulator circuits]), wherein each multiplier-accumulator circuit (Langhammer, each of 600A-600D, for example 600A) includes: 
a multiplier, to multiply the first input data, having the floating point data format, by the filter weight data, having the floating point data format including a first precision, and to generate product data having the floating point data format including a second precision and to output the product data having the floating point data format including the second precision (Langhammer, figure 6, multipliers 601, 602, adder 603, and cast function 650 make up a multiplier that multiplies A and B and outputs a product of AB+CD as a second floating point number [i.e. product data having the floating point data format including a second precision]. [0046-0047] the multiplier circuit may operate on floating point numbers of a first floating-point precision [i.e. the floating point format including a first precision] and adder 204 (in this case would be 604) operates on a second floating point number precision [i.e. a second precision]), and 
an accumulator, coupled to an output of the multiplier of the associated multiplier-accumulator circuit to receive the product data (Langhammer, figure 6 adder 604 of 600A), to add second input data and the product data to generate sum data, having the floating point data format including a third precision, wherein the third precision of the sum data is fixed and non-configurable after manufacture of the integrated circuit (Langhammer, the adder 604 adds EF+GH [i.e. second input data] and the product AB+CD [i.e. the product data] to generate an output AB+CD+EF+GH [i.e. sum data]. [0046-0047] the multiplier circuit may operate on floating point numbers of a first floating-point precision [i.e. the floating point format including the first precision] and adder 204 (in this case would be 604) operates on a second floating point number precision [i.e. third precision]. [0113] the method and apparatus may be incorporated into other integrated circuits, such as an application specific integrated circuit (ASIC). An ASIC is designed and manufactured for one specific application and cannot be reprogrammed or modified after it is produced; thus for an ASIC, the precision is fixed and non-configurable); and wherein, the plurality of multiplier-accumulator circuits of the multiplier-accumulator execution pipeline are connected in series to form a ring architecture and, in operation, perform a plurality of concatenated multiply and accumulate operations (Langhammer, figure 6, the processing blocks 600A-600D are cascaded [i.e. connected in series] and the output of 600B is fed back into 600D via signal 679 [i.e. a ring architecture] and, as shown in signal 679, the partial products are being concatenated).
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to modify the MAC 404 of Mills in figure 4 to perform multiply and accumulate operations as disclosed in figure 6 of Langhammer and to incorporate it into an application specific integrated circuit (ASIC). This modification would have been obvious because both references disclose a system for performing multiply and accumulate operations for different data formats such as floating point and fixed-point, and as recognized by Langhammer, having a plurality of multiplier-accumulator execution pipelines would simplify routing and improve speed [0092]. Furthermore, pipelining would reduce the critical path because pipelining introduces latches on the data path, which allows higher clock frequencies or sampling rates to be used in the circuit. Furthermore, incorporation into an ASIC would achieve a higher level of efficiency and performance because an ASIC would have the exact number of components/gates for the intended application, thereby eliminating wasted energy and space.

Regarding claim 10, the combined system of Mills in view of Langhammer discloses the integrated circuit of claim 9, wherein: each multiplier-accumulator circuit of the plurality of multiplier-accumulator circuits of the multiplier-accumulator execution pipeline is connected to two multiplier-accumulator circuits of the plurality of multiplier-accumulator circuits of the multiplier-accumulator execution pipeline configured in the ring architecture (Langhammer, figure 6 [0094] the processing blocks are cascaded in a chain and form the ring architecture), including: connected to a first multiplier-accumulator circuit to receive the second input data, having the floating point data format including the third precision, therefrom (Langhammer, figure 6, 600C [i.e. first multiplier-accumulator circuit] receives MN+OP [i.e. second input data]. [0046-0047] the multiplier circuit may operate on floating point numbers of a first floating-point precision [i.e. the floating point format including the first precision] and adder 204 (in this case would be 604) operates on a second floating point number precision [i.e. third precision]), and connected to a second multiplier-accumulator circuit to output the sum data, having the floating point data format including the third precision, thereto (Langhammer, figure 6, 600B [i.e. second multiplier-accumulator circuit] outputs AB+CD+EF+GH+IJ+KL+MN+OP. [0046-0047] the multiplier circuit may operate on floating point numbers of a first floating-point precision [i.e. the floating point format including the first precision] and adder 204 (in this case would be 604) operates on a second floating point number precision [i.e. third precision]).

Regarding claim 11, the combined system of Mills in view of Langhammer discloses the integrated circuit of claim 10, wherein: the second input data is the sum data which is output by the accumulator of the connected first multiplier-accumulator circuit (Langhammer, figure 6, 600B receives AB+CD+EF+GH or IJ+KL+MN+OP [i.e. the second input data], which is the sum data output of 600A, AB+CD+EF+GH, or the output of 600C).

Regarding claim 12, the combined system of Mills in view of Langhammer discloses the integrated circuit of claim 11, wherein: the second precision of the floating point data format of the product data is fixed and non-configurable after manufacture of the integrated circuit (Langhammer, [0113] the method and apparatus may be incorporated into other integrated circuits, such as an application specific integrated circuit (ASIC). An ASIC is designed and manufactured for one specific application and cannot be reprogrammed or modified after it is produced; thus for an ASIC, the precision of the product is fixed and non-configurable).

Regarding claim 13, the combined system of Mills in view of Langhammer discloses the integrated circuit of claim 11, wherein the accumulator of each multiplier-accumulator circuit of the plurality of multiplier-accumulator circuits adds the second input data, having the floating point data format, and the product data in the floating point data format, having the floating point data format including the second precision (Langhammer, figures 2 and 6, the adder 604 adds the product and data from other processor, [0046-0047] the addition performed by adder 604 is in the second floating point precision [i.e. the second precision]).

Claims 1-2 and 4 recite system claims that would also be rejected for the same reasons as claims 9 and 12.

Regarding claim 3, the combined system of Mills in view of Langhammer discloses the integrated circuit of claim 1, wherein the multiplier-accumulator circuits of the multiplier-accumulator execution pipeline are connected in series, via input and output buses having a width defined by third precision of the floating point data format of the sum data, to form a ring architecture (Langhammer, figure 6, the processing blocks 600A-600D are cascaded [i.e. connected in series] and the output of 600B is fed back into 600D via signal 679 to form a closed loop [i.e. a ring architecture]; the figure further shows the output of component 650 being input to the processing blocks and input signal 679 to form a closed loop [i.e. a ring architecture]).

Regarding claim 8, the combined system of Mills in view of Langhammer discloses the integrated circuit of claim 1, wherein: (i) the precision of the floating point data format of the first input data is 16 bit and the precision of the floating point format of the product data is 24 bit or 32 bit or (ii) the precision of the floating point data format of the first input data is 16 bit or 24 bit and the precision of the floating point format of the product data is 32 bit (Langhammer, [0046-0047] if the first floating point precision (performed by multiplier circuits) is half-precision, the second floating point precision (performed by adder 604) is double-precision).

Claims 5-6 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Mills in view of Langhammer as applied to claims 1 and 9 above, and further in view of Divakar.

Regarding Claim 5, the combined system of Mills in view of Langhammer discloses the integrated circuit of claim 1 further including: a memory, having an output coupled to an input of each of the plurality of multiplier-accumulator circuits of the multiplier-accumulator execution pipeline, to store the filter weight data having the floating point data format including the first precision (Mills, figure 3, [0048] the kernel data [i.e. filter weight data] are fetched from system memory 230 [i.e. a memory]. [0022] the engine circuit performs neural network operations using kernel data in a floating-point precision [i.e. the first precision]. Langhammer figure 6 also illustrates a plurality of processing blocks). However, the combined system of Mills in view of Langhammer does not teach first conversion circuitry, having an output coupled to an input of the memory, to convert the filter weight data to the filter weight data having the first floating point data format. 
Divakar discloses first conversion circuitry to convert data having a first format to data having the floating point data format including the first precision, and to output the data having the floating point data format including the first precision (Divakar, figure 4D [0040] format conversion 485b converts operand B from a format (for example, in figure 4B, one of the data types could be int8) into a floating point format for use in floating point multiplier 310).
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to modify Mills’ system as shown in figure 3 to include format conversion 485b of Divakar to convert kernel data before sending it to system memory and performing the multiply and accumulate operations. This modification would have been obvious because Mills discloses a system for performing multiply and accumulate operations that supports floating point and fixed point numbers, and Divakar also discloses a system for performing multiply operations that includes conversion between fixed-point and floating point numbers. Furthermore, as recognized by Divakar, [0028] having format conversions for different data types increases performance and power efficiency, and would optimize compute area on the chip and allow the design of improved neural networks that may be optimized based on the combination of data formats used as inputs in various layers.
As modified, the combined system of Mills in view of Langhammer and Divakar discloses a memory that stores filter weight data having the floating point data format including the first precision, and first conversion circuitry to convert filter weight data into the filter weight data having the first precision and output the filter weight data into the memory.

Regarding Claim 6, the combined system of Mills in view of Langhammer discloses the integrated circuit of claim 1 including the plurality of multiplier-accumulator circuits of the multiply-accumulator execution pipeline (Langhammer, figure 6). However, the combined system of Mills in view of Langhammer does not teach first conversion circuitry, coupled to the plurality of multiplier-accumulator circuits of the multiply-accumulator execution pipeline, to convert first input data to the first input data having the floating point data format including the first or the second precision.
Divakar teaches first conversion circuitry to convert first input data to the first input data having the floating point data format including the first or the second precision (Divakar, figure 4D, [0040] format conversion 485a converts operand A from a format (for example, in figure 4B, one of the data types could be INT8) into a floating point format for use in floating point multiplier 310).
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to modify Mills’ system as shown in figure 3 to include format conversions 485a and 485b of Divakar to convert kernel data and input data before performing the multiply and accumulate operations. This modification would have been obvious because Mills discloses a system for performing multiply and accumulate operations that supports floating point and fixed point numbers, and Divakar also discloses a system for performing multiply operations that includes conversion between fixed-point and floating point numbers. Furthermore, as recognized by Divakar, [0028] having format conversions for different data types increases performance and power efficiency, and would optimize compute area on the chip and allow the design of improved neural networks that may be optimized based on the combination of data formats used as inputs in various layers.

Regarding claim 14, the combined system of Mills in view of Langhammer discloses the integrated circuit of claim 11, including the plurality of multiplier-accumulator circuits of the multiply accumulator execution pipeline (Langhammer, figure 6), wherein the multiplier of each multiplier-accumulator circuit of the multiply accumulator execution pipeline receives the first input data having the floating point data format including the first precision, and multiplies the first input data, having a floating point data format including the first precision, by the filter weight data, having the floating point data format including the first precision, to generate the product data having the floating point data format including the second precision, wherein the second precision is a higher precision than the first precision (Langhammer, figure 6, multipliers 601 and 602, adder 603, and cast function 650 make up a multiplier that multiplies A and B and outputs a product of AB+CD in a second floating point number format [i.e. product data having the floating point data format including a second precision]. [0046-0047] the multiplier circuit may operate on floating point numbers of a first floating-point precision, such as 16b half-precision [i.e. the floating point format including a first precision] and adder 204 (in this case would be 604) operates on a second floating point number precision, such as 32b single precision [i.e. a second precision]; 32b has higher precision than 16b. Other processing blocks 600B-600D have similar features).
The combined system of Mills in view of Langhammer does not teach a first conversion circuit, having an output coupled to an input of each of the multiplier-accumulator circuits, to convert first input data, having the floating point data format including the second precision, to the first input data having the floating point data format including the first precision, with the multiplier of each multiplier-accumulator circuit coupled to the first conversion circuitry. However, Divakar teaches a first conversion circuit to convert first input data, having the floating point data format including the second precision, to the first input data having the floating point data format including the first precision (Divakar, figure 4D, converter 485a is provided to allow a given operand to be converted into any one of the multiple different data formats prior to being fed into a floating point multiplier, [0045] the floating point format may have a precision, such as 32 bits, and be converted to 16b via converter 485a).
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to modify Mills’ system as shown in figure 3 to include format conversion 485a of Divakar to convert first input data before performing the multiply and accumulate operations. This modification would have been obvious because Mills discloses a system for performing multiply and accumulate operations that supports floating point numbers, and Divakar also discloses a system for performing multiply operations that includes conversion. Furthermore, as recognized by Divakar, [0028] having format conversions for different data types and precisions increases performance and power efficiency, and would optimize compute area on the chip and allow the design of improved neural networks that may be optimized based on the combination of data formats used as inputs in various layers.
As modified, the combined system of Mills in view of Langhammer and Divakar teaches a first conversion circuit, having an output coupled to an input of each multiplier-accumulator circuit, to convert first input data, with the first conversion circuit coupled to the multiplier of each multiplier-accumulator circuit to provide the converted data for the multiply operation.


Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Mills in view of Langhammer as applied to claim 6 above, and further in view of Divakar and Sano.
Regarding Claim 7, the combined system of Mills in view of Langhammer discloses the integrated circuit of claim 1, but does not teach the first conversion circuitry includes an adder. 
Divakar teaches first conversion circuitry to convert first input data having a first format to the first input data having the first floating point data format (Divakar, figure 4D, [0040] format conversion 485a converts operand A from a format (for example, in figure 4B, one of the data types could be INT8) into a floating point format for use in floating point multiplier 310).
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to modify Mills’ system as shown in figure 3 to include format conversions 485a and 485b of Divakar to convert kernel data and input data before performing the multiply and accumulate operations. This modification would have been obvious because Mills discloses a system for performing multiply and accumulate operations that supports floating point and fixed point numbers, and Divakar also discloses a system for performing multiply operations that includes conversion between fixed-point and floating point numbers. Furthermore, as recognized by Divakar, [0028] having format conversions for different data types increases performance and power efficiency, and would optimize compute area on the chip and allow the design of improved neural networks that may be optimized based on the combination of data formats used as inputs in various layers.
The combined system of Mills in view of Langhammer and Divakar discloses the first conversion circuitry, but does not teach that the first conversion circuitry includes an adder. However, Sano teaches conversion circuitry that includes an adder (Sano, abstract, claim 1, conversion arithmetic circuit includes an adder to convert integer to floating point format or floating point format to integer).
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to modify Divakar’s format conversion to have a conversion circuit that includes an adder as disclosed in Sano. This modification would have been obvious because both Divakar and Sano disclose format conversion between integer or fixed-point and floating point, but Divakar does not disclose the structure of the format conversion, and Sano does. Furthermore, as disclosed by Sano, [0006-0007] having such a conversion circuit that includes an adder, without employing a multiplier and a divider, would reduce the load of the CPU.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to HUY DUONG whose telephone number is (571)272-2764. The examiner can normally be reached Mon-Friday 7:30-5:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jyoti Mehta, can be reached at 571-270-3995. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/H.D./Examiner, Art Unit 2182 (571)272-2764

/MATTHEW D SANDIFER/Primary Examiner, Art Unit 2182