DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Objections
Claims 10 and 20 are objected to because of the following informalities:
In claims 10 (lines 18-19) and 20 (lines 18-19), the recitation “the second conversion circuitry, couple an input…, where the first conversion circuitry includes” does not make clear the relationship between the two conversion circuitries in the claim.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: 
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.
Claims 1, 5, 6, 9, 10, 20 and 22 are rejected under 35 U.S.C. 103 as being unpatentable over Lin et al. (U.S. 2019/0243610 A1) in view of Brothers et al. (U.S. 2017/0011288 A1).
Regarding Claim 1, Lin discloses an integrated circuit (Lin, [0026] “FIG. 1, a circuit (e.g., the MAC circuit 101) operative to perform MAC operations”) comprising:
a multiplier-accumulator execution pipeline to receive (i) image data and (ii) filter weights (Lin, [0034] “FIG. 4 illustrates circuit 400 which is an example of the MAC circuit 101 (FIG. 1)…includes multiple MACs 315 (FIG. 3). The P MACs 315 may work … the dot products may be computed by MAC operations between the weights of the filter 221 and the portion of the input feature map 210. This portion of the input feature map 210 is referred to as an input activation 280”). Lin teaches an execution pipeline of the MAC circuit 101, made up of multiple MACs 315 (Fig. 3), that receives (i) input image data (210) and (ii) filter weights (221) (Fig. 2), wherein the multiplier-accumulator execution pipeline includes a plurality of multiplier-accumulator circuits to process the image data, using associated filter weights, via a plurality of multiply and accumulate operations (Lin, Fig. 1, [0020] “The accelerator 110 includes one or more engines (e.g. engines 111-114). One of the engines 111-114 include a MAC circuit 101 for performing MAC operations”). The MAC circuit 101 execution pipeline includes a plurality of MAC circuits 101 in the engines 111-114 to process the input image data, using the associated filters (Fig. 2), via a plurality of MAC operations.
first conversion circuitry coupled to an input of the multiplier-accumulator execution pipeline (Lin, [0030], [0031] “FIG. 3 illustrates circuitry 300 which is an example of the MAC circuit 101 (FIG. 1), the MAC hardware unit 310 generates an asymmetric MAC output” and [0027] “the asymmetrically quantized counterparts (Qa[i] and Qw[i][j]) may be described by the equations below: A[i]=Qa[i]+OFa; eq. (1) W[i][j]=Qw[i][j]+OFw[j]; eq. (2)” and [0032] “two data sequences may be fed into the MAC hardware unit 310, a first sequence of Qa[i] contains an input activation” and [0033] “The corrective values convert the asymmetric MAC output into a symmetric MAC output” and [0039] “A'[i] and W'[i][j] are the symmetrically quantized input activation and the weights of i-th filter”). Lin teaches first conversion circuitry (300) that receives the input activation Qa[i] and converts the asymmetric MAC output so as to obtain a symmetric MAC operation on symmetrically quantized data, wherein the first conversion circuitry includes:
inputs to receive a plurality of sets of image data, wherein each set of image data includes a plurality of image data (Lin, Fig. 2, [0024] “The input feature map 210 (e.g., an input image), the dot products may be computed by MAC operations”; the inputs receive a plurality of sets of image data (Fig. 2)).
Winograd conversion circuitry (Lin, [0037] “the Winograd fast algorithm for convolution may also use the aforementioned MAC operations, performed by the circuits, 400”; Lin teaches that the first two steps of the Winograd fast algorithm are the transformation of the filter weights and of the input activation, and this transformation is taken as the conversion circuitry) to convert each set of image data of the plurality of sets of image data to a corresponding Winograd set of image data (Lin, [0038] “FIG. 2, the input activation 280, the input activation is Winograd transformed into a (4x4xC) array denoted as Qa'[i]”; each set of image data, e.g. the input image map 210 (width=4, height=4, depth=C), is converted (transformed)).
outputs to output the image data of the plurality of Winograd sets of image data to the multiplier-accumulator execution pipeline (Lin, [0031] “the MAC hardware unit 310 generates an asymmetric MAC output” and [0032] “FIG. 3, two data sequences may be fed into the MAC hardware unit 310 to generate a data point in the output feature map: a first sequence of Qa[i] contains an input activation and a second sequence of Qw[i][j] contains the weights of the j-th filter”; the MAC hardware unit 310 generates an asymmetric MAC output (Fig. 3) as a result of the input activation of the Winograd sets of image data (210) and the weights of the filter (221) (Fig. 2)); and
wherein, in operation, the multiplier-accumulator circuits of the multiplier-accumulator execution pipeline are configured to: (i) perform the plurality of multiply and accumulate operations (Lin, Fig. 1, [0020] “a MAC circuit 101 for performing MAC operations”) using (a) the image data of the plurality of Winograd sets of image data from the first conversion circuitry (Lin, [0030] “FIG. 3 illustrates circuitry 300 which is an example of the MAC circuit 101 (FIG. 1)” and [0038] “FIG. 2, the input activation 280, the input activation is Winograd transformed into a (4x4xC) array denoted as Qa'[i]” and “specifically, by the equations below. A'[i]=Qa'[i]+OFa'; eq. (4) W'[i][j]=Qw'[i][j]+OFw'[j]; eq. (5)”; the first conversion circuitry 300 uses the plurality of Winograd sets of image data (210) in the operations of equations (4) and (5)) and (b) the filter weights (Fig. 2, filter weights (220)), and (ii) generate output data based on the multiply and accumulate operations (Fig. 3, [0031] “the MAC hardware unit 310 generates an asymmetric MAC output”; an asymmetric MAC output is generated).
Lin discloses a conversion of floating point numbers to integers for memory bandwidth reduction (Lin, [0016], lines 14-19).
However, Lin does not explicitly teach floating point format conversion circuitry, coupled to the Winograd conversion circuitry, to convert the image data of each Winograd set of image data of the plurality of Winograd sets of image data to a floating point data format.
Brothers teaches floating point format conversion circuitry, coupled to the Winograd conversion circuitry, to convert the image data of each Winograd set of image data of the plurality of Winograd sets of image data to a floating point data format (Brothers, [0004] “The output data may be a feature map of the input data that the neural network generates by convolving an input image” and Fig. 1, [0039] “AAC arrays 106 are configured to perform multiply accumulate (MAC) operations” and [0041] “DRU 108 is coupled to AAC array 106 and memory units 110 and outputs in memory units 100, applying final Winograd transform” and [0099] “FIG. 7, MMU 700 may be used to implement a NN processor as an integrated circuit” and [0104] “Accumulator 720 provides its output to converter 722 converts fixed point format values to floating point format values, storing output generated within data storage unit 702”). Brothers teaches a floating point format converter (722) of the MMU circuit (700) that, together with the final Winograd transform, converts the input image data of the CNN, processed through the MAC operations and the accumulator, to a floating point data format (Fig. 7).
Lin and Brothers are combinable because they are from the same field of endeavor, systems and methods for image processing, and try to solve similar problems. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the circuit of Lin to include the floating point format conversion circuitry (as taught by Brothers) in order to convert the image data of each Winograd set of image data of the plurality of Winograd sets of image data to a floating point data format, because Brothers provides a CNN accelerator coupled to a converter that converts image data to the floating point format in connection with applying a final Winograd transform (Brothers, Fig. 1, [0039], Fig. 7, [0099]). Doing so may simplify the architecture of the NN processor and increase the operating speed and/or throughput of the NN processor (Brothers, [0028]).
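For illustration only, the Winograd conversion discussed in the claim 1 analysis (a 4x4 set of image data transformed before the multiply and accumulate stage, and a 3x3 set of filter weights transformed into a 4x4 array) can be sketched in Python as follows. The sketch assumes the commonly used F(2x2, 3x3) transform matrices and a single channel (depth C=1); all function names are hypothetical, and it is not asserted that Lin or Brothers implements the transform in this manner.

```python
# Minimal single-channel sketch of Winograd F(2x2, 3x3).
# Assumption: the widely published F(2x2, 3x3) matrices are used; the cited
# references are not asserted to use these exact matrices.

BT = [[1, 0, -1, 0],
      [0, 1, 1, 0],
      [0, -1, 1, 0],
      [0, 1, 0, -1]]          # input (image data) transform
G = [[1.0, 0.0, 0.0],
     [0.5, 0.5, 0.5],
     [0.5, -0.5, 0.5],
     [0.0, 0.0, 1.0]]         # filter weight transform
AT = [[1, 1, 1, 0],
      [0, 1, -1, -1]]         # inverse (output) transform


def matmul(a, b):
    """Plain list-of-lists matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def winograd_image(d):
    """Convert a 4x4 set of image data to its Winograd set (4x4)."""
    return matmul(matmul(BT, d), transpose(BT))


def winograd_filter(g):
    """Convert a 3x3 set of filter weights to its Winograd set (4x4)."""
    return matmul(matmul(G, g), transpose(G))


def inverse_winograd(m):
    """Convert a 4x4 Winograd-domain result to a 2x2 non-Winograd output."""
    return matmul(matmul(AT, m), transpose(AT))


def winograd_conv(d, g):
    v = winograd_image(d)
    u = winograd_filter(g)
    # Element-wise multiply stage (the accumulation over depth is trivial for C=1).
    m = [[u[i][j] * v[i][j] for j in range(4)] for i in range(4)]
    return inverse_winograd(m)


def direct_conv(d, g):
    """Reference: sliding-window dot products, as in a CNN layer."""
    return [[sum(d[i + r][j + s] * g[r][s] for r in range(3) for s in range(3))
             for j in range(2)] for i in range(2)]
```

Under these assumptions, `winograd_conv(d, g)` and `direct_conv(d, g)` should agree for any 4x4 input `d` and 3x3 filter `g`, with the Winograd path using 16 element-wise multiplications in place of 36.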
Regarding Claim 5, Lin discloses the integrated circuit of claim 1 further including: second conversion circuitry (Lin, [0036] “FIG. 5 illustrates a circuit 500 which is an example of the MAC circuit 101 (FIG. 1) The MAC hardware unit 510 further includes an array of (PxK) MACs 315”; Lin teaches second conversion circuitry (circuit 500)), coupled to the multiplier-accumulator execution pipeline, to convert the output data corresponding to the image data of the plurality of Winograd sets of image data to output data having a non-Winograd format (Lin, [0032] “FIG. 3, two data sequences may be fed into the MAC hardware unit 310 to generate a data point in the output feature map: a first sequence of Qa[i] contains an input activation and a second sequence of Qw[i][j] contains the weights of the j-th filter” and [0039] “At step (c), an inverse Winograd transformation is applied to the result of $\sum_{i=1}^{C} A'[i]*W'[i][j]$ to generate a convolution result”). Lin teaches step (c) in [0039], which applies an inverse Winograd transformation to the result of equation (6) ([0035]) to generate a convolution result. The inverse transformation is taken as the second conversion circuitry, which undoes the Winograd transform to yield a non-Winograd format.
Regarding Claim 6, Lin discloses the integrated circuit of claim 5, wherein: the second conversion circuitry further includes floating point format conversion circuitry to convert the output data having a non-Winograd format to output data (Lin, [0039] “At step (c), an inverse Winograd transformation is applied to the result of $\sum_{i=1}^{C} A'[i]*W'[i][j]$ to generate a convolution result”). Lin teaches step (c) in [0039], which applies an inverse Winograd transformation to the result of equation (6) ([0035]) to generate a convolution result. The inverse transformation is taken as the second conversion circuitry, which undoes the Winograd transform to yield a non-Winograd format.
Lin discloses a conversion of floating point numbers to integers for memory bandwidth reduction (Lin, [0016], lines 14-19).
However, Lin does not explicitly teach the second conversion circuitry further includes floating point format conversion circuitry to convert the output data having a non-Winograd format to output data having a floating point data format.
Brothers teaches the second conversion circuitry further includes floating point format conversion circuitry to convert the output data having a non-Winograd format to output data having a floating point data format (Brothers, [0004] “The output data may be a feature map of the input data that the neural network generates by convolving an input image” and Fig. 1, [0039] “AAC arrays 106 are configured to perform multiply accumulate (MAC) operations” and [0041] “DRU 108 is coupled to AAC array 106 and memory units 110 and outputs in memory units 100, applying final Winograd transform” and [0099] “FIG. 7, MMU 700 may be used to implement a NN processor as an integrated circuit” and [0104] “Accumulator 720 provides its output to converter 722 converts fixed point format values to floating point format values, storing output generated within data storage unit 702”). Brothers teaches a floating point format converter (722) of the MMU circuit (700) that, together with the final Winograd transform, converts the input image data of the CNN, processed through the MAC operations and the accumulator, to a floating point data format (Fig. 7). Brothers teaches that the MAC operations and the output results are in floating point format. The combination of Lin (output data from the MAC having a non-Winograd format) and Brothers (converting the output of the MAC to floating point) teaches the limitation in claim 6.
Lin and Brothers are combinable; see the rationale in claim 1.
Regarding Claim 9, Lin as modified discloses the integrated circuit of claim 1 but does not explicitly teach wherein: each multiplier-accumulator circuit of the multiplier-accumulator execution pipeline includes a floating point multiplier and a floating point adder.
However, Brothers teaches each multiplier-accumulator circuit of the multiplier-accumulator execution pipeline includes a floating point multiplier and a floating point adder (Brothers, Fig. 4, [0076] “During each cycle, one column (or row as the case may be) is selected from register 402 and output through multiplexers 404 and 406 and fed to operation units 408” and Fig. 7, [0105] “The output is provided to the adder array 718 which stores the results generated in accumulator 720 provides its output to converter 722 converts fixed point format values to floating point format values”). Brothers teaches that the outputs of the multiplexers and adders can be converted to a floating point format.
Lin and Brothers are combinable; see the rationale in claim 1.
Regarding Claim 10, Lin discloses an integrated circuit (Lin, [0026] “FIG. 1, a circuit (e.g., the MAC circuit 101) operative to perform MAC operations”) comprising:
a multiplier-accumulator execution pipeline to receive (i) image data and (ii) filter weights, wherein the multiplier-accumulator execution pipeline includes a plurality of multiplier-accumulator circuits to process the image data, using associated filter weights, via a plurality of multiply and accumulate operations; 
first conversion circuitry, coupled to an input of the multiplier-accumulator execution pipeline, wherein the first conversion circuitry includes: 
inputs to receive image data of a plurality of sets of image data, wherein each set of image data includes a plurality of image data, 
Winograd conversion circuitry to convert each set of image data of the plurality of sets of image data to a corresponding Winograd set of image data, 
floating point format conversion circuitry, coupled to the Winograd conversion circuitry of the first conversion circuitry, to convert the image data of each Winograd set of image data of the plurality of Winograd sets of image data to a floating point data format, and 
outputs to output the image data of the plurality of Winograd sets of image data to the multiplier-accumulator execution pipeline; 
wherein, in operation, the multiplier-accumulator circuits of the multiplier-accumulator execution pipeline are configured to: (i) perform the plurality of multiply and accumulate operations using (a) the image data of the plurality of Winograd sets of image data from the first conversion circuitry and (b) the filter weights of the plurality of Winograd sets of filter weights, and (ii) generate output data based on the multiply and accumulate operations.
The above limitations of claim 10 are substantially similar to those of claim 1 and are rejected based on similar analyses.
second conversion circuitry, coupled an input of the multiplier-accumulator execution pipeline (Lin, [0036] “FIG. 5 illustrates a circuit 500 which is an example of the MAC circuit 101 (FIG. 1) The MAC hardware unit 510 further includes an array of (PxK) MACs 315” and [0027] “the asymmetrically quantized counterparts (Qa[i] and Qw[i][j]) may be described by the equations below: A[i]=Qa[i]+OFa; eq. (1) W[i][j]=Qw[i][j]+OFw[j]; eq. (2)” and [0039] “A'[i] and W'[i][j] are the symmetrically quantized input activation and the weights of i-th filter, respectively”; Lin teaches second conversion circuitry (circuit 500) that provides symmetric MAC operations on symmetrically quantized data at the input of the execution pipeline), wherein the first conversion circuitry (Lin, [0034], the circuit 400) includes:
inputs to receive filter weights of a plurality of sets of filter weights, wherein each set of filter weights includes a plurality of filter weights (Lin, [0024] “FIG. 2 illustrates an example of a convolution operation. A convolution operation may be performed on an input feature map 210 using a set of filters 220. Each filter 220 has width=S, height=R and depth=C”). Lin teaches inputs (input feature map 210) that receive filter weights of a plurality of sets of filter weights (220). Each set of filter weights (220) includes a plurality of filter weights (221, Fig. 2).
Winograd conversion circuitry (Lin, [0037] “the Winograd fast algorithm for convolution may also use the aforementioned MAC operations, performed by the circuits, 400”; Lin teaches the first conversion circuitry as Winograd conversion circuitry) to convert each set of filter weights of the plurality of sets of filter weights to a corresponding Winograd set of filter weights (Lin, [0037] “FIG. 2, The Winograd fast algorithm includes four steps: (1) transformation of filter weights” and [0038] “The weights of all K filter 220 are first Winograd transformed from (3x3xCxK) into a (4x4xCxK) array denoted as Qw'[i][j] where i=1,…, C and j=1,…, K”). Lin teaches that the first two steps of the Winograd fast algorithm transform (convert) each set of filter weights (from a (3x3xCxK) array into a (4x4xCxK) array) and the input activation, and this transformation is taken as the second conversion circuitry.
outputs to output the filter weights of the plurality of Winograd sets of filter weights to the multiplier-accumulator execution pipeline (Lin, Fig. 2, [0038] “The weights of all K filter 220 are first Winograd transformed from (3x3xCxK) into a (4x4xCxK) array denoted as Qw'[i][j] where i=1,…, C and j=1,…, K” and [0032] “FIG. 3, two data sequences may be fed into the MAC hardware unit 310 to generate a data point in the output feature map, a second sequence of Qw[i][j] contains the weights of the j-th filter. The data pair (Qa[i], Qw[i][j]) may be fed into the MAC hardware unit 310 every data cycle and generates an asymmetric MAC output”; the outputs output the filter weights (Qw[i][j]) to the multiplier-accumulator execution pipeline, which generates an asymmetric MAC output); and
Lin discloses a conversion of floating point numbers to integers for memory bandwidth reduction (Lin, [0016], lines 14-19).
However, Lin as modified does not explicitly teach floating point format conversion circuitry, coupled to the Winograd conversion circuitry of the second conversion circuitry, to convert the filter weights of each Winograd set of filter weights of the plurality of Winograd sets of filter weights to a floating point data format;
Brothers teaches floating point format conversion circuitry, coupled to the Winograd conversion circuitry of the second conversion circuitry, to convert the filter weights of each Winograd set of filter weights of the plurality of Winograd sets of filter weights to a floating point data format (Brothers, [0004] “The output data may be a feature map of the input data that the neural network generates by convolving an input image” and Fig. 1, [0039] “AAC arrays 106 are configured to perform multiply accumulate (MAC) operations” and [0041] “DRU 108 is coupled to AAC array 106 and memory units 110 and outputs in memory units 100, applying final Winograd transform” and [0099] “FIG. 7, MMU 700 may be used to implement a NN processor as an integrated circuit” and [0102] “MMU 700 includes a control unit 704 include registers 706 that store a weight table 708 defining one or more matrices (weights) to be applied (e.g., a second data set)” and [0104] “Accumulator 720 provides its output to converter 722 converts fixed point format values to floating point format values, storing output generated within data storage unit 702”). Brothers teaches a floating point format converter (722) of the MMU circuit (700) that, together with the final Winograd transform, converts the input image data of the CNN (including the weight matrices), processed through the MAC operations and the accumulator, to a floating point data format (Fig. 7).
Lin and Brothers are combinable; see the rationale in claim 1.
Regarding Claim 20, Lin discloses an integrated circuit (Lin, [0026] “FIG. 1, a circuit (e.g., the MAC circuit 101) operative to perform MAC operations”) comprising:
a multiplier-accumulator execution pipeline to receive (i) image data and (ii) filter weights, wherein the multiplier-accumulator execution pipeline includes a plurality of multiplier-accumulator circuits to process the image data, using associated filter weights, via a plurality of multiply and accumulate operations; 
first conversion circuitry, coupled to an input of the multiplier-accumulator execution pipeline, wherein the first conversion circuitry includes:
inputs to receive image data of a plurality of sets of image data, wherein each set of image data includes a plurality of image data, 
Winograd conversion circuitry to convert each set of image data of the plurality of sets of image data to a corresponding Winograd set of image data, 
floating point format conversion circuitry, coupled to the Winograd conversion circuitry of the first conversion circuitry, to convert the image data of each Winograd set of image data of the plurality of Winograd sets of image data to a floating point data format, and 
outputs to output the image data of the plurality of Winograd sets of image data to the multiplier-accumulator execution pipeline; 
second conversion circuitry, coupled an input of the multiplier-accumulator execution pipeline, wherein the first conversion circuitry includes: 
inputs to receive filter weights of a plurality of sets of filter weights, wherein each set of filter weights includes a plurality of filter weights, 
Winograd conversion circuitry to convert each set of filter weights of the plurality of sets of filter weights to a corresponding Winograd set of filter weights, 
floating point format conversion circuitry, coupled to the Winograd conversion circuitry of the second conversion circuitry, to convert the filter weights of each Winograd set of filter weights of the plurality of Winograd sets of filter weights to a floating point data format, and 
outputs to output the filter weights of the plurality of Winograd sets of filter weights to the multiplier-accumulator execution pipeline; 
wherein, in operation, the multiplier-accumulator circuits of the multiplier-accumulator execution pipeline are configured to: (i) perform the plurality of multiply and accumulate operations using (a) the image data of the plurality of Winograd sets of image data from the first conversion circuitry and (b) the filter weights of the plurality of Winograd sets of filter weights, and (ii) generate output data based on the multiply and accumulate operations; and
The above limitations of claim 20 are substantially similar to those of claim 10 and are rejected based on similar analyses.
third conversion circuitry, coupled to an output of the multiplier-accumulator execution pipeline (Lin, [0030], [0031] “FIG. 3 illustrates circuitry 300 which is an example of the MAC circuit 101 (FIG. 1) includes the MAC hardware unit 310 may include more than one multiplier and more than one accumulator, the MAC hardware unit 310 generates an asymmetric MAC output”; Lin teaches third conversion circuitry (circuit 300) coupled to an output of the multiplier-accumulator execution pipeline (an asymmetric MAC output, Fig. 3)), wherein the third conversion circuitry includes:
inputs to receive the output data of a plurality of sets of output data from the multiplier-accumulator circuits of the multiplier-accumulator execution pipeline (Lin, Fig. 3, [0033] “The corrective values convert the asymmetric MAC output into a symmetric MAC output. The overhead of the conversion is minimized because the multiplier 331 (in the MAD hardware unit 330) performs multiplication only once every N data cycles”; the input of the MAD hardware unit 330 receives the output data of the asymmetric MAC output and converts it into a symmetric MAC output using the corrective values).
Winograd conversion circuitry to convert each set of output data of the plurality of sets of output data to output data having a non-Winograd format (Lin, [0032] “FIG. 3, two data sequences may be fed into the MAC hardware unit 310 to generate a data point in the output feature map: a first sequence of Qa[i] contains an input activation and a second sequence of Qw[i][j] contains the weights of the j-th filter” and [0039] “At step (c), an inverse Winograd transformation is applied to the result of $\sum_{i=1}^{C} A'[i]*W'[i][j]$ to generate a convolution result”). Lin teaches step (c) in [0039], which applies an inverse Winograd transformation to the result of equation (6) ([0035]) to generate a convolution result. The inverse transformation is taken as the Winograd conversion circuitry of the third conversion circuitry, which undoes the Winograd transform to yield a non-Winograd format.
Lin discloses a conversion of floating point numbers to integers for memory bandwidth reduction (Lin, [0016], lines 14-19).
However, Lin as modified does not explicitly teach
floating point format conversion circuitry, coupled to the Winograd conversion circuitry of the third conversion circuitry, to convert the output data, having a non-Winograd format, of each set of output data of the plurality of sets of output data to a floating point data format, and
outputs to output the output data, having a floating point data format, of the plurality of sets of output data.
Brothers teaches floating point format conversion circuitry, coupled to the Winograd conversion circuitry of the third conversion circuitry, to convert the output data, having a non-Winograd format, of each set of output data of the plurality of sets of output data to a floating point data format (Brothers, [0004] “The output data may be a feature map of the input data that the neural network generates by convolving an input image” and Fig. 1, [0039] “AAC arrays 106 are configured to perform multiply accumulate (MAC) operations” and [0041] “DRU 108 is coupled to AAC array 106 and memory units 110 and outputs in memory units 100, applying final Winograd transform” and [0099] “FIG. 7, MMU 700 may be used to implement a NN processor as an integrated circuit” and [0104] “Accumulator 720 provides its output to converter 722 converts fixed point format values to floating point format values, storing output generated within data storage unit 702”). Brothers teaches a floating point format converter (722) of the MMU circuit (700) that, together with the final Winograd transform, converts the input image data of the CNN, processed through the MAC operations and the accumulator, to a floating point data format (Fig. 7). Brothers teaches that the MAC operations and the output results are in floating point format. The combination of Lin (output data from the MAC having a non-Winograd format) and Brothers (converting the output of the MAC to floating point) teaches the limitation in claim 20, and
outputs to output the output data, having a floating point data format, of the plurality of sets of output data (Brothers, Fig. 7, [0011] “Accumulator 720 provides its output to converter 722. In one example embodiment, converter 722 converts fixed point format values to floating point format values. Converter 722 is capable of storing output generated within data storage unit 702”). Brothers teaches outputting the output data having a floating point data format and storing it within the data storage unit.
Lin and Brothers are combinable; see the rationale in claim 1.
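For illustration only, the relationship between the asymmetric MAC output and the symmetric MAC output that follows algebraically from eq. (1) and eq. (2), as quoted from Lin [0027], can be sketched in Python as below. Expanding the sum of A[i]*W[i][j] gives the asymmetric sum of Qa[i]*Qw[i][j] plus corrective terms that depend only on the offsets and on the running sums of Qa and Qw. The function names and sample values are hypothetical, and the sketch is not asserted to be how the MAD hardware unit 330 of Lin computes its corrective values.

```python
# Sketch: recovering a symmetric MAC result from an asymmetric MAC output.
# Assumption: A[i] = Qa[i] + OFa and W[i] = Qw[i] + OFw for one filter j,
# following eq. (1) and eq. (2) as quoted; everything else is illustrative.

def asymmetric_mac(qa, qw):
    """MAC over the asymmetrically quantized data pairs (Qa[i], Qw[i])."""
    return sum(a * w for a, w in zip(qa, qw))


def corrective_value(qa, qw, of_a, of_w):
    """Terms left over when (Qa+OFa)*(Qw+OFw) is expanded and summed."""
    n = len(qa)
    return of_w * sum(qa) + of_a * sum(qw) + n * of_a * of_w


def symmetric_mac(qa, qw, of_a, of_w):
    """Asymmetric MAC output plus the corrective value."""
    return asymmetric_mac(qa, qw) + corrective_value(qa, qw, of_a, of_w)


def reference_symmetric_mac(qa, qw, of_a, of_w):
    """Direct evaluation of sum(A[i] * W[i]) for comparison."""
    return sum((a + of_a) * (w + of_w) for a, w in zip(qa, qw))
```

In this sketch the offset-dependent multiplications occur once per dot product rather than once per data pair, which is one way to read the quoted statement that the conversion overhead is small.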
Regarding Claim 22, Lin as modified discloses the integrated circuit of claim 20 wherein: 
each multiplier-accumulator circuit of the multiplier-accumulator execution pipeline includes a floating point multiplier and a floating point adder.
Claim 22 is substantially similar to claim 9 and is rejected based on similar analyses.
Claims 2, 4, 7-8, 11, 13-14, 16-19, 21 and 23-24 are rejected under 35 U.S.C. 103 as being unpatentable over Lin et al. (U.S. 2019/0243610 A1) in view of Brothers et al. (U.S. 2017/0011288 A1) and further in view of Donovan et al. (U.S. 2008/0211827 A1).
Regarding Claim 2, Lin as modified discloses the integrated circuit of claim 1 but does not explicitly teach wherein: the first conversion circuitry further includes circuitry to determine a largest exponent of the image data of each set of image data of the plurality of sets of image data.
However, Donovan teaches the first conversion circuitry further includes circuitry to determine a largest exponent of the image data of each set of image data of the plurality of sets of image data (Donovan, [0029] “Scanout module 224 may also perform generating composite screen images by combining the pixel data from pixel buffer 226 with data for a video” and [0061] “FIG. 5, Preprocessing block 402 converts floating-point texture values (a, b, c, d) in fp16 format to fixed-point values (a', b', c', d') in (s2.14) format by prescaling the inputs using a block exponent k” and [0064] “Four-way comparison circuit 510 selects the largest of the four exponents as block exponent k”). In computing OFa and OFw[j] of equations (1) and (2), Qa[i] and Qw[i][j] represent an asymmetrically quantized input activation data point and an asymmetrically quantized filter weight, respectively (Lin, [0026], lines 18-28, and [0027]). Therefore, quantization of the input activation data points and filter weights is performed. Donovan discloses a prescaling procedure for quantization. The prescaling circuit can determine (select) the largest exponent as the block exponent k of the image data (floating-point texture values) by a conversion circuit (510).
Lin, Brothers and Donovan are combinable because they are from the same field of endeavor, systems and methods for image processing, and try to solve similar problems. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the circuit of Lin to determine a largest exponent of the input data for a prescaling procedure (as taught by Donovan) in order to determine a largest exponent of the image data of each set of image data of the plurality of sets of image data, because Donovan provides a prescaling procedure for quantization in which the prescaling circuit determines (selects) the largest exponent as the block exponent k of the image data (floating-point texture values) by a conversion circuit (510) (Donovan, Fig. 5, [0061]). Doing so, with proper prescaling, may provide an efficient texture filtering unit that is capable of processing both floating-point and fixed-point texture data (Donovan, [0010]).
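For illustration only, the block exponent prescaling relied upon above (selecting the largest exponent in a set of floating point values and using it to express every value of the set in a common fixed point format) can be sketched in Python as follows. The sketch assumes 14 fractional bits, by analogy with the quoted (s2.14) format, and uses the exponent convention of `math.frexp` rather than a hardware fp16 exponent field; the function names are hypothetical and the sketch is not asserted to match the circuit of Donovan.

```python
import math

FRAC_BITS = 14  # assumed number of fractional bits in the fixed point format


def block_exponent(values):
    """Largest exponent among the nonzero values of the set (0 if all are zero).

    math.frexp(v) returns (m, e) with v == m * 2**e and 0.5 <= |m| < 1.
    """
    exps = [math.frexp(v)[1] for v in values if v != 0.0]
    return max(exps) if exps else 0


def to_fixed(values, k):
    """Prescale each value by 2**-k and quantize to FRAC_BITS fractional bits."""
    return [round(math.ldexp(v, -k) * (1 << FRAC_BITS)) for v in values]


def from_fixed(fixed, k):
    """Convert fixed point values back to floating point using block exponent k."""
    return [math.ldexp(q / (1 << FRAC_BITS), k) for q in fixed]
```

A usage example under these assumptions: for the set `[1.5, -0.25, 3.0, 0.125]`, `block_exponent` returns 2, so every value is divided by 4 before quantization and the largest magnitude in the set stays below 1.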
Regarding Claim 4, Lin discloses the integrated circuit of claim 2, wherein the Winograd conversion circuitry converts the image data of each set of image data to a corresponding Winograd set of image data using the image data (Lin, [0038] “FIG. 2, the input activation 280, the input activation is Winograd transformed into a (4x4xC) array denoted as Qa'[i]”; each set of image data, e.g. the input image map 210 (width=4, height=4, depth=C), is converted (transformed)).
However, Lin as modified does not explicitly teach wherein: the first conversion circuitry further includes fixed point format conversion circuitry, coupled between the inputs of the first conversion circuitry and the Winograd conversion circuitry, to convert a data format of the image data of each set of image data to a fixed point data format using the largest exponent of the image data of the associated set of image data, wherein: 
the Winograd conversion circuitry converts the image data of each set of image data to a corresponding Winograd set of image data using the image data having the fixed point data format.  
Donovan teaches the first conversion circuitry further includes fixed point format conversion circuitry, coupled between the inputs of the first conversion circuitry and the Winograd conversion circuitry, to convert a data format of the image data of each set of image data to a fixed point data format using the largest exponent of the image data of the associated set of image data (Donovan, [0029] “Scanout module 224 may also perform generating composite screen images by combining the pixel data from pixel buffer 226 with data for a video” and [0061] “FIG. 5, Preprocessing block 402 converts floating-point texture values (a, b, c, d) in fp16 format to fixed-point values (a', b', c', d') in (s2.14) format by prescaling the inputs using a block exponent k” and [0064] “Four-way comparison circuit 510 selects the largest of the four exponents as block exponent k”; Donovan teaches using the largest exponent as block exponent k of the image data (floating-point texture values) by a conversion circuit (510)), wherein: 
the Winograd conversion circuitry converts the image data of each set of image data to a corresponding Winograd set of image data using the image data having the fixed point data format (Donovan, [0033] “Texture blending generally involves defining a texture map, most often as an array of "texels"” and [0008] “texture filtering circuits are conventionally implemented using fixed-point arithmetic circuits. Fixed-point texture data”; the combination of Lin and Donovan teaches that the Winograd conversion circuitry converts the set of image data (as taught by Lin) using the image data (input image data map = texture map) having the fixed point data format (as taught by Donovan)).
Lin, Brothers and Donovan are combinable; see the rationale in claim 2.
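For illustration only, the prescaling of floating-point values to a fixed-point format using a block exponent k, as quoted from Donovan [0061], can be sketched as follows. This is a sketch under stated assumptions, not Donovan's implementation: the fixed-point format is assumed to carry 14 fractional bits, k is taken as the unbiased exponent of the largest-magnitude value, and the names to_block_fixed, from_block_fixed and FRAC_BITS are hypothetical.

```python
import math

FRAC_BITS = 14  # assumption: 14 fractional bits, as in an "s2.14"-style format


def to_block_fixed(values):
    """Prescale values by a shared block exponent k and quantize to integers.

    Hypothetical helper. Saturation for values that round up to 2.0 after
    scaling is omitted in this sketch.
    """
    nonzero = [v for v in values if v != 0.0]
    if not nonzero:
        return [0] * len(values), 0
    # math.frexp gives v = m * 2**e with 0.5 <= |m| < 1, so e - 1 = floor(log2|v|).
    k = max(math.frexp(v)[1] - 1 for v in nonzero)
    # Scaling by 2**-k brings every magnitude below 2.0 before quantizing.
    fixed = [round(math.ldexp(v, FRAC_BITS - k)) for v in values]
    return fixed, k


def from_block_fixed(fixed, k):
    """Undo the prescaling: q * 2**(k - FRAC_BITS)."""
    return [math.ldexp(q, k - FRAC_BITS) for q in fixed]


fixed, k = to_block_fixed([0.5, 3.0, -6.0, 0.125])
restored = from_block_fixed(fixed, k)
```

Values with smaller exponents lose low-order bits in this scheme, since all values share the scale set by the largest exponent.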
Regarding Claim 7, the integrated circuit of claim 1, Lin as modified does not explicitly teach wherein: the floating point format conversion circuitry of the first conversion circuitry converts the image data of each Winograd set of image data of the plurality of Winograd sets of image data to a floating point data format using the largest exponent of the image data of the associated set of image data.
However, Donovan teaches the floating point format conversion circuitry of the first conversion circuitry converts the image data of each Winograd set of image data of the plurality of Winograd sets of image data to a floating point data format using the largest exponent of the image data of the associated set of image data (Donovan, [0029] “Scanout module 224 may also perform generating composite screen images by combining the pixel data from pixel buffer 226 with data for a video” and [0061] “FIG. 5, Preprocessing block 402 converts floating-point texture values (a, b, c, d) in fp16 format to fixed-point values (a', b', c', d') in (s2.14) format by prescaling the inputs using a block exponent k” and [0064] “Four-way comparison circuit 510 selects the largest of the four exponents as block exponent k”; Donovan teaches using the largest exponent as block exponent k of the image data (floating-point texture values) from pixel buffer data (226) by a conversion circuit (510)).
Lin, Brothers and Donovan are combinable; see the rationale in claim 2.
Regarding Claim 8, the integrated circuit of claim 7, Lin as modified does not explicitly teach wherein: 
the first conversion circuitry, for the image data of each Winograd set of image data of the plurality of Winograd sets of image data, further includes circuitry to perform two's complement operation on a fraction field of the image data where the image data is negative, perform a priority encode operation on the fraction field of the image data, left shift the fraction field of the image data where the image data is negative, round the fraction field to where the image data was left shifted.
However, Donovan teaches circuitry to perform two's complement operation on a fraction field of the image data where the image data is negative, perform a priority encode operation on the fraction field of the image data, left shift the fraction field of the image data where the image data is negative, round the fraction field to where the image data was left shifted (Donovan, [0046] “In fp16 format, a number is represented by a sign bit, five exponent bits, and ten mantissa bits. The five exponent bits are biased by +15, and the ten mantissa bits represent the fractional portion of the mantissa, with an implicit "1" preceding the decimal point. Certain values are reserved to indicate special numbers including negative infinity (INF)” and [0073] “floating-point addition module 608 determines which of current block exponent k and accumulated block exponent kA is larger, shifts one of the mantissas λA and λfw so that both floating-point values are represented using the larger block exponent”; Donovan teaches performing a two's complement operation (converting fp16 format bits: five exponent bits biased by +15, ten mantissa bits in the fractional portion), including for a negative value, and shifting one of the mantissas).
Lin, Brothers and Donovan are combinable; see the rationale in claim 2.
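For illustration only, the mantissa alignment quoted from Donovan [0073] (representing both operands using the larger block exponent before adding) can be sketched as follows. This is a minimal sketch under stated assumptions: mantissas are modeled as signed Python integers, the function name add_block_scaled is hypothetical, and bits shifted out are simply dropped rather than rounded.

```python
def add_block_scaled(m_a: int, k_a: int, m_b: int, k_b: int):
    """Add m_a * 2**k_a and m_b * 2**k_b, returning (mantissa, exponent).

    Hypothetical helper. The operand with the smaller exponent is right
    shifted so both share the larger exponent; low bits are truncated.
    """
    k = max(k_a, k_b)
    # Python's >> on signed integers is an arithmetic (sign-preserving) shift.
    m_a_aligned = m_a >> (k - k_a)
    m_b_aligned = m_b >> (k - k_b)
    return m_a_aligned + m_b_aligned, k


# 12 * 2**3 + 8 * 2**1 = 96 + 16 = 112 = 14 * 2**3
total, k = add_block_scaled(12, 3, 8, 1)
```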
Regarding Claim 11, Lin discloses the integrated circuit of claim 10 wherein: 
the first conversion circuitry further includes circuitry to determine a largest exponent of the image data of each set of image data of the plurality of sets of image data.  
Claim 11 is substantially similar to claim 2 and is rejected based on similar analyses.
Regarding Claim 13, Lin discloses the integrated circuit of claim 11 wherein: 
the first conversion circuitry further includes fixed point format conversion circuitry, coupled between the inputs of the first conversion circuitry and the Winograd conversion circuitry, to convert a data format of the image data of each set of image data to a fixed point data format using the largest exponent of the image data of the associated set of image data, wherein: 
the Winograd conversion circuitry of the first conversion circuitry converts the image data of each set of image data to a corresponding Winograd set of image data using the image data having the fixed point data format.  
Claim 13 is substantially similar to claim 4 and is rejected based on similar analyses.
Regarding Claim 14, the integrated circuit of claim 10, Lin as modified does not explicitly teach wherein: the second conversion circuitry further includes circuitry to determine a largest exponent of the filter weights of each set of filter weights of the plurality of sets of filter weights.
However, Donovan teaches the second conversion circuitry further includes circuitry to determine a largest exponent of the filter weights of each set of filter weights of the plurality of sets of filter weights (Donovan, [0056] “accumulator circuit 408 is advantageously configurable to accumulate Bilerp results λf' and block exponents k over multiple passes, with each result being weighted by a respective weight wf, any filtering algorithm that can be performed for fixed-point texture data” and Fig. 5, [0064] “Four-way comparison circuit 510 selects the largest of the four exponents as block exponent k”; Donovan teaches a four-way comparison circuit to determine (select) the largest of the exponents as block exponent k of the filter weight wf).
Lin, Brothers and Donovan are combinable; see the rationale in claim 2.
Regarding Claim 16, the integrated circuit of claim 14, Lin as modified does not explicitly teach wherein: the second conversion circuitry further includes fixed point format conversion circuitry, coupled between the inputs of the second conversion circuitry and the Winograd conversion circuitry of the second conversion circuitry, to convert a data format of the filter weights of each set of filter weights to a fixed point data format using the largest exponent of the image data of the associated set of filter weights, wherein: the Winograd conversion circuitry of the second conversion circuitry converts the filter weights of each set of filter weights to a corresponding Winograd set of filter weights using the filter weights having the fixed point data format.  
However, Donovan teaches the second conversion circuitry further includes fixed point format conversion circuitry, coupled between the inputs of the second conversion circuitry and the Winograd conversion circuitry of the second conversion circuitry, to convert a data format of the filter weights of each set of filter weights to a fixed point data format using the largest exponent of the image data of the associated set of filter weights (Donovan, [0056] “accumulator circuit 408 is advantageously configurable to accumulate Bilerp results λf' and block exponents k over multiple passes, with each result being weighted by a respective weight wf, any filtering algorithm that can be performed for fixed-point texture data” and Fig. 5, [0064] “Four-way comparison circuit 510 selects the largest of the four exponents as block exponent k”; Donovan teaches a four-way comparison circuit to determine (select) the largest of the exponents as block exponent k of the filter weight wf), wherein: the Winograd conversion circuitry of the second conversion circuitry converts the filter weights of each set of filter weights to a corresponding Winograd set of filter weights using the filter weights having the fixed point data format (Donovan, Fig. 4, [0056] “accumulator circuit 408 is advantageously configurable to accumulate Bilerp results λf' and block exponents k over multiple passes, with each result being weighted by a respective weight wf, that can be performed for fixed-point texture data” and [0059] “For fixed-point data, input selection circuit 404 provides the fixed-point values (a, b, c, d) directly to Bilerp circuit 406, ignoring the output of preprocessing block 402, and output selection circuit 414 selects the output of the fixed-point data path (provided by fixed-point formatting circuit 410) as the final output λ0, ignoring the output of the floating-point data path (provided by format conversion circuit 412)”; the combination of Lin and Donovan teaches that the second conversion circuitry includes a fixed point format conversion circuit (410) that can convert the filter weights (wf) to a fixed point data format (s2.14)).
Lin, Brothers and Donovan are combinable; see the rationale in claim 2.
Regarding Claim 17, Lin as modified discloses the integrated circuit of claim 10 wherein: 
the first conversion circuitry further includes circuitry to determine a largest exponent of the image data of each set of image data of the plurality of sets of image data, 
This limitation of claim 17 is substantially similar to claim 2 and is rejected based on similar analyses.
the floating point format conversion circuitry of the first conversion circuitry converts the image data of each Winograd set of image data of the plurality of Winograd sets of image data to a floating point data format using the largest exponent of the image data of the associated set of image data, 
This limitation of claim 17 is substantially similar to claim 7 and is rejected based on similar analyses.
the second conversion circuitry further includes circuitry to determine a largest exponent of the filter weights of each set of filter weights of the plurality of sets of filter weights, and 
This limitation of claim 17 is substantially similar to claim 14 and is rejected based on similar analyses.
the floating point format conversion circuitry of the second conversion circuitry converts the filter weights of each Winograd set of filter weights of the plurality of Winograd sets of filter weights to a floating point data format using the largest exponent of the filter weights of the associated set of filter weights (Donovan, Fig. 4, [0052] “Floating-point texture filtering is provided by fixed-point Bilerp circuit 406 in cooperation with preprocessing block 402, accumulator circuit 408 and floating-point format conversion circuit 412” and [0057] “Format conversion circuit 412 converts the floating-point number represented by mantissa λA and block exponent kA to a standard floating-point format (e.g., fp16)”; Donovan teaches a floating point format conversion circuit (412) for floating-point texture filtering that can be coupled to the Winograd conversion circuitry (combining the fixed-point Bilerp circuit 406 with the Winograd conversion circuitry taught by Lin) to convert each Winograd set of filter weights (taught by Lin) to a floating point data format (e.g. fp16)).
Lin, Brothers and Donovan are combinable; see the rationale in claim 2.
Regarding Claim 18, Lin discloses the integrated circuit of claim 17 wherein: 
 	each multiplier-accumulator circuit of the multiplier-accumulator execution pipeline includes a floating point multiplier and a floating point adder.  
Claim 18 is substantially similar to claim 9 and is rejected based on similar analyses.
Regarding Claim 19, Lin as modified discloses the integrated circuit of claim 17 wherein: 
the first conversion circuitry, for the image data of each Winograd set of image data of the plurality of Winograd sets of image data, further includes circuitry to perform two's complement operation on a fraction field of the image data where the image data is negative, perform a priority encode operation on the fraction field of the image data, left shift the fraction field of the image data where the image data is negative, round the fraction field to where the image data was left shifted, and 
This limitation of claim 19 is substantially similar to claim 8 and is rejected based on similar analyses.
Lin as modified does not explicitly teach the second conversion circuitry, for the filter weights of each Winograd set of filter weights of the plurality of Winograd sets of filter weights, further includes circuitry to perform two's complement operation on a fraction field of the filter weight where the filter weight is negative, perform a priority encode operation on the fraction field of the filter weight, left shift the fraction field of the filter weight where the image data is negative, round the fraction field to where the filter weight  was left shifted.
However, Donovan teaches the second conversion circuitry, for the filter weights of each Winograd set of filter weights of the plurality of Winograd sets of filter weights, further includes circuitry to perform two's complement operation on a fraction field of the filter weight where the filter weight is negative, perform a priority encode operation on the fraction field of the filter weight, left shift the fraction field of the filter weight where the image data is negative, round the fraction field to where the filter weight was left shifted (Donovan, [0046] “In fp16 format, a number is represented by a sign bit, five exponent bits, and ten mantissa bits. The five exponent bits are biased by +15, and the ten mantissa bits represent the fractional portion of the mantissa, with an implicit "1" preceding the decimal point. Certain values are reserved to indicate special numbers including negative infinity (INF)” and [0071] “accumulator circuit 408 advantageously implements rules for floating-point input texture data includes a special number (e.g., NaN or INF) behavior of a floating-point bilinear filter circuit, in an fp16 floating-point filter, it would be expected that, the result should also be NaN (or INF)” and [0073] “floating-point addition module 608 determines which of current block exponent k and accumulated block exponent kA is larger, shifts one of the mantissas λA and λfw so that both floating-point values are represented using the larger block exponent”; Donovan teaches performing a two's complement operation (converting fp16 format bits of the filter weight: five exponent bits biased by +15, ten mantissa bits in the fractional portion), including for a negative value, and shifting one of the mantissas).
Lin, Brothers and Donovan are combinable; see the rationale in claim 2.
Regarding Claim 21, the integrated circuit of claim 20, Lin as modified does not explicitly teach further including: memory, coupled to the floating point format conversion circuitry of the third conversion circuitry to store the output data, having a floating point data format, of the plurality of sets of output data.
However, Donovan teaches memory, coupled to the floating point format conversion circuitry of the third conversion circuitry to store the output data, having a floating point data format, of the plurality of sets of output data (Donovan, Fig. 2 shows a graphics memory coupled to a rendering pipeline 220 of GPU 214 and Fig. 3, [0034] “rendering pipeline 220 a rasterizer 306” and [0039] “Shader 308 advantageously includes a texture filter unit 314 for executing texture blending instructions. A texture map may be stored in local or remote memory in fixed-point or floating-point format” and [0047] “FIG. 4 Texture filtering unit 400 includes a floating-point (fp16) format conversion circuit 412”; Donovan teaches a memory (graphics memory, Fig. 2), coupled to a floating-point (fp16) format conversion circuit 412, to store the output data having a floating point data format).
Lin, Brothers and Donovan are combinable; see the rationale in claim 2.
Regarding Claim 23, Lin as modified discloses the integrated circuit of claim 20 wherein: 
the first conversion circuitry further includes circuitry to determine a largest exponent of the image data of each set of image data of the plurality of sets of image data, 
This limitation of claim 23 is substantially similar to claim 2 and is rejected based on similar analyses.
the floating point format conversion circuitry of the first conversion circuitry converts the image data of each Winograd set of image data of the plurality of Winograd sets of image data to a floating point data format using the largest exponent of the image data of the associated set of image data, 
This limitation of claim 23 is substantially similar to claim 7 and is rejected based on similar analyses.
the second conversion circuitry further includes circuitry to determine a largest exponent of the filter weights of each set of filter weights of the plurality of sets of filter weights, 
This limitation of claim 23 is substantially similar to claim 14 and is rejected based on similar analyses.
the floating point format conversion circuitry of the second conversion circuitry converts the filter weights of each Winograd set of filter weights of the plurality of Winograd sets of filter weights to a floating point data format using the largest exponent of the filter weights of the associated set of filter weights.
This limitation of claim 23 is substantially similar to the last limitation of claim 17 and is rejected based on similar analyses.
Regarding Claim 24, Lin as modified discloses the integrated circuit of claim 23 wherein: 
each multiplier-accumulator circuit of the multiplier-accumulator execution pipeline includes a floating point multiplier and a floating point adder.  
Claim 24 is substantially similar to claim 18 and is rejected based on similar analyses.
Allowable Subject Matter
Dependent claims 3, 12 and 15 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
The following is a statement of reasons for the indication of allowable subject matter:
Regarding claims 3, 12 and 15, the closest prior art references, Lin et al. (U.S. 2019/0243610 A1) in view of Brothers et al. (U.S. 2017/0011288 A1), have been made of record as teaching: a multiplier-accumulator execution pipeline to receive (i) image data and (ii) filter weights (Lin, Figs. 2, 4, [0024], [0034]); inputs to receive a plurality of sets of image data, wherein each set of image data includes a plurality of image data (Lin, Fig. 2, [0024]); Winograd conversion circuitry (Lin, [0037]); floating point format conversion circuitry, coupled to the Winograd conversion circuitry, to convert the image data of each Winograd set of image data of the plurality of Winograd sets of image data to a floating point data format (Brothers, Fig. 1, [0039], Fig. 7, [0099]).
However, the art of record did not teach or suggest the claims taken as a whole, and in particular the limitations pertaining to:
the first conversion circuitry, for the image data of each set of image data of the plurality of sets of image data, further includes circuitry to right shift a fraction field of the image data with smaller exponents, round the fraction field to a BSF precision, and perform two's complement operation on the fraction field of the image data where the image data is negative, as recited in claims 3 and 12; and
the second conversion circuitry, for the filter weights of each set of filter weights of the plurality of sets of filter weights, further includes circuitry to right shift a fraction field of the filter weights with smaller exponents, round the fraction field to a BSF precision, and perform two's complement operation on the fraction field of the filter weight where the image data is negative, as recited in claim 15.
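For illustration only, the three operations recited in the limitations above (right shifting a fraction field of data with a smaller exponent, rounding to a reduced precision, and two's complement where the data is negative) can be sketched as follows. This is a sketch under stated assumptions and does not reproduce the applicant's circuitry: "BSF precision" is not defined in this action, so the 8-bit output width, the 11-bit input fraction, the round-half-up rule, the saturation step, and the name to_bsf are all hypothetical.

```python
def to_bsf(sign: int, exponent: int, fraction: int, block_exponent: int,
           in_bits: int = 11, out_bits: int = 8) -> int:
    """Convert one sign/exponent/fraction value to a two's complement word.

    Hypothetical helper. `fraction` is an unsigned magnitude of `in_bits`
    bits; the result is an (out_bits + 1)-bit two's complement pattern.
    """
    # Step 1 and 2: shift right by the exponent gap plus the precision drop.
    shift = (block_exponent - exponent) + (in_bits - out_bits)
    assert shift >= 0, "sketch assumes exponent <= block_exponent"
    if shift > 0:
        # Round half up by adding half of the discarded range before shifting.
        magnitude = (fraction + (1 << (shift - 1))) >> shift
    else:
        magnitude = fraction
    # Assumed saturation so that rounding cannot overflow into the sign bit.
    magnitude = min(magnitude, (1 << out_bits) - 1)
    # Step 3: two's complement for negative values.
    word_mask = (1 << (out_bits + 1)) - 1
    return (-magnitude) & word_mask if sign else magnitude


# 1.5 with a 10-bit fraction and implicit leading one: 0b11000000000 = 1536
same_exp = to_bsf(0, 17, 1536, 17)
smaller_exp = to_bsf(0, 16, 1536, 17)
negative = to_bsf(1, 17, 1536, 17)
```

A value whose exponent is one below the block exponent ends up with half the magnitude of the same fraction at the block exponent, which is the effect of the additional right shift.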
Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance”.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure: Tsung et al. (U.S. 2019/0114536 A1) and Hwang et al. (U.S. 2018/0173571 A1).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to KHOA VU whose telephone number is (571)272-5994. The examiner can normally be reached 8:00- 4:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kee Tung, can be reached on 571-272-7794. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/KHOA VU/Examiner, Art Unit 2611

/SING-WAI WU/Primary Examiner, Art Unit 2611