DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Specification
The disclosure is objected to because of the following informalities:
The specification is missing paragraph [0004].
[0057], lines 9-10, recites "this new partial sum will be accumulated into convolution value 0,1, O0,1". However, as illustrated in figure 3B, the new partial sum PS0 is added to generate convolution value O1,0, not O0,1.
[0058], line 5, "multiplying the fifth row of data values with the second row of kernel values" should be "multiplying the fifth row of data values with the third row of kernel values" because the kernel values used in the multiplication are W2,0, W2,1, and W2,2, which form the third row of the kernel.
Appropriate correction is required.

Claim Objections
Claim 9 is objected to because of the following informalities:
Claim 9, line 2, recites “a first partial sum is calculated using a preceding stride”, which should read “the first partial sum is calculated using a preceding stride” because the first partial sum is already recited in line 1.
Appropriate correction is required.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1, 3, 5-7, 9, 13, 15, and 19-20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Kuramoto (US 20190004795).

Regarding claim 1, Kuramoto discloses a system, the system configured for convolving an input stream of data with a kernel (Kuramoto, figure 3, a system for performing convolution of input 201 and weight 202 as shown in figure 5), the system comprising: a kernel store circuit configured to store the kernel (figure 3, [0059] register files 421 to 423 are referred to as register file 420; [0063] register file 420 stores the weighting data 202 [i.e. kernel]), the kernel comprising a plurality of kernel values arranged as Wi rows of kernel values and Wj columns of kernel values, wherein Wi is greater than one (figure 5 illustrates that the weighting data 202 is two-dimensional, with 3 rows and 3 columns); an arithmetic circuit configured to calculate a plurality of partial sums, each calculated using at least one data value and at least one of the plurality of kernel values, wherein the input stream of data comprises the at least one data value (figure 5, arithmetic circuit 51 comprises a plurality of multiply-add arithmetic circuits that calculate a plurality of intermediate results, each calculated using the input 201 and weights 202; see figure 14); a sum value store circuit configured to store the plurality of partial sums and to clock out a plurality of convolution values, wherein each one of the plurality of convolution values is calculated at least in part using a first row kernel value and a last row kernel value (figure 5, [0094] the multiply-add arithmetic circuit multiplies, adds the multiplication results to the value stored in register file 310-314, and stores the addition result in register file 310-314 [i.e. a plurality of partial sums]; [0059] register files 431-433 are referred to as register file 430; [0063] register file 430 stores the top data, which is the operation result of the convolution operation; figure 14 illustrates a plurality of convolution values t00-t03, and figure 1 illustrates outputting the top data for a subsequent layer [i.e. clock out a plurality of convolution values]. Figures 5 and 14 illustrate that the convolution values are calculated using the first kernel row w00, w01, w02 and the last kernel row w06, w07, w08).

Regarding claim 3, Kuramoto teaches the system of claim 1 further comprising: an input circuit configured to receive the input stream of data, the input stream of data comprising Fi rows of data values and Fj columns of data values, wherein the input stream of data is received row-by-row as a stream of data values ([0059] registers 411-413 are referred to as register file 410; figure 5 and [0090] illustrate that register file 410 receives input stream data 201 from memory 11, wherein the input data 201 comprises 5 rows and 5 columns, and the input data is received row by row); an output circuit configured to output the plurality of convolution values (figure 5 illustrates data lines that connect the adder of each multiply-add arithmetic circuit to output the top data into register file 430), wherein the plurality of convolution values comprises a plurality of first row partial sums and a plurality of last row partial sums (figure 14 and [0146-0152] illustrate the operation when the stride number is 2; the top data t00-t03 [i.e. the plurality of convolution values] are calculated using the first row partial sums as shown in 731-732 and the last row partial sums as shown in 735-736), wherein the plurality of first row partial sums are calculated using a first kernel row, and wherein the plurality of last row partial sums are calculated using a last kernel row, and wherein the plurality of first row partial sums overwrite the plurality of partial sums stored in the sum value store circuit (figure 14, 731-732 are calculated using the first kernel row w00, w01, w02; 735-736 are calculated using the last kernel row w06, w07, w08. [0094] Registers 310 to 314 each initially store 0 as an initial value; thus, initially, the plurality of partial sums are 0 and are overwritten with the intermediate values, such as the plurality of first row partial sums).

Regarding claim 5, Kuramoto teaches the system of claim 1 further comprising an input value store circuit configured to store the at least one data value, wherein the arithmetic circuit receives the at least one data value from the input value store circuit (Kuramoto, figure 5, [0090] the registers 411-412 store input data 201).

Regarding claim 6, Kuramoto teaches the system of claim 5 further comprising a second arithmetic circuit configured to calculate a plurality of additional values (Kuramoto, figure 14, multiply-add arithmetic circuit 511 calculates a plurality of MAC operations) comprising an additional value calculated using the kernel and at least one subsequent data value (figure 14, 511 calculates b06xw04 [i.e. an additional value] using kernel value w04 and b06 [i.e. a subsequent data value]), wherein a first partial sum is calculated using the kernel and at least one preceding data value (figure 14, b00xw00 + b02xw02 [i.e. first partial sum] is calculated using kernel values w00, w02 and b00, b02 [i.e. at least one preceding data value]), wherein the at least one subsequent data value overwrites the at least one preceding data value in the input value store circuit (figure 8 illustrates that the data set b05-b09 overwrites the data set b00-b04 in 472 at register 411), and wherein one of the plurality of convolution values is based at least in part on the first partial sum and the additional value (figure 14, t00 [i.e. one of the plurality of convolution values] is based on b00xw00 + b02xw02 [i.e. first partial sum] and b06xw04 [i.e. an additional value]).

Regarding claim 7, Kuramoto teaches the system of claim 5 wherein one of the plurality of partial sums is based at least in part on a first partial sum and an additional value (Kuramoto, figure 14, b00xw00 + b02xw02 [i.e. first partial sum] and b06xw04 [i.e. an additional value] make up at least one of the plurality of partial sums), the first partial sum calculated using at least one preceding data value (figure 14, b00xw00 + b02xw02 [i.e. first partial sum] is calculated using kernel values w00, w02 and b00, b02 [i.e. at least one preceding data value]) and the additional value calculated using at least one subsequent data value (figure 14, b06xw04 [i.e. an additional value] is calculated using kernel value w04 and b06 [i.e. a subsequent data value]), wherein the at least one subsequent data value overwrites the at least one preceding data value in the input value store circuit (figure 8 illustrates that the data set b05-b09 overwrites the data set b00-b04 in 472 at register 411).

Regarding claim 9, Kuramoto teaches the system of claim 5 wherein the arithmetic circuit sums a first partial sum and an additional value calculated using a subsequent stride, wherein a first partial sum is calculated using a preceding stride, and wherein the subsequent stride overwrites the preceding stride in the input value store circuit (Kuramoto, figure 14, [0146] performs the convolution operation using a stride of 2; arithmetic circuit 51 calculates b00xw00, b01xw01 [i.e. a first partial sum] using the first two columns of the kernel [i.e. at a preceding stride], and the arithmetic circuit calculates b07xw05 [i.e. an additional value] using the third column of the kernel and the first column of the subsequent stride. As shown in figure 14, b00xw00, b01xw01 are added to b07xw05 in the subsequent stride to generate a convolution value. Figure 8 illustrates that the data set b05-b09 overwrites the data set b00-b04 in 472 at register 411).

Regarding claim 13, Kuramoto teaches a system, the system configured for convolving an input stream of data with a kernel (Kuramoto, figure 3, a system for performing convolution of input 201 and weight 202 as shown in figure 5) at a column stride of Sj and a row stride of Si ([0146] and figure 14 illustrate the convolution operation when the stride number is 2 [i.e. at a column stride of 2 and a row stride of 2]), the system comprising: an input circuit configured to receive the input stream of data, the input stream of data comprising Fi rows of data values and Fj columns of data values, wherein the input stream of data is received row-by-row as a stream of data values ([0059] registers 411-413 are referred to as register file 410; figure 5 and [0090] illustrate that register file 410 receives input stream data 201 from memory 11, wherein the input data 201 comprises 5 rows and 5 columns, and the input data is received row by row); a kernel store circuit configured to store the kernel, the kernel comprising a plurality of kernel values arranged as Wi rows of kernel values and Wj columns of kernel values, wherein Wi is greater than one (figure 3, [0059] register files 421 to 423 are referred to as register file 420; [0063] register file 420 stores the weighting data 202 [i.e. kernel]. Figure 5 illustrates that the weighting data 202 is two-dimensional, with 3 rows and 3 columns); an arithmetic circuit configured to calculate a plurality of partial sums, each calculated using at least one data value stored in an input value store circuit and at least one of the plurality of kernel values, wherein the input stream of data comprises the at least one data value (figure 5, arithmetic circuit 51 comprises a plurality of multiply-add arithmetic circuits that calculate a plurality of intermediate results, each calculated using the input 201 and weights 202; see figure 14); a sum value store circuit configured to store the plurality of partial sums and to clock out a plurality of convolution values, wherein each one of the plurality of convolution values is calculated at least in part using a first row kernel value and a last row kernel value (figure 5, [0094] the multiply-add arithmetic circuit multiplies, adds the multiplication results to the value stored in register file 310-314, and stores the addition result in register file 310-314 [i.e. a plurality of partial sums]; [0059] register files 431-433 are referred to as register file 430; [0063] register file 430 stores the top data, which is the operation result of the convolution operation; figure 14 illustrates a plurality of convolution values t00-t03, and figure 1 illustrates outputting the top data for a subsequent layer [i.e. clock out a plurality of convolution values]. Figures 5 and 14 illustrate that the convolution values are calculated using the first kernel row w00, w01, w02 and the last kernel row w06, w07, w08); and an output circuit configured to output the plurality of convolution values (figure 5 illustrates data lines that connect the adder of each multiply-add arithmetic circuit to output the top data into register file 430).

Regarding claim 15, Kuramoto teaches a system, the system configured for convolving an input stream of data with a kernel (Kuramoto, figure 3, a system for performing convolution of input 201 and weight 202 as shown in figure 5), the system comprising: a kernel store circuit configured to store the kernel, the kernel comprising 3 rows of kernel values and 3 columns of kernel values (figure 5 illustrates the 3x3 kernel); an arithmetic circuit (figures 3 and 5, [0059] the arithmetic circuit 50); and a sum value store circuit configured to store a plurality of partial sums and to clock out a plurality of convolution values (figure 5, [0094] the multiply-add arithmetic circuit multiplies, adds the multiplication results to the value stored in register file 310-314, and stores the addition result in register file 310-314 [i.e. a plurality of partial sums]; [0059] register files 431-433 are referred to as register file 430; [0063] register file 430 stores the top data, which is the operation result of the convolution operation; figure 14 illustrates a plurality of convolution values t00-t03, and figure 1 illustrates outputting the top data for a subsequent layer [i.e. clock out a plurality of convolution values]. Figures 5 and 14 illustrate that the convolution values are calculated using the first kernel row w00, w01, w02 and the last kernel row w06, w07, w08); wherein the arithmetic circuit calculates a plurality of row 0 partial sums using the input stream of data and kernel row 0 (figure 14, 731 illustrates the row 0 partial sums calculated using the input and kernel values w00, w01, w02 [i.e. kernel row 0]), wherein the plurality of row 0 partial sums is stored in the sum value store circuit by overwriting a previous plurality of partial sums previously stored in the sum value store circuit ([0094] registers 310 to 314 each initially store 0 as an initial value; thus, initially, the plurality of partial sums are 0 and are overwritten with the intermediate values, such as the plurality of row 0 partial sums), wherein the arithmetic circuit calculates a plurality of row 1 partial sums using the input stream of data and kernel row 1 (figure 14, 733 illustrates a plurality of row 1 partial sums calculated using the input and kernel values w03, w04, w05 [i.e. kernel row 1]), wherein the plurality of row 1 partial sums is accumulated into the plurality of partial sums stored in the sum value store circuit ([0059, 0063] register file 430 stores the intermediate value of the operation; figure 5 illustrates that each multiply-add arithmetic circuit receives data from register file 431), wherein the arithmetic circuit calculates a plurality of row 2 partial sums using the input stream of data and kernel row 2 (figure 14, 735 illustrates a plurality of row 2 partial sums calculated using the input and kernel values w06, w07, w08 [i.e. kernel row 2]), wherein the plurality of row 2 partial sums is added to the plurality of partial sums to produce the plurality of convolution values (figure 14 illustrates that the top data t00-t03 [i.e. a plurality of convolution values] are produced by the adder. [0059, 0063] register file 430 stores the intermediate value of the operation; figure 5 illustrates that each multiply-add arithmetic circuit receives data from register file 431).
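For illustration only, the row-wise sequence recited in claim 15 (row 0 partial sums overwrite the stored values, row 1 partial sums are accumulated, row 2 partial sums are added to produce the convolution values) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a description of Kuramoto's circuit or of the applicant's implementation: the function name conv_output_row, the list standing in for the sum value store circuit, and the stale initial contents are all hypothetical.

```python
def conv_output_row(rows, kernel, stride=1):
    """Return one row of convolution values from three input rows.

    rows   : three input rows of equal length (hypothetical example data)
    kernel : 3x3 kernel given as a list of three rows
    stride : column stride, assumed to be >= 1
    """
    width = len(rows[0])
    n_out = (width - 3) // stride + 1
    # Hypothetical sum value store; the stale -1 entries stand in for
    # partial sums left over from a previous output row.
    sum_store = [-1] * n_out
    for r in range(3):
        for c in range(n_out):
            start = c * stride
            partial = sum(rows[r][start + k] * kernel[r][k] for k in range(3))
            if r == 0:
                sum_store[c] = partial   # row 0 partial sums overwrite old contents
            else:
                sum_store[c] += partial  # rows 1 and 2 are added to the stored sums
    return sum_store


rows = [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 13, 14, 15]]
ones = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
print(conv_output_row(rows, ones))            # [63, 72, 81]
print(conv_output_row(rows, ones, stride=2))  # [63, 81]
```

Because row 0 overwrites rather than accumulates, the sketch needs no separate clearing step between output rows.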

Regarding claim 19, Kuramoto discloses the system of claim 15 wherein one of the plurality of partial sums is based at least in part on a row 0 partial sum and an additional value, the row 0 partial sum calculated using a preceding stride and the additional value calculated using a subsequent stride (figure 14 illustrates operation with a stride of 2, and one of the plurality of partial sums is based on b00xw00, b01xw01 [i.e. row 0 partial sum] calculated using the first two columns of the kernel [i.e. a preceding stride] and b07xw05 [i.e. an additional value] calculated using the third column of the kernel [i.e. a subsequent stride]), wherein a subsequent stride overwrites the preceding stride in a stride store circuit (figure 8 illustrates that the data set b05-b09 overwrites the data set b00-b04 in 472 at register 411 [i.e. a stride store circuit]).

Regarding claim 20, Kuramoto teaches the system of claim 15 wherein the arithmetic circuit is configured to add an additional value to a partial sum, the partial sum calculated using a preceding stride, the additional value calculated using a subsequent stride that overwrites the preceding stride in a stride store circuit (figure 14 illustrates operation with a stride of 2, and one of the plurality of partial sums is based on b00xw00, b01xw01 [i.e. row 0 partial sum] calculated using the first two columns of the kernel [i.e. a preceding stride] and b07xw05 [i.e. an additional value] calculated using the third column of the kernel [i.e. a subsequent stride]. Figure 8 illustrates that the data set b05-b09 overwrites the data set b00-b04 in 472 at register 411 [i.e. a stride store circuit]).

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 10-12 are rejected under 35 U.S.C. 103 as being unpatentable over Kuramoto in view of Lane (NPL - Multi-Channel Convolutions explained with… MS Excel).

Regarding claim 10, Kuramoto teaches the system of claim 1 further comprising: a second kernel store circuit; and a third kernel store circuit (Kuramoto, figure 3 illustrates, [0059] register files 421-423 are referred to as register file 420, [0063] register file 420 stores weight data; thus figure 3 shows at least a second and a third kernel store circuit), wherein the input stream of data comprises a second input channel and a third input channel (figure 20 illustrates a plurality of input channels that comprises a second and a third input channel), wherein the arithmetic circuit is configured to produce a second plurality of partial sums and a third plurality of partial sums (figure 3 illustrates, [0059] a plurality of arithmetic circuit units referred to as arithmetic unit 50 that perform convolution). Kuramoto does not teach a second kernel channel and a third kernel channel, producing a second plurality of partial sums based on the second kernel channel and the second input channel, producing a third plurality of partial sums based on the third kernel channel and the third input channel, and wherein the second plurality of partial sums and the third plurality of partial sums are added to the plurality of partial sums stored in the sum value store circuit.
However, Lane teaches a second kernel channel and a third kernel channel, producing a second plurality of partial sums based on the second kernel channel and the second input channel, producing a third plurality of partial sums based on the third kernel channel and the third input channel, and wherein the second plurality of partial sums and the third plurality of partial sums are added to the plurality of partial sums (Lane, figures 5-6, green input channel [i.e. a second input channel], blue input channel [i.e. a third input channel]; the green intermediate output [i.e. a second plurality of partial sums] is calculated using the green input and green kernel [i.e. second kernel channel], and the blue intermediate output [i.e. a third plurality of partial sums] is calculated using the blue input and blue kernel [i.e. a third kernel channel]. The intermediate output values are then added to generate the output).
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to modify Kuramoto's system as disclosed in figure 5 to process additional input channels and kernel channels as disclosed in Lane to perform convolution. This modification would have been obvious because both references illustrate methods of performing convolution in a neural network for processing images, and Kuramoto figure 20 discloses multi-channel input data. Although Kuramoto does not disclose how the data are processed with kernel channels, Lane discloses such steps in figures 5-6. Furthermore, processing images having multiple channels would improve the accuracy of the system as recognized by Lane; see at least figure 4.
As modified, the combined system of Kuramoto in view of Lane teaches a system comprising a second kernel store circuit configured to store a second kernel channel; and a third kernel store circuit configured to store a third kernel channel, wherein the input stream of data comprises a second input channel and a third input channel, wherein the arithmetic circuit is configured to produce a second plurality of partial sums based on the second kernel channel and the second input channel, wherein the arithmetic circuit is configured to produce a third plurality of partial sums based on the third kernel channel and the third input channel, and wherein the second plurality of partial sums and the third plurality of partial sums are added to the plurality of partial sums stored in the sum value store circuit.
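For illustration only, the per-channel accumulation described for the combined system (each input channel is convolved with its own kernel channel and the resulting partial sums are added together) can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function names conv2d_valid and multi_channel_conv are hypothetical, a stride of 1 with no padding is assumed, and the code is not taken from Kuramoto or Lane.

```python
def conv2d_valid(image, kernel):
    """Single-channel convolution, stride 1, no padding (assumed for the sketch)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def multi_channel_conv(channels, kernel_channels):
    """Add the partial sums of every (input channel, kernel channel) pair."""
    total = None
    for image, kernel in zip(channels, kernel_channels):
        partial = conv2d_valid(image, kernel)
        if total is None:
            total = partial  # first channel initializes the stored sums
        else:
            # later channels are added to the sums already stored
            total = [[t + p for t, p in zip(t_row, p_row)]
                     for t_row, p_row in zip(total, partial)]
    return total


# Three hypothetical 3x3 input channels (all 1s, all 2s, all 3s) and all-ones kernels.
channels = [[[v] * 3 for _ in range(3)] for v in (1, 2, 3)]
kernels = [[[1] * 3 for _ in range(3)] for _ in range(3)]
print(multi_channel_conv(channels, kernels))  # [[54]]
```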

Regarding claim 11, the combined system of Kuramoto in view of Lane teaches the system of claim 10 wherein the kernel further comprises a kernel output channel (Lane, figure 5, kernel channel red, green, blue) wherein a flattened output frame comprises a plurality of flattened output values (Lane, figure 5, output comprises a plurality of output values), wherein each flattened output value is based at least in part on a first output channel value, a second output channel value, a third output channel value, and the kernel output channel (Lane, each output value is calculated based on red, green, blue output value using according kernel channel).

Regarding claim 12, Kuramoto teaches the system of claim 1: wherein the input stream of data comprises a plurality of input channels (Kuramoto, figure 20 illustrates a plurality of input channels); Kuramoto also teaches the kernel store circuit is configured to store the kernel (Kuramoto, figure 3 illustrates, [0059] register files 421-423 are referred to as register file 420, [0063] register file 420 stores weight data; thus figure 3 shows at least a second and a third kernel store circuit); wherein the arithmetic circuit is configured to produce the plurality of convolution values (Kuramoto, figures 3 and 5, [0059] the arithmetic units 51-53 are referred to as arithmetic unit 50, [0064] arithmetic unit 50 performs convolution operations to generate a plurality of convolution values). Kuramoto does not teach that the kernel comprises a plurality of kernel channels; that the kernel store circuit is configured to store the plurality of kernel channels; and that the arithmetic circuit is configured to produce the plurality of convolution values using each one of the plurality of input channels and each one of the plurality of kernel channels. However, Lane teaches that the kernel comprises a plurality of kernel channels, and that the arithmetic circuit is configured to produce the plurality of convolution values using each one of the plurality of input channels and each one of the plurality of kernel channels (Lane, figures 5-6, the kernel comprises 3 channels: red, green, and blue [i.e. a plurality of channels]. Output values are produced using the plurality of input channels and the plurality of kernel channels).
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to modify Kuramoto's system as disclosed in figure 5 to process additional input channels and kernel channels as disclosed in Lane to perform convolution. This modification would have been obvious because both references illustrate methods of performing convolution in a neural network for processing images, and Kuramoto figure 20 discloses multi-channel input data. Although Kuramoto does not disclose how the data are processed with kernel channels, Lane discloses such steps in figures 5-6. Furthermore, processing images having multiple channels would improve the accuracy of the system as recognized by Lane; see at least figure 4.
As modified, the combined system of Kuramoto in view of Lane teaches the input stream of data comprises a plurality of input channels; wherein the kernel comprises a plurality of kernel channels; wherein the kernel store circuit is configured to store the plurality of kernel channels; wherein the arithmetic circuit is configured to produce the plurality of convolution values using each one of the plurality of input channels and each one of the plurality of kernel channels.

Claim 16 is rejected under 35 U.S.C. 103 as being unpatentable over Kuramoto in view of Suk (US 20200167405).

Regarding claim 16, Kuramoto teaches the system of claim 15 including clocking out the plurality of convolution values, but Kuramoto does not teach that clocking out the plurality of convolution values is performed in parallel with storing a subsequent plurality of row 0 partial sums in the sum value store circuit. However, Suk teaches clocking out the plurality of convolution values in parallel with storing a subsequent plurality of row 0 partial sums in the sum value store circuit (Suk, [0034, 0043] and figure 3 disclose a system comprising a plurality of MACs to perform a convolution operation, wherein each MAC includes an output register to store a partial sum and is mapped one to one with a register of the output shift network; the reason for using a separate output shift network 342 is to read the final result in parallel with, or at the same time as, the convolution operation being performed; thus convolution outputs are shifted out in parallel with the subsequent convolution operation being performed and stored in the output register of each MAC).
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to modify Kuramoto’s system as disclosed in figure 3 to include an output shift network 342 as disclosed in figure 3 of Suk. This modification would have been obvious because both references perform a convolution operation on input data and weights, and, as recognized by Suk in [0034, 0043], using a separate output shift network allows the system to read convolution results at the same time as the convolution operation is performed.

Allowable Subject Matter
Claims 2, 4, 8, 14, 17-18 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

The following is a statement of reasons for the indication of allowable subject matter: 

Applicant claims a system, the system configured for convolving an input stream of data with a kernel, the system comprising: a kernel store circuit configured to store the kernel, the kernel comprising a plurality of kernel values arranged as Wi rows of kernel values and Wj columns of kernel values, wherein Wi is greater than one; an arithmetic circuit configured to calculate a plurality of partial sums, each calculated using at least one data value and at least one of the plurality of kernel values, wherein the input stream of data comprises the at least one data value; a sum value store circuit configured to store the plurality of partial sums and to clock out a plurality of convolution values, wherein each one of the plurality of convolution values is calculated at least in part using a first row kernel value and a last row kernel value. Applicant further claims wherein the input stream of data is convolved with the kernel at a column stride of Sj, wherein the input stream of data comprises Fj columns of data values, and wherein the sum value store circuit contains no more than ceil(Fj/Sj) memory registers as required in claims 2, 8, 18, wherein the plurality of last row partial sums is not accumulated into the sum value store circuit as required in claim 4, wherein the input value store circuit is configured to store Sj data values at once and wherein the input value store circuit contains no more than Sj memory registers as required in claim 14; and an input circuit configured to receive the input stream of data, the input stream of data comprising a plurality of input columns of data values; a stride store circuit configured to sequentially store a plurality of length two strides from the input stream of data, wherein a preceding stride is from input columns n and n+1, wherein a subsequent stride is from input columns n+2 and n+3, and wherein the subsequent stride overwrites the preceding stride stored in the stride store circuit; and an output circuit configured to output the plurality of convolution values, wherein the stride store circuit provides the plurality of length two strides to the arithmetic circuit as required in claim 17.
The primary reason for the indication of allowable subject matter is the following limitations, in combination with all other limitations of the respective claims: wherein the sum value store circuit contains no more than ceil(Fj/Sj) memory registers, wherein the plurality of last row partial sums is not accumulated into the sum value store circuit, wherein the input value store circuit is configured to store Sj data values at once and wherein the input value store circuit contains no more than Sj memory registers, wherein a stride store circuit is configured to sequentially store a plurality of length two strides, wherein a preceding stride is from input columns n and n+1, wherein a subsequent stride is from input columns n+2 and n+3.
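For illustration only, the register bound ceil(Fj/Sj) and the length two stride sequence named in these limitations can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the names sum_store_register_bound and length_two_strides are hypothetical, the two-element list is only a stand-in for the stride store circuit, and a trailing unpaired column is simply ignored in the sketch.

```python
def sum_store_register_bound(fj, sj):
    """Upper bound ceil(Fj/Sj) on sum value store registers, in integer arithmetic."""
    return -(-fj // sj)


def length_two_strides(row):
    """Yield successive length-two strides from one input row.

    The stride from columns n and n+1 is overwritten in place by the
    stride from columns n+2 and n+3, and so on.
    """
    stride_store = [None, None]  # hypothetical stand-in for the stride store circuit
    for n in range(0, len(row) - 1, 2):
        stride_store[0], stride_store[1] = row[n], row[n + 1]  # overwrite preceding stride
        yield tuple(stride_store)


print(sum_store_register_bound(5, 2))              # 3
print(list(length_two_strides([10, 11, 12, 13])))  # [(10, 11), (12, 13)]
```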
Kuramoto – US 20190004795
Kuramoto discloses a system and method for performing a convolution operation using a stream of input data and weights as disclosed in figures 3 and 5; the system generates a plurality of partial sums and outputs a plurality of convolution values as shown in figure 14. [0061] The number of columns of convolution values is calculated by dividing, by the stride number, a value obtained by subtracting the weight data from the number of columns of the input data. Figures 3 and 5 further illustrate that the system comprises register 410 to store input data, register 420 to store the weights, and register file 430 to store the intermediate values calculated as shown in figure 14. The system performs convolution with a stride of 1 and 2, as illustrated in figures 10-14.

Dikici – US 20200301994
Dikici teaches a system and method for performing convolution as disclosed in figures 22 and 25. The system receives a stream of input data and weight data into an input buffer and a coefficient buffer, respectively, and further comprises a plurality of convolution devices to perform convolution. As illustrated in figures 2-4, the convolution is performed using a stride of 2, and the input data is converted into one-dimensional data to generate a plurality of partial sums.

Sumbul – US 20200034148
Sumbul teaches a system and method for performing convolution using a stride of 1 or 2. The system includes an input buffer to store streamed input data, a memory to store filter weights, a stride control circuit to control the stride, and an output buffer to store the partial sums. As illustrated in figure 3A, a row-wise convolution operation is performed to obtain Psum: input is received row by row, and the weights slide with a stride of 1 to generate psum y

Suk – US 20200167405
Suk discloses a system for performing a convolution operation having a plurality of multiply and accumulate units, each of which comprises a register to store the partial sum and is mapped one to one to the output shift network; the system has separate registers to allow reading the convolution values in parallel with the convolution operation being performed.

Ovsiannikov – US 20190392287
Ovsiannikov discloses a system for performing a convolution operation in which the input data comprises a plurality of channels and the kernel data has a plurality of channels, as shown in figure 1B. Input data are fed into an IFM buffer to perform multiply and add operations; the result is then stored in accumulator buffers 130A and 130B and is overwritten or cleared when the operation is complete, and the process is repeated until all input data are read and processed.
None of the closest found prior art teaches the limitation wherein the sum value store circuit contains no more than ceil(Fj/Sj) memory registers, wherein the plurality of last row partial sums is not accumulated into the sum value store circuit, wherein the input value store circuit is configured to store Sj data values at once and wherein the input value store circuit contains no more than Sj memory registers, wherein a stride store circuit configured to sequentially store a plurality of length two strides, wherein a preceding stride is from input column n and n+1, wherein a subsequent stride is from input columns n+2 and n+3, as required in the claims. Accordingly, Claims 2, 4, 8, 14, 17-18 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to HUY DUONG whose telephone number is (571)272-2764. The examiner can normally be reached Mon-Friday 7:30-5:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jyoti Mehta can be reached on 571-270-3995. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/HUY DUONG/Examiner, Art Unit 2182 (571)272-2764

/MATTHEW D SANDIFER/Primary Examiner, Art Unit 2182