DETAILED ACTION
1.	This office action is in response to the Application No. 16203031 filed on 11/28/2018. Claims 1-20 are presented for examination and are currently pending.

Notice of Pre-AIA or AIA Status
2.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Drawings
3.	The drawings are objected to because Figure 11 recites “Convolution Enginer 1130”, which should read “convolution engine 1130” consistent with paragraph [0110] of the instant specification (US20190163716).
	Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. The figure or figure number of an amended drawing should not be labeled as “amended.” If a drawing figure is to be canceled, the appropriate figure must be removed from the replacement sheet, and where necessary, the remaining figures must be renumbered and appropriate changes made to the brief description of the several views of the drawings for consistency. Additional replacement sheets may be necessary to show the renumbering of the remaining figures. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.

Allowable Subject Matter
4.	Claims 5-8 and 10-18 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims, and if amended to overcome the rejection under 35 U.S.C. 112(b).

Double Patenting
6.	The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.

7.	Claims 1-11, 13 and 15-20 are provisionally rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-12 and 14-20 of copending Application No. 16202991 in view of Young et al. (US20180018554). 
This is a provisional nonstatutory double patenting rejection because the patentably indistinct claims have not in fact been patented.


Instant application 16203031 | Copending application 16202991
1. A method for performing an operation of a convolutional layer in a convolutional neural network, comprising: | 1. A method for performing a convolution operation on folded feature data, comprising:
reading unfolded feature data provided to the convolution layer and an original convolution kernel of the convolutional layer from a dynamic random access memory (DRAM); | reading the folded feature data provided to a convolution layer and an original convolution kernel from a dynamic random access memory (DRAM);
folding the unfolded feature data in at least one dimension of width and height to generate folded feature data; | (no corresponding limitation)
pre-processing the folded feature data and the original convolution kernel; | pre-processing the folded feature data and the original convolution kernel;
storing the pre-processed folded feature data into a static random-access memory (SRAM); | storing the pre-processed folded feature data into a static random-access memory
folding the pre-processed original convolution kernel in the at least one dimension to generate one or more folded convolution kernels corresponding to the original convolution kernel; | folding the pre-processed original convolution kernel in at least one dimension of width or height according to a folding manner of the folded feature data to generate one or more folded convolution kernels corresponding to the original convolution kernel;
storing the one or more folded convolution kernels in the SRAM; and | storing the one or more folded convolution kernels in the SRAM; and
reading the pre-processed folded feature data and the one or more folded convolution kernels from the SRAM into a calculation unit for convolving the pre-processed folded feature data with the one or more folded convolution kernels. | reading the pre-processed folded feature data and the one or more folded convolution kernels from the SRAM into a calculation unit for convolving the pre-processed folded feature data with the one or more folded convolution kernels.


	Claim 1 of copending application 16/202991 includes all the limitations of instant claim 1 except for the underlined limitations identified in the table above. That is, claim 1 of copending application 16/202991 does not recite “reading the unfolded feature data” and “folding the unfolded feature data in at least one dimension of width and height to generate folded feature data”. However, Young teaches “reading the unfolded feature data” and “folding the unfolded feature data in at least one dimension of width and height to generate folded feature data” (the convolutional neural network layers included in the superpixel convolutional neural network system 100 may be configured to receive an X by Y by Z input tensor and process the received input tensor using one or more respective convolutional neural network layer weight matrices, or kernels [0037]; The system transforms the X by Y by Z input tensor into an X′ by Y′ by Z′ superpixel input tensor (step 202). [0063] Fig. 2; The superpixel generator 114 groups multiple components of a received input together, trading spatial extent or indexing, e.g., X and Y dimensions, for depth extent or indexing, e.g., Z dimension [0042]). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of copending application 16/202991 to incorporate the method of Young for the benefit of processing the received input tensor using one or more convolutional neural network layer weight matrices to generate a U by V by W output tensor (Young [0003]).
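
For illustration only, the folding of feature data discussed above (splicing consecutive width or height slices together in depth, so that spatial extent is traded for depth extent) can be sketched in Python as follows. This is a minimal sketch and is not taken from the instant application, the copending application, or Young. The function names fold_width and fold_height, the nested-list height by width by channel layout, and the zero-filling of a final partial group of slices are all assumptions made for the example.

```python
def fold_width(feature, nx):
    """Fold an H x W x C feature map in width (hypothetical helper).

    Every nx consecutive width slices are spliced together in depth,
    giving an H x ceil(W / nx) x (nx * C) result.
    """
    width = len(feature[0])
    channels = len(feature[0][0])
    folded_width = -(-width // nx)  # ceiling division
    zero_pixel = [0] * channels
    folded = []
    for row in feature:
        new_row = []
        for i in range(folded_width):
            pixel = []
            for j in range(nx):
                src = i * nx + j
                # Slice (i*nx + j) lands in channels [j*C, (j+1)*C) of slice i.
                # Positions past the original width are zero-filled (assumption).
                pixel.extend(row[src] if src < width else zero_pixel)
            new_row.append(pixel)
        folded.append(new_row)
    return folded


def fold_height(feature, ny):
    """Fold in height by transposing, folding in width, and transposing back."""
    transposed = [list(col) for col in zip(*feature)]
    folded = fold_width(transposed, ny)
    return [list(col) for col in zip(*folded)]


# A 4 x 4 x 1 input whose single channel holds the value row*4 + col.
feature = [[[r * 4 + c] for c in range(4)] for r in range(4)]
folded_w = fold_width(feature, 2)      # 4 x 2 x 2
folded_wh = fold_height(folded_w, 2)   # 2 x 2 x 4
```

In this sketch a 4 by 4 by 1 input folded by a factor of 2 in each dimension becomes a 2 by 2 by 4 result, which has the same form as the X/2 by Y/2 by 4Z superpixel input tensor described in the cited passages of Young.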

Instant application 16203031
Copending application 16202991
2. The method of claim 1 wherein the SRAM includes a plurality of memory units, at least every two pixels of the pre-processed folded feature data are stored in a same memory unit, and at least every two pixels of each folded convolution kernel are stored in a same memory unit
2. The method of claim 1 wherein the SRAM includes a plurality of memory units, each memory unit has a memory address, at least every two pixels of the pre-processed folded feature data are stored in a same memory unit, and at least every two pixels of each folded convolution kernel are stored in a same memory unit
3. The method of claim 1 wherein the calculation unit comprises a plurality of multipliers and a plurality of adders
3. The method of claim 1 wherein the calculation unit comprises a plurality of multipliers and a plurality of adders
4. The method of claim 3 wherein the folding the unfolded feature data in at least one dimension comprises: performing a first folding on the unfolded feature data in a first dimension by splicing every Nx consecutive slices in the first dimension in depth together, wherein the first dimension is one of width and height, and Nx is an integer greater than 1.
The copending application does not recite “wherein the folding the unfolded feature data in at least one dimension comprises: performing a first folding on the unfolded feature data in a first dimension by splicing every Nx consecutive slices in the first dimension in depth together, wherein the first dimension is one of width and height, and Nx is an integer greater than 1”. However, Young teaches this limitation (the example illustration shows an example X by Y by Z input tensor 302. As shown in FIG. 3, the input tensor includes XY inputs, each of depth Z. As described above with reference to FIG. 2, the X by Y by Z input tensor may be transformed into an X′ by Y′ by Z′ superpixel input tensor by grouping multiple inputs together. During the grouping, indexing or layout in the spatial dimensions (X and Y dimensions) are traded for indexing or extent in the depth dimension (Z dimension). In the example illustration 300, the X by Y by Z input tensor has been transformed into an X/2 by Y/2 by 4Z superpixel input tensor 304. As shown in FIG. 3, the superpixel input tensor includes (X/2) (Y/2) inputs, each of depth 4Z. Each superpixel input included in the input tensor 304 represents 4 original inputs, and therefore represents 4 times the amount of data represented by an original input [0076]; The superpixel generator 114 is configured to transform the X by Y by Z input tensor into a X′ by Y′ by Z′ superpixel input tensor, where X′ is smaller than or equal to X, Y′ is smaller than or equal to Y, and Z′ is larger than or equal to Z [0043]). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of copending application 16/202991 to incorporate the method of Young for the benefit of processing the received input tensor using one or more convolutional neural network layer weight matrices to generate a U by V by W output tensor (Young [0003]).
5. The method of claim 4, wherein data of all Cx channels in the (ifx×Nx+jfx)th slice of the unfolded feature data in the first dimension correspond to data of all consecutive Cx channels starting from the (jfx×Cx)th channel in the (ifx)th slice of the result of the first folding in the first dimension, where ifx is an integer greater than or equal to 0, jfx is an integer greater than or equal to 0 and less than Nx, and Cx is an integer greater than 0.
4. The method of claim 1 wherein the folded feature data corresponds to a first feature data that is not folded in a first dimension, data of all Cx channels in the (ifx×Nx+jfx)th slice of the first feature data in the first dimension correspond to data of consecutive Cx channels starting from the (jfx×Cx)th channel in the (ifx)th slice of the folded feature data in the first dimension, where the first dimension is one of width or height, ifx is an integer greater than or equal to 0, Nx is an integer greater than 1, jfx is an integer greater than or equal to 0 and less than Nx, and Cx is an integer greater than 0.
6. The method of claim 4 wherein the pre-processing comprises:
determining a first padding quantity P1 at a starting boundary of the first feature data in the first dimension according to a padding manner specified by the convolutional layer, the first padding quantity P1 being greater than or equal to 0;
padding [P1/Nx] zero slices at the starting boundary of the result of the first folding in the first dimension, where “[ ]” indicates an upward rounding operation; and
padding [P1/Nx]*Nx−P1 zero slices at the starting boundary of the original convolution kernel in the first dimension.
5. The method of claim 4 wherein the pre-processing comprises:
determining a first padding quantity P1 for padding at a starting boundary of the first feature data in the first dimension according to a padding manner specified by the convolution layer, the first padding quantity P1 being greater than or equal to 0;
padding ┌P1/Nx┐ zero slices at a starting boundary of the folded feature data in the first dimension where “┌ ┐” represents an upward rounding operation; and
padding ┌P1/Nx┐×Nx−P1 zero slices at a starting boundary of the original convolution kernel in the first dimension.
7. The method of claim 4 wherein the original convolution kernel has a first stride in the first dimension that is not equal to Nx, the convolving comprises:
moving all the folded convolution kernels corresponding to the original convolution kernel simultaneously by the first stride in the first dimension after convolving a same portion of the pre-processed folded feature data with all the folded convolution kernels corresponding to the original convolution kernel, or
convolving the entire pre-processed folded feature data with each of the folded convolution kernels corresponding to the original convolution kernel respectively, where each of the folded convolution kernels has a stride in the first dimension that is equal to the first stride
6. The method of claim 4 wherein the convolving comprises:
in a case where the original convolution kernel has a first stride in the first dimension that is not equal to Nx, moving all the folded convolution kernels corresponding to the original convolution kernel simultaneously by the first stride in the first dimension after convolving a same portion of the pre-processed folded feature data with all the folded convolution kernels corresponding to the original convolution kernel

8. The method of claim 4 wherein the convolving comprises:
in a case where the original convolution kernel has a first stride in the first dimension that is not equal to Nx, convolving the entire pre-processed folded feature data with each of the folded convolution kernels corresponding to the original convolution kernel respectively, where each of the folded convolution kernels has a stride in the first dimension that is equal to the first stride
8. The method of claim 7 wherein convolving a same portion of the pre-processed folded feature data with all the folded convolution kernels comprises:
simultaneously calculating, by the plurality of multipliers, products of plural pixels in the pre-processed folded feature data each with a corresponding pixel of the plural folded convolution kernels 
7. The method of claim 6 wherein convolving a same portion of the pre-processed folded feature data with all the folded convolution kernels comprises:
simultaneously calculating, by the plurality of multipliers, products of plural pixels in the pre-processed folded feature data each with a corresponding pixel of the plural folded convolution kernels
9. The method of claim 4 wherein when the original convolution kernel has a stride in the first dimension that is equal to Nx, each of the folded convolution kernels has a stride 1 in the first dimension.
9. The method of claim 4 wherein when the original convolution kernel has a stride in the first dimension that is equal to Nx, each of the folded convolution kernels has a stride 1 in the first dimension.
10. The method of claim 4, wherein the folding the pre-processed original convolution kernel in the at least one dimension comprises:
padding kx×Sx zero slices padded at a starting boundary of the pre-processed original convolution kernel in the first dimension to generate Ex first transformed convolution kernels, respectively, where Sx is a first stride of the original convolution kernel in the first dimension, Ex is greater than or equal to 1 and depends on Nx and Sx, and kx is an integer greater than or equal to 0 and less than Ex; and
performing a second folding on each first transformed convolution kernel in the first dimension by splicing each Nx consecutive slices in the first dimension in depth together to generate a corresponding first folded convolution kernel.
10. The method of claim 4, wherein the folding the pre-processed original convolution kernel in the at least one dimension comprises:
padding kx×Sx zero slices padded at a starting boundary of the pre-processed original convolution kernel in the first dimension to generate Ex first transformed convolution kernels, respectively, where Sx is a first stride of the original convolution kernel in the first dimension, Ex is greater than or equal to 1 and depends on Nx and Sx, and kx is an integer greater than or equal to 0 and less than Ex; and
performing a second folding on each first transformed convolution kernel in the first dimension by splicing each Nx consecutive slices in the first dimension in depth together to generate a corresponding first folded convolution kernel.


11. The method of claim 10, wherein data of all Cx channels in the (ikx×Nx+jkx)th slice of each first transformed convolution kernel in the first dimension correspond to data of all consecutive Cx channels starting from the (jkx×Cx)th channel in the (ikx)th slice of the corresponding first folded convolution kernel in the first dimension, where ikx is an integer greater than or equal to 0, and jkx is an integer greater than or equal to 0 and less than Nx.
11. The method of claim 10, wherein data of all Cx channels in the (ikx×Nx+jkx)th slice of each first transformed convolution kernel in the first dimension correspond to data of all consecutive Cx channels starting from the (jkx×Cx)th channel in the (ikx)th slice of the corresponding first folded convolution kernel in the first dimension, where ikx is an integer greater than or equal to 0, and jkx is an integer greater than or equal to 0 and less than Nx.
13. The method according to claim 12, wherein data of all Cy channels in the (ify×Ny+jfy)th slice of the result of the first folding in the second dimension correspond to data of consecutive Cy channels starting from the (jfy×Cy)th channel in the (ify)th slice of the result of the third folding in the second dimension, where ify is an integer greater than or equal to 0, jfy is an integer greater than or equal to 0 and less than Ny, and Cy is an integer greater than 0.
12. The method of claim 10 wherein the first feature data corresponds to a second feature data that is not folded in a second dimension, data of all Cy channels in the (ify×Ny+jfy)th slice of the second feature data in the second dimension correspond to data of consecutive Cy channels starting from the (jfy×Cy)th channel in the (ify)th slice of the first feature data in the second dimension, where the second dimension is the other one of width or height, ify is an integer greater than or equal to 0, Ny is an integer greater than 1, jfy is an integer greater than or equal to 0 and less than Ny, and Cy is an integer greater than 0.
15. The method of claim 12 wherein the folding the pre-processed original convolution kernel in the at least one dimension further comprises:
padding ky×Sy zero slices at the starting boundary of the first folded convolution kernel in the second dimension to generate Ey second transformed convolution kernels for each first folded convolution kernel, where Sy is a second stride of the original convolution kernel in the second dimension, Ey is greater than or equal to 1 and depends on Ny and Sy, and ky is an integer greater than or equal to 0 and less than Ey; and
performing a fourth folding on each second transformed convolution kernel in the second dimension by splicing every Ny consecutive slices in the second dimension in depth together, to generate a second folded convolution kernel corresponding to the second transformed convolution kernel.

14. The method of claim 12 wherein folding the pre-processed original convolution kernel in at least one dimension further comprises:
padding ky×Sy zero slices at a starting boundary of each first folded convolution kernel in the second dimension to generate Ey second transformed convolution kernels for each first folded convolution kernel, where Sy is a second stride of the original convolution kernel in the second dimension, Ey is greater than or equal to 1 and depends on Ny and Sy, and kx is an integer greater than or equal to 0 and less than Ex; and
folding each second transformed convolution kernel in the second dimension by splicing every Ny consecutive slices of the second transformed convolution kernel in the second dimension together in the depth dimension to generate a second folded convolution kernel corresponding to the second transformed convolution kernel.

16. The method of claim 15, wherein data of all Cy channels in the (iky×Ny+jky)th slice of each second transformed convolution kernel in the second dimension correspond to data of consecutive Cy channels starting from the (jky×Cy)th channel in the (iky)th slice of each second transformed convolution kernel in the second dimension, where iky is an integer greater than or equal to 0, and jky is an integer greater than or equal to 0 and less than Ny.
15. The method of claim 14 wherein data of all Cy channels in the (iky×Ny+jky)th slice of each second transformed convolution kernel in the second dimension correspond to data of consecutive Cy channels starting from the (jxy×Cy)th channel in the (ixy)th slice of the corresponding second folded convolution kernel in the second dimension, respectively, where iky is an integer greater than or equal to 0, and jky is an integer greater than or equal to 0 and less than Ny.
17. The method of claim 12 wherein the original convolution kernel has a second stride in the second dimension that is not equal to Nx, the convolving comprises:
moving all the folded convolution kernels corresponding to the original convolution kernel simultaneously by the second stride in the second dimension after convolving a same portion of the pre-processed folded feature data with all the folded convolution kernels corresponding to the original convolution kernel, or
convolving the entire pre-processed folded feature data with each of the folded convolution kernels corresponding to the original convolution kernel respectively, where each of the folded convolution kernels has a stride in the second dimension that is equal to the second stride.
16. The method of claim 12 wherein the convolving further comprises:
in a case where the original convolution kernel has a second stride in the second dimension that is not equal to Nx, moving all the folded convolution kernels corresponding to the original convolution kernel simultaneously by the first stride in the first dimension after convolving a same portion of the pre-processed folded feature data with all the folded convolution kernels corresponding to the original convolution kernel
 
17. The method of claim 12 wherein the convolving further comprises:
in a case where the original convolution kernel has a second stride in the second dimension that is not equal to Nx, convolving the entire pre-processed folded feature data with each of the folded convolution kernels corresponding to the original convolution kernel respectively, where each of the folded convolution kernels has a stride in the first dimension that is equal to the first stride.
18. The method of claim 12 wherein when the original convolution kernel has a second stride in the second dimension that is equal to Ny, each of the folded convolution kernel has a stride 1 in the second dimension.
18. The method of claim 12 wherein when the original convolution kernel has a second stride in the second dimension that is equal to Ny, each of the folded convolution kernel has a stride 1 in the second dimension.
19. An apparatus for performing an operation of a convolutional layer in a convolutional neural network comprising:
19. An apparatus for performing an operation of a convolutional layer in a convolutional neural network, comprising:
one or more processors configured to execute instructions stored in memory, execution of the instructions causing the one or more processors to perform the following steps:
one or more processors configured to execute instructions stored in memory, execution of the instructions causing the one or more processors to perform the following steps:
reading unfolded feature data provided to the convolution layer and an original convolution kernel of the convolutional layer from a dynamic random access memory (DRAM);
reading the folded feature data provided to a convolution layer and an original convolution kernel from a dynamic random access memory (DRAM);
folding the unfolded feature data in at least one dimension of width and height to generate folded feature data;

pre-processing the folded feature data and the original convolution kernel;
pre-processing the folded feature data and the original convolution kernel;
storing the pre-processed folded feature data into a static random-access memory (SRAM);
storing the pre-processed folded feature data into a static random-access memory      
folding the pre-processed original convolution kernel in the at least one dimension to generate one or more folded convolution kernels corresponding to the original convolution kernel;
folding the pre-processed original convolution kernel in at least one dimension of width or height according to a folding manner of the folded feature data to generate one or more folded convolution kernels corresponding to the original convolution kernel;
storing the one or more folded convolution kernels in the SRAM; and
storing the one or more folded convolution kernels in the SRAM; and
reading the pre-processed folded feature data and the one or more folded convolution kernels from the SRAM into a calculation unit for convolving the pre-processed folded feature data with the one or more folded convolution kernels.
reading the pre-processed folded feature data and the one or more folded convolution kernels from the SRAM into a calculation unit for convolving the pre-processed folded feature data with the one or more folded convolution kernels.
20. A method for performing an operation of a convolutional layer in a convolutional neural network, comprising:
20. A method for performing an operation on folded feature data comprising:

folding unfolded feature data provided to the convolution layer in at least one dimension of width and height to generate folded feature data

pre-processing the folded feature data and the original convolution kernel;
pre-processing the folded feature data and the original convolution kernel;
folding the pre-processed original convolution kernel in the at least one dimension to generate one or more folded convolution kernels corresponding to the original convolution kernel; and
folding the pre-processed original convolution kernel in at least one dimension of width or height according to a folding manner of the folded feature data to generate one or more folded convolution kernels corresponding to the original convolution kernel; and
convolving the pre-processed folded feature data with the one or more folded convolution kernels.
convolving the pre-processed folded feature data with the one or more folded convolution kernels.
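
For illustration only, the padding quantities recited in instant claim 6 and copending claim 5 in the table above reduce to simple arithmetic, sketched below in Python. Each zero slice padded onto the folded feature data spans Nx slices of the unfolded data, so padding ⌈P1/Nx⌉ folded slices covers ⌈P1/Nx⌉×Nx unfolded positions, which exceeds P1 by ⌈P1/Nx⌉×Nx−P1; that difference is the number of zero slices the claims recite padding at the starting boundary of the original convolution kernel. The function name padding_slices and its argument checks are assumptions made for the example and do not come from either application.

```python
def padding_slices(p1, nx):
    """Return (feature_pad, kernel_pad) for padding quantity p1 and fold factor nx.

    feature_pad: zero slices padded at the start of the folded feature data,
                 i.e. ceil(P1 / Nx).
    kernel_pad:  zero slices padded at the start of the original kernel,
                 i.e. ceil(P1 / Nx) * Nx - P1.
    Hypothetical helper, for illustration only.
    """
    if p1 < 0:
        raise ValueError("P1 must be greater than or equal to 0")
    if nx <= 1:
        raise ValueError("Nx must be an integer greater than 1")
    feature_pad = -(-p1 // nx)  # ceiling division
    kernel_pad = feature_pad * nx - p1
    return feature_pad, kernel_pad


# P1 = 3, Nx = 2: two folded zero slices cover four unfolded positions,
# one more than requested, so the kernel receives one leading zero slice.
example = padding_slices(3, 2)
```

When P1 is already a multiple of Nx, the second quantity is zero and the kernel is left unpadded at its starting boundary.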


	Claim 19 of copending application 16/202991 includes all the limitations of instant claim 19 except for the underlined limitations identified in the table above. That is, claim 19 of copending application 16/202991 does not recite “reading the unfolded feature data” and “folding the unfolded feature data in at least one dimension of width and height to generate folded feature data”. However, Young teaches “reading the unfolded feature data” and “folding the unfolded feature data in at least one dimension of width and height to generate folded feature data” (the convolutional neural network layers included in the superpixel convolutional neural network system 100 may be configured to receive an X by Y by Z input tensor and process the received input tensor using one or more respective convolutional neural network layer weight matrices, or kernels [0037]; The system transforms the X by Y by Z input tensor into an X′ by Y′ by Z′ superpixel input tensor (step 202). [0063] Fig. 2; The superpixel generator 114 groups multiple components of a received input together, trading spatial extent or indexing, e.g., X and Y dimensions, for depth extent or indexing, e.g., Z dimension [0042]). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of copending application 16/202991 to incorporate the method of Young for the benefit of processing the received input tensor using one or more convolutional neural network layer weight matrices to generate a U by V by W output tensor (Young [0003]).

	Claim 20 of copending application 16/202991 includes all the limitations of instant claim 20 except for the underlined limitation identified in the table above. That is, claim 20 of copending application 16/202991 does not recite “folding unfolded feature data provided to the convolution layer in at least one dimension of width and height to generate folded feature data”. However, Young teaches “folding unfolded feature data provided to the convolution layer in at least one dimension of width and height to generate folded feature data” (the convolutional neural network layers included in the superpixel convolutional neural network system 100 may be configured to receive an X by Y by Z input tensor and process the received input tensor using one or more respective convolutional neural network layer weight matrices, or kernels [0037]; The system transforms the X by Y by Z input tensor into an X′ by Y′ by Z′ superpixel input tensor (step 202). [0063] Fig. 2; The superpixel generator 114 groups multiple components of a received input together, trading spatial extent or indexing, e.g., X and Y dimensions, for depth extent or indexing, e.g., Z dimension [0042]). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of copending application 16/202991 to incorporate the method of Young for the benefit of processing the received input tensor using one or more convolutional neural network layer weight matrices to generate a U by V by W output tensor (Young [0003]).
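
For illustration only, the kernel folding recited in claim 10 of the table above (padding kx×Sx zero slices at the starting boundary of the kernel, then splicing every Nx consecutive slices together in depth) can be sketched in one dimension with a single channel. The sketch assumes a stride Sx of 1 and takes the number of transformed kernels Ex to equal Nx in that case; the claims state only that Ex depends on Nx and Sx, so this choice is an assumption. The function names and the flat-list layout are likewise hypothetical and are not drawn from either application or from Young.

```python
def fold_1d(values, nx):
    """Splice every nx consecutive entries into one folded slice, zero-filling the tail."""
    padded = list(values) + [0] * (-len(values) % nx)
    return [padded[i:i + nx] for i in range(0, len(padded), nx)]


def folded_kernels(kernel, nx):
    """Pad kx * Sx leading zeros (Sx = 1 here), then fold, for kx = 0 .. Ex - 1."""
    sx = 1   # stride assumed to be 1 in this sketch
    ex = nx  # assumption: Ex equals Nx when the stride is 1
    return [fold_1d([0] * (kx * sx) + list(kernel), nx) for kx in range(ex)]


def direct_conv(signal, kernel):
    """Reference stride-1 sliding dot product on the unfolded data, no padding."""
    k = len(kernel)
    return [sum(signal[p + u] * kernel[u] for u in range(k))
            for p in range(len(signal) - k + 1)]


def folded_conv(signal, kernel, nx):
    """Same result computed from folded data and folded kernels."""
    folded_signal = fold_1d(signal, nx)
    n_out = len(signal) - len(kernel) + 1
    out = [0] * n_out
    for kx, folded_kernel in enumerate(folded_kernels(kernel, nx)):
        for p in range(len(folded_signal)):
            pos = p * nx + kx  # matching position in the unfolded output
            if pos >= n_out:
                break
            out[pos] = sum(
                folded_signal[p + i][j] * folded_kernel[i][j]
                for i in range(len(folded_kernel))
                for j in range(nx)
            )
    return out


signal = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 0, -1]
same = folded_conv(signal, kernel, 2) == direct_conv(signal, kernel)
```

Under these assumptions the folded kernel with kx leading zeros, applied at folded position p, reproduces the unfolded output at position p×Nx+kx, so the Nx folded kernels together cover every output position of the unfolded convolution.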

Claim Rejections - 35 USC § 112 
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION. —The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


8.	Claims 1-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
	Claim 1 recites “the pre-processed folded feature data” and “the pre-processed original convolution kernel”. These limitations lack antecedent basis.
	Claim 19 recites “the pre-processed folded feature data” and “the pre-processed original convolution kernel”. These limitations lack antecedent basis.
	Claim 20 recites “the pre-processed original convolution kernel” and “the pre-processed folded feature data”. These limitations lack antecedent basis.
	Dependent claims 2-18 are also rejected under 35 U.S.C. 112(b) because they fail to correct the deficiencies of their respective independent claims.
	

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

9.	Claim 20 is rejected under 35 U.S.C. 102(a)(1) as being anticipated by Young et al. (US20180018554 filed 07/13/2016).

	Regarding claim 20, Young teaches a method for performing an operation of a convolutional layer in a convolutional neural network, (The superpixel convolutional neural network system 100 can be trained on multiple batches of training examples in order to determine trained values of the parameters of the neural network layers, i.e., to adjust the values of the parameters from initial values to trained values [0052]) comprising:
	folding unfolded feature data provided to the convolution layer in at least one dimension of width and height to generate folded feature data; (The system transforms the X by Y by Z input tensor into an X′ by Y′ by Z′ superpixel input tensor (step 202). [0063] Fig. 2; The superpixel generator 114 groups multiple components of a received input together, trading spatial extent or indexing, e.g., X and Y dimensions, for depth extent or indexing, e.g., Z dimension [0042])
	pre-processing the folded feature data and an original convolution kernel of the convolution layer; (The system processes the X′ by Y′ by Z′ input tensor using the modified weight matrices to generate a transformed convolutional neural network layer output, e.g., a U′ by V′ by W′ output tensor (step 208) [0070], Fig. 2)
	folding the pre-processed original convolution kernel in the at least one dimension to generate one or more folded convolution kernels corresponding to the original convolution kernel; (Generally, for one dimension, a number of modified kernel elements may be equal to ceiling ((superinput_size+original convolutional neural network layer kernel width−1)/superinput_size). [0067]; At step (c), the example illustration 400 shows example modified two-dimensional weight matrices 416-422 that may be used to compute the super output. Each of the matrices 416-422 are kernel elements that together make up a 2 by 2 modified kernel patch [0080].) and
	convolving the pre-processed folded feature data with the one or more folded convolution kernels. (The example illustration at step (c) depicts a matrix multiplication. The column 424 represents inputs, and when an input is illustrated as being a same height as an entry of the modified weight matrices 416-422 it is multiplied by that weight. All of the weight-input products in a column may be added up to produce a total value for the output within the super output. In total there are four super inputs illustrated, e.g., 16 original inputs, contributing to calculate one super output, e.g., 4 original outputs [0080])
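
For illustration only, the ceiling expression quoted above from Young [0067] can be evaluated directly. The following Python sketch is not code from Young or from the instant application; the function name `modified_kernel_elements` and the sample sizes in the usage note are assumptions chosen for the example.

```python
def modified_kernel_elements(superinput_size: int, kernel_width: int) -> int:
    """Evaluate ceiling((superinput_size + kernel_width - 1) / superinput_size).

    Hypothetical helper: the name and argument names are assumptions; only the
    arithmetic expression itself is taken from the quoted passage.
    """
    if superinput_size < 1 or kernel_width < 1:
        raise ValueError("sizes must be positive integers")
    numerator = superinput_size + kernel_width - 1
    # -(-a // b) is an integer ceiling that avoids floating-point rounding.
    return -(-numerator // superinput_size)
```

Under these assumptions, a superinput size of 2 and an original kernel width of 3 give `modified_kernel_elements(2, 3) == 2` modified kernel elements in that dimension.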

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.



10.	Claims 1 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Young et al. (US20180018554 filed 07/13/2016) in view of Lu et al. (US10073816 filed 07/20/2017).

	Regarding claim 1, Young teaches a method for performing an operation of a convolutional layer in a convolutional neural network, (The superpixel convolutional neural network system 100 can be trained on multiple batches of training examples in order to determine trained values of the parameters of the neural network layers, i.e., to adjust the values of the parameters from initial values to trained values [0052]) comprising:
	reading unfolded feature data provided to the convolution layer and an original convolution kernel of the convolutional layer from a dynamic random access memory (DRAM); (For example, the convolutional neural network layers included in the superpixel convolutional neural network system 100 may be configured to receive an X by Y by Z input tensor and process the received input tensor using one or more respective convolutional neural network layer weight matrices, or kernels [0037]; a central processing unit will receive instructions and data from … a random access memory [0090].) and 
	folding the unfolded feature data in at least one dimension of width and height to generate folded feature data; (The system transforms the X by Y by Z input tensor into an X′ by Y′ by Z′ superpixel input tensor (step 202). [0063] Fig. 2; The superpixel generator 114 groups multiple components of a received input together, trading spatial extent or indexing, e.g., X and Y dimensions, for depth extent or indexing, e.g., Z dimension [0042])
	pre-processing the folded feature data and the original convolution kernel; (The system processes the X′ by Y′ by Z′ input tensor using the modified weight matrices to generate a transformed convolutional neural network layer output, e.g., a U′ by V′ by W′ output tensor (step 208) [0070], Fig. 2)
	storing the pre-processed folded feature data into a static random-access memory (SRAM); (a central processing unit will receive instructions and data from a read only memory or a random access memory or both. The essential elements of a computer are a central processing unit for performing or executing instructions and one or more memory devices for storing instructions and data.[0090])
	folding the pre-processed original convolution kernel in the at least one dimension to generate one or more folded convolution kernels corresponding to the original convolution kernel; (Generally, for one dimension, a number of modified kernel elements may be equal to ceiling ((superinput_size+original convolutional neural network layer kernel width−1)/superinput_size). [0067]; At step (c), the example illustration 400 shows example modified two-dimensional weight matrices 416-422 that may be used to compute the super output. Each of the matrices 416-422 are kernel elements that together make up a 2 by 2 modified kernel patch.[0080].)
	storing the one or more folded convolution kernels in the SRAM; (a central processing unit will receive instructions and data from a read only memory or a random access memory or both. The essential elements of a computer are a central processing unit for performing or executing instructions and one or more memory devices for storing instructions and data. [0090]) and
	reading the pre-processed folded feature data and the one or more folded convolution kernels from the SRAM into a calculation unit for convolving the pre-processed folded feature data with the one or more folded convolution kernels. (The example illustration at step (c) depicts a matrix multiplication. The column  424 represents inputs, and when an input is illustrated as being a same height as an entry of the modified weight matrices 416-422 it is multiplied by that weight. All of the weight-input products in a column may be added up to produce a total value for the output within the super output. In total there are four super inputs illustrated, e.g., 16 original inputs, contributing to calculate one super output, e.g., 4 original outputs [0080])
	Young does not explicitly teach storing the pre-processed folded feature data into a static random-access memory (SRAM); storing the one or more folded convolution kernels in the SRAM; and reading the pre-processed folded feature data and the one or more folded convolution kernels from the SRAM into a calculation unit.
	Lu teaches storing the pre-processed folded feature data into a static random-access memory (SRAM); (The Examiner notes that buffer 290 is the SRAM, Fig. 2)
	storing the one or more folded convolution kernels in the SRAM (storing folded convolution kernels in buffer 290, Fig. 2)
	reading the pre-processed folded feature data and the one or more folded convolution kernels from the SRAM into a calculation unit. (reading folded data from fold buffer 290 into OPU 220, Fig. 2; The OPUs 220 calculate the outer products. The distribution section 212 partitions the full matrix multiply X×Y into component outer product calculations, which are performed by the OPUs 220, col 5, lines 58-61; Each OPU 620 includes a parallel array of intermediate processing elements (IPEs) 630. Each IPE 630 includes multiple atomic processing elements (APEs) 640. Each APE 640 uses multiply-accumulate circuits (MACs), col 8, lines 38-45)
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Young to incorporate the method of Lu for the benefit of effectively implementing the unfolding by retrieving tensor elements in the order consumed by the contraction engine (Lu, col 5, lines 23-26).
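
The passage cited from Lu (col 5, lines 58-61) describes partitioning the full matrix multiply X×Y into component outer product calculations. A minimal Python sketch of that general identity follows. It is not Lu's hardware design and does not model the OPUs 220; the function names `outer` and `matmul_by_outer_products` are hypothetical.

```python
def outer(col, row):
    """Outer product of a column vector and a row vector, as a nested list."""
    return [[a * b for b in row] for a in col]


def matmul_by_outer_products(X, Y):
    """Compute the product X*Y as a sum of outer products.

    The t-th term is the outer product of column t of X with row t of Y.
    X is n-by-k and Y is k-by-m, both given as nested lists.
    """
    n, k, m = len(X), len(Y), len(Y[0])
    if any(len(r) != k for r in X):
        raise ValueError("inner dimensions of X and Y must match")
    result = [[0] * m for _ in range(n)]
    for t in range(k):
        column_t = [X[i][t] for i in range(n)]
        partial = outer(column_t, Y[t])
        # Accumulate this component outer product into the running total.
        for i in range(n):
            for j in range(m):
                result[i][j] += partial[i][j]
    return result
```

Each loop iteration corresponds to one component outer product; in this sketch the iterations run sequentially, whereas the cited passage describes distributing such calculations to separate units.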

	Regarding claim 19, Young teaches an apparatus for performing an operation of a convolutional layer in a convolutional neural network, (abstract) comprising:
	one or more processors (machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers [0086]) configured to 
	execute instructions stored in a memory, execution of the instructions causing the one or more processors to perform the following steps: (a central processing unit for performing or executing instructions and one or more memory devices for storing instructions and data. [0090])
	reading unfolded feature data provided to the convolution layer and an original convolution kernel of the convolutional layer from a dynamic random access memory (DRAM); (For example, the convolutional neural network layers included in the superpixel convolutional neural network system 100 may be configured to receive an X by Y by Z input tensor and process the received input tensor using one or more respective convolutional neural network layer weight matrices, or kernels [0037]; a central processing unit will receive instructions and data from … a random access memory [0090].)
	folding the unfolded feature data in at least one dimension of width and height to generate folded feature data; (The system transforms the X by Y by Z input tensor into an X′ by Y′ by Z′ superpixel input tensor (step 202). [0063] Fig. 2; The superpixel generator 114 groups multiple components of a received input together, trading spatial extent or indexing, e.g., X and Y dimensions, for depth extent or indexing, e.g., Z dimension [0042])
	pre-processing the folded feature data and the original convolution kernel; (The system processes the X′ by Y′ by Z′ input tensor using the modified weight matrices to generate a transformed convolutional neural network layer output, e.g., a U′ by V′ by W′ output tensor (step 208) [0070], Fig. 2)
	storing the pre-processed folded feature data into a static random-access memory (SRAM); (a central processing unit will receive instructions and data from a read only memory or a random access memory or both. The essential elements of a computer are a central processing unit for performing or executing instructions and one or more memory devices for storing instructions and data.[0090])
	folding the pre-processed original convolution kernel in the at least one dimension to generate one or more folded convolution kernels corresponding to the original convolution kernel; (Generally, for one dimension, a number of modified kernel elements may be equal to ceiling ((superinput_size+original convolutional neural network layer kernel width−1)/superinput_size). [0067]; At step (c), the example illustration 400 shows example modified two-dimensional weight matrices 416-422 that may be used to compute the super output. Each of the matrices 416-422 are kernel elements that together make up a 2 by 2 modified kernel patch.[0080].)
	storing the one or more folded convolution kernels in the SRAM; (a central processing unit will receive instructions and data from a read only memory or a random access memory or both. The essential elements of a computer are a central processing unit for performing or executing instructions and one or more memory devices for storing instructions and data. [0090]) and
	reading the pre-processed folded feature data and the one or more folded convolution kernels from the SRAM into a calculation unit for convolving the pre-processed folded feature data with the one or more folded convolution kernels (The example illustration at step (c) depicts a matrix multiplication. The column  424 represents inputs, and when an input is illustrated as being a same height as an entry of the modified weight matrices 416-422 it is multiplied by that weight. All of the weight-input products in a column may be added up to produce a total value for the output within the super output. In total there are four super inputs illustrated, e.g., 16 original inputs, contributing to calculate one super output, e.g., 4 original outputs.[0080])
	Young does not explicitly teach storing the pre-processed folded feature data into a static random-access memory (SRAM); storing the one or more folded convolution kernels in the SRAM; and reading the pre-processed folded feature data and the one or more folded convolution kernels from the SRAM into a calculation unit.
	Lu teaches storing the pre-processed folded feature data into a static random-access memory (SRAM); (The Examiner notes that buffer 290 is the SRAM, Fig. 2)
	storing the one or more folded convolution kernels in the SRAM (storing folded convolution kernels in buffer 290, Fig. 2)
	reading the pre-processed folded feature data and the one or more folded convolution kernels from the SRAM into a calculation unit. (reading folded data from fold buffer 290 into OPU 220, Fig. 2; The OPUs 220 calculate the outer products. The distribution section 212 partitions the full matrix multiply X×Y into component outer product calculations, which are performed by the OPUs 220, col 5, lines 58-61; Each OPU 620 includes a parallel array of intermediate processing elements (IPEs) 630. Each IPE 630 includes multiple atomic processing elements (APEs) 640. Each APE 640 uses multiply-accumulate circuits (MACs), col 8, lines 38-45)
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Young to incorporate the method of Lu for the benefit of effectively implementing the unfolding by retrieving tensor elements in the order consumed by the contraction engine (Lu, col 5, lines 23-26).

11.	Claims 2-4 and 9 are rejected under 35 U.S.C. 103 as being unpatentable over Young et al. (US20180018554 filed 07/13/2016) in view of Lu et al. (US10073816 filed 07/20/2017) and further in view of Du et al. ("A streaming accelerator for deep convolutional neural networks with image and feature decomposition for resource-limited system applications." arXiv preprint arXiv:1709.05116 (Sep 2017).)

	Regarding claim 2, Modified Young teaches the method of claim 1, Lu teaches wherein the SRAM includes a plurality of memory units, (The Examiner notes that buffer 290 is the SRAM, Fig. 2; the input and output buffers 290 are double buffers. The input buffer 290 includes a first buffer that buffers the retrieval of tensor elements from the device memory. It also includes a second buffer that buffers transmission of the retrieved tensor elements to the contraction engine 210, col 5, lines 37-42) 
	Modified Young does not explicitly teach at least every two pixels of the pre-processed folded feature data are stored in a same memory unit, at least every two pixels of each folded convolution kernel are stored in a same memory unit.
	Du teaches at least every two pixels of the pre-processed folded feature data are stored in a same memory unit, (To minimize data movement and to utilize maximal available on-chip SRAM bandwidth, a streaming architecture is proposed, as shown in Fig. 2. SRAM width is set to 16 Byte, corresponding to stream 8 pixels per cycle, pg. 2, left col, 3. Streaming Architecture) and 	
	at least every two pixels of each folded convolution kernel are stored in a same memory unit. (As Fig. 2(a) shows the proposed single channel column buffer with 2 x N row buffer (N is the depth of SRAM), …. In Fig. 2(b), after the first eight rows, every cycle has eight groups’ valid convolution results with the help of 2xN pixel size ROW BUF, pg. 2, left col, 3. Streaming Architecture. The Examiner notes that 2xN pixel is stored in the SRAM)
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Modified Young to incorporate the method of Du for the benefit of optimizing memory access pattern for an arbitrary size of image and number of features within limited on-chip SRAM capacity (Du, pg. 1, abstract.)
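
As a hedged illustration of the arithmetic in the Du passage quoted above (a 16 Byte SRAM width corresponding to 8 pixels per cycle), the Python sketch below groups a pixel stream into memory words so that several pixels share one memory unit. The 2-byte pixel size is an inference from 16 / 8 and is an assumption, as are the function names and the zero padding of the last word; none of this is code from Du.

```python
def pixels_per_word(word_bytes: int = 16, pixel_bytes: int = 2) -> int:
    """Number of whole pixels that fit in one memory word (assumed sizes)."""
    if pixel_bytes < 1 or word_bytes % pixel_bytes != 0:
        raise ValueError("word size must be a multiple of the pixel size")
    return word_bytes // pixel_bytes


def pack_into_words(pixels, word_bytes: int = 16, pixel_bytes: int = 2, pad: int = 0):
    """Group a flat pixel sequence into words of equal pixel count.

    The final word is filled with `pad` values when the stream length is not
    a multiple of the word capacity (a choice made only for this sketch).
    """
    capacity = pixels_per_word(word_bytes, pixel_bytes)
    words = []
    for start in range(0, len(pixels), capacity):
        word = list(pixels[start:start + capacity])
        word += [pad] * (capacity - len(word))
        words.append(word)
    return words
```

With the default sizes, each word holds 8 pixels, so a stream of 10 pixels occupies two words, the second of which is padded.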

	Regarding claim 3, Modified Young teaches the method of claim 1. Modified Young does not explicitly teach wherein the calculation unit comprises a plurality of multipliers and a plurality of adders.
	Du teaches wherein the calculation unit comprises a plurality of multipliers and a plurality of adders (The CU engine array includes nine processing engines (PE) and an adder to combine the output, as shown in Fig. 4, … the accelerator uses nine multipliers to form a CU and sixteen CUs to compose a CU engine pg. 2, right col, 4.2 CU Engine Array, Fig.)
	The same motivation to combine as set forth for dependent claim 2 applies here.

	Regarding claim 4, Modified Young teaches the method of claim 3, Young teaches wherein the folding the unfolded feature data in at least one dimension comprises: performing a first folding on the unfolded feature data in a first dimension by splicing every Nx consecutive slices in the first dimension in depth together, wherein the first dimension is one of width and height, and Nx is an integer greater than 1. (The example illustration shows an example X by Y by Z input tensor 302. As shown in FIG. 3, the input tensor includes XY inputs, each of depth Z. As described above with reference to FIG. 2, the X by Y by Z input tensor may be transformed into an X′ by Y′ by Z′ superpixel input tensor by grouping multiple inputs together. During the grouping, indexing or layout in the spatial dimensions (X and Y dimensions) are traded for indexing or extent in the depth dimension (Z dimension). In the example illustration 300, the X by Y by Z input tensor has been transformed into an X/2 by Y/2 by 4Z superpixel input tensor 304. As shown in FIG. 3, the superpixel input tensor includes (X/2)(Y/2) inputs, each of depth 4Z. Each superpixel input included in the input tensor 304 represents 4 original inputs, and therefore represents 4 times the amount of data represented by an original input [0076]; The superpixel generator 114 is configured to transform the X by Y by Z input tensor into a X′ by Y′ by Z′ superpixel input tensor, where X′ is smaller than or equal to X, Y′ is smaller than or equal to Y, and Z′ is larger than or equal to Z. [0043])
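
The folding recited in claim 4 (splicing every Nx consecutive slices in the first dimension together in depth) can be sketched in Python as follows, taking width as the first dimension. This is an illustrative sketch only, not the implementation of Young or of the instant application. The function name `fold_width`, the nested-list layout `data[h][w][c]`, and the zero padding applied when the width is not a multiple of Nx are all assumptions.

```python
def fold_width(data, nx: int):
    """Fold feature data in the width dimension by a factor of nx.

    data[h][w] is a list of channel values (depth C). The result has
    ceil(W / nx) positions per row, each of depth nx * C, formed by
    concatenating nx consecutive width slices in depth.
    """
    if nx < 2:
        raise ValueError("nx must be an integer greater than 1")
    folded = []
    for row in data:
        depth = len(row[0])
        # Zero-pad the row so its width is a multiple of nx (assumption).
        padded = list(row) + [[0] * depth] * ((-len(row)) % nx)
        folded.append([
            [value for pixel in padded[w:w + nx] for value in pixel]
            for w in range(0, len(padded), nx)
        ])
    return folded
```

For example, a 1 by 4 input of depth 1 folded with Nx = 2 becomes a 1 by 2 result of depth 2, which mirrors the trade of spatial extent for depth extent described in the cited passages.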

	Regarding claim 9, Modified Young teaches the method of claim 4, Young teaches wherein when the original convolution kernel has a stride in the first dimension that is equal to Nx, (For example, the convolutional neural network layer may include a kernel stride that controls how much convolutional filters, or weight matrices, shift in X [0039]. The Examiner notes that kernel stride is shifted in X which is the first dimension, in X direction)
	each of the folded convolution kernels has a stride 1 in the first dimension. (For example, if a convolutional neural network layer includes a stride S in the X dimension and a stride T in the Y dimension, and the convolutional neural network layer is transformed into a superpixel convolutional neural network layer with superpixel inputs of size NM, by choosing an output superpixel size of (N/S)(M/T), the system implements kernel striding of the superpixel modified weight matrices.)
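
The relationship recited in claim 9 can be checked numerically in one dimension. The Python sketch below is a simplified single-channel illustration and is not asserted to be the method of Young or of the instant application. It assumes the data width is a multiple of Nx, pads the kernel with zeros to a multiple of Nx before folding, and uses hypothetical function names.

```python
def conv1d(x, k, stride: int):
    """Plain 1-D valid convolution (no kernel flip) with the given stride."""
    return [
        sum(k[t] * x[j + t] for t in range(len(k)))
        for j in range(0, len(x) - len(k) + 1, stride)
    ]


def fold(seq, nx: int):
    """Group every nx consecutive elements into one position of depth nx,
    zero-padding the tail when needed (assumption for this sketch)."""
    padded = list(seq) + [0] * ((-len(seq)) % nx)
    return [padded[i:i + nx] for i in range(0, len(padded), nx)]


def conv1d_folded(xf, kf):
    """Stride-1 convolution over folded positions, summing across depth."""
    return [
        sum(kf[q][i] * xf[j + q][i]
            for q in range(len(kf)) for i in range(len(kf[q])))
        for j in range(len(xf) - len(kf) + 1)
    ]


x = [1, 2, 3, 4, 5, 6, 7, 8]   # unfolded data, width 8
k = [2, 1, 3]                  # original kernel, stride equal to nx
nx = 2
original = conv1d(x, k, stride=nx)
folded = conv1d_folded(fold(x, nx), fold(k, nx))
```

In this example `original` and `folded` hold the same values, so a stride of Nx on the unfolded data corresponds to a stride of 1 on the folded data.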

Conclusion
	Any inquiry concerning this communication or earlier communications from the examiner should be directed to MORIAM MOSUNMOLA GODO whose telephone number is (571)272-8670. The examiner can normally be reached Monday-Friday 7:30am-5:30pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li B. Zhen can be reached on (571)272-3768. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/M.G./Examiner, Art Unit 2121        



/Li B. Zhen/Supervisory Patent Examiner, Art Unit 2121