Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines which form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.
Claims 1-22 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-21 of U.S. Patent No. 10,796,220. 
Independent claims 1, 2, 3, 13, 14 and 15 are broadened versions of claims 6, 4, 1, 16, 14 and 13, respectively, of the Patent; they therefore overlap in scope with, and are anticipated by, those claims. The nonstatutory double patenting anticipation analysis is demonstrated below for independent claims 1, 2, 3, 13, 14 and 15 of the instant application vis-à-vis claims 6, 4, 1, 16, 14 and 13, respectively, of the Patent.


Instant Application Claim 1
US Pat. No. 10,796,220 Claim 6

A hardware-based programmable deep learning processor (DLP), comprising:
	an on-system memory (OSM) and one or more controllers configured to access a plurality of external memory resources via direct memory access (DMA);
	a plurality of programmable tensor engines configured to perform a plurality of convolution operations by applying one or more kernels on multi-dimensional input data to generate deep learning processing results for pattern recognition and classification based on a neural network, wherein each of the plurality of tensor engines further comprises:
		a data engine configured to prefetch the multi-dimensional input data and/or the kernels from the OSM and/or the external memory resources for the convolution operations; 
		an online memory (OLM) configured to store the prefetched input data from the OSM and/or the external memory resources; 
		one or more vector processing engines each configured to: 
			vectorize the multi-dimensional input data at each layer of the neural network to generate a plurality of vectors; 
			perform multi-dimensional fast Fourier transform (FFT) on the generated vectors and/or the kernels to create output for the convolution operations.
Claim 1:
A hardware-based programmable deep learning processor (DLP), comprising:
	an on-system memory (OSM) and one or more controllers configured to access a plurality of external memory resources via direct memory access (DMA);
	a plurality of programmable tensor engines configured to perform a plurality of convolution operations by applying one or more kernels on multi-dimensional input data to generate deep learning processing results for pattern recognition and classification based on a neural network, wherein each of the plurality of tensor engines further comprises:
		a data engine configured to prefetch the multi-dimensional input data and/or the kernels from the OSM and/or the external memory resources for the convolution operations;



		one or more vector processing engines each configured to:
			vectorize the multi-dimensional input data at each layer of the neural network to generate a plurality of vectors; 
			perform multi-dimensional fast Fourier transform (FFT) on the generated vectors and/or the kernels to create output for the convolution operations; 
		a programmable CPU having its own instruction cache and data cache configured to store a plurality of instructions from a host and the retrieved data from the OSM and/or the external memory resources, respectively.

Claim 6:
The processor of claim 1, wherein:
each tensor engine further includes an online memory (OLM) configured to store the prefetched input data from the OSM and/or the external memory resources.


Instant Application Claim 2
US Pat. No. 10,796,220 Claim 4

A hardware-based programmable deep learning processor (DLP), comprising:
	an on-system memory (OSM) and one or more controllers configured to access a plurality of external memory resources via direct memory access (DMA);
	a plurality of programmable tensor engines configured to perform a plurality of convolution operations by applying one or more kernels on multi-dimensional input data to generate deep learning processing results for pattern recognition and classification based on a neural network, wherein each of the plurality of tensor engines further comprises: 




		one or more vector processing engines each configured to: 
			vectorize the multi-dimensional input data at each layer of the neural network to generate a plurality of vectors; 
			perform multi-dimensional fast Fourier transform (FFT) on the generated vectors and/or the kernels to create output for the convolution operations;







	wherein one or more of the plurality of convolution operations are each partitioned among the plurality of vector processing engines, wherein each of the plurality of vector processing engines is configured to perform a sub-task of each of the one or more of the plurality of convolution operations in parallel.
Claim 1:
A hardware-based programmable deep learning processor (DLP), comprising:
	an on-system memory (OSM) and one or more controllers configured to access a plurality of external memory resources via direct memory access (DMA);
	a plurality of programmable tensor engines configured to perform a plurality of convolution operations by applying one or more kernels on multi-dimensional input data to generate deep learning processing results for pattern recognition and classification based on a neural network, wherein each of the plurality of tensor engines further comprises:
		a data engine configured to prefetch the multi-dimensional input data and/or the kernels from the OSM and/or the external memory resources for the convolution operations;
		one or more vector processing engines each configured to:
			vectorize the multi-dimensional input data at each layer of the neural network to generate a plurality of vectors; 
			perform multi-dimensional fast Fourier transform (FFT) on the generated vectors and/or the kernels to create output for the convolution operations; 
		a programmable CPU having its own instruction cache and data cache configured to store a plurality of instructions from a host and the retrieved data from the OSM and/or the external memory resources, respectively.

Claim 4:
The processor of claim 1, wherein:
the DLP is configured to partition each convolution operation for pattern classification among the plurality of vector processing engines, wherein each vector processing engine is configured to perform a sub-task of the convolution operation in parallel.


Instant Application Claim 3
US Pat. No. 10,796,220 Claim 1
A hardware-based programmable deep learning processor (DLP), comprising:
	an on-system memory (OSM) and one or more controllers configured to access a plurality of external memory resources via direct memory access (DMA);
	a plurality of programmable tensor engines configured to perform a plurality of convolution operations by applying one or more kernels on multi-dimensional input data to generate deep learning processing results for pattern recognition and classification based on a neural network, wherein each of the plurality of tensor engines further comprises:




		one or more vector processing engines each configured to: 
			vectorize the multi-dimensional input data at each layer of the neural network to generate a plurality of vectors;
			perform multi-dimensional fast Fourier transform (FFT) on the generated vectors and/or the kernels to create output for the convolution operations; 
		a programmable CPU having its own instruction cache and data cache configured to store a plurality of instructions from a host and the retrieved data from the OSM and/or the external memory resources, respectively.
A hardware-based programmable deep learning processor (DLP), comprising:
	an on-system memory (OSM) and one or more controllers configured to access a plurality of external memory resources via direct memory access (DMA);
	a plurality of programmable tensor engines configured to perform a plurality of convolution operations by applying one or more kernels on multi-dimensional input data to generate deep learning processing results for pattern recognition and classification based on a neural network, wherein each of the plurality of tensor engines further comprises:
		a data engine configured to prefetch the multi-dimensional input data and/or the kernels from the OSM and/or the external memory resources for the convolution operations;
		one or more vector processing engines each configured to:
			vectorize the multi-dimensional input data at each layer of the neural network to generate a plurality of vectors; 
			perform multi-dimensional fast Fourier transform (FFT) on the generated vectors and/or the kernels to create output for the convolution operations; 
		a programmable CPU having its own instruction cache and data cache configured to store a plurality of instructions from a host and the retrieved data from the OSM and/or the external memory resources, respectively.


Instant Application Claim 13
US Pat. No. 10,796,220 Claim 16

A method to support hardware-based programmable vectorized fast Fourier transform (FFT) for multidimensional convolution, comprising: 
	prefetching multi-dimensional input data and/or one or more kernels from an on system memory (OSM) and/or a plurality of external memory resources via direct memory access (DMA);
	storing the prefetched input data from the OSM and/or the external memory resources in an online memory (OLM);

	vectorizing the multi-dimensional input data to generate a plurality of vectors at each layer of a neural network used for a plurality of convolution operations;
	performing multi-dimensional FFT on the generated vectors and/or the kernels to create output for the convolution operations; 
	outputting deep learning processing results for pattern recognition and classification to a host based on the output for the convolution operations.
Claim 13:
A method to support hardware-based programmable vectorized fast Fourier transform (FFT) for multidimensional convolution, comprising:
	prefetching multi-dimensional input data and/or one or more kernels from an on system memory (OSM) and/or a plurality of external memory resources via direct memory access (DMA);
	accepting a plurality of instructions from a host and submitting the instructions to program a plurality of vector processing engines for the vectorized FFT for multi-dimensional convolution; 
	vectorizing the multi-dimensional input data to generate a plurality of vectors at each layer of a neural network used for a plurality of convolution operations;
	performing multi-dimensional FFT on the generated vectors and/or the kernels to create output for the convolution operations;
	outputting deep learning processing results for pattern recognition and classification to a host based on the output for the convolution operations.

Claim 16:
The method of claim 13, further comprising:
retrieving a vector of input data across V out of N rows column-wise from each column of the multi-dimensional input data, one column at a time, wherein the multiple dimensional input data is stored in the OSM or an online memory (OLM) in column major.


Instant Application Claim 14
US Pat. No. 10,796,220 Claim 14

A method to support hardware-based programmable vectorized fast Fourier transform (FFT) for multidimensional convolution, comprising:
	partitioning each of one or more of a plurality of convolution operations among a plurality of vector processing engines, wherein each of the plurality of vector processing engines is configured to perform a sub-task of each of the one or more of the plurality of convolution operations in parallel;



	vectorizing multi-dimensional input data to generate a plurality of vectors at each layer of a neural network used for the plurality of convolution operations; 
	performing multi-dimensional FFT on the generated vectors and/or the kernels to create output for the convolution operations; 
	outputting deep learning processing results for pattern recognition and classification to a host based on the output for the convolution operations.
Claim 13:
A method to support hardware-based programmable vectorized fast Fourier transform (FFT) for multidimensional convolution, comprising:
	prefetching multi-dimensional input data and/or one or more kernels from an on system memory (OSM) and/or a plurality of external memory resources via direct memory access (DMA);
	accepting a plurality of instructions from a host and submitting the instructions to program a plurality of vector processing engines for the vectorized FFT for multi-dimensional convolution; 
	vectorizing the multi-dimensional input data to generate a plurality of vectors at each layer of a neural network used for a plurality of convolution operations;
	performing multi-dimensional FFT on the generated vectors and/or the kernels to create output for the convolution operations;
	outputting deep learning processing results for pattern recognition and classification to a host based on the output for the convolution operations.

Claim 14:
The method of claim 13, further comprising: 
partitioning each convolution operation for pattern classification among the plurality of vector processing engines, wherein each vector processing engine is configured to perform a sub-task of the convolution operation in parallel.


Instant Application Claim 15
US Pat. No. 10,796,220 Claim 13
A method to support hardware-based programmable vectorized fast Fourier transform (FFT) for multidimensional convolution, comprising:
	



	accepting a plurality of instructions from a host and submitting the instructions to program a plurality of vector processing engines for the vectorized FFT for multidimensional convolution;
	vectorizing the multi-dimensional input data to generate a plurality of vectors at each layer of a neural network used for a plurality of convolution operations;
	performing multi-dimensional FFT on the generated vectors and/or one or more kernels to create output for the convolution operations;
	outputting deep learning processing results for pattern recognition and classification to a host based on the output for the convolution operations.
A method to support hardware-based programmable vectorized fast Fourier transform (FFT) for multidimensional convolution, comprising:
	prefetching multi-dimensional input data and/or one or more kernels from an on system memory (OSM) and/or a plurality of external memory resources via direct memory access (DMA);
	accepting a plurality of instructions from a host and submitting the instructions to program a plurality of vector processing engines for the vectorized FFT for multi-dimensional convolution; 
	vectorizing the multi-dimensional input data to generate a plurality of vectors at each layer of a neural network used for a plurality of convolution operations;
	performing multi-dimensional FFT on the generated vectors and/or the kernels to create output for the convolution operations;
	outputting deep learning processing results for pattern recognition and classification to a host based on the output for the convolution operations.



Dependent claims 4, 5, 6, 7, 8, 9, 10, 11 and 12 of the instant application are substantially similar to claims 2, 3, 5, 7, 8, 9, 10, 11 and 12, respectively, of the Patent. Therefore, they are rejected on the same nonstatutory double patenting ground as their corresponding independent claims.
Dependent claims 16-22 of the instant application are substantially similar to claims 15-21, respectively, of the Patent. Therefore, they are rejected on the same nonstatutory double patenting ground as their corresponding independent claims.
Allowable Subject Matter
Claims 1-22 would be allowable if the nonstatutory double patenting rejection set forth in this Office Action is overcome. 
The most pertinent prior art appears to be "Nn-X - a hardware accelerator for convolutional neural networks" to Gokhale (hereinafter Gokhale) and US Pat. Pub. No. 2017/0132496 to Shoaib et al. (hereinafter Shoaib).
Gokhale discloses a hardware-based programmable deep learning processor (DLP) (fig. 3.1, hardware system processor; section 3…coprocessor is on-chip), comprising: an on-system memory (OSM) (section 3.2…co-processor has cache on-chip, at least W x k x 2 bytes) and one or more controllers configured to access a plurality of external memory resources (fig. 3.1…external memory; section 4.1…data can be stored on disk, via I/O such as memory from USB camera, or other virtual memory locations) via direct memory access (DMA) (fig. 4.2…DMA transactions in coprocessor / programmable logic, the coprocessor having four channels for DMA transfers; sections 4.1 and 4.2…DMA transactions initiated by host processor for data transfer between external memory and coprocessor); a plurality of programmable tensor engines (fig. 3.1…processing elements called collections are processing engines implemented in hardware via programmable logic, the collections input data for convolution operations; fig. 3.2…details of each collection; Section 3.2.1…convolution operations are operations on two-dimensional arrays/matrices, each of which is by definition a 2D tensor) configured to perform a plurality of convolution operations (fig. 3.2 and Section 3.2.1…each collection has a convolution engine performing convolution operations) by applying one or more kernels (Section 3.2.1…w[m,n] are weights of the filter kernels) on multi-dimensional input data (Section 3.2.1…input data x[m,n]) to generate deep learning processing results for pattern recognition and classification based on a neural network (fig. 1, pattern recognition and classification of objects in images using deep learning neural network), wherein each of the tensor engines (fig. 3.2…each collection) further comprises: a data engine configured to prefetch the multi-dimensional input data and/or the kernels from the OSM and/or the external memory resources for the convolution operations (Fig. 4.2 and Section 4.1…DMA engines are implemented on the coprocessor, the DMA engines transfer data to/from external memory and the engines include a buffer to store data until it is required by the coprocessor, e.g., prefetch); one or more vector processing engines (fig. 3.2 and Section 3.2.1…convolution engine performs convolution operations, which are array/vector manipulations) each configured to vectorize the multi-dimensional input data at each layer of the neural network to generate a plurality of vectors (fig. 1.1…various convolution layers; Section 3.2.1…input layer arrays/vectors convolved to generate plurality of output arrays/vectors).
Shoaib discloses performing a multi-dimensional fast Fourier transform (FFT) on the generated vectors and/or the kernels to create output for the convolution operations (Shoaib: fig. 4, item 420 and fig. 6, items 604-608…FFT of input data to create output for convolution operations).
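For illustration only, the general technique of using a multi-dimensional FFT to produce convolution output can be sketched as follows. This is a minimal Python sketch under stated assumptions and is not a characterization of the implementation in Gokhale, Shoaib, the Patent, or the instant application. The function names (fft, fft2, fft_convolve2d, direct_convolve2d), the radix-2 algorithm, the zero-padding to power-of-two sizes, and the sample values are all hypothetical choices made for the example. The input x and kernel w follow the x[m,n] and w[m,n] notation cited from Gokhale.

```python
import cmath


def fft(values, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def fft2(matrix, inverse=False):
    """2D FFT: transform every row, then every column (unnormalized)."""
    rows = [fft(row, inverse) for row in matrix]
    cols = [fft(list(col), inverse) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def next_pow2(n):
    size = 1
    while size < n:
        size *= 2
    return size


def pad(matrix, height, width):
    """Zero-pad a matrix to height x width, keeping data in the top-left corner."""
    padded = [[0.0] * width for _ in range(height)]
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            padded[i][j] = value
    return padded


def fft_convolve2d(x, w):
    """Full linear 2D convolution via the convolution theorem."""
    out_h = len(x) + len(w) - 1
    out_w = len(x[0]) + len(w[0]) - 1
    h, wd = next_pow2(out_h), next_pow2(out_w)
    fx = fft2(pad(x, h, wd))
    fw = fft2(pad(w, h, wd))
    # Pointwise product in the frequency domain equals convolution in space.
    product = [[fx[i][j] * fw[i][j] for j in range(wd)] for i in range(h)]
    y = fft2(product, inverse=True)
    scale = h * wd  # normalization for the unnormalized inverse transform
    return [[y[i][j].real / scale for j in range(out_w)] for i in range(out_h)]


def direct_convolve2d(x, w):
    """Reference full linear 2D convolution computed directly from the definition."""
    out_h = len(x) + len(w) - 1
    out_w = len(x[0]) + len(w[0]) - 1
    y = [[0.0] * out_w for _ in range(out_h)]
    for i, row in enumerate(x):
        for j, xv in enumerate(row):
            for m, wrow in enumerate(w):
                for n, wv in enumerate(wrow):
                    y[i + m][j + n] += xv * wv
    return y


x = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]   # hypothetical input data
w = [[1, 0], [0, -1]]                   # hypothetical 2x2 kernel
via_fft = fft_convolve2d(x, w)
direct = direct_convolve2d(x, w)
```

In this sketch the FFT-based result agrees with the direct computation to within floating-point rounding, which is the property that allows a frequency-domain product to stand in for a spatial convolution.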
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ALAN CHEN whose telephone number is (571) 272-4143. The examiner can normally be reached M-F 10-7.
Examiner interviews are available via telephone, in person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kamran Afshar, can be reached at (571) 272-7796. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/ALAN CHEN/
Primary Examiner, Art Unit 2125