Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED OFFICE ACTION

Status of Claims

Claims 1-11 are pending in this Office Action.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

1.	Claims 1, 3, 4, 5, 6, 8, 9, 10 and 11 are rejected under 35 U.S.C. 103 as being unpatentable over Fengbin Tu (NPL Doc: “Deep Convolutional Neural Network Architecture with Reconfigurable Computation Patterns”, April 12, 2017, IEEE Transactions on Very Large Scale Integration (VLSI) Systems (Volume: 25, Issue: 8, Aug. 2017), Pages 2220-2231) in view of BOESCH et al. (USPUB 20180189641) in further view of HAI WANG (NPL Doc: “Enhanced Efficiency 3D Convolution Based on Optimal FPGA Accelerator”, June 7, 2017, IEEE Access (Volume: 5), Pages 6909-6914).

As per Claim 1, Fengbin Tu teaches a method for performing a convolution operation on an image using a reconfigurable convolution engine, the method comprising: receiving, by a host processor (Page 2221, Fig. 3 – host processor; Page 2221, Col. 2 – “B. Architecture of a DCNN Accelerating System- …a host processor …”), image data for performing a convolution operation on an image by using a convolution engine (Page 2221, Col. 2 – “…The convolution core loads feature maps to its input registers (Input REGs) and then performs convolutions with several parallel convolution engines (CEs),…”), wherein the plurality of instances is allocated based on the kernel size (Page 2225, Col. 2 – “…OOM is highly scalable for different kernel sizes, because PEs can easily collect their required data along the S-shaped sliding trajectory, without changing PE functions. For different kernel sizes, we just need to adjust the trajectory’s length, which is equal to the kernel size (K × K)….”), and wherein each instance, of the plurality of instances, performs parallel row wise convolution operation on the feature map (Page 2221, Col. 2 – “…The convolution core loads feature maps to its input registers (Input REGs) and then performs convolutions with several parallel convolution engines (CEs),…”), and wherein each instance further comprises a set of computing blocks operating concurrently to perform convolution operation on the feature map of the image in order to generate a convolution output (Page 2226, Col. 1 – “…In POOM, we tile each output map into smaller blocks, so that the array can compute more maps in parallel. As the case in Fig. 8(b), the 8×8 array is divided into 16 2×2 blocks, while all blocks share the same input data….” and Page 2228, Fig. 11 and “…DSN1 is a 8 × 8 sharing network with 2×2 blocks. In each cycle, Block0 loads 2×2 data from the input REG level, and transfers them to the other blocks with the interconnections of DSNs. Each block feeds the data to its corresponding 2 × 2 MACs through Port “ds1.” In this way, 8×8 PE blocks always share the same input data, and compute at most 64 different output maps in parallel…”);
	Fengbin Tu does not explicitly teach wherein the image data comprises a feature map and a depth information associated to an image; determining, by the host processor, a kernel size based on the image data, clock speed associated to the convolution engine and a number of available on-chip resources; allocating a plurality of instances, to the host processor, to operate depth wise in parallel mode, and subsequently depth wise convolution operation on different feature maps resulting in layer combining, and aggregating, by the host processor, the convolution output generated by each computing block to produce a convolution result for the image, wherein the convolution output is aggregated using a pipeline adder.  
	However, within analogous art, BOESCH et al. teaches wherein the image data comprises a feature map and a depth information associated to an image ( features and depth information of an input data to the CNN taught within Paragraphs [0055-0056]) ; determining, by the host processor, a kernel size based on the image data ( Paragraphs [0063] and [0072]) , clock speed associated to the convolution engine and a number of available on-chip resources (Fig. 3 showing on chip  resource and clock cycle mentioned within Paragraphs [0072] and [0283]) ; allocating a plurality of instances, to the host processor, to operate depth wise in parallel mode ( Paragraphs [0036-0037] and  [0252]) , and subsequently depth wise convolution operation on different feature maps resulting in layer combining ( Paragraph [0216]- “…Feature and kernel buffer data applied to the CA MAC units 620 is mathematically combined according to the convolutional operations described herein, and the resulting output products from the CA MAC units 620 are passed to the CA adder tree 622….”) , 
	One of ordinary skill in the art would have been motivated to combine the teaching of BOESCH et al. within the modified teaching of the Deep Convolutional Neural Network Architecture with Reconfigurable Computation Patterns mentioned by Fengbin Tu  because the Hardware accelerator engine mentioned by BOESCH et al.   provides a system and method for implementing  convolution accelerators within Deep Convolutional Neural Network systems ( Paragraph [0002]). 
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to implement the Hardware accelerator engine mentioned by BOESCH et al. within the modified teaching of the Deep Convolutional Neural Network Architecture with Reconfigurable Computation Patterns mentioned by Fengbin Tu for implementation of a system and method for convolution accelerators within Deep Convolutional Neural Network systems (Paragraph [0002]).
	Combination of Fengbin Tu and BOESCH et al. does not explicitly teach aggregating, by the host processor, the convolution output generated by each computing block to produce a convolution result for the image, wherein the convolution output is aggregated using a pipeline adder.   
However, within analogous art, HAI WANG teaches aggregating, by the host processor, the convolution output generated by each computing block to produce a convolution result for the image (FIGURES 1 and 7 show the processing of the convolutional computing block and the combining of the data to output, and Page 6912, Col. 2 – “B. OPTIMIZED CONVOLUTION ACCELERATOR IN FPGA …”), wherein the convolution output is aggregated using a pipeline adder (Page 6913, Col. 1 – FIGURE 8 – fully pipelined calculators for the output).

	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to implement the Enhanced Efficiency 3D Convolution Based on Optimal FPGA Accelerator mentioned by HAI WANG within the combined modified teaching of the Deep Convolutional Neural Network Architecture with Reconfigurable Computation Patterns mentioned by Fengbin Tu and the Hardware accelerator engine mentioned by BOESCH et al. for implementation of a system and method for FPGA within the CNN acceleration platform for improving performance.
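
For illustration only, the arrangement recited in claim 1 (computing blocks that each perform a row wise convolution, with their partial outputs aggregated by an adder) can be sketched in software. This is a minimal sketch and is not taken from the application or from any cited reference: the names `conv_row`, `adder_tree`, and `convolve_rowwise` are hypothetical, the Python loops stand in for hardware blocks that would operate concurrently, and a pairwise reduction tree is assumed as a stand-in for a pipeline adder.

```python
from typing import List

Matrix = List[List[float]]


def conv_row(image: Matrix, kernel: Matrix, out_row: int, k_row: int) -> List[float]:
    """One hypothetical 'computing block': slide a single kernel row along
    the matching image row and return a partial output row."""
    k = len(kernel)
    width = len(image[0]) - k + 1          # valid (unpadded) output width
    src = image[out_row + k_row]
    taps = kernel[k_row]
    return [sum(src[c + j] * taps[j] for j in range(k)) for c in range(width)]


def adder_tree(partials: List[List[float]]) -> List[float]:
    """Pairwise reduction of partial rows; each pass of the while loop
    models one stage of an assumed pipelined adder."""
    level = partials
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):
            nxt.append([a + b for a, b in zip(level[i], level[i + 1])])
        if len(level) % 2:                 # odd element passes through a stage
            nxt.append(level[-1])
        level = nxt
    return level[0]


def convolve_rowwise(image: Matrix, kernel: Matrix) -> Matrix:
    """Stride-1, unpadded 2-D convolution (cross-correlation form, as is
    conventional for CNN layers), built from row wise partial results."""
    k = len(kernel)
    height = len(image) - k + 1
    return [
        adder_tree([conv_row(image, kernel, r, kr) for kr in range(k)])
        for r in range(height)
    ]
```

In this sketch a K × K kernel yields K row wise partial results per output row, so the reduction needs about log2(K) adder stages; whether the cited references or the application arrange their adders this way is not established by the sketch.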

As per Claim 3, Combination of Fengbin Tu, BOESCH et al. and HAI WANG teach claim 1,
Combination of Fengbin Tu and HAI WANG does not explicitly teach wherein the image data comprises pixel resolution, number of filters to be applied, and a convolution layer.
However, within analogous art,  BOESCH et al. teaches wherein the image data comprises pixel resolution, number of filters to be applied, and a convolution layer ( Paragraphs [0036-0037] and convolution layer taught within Paragraph [0227] ) .  

As per Claim 4, Combination of Fengbin Tu, BOESCH et al. and HAI WANG teach claim 1,
Combination of Fengbin Tu and BOESCH et al. does not explicitly teach wherein result of each computing block operating in pipeline is aggregated using the pipeline adder to generate the convolution result.
However, within analogous art, HAI WANG teaches wherein result of each computing block operating in pipeline is aggregated using the pipeline adder to generate the convolution result (Page 6913 – FIGURE 7 showing the input adders within the pipeline where the data is aggregated from the above computing block (2D convolvers array, IDDLs and kernel caches), and Col. 1 – lines 1-3 and Col. 2 – lines 1-7).

As per Claim 5,  Combination of Fengbin Tu, BOESCH et al. and HAI WANG teach claim 1,
Fengbin Tu teaches wherein one or more instances of the plurality of instances are grouped in a cluster (Page 2227, Col. 2 – “… The MAC level is a cluster of 16×16 multiply accumulators (MAC), distributed in 16×16 PEs. Each MAC’s input register R0 loads data from the input REG level (Port “in”) and data sharing level (Port “ds0,” “ds1,” and “ds2”)….”) to perform convolution operation on the feature map (Page 2221, Col. 1 – “…Fig. 2(a) shows a CONV layer in DCNNs. It takes N×H×L feature maps as the inputs, and has M 3-D convolutional kernels (K ×K ×N). Each kernel performs a 3-D convolution on the input maps with a sliding stride of S,….”).
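
For illustration only, the CONV layer dimensions in the quoted passage (N×H×L input feature maps, M kernels of size K×K×N, sliding stride S) can be related to the output size and the multiply-accumulate count as sketched below. This is a minimal sketch under stated assumptions: no padding is assumed (the quoted passage does not mention padding), and the function name `conv_layer_shape` is hypothetical rather than taken from any cited reference.

```python
from typing import Tuple


def conv_layer_shape(n: int, h: int, l: int, m: int, k: int, s: int) -> Tuple[Tuple[int, int, int], int]:
    """Return ((M, R, C), MAC count) for a CONV layer with N x H x L inputs,
    M kernels of size K x K x N, and stride S. Assumes no padding."""
    if k > h or k > l:
        raise ValueError("kernel larger than input map")
    r = (h - k) // s + 1                   # output rows
    c = (l - k) // s + 1                   # output columns
    macs = m * r * c * k * k * n           # one MAC per kernel weight per output value
    return (m, r, c), macs
```

For example, three 5×5 input maps with two 3×3×3 kernels at stride 1 give a 2×3×3 output and 486 multiply-accumulates, which is the kind of workload a cluster of MACs would divide among its processing elements.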

As per Claim 6, Fengbin Tu teaches a reconfigurable convolution engine for performing a convolution operation on an image (Page 2220, Col. 2 – “… a reconfigurable architecture called deep neural architecture (DNA), with reconfigurable computation patterns for different DCNNs….”), the reconfigurable convolution engine comprising: a host processor (Page 2221, Fig. 3 – host processor; Page 2221, Col. 2 – “B. Architecture of a DCNN Accelerating System- …a host processor …”); and a memory coupled to the host processor, wherein the host processor is capable of executing a set of instructions stored in the memory (Page 2221, Fig. 3 – DRAM memory and Page 2222, Col. 2 – “…we propose a hybrid data reuse pattern that combines three basic reuse patterns. Memory access times are used to measure the data movements between the core and buffers based on the architecture model in Fig. 3. Due to the limited storage in the convolution core,…”), and wherein the set of instructions comprises: receiving image data for performing a convolution operation on an image by using a convolution engine (Page 2221, Col. 2 – “…The convolution core loads feature maps to its input registers (Input REGs) and then performs convolutions with several parallel convolution engines (CEs),…”), wherein the plurality of instances is based on the kernel size (Page 2225, Col. 2 – “…OOM is highly scalable for different kernel sizes, because PEs can easily collect their required data along the S-shaped sliding trajectory, without changing PE functions. For different kernel sizes, we just need to adjust the trajectory’s length, which is equal to the kernel size (K × K)….”), and wherein each instance, of the plurality of instances, performs parallel row wise convolution operation on the feature map (Page 2221, Col. 2 – “…The convolution core loads feature maps to its input registers (Input REGs) and then performs convolutions with several parallel convolution engines (CEs),…”), wherein each instance further comprises a set of computing blocks operating concurrently to perform convolution operation on the feature map of the image in order to generate a convolution output (Page 2226, Col. 1 – “…In POOM, we tile each output map into smaller blocks, so that the array can compute more maps in parallel. As the case in Fig. 8(b), the 8×8 array is divided into 16 2×2 blocks, while all blocks share the same input data….” and Page 2228, Fig. 11 and “…DSN1 is a 8 × 8 sharing network with 2×2 blocks. In each cycle, Block0 loads 2×2 data from the input REG level, and transfers them to the other blocks with the interconnections of DSNs. Each block feeds the data to its corresponding 2 × 2 MACs through Port “ds1.” In this way, 8×8 PE blocks always share the same input data, and compute at most 64 different output maps in parallel…”);
	Fengbin Tu does not explicitly teach wherein the image data comprises a feature map and a depth information associated to an image; determining a kernel size based on the image data, clock speed associated to the convolution engine and number of available on-chip resources  ; allocating a plurality of instances to operate depth wise in parallel mode , and subsequently depth wise convolution operation on different feature maps resulting in layer combining ,and aggregating convolution output of each computing block for each instance of the plurality of instances to produce a convolution result for the image , wherein the convolution output is aggregated using a pipeline adder .  
	However, within analogous art, BOESCH et al. teaches wherein the image data comprises a feature map and a depth information associated to an image (features and depth information of an input data to the CNN taught within Paragraphs [0055-0056]); determining a kernel size based on the image data (Paragraphs [0063] and [0072]), clock speed associated to the convolution engine and number of available on-chip resources (Fig. 3 showing on-chip resource and clock cycle mentioned within Paragraphs [0072] and [0283]); allocating a plurality of instances to operate depth wise in parallel mode (Paragraphs [0036-0037] and [0252]), and subsequently depth wise convolution operation on different feature maps resulting in layer combining (Paragraph [0216] – “…Feature and kernel buffer data applied to the CA MAC units 620 is mathematically combined according to the convolutional operations described herein, and the resulting output products from the CA MAC units 620 are passed to the CA adder tree 622….”).
	One of ordinary skill in the art would have been motivated to combine the teaching of BOESCH et al. within the modified teaching of the Deep Convolutional Neural Network Architecture with Reconfigurable Computation Patterns mentioned by Fengbin Tu  because the Hardware accelerator engine mentioned by BOESCH et al.   provides a system and method for implementing  convolution accelerators within Deep Convolutional Neural Network systems ( Paragraph [0002]). 
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to implement the Hardware accelerator engine mentioned by BOESCH et al. within the modified teaching of the Deep Convolutional Neural Network Architecture with Reconfigurable Computation Patterns mentioned by Fengbin Tu for implementation of a system and method for convolution accelerators within Deep Convolutional Neural Network systems (Paragraph [0002]).
	Combination of Fengbin Tu and BOESCH et al. does not explicitly teach aggregating convolution output of each computing block for each instance of the plurality of instances to produce a convolution result for the image, wherein the convolution output is aggregated using a pipeline adder .  
 	However, within analogous art, HAI WANG teaches aggregating convolution output of each computing block for each instance of the plurality of instances to produce a convolution result for the image( FIGURE 1  and 7 shows the processing of convolutional computing block and the combining of the data to output and Page 6912, Col. 2- “B. OPTIMIZED CONVOLUTION ACCELERATOR IN FPGA …” ) , wherein the convolution output is aggregated using a pipeline adder ( Page 6913, Col. 1- FIGURE 8 – fully pipelined calculators for the output ) .  
	One of ordinary skill in the art would have been motivated to combine the teaching of HAI WANG within the  combined modified teaching of the Deep Convolutional Neural Network Architecture with Reconfigurable Computation Patterns mentioned by Fengbin Tu and the Hardware accelerator engine mentioned by BOESCH et al.   because the Enhanced Efficiency 3D Convolution Based on Optimal FPGA Accelerator mentioned HAI WANG provides a system and method for implementing  FPGA within the CNN acceleration platform for improving performance. 
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to implement the Enhanced Efficiency 3D Convolution Based on Optimal FPGA Accelerator mentioned by HAI WANG within the combined modified teaching of the Deep Convolutional Neural Network Architecture with Reconfigurable Computation Patterns mentioned by Fengbin Tu and the Hardware accelerator engine mentioned by BOESCH et al. for implementation of a system and method for FPGA within the CNN acceleration platform for improving performance.

As per Claim 8, Combination of Fengbin Tu, BOESCH et al. and HAI WANG teach claim 6,
Combination of Fengbin Tu and HAI WANG does not explicitly teach wherein the image data comprises pixel resolution, number of filters to be applied, and a convolution layer.
However, within analogous art, BOESCH et al. teaches wherein the image data comprises pixel resolution, number of filters to be applied, and a convolution layer( Paragraphs [0036-0037] and convolution layer taught within Paragraph [0227] ).  

As per Claim 9, Combination of Fengbin Tu, BOESCH et al. and HAI WANG teach claim 6,
Combination of Fengbin Tu and BOESCH et al. does not explicitly teach wherein result of each computing block operating in pipeline is aggregated using the pipeline adder to generate the convolution result.
However, within analogous art, HAI WANG teaches wherein result of each computing block operating in pipeline is aggregated using the pipeline adder to generate the convolution result (Page 6913 – FIGURE 7 showing the input adders within the pipeline where the data is aggregated from the above computing block (2D convolvers array, IDDLs and kernel caches), and Col. 1 – lines 1-3 and Col. 2 – lines 1-7).

As per Claim 10, Combination of Fengbin Tu, BOESCH et al. and HAI WANG teach claim 6, 
Fengbin Tu teaches wherein one or more instances of the plurality of instances are grouped in a cluster (Page 2227, Col. 2 – “… The MAC level is a cluster of 16×16 multiply accumulators (MAC), distributed in 16×16 PEs. Each MAC’s input register R0 loads data from the input REG level (Port “in”) and data sharing level (Port “ds0,” “ds1,” and “ds2”)….”) to perform convolution operation on the feature map (Page 2221, Col. 1 – “…Fig. 2(a) shows a CONV layer in DCNNs. It takes N×H×L feature maps as the inputs, and has M 3-D convolutional kernels (K ×K ×N). Each kernel performs a 3-D convolution on the input maps with a sliding stride of S,….”).

As per Claim 11, Fengbin Tu teaches the program comprising a program code: a program code for receiving image data for performing a convolution operation on an image by using a convolution engine ( Page 2221, Col. 2- “…The convolution core loads feature maps to its input registers (Input REGs) and then performs convolutions with several parallel convolution engines (CEs),…”) , wherein the plurality of instances is based on the kernel size (Page 2225, Col. 2- “…OOM is highly scalable for different kernel sizes, because PEs can easily collect their required data along the S-shaped sliding trajectory, without changing PE functions. For different kernel sizes, we just need to adjust the trajectory’s length, which is equal to the kernel size (K × K)….”) , and wherein each instance, of the plurality of instances, performs parallel row wise convolution operation on the feature map ( Page 2221, Col. 2- “…The convolution core loads feature maps to its input registers (Input REGs) and then performs convolutions with several parallel convolution engines (CEs),…”), and wherein each instance further comprises a set of computing blocks operating concurrently to perform convolution operation on the feature map of the image in order to generate a convolution output ( Page 2226, Col. 1 – “…In POOM, we tile each output map into smaller blocks, so that the array can compute more maps in parallel.As the case in Fig. 8(b), the 8×8 array is divided into 16 2×2 blocks, while all blocks share the same input data….”  And  Page 2228- Fig. 11 and “…DSN1 is a 8 × 8 sharing network with 2×2 blocks. In each cycle, Block0 loads 2×2 data from the input REG level, and transfers them to the other blocks with the interconnections of DSNs. Each block feeds the data to its corresponding 2 × 2 MACs through Port “ds1.” In this way, 8×8 PE blocks always share the same input data, and compute at most 64 different output maps in parallel…”) ; 
	Fengbin Tu does not explicitly teach A non-transitory computer readable medium embodying a program executable in a computing device for performing a convolution operation on an image using a reconfigurable convolution engine, wherein the image data comprises a feature map and a depth information associated to an image; a program code for determining a kernel size based on the image data, clock speed associated to the convolution engine and number of available on-chip resources; a program code for allocating a plurality of instances to operate in depth wise in parallel mode, and subsequently depth wise convolution operation on different feature maps resulting in layer combining, and a program code for aggregating convolution output of each computing block for each instance of the plurality of instances to produce a convolution result for the image, wherein the convolution output is aggregated using a pipeline adder.  
	However, within analogous art, BOESCH et al. teaches a non-transitory computer readable medium embodying a program executable in a computing device for performing a convolution operation on an image using a reconfigurable convolution engine (Paragraph [0240] – “…the configuration registers may be accessed and programmed by a host processor such as applications processor 128, a DSP of DSP cluster 122, a command passed into the stream switch 500 and processed by message/command logic 512, or by some other circuitry….” and [0296] – “…processing chains by programming the stream switch 500. The programming may be carried out by storing particular values in particular ones of the CAF control registers 402 (FIG. 4)…”), wherein the image data comprises a feature map and a depth information associated to an image (features and depth information of an input data to the CNN taught within Paragraphs [0055-0056]); a program code for determining a kernel size based on the image data (Paragraphs [0063] and [0072]), clock speed associated to the convolution engine and number of available on-chip resources (Fig. 3 showing on-chip resource and clock cycle mentioned within Paragraphs [0072] and [0283]); a program code for allocating a plurality of instances to operate in depth wise in parallel mode (Paragraphs [0036-0037] and [0252]), and subsequently depth wise convolution operation on different feature maps resulting in layer combining (Paragraph [0216] – “…Feature and kernel buffer data applied to the CA MAC units 620 is mathematically combined according to the convolutional operations described herein, and the resulting output products from the CA MAC units 620 are passed to the CA adder tree 622….”).
	One of ordinary skill in the art would have been motivated to combine the teaching of BOESCH et al. within the modified teaching of the Deep Convolutional Neural Network Architecture with Reconfigurable Computation Patterns mentioned by Fengbin Tu  because the Hardware accelerator engine mentioned by BOESCH et al.   provides a system and method for implementing  convolution accelerators within Deep Convolutional Neural Network systems ( Paragraph [0002]). 
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to implement the Hardware accelerator engine mentioned by BOESCH et al. within the modified teaching of the Deep Convolutional Neural Network Architecture with Reconfigurable Computation Patterns mentioned by Fengbin Tu for implementation of a system and method for convolution accelerators within Deep Convolutional Neural Network systems (Paragraph [0002]).
Combination of Fengbin Tu and BOESCH et al. does not explicitly teach a program code for aggregating convolution output of each computing block for each instance of the plurality of instances to produce a convolution result for the image, wherein the convolution output is aggregated using a pipeline adder.  
	However, within analogous art, HAI WANG teaches a program code for aggregating convolution output of each computing block for each instance of the plurality of instances to produce a convolution result for the image (FIGURES 1 and 7 show the processing of the convolutional computing block and the combining of the data to output, and Page 6912, Col. 2 – “B. OPTIMIZED CONVOLUTION ACCELERATOR IN FPGA …”), wherein the convolution output is aggregated using a pipeline adder (Page 6913, Col. 1 – FIGURE 8 – fully pipelined calculators for the output).
	One of ordinary skill in the art would have been motivated to combine the teaching of HAI WANG within the  combined modified teaching of the Deep Convolutional Neural Network Architecture with Reconfigurable Computation Patterns mentioned by Fengbin Tu and the Hardware accelerator engine mentioned by BOESCH et al.   because the Enhanced Efficiency 3D Convolution Based on Optimal FPGA Accelerator mentioned HAI WANG provides a system and method for implementing  FPGA within the CNN acceleration platform for improving performance. 
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to implement the Enhanced Efficiency 3D Convolution Based on Optimal FPGA Accelerator mentioned by HAI WANG within the combined modified teaching of the Deep Convolutional Neural Network Architecture with Reconfigurable Computation Patterns mentioned by Fengbin Tu and the Hardware accelerator engine mentioned by BOESCH et al. for implementation of a system and method for FPGA within the CNN acceleration platform for improving performance.

It is noted that any citations to specific pages, columns, lines, or figures in the prior art references and any interpretation of the references should not be considered to be limiting in any way. A reference is relevant for all it contains and may be relied upon for all that it would have reasonably suggested to one having ordinary skill in the art. See MPEP 2123.

Allowable Subject Matter

2.          Claims 2 and 7 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

3.         The following is an examiner’s statement of reasons for the indication of allowable subject matter:

As to claim 2, prior art of record does not teach or suggest the limitation mentioned within claim 2: “…clustering one or more computing blocks with other computing blocks present in the set of computing blocks, when convolution operation performed, for an intermediate layer of the image, by the one or more computing blocks is complete, and wherein the other computing blocks are performing convolution operation on subsequent images.”

As to claim 7, prior art of record does not teach or suggest the limitation mentioned within claim 7: “…cluster one or more computing blocks with other computing blocks present in the set of computing blocks, when convolution operation performed, for an intermediate layer of the image, by the one or more computing blocks is complete, and wherein the other computing blocks are performing convolution operation on subsequent images.”

Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”

Examiner’s Notes

4.	The Examiner acknowledges the prior art listed below as pertinent to the claim limitations and inventive concept of the current application. Although these references were not relied upon to address the limitations within the claims, they are analogous art addressing key points of the inventive concept (reconfigurable convolution engines within neural networks, deep neural networks (DNN), FPGAs, image feature maps, pluralities of filters, kernel sizes, etc.).

1) 	USPUB- 20200118423
2)	USPUB- 20190122378
3)	USPUB- 20190043242
4) 	USPUB- 20190043203
5) 	USPUB- 20170356976
6) 	USPUB - 20170345130

7)	Yufei Ma, "Optimizing the Convolution Operation to Accelerate Deep Neural Networks on FPGA", 03 April 2018, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 26, No. 7, July 2018, Pages 1354-1365.

8)	Clément Farabet, "Hardware Accelerated Convolutional Neural Networks for Synthetic Vision Systems", 03 August 2010, Proceedings of 2010 IEEE International Symposium on Circuits and Systems, Pages 257-259.

9)	Yu Wang, "Reconfigurable Processor for Deep Learning in Autonomous Vehicles", 25 September 2017, ITU Journal: ICT Discoveries, Special Issue No. 1, 25 Sept. 2017, Pages 1-8.

10)	Gabriel J. García, "A Survey on FPGA-Based Sensor Systems: Towards Intelligent and Reconfigurable Low-Power Sensors for Computer Vision, Control and Signal Processing", 31 March 2014, Sensors 2014, 14, 6247-6278; doi:10.3390/s140406247, Pages 6247-6260.

Conclusion

5. 	Any inquiry concerning this communication or earlier communications from the examiner should be directed to OMAR S. ISMAIL whose telephone number is (571)272-9799 and Fax # (571)273-9799. The examiner can normally be reached on M-F: 9:00 AM - 6:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http:/ If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, David C. Payne can be reached on (571)272-3024. The fax phone number for the organization where this application or proceeding is assigned is (571)273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/OMAR S ISMAIL/
Primary Examiner, Art Unit 2637