DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statements (IDSs) submitted on 06/22/2021, 07/19/2021, 09/16/2021, 12/28/2021 and 04/06/2022 have been entered and considered. Initialed copies of the PTO-1449 by the Examiner are attached.

Specification
The title of the invention is not descriptive.  A new title is required that is clearly indicative of the invention to which the claims are directed. 
The following title is suggested: PROCESSING APPARATUS FOR PERFORMING PROCESSING USING A CONVOLUTIONAL NEURAL NETWORK. 

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1-10, 12 and 14-18 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Yu et al (NPL titled: Light-OPU: An FPGA-based Overlay Processor for Lightweight Convolutional Neural Networks) (Cited in IDS). 
 	As to independent claim 1, Yu discloses a processing apparatus for performing operations with a convolutional neural network having a plurality of layers (an FPGA-based overlay processor with a corresponding compilation flow for general LW-CNN accelerations – see abstract), the apparatus comprising: a data holder (on chip buffer – see section 4, [p][002]) configured to hold at least some of data of a plurality of channels in a target layer among the plurality of layers (see section 4, [p][002] and Fig. 4); a plurality of processors (computation engine – see section 4, [p][001]), each configured to perform, in parallel, a product-sum operation using the data of one channel of the target layer and a coefficient corresponding to the target layer (“we develop two operation modes for the computation engine. With conventional mode targeting at traditional convolutional layers, channel parallelism is explored and FM gets reused. For DW-mode, multiple extra levels of parallelism are explored to handle the DW-CONV layer” – see section 4.1, [p][001] and computation engine is able to operate in different modes according to layer types in order to explore different combinations of parallelism – see section 4, [p][001]); and a selector configured to select whether to perform first processing or second processing on the basis of information specifying processing in the target layer (see section 4, [p][001] – “Hardware modules in Light-OPU are parameter tunable, which switch modes at run-time based on parameter registers updated by instructions. The computation engine is able to operate in different modes according to layer types in order to explore different combinations of parallelism”), the first processing including inputting the data of one channel of the target layer, held by the data holder, into one of the plurality of processors (see section 4.1.2 – DW Mode), and the second processing including inputting the data of one channel of the target layer, held by the data holder, to the plurality of processors in parallel (see section 4.1.1 – Conventional Mode).
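
For illustration only, the distinction relied upon above between the first processing (the data of one channel input to one processor, mapped to Yu's DW mode) and the second processing (the data of one channel input to all processors, mapped to Yu's conventional mode) can be sketched in Python as follows. This is a minimal sketch under stated assumptions and is neither Yu's implementation nor the applicant's: the names product_sum, first_processing, second_processing and select_and_run, the layer-type strings, and the flattening of each channel to a 1D list are all hypothetical, and the hardware parallelism is modeled with ordinary loops.

```python
from typing import List, Sequence

Vector = Sequence[float]


def product_sum(data: Vector, coeffs: Vector) -> float:
    """One hypothetical 'processor': multiply element-wise, then total the products."""
    return sum(d * c for d, c in zip(data, coeffs))


def first_processing(channels: List[Vector], coeffs: List[Vector]) -> List[float]:
    """Depthwise-style: channel p is input to processor p only.

    coeffs[p] is the coefficient vector used by processor p.
    """
    return [product_sum(channels[p], coeffs[p]) for p in range(len(channels))]


def second_processing(channels: List[Vector],
                      coeffs: List[List[Vector]]) -> List[float]:
    """Conventional-style: each channel is input to every processor.

    coeffs[p][c] is the coefficient vector for output channel p and
    input channel c. Processor p accumulates over all input channels.
    """
    outputs = [0.0] * len(coeffs)
    for c, channel in enumerate(channels):   # channels are input in sequence
        for p in range(len(coeffs)):         # same channel to all processors
            outputs[p] += product_sum(channel, coeffs[p][c])
    return outputs


def select_and_run(layer_type: str, channels, coeffs) -> List[float]:
    """Hypothetical 'selector': choose the processing from layer information."""
    if layer_type == "depthwise":
        return first_processing(channels, coeffs)
    if layer_type == "conventional":
        return second_processing(channels, coeffs)
    raise ValueError(f"unknown layer type: {layer_type}")
```

In this sketch the first processing yields one output per input channel, whereas in the second processing every output depends on every input channel, which is the difference between the two modes that the rejection relies upon.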

 	As to claim 2, Yu teaches the processing apparatus, wherein in the second processing, the data of each one channel among the plurality of channels in the target layer is input to the plurality of processors from the data holder in parallel (“channel parallelism” – see section 4.1.1 and “input and output channel parallelism” – see Fig. 5).

 	As to claim 3, Yu teaches the processing apparatus, wherein in the second processing, each of the plurality of processors is further configured to output a computation result corresponding to one channel in a next layer after the target layer, using the data of a corresponding channel in the target layer which has been input in sequence (“[t]he computation engine is able to operate in different modes according to layer types in order to explore different combinations of parallelism” – see section 4, [p][001] and “Post process” section in Fig 4).

 	As to claim 4, Yu teaches the processing apparatus, further comprising: a plurality of data storages, each corresponding to a different one of the plurality of processors, wherein in the first processing, each of the plurality of data storages is configured to supply the data of the one channel in the target layer to a corresponding one of the processors (intra-kernel parallelism – see section 4.1.2 and Fig. 6), and in the second processing, one data storage among the plurality of data storages is configured to supply the data of the same one channel in the target layer to each of the plurality of processors (“channel parallelism” – see section 4.1.1 and Fig. 5).

 	As to independent claim 5, this claim differs from claim 1 in that claim 1 is a method whereas claim 5 is an apparatus, and in that the element of an accumulator configured to accumulate processing results of each of the plurality of processors is additionally recited. Yu discloses a Light-OPU (see abstract) including an accumulator configured to accumulate processing results of each of the plurality of processors (“selective adder trees after PE array” – see section 4.1.1).
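
For illustration only, the accumulation of the processing results of a plurality of processors by an adder tree can be sketched as a pairwise reduction. This is a minimal sketch under stated assumptions: the function name adder_tree is hypothetical, and the sketch does not model how the adder trees of Yu are made "selective", which the passage cited above does not detail.

```python
from typing import List, Sequence


def adder_tree(results: Sequence[float]) -> float:
    """Accumulate processor results by pairwise addition, level by level.

    Each pass of the loop corresponds to one level of a hardware adder
    tree; an unpaired value is carried to the next level unchanged.
    """
    level: List[float] = list(results)
    if not level:
        return 0
    while len(level) > 1:
        next_level = [level[i] + level[i + 1]
                      for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:  # odd count: carry the last value forward
            next_level.append(level[-1])
        level = next_level
    return level[0]
```

For example, adder_tree([1, 2, 3, 4, 5]) reduces [1, 2, 3, 4, 5] to [3, 7, 5], then to [10, 5], then to 15.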

 	Claim 6 is rejected for the same reasons as set forth in the rejection of claim 2, as claim 2 is a method claim for the apparatus claimed in claim 6.

 	As to claim 7, Yu teaches the processing apparatus, further comprising: a plurality of result storages (“Output buffers” – see Fig. 4), each corresponding to one of the plurality of processors (note that the computation engine is connected to the post process, which sends output to the output buffers – see Fig. 4), wherein in the first processing, each of the plurality of processors is further configured to output the processing result to a corresponding one of the result storages (“[f]or a DW-CONV with n input channels and n output channels, each of the output FM channels is produced by one kernel channel convolving with only one input FM channel” – see section 4.1), and in the second processing, each of the plurality of processors is further configured to output the processing result to the accumulator (residual addition and dense block concatenation are also included – see section 6.1), and the accumulator is further configured to output a result of the accumulating to one of the plurality of result storages (“selective adder trees after PE array, the computation engine is able to efficiently handle the computation” – see section 4.1.1).

 	Claim 8 is rejected for the same reasons as set forth in the rejection of claim 4, as claim 4 is a method claim for the apparatus claimed in claim 8.

	As to claim 9, Yu teaches the processing apparatus, wherein in the first processing, each of the plurality of processors is further configured to output a computation result corresponding to one channel in a next layer after the target layer, using the data of one channel in the target layer (“[f]or a DW-CONV with n input channels and n output channels, each of the output FM channels is produced by one kernel channel convolving with only one input FM channel” – see section 4.1).

	As to claim 10, Yu teaches the processing apparatus, further comprising: a coefficient holder configured to hold at least some of the coefficients used in the product-sum operations in the target layer (“Kernel buffers” – see Fig. 4); and a supply controller configured to control a supply of data from the data holder and the coefficient holder to the plurality of processors (“Data fetch” – see Fig. 4), wherein each of the plurality of processors is further configured to perform the product-sum operation by calculating a product of one piece of the data and one of the coefficients which have been input and then totaling the calculated products (“[o]ne PE computes the inner product of two 1D vectors of length N” – see section 3.1).

 	As to claim 12, Yu teaches the processing apparatus, wherein the coefficients are filter weighting coefficients of a filter for convolution processing (see kernels in Figs. 5 and 6), and a size of the filter is configurable for each of the layers (e.g., kernel sizes 1x1, 3x3, 5x5 and 7x7 in Table 4).

 	As to claim 14, Yu teaches the processing apparatus, wherein the data holder is memory (“on-chip buffers” – see Fig. 4), each of the processors includes a computing core having a multiplier and an adder (“[o]ne PE computes the inner product of two 1D vectors of length N” – see section 3.1, [p][002], bullet 3), and the processing apparatus includes a chip on which the memory and the computing cores are provided (computation engine – see Fig. 4).

 	As to claim 15, Yu teaches the processing apparatus, wherein the selector includes an address designator configured to designate an address, in the memory, of the data input to the computing cores, the address designator being provided on the chip (data selection, data fetch, data copy – see Fig. 4).

	As to claim 16, Yu teaches the processing apparatus, wherein the selector includes a multiplexer configured to select an input to the computing cores from a plurality of sources or to select one output among outputs from the plurality of computing cores, the multiplexer being provided on the chip (data selection, data fetch, data copy – see Fig. 4).

	As to claim 17, Yu teaches the processing apparatus, wherein operations with the convolutional neural network are performed for a processing target image (camera in – see Fig. 1), and the data in the target layer is a feature image obtained in a process of performing the operations with the convolutional neural network (see section 1, [p][003] – “[l]ight-OPU accelerates conventional convolution”).

 	As to claim 18, Yu teaches the processing apparatus, further comprising: a controller configured to control the plurality of processors to perform the operations with the convolutional neural network on the processing target image (Compute controls all processing elements (PEs) – see section 3.1); and an image processor configured to perform image processing on the processing target image on the basis of a processing result obtained by performing the operations with the convolutional neural network on the processing target image (processed results – see Fig. 1).

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.


Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Yu et al in view of Horio et al (Pub No.: 2015/0074163).
 	As to claim 11, Yu discloses the processing apparatus, wherein the data and the coefficients used in the product-sum operations in the target layer are classified into a plurality of groups (“[f]or example, Fig. 3 shows a fragment of the instruction execution process for one FM block’s computation, where several instructions executed at different time points can be grouped together and read at the same time” – see section 3.2, [p][002]); however, Yu does not expressly disclose that the selector is further configured to select whether to perform the first processing or the second processing on the basis of the group of the data and the coefficients used in the product-sum operations.
 	Horio discloses a product-sum operation circuit wherein the selector is further configured to select whether to perform the first processing or the second processing on the basis of the group of the data and the coefficients used in the product-sum operations (“an input selector which outputs an element of the first matrix and an element of the second matrix to input terminals of the plurality of multipliers according to the number of rows and the number of columns of the first matrix and the second matrix; and an output selector selects and outputs the addition results of the plurality of first adders or the plurality of second adders according to the number of rows and columns of the first matrix and the second matrix, as the third matrix” – see [p][0010]).
Yu & Horio are combinable because they are directed to product sum multipliers.
Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to have added the product-sum operation circuit of Horio to the Light-OPU of Yu. The suggestion/motivation for doing so would have been to select and output the addition results of each of the plurality of first adders or each of the plurality of second adders according to the number of rows and the number of columns of the first matrix and the second matrix, as the third matrix (see abstract).
 	Therefore, it would have been obvious to combine Yu with Horio to obtain the invention as specified in claim 11.

Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over Yu et al in view of Sun et al (NPL titled: A Lightweight Neural Network Combining Dilated Convolution and Depthwise Separable Convolution).
 	As to claim 13, Yu does not expressly disclose the processing apparatus, wherein the coefficients are filter weighting coefficients of a filter for dilated convolution processing.
 	Sun discloses a lightweight neural network wherein the coefficients are filter weighting coefficients of a filter for dilated convolution processing (“dilated convolution as a filter to extract the feature of the image” – see section 2, [p][001]).
Yu & Sun are combinable because they are directed to product sum multipliers.
Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to have added the lightweight neural network of Sun to the Light-OPU of Yu. The suggestion/motivation for doing so would have been to add a dilation filter to obtain more image information without increasing the amount of calculation, then to use the dilated filter to convolve each input channel, with the final filter combining the output of the different convolution channels (see section 2.1, [p][001]).
 	Therefore, it would have been obvious to combine Yu with Sun to obtain the invention as specified in claim 13.
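
For illustration only, the dilated convolution processing referred to in the rejection of claim 13 can be sketched as follows. This is a minimal sketch under stated assumptions and is not Sun's implementation: the function name dilated_conv1d is hypothetical, the sketch is one-dimensional for brevity although Sun concerns images, no padding is applied, and the filter is applied without flipping, as is customary for convolutional neural networks.

```python
from typing import List, Sequence


def dilated_conv1d(signal: Sequence[float],
                   weights: Sequence[float],
                   dilation: int = 1) -> List[float]:
    """Apply a filter whose taps are spaced `dilation` samples apart.

    out[i] = sum over k of weights[k] * signal[i + k * dilation]

    The number of multiplications per output equals len(weights)
    regardless of the dilation, while the span of input samples
    covered grows with the dilation.
    """
    if dilation < 1:
        raise ValueError("dilation must be at least 1")
    span = (len(weights) - 1) * dilation   # extra samples the filter covers
    out_len = max(len(signal) - span, 0)   # "valid" outputs only, no padding
    return [
        sum(w * signal[i + k * dilation] for k, w in enumerate(weights))
        for i in range(out_len)
    ]
```

With the two-tap filter [1, 1] and the input [1, 2, 3, 4, 5], a dilation of 1 gives [3, 5, 7, 9] and a dilation of 2 gives [4, 6, 8]; each output still costs two multiplications, but with a dilation of 2 each output draws on samples two positions apart.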

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Park et al (Pub No.: 20220335282) discloses a Neural Processing Unit For Reusing Weights During Depth-wise Convolution Operation.
NAGAMATSU et al (Pub No.: 20220300253) discloses an ARITHMETIC OPERATION DEVICE AND ARITHMETIC OPERATION SYSTEM.

Inquiries 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ANDRAE S. ALLISON whose telephone number is (571)270-1052.  The examiner can normally be reached on Monday-Friday, 8:00 am - 5:00 pm, EST.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Vu Le, can be reached at (571) 272-7223. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/ANDRAE S ALLISON/
Primary Examiner, Art Unit 2663
December 2, 2022