Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114
1.  A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on April 20th, 2022 has been entered.

Response to Arguments
2.  Applicant’s arguments, filed April 20th, 2022, with respect to the rejections of the independent claims under 35 USC 103 have been fully considered and are persuasive.  Therefore, the rejections have been withdrawn.  However, upon further consideration, new grounds of rejection are made in view of Son et al (US 2018/0129893).

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

3.  Claims 1 and 3-20 are rejected under 35 U.S.C. 103 as being unpatentable over Liu et al (US 2020/0326938, herein Liu) in view of Najibi et al (US 2018/0285682, herein Najibi) and Son et al (US 2018/0129893, herein Son).
Regarding claim 1, Liu teaches an image processing system for convolving an image, the image processing system comprising:
a memory that stores the image, a plurality of skip values, a pixel base address, and a plurality of non-zero coefficients, and wherein each skip value corresponds to a location offset of each non-zero coefficient with respect to a previous non-zero coefficient of the plurality of non-zero coefficients (Fig 3, memory system 302 & instruction buffer 313, [0003], convolution performed on an image using matrix multiplication, [0031-0033], [0049], [0068], base address of image matrix, offsets used to indicate non-zero coefficients in matrices); and
processing circuitry coupled to the memory comprising (Fig 3, processor 300):
a plurality of registers ([0032-0033]);
a load-store circuit configured to retrieve from the memory and store into the plurality of registers a set of rows of the image, the plurality of skip values, and the pixel base address ([0023], [0032-0033], retrieving inputs from memory and storing in registers); and
a convolution circuit configured to execute, for a plurality of times, a multiply-accumulate (MAC) instruction and a load instruction parallelly in one clock cycle on a set of rows of the image and the plurality of non-zero coefficients to convolve the image with the plurality of non-zero coefficients, wherein the MAC and load instructions are executed parallelly in one clock cycle on first and second rows of the set of rows, respectively, such that the first and second rows are associated with first and second non-zero coefficients of the plurality of non-zero coefficients and first and second skip values of the plurality of skip values, respectively, and wherein the load instruction on the second row is executed based on the pixel base address, the second skip value, and a width of each row of the set of rows (Fig 3, execution engine 324, [0003], [0031-0033], [0049], [0062-0068], parallel MAC and LOAD instructions used to convolve input matrices, offsets used to identify non-zero coefficients and elements of matrices using a base address of the image matrix).
Liu fails to teach wherein the processing circuitry retrieves a merged kernel including only a plurality of non-zero coefficients of a set of kernels or executing the instructions to convolve the image with the merged kernel.
Najibi teaches an image processing system configured to retrieve a merged kernel including a plurality of coefficients of a set of kernels and execute instructions to convolve an image with the merged kernel ([0101-0105], CNN used for image classification, each layer uses combination of kernels to feed the machine learning component).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine the teachings of Liu with Najibi to utilize merged kernels in the convolutional neural network operations.  Liu does not disclose the specifics of how the neural network layers are generated, as Liu’s disclosure is focused on the lower-level operations of the matrix multiplication algorithm which is used for convolving the image by a neural network.  However, given that both Liu and Najibi disclose the use of convolutional neural networks to perform image classification, using kernels to merge coefficients of the layers that are to be processed with the matrix multiplication algorithm would merely entail a combination of known prior art elements to achieve predictable results, and thus would have been obvious to one of ordinary skill in the art.
Liu and Najibi fail to teach wherein the merged kernel only includes a plurality of non-zero coefficients.
Son teaches an image processing system comprising a memory that stores a merged kernel that only includes a plurality of non-zero coefficients of a set of kernels ([0099], [0107], load only non-zero input elements of kernel).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine the teachings of Liu and Najibi with those of Son to generate a kernel including only non-zero elements.  While Liu teaches the use of skip values to indicate respective locations of non-zero elements of a kernel, Liu does not explicitly teach generating a kernel to be loaded as input to a convolutional neural network that includes only such non-zero elements.  However, Son also teaches kernel elements being skipped when they have a value of zero (Son [0099]).  Selectively loading kernel elements so as to avoid loading zero input elements (Son [0107]) may therefore reduce the memory footprint of the input kernel and thus increase the efficiency of the processing system.  Doing so would merely entail a combination of known prior art elements to achieve predictable results, and thus would have been obvious to one of ordinary skill in the art.
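For illustration only, the addressing scheme recited in claim 1 can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Liu, Najibi, Son, or applicant's specification: the function names `encode_sparse_kernel` and `sparse_convolve` are hypothetical, a single kernel stands in for the merged kernel, each skip value is assumed to be a (row offset, column offset) pair as in claim 4, the image is assumed to be stored row-major in a flat memory, and the sequential loop does not model the parallel execution of the MAC and load instructions in one clock cycle.

```python
def encode_sparse_kernel(kernel):
    """Keep only the non-zero coefficients of a 2-D kernel.

    Returns (coeffs, skips). Each skip value is the (row, column) offset of
    a non-zero coefficient from the previous non-zero coefficient; the first
    skip value is measured from position (0, 0). Illustrative assumption.
    """
    coeffs, skips = [], []
    prev_r, prev_c = 0, 0
    for r, row in enumerate(kernel):
        for c, value in enumerate(row):
            if value != 0:
                coeffs.append(value)
                skips.append((r - prev_r, c - prev_c))
                prev_r, prev_c = r, c
    return coeffs, skips


def sparse_convolve(memory, base, width, height, kernel_shape, coeffs, skips):
    """Convolve (CNN-style, no kernel flip) using only non-zero coefficients.

    memory: flat list holding the image row-major, starting at index `base`.
    width, height: image dimensions in pixels.
    """
    kh, kw = kernel_shape
    out_h, out_w = height - kh + 1, width - kw + 1
    acc = [[0] * out_w for _ in range(out_h)]
    for out_r in range(out_h):
        r, c = 0, 0  # running kernel position, advanced by each skip value
        for coeff, (dr, dc) in zip(coeffs, skips):
            r, c = r + dr, c + dc
            # Load address from the pixel base address, the accumulated
            # skip offsets, and the row width.
            addr = base + (out_r + r) * width + c
            row_segment = memory[addr:addr + out_w]      # "load" step
            for i, pixel in enumerate(row_segment):      # "MAC" step
                acc[out_r][i] += coeff * pixel
    return acc


# A 3x3 image stored at base address 2, and a 2x2 kernel with two zeros.
memory = [0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
coeffs, skips = encode_sparse_kernel([[1, 0], [0, 2]])
result = sparse_convolve(memory, 2, 3, 3, (2, 2), coeffs, skips)
```

In this sketch the zero coefficients are never stored or multiplied; only two coefficients and two skip values are kept for the 2x2 kernel, which is the kind of memory-footprint reduction referred to in the rationale above.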

Regarding claim 3, the combination of Liu, Najibi, and Son teaches the image processing system of claim 1, wherein the convolution of the image with the merged kernel corresponds to generation of a set of feature maps, and wherein the processing circuitry is further configured to store the set of feature maps in the memory (Liu [0003], [0019], input features represented by input matrix, Najibi [0093], feature recognition).

Regarding claim 4, the combination of Liu, Najibi, and Son teaches the image processing system of claim 1, wherein the plurality of registers comprise: a vector register set comprising first and second vector registers that are configured to store the first and second rows, respectively; a weight register that is configured to store the merged kernel, wherein the plurality of non-zero coefficients of the set of kernels are arranged column-wise serially in the merged kernel; and a skip register that is configured to store the plurality of skip values, wherein the location offset corresponding to each skip value includes a row offset and a column offset (Liu [0047], [0057], use of vectors & [0068-0072], plurality of registers used for storing matrix rows, coefficients, and offsets & Najibi [0105], use of kernels).

Regarding claim 5, the combination of Liu, Najibi, and Son teaches the image processing system of claim 4, wherein the load-store circuit is coupled with the vector register set, the weight register, and the skip register, and configured to: load the merged kernel in the weight register to store the merged kernel therein; load the plurality of skip values in the skip register to store the plurality of skip values therein; and load, by executing the load instruction on the first row, the first row in the first vector register to store the first row therein, wherein the load-store circuit loads the first row in the first vector register based on the pixel base address, the first skip value, and the width of each row (Liu Fig 3, memory system 302, [0062-0068], load instruction processing & Najibi [0105], use of kernels).

Regarding claim 6, the combination of Liu, Najibi, and Son teaches the image processing system of claim 5, wherein the load-store circuit is further configured to load, by executing the load instruction on the second row, the second row in the second vector register to store the second row therein, and wherein after the first row is loaded in the first vector register in one clock cycle, the second row is loaded in the second vector register in a subsequent clock cycle (Liu [0062-0068]).

Regarding claim 7, the combination of Liu, Najibi, and Son teaches the image processing system of claim 4, wherein to execute the MAC instruction, the convolution circuit is further configured to execute (i) multiplication and accumulation operations on the first row and the first non-zero coefficient, and (ii) a logical shift operation on the merged kernel (Liu [0043], [0062-0068] & Najibi [0105]).

Regarding claim 8, the combination of Liu, Najibi, and Son teaches the image processing system of claim 7, wherein the convolution circuit is coupled with the vector register set and the weight register, and configured to receive the first and second rows and the merged kernel from the vector register set and the weight register, respectively, to execute the MAC instruction thereon (Liu Fig 3, [0003], [0033], second stage processor 306).

Regarding claim 9, the combination of Liu, Najibi, and Son teaches the image processing system of claim 8, wherein the convolution circuit comprises: a plurality of multipliers that are coupled with the vector register set and the weight register, and configured to execute the multiplication operation to multiply each element associated with each row with a corresponding non-zero coefficient of the merged kernel and generate pluralities of multiplication outputs; and an accumulation register that is coupled with the plurality of multipliers, and configured to receive the pluralities of multiplication outputs and execute the accumulation operation to accumulate the pluralities of multiplication outputs therein (Liu Fig 3, [0033], [0062-0068]).

Regarding claim 10, the combination of Liu, Najibi, and Son teaches the image processing system of claim 9, wherein the convolution circuit further comprises a first shifter circuit that is coupled with the weight register, and configured to execute the logical shift operation on the merged kernel to shift a current non-zero coefficient of the merged kernel by replacing the current non-zero coefficient with a subsequent non-zero coefficient of the merged kernel, when a first plurality of elements associated with the currently loaded row in the vector register are multiplied with the current non-zero coefficient (Liu Fig 3, [0043], [0062-0068]).

Regarding claim 11, the combination of Liu, Najibi, and Son teaches the image processing system of claim 8, wherein the convolution circuit further comprises a second shifter circuit that is coupled with the skip register, and configured to execute the logical shift operation on the plurality of skip values to shift a current skip value of the plurality of skip values by replacing the current skip value with a subsequent skip value of the plurality of skip values (Liu Fig 3, [0043], [0062-0068]).

Regarding claim 12, the combination of Liu, Najibi, and Son teaches the image processing system of claim 4, wherein when a width of the image is at most half of a width of each vector register of the vector register set, each vector register of the vector register set is configured to store at least two rows (Liu [0048], different representations of coefficient data).

Regarding claim 13, the combination of Liu, Najibi, and Son teaches the image processing system of claim 1, wherein the MAC instruction and the load instruction include first and second pluralities of instructions, respectively, in a very long instruction word (VLIW) architecture, and wherein each instruction of the first and second pluralities of instructions corresponds to a single instruction multiple data (SIMD) instruction (Liu [0046-0047], [0070]).

Regarding claim 14, the combination of Liu, Najibi, and Son teaches the image processing system of claim 1, wherein the convolution circuit is further configured to generate a completion notification when the image is convolved with the merged kernel completely (Liu [0023], perform matrix multiplication operations until all elements have been loaded).

Claims 15-19 and 20 are method embodiments of the system embodiments of claims 1, 3, 5, 6, 7, and 8-10, respectively.  Therefore, the above rejections of claims 1, 3, 5, 6, 7, and 8-10 are applicable to claims 15-19 and 20, respectively.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Lee (US 2020/0202198) discloses a neural network processor for loading only non-zero elements of an input kernel.
Lo (US 2018/0046898) discloses a neural network engine for skipping operations on zero coefficient input data.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL J METZGER whose telephone number is (571)272-3105. The examiner can normally be reached Monday-Friday 7:30-4.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jyoti Mehta, can be reached at 571-270-3995. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/MICHAEL J METZGER/             Primary Examiner, Art Unit 2182