DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.


Response to Amendment
This Office Action is in response to applicant’s communication filed 12 August 2021, in response to the Office Action mailed 12 May 2021.  The applicant’s remarks and any amendments to the claims or specification have been considered, with the results that follow.


Information Disclosure Statement
As required by M.P.E.P. 609(c), the applicant's submission of the Information Disclosure Statement, dated 1 October 2021, is acknowledged by the examiner and the cited references have been considered in the examination of the claims now pending.  As required by M.P.E.P 609 C(2), a copy of the PTOL-1449 initialed and dated by the examiner is attached to the instant office action.


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1-10 and 12-20 are rejected under 35 U.S.C. 103 as being unpatentable over Chakradhar (US 2011/0029471) in view of Henry (US 2017/0102941).

As per claim 1, Chakradhar teaches a method in a hardware implementation of a Convolutional Neural Network (CNN) [a coprocessor and method for processing convolutional neural networks (CNN) (abstract, etc.)], the method comprising: receiving a first subset of data and storing the first subset of data in one or more buffers of the hardware implementation [a memory subsystem including input memory, output memory, temporary memory, and instruction memory (figs. 3-4, etc.) receives and sends input, kernel, and intermediate/output data from/to the CNN coprocessor template (140) under control of an input switch to implement the CNN layers (paras. 0043-51, etc.) as well as initially/additionally from a host interface (para. 0054, etc.)], the first subset of data comprising at least a portion of weight data and at least a portion of input data for a CNN layer [the memory includes input, kernel, and intermediate/output data from/to the CNN coprocessor template (figs. 3-4, paras. 0043-51, etc.)]; passing the first subset of data from the one or more buffers to at least one convolution engine of the hardware implementation and performing, in one or more passes of the at least one convolution engine, a convolution of the first subset of data to generate a first partial result [a memory subsystem including input memory, output memory, temporary memory, and instruction memory (figs. 3-4, etc.) receives and sends input, kernel, and intermediate/output data from/to the CNN coprocessor template (140) under control of an input switch to implement the CNN layers (paras. 0043-51, etc.)]; receiving a second subset of data, the second subset of data comprising at least a portion of weight data and at least a portion of input data for the CNN layer [a memory subsystem including input memory, output memory, temporary memory, and instruction memory (figs. 3-4, etc.) receives and sends input, kernel, and intermediate/output data from/to the CNN coprocessor template (140) under control of an input switch to implement the CNN layers (paras. 0043-51, etc.) where intermediate data may include data from a prior layer or portions of a layer to be combined (paras. 0024, 0044, 0047, 0053-54, etc.)]; passing the second subset of data from the one or more buffers to the at least one convolution engine and performing, in one or more subsequent passes of the at least one convolution engine, a convolution of the second subset of data to generate a second partial result [a memory subsystem including input memory, output memory, temporary memory, and instruction memory (figs. 3-4, etc.) receives and sends input, kernel, and intermediate/output data from/to the CNN coprocessor template (140) under control of an input switch to implement the CNN layers (paras. 0043-51, etc.) where intermediate data may include data from a prior layer or portions of a layer to be combined (paras. 0024, 0044, 0047, 0053-54, etc.)]; and combining the first partial result and the second partial result to generate at least a portion of convolved data for the CNN layer [intermediate data may include data from a prior layer or portions of a layer to be combined to produce an output (paras. 0024, 0044, 0047, 0053-54, etc.)]; wherein each convolution engine of the at least one convolution engine comprises a plurality of elements of multiply logic and a plurality of elements of addition logic, the plurality of elements of addition logic forming an adder tree configured to generate a sum of the outputs of the plurality of elements of multiply logic [the CNN coprocessor includes a number of convolvers (150) and additional addition logic (224, 226) (figs. 3-5, etc.) where the logic includes multipliers (para. 0069, etc.) and an adder tree (paras. 0013, 0052, etc.)].
While Chakradhar teaches storing the different weight and input data in one or more buffers for different sizes/portions of layers of the CNN (see above) and that they may have different degrees of overlap (see, e.g., Chakradhar: para. 0011), it does not explicitly teach storing the second subset of data in the one or more buffers such that the second subset of data replaces at least a portion of the first subset of data in the one or more buffers.
Henry teaches storing the second subset of data in the one or more buffers such that the second subset of data replaces at least a portion of the first subset of data in the one or more buffers [as each layer of the neural network is to be performed by the processor’s neural network units (NNU), which may perform convolutions, the current row of the input data in the input data RAM is overwritten with the next set of input data, as well as overwriting the weights in the weight RAM with the next layer’s weights (paras. 0075, 0117, 0216-219, 0333-335, etc.)].
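Chakradhar and Henry are analogous art, as they are within the same field of endeavor, namely processor implementations for machine learning.
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to overwrite at least a portion of the previously stored data with the next subset of data, as taught by Henry, for the buffered weight and input data in the system of Chakradhar.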
Henry provides motivation as [by enabling overwriting the NNU program can handle larger datasets/smaller memory (paras. 0216, etc.)].
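For illustration only, the combination applied to claim 1 can be sketched in software. The sketch below is not taken from Chakradhar, Henry, or the application; it assumes a one-dimensional, multi-channel layer in which each channel's input and weights form one subset of data, and every name in it (adder_tree, convolve_pass, convolve_layer_in_passes, the buffer dictionary) is hypothetical. It shows a single buffer whose contents are replaced by each new subset, a per-pass multiply-then-adder-tree reduction, and the element-wise combination of the partial results.

```python
def adder_tree(values):
    """Sum values pairwise, level by level, as a tree of two-input adders would."""
    level = list(values)
    if not level:
        return 0
    while len(level) > 1:
        if len(level) % 2:
            level.append(0)  # pad an odd level with a zero input
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
    return level[0]


def convolve_pass(buffer):
    """One engine pass: multiply each input window by the weights, then reduce."""
    x, w = buffer["input"], buffer["weights"]
    return [
        adder_tree(x[i + k] * w[k] for k in range(len(w)))
        for i in range(len(x) - len(w) + 1)
    ]


def convolve_layer_in_passes(inputs, weights):
    """inputs and weights are per-channel lists; each channel is one subset of data."""
    buffer = {}
    partials = []
    for channel_input, channel_weights in zip(inputs, weights):
        # Loading the next subset replaces the previous subset in the same buffer.
        buffer["input"] = channel_input
        buffer["weights"] = channel_weights
        partials.append(convolve_pass(buffer))
    # Combine the partial results element-wise into the convolved output.
    return [adder_tree(column) for column in zip(*partials)]
```

In this sketch the buffer never holds more than one subset at a time, which is the memory saving that the cited motivation in Henry is directed to.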

As per claim 2, Chakradhar/Henry teaches wherein: the first subset of data comprises a first portion of the input data for the CNN layer and all or a portion of the weight data for the CNN layer; and the second subset of data comprises a second portion of the input data for the CNN layer and all or a portion of the weight data for the CNN layer [a memory subsystem including input memory, output memory, temporary memory, and instruction memory (Chakradhar: figs. 3-4, etc.) receives and sends input, kernel, and intermediate/output data from/to the CNN coprocessor template (140) under control of an input switch to implement the CNN layers (Chakradhar: paras. 0043-51, etc.) where intermediate data may include data from a prior layer or portions of a layer to be combined (Chakradhar: paras. 0024, 0044, 0047, 0053-54, etc.) as well as initially/additionally from a host interface (Chakradhar: para. 0054, etc.)].

As per claim 3, Chakradhar/Henry teaches wherein the second portion of the input data comprises a subset of the first portion of the input data and wherein the size [intermediate data may include data from a prior layer or portions of a layer to be combined (Chakradhar: paras. 0024, 0044, 0047, 0053-54, etc.) where the configuration is chosen based upon the size of the input images, kernels and the number of units (Chakradhar: paras. 0024, 0036, 0040, etc.); where when different kernels are applied to the same image input then the second portion will be a subset of the first portion, as a set is a subset of itself].

As per claim 4, Chakradhar/Henry teaches wherein the first subset of data and the second subset of data each comprise all of the weight data for the CNN layer [a memory subsystem including input memory, output memory, temporary memory, and instruction memory (Chakradhar: figs. 3-4, etc.) receives and sends input, kernel, and intermediate/output data from/to the CNN coprocessor template (140) under control of an input switch to implement the entire CNN layers (Chakradhar: paras. 0043-51, etc.) where intermediate data may include data from a prior layer or portions of a layer to be combined (Chakradhar: paras. 0024, 0044, 0047, 0053-54, etc.) as well as initially/additionally from a host interface (Chakradhar: para. 0054, etc.)].

As per claim 5, Chakradhar/Henry teaches wherein: the first subset of data comprises a first portion of the weight data for the CNN layer and all or a portion of the input data for the CNN layer; and the second subset of data comprises a second portion of the weight data for the CNN layer and all or a portion of the input data for the CNN layer [a memory subsystem including input memory, output memory, temporary memory, and instruction memory (Chakradhar: figs. 3-4, etc.) receives and sends input, kernel, and intermediate/output data from/to the CNN coprocessor template (140) under control of an input switch to implement the entire CNN layers (Chakradhar: paras. 0043-51, etc.)].

As per claim 6, Chakradhar/Henry teaches wherein the first subset of data and the second subset of data each comprise all of the input data for the CNN layer [a memory subsystem including input memory, output memory, temporary memory, and instruction memory (Chakradhar: figs. 3-4, etc.) receives and sends input, kernel, and intermediate/output data from/to the CNN coprocessor template (140) under control of an input switch to implement the entire CNN layers (Chakradhar: paras. 0043-51, etc.)].

As per claim 7, Chakradhar/Henry teaches wherein the first portion of the weight data comprises a different portion of the weight data for the CNN layer to the second portion of the weight data [a memory subsystem including input memory, output memory, temporary memory, and instruction memory (Chakradhar: figs. 3-4, etc.) receives and sends input, kernel, and intermediate/output data from/to the CNN coprocessor template (140) under control of an input switch to implement the entire CNN layers (Chakradhar: paras. 0043-51, etc.) where intermediate data may include data from a prior layer or portions of a layer to be combined (Chakradhar: paras. 0024, 0044, 0047, 0053-54, etc.)].

As per claim 8, Chakradhar/Henry teaches wherein the combining of the first partial result and the second partial result comprises writing the first partial result and the second partial result to a memory [intermediate data may include data from a prior layer or portions of a layer to be combined (Chakradhar: paras. 0024, 0044, 0047, 0053-54, etc.)].

As per claim 9, Chakradhar/Henry teaches wherein: the weight data for the CNN layer comprises a plurality of weights and the plurality of weights form one or more filters [the CNN uses an array of weights forming a kernel (filter) (Chakradhar: paras. 0009-10, etc.)]; the first subset of data comprises a first portion of a filter of one or more filters and all or a portion of the input data for the CNN layer [a memory subsystem including input memory, output memory, temporary memory, and instruction memory (Chakradhar: figs. 3-4, etc.) receives and sends input, kernel, and intermediate/output data from/to the CNN coprocessor template (140) under control of an input switch to implement the entire CNN layers (Chakradhar: paras. 0043-51, etc.) where intermediate data may include data from a prior layer or portions of a layer to be combined (Chakradhar: paras. 0024, 0044, 0047, 0053-54, etc.)]; and the second subset of data comprises a second portion of the filter and all or a portion of the input data for the CNN layer [a memory subsystem including input memory, output memory, temporary memory, and instruction memory (Chakradhar: figs. 3-4, etc.) receives and sends input, kernel, and intermediate/output data from/to the CNN coprocessor template (140) under control of an input switch to implement the entire CNN layers (Chakradhar: paras. 0043-51, etc.) where intermediate data may include data from a prior layer or portions of a layer to be combined (Chakradhar: paras. 0024, 0044, 0047, 0053-54, etc.)].

As per claim 10, Chakradhar/Henry teaches wherein combining the first partial result and the second partial result to generate at least a portion of convolved data for a layer of the CNN comprises: performing, in one or more passes of the at least one convolution engine, a convolution of the first portion of the filter with the input data to generate the first partial result; performing, in the one or more subsequent passes of the at least one convolution engine, a convolution of the second portion of the filter with the input data to generate the second partial result; placing the first partial result in an accumulation buffer; and combining the first partial result with the second partial result in an accumulator [a memory subsystem including input memory, output memory, temporary memory, and instruction memory (Chakradhar: figs. 3-4, etc.) receives and sends input, kernel, and intermediate/output data from/to the CNN coprocessor template (140) under control of an input switch to implement the entire CNN layers (Chakradhar: paras. 0043-51, etc.) where intermediate data may include data from a prior layer or portions of a layer to be combined (Chakradhar: paras. 0024, 0044, 0047, 0053-54, etc.) which are aggregated in the aggregation logic (Chakradhar: fig. 4; paras. 0031, 0052, etc.)].

As per claim 12, Chakradhar/Henry teaches the recited limitation [a memory subsystem including input memory, output memory, temporary memory, and instruction memory (Chakradhar: figs. 3-4, etc.) receives and sends input, kernel, and intermediate/output data from/to the CNN coprocessor template (140) under control of an input switch to implement the entire CNN layers (Chakradhar: paras. 0043-51, etc.)].

As per claim 13, Chakradhar/Henry teaches receiving command data that defines the first subset of data and the second subset of data for processing in the CNN layer [the operation of the CNN is controlled by instructions (Chakradhar: paras. 0055-56, etc.)].

As per claim 14, see the rejection of claim 1, above, wherein Chakradhar/Henry teaches the interface and convolution engine(s) [a memory subsystem including input memory, output memory, temporary memory, and instruction memory (Chakradhar: figs. 3-4, etc.) receives and sends input, kernel, and intermediate/output data from/to the CNN coprocessor template (140) under control of an input switch to implement the entire CNN layers (Chakradhar: paras. 0043-51, etc.)].

As per claim 15, see the rejection of claim 2, above.



As per claim 17, see the rejection of claim 8, above.

As per claim 18, see the rejection of claim 9, above.

As per claim 19, see the rejection of claim 13, above.

As per claim 20, see the rejection of claims 1 and 14, above, wherein Chakradhar/Henry also teaches a non-transitory computer readable medium having stored thereon a computer readable description of an integrated circuit [the operation of the CNN is controlled by instructions stored in a memory (Chakradhar: paras. 0055-56, etc.); where the entire system may also be embodied as a computer-readable medium providing program code (Chakradhar: para. 0028, etc.)].


Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Chakradhar (US 2011/0029471), in view of Henry (US 2017/0102941), and further in view of Chetlur (US 2016/0062947).

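As per claim 11, Chakradhar/Henry teaches the method according to claim 9, as described above.
However, Chakradhar/Henry does not explicitly teach wherein the first portion of the filter and the second portion of the filter are non-overlapping portions of the filter.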
Chetlur teaches wherein the first portion of the filter and the second portion of the filter are non-overlapping portions of the filter [the inputs and filters are partitioned into subsets forming tiles, which can include independent (non-overlapping) filter tiles, for the portions of the filters, or any other partition scheme desired (paras. 0009, 0059, 0065, 0070, 0096, etc.)].
Chakradhar/Henry and Chetlur are analogous art, as they are within the same field of endeavor, namely implementing a CNN.
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to allow for independent or other desired partitioning of the filter, as taught by Chetlur, for the partitioning of the kernels in the system of Chakradhar/Henry.
Chetlur provides motivation as [by tuning the size of the tiles (partitions) the time required to execute the layers of the CNN may be reduced (paras. 0009, etc.)].
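For illustration only, the non-overlapping filter partition discussed for claim 11 can be sketched as follows. The sketch is not taken from Chetlur or the application; it assumes a one-dimensional filter split at a single hypothetical index (split), and the function names correlate and convolve_with_filter_tiles are likewise hypothetical. Each filter tap belongs to exactly one tile, each tile produces a partial result in its own pass, and the partial results are summed in an accumulator.

```python
def correlate(x, w, offset, out_len):
    """Partial result for filter taps w that start at position `offset` in the full filter."""
    return [
        sum(x[i + offset + k] * w[k] for k in range(len(w)))
        for i in range(out_len)
    ]


def convolve_with_filter_tiles(x, filt, split):
    """Convolve x with filt using two non-overlapping tiles of the filter."""
    out_len = len(x) - len(filt) + 1
    # Non-overlapping partition: every tap appears in exactly one tile.
    tiles = [(0, filt[:split]), (split, filt[split:])]
    accumulator = [0] * out_len
    for offset, tile in tiles:
        partial = correlate(x, tile, offset, out_len)
        # Accumulate this pass's partial result into the running output.
        accumulator = [a + p for a, p in zip(accumulator, partial)]
    return accumulator
```

Under these assumptions the accumulated output equals the result of applying the whole filter in one pass, regardless of where the split index falls.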


Response to Arguments
Applicant's arguments filed 12 August 2021 have been fully considered but they are not persuasive.

Applicant argues that Henry does not teach storing the second subset of data in the one or more buffers such that the second subset of data replaces at least a portion of the first subset of data in the one or more buffers.
However, Henry teaches that as each layer of the neural network is to be performed by the processor’s neural network units (NNU), the current row of the input data in the input data RAM is overwritten with the next set of input data, as well as overwriting the weights in the weight RAM with the next layer’s weights (paras. 0075, 0117, 0216-219, 0333-335, etc.); which is combined with the splitting of CNN layer calculations using intermediate data taught by Chakradhar, above.

Applicant’s arguments with respect to Ross have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.


Conclusion
The following is a summary of the treatment and status of all claims in the application as recommended by M.P.E.P. 707.07(i): claims 1-20 are rejected.

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Penn (US 2014/0288928 – cited in an attached IDS and prior action) – discloses a system with a number of convolution units dividing processing.
Ross (US 2016/0342891 and US 2016/0342893) – discloses a system utilizing a systolic array (similar to the citation to Ross in the prior rejection of record) which divides the activation inputs and weights of a layer into portions based upon the size of the array.
Henry (US 2017/0103312) – discloses a system dividing weights/inputs to parallel processing units.
Kirsch (US 2012/0183224) – discloses a system including overwriting input image line data for a convolution engine.
Werner (US 2017/0097884) – discloses pipelined convolutional operations including overwriting portions of image data in memory with incoming image data to minimize memory usage.
Yamamoto (US 2010/0215253) – discloses a CNN implementation utilizing multiple ring buffers and including overwriting data in the buffers.

The examiner requests, in response to this Office action, that support be shown for language added to any original claims on amendment and any new claims. That is, indicate support for newly added claim language by specifically pointing to page(s) and line number(s) in the specification and/or drawing figure(s). This will assist the examiner in prosecuting the application.

When responding to this office action, Applicant is advised to clearly point out the patentable novelty which he or she thinks the claims present, in view of the state of the art disclosed by the references cited or the objections made.

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to GEORGE GIROUX whose telephone number is (571)272-9769. The examiner can normally be reached M-F 10am-6pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/GEORGE GIROUX/Primary Examiner, Art Unit 2128