DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.


Response to Amendment
This Office Action is in response to applicant’s communication filed 23 February 2021, in response to the Office Action mailed 23 November 2020.  The applicant’s remarks and any amendments to the claims or specification have been considered, with the results that follow.

The prior objections to the claims have been withdrawn due to the amendments filed.


Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 25 January 2021 has been entered.


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-10 and 12-20 are rejected under 35 U.S.C. 103 as being unpatentable over Ross (US 2016/0342893) in view of Henry (US 2017/0102941).

As per claim 1, Ross teaches a method in a hardware implementation of a Convolutional Neural Network (CNN) [a special-purpose hardware circuit that computes neural network inferences (para. 0007, etc.)], the method comprising: receiving a first subset of data and storing the first subset of data in one or more buffers of the hardware implementation [the system receives the weights and inputs from dynamic memory and a unified buffer, or both may be stored in the buffer (paras. 0031-33, fig. 3, etc.)], the first subset of data comprising at least a portion of weight data and at least a portion of input data for a CNN layer [a host interface can shift in portions of the kernel weights and activation inputs to each cell in the array of cells, which can then pass results and kernel weights to adjacent cells, etc., until the calculations are completed and the outputs for a convolutional layer are accumulated (paras. 0043-47, 0054-57, 0061-64, etc.)]; passing the first subset of data from the one or more buffers to at least one convolution engine of the hardware [a host interface can shift in portions of the kernel weights and activation inputs from the unified buffer to each cell in the systolic array of cells (at least one convolution engine), which can then pass results and kernel weights to adjacent cells, etc., until the calculations are completed and the outputs for a convolutional layer are accumulated (paras. 0043-47, 0054-57, 0061-64, etc.)]; receiving a second subset of data, the second subset of data comprising at least a portion of weight data and at least a portion of input data for the CNN layer [a host interface can shift in portions of the kernel weights and activation inputs to each cell in the array of cells, which can then pass results and kernel weights to adjacent cells, etc., until the calculations are completed and the outputs for a convolutional layer are accumulated (paras. 0043-47, 0054-57, 0061-64, etc.)]; passing the second subset of data from the one or more buffers to the at least one convolution engine and performing, in one or more subsequent passes of the at least one convolution engine, a convolution of the second subset of data to generate a second partial result [a host interface can shift in portions of the kernel weights and activation inputs to each cell in the array of cells, which can then pass results and kernel weights to adjacent cells, etc., until the calculations are completed and the outputs for a convolutional layer are accumulated (paras. 0043-47, 0054-57, 0061-64, etc.)]; and combining the first partial result and the second partial result to generate at least a portion of convolved data for the CNN layer [a host interface can shift in portions of the kernel weights and activation inputs to each cell in the array of cells, which can then pass results and kernel weights to adjacent cells, etc., until the calculations are completed and outputs accumulated by the accumulator (combining the results) for a convolutional layer (paras. 0043-47, 0054-57, 0061-64, etc.)].
While Ross teaches storing the different weight and input data in one or more buffers (see above), it does not explicitly teach storing the second subset of data in the one or more buffers such that the second subset of data replaces at least a portion of the first subset of data in the one or more buffers.
Henry teaches storing the second subset of data in the one or more buffers such that the second subset of data replaces at least a portion of the first subset of data in the one or more buffers [as each layer of the neural network is to be performed by the processor’s neural network units (NNU), which may perform convolutions, the current row of the input data in the input data RAM is overwritten with the next set of input data, as well as overwriting the weights in the weight RAM with the next layer’s weights (paras. 0075, 0117, 0216-219, 0333-335, etc.)].
Ross and Henry are analogous art, as they are within the same field of endeavor, namely processor implementations for machine learning.
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to overwrite previous data in the memory storing inputs/weights for the NNUs processing NN layers, as taught by Henry, for the storing of the inputs/weights in the unified buffer for the neural network cells to process each layer in the system taught by Ross.
Henry provides motivation as [by enabling overwriting, the NNU program can handle larger datasets/smaller memory (para. 0216, etc.)].
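
For illustration only, the combination discussed above can be sketched in software. This sketch is not taken from Ross, Henry, or the application; the names `conv1d`, `layer_in_two_passes`, and the single shared `buffer` are hypothetical, and a one-dimensional, two-channel layer is assumed purely to keep the example small. It shows a second subset of data overwriting the first in the same buffer, with the two partial results combined by addition.

```python
def conv1d(signal, kernel):
    """Valid-mode 1D sliding dot product, as commonly used in CNN layers."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]


def layer_in_two_passes(subsets):
    """Process (inputs, weights) subsets one at a time through one buffer."""
    buffer = {"inputs": None, "weights": None}  # hypothetical shared buffer
    total = None
    for inputs, weights in subsets:
        # Loading the next subset overwrites whatever the buffer held before.
        buffer["inputs"], buffer["weights"] = inputs, weights
        partial = conv1d(buffer["inputs"], buffer["weights"])
        # Combine the partial results element by element.
        total = partial if total is None else [a + b for a, b in zip(total, partial)]
    return total


# Assumed example data: each subset is one input channel and its weights.
channel0 = ([1, 2, 3, 4], [1, 0])
channel1 = ([5, 6, 7, 8], [0, 1])
result = layer_in_two_passes([channel0, channel1])
print(result)
```

Because only one subset is resident at a time, the buffer in this sketch never needs to be larger than a single subset, which is consistent with the memory-size motivation noted above.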

As per claim 2, Ross/Henry teaches wherein: the first subset of data comprises a first portion of the input data for the CNN layer and all or a portion of the weight data for the CNN layer; and the second subset of data comprises a second portion of the input data for the CNN layer and all or a portion of the weight data for the CNN layer [a host interface can shift in portions of the kernel weights and activation inputs to each cell in the array of cells, which can then pass results and kernel weights to adjacent cells, etc., until the calculations are completed and the outputs for a convolutional layer are accumulated (Ross: paras. 0043-47, 0054-57, 0061-64, etc.)].

As per claim 3, Ross/Henry teaches wherein the second portion of the input data comprises a subset of the first portion of the input data and wherein the size of the subset of the first portion of the input data is based upon a size of a convolution kernel [the kernel inputs may be overlapping but separate portions which depends on the size of the kernel and the array (Ross: paras. 0009, 0027, 0034, 0088; fig. 7; etc.)].
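
For illustration only, the dependence of the overlap on kernel size can be sketched as follows. This sketch is not taken from Ross or the application; the names `conv1d` and `split_with_overlap` are hypothetical, and a one-dimensional input with a stride of 1 is assumed. Under those assumptions, the second portion must repeat the last (kernel size - 1) samples of the first portion so that no output position is lost at the boundary.

```python
def conv1d(signal, kernel):
    """Valid-mode 1D sliding dot product."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]


def split_with_overlap(signal, split, kernel_size):
    """Split the input so the second portion re-includes the tail of the first.

    Assumes stride 1 and split >= kernel_size.
    """
    overlap = kernel_size - 1  # size of the shared region depends on the kernel
    first = signal[:split]
    second = signal[split - overlap:]
    return first, second


signal = [1, 2, 3, 4, 5, 6]
kernel = [1, 1, 1]
first, second = split_with_overlap(signal, split=4, kernel_size=len(kernel))

# Processing the portions separately gives the same outputs as one full pass.
tiled = conv1d(first, kernel) + conv1d(second, kernel)
full = conv1d(signal, kernel)
print(first, second, tiled, full)
```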

As per claim 4, Ross/Henry teaches wherein the first subset of data and the second subset of data each comprise all of the weight data for the CNN layer [a host interface can shift in portions of the kernel weights and activation inputs to each cell in the array of cells, which can then pass results and kernel weights to adjacent cells, etc., until the calculations are completed and outputs accumulated by the accumulator (combining the results) for a convolutional layer (Ross: paras. 0043-47, 0054-57, 0061-64, etc.)].

As per claim 5, Ross/Henry teaches wherein: the first subset of data comprises a first portion of the weight data for the CNN layer and all or a portion of the input data for the CNN layer; and the second subset of data comprises a second portion of the weight data for the CNN layer and all or a portion of the input data for the CNN layer [a host interface can shift in portions of the kernel weights and activation inputs to each cell in the array of cells, which can then pass results and kernel weights to adjacent cells, etc., until the calculations are completed and the outputs for a convolutional layer are accumulated (Ross: paras. 0043-47, 0054-57, 0061-64, etc.)].

As per claim 6, Ross/Henry teaches wherein the first subset of data and the second subset of data each comprise all of the input data for the CNN layer [a host interface can shift in portions of the kernel weights and activation inputs to each cell in the array of cells, which can then pass results and kernel weights to adjacent cells, etc., until the calculations are completed and outputs accumulated by the accumulator (combining the results) for a convolutional layer (Ross: paras. 0043-47, 0054-57, 0061-64, etc.)].

[the kernel inputs may be separate portions (Ross: paras. 0009, 0088; fig. 7; etc.)].

As per claim 8, Ross/Henry teaches wherein the combining of the first partial result and the second partial result comprises writing the first partial result and the second partial result to a memory [a host interface can shift in portions of the kernel weights and activation inputs to each cell in the array of cells, which can then pass results and kernel weights to adjacent cells, etc., until the calculations are completed and outputs accumulated by the accumulator which stores the result data (Ross: paras. 0043-47, 0054-57, 0061-64, etc.)].

As per claim 9, Ross/Henry teaches wherein: the weight data for the CNN layer comprises a plurality of weights and the plurality of weights form one or more filters [a host interface can shift in portions of the kernel weights (aka a filter) and activation inputs to each cell in the array of cells, which can then pass results and kernel weights to adjacent cells, etc., until the calculations are completed and the outputs for a convolutional layer are accumulated (Ross: paras. 0043-47, 0054-57, 0061-64, etc.)]; the first subset of data comprises a first portion of a filter of one or more filters and all or a portion of the input data for the CNN layer [a host interface can shift in portions of the kernel weights (aka a filter) and activation inputs to each cell in the array of cells, which can then pass results and kernel weights to adjacent cells, etc., until the calculations are completed and the outputs for a convolutional layer are accumulated (Ross: paras. 0043-47, 0054-57, 0061-64, etc.)]; and the second subset of data comprises a second portion of the filter and all or a portion of the input data for the CNN layer [a host interface can shift in portions of the kernel weights (aka a filter) and activation inputs to each cell in the array of cells, which can then pass results and kernel weights to adjacent cells, etc., until the calculations are completed and the outputs for a convolutional layer are accumulated (Ross: paras. 0043-47, 0054-57, 0061-64, etc.)].

As per claim 10, Ross/Henry teaches wherein combining the first partial result and the second partial result to generate at least a portion of convolved data for a layer of the CNN comprises: performing, in one or more passes of the at least one convolution engine, a convolution of the first portion of the filter with the input data to generate the first partial result; performing, in the one or more subsequent passes of the at least one convolution engine, a convolution of the second portion of the filter with the input data to generate the second partial result; placing the first partial result in an accumulation buffer; and combining the first partial result with the second partial result in an accumulator [a host interface can shift in portions of the kernel weights and activation inputs to each cell in the array of cells, which can then pass results and kernel weights to adjacent cells, etc., until the calculations are completed and outputs accumulated by the accumulator which stores the result data (Ross: paras. 0043-47, 0054-57, 0061-64, etc.)].
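
For illustration only, the filter-splitting arrangement addressed above can be sketched in software. This sketch is not taken from Ross, Henry, or the application; the names `partial_conv` and `accumulation_buffer` are hypothetical, and a one-dimensional filter split into two halves is assumed. The first pass convolves the first portion of the filter and places its result in an accumulation buffer; the second pass convolves the remaining portion and the two partial results are added.

```python
def partial_conv(signal, kernel_part, offset, out_len):
    """Convolve one portion of a filter.

    `offset` is the position of this portion within the full filter, so each
    weight is still multiplied by the input sample it would see in a full pass.
    """
    return [sum(signal[i + offset + j] * w for j, w in enumerate(kernel_part))
            for i in range(out_len)]


signal = [1, 2, 3, 4, 5]
kernel = [1, 2, 3, 4]
out_len = len(signal) - len(kernel) + 1

# First pass: first portion of the filter; result goes to the accumulation buffer.
accumulation_buffer = partial_conv(signal, kernel[:2], offset=0, out_len=out_len)

# Subsequent pass: second portion of the filter.
second_partial = partial_conv(signal, kernel[2:], offset=2, out_len=out_len)

# Accumulator: combine the buffered partial result with the new one.
accumulated = [a + b for a, b in zip(accumulation_buffer, second_partial)]

# Reference: the same filter applied in a single pass.
full = partial_conv(signal, kernel, offset=0, out_len=out_len)
print(accumulation_buffer, second_partial, accumulated, full)
```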

[a host interface can shift in portions of the kernel weights and activation inputs to each cell in the array of cells, which can then pass results and kernel weights to adjacent cells, etc., until the calculations are completed and the outputs for a convolutional layer are accumulated (Ross: paras. 0043-47, 0054-57, 0061-64, etc.)].

As per claim 13, Ross/Henry teaches receiving command data that defines the first subset of data and the second subset of data for processing in the CNN layer [The host interface 302 can receive instructions that include parameters for a neural network computation. The parameters can include at least one or more of the following: how many layers should be processed, corresponding sets of weight inputs for each layer of the layer, an initial set of activation inputs, i.e., the input to the neural network from which the inference is to be computed, corresponding input and output sizes of each layer, a stride value for the neural network computation, and a type of layer to be processed (Ross: para. 0034, etc.)].

As per claim 14, see the rejection of claim 1, above, wherein Ross/Henry teaches the interface and convolution engine(s) [a host interface provides data to the array of cells (convolution engine(s)) (Ross: paras. 0034-37, 0045; figs. 3, 4, 7; etc.)].

As per claim 15, see the rejection of claim 2, above.

As per claim 16, see the rejection of claim 5, above.

As per claim 17, see the rejection of claim 8, above.

As per claim 18, see the rejection of claim 9, above.

As per claim 19, see the rejection of claim 13, above.

As per claim 20, see the rejection of claims 1 and 14, above, wherein Ross/Henry also teaches a non-transitory computer readable medium having stored thereon a computer readable description of an integrated circuit [embodiments of the subject matter described in this specification can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions encoded on a tangible non transitory program carrier for execution by, or to control the operation of, data processing apparatus (Ross: para. 0112, etc.)].


Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Ross (US 2016/0342893), in view of Henry (US 2017/0102941), and further in view of Chetlur (US 2016/0062947).


While Ross teaches that various parameters including the filters may be set by instructions (see above), it does not explicitly teach wherein the first portion of the filter and the second portion of the filter are non-overlapping portions of the filter.
Chetlur teaches wherein the first portion of the filter and the second portion of the filter are non-overlapping portions of the filter [the inputs and filters are partitioned into subsets forming tiles, which can include independent (non-overlapping) filter tiles, for the portions of the filters, or any other partition scheme desired (paras. 0009, 0059, 0065, 0070, 0096, etc.)].
Ross and Chetlur are analogous art, as they are within the same field of endeavor, namely implementing a CNN.
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to allow for independent or other desired partitioning of the filter, as taught by Chetlur, for the partitioning of the filters in the system of Ross.
Chetlur provides motivation as [by tuning the size of the tiles (partitions), the time required to execute the layers of the CNN may be reduced (para. 0009, etc.)].


Response to Arguments
Applicant's arguments filed 25 January 2021 have been fully considered but they are not persuasive.

Applicant argues that the finality of the prior Office Action was premature.
However, second or any subsequent actions on the merits shall be final, except where the examiner introduces a new ground of rejection that is neither necessitated by applicant’s amendment of the claims, nor based on information submitted in an information disclosure statement filed during the period set forth in 37 CFR 1.97(c) with the fee set forth in 37 CFR 1.17(p). See MPEP § 706.07(a).  In this case the new ground of rejection was necessitated by applicant’s amendments to the claims, including the timing relationship between the first and second subsets of data being processed in different passes by the at least one convolution engine, rather than just using some of the convolution engines to perform convolutions for different subsets of data.  Applicant also argues that “the at least one convolution engine” would necessitate using the same engine, but that is only true in the case of a single convolution engine, whereas the claim explicitly recites “at least one”.

Applicant’s further arguments are directed to the amendments made to the claims, which have been addressed above, including the newly cited reference to Henry.


Conclusion
The following is a summary of the treatment and status of all claims in the application as recommended by M.P.E.P. 707.07(i): claims 1-20 are rejected.

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Penn (US 2014/0288928 – cited in an attached IDS and prior action) – discloses a system with a number of convolution units dividing processing.
Ross (US 2016/0342891) – discloses a system utilizing a systolic array (similar to the citation to Ross in the rejections above) which divides the activation inputs and weights of a layer into portions based upon the size of the array.
Henry (US 2017/0103312) – discloses a system dividing weights/inputs to parallel processing units.
Kirsch (US 2012/0183224) – discloses a system including overwriting input image line data for a convolution engine.
Werner (US 2017/0097884) – discloses pipelined convolutional operations including overwriting portions of image data in memory with incoming image data to minimize memory usage.

The examiner requests, in response to this Office action, that support be shown for language added to any original claims on amendment and any new claims. That is, indicate support for newly added claim language by specifically pointing to page(s) and line number(s) in the specification and/or drawing figure(s). This will assist the examiner in prosecuting the application.

When responding to this office action, Applicant is advised to clearly point out the patentable novelty which he or she thinks the claims present, in view of the state of the art disclosed by the references cited or the objections made. He or she must also show how the amendments avoid such references or objections.  See 37 CFR 1.111(c).

Any inquiry concerning this communication or earlier communications from the examiner should be directed to GEORGE GIROUX whose telephone number is (571)272-9769.  The examiner can normally be reached on M-F 10am-6pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kamran Afshar, can be reached on 571-272-7796.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access 






/GEORGE GIROUX/Primary Examiner, Art Unit 2125