DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.


Response to Amendment
This Office Action is in response to applicant’s communication filed 14 April 2022, in response to the Office Action mailed 15 October 2021.  The applicant’s remarks and any amendments to the claims or specification have been considered, with the results that follow.


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1 and 3-24 are rejected under 35 U.S.C. 103 as being unpatentable over Aydonat (US 2017/0103299), in view of Simard (US 2007/0086655 – cited in an attached IDS), and further in view of Pechanek (US 5,329,611).

As per claim 1, Aydonat teaches an apparatus to process a neural network, the apparatus comprising: a plurality of fully connected layer chips coupled by an interconnect [one or more processing elements implementing convolution layers connected to one or more processing elements implementing fully connected layers (abstract, etc.)]; a plurality of convolutional layer chips each coupled by an interconnect to a respective fully connected layer chip of the plurality of fully connected layer chips [one or more processing elements implementing convolution layers connected to one or more processing elements implementing fully connected layers (abstract, etc.)]; and each of the plurality of fully connected layer chips and the plurality of convolutional layer chips comprising an interconnect to couple each of a forward propagation compute intensive tile of a column of compute intensive tiles between a first memory intensive tile and a second memory intensive tile [the processing elements include elements implementing forward propagation in the convolution and other layers (para. 0059, fig. 9, etc.) which may include computation units connected to multiple memory blocks (paras. 0038-40, 0044, 0064-66, etc.)], wherein each of the plurality of fully connected layer chips and the plurality of convolutional layer chips comprises a plurality of rows and columns of compute intensive tiles coupled to a plurality of rows and columns of memory intensive tiles [the processing elements include an array of elements implementing forward propagation in the convolution and other layers (para. 0059, fig. 9, etc.) which may include computation units connected to multiple memory blocks (paras. 0038-40, 0044, 0064-66; figs. 9-10; etc.) both of which may be organized into associated rows and columns (paras. 0107-110, etc.)], wherein each memory intensive tile comprises storage for a multiple dimensional data array and a plurality of functional units coupled to the storage [the processing elements include an array of elements implementing forward propagation in the convolution and other layers (para. 0059, fig. 9, etc.) which may include computation units connected to multiple memory blocks (paras. 0038-40, 0044, 0064-66; figs. 9-10; etc.) both of which may be organized into associated rows and columns (paras. 0107-110, etc.)], and each compute intensive tile comprises a multiple dimensional array of processing elements [the processing elements include an array of elements implementing forward propagation in the convolution and other layers (para. 0059, fig. 9, etc.) which may include computation units connected to multiple memory blocks (paras. 0038-40, 0044, 0064-66; figs. 9-10; etc.) both of which may be organized into associated rows and columns (paras. 0107-110, etc.)].
While Aydonat teaches training of the CNN (see, e.g., Aydonat: paras. 0035-36, etc.), it does not explicitly teach each of the plurality of fully connected layer chips and the plurality of convolutional layer chips comprising a back propagation compute intensive tile and a weight gradient compute intensive tile.  Furthermore, while Aydonat teaches that the plurality of fully connected layer and convolutional layer chips may be implemented as an array of tiles (see, e.g., Aydonat: paras. 0060, 0107-110, etc.), that the tiles comprise data storage and functional units, and that the system supports activation functions and control operations (see above, regarding the array and storage/functional units; and paras. 0058 and 0066 regarding activation functions and control operations), it does not explicitly teach that each memory intensive tile comprises storage for a multiple dimensional data array, and a plurality of functional units, coupled to the storage, comprising circuitry that supports one or more activation functions, and that each compute intensive tile comprises a multiple dimensional array of processing elements, and a scalar processing element to execute control operations.
Simard teaches each of the plurality of fully connected layer chips and the plurality of convolutional layer chips comprising an interconnect to couple each of a forward propagation compute intensive tile, a back propagation compute intensive tile, and a weight gradient compute intensive tile [the function computed by the neural network is obtained by computing the "forward propagation," whereby output is determined by multiplying an input by a weight. Operations for the forward propagation can be similar to computing the output features as described with regard to FIG. 2. However, for training purposes, the gradients of the error with respect to the weights can be computed in order to update the weights (weight update), and the gradients of the error with respect to the input features can be computed in order to update previous layers (back propagation) (paras. 0035-38) including modules to compute the weight gradient (para. 0060, etc.) for the layer chips of Aydonat above].
Aydonat and Simard are analogous art, as they are within the same field of endeavor, namely processing convolutional neural network functions.
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to include modules for computing back propagation and weight gradients for training the CNN, as taught by Simard, in the processing elements performing training and operation of the CNN in the system taught by Aydonat.
Simard provides motivation as [Subsampling is a technique used in convolutional neural networks to reduce resolution for faster computation and better generalization (para. 0035, etc.) and the modules facilitate increasing learning speed and/or throughput (para. 0006, etc.)].
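For illustration only, the three computations relied upon above (forward propagation, back propagation, and weight gradient) may be sketched for a single fully connected layer as follows. The sketch assumes plain Python lists and hypothetical function names; it is not taken from Aydonat or Simard and does not characterize the circuitry of either reference.

```python
# Illustrative sketch only. All names below are hypothetical and are not
# drawn from Aydonat, Simard, or the claims.

def forward(x, w):
    """Forward propagation: y[j] = sum over i of x[i] * w[i][j]."""
    return [sum(x[i] * w[i][j] for i in range(len(x)))
            for j in range(len(w[0]))]

def backward(dy, w):
    """Back propagation: gradient of the error with respect to the inputs,
    dx[i] = sum over j of dy[j] * w[i][j]."""
    return [sum(dy[j] * w[i][j] for j in range(len(dy)))
            for i in range(len(w))]

def weight_gradient(x, dy):
    """Weight gradient: gradient of the error with respect to the weights,
    dw[i][j] = x[i] * dy[j]."""
    return [[xi * dyj for dyj in dy] for xi in x]

# Hypothetical example values.
x = [1.0, 2.0]                      # input features
w = [[0.5, -1.0], [0.25, 2.0]]      # weights, indexed w[input][output]
dy = [1.0, -1.0]                    # error gradient at the layer output

y = forward(x, w)
dx = backward(dy, w)
dw = weight_gradient(x, dy)
```

In this sketch each of the three functions reads the same operands (inputs, weights, output error) but produces a different result, which is why the three computations can be assigned to separate modules.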
Pechanek teaches wherein each of the plurality of fully connected layer chips and the plurality of convolutional layer chips comprises a plurality of rows and columns of compute intensive tiles coupled to a plurality of rows and columns of memory intensive tiles [the neurons and synapses for the neural network are implemented using groups of multiple processing and functional units (col. 2, lines 34-63), for the processing elements implementing neurons of the CNN of Aydonat, above], each memory intensive tile comprises storage for a multiple dimensional data array, and a plurality of functional units, coupled to the storage, comprising circuitry that supports one or more activation functions [each of the groups includes an array of synapse processing units connected to activation function units and including storage for instructions and data (col. 2, lines 34-63; col. 8, lines 27-59; col. 9, line 1 to col. 10, line 28; etc.)], and each compute intensive tile comprises a multiple dimensional array of processing elements, and a scalar processing element to execute control operations [each of the groups includes an array of processing units including instruction and data storage, which can include processing of control instructions (see col. 2, lines 34-63; col. 8, lines 27-59; col. 9, line 1 to col. 10, line 28; etc. for descriptions of the array and included elements in each group; see col. 3, lines 14-39; col. 6, line 43 to col. 7, line 46; col. 17, line 18 to col. 18, line 52; etc. for control operations)].
Aydonat and Pechanek are analogous art, as they are within the same field of endeavor, namely neural network acceleration/processing hardware.
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to utilize the arrays of synapse/neuron processing elements, taught by Pechanek, to implement the processing elements for the neurons of the convolutional neural network processing system taught by Aydonat.
Pechanek provides motivation as [a scalable array of processing elements can be used to implement neural network operations to allow training the network that includes modification/scaling of the connections between neurons (col. 2, lines 25-31; col. 6, line 23 to col. 7, line 46; etc.)].

As per claim 3, Aydonat/Simard/Pechanek teaches wherein each compute intensive tile comprises an accumulator array coupled to the multiple dimensional data array [the processing elements include an accumulator unit using a logic array block (Aydonat: paras. 0065, 0109, etc.) and can perform accumulation and store accumulator results (Pechanek: col. 8, lines 27-59; claim 19, etc.)].

As per claim 4, Aydonat/Simard/Pechanek teaches wherein each compute intensive tile comprises an instruction memory, and an instruction decoder [the memory may store instruction and associated codes (Aydonat: paras. 0087, 0091, etc.) and the processing elements may include means for instruction decoding, distribution, and execution (Pechanek: col. 17, line 18 to col. 18, line 52; claims 33 and 36; etc.)].

As per claim 5, Aydonat/Simard/Pechanek teaches wherein the forward propagation compute intensive tile, the back propagation compute intensive tile and the weight gradient compute intensive tile are to fetch an input feature from the first memory intensive tile and store an output feature into the second memory intensive tile [the function computed by the neural network is obtained by computing the "forward propagation," whereby output is determined by multiplying an input by a weight. Operations for the forward propagation can be similar to computing the output features as described with regard to FIG. 2. However, for training purposes, the gradients of the error with respect to the weights can be computed in order to update the weights (weight update), and the gradients of the error with respect to the input features can be computed in order to update previous layers (back propagation) (Simard: paras. 0035-38, fig. 2, etc.) where computation units are connected to multiple memory blocks for reading inputs and writing outputs (Aydonat: paras. 0038-40, 0044, 0064-66; figs. 9-10; etc.)].

As per claim 6, Aydonat/Simard/Pechanek teaches wherein partial output features from the compute intensive tiles are accumulated into a third memory intensive tile to compute an activation function [the accumulated results from the PE arrays 901-904 may be transmitted to one of the buffers 951-954 which transmits the computed output layer back to kernels and components in the PE arrays 901-904 for a next round of layer computation (Aydonat: para. 0061, etc.)].
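For illustration only, the accumulation of partial output features followed by an activation function may be sketched as follows. The function name is hypothetical, and the choice of ReLU as the activation is an assumption made for the example; neither the claim nor the cited paragraph is limited to that function.

```python
# Illustrative sketch only. The name accumulate_and_activate and the use of
# ReLU are assumptions for this example, not taken from the cited references.

def accumulate_and_activate(partials):
    """Sum partial output features element-wise, then apply ReLU."""
    acc = [0.0] * len(partials[0])
    for partial in partials:          # one partial result per compute unit
        for k, value in enumerate(partial):
            acc[k] += value
    return [max(0.0, v) for v in acc]  # ReLU: negative sums become zero

# Three hypothetical partial results for a two-element output feature.
result = accumulate_and_activate([[1.0, -2.0], [0.5, 0.5], [-0.25, 1.0]])
```

The point of the sketch is the ordering: the activation is applied once to the accumulated sum, not separately to each partial result.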

As per claim 7, Aydonat/Simard/Pechanek teaches wherein a convolution layer chip is to operate on a set of inputs in parallel to generate updated weight gradients and a respective fully connected layer chip is to operate on a set of outputs from the convolution layer chip [the function computed by the neural network is obtained by computing the "forward propagation," whereby output is determined by multiplying an input by a weight. Operations for the forward propagation can be similar to computing the output features as described with regard to FIG. 2. However, for training purposes, the gradients of the error with respect to the weights can be computed in order to update the weights (weight update), and the gradients of the error with respect to the input features can be computed in order to update previous layers (back propagation) (Simard: paras. 0035-38, fig. 2, etc.) where computation units are connected to multiple memory blocks for reading inputs and writing outputs (Aydonat: paras. 0038-40, 0044, 0064-66; figs. 9-10; etc.)].

As per claim 8, Aydonat/Simard/Pechanek teaches a circuit to map fully connected layers of the neural network to the plurality of fully connected layer chips and map convolution layers of the neural network to the plurality of convolutional layer chips [mapping includes determining how to implement logic gates and logic elements in the optimized logic representation with the types or categories of resources available on the target (Aydonat: para. 0044, etc.) and sequencing to coordinate data transmission between units (Aydonat: para. 0006, fig. 9, etc.)].

As per claim 9, see the rejection of claim 1, above, wherein Aydonat/Simard/Pechanek also teaches a method comprising receiving a neural network comprising a plurality of fully connected layers and a plurality of convolutional layers with a processing system [a method for implementing a convolutional neural network (CNN) accelerator on a target includes utilizing one or more processing elements to implement convolution and fully connected layers of the CNN (Aydonat: abstract, etc.)], the processing system as described above; and mapping the plurality of fully connected layers of the neural network to the plurality of fully connected layer chips and the plurality of convolution layers of the neural network to the plurality of convolutional layer chips [mapping includes determining how to implement logic gates and logic elements in the optimized logic representation with the types or categories of resources available on the target (Aydonat: para. 0044, etc.) and sequencing to coordinate data transmission between units (Aydonat: para. 0006, fig. 9, etc.); and where the neural network is partitioned and mapped to the groups of neurons comprising processing element arrays (Pechanek: col. 1, lines 50-58; col. 18, lines 10-44; col. 23, line 30 to col. 24, line 6; etc.)].
Examiner’s Note: the motivation and reasoning for the combination is provided in the rejection above.

As per claim 10, Aydonat/Simard/Pechanek teaches generating updated weight gradients for the neural network with the processing system [for training purposes, the gradients of the error with respect to the weights can be computed in order to update the weights (weight update), and the gradients of the error with respect to the input features can be computed in order to update previous layers (back propagation) (Simard: paras. 0035-38, fig. 2, etc.)].

As per claim 11, see the rejection of claim 7, above.

As per claim 12, see the rejection of claim 6, above.

As per claim 13, Aydonat/Simard/Pechanek teaches wherein the mapping comprises allocating columns for each layer of the neural network to the memory intensive tiles [the processing elements include an array of elements implementing forward propagation in the convolution and other layers (Aydonat: para. 0059, fig. 9, etc.) and mapping includes determining how to implement logic gates and logic elements in the optimized logic representation with the types or categories of resources available on the target (Aydonat: para. 0044, etc.) and sequencing to coordinate data transmission between units (Aydonat: para. 0006, fig. 9, etc.); where the neural network is partitioned and mapped to the groups of neurons comprising processing element arrays (Pechanek: col. 1, lines 50-58; col. 18, lines 10-44; col. 23, line 30 to col. 24, line 6; etc.)].

As per claim 14, Aydonat/Simard/Pechanek teaches wherein the mapping further comprises distributing errors of each layer across its allocated columns of the memory intensive tiles [for training purposes, the gradients of the error with respect to the weights can be computed in order to update the weights (weight update), and the gradients of the error with respect to the input features can be computed in order to update previous layers (back propagation) (Simard: paras. 0035-38, fig. 2, etc.) where mapping includes determining how to implement logic gates and logic elements in the optimized logic representation with the types or categories of resources available on the target (Aydonat: para. 0044, etc.) and sequencing to coordinate data transmission between units (Aydonat: para. 0006, fig. 9, etc.)].

As per claim 15, Aydonat/Simard/Pechanek teaches wherein the mapping further comprises distributing features of each layer across its allocated columns of the memory intensive tiles [mapping includes determining how to implement logic gates and logic elements in the optimized logic representation with the types or categories of resources available on the target (Aydonat: para. 0044, etc.) and sequencing to coordinate data transmission between units (Aydonat: para. 0006, fig. 9, etc.) including assigning features to the processing elements to be processed and stored (Aydonat: paras. 0035-36, etc.)].

As per claim 16, Aydonat/Simard/Pechanek teaches wherein the mapping further comprises assigning computation of a forward propagation function, a back propagation function, and a weight gradient function to a forward propagation compute intensive tile, a back propagation compute intensive tile, and a weight gradient compute intensive tile of the plurality of rows and columns of compute intensive tiles [the function computed by the neural network is obtained by computing the "forward propagation," whereby output is determined by multiplying an input by a weight. Operations for the forward propagation can be similar to computing the output features as described with regard to FIG. 2. However, for training purposes, the gradients of the error with respect to the weights can be computed in order to update the weights (weight update), and the gradients of the error with respect to the input features can be computed in order to update previous layers (back propagation) (Simard: paras. 0035-38, fig. 2, etc.) where mapping includes determining how to implement logic gates and logic elements in the optimized logic representation with the types or categories of resources available on the target (Aydonat: para. 0044, etc.) and sequencing to coordinate data transmission between units (Aydonat: para. 0006, fig. 9, etc.)].

As per claim 17, see the rejections of claims 1 and 9, above, wherein Aydonat/Simard/Pechanek also teaches a non-transitory machine readable medium that stores code that when executed by a machine causes the machine to perform the method [It should be appreciated that embodiments of the present invention may be provided as a computer program product, or software, that may include a computer-readable or machine-readable medium having instructions. The instructions on the computer-readable or machine-readable medium may be used to program a computer system or other electronic device (Aydonat: para. 0106, etc.)].

As per claim 18, see the rejection of claim 10, above.

As per claim 19, see the rejection of claim 7, above.

As per claim 20, see the rejection of claim 6, above.

As per claim 21, see the rejection of claim 13, above.

As per claim 22, see the rejection of claim 14, above.

As per claim 23, see the rejection of claim 15, above.

As per claim 24, see the rejection of claim 16, above.


Claim 2 is rejected under 35 U.S.C. 103 as being unpatentable over Aydonat, Simard, and Pechanek as applied to claim 1 above, and further in view of Bardsley (US 2003/0018929).

As per claim 2, Aydonat/Simard/Pechanek teaches the apparatus to process the neural network of claim 1, as described above.
While Aydonat/Simard/Pechanek also teaches using data flow execution and selected addressing (see, e.g., Aydonat: para. 0061 and Pechanek: col. 13, line 1 to col. 14, line 33 for controlling addressing; and Pechanek: abstract; col. 3, lines 3-39; etc. for data flow operations) it does not explicitly teach wherein each memory intensive tile comprises a data flow tracker to track selected address ranges that are accessed.
Bardsley teaches wherein each memory intensive tile comprises a data flow tracker to track selected address ranges that are accessed [a data trace unit is connected to the on-chip memory and may trace the address ranges accessed (paras. 0019-21, etc.)].
Aydonat/Simard/Pechanek and Bardsley are analogous art, as they are within the same field of endeavor, namely data flow architectures.
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to utilize a data trace unit for address tracking for data flow operations, as taught by Bardsley, for the addressing used in the data flow operations of Aydonat/Simard/Pechanek.
Bardsley provides motivation as [the data trace unit allows real-time performance monitoring that works with or without instruction tracing and provides synchronization and selection of multiple elements of tracking, while providing tradeoff between monitoring and bandwidth usage (paras. 0009-21, etc.)].
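For illustration only, tracking of accesses to selected address ranges may be sketched in software as follows. The class name, the inclusive range convention, and the per-range hit counters are all assumptions made for the example; the sketch does not describe the data trace unit of the cited reference or the claimed data flow tracker.

```python
# Illustrative sketch only. AddressRangeTracker is a hypothetical name, and
# counting hits per range is an assumed behavior chosen for this example.

class AddressRangeTracker:
    def __init__(self, ranges):
        # Each range is a (start, end) pair, both bounds inclusive.
        self.ranges = [tuple(r) for r in ranges]
        self.hits = {r: 0 for r in self.ranges}

    def record(self, address):
        """Count the access if it falls inside any selected range."""
        for r in self.ranges:
            start, end = r
            if start <= address <= end:
                self.hits[r] += 1

# Hypothetical selected ranges and memory accesses.
tracker = AddressRangeTracker([(0x1000, 0x1FFF), (0x8000, 0x80FF)])
for addr in (0x1004, 0x2000, 0x8001, 0x1FFF):
    tracker.record(addr)
```

Accesses outside every selected range (0x2000 in the example) are simply ignored, which reflects the "selected address ranges" wording of the claim.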


Response to Arguments
Applicant’s amendment to the title has been entered.

Applicant’s arguments with respect to claims 1-24 have been addressed by the updated rejections, above, including the newly cited reference to Pechanek.


Conclusion
The following is a summary of the treatment and status of all claims in the application as recommended by M.P.E.P. 707.07(i): claim 25 is cancelled; claims 1-24 are rejected.

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Murphy (US 5,077,677) – discloses a system including routing and control operations between processing elements for neural networks.
Henry (US 2017/0103305) – discloses a neural network unit (NNU) including arrays of neural processing units (NPU) and activation function units (AFU) for performing convolutional neural network operations.

The examiner requests, in response to this Office action, that support be shown for language added to any original claims on amendment and any new claims. That is, indicate support for newly added claim language by specifically pointing to page(s) and line number(s) in the specification and/or drawing figure(s). This will assist the examiner in prosecuting the application.

When responding to this office action, Applicant is advised to clearly point out the patentable novelty which he or she thinks the claims present, in view of the state of the art disclosed by the references cited or the objections made. He or she must also show how the amendments avoid such references or objections.  See 37 CFR 1.111(c).

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
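For illustration only, the timing rule stated in the preceding paragraph may be sketched as follows. The function names and the example dates are hypothetical. The sketch assumes that a period of months ends on the same day of the later month, clamped to the last day of that month, and it does not account for weekends, holidays, or extension fees; it is not a substitute for 37 CFR 1.136(a) or MPEP § 706.07.

```python
# Illustrative sketch only. Function names and dates are hypothetical, and
# the month arithmetic (same day, clamped to month end) is an assumption.
import calendar
from datetime import date

def add_months(d, months):
    """Return the date `months` months after d, clamped to month end."""
    index = d.month - 1 + months
    year = d.year + index // 12
    month = index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)

def shortened_period_expiry(mailing, first_reply=None, advisory_mailed=None):
    """Expiry of the shortened statutory period for a final action."""
    three_months = add_months(mailing, 3)
    six_months = add_months(mailing, 6)
    expiry = three_months
    if first_reply is not None and advisory_mailed is not None:
        # First reply within two months and advisory action mailed after
        # the three-month period: period runs to the advisory mailing date.
        if first_reply <= add_months(mailing, 2) and advisory_mailed > three_months:
            expiry = advisory_mailed
    # In no event later than six months from the date of the final action.
    return min(expiry, six_months)

# Hypothetical dates, not the dates of this Office action.
default_expiry = shortened_period_expiry(date(2022, 1, 10))
extended_expiry = shortened_period_expiry(
    date(2022, 1, 10),
    first_reply=date(2022, 3, 1),
    advisory_mailed=date(2022, 4, 20),
)
```

With the hypothetical mailing date of 10 January 2022, the default expiry is 10 April 2022, and the expiry moves to 20 April 2022 when a first reply is filed within two months and the advisory action is mailed after the three-month date.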

Any inquiry concerning this communication or earlier communications from the examiner should be directed to GEORGE GIROUX whose telephone number is (571) 272-9769. The examiner can normally be reached M-F 10am-6pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Omar Fernandez Rivas, can be reached at 571-272-2589. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/GEORGE GIROUX/Primary Examiner, Art Unit 2128