DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claim 1 is rejected under 35 U.S.C. 103 as being unpatentable over Zhang (US 20180314671A1) in view of Vantrease (US 20190294413A1).
Zhang teaches a digital circuit device with embedded memory (on-chip block RAM, [0063]) for neural network [0021], the device (200) comprising:  a controller (110); a matrix of processing blocks (104), wherein each processing block is communicatively coupled to the controller and N neighboring processing blocks, wherein N is an integer number equal to or larger than 1 (systolic array architecture 200 comprises a two-dimensional array of process elements, in the two-dimensional array comprised of processing element 220 through processing element 256, processing elements 220 through processing element 256 are interconnected via an interconnect system that allows a process element to pass to a neighboring processing element or receive data from a neighboring processing element in accordance with timing protocols implemented by processing element scheduler 110, bidirectional interconnect 264 between process element 220 and process element 230, [0028]); and a cyclic bidirectional interconnection configured to transmit each processing block’s output to the N neighboring processing blocks ([0028], PE passes input data to a neighboring PE every cycle, output feature map data are shifted out across the PE array using the bidirectional interconnect feature between PEs, [0035]).
	However, Zhang does not expressly teach that the digital circuit device is a digital integrated circuit device for neural network inferring.  However, Vantrease teaches a digital integrated circuit with embedded memory (an apparatus comprising a storage device, Claim 4 of Vantrease; the apparatus of claim 4, wherein the apparatus comprises an integrated circuit, Claim 10 of Vantrease) for neural network inferring (neural network processing units may be used for inference, [0068]).
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Zhang so that the digital circuit device is a digital integrated circuit device for neural network inferring because Vantrease suggests that it is well-known in the art to use the trained neural network to perform various tasks, which is referred to as the inference process [0002].
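For illustration only, the claimed "cyclic bidirectional interconnection" can be pictured with the minimal sketch below. It assumes a one-dimensional ring with N = 2 (each block coupled to its left and right neighbor, with wrap-around at the ends). The function names and the ring topology are hypothetical and are not taken from the claims or from any cited reference.

```python
def ring_neighbors(index, size):
    """Return the left and right neighbor indices on a cyclic (wrap-around) ring."""
    return [(index - 1) % size, (index + 1) % size]


def broadcast_outputs(outputs):
    """Deliver each block's output to both of its ring neighbors.

    Entry i of the result maps a neighbor's index to the value that
    block i received from that neighbor.
    """
    size = len(outputs)
    received = [dict() for _ in range(size)]
    for src, value in enumerate(outputs):
        for dst in ring_neighbors(src, size):
            received[dst][src] = value
    return received


# Four blocks (assumed size); block 0 hears from blocks 1 and 3.
outputs = [10, 20, 30, 40]
received = broadcast_outputs(outputs)
print(received[0])
```

Because the links are bidirectional and the ring wraps around, the first and last blocks are neighbors of each other in this sketch.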
Claims 2-12 are rejected under 35 U.S.C. 103 as being unpatentable over Zhang (US 20180314671A1) and Vantrease (US 20190294413A1) in view of Charles (US 20200160905A1) and Ovsiannikov (US 20190187983A1).
As per Claim 2, Zhang and Vantrease are relied upon for the teachings as discussed above relative to Claim 1.  Zhang teaches wherein each processing block (104) comprises a processing unit configured to receive inputs from an output of one of the N neighboring processing blocks and from the controller (110), and to provide an output as an output of the processing block to the N neighboring processing blocks [0028].
However, Zhang does not teach wherein each processing block comprises:  a variant word buffer to store variant words during a neural network inferring procedure; and a processing unit configured to receive inputs from the variant word buffer.  However, Vantrease teaches wherein each processing block comprises:  a variant buffer to store variant data (each PE may also include sequential logic circuitries (e.g., registers, latches, flip-flops, etc.) to store input data, weights, and output data, [0078]) during a neural network inferring procedure [0068]; and a processing unit configured to receive inputs from the variant buffer, an output from one of the N neighboring processing blocks, and provide an output as an output of the processing block to the N neighboring processing blocks (each PE may also include sequential logic circuitries (e.g., registers, latches, flip-flops, etc.) to store input data, weights, and output data for the adder and multiplier circuitry, PE 620 b may receive a first input data element of pixel group 612 as well as a partial sum comprising weighted first input data element of pixel group 610 from PE 620 a, PE 620 b may multiply the input data element with a weight, add the multiplication product to the partial sum to generate an updated partial sum, and store the updated partial sum in an internal register, PE 620 b may forward the updated partial sum to a PE 620 c, [0078]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Zhang so that each processing block comprises:  a variant buffer to store variant data during a neural network inferring procedure; and a processing unit configured to receive inputs from the variant buffer because Vantrease suggests that this way, the variant data can be quickly stored and also quickly retrieved for further processing [0078].
However, Zhang and Vantrease do not teach an invariant word buffer to store invariant words during one complete neural network inferring procedure; a variant word buffer to store variant words; and a processing unit configured to receive inputs from the invariant word buffer, the variant word buffer.  However, Charles teaches an invariant word buffer to store invariant words (additional memory 50, for example, a non-volatile memory, intended to store the data of configuration of shuffle circuit, [0099], [0104], to perform an operation of shuffling of a word, only one piece of shuffle configuration data is to be stored, [0068]) during one complete neural network inferring procedure (neural network inference, [0103]); a variant word buffer to store variant words (executing, during accesses to the memory content, arithmetic operations having data stored in the memory circuit as operands, [0051], simultaneously activating in read mode a plurality of memory circuit columns, the calculation operations may be implemented on words, [0060]) during a neural network inferring procedure [0103]; and a processing unit configured to receive inputs from the invariant word buffer (shuffle circuit is capable of delivering on its output port the bits stored in its data input register, shuffled according to a shuffle operation defined according to the state of its configuration register 36, [0066], [0068]), the variant word buffer (to implement calculation operations, in a memory circuit formed of cells 10, simultaneously activating in read mode two of cells, [0058], [0051, 0060]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Zhang and Vantrease to include an invariant word buffer to store invariant words during one complete neural network inferring procedure; a variant word buffer to store variant words; and a processing unit configured to receive inputs from the invariant word buffer, the variant word buffer because Charles suggests that the configuration data can be stored in a non-volatile memory so that the configuration data can be retained even after power is removed [0099].
However, Zhang, Vantrease, and Charles do not expressly teach wherein each processing block comprises:  an input selector to select an output from one of the N neighboring processing blocks; and a processing unit configured to receive inputs from the input selector.  However, Ovsiannikov teaches wherein each processing block comprises:  an input selector to select an output from one of the N neighboring processing blocks; and a processing unit configured to receive inputs from the input selector (processing element 120 includes a first multiplexer 660, which determines whether newly received data or data from the output are used for calculation, [0089], establish data paths between any one of the processing elements 120 and any other processing element 120, [0055]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Zhang, Vantrease, and Charles so that each processing block comprises:  an input selector to select an output from one of the N neighboring processing blocks; and a processing unit configured to receive inputs from the input selector as suggested by Ovsiannikov.  It is well-known in the art to use a multiplexer to select an output.
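For illustration only, the elements recited in Claim 2 (an invariant word buffer, a variant word buffer, an input selector, and a processing unit) can be pictured with the minimal sketch below. The class name, method names, and the choice of a multiply-accumulate step are assumptions made for the example. The sketch does not reproduce the circuitry of any cited reference.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessingBlock:
    # Invariant words: assumed here to be weights, fixed for one whole inference run.
    invariant_words: list
    # Variant words: assumed here to be layer inputs and intermediate results.
    variant_words: list = field(default_factory=list)

    def select_input(self, neighbor_outputs, select):
        """Input selector modeled as a multiplexer: pick one neighbor's output."""
        return neighbor_outputs[select]

    def step(self, neighbor_outputs, select, weight_index, variant_index):
        """Add weight * variant word to the selected neighbor output."""
        partial = self.select_input(neighbor_outputs, select)
        weight = self.invariant_words[weight_index]
        result = partial + weight * self.variant_words[variant_index]
        self.variant_words.append(result)  # store the output in the variant buffer
        return result


block = ProcessingBlock(invariant_words=[2, 3], variant_words=[5])
result = block.step(neighbor_outputs=[100, 200], select=1,
                    weight_index=0, variant_index=0)
print(result)  # 200 + 2 * 5
```

In this sketch the invariant buffer is only read during the step, while the variant buffer is both read and written, which mirrors the distinction drawn between the two buffers in the claim.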
8.	As per Claim 3, Zhang, Vantrease, and Charles do not expressly teach wherein the input selector is a memory addressing selector or a multiplexer.  However, Ovsiannikov teaches wherein the input selector is a memory addressing selector or a multiplexer [0089].  This would be obvious for the reasons given in the rejection for Claim 2.
9.	As per Claim 4, Zhang and Vantrease do not teach wherein the invariant words include neural network parameters.  However, Charles teaches wherein the invariant words include neural network parameters [0099, 0104, 0068, 0103].  This would be obvious for the reasons given in the rejection for Claim 2.
10.	As per Claim 5, Zhang does not teach wherein the variant words include layer input data of a neural network.  However, Vantrease teaches wherein the variant data include layer input data of a neural network (neural network may include multiple processing nodes arranged on layers, each processing node on a layer (input layer) may receive a sequential stream of input data elements, [0017], [0078]).  This would be obvious for the reasons given in the rejection for Claim 2.
	However, Zhang and Vantrease do not expressly teach that the variant data is variant words.  However, Charles teaches the variant words [0051, 0060].  Thus, this teaching from Charles can be implemented into the device of Vantrease so that the variant words include layer input data of a neural network.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Zhang and Vantrease so that the variant data is variant words as suggested by Charles.  It is well-known in the art that data is commonly organized into words, a word being a fixed-size group of bits (for example, 2 bytes of 8 bits each on a 16-bit architecture).
11.	As per Claim 6, Zhang teaches wherein N is an integer between 1 and 8 (process element to pass to a neighboring process element, [0028]).
12.	As per Claim 7, Zhang, Vantrease, and Charles do not expressly teach wherein the processing unit is configured to receive as input from the input selector a selected output of one of the N neighboring processing blocks.  However, Ovsiannikov teaches wherein the processing unit is configured to receive as input from the input selector a selected output of one of the N neighboring processing blocks [0089, 0055].  This would be obvious for the reasons given in the rejection for Claim 2.
13.	As per Claim 8, Zhang does not teach wherein the output of the processing unit is stored in the variant word buffer.  However, Vantrease teaches wherein the output of the processing unit is stored in the variant buffer [0078].  This would be obvious for the reasons given in the rejection for Claim 2.
	However, Zhang and Vantrease do not teach wherein the output of the processing unit is stored in one of the invariant word buffer and the variant word buffer.  However, Charles teaches the invariant word buffer and the variant word buffer, as discussed in the rejection for Claim 2.  Thus, this teaching from Charles can be implemented into the device of Vantrease so that the output of the processing unit is stored in one of the invariant word buffer and the variant word buffer.  This would be obvious for the reasons given in the rejection for Claim 2.
14.	As per Claim 9, Zhang teaches wherein the processing unit is configured to perform one or more of a multiply-accumulate function, a partial convolution function, a partial pooling function, a normalization function, or an identity function based on an input from the controller (multiplication results from multiplier 308 through multiplier 312 are accumulated, [0040], suitable scheduling of the PE executions, this function is performed by processing element scheduler 110, [0037]).
15.	As per Claim 10, Zhang does not teach wherein the variant word buffer is implemented as a static random-access memory (SRAM) block.  However, Vantrease teaches wherein the variant buffer is implemented as a static random-access memory (SRAM) block (the data cached at state buffer 522 may include, for example, the input data and weights, state buffer 522 may include static random access memory (SRAM), [0074]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Zhang so that the variant word buffer is implemented as a static random-access memory (SRAM) block as suggested by Vantrease.  It is well-known in the art that SRAM is typically used to store data that needs to be quickly retrieved.
However, Zhang and Vantrease do not teach wherein the variant word buffer is implemented as the SRAM block, and the invariant word buffer is implemented as a non-volatile memory (NVM) block.  However, Charles teaches wherein the variant word buffer [0051, 0060] is implemented as an SRAM block (SRAM storage cell 12, [0054]), and the invariant word buffer is implemented as a non-volatile memory (NVM) block [0099, 0104, 0068].  This would be obvious for the reasons given in the rejection for Claim 2.
16.	As per Claim 11, Zhang and Vantrease do not teach wherein network parameters and local instructions are stored in the NVM block such that upon activation, the matrix of processing blocks is ready to perform neural network inferring on the received inputs without an activation setup.  However, Charles teaches wherein network parameters and local instructions are stored in the NVM block such that the processing block is ready to perform neural network inferring on received inputs ([0099, 0104], shuffle circuit 30 is configurable, via its configuration register 36, to implement any of the K! possible shuffle operations, [0067], [0068, 0103]).  It is well-known in the art that upon activation, the data stored in the NVM is ready to be accessed without an activation setup.  Thus, Charles teaches wherein network parameters and local instructions are stored in the NVM block such that upon activation, the matrix of processing blocks is ready to perform neural network inferring on the received inputs without an activation setup [0099, 0104, 0067, 0068, 0103].  This would be obvious for the reasons given in the rejection for Claim 2.
17.	As per Claim 12, Zhang teaches wherein an input distribution to each layer is organized as localized distribution such that any two neighboring inputs in the input organization are distributed to either the same processing block or two neighboring processing blocks in the matrix of processing blocks (first input data sequence 520 is sequentially clocked into the first column of systolic array 500 that is comprised of processing element 502, processing element 508, processing element 514, second input data sequence 522 is sequentially clocked into the second column of systolic array 500 that is comprised of processing element 504, processing element 510, processing element 516, [0045]).
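For illustration only, the "localized distribution" property recited in Claim 12 can be checked with the minimal sketch below. It assumes a two-dimensional input that is split into rectangular tiles, one tile per processing block. The tiling scheme and the function names are hypothetical and are not drawn from the claims or from any cited reference.

```python
def block_for_input(row, col, tile_rows, tile_cols):
    """Map an input position to the (row, col) index of its processing block."""
    return (row // tile_rows, col // tile_cols)


def is_localized(height, width, tile_rows, tile_cols):
    """Check that neighboring inputs land on the same or on neighboring blocks."""
    for r in range(height):
        for c in range(width):
            here = block_for_input(r, c, tile_rows, tile_cols)
            # Compare with the input to the right and the input below.
            for dr, dc in ((0, 1), (1, 0)):
                nr, nc = r + dr, c + dc
                if nr < height and nc < width:
                    there = block_for_input(nr, nc, tile_rows, tile_cols)
                    distance = abs(here[0] - there[0]) + abs(here[1] - there[1])
                    if distance > 1:
                        return False
    return True


# A 6x6 input split into 2x2 tiles (assumed sizes) gives a 3x3 matrix of blocks.
print(block_for_input(3, 4, 2, 2))
print(is_localized(6, 6, 2, 2))
```

Under this tiling, two adjacent inputs differ by one in a single coordinate, so their block indices differ by at most one in that coordinate.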
18.	Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over Zhang (US 20180314671A1), Vantrease (US 20190294413A1), Charles (US 20200160905A1), and Ovsiannikov (US 20190187983A1) in view of Shalev (US 10489479B1).
	Zhang, Vantrease, Charles, and Ovsiannikov are relied upon for the teachings as discussed above relative to Claim 12.
	However, Zhang, Vantrease, Charles, and Ovsiannikov do not teach wherein an input of a convolutional layer of the matrix is arranged as an input tensor and the localized distribution of the input tensor is derived from localized distributions of input matrices of the input tensor.  However, Shalev teaches wherein an input of a convolutional layer of the matrix is arranged as an input tensor and the localized distribution of the input tensor is derived from localized distributions of input matrices of the input tensor (in the convolutional layers, a three-dimensional array of input data (commonly referred to as a 3D matrix or tensor), col. 1, lines 25-29; sequence in which the input data values are distributed to the processing elements, col. 2, lines 10-17).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Zhang, Vantrease, Charles, and Ovsiannikov so that an input of a convolutional layer of the matrix is arranged as an input tensor and the localized distribution of the input tensor is derived from localized distributions of input matrices of the input tensor because Shalev suggests that it is well-known in the art that in the convolutional layers, a 3D array of input data (commonly referred to as a 3D matrix or tensor) is convolved with a tensor (col. 1, lines 25-29).
19.	Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over Zhang (US 20180314671A1), Vantrease (US 20190294413A1), Charles (US 20200160905A1), Ovsiannikov (US 20190187983A1), and Shalev (US 10489479B1) in view of Wu (US 9779786B1).
	Zhang, Vantrease, Charles, Ovsiannikov, and Shalev are relied upon for the teachings as discussed above relative to Claim 13.
	However, Zhang, Vantrease, Charles, Ovsiannikov, and Shalev do not teach wherein the localized distribution of the input tensor is derived from a matrix distribution for all input matrices of the input tensor by assigning an input in the input tensor to a same processing block determined by the matrix distribution the input belongs to.  However, Wu teaches wherein the localized distribution of the input tensor is derived from a matrix distribution for all input matrices of the input tensor by assigning an input in the input tensor to a same processing block determined by the matrix distribution the input belongs to (each tensor operation can correspond to a processing element applying different masks to the same input data, the input matrix being operated on can be separated into a set of input data slices, col. 3, lines 17-21; each processing element can receive a three-by-three input data slice from different portions of a corresponding IFM, col. 6, lines 14-16).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Zhang, Vantrease, Charles, Ovsiannikov, and Shalev so that the localized distribution of the input tensor is derived from a matrix distribution for all input matrices of the input tensor by assigning an input in the input tensor to a same processing block determined by the matrix distribution the input belongs to because Wu suggests that it is well-known in the art that matrices, or more broadly tensors, are used by processing circuitry to provide solutions to a variety of different problems.  It is well-known in the art for image processing to use convolution matrices (col. 1, lines 12-15).
20.	Claims 15-20 are rejected under 35 U.S.C. 103 as being unpatentable over Zhang (US 20180314671A1), Vantrease (US 20190294413A1), Charles (US 20200160905A1), and Ovsiannikov (US 20190187983A1) in view of Meeker (US 20050257026A1).
21.	As per Claim 15, Claim 15 is similar in scope to Claim 2, except that Claim 15 has the additional limitation of providing an output based on the application of the function to four neighboring processing blocks of the processing block.  Zhang, Vantrease, Charles, and Ovsiannikov do not teach providing an output based on the application of the function to four neighboring processing blocks of the processing block.  However, Meeker teaches providing an output based on the application of the function to four neighboring processing blocks of the processing block (each PE is coupled to its 4 nearest neighbors, the NO, SO, EO and WO outputs of each PE are connected to the SI, NI, WI and EI inputs of the 4 nearest neighbor PEs, [0099]).  Thus, Claim 15 is rejected under the same rationale as Claim 2 along with this additional teaching from Meeker.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Zhang, Vantrease, Charles, and Ovsiannikov to include providing an output based on the application of the function to four neighboring processing blocks of the processing block because Meeker suggests that this structure is well suited for processing of data that has a 2-d structure, such as image pixel data [0004].
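For illustration only, a four-nearest-neighbor coupling of the kind discussed above, in which a north output feeds the south input of the block above, and so on, can be pictured with the minimal sketch below. The port names, the grid values, and the treatment of edge blocks (which simply have fewer neighbors here) are assumptions made for the example and are not taken from any cited reference.

```python
# Each output port lands on the opposite input port of the neighboring block.
OPPOSITE = {"N": "S", "S": "N", "E": "W", "W": "E"}
OFFSETS = {"N": (-1, 0), "S": (1, 0), "E": (0, 1), "W": (0, -1)}


def route_outputs(grid):
    """For each block, collect the values arriving on its N/S/E/W inputs."""
    rows, cols = len(grid), len(grid[0])
    inputs = [[{} for _ in range(cols)] for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            for port, (dr, dc) in OFFSETS.items():
                nr, nc = r + dr, c + dc
                # Assumption: no wrap-around, so edge blocks have fewer neighbors.
                if 0 <= nr < rows and 0 <= nc < cols:
                    inputs[nr][nc][OPPOSITE[port]] = grid[r][c]
    return inputs


grid = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]
inputs = route_outputs(grid)
print(inputs[1][1])  # the center block receives from all four neighbors
```

An input selector placed in front of the processing unit would then pick one of the up to four entries in `inputs[r][c]` for each block.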
22.	As per Claim 16, Zhang, Vantrease, Charles, and Ovsiannikov do not expressly teach receiving the first input from the input selector comprises receiving a selected one of four outputs from the four neighboring processing blocks of the processing block.  However, Meeker teaches receiving the first input from the input selector comprises receiving a selected one of four outputs from the four neighboring processing blocks of the processing block [0099].  This would be obvious for the reasons given in the rejection for Claim 15.
23.	As per Claim 17, Claim 17 is similar in scope to Claim 9, and therefore is rejected under the same rationale.
24.	As per Claim 18, Zhang does not teach further comprising:  storing the output to one or more of the buffers in the processing block.  However, Vantrease teaches further comprising:  storing the output to one or more of the buffers in the processing block [0078].  This would be obvious for the reasons given in the rejection for Claim 2.
25.	As per Claim 19, Claim 19 is similar in scope to Claims 4-5, and therefore is rejected under the same rationale.  As per Claim 20, Claim 20 is similar in scope to Claim 12, and therefore is rejected under the same rationale.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JONI HSU whose telephone number is (571)272-7785. The examiner can normally be reached M-F 10am-6:30pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kee Tung, can be reached at (571)272-7794. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





JH
/JONI HSU/Primary Examiner, Art Unit 2611