DETAILED ACTION
Notice of Pre-AIA or AIA Status
1.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

2.	Claims 1-20 are pending.

Information Disclosure Statement
3.	The information disclosure statements (IDSs) submitted on 12/19/2019, 3/10/2020, 5/25/2020, 6/23/2021 and 2/14/2022 are in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statements are being considered by the examiner.

Drawings
4.	The drawings have been reviewed and are accepted as being in compliance with the provisions of 37 CFR 1.121.

Priority
5.	Acknowledgment is made of applicant’s claim for foreign priority under 35 U.S.C. 119 (a)-(d). The certified copy has been filed on 3/4/2020.



Claim Rejections - 35 USC § 112
6.	The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


7.	Claims 3, 6, 13, and 14 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Regarding claims 3, 13, and 14, the phrase "to be" renders the claims indefinite because it is unclear whether the limitation(s) following the phrase are part of the claimed invention. See MPEP § 2173.05(d).
Regarding Claim 6, the claim recites “as a whole”. It is not clear to the Examiner whether the data block as a “whole” means the “whole vector” or “data blocks as a percentage whole”. The Examiner has interpreted the phrase as the result of the convolution or inner product.

Claim Rejections - 35 USC § 102
8.	The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
 (a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

9.	Claim(s) 1-20 is/are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Henry et al (US 20170103305), hereinafter “Henry”.

As per Claim 1, Henry discloses:
An integrated circuit chip apparatus comprising: a main processing circuit and a plurality of basic processing circuits, (Par [0067], “An IC is also referred to as a chip, a microchip, or a die” and par [0084], “The NPU 126 includes a register 205, a 2-input multiplexed register (mux-reg) 208, an arithmetic logic unit (ALU) 204, an accumulator 202, and an activation function unit (AFU) 212…”; the IC being the main processing circuit and the MUX being the basic processing circuits, included in the NNU (neural network unit, which performs the multiply-accumulate function))
wherein the main processing circuit or at least one of the plurality of basic processing circuits includes a data type conversion circuit configured to convert data between a floating point data type and a fixed point data type, (Par [0090], “In one embodiment, the NPU 126 is pipelined. For example, the NPU 126 may include registers of the ALU 204, such as a register between the multiplier and the adder and/or other circuits of the ALU 204, and a register that holds the output of the AFU 212.” And par [0238], “…flow proceeds to block units typically require logic to perform rounding of floating-point results, logic to convert between integer and floating-point formats or between different floating-point precision formats (e.g., extended precision, double precision, single precision, half precision), leading zero and leading one detectors, and logic to deal with special floating-point numbers, such as denormal numbers, NANs and infinity.” Here “NPU” is the neural processing unit, “ALU” the arithmetic logic unit, and “AFU” the activation function unit)
the plurality of basic processing circuits are configured to perform a first set of neural network computations in parallel on data transferred by the main processing circuit, (Par [0094], “For each instruction of the program, all of the NPUs 126 perform the instruction in parallel.” And see par [0463], which also includes the specifics of the “parallel fashion” between the input and computed output of all the neurons, performing recurrent neural network computations.)
and transfer a plurality of computation results to the main processing circuit, (par [0308], transfer data between the RAM buffers)
and the main processing circuit is configured to perform a second set of neural network computations in series on the plurality of computation results. (Par [02439], “fixed-point numbers are represented with an indication of the number of bits of storage that are fractional bits for an entire set of numbers, however, the indication is located in a single, shared storage that globally indicates the number of fractional bits for all the numbers of the entire set, e.g., a set of inputs to a series of operations, a set of accumulated values of the series, a set of outputs..”; there is a plurality of NPUs which perform the second set of computations in series, see also par [0245]).

As per Claim 2, the rejection of Claim 1 is incorporated and Henry further discloses: further comprising: a branch processing circuit, wherein the branch processing circuit is located between the main processing circuit and at least one basic processing circuit, (Par [0068], “…such as a branch, call or return instruction” and see Figure 38, comprising a sequencer for loop or branch instructions) wherein the branch processing circuit is configured to forward data between the main processing circuit and at least one basic processing circuit. (par [0068], “such as a branch target address, return address or exception vector” and par [0075], “The execution units 112 may also include integer units, media units, floating-point units and a branch unit.” And par [0342], doing a “feed forward”; see Claim 1, including the main and basic circuits).

As per Claim 3, the rejection of Claim 1 is incorporated and Henry further discloses: wherein the main processing circuit is configured to: receive a data block to be computed and a computation instruction; (Par [0111], “At block 602, the processor 100, i.e., the architectural program running on the processor 100, writes the input values to the current hidden layer”) convert the data block to a fixed point data block using the data type conversion circuit; (Par [0090], “In one embodiment, the NPU 126 is pipelined. For example, the NPU 126 may include registers of the ALU 204, such as a register between the multiplier and the adder and/or other circuits of the ALU 204, and a register that holds the output of the AFU 212.” And par [0238], “…flow proceeds to block units typically require logic to perform rounding of floating-point results, logic to convert between integer and floating-point formats or between different floating-point precision formats (e.g., extended precision, double precision, single precision, half precision), leading zero and leading one detectors, and logic to deal with special floating-point numbers, such as denormal numbers, NANs and infinity.”)
divide the fixed point data block into a distribution data block and a broadcasting data block according to the computation instruction; (Par [0252], “The activation function 2934 specifies the function applied to the accumulator 202 value 217 to generate the output 133 of the NPU 126. As described above and below in more detail, the activation functions 2934 include, but are not limited to: sigmoid; hyperbolic tangent; soft plus; rectify; divide by specified power of two; multiply by a user-specified reciprocal value to accomplish an effective division; pass-through full accumulator; and pass-through the accumulator as a canonical size” and par [0307], “The weight RAM buffer 3524 is coupled between the weight RAM 124 and media registers 118 for buffering transfers of data between them”.) partition the distribution data block to obtain a plurality of basic data blocks; distribute the plurality of basic data blocks to the plurality of basic processing circuits; (Par [0252] and par [0304], “The clock reduction logic 3504 is similar in many respects to the clock generation logic 3502 in that it includes a clock distribution network, or clock tree, that distributes the secondary clock signal to various blocks of the NNU 121”)
and broadcast the broadcasting data block to the plurality of basic processing circuits, (Par [0252], “activation function is specified by the initiate instruction and applied in response to an output instruction, e.g., write AFU output instruction at address 4 of FIG. 4, in which embodiment the activation function instruction at address 3 of FIG. 4 is subsumed by the output instruction.” And see par [0445], “FIG. 49 are replaced by a single shared AFU 1112 that receives the four outputs 217 of the four accumulators 202 and generates four outputs to OUTBUF[0], OUTBUF[1], OUTBUF[2], and OUTBUF[3]. The NNU 121 of FIG. 52”).
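
The data flow recited in Claim 3 (convert, divide, partition, distribute, broadcast) can be pictured with the following minimal Python sketch. It is illustrative only: the function names, the 8-bit fractional width, and the round-robin partition of rows are assumptions and are taken neither from the claims nor from Henry.

```python
# Illustrative sketch only. FRAC_BITS, the function names and the
# round-robin partition are assumptions, not claim language or Henry's design.

FRAC_BITS = 8  # assumed number of fractional bits in the fixed point format


def to_fixed(value, frac_bits=FRAC_BITS):
    """Convert a floating point value to a fixed point integer."""
    return int(round(value * (1 << frac_bits)))


def partition(distribution_block, num_circuits):
    """Split the rows of the distribution block into one basic block per circuit."""
    basic_blocks = [[] for _ in range(num_circuits)]
    for index, row in enumerate(distribution_block):
        basic_blocks[index % num_circuits].append(row)
    return basic_blocks


def prepare(matrix, vector, num_circuits):
    """Return, per basic circuit, its basic data block and the shared broadcast block."""
    distribution = [[to_fixed(v) for v in row] for row in matrix]  # distributed
    broadcast = [to_fixed(v) for v in vector]                      # sent to all
    return [(block, broadcast) for block in partition(distribution, num_circuits)]
```

In this sketch every basic circuit receives a different basic data block but the same broadcasting data block, which is the distinction the claim draws between distributing and broadcasting.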

As per Claim 4, the rejection of Claim 3 is incorporated and Henry further discloses: wherein the basic processing circuits are configured to: perform inner product computations on the basic data blocks and the broadcasting data block in the fixed point data type to obtain the plurality of computation results.  (Par [0191], “convolution kernel” and [0195], “As the NNU program performs the convolution, it writes back the resulting matrix to the weight RAM 124.” The convolution being the “inner product computations” as claimed.  And Par [0236-0237], “… Full Precision Fixed-Point Accumulation” “The floating-point unit automatically takes care of multiplying the mantissa, adding the exponents, and then normalizing the result back to a value of 0.8991.times.10.sup.59” and Par [0252], “The activation function 2934 specifies the function applied to the accumulator 202 value 217 to generate the output 133 of the NPU 126. As described above and below in more detail, the activation functions 2934 include, but are not limited to: sigmoid; hyperbolic tangent; soft plus; rectify; divide by specified power of two; multiply by a user-specified reciprocal value to accomplish an effective division; pass-through full accumulator; and pass-through the accumulator as a canonical size” and par [0307], “The weight RAM buffer 3524 is coupled between the weight RAM 124 and media registers 118 for buffering transfers of data between them”.)

As per Claim 5, the rejection of Claim 4 is incorporated and Henry further discloses: wherein the main processing circuit is further configured to: convert the plurality of computation results to the floating point data type; (Par [0236-0237], “… Full Precision Fixed-Point Accumulation” “The floating-point unit automatically takes care of multiplying the mantissa, adding the exponents, and then normalizing the result back to a value of 0.8991.times.10.sup.59” and Par [0244], “the fixed-point hardware assist performs the necessary scaling and saturating to convert the full-precision accumulated value to an output value using the user-specified indications of the number of fractional bits of the accumulated value and the desired number of fractional bits in the output value” and par [0245]).
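
The fixed point inner product of Claim 4 and the conversion back to floating point of Claim 5 can be sketched as below. This is a hedged illustration under stated assumptions: the 8-bit fractional width and both function names are hypothetical, and the code does not reproduce Henry's hardware.

```python
# Illustrative sketch only. FRAC_BITS and the function names are assumptions.

FRAC_BITS = 8  # assumed fractional bits of each fixed point operand


def fixed_inner_product(a, b):
    """Inner product of two fixed point vectors, accumulated at full precision.

    Each product of two operands with FRAC_BITS fractional bits carries
    2 * FRAC_BITS fractional bits, and so does the accumulated sum.
    """
    return sum(x * y for x, y in zip(a, b))


def accumulator_to_float(acc, operand_frac_bits=FRAC_BITS):
    """Convert the full-precision accumulator back to a floating point value."""
    return acc / (1 << (2 * operand_frac_bits))


# 1.5 and 0.5 times 2.0 and 1.0, all with 8 fractional bits
a = [384, 128]
b = [512, 256]
result = accumulator_to_float(fixed_inner_product(a, b))  # 1.5*2.0 + 0.5*1.0
```

The point of the sketch is that the number of fractional bits of the accumulated value differs from that of the operands, so the conversion needs to know both.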

As per Claim 6, the rejection of Claim 3 is incorporated and Henry further discloses:
wherein the main processing circuit is configured to broadcast the broadcasting data block as a whole to the plurality of basic processing circuits. (Par [0195], “As the NNU program performs the convolution, it writes back the resulting matrix to the weight RAM 124.”)

As per Claim 7, the rejection of Claim 3 is incorporated and Henry further discloses:
wherein the basic processing circuits are configured to accumulate results of the inner product computations to obtain the computation results. (Par [0191], “convolution kernel” and [0195], “As the NNU program performs the convolution, it writes back the resulting matrix to the weight RAM 124.” The convolution being the “inner product computations” as claimed.  And Par [0236-0237], “… Full Precision Fixed-Point Accumulation” “The floating-point unit automatically takes care of multiplying the mantissa, adding the exponents, and then normalizing the result back to a value of 0.8991.times.10.sup.59”).

As per Claim 8, the rejection of Claim 4 is incorporated and Henry further discloses: wherein the computation result transferred to the main processing circuit by each basic processing circuit includes a plurality of inner product results, and the main processing circuit is configured to: accumulate the plurality of inner product results to obtain an accumulation result corresponding to each basic processing circuit; (Par [0191], “convolution kernel” and [0195], “As the NNU program performs the convolution, it writes back the resulting matrix to the weight RAM 124.” The convolution being the “inner product computations” as claimed.  And Par [0236-0237], “… Full Precision Fixed-Point Accumulation” “The floating-point unit automatically takes care of multiplying the mantissa, adding the exponents, and then normalizing the result back to a value of 0.8991.times.10.sup.59”).
and sort the accumulation results corresponding to the plurality of basic processing circuits to obtain an instruction result of the computation instruction. (Par [0088], “The results 133 of all of the N NPUs 126 may be written back concurrently to either the data RAM 122 or to the weight RAM 124. Preferably, the AFU 212 is configured to perform multiple activation functions…” and par [0194-0195], “convolve a data matrix 2406 of a chunk of the data array 2404, the NPUs 126 repeatedly read, in order, the nine rows of the data RAM 122 that hold the convolution kernel 2042” and “As the NNU program performs the convolution, it writes back the resulting matrix to the weight RAM 124.”)

As per Claim 9, the rejection of Claim 3 is incorporated and Henry further discloses:
wherein the main processing circuit is further configured to: divide the broadcasting data block into a plurality of partial broadcasting data blocks; and (Par [0214], “Alternatively, rather than specifying a pass through activation function, a divide activation function is specified that divides the accumulator 202 value 217 by a divisor, such as described herein, e.g., with respect to FIGS. 29A and 30, e.g., using one of the "dividers" 3014/3016 of FIG. 30.”) sequentially broadcast the plurality of partial broadcasting data blocks to the plurality of basic processing circuits. (Par [0195], “FIG. 25, the architectural program writes the weight RAM 124 with the values of a data matrix 2406. As the NNU program performs the convolution, it writes back the resulting matrix to the weight RAM 124.” The convolution being the “inner product computations” as claimed, goes back to the register, and par [0307-0308], “similarly, the data RAM buffer 3522 is coupled between the data RAM 122 and media registers 118 for buffering transfers of data between them. Preferably, the data RAM buffer 3522 is similar to one or more of the embodiments of the buffer 1704 of FIG. 17. Preferably, the portion of the data RAM buffer 3522…”).

As per Claim 10, the rejection of Claim 1 is incorporated and Henry further discloses: wherein the basic processing circuits are configured to, for each partial broadcasting data block: perform inner product computations on the partial broadcasting data block and the corresponding basic data blocks in the fixed point data type to obtain a plurality of inner product results; (Par [0195], “FIG. 25, the architectural program writes the weight RAM 124 with the values of a data matrix 2406. As the NNU program performs the convolution, it writes back the resulting matrix to the weight RAM 124.” The convolution being the “inner product computations” as claimed.) accumulate the inner product results to obtain a plurality of partial computation results, (Par [0236-0237], “… Full Precision Fixed-Point Accumulation” “The floating-point unit automatically takes care of multiplying the mantissa, adding the exponents, and then normalizing the result back to a value of 0.8991.times.10.sup.59”)
and transfer the partial computation results to the main processing circuit. (Par [0307-0308], “similarly, the data RAM buffer 3522 is coupled between the data RAM 122 and media registers 118 for buffering transfers of data between them. Preferably, the data RAM buffer 3522 is similar to one or more of the embodiments of the buffer 1704 of FIG. 17. Preferably, the portion of the data RAM buffer 3522…”).
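
The partial broadcasting recited in Claims 9 and 10 can be pictured with the following minimal Python sketch. It is illustrative only: the chunk size, the function names, and the choice to sum the partial results in the caller are assumptions, not claim language and not a description of Henry.

```python
# Illustrative sketch only. Chunk size and function names are assumptions.


def split_broadcast(broadcast_block, chunk):
    """Divide the broadcasting data block into partial broadcasting data blocks."""
    return [broadcast_block[i:i + chunk] for i in range(0, len(broadcast_block), chunk)]


def partial_results(basic_row, broadcast_block, chunk):
    """One basic circuit: an inner product per partial block, in broadcast order."""
    results = []
    offset = 0
    for part in split_broadcast(broadcast_block, chunk):
        segment = basic_row[offset:offset + len(part)]  # matching slice of the row
        results.append(sum(x * y for x, y in zip(segment, part)))
        offset += len(part)
    return results


def combine(results):
    """Main circuit: accumulate the partial computation results."""
    return sum(results)
```

Summing the partial results gives the same value as a single inner product over the undivided broadcasting data block, so splitting changes only how the data is delivered.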

As per Claim 11, the rejection of Claim 1 is incorporated and Henry further discloses: wherein the main processing circuit includes a main register or a main on-chip caching circuit, and the basic processing circuit includes a basic register or a basic on-chip caching circuit. (Par [0004-0005], and par [0014], “In the embodiment of FIG. 11, a neuron is split into two portions, the activation function unit portion and the ALU portion (which also includes the shift register portion), and each activation function unit portion is shared by multiple ALU portions.” And par [0030], “control register” and See Figures 1-3 and 11) 

As per Claim 12, the rejection of Claim 10 is incorporated and Henry further discloses: wherein the main processing circuit includes one or more of a vector computing unit circuit, an arithmetic and logic unit circuit, an accumulator circuit, a matrix transposition circuit, a direct memory access circuit, a data type conversion circuit, or a data rearrangement circuit. (Par [0090], “In one embodiment, the NPU 126 is pipelined. For example, the NPU 126 may include registers of the ALU 204, such as a register between the multiplier and the adder and/or other circuits of the ALU 204, and a register that holds the output of the AFU 212.” And par [0238], “…flow proceeds to block units typically require logic to perform rounding of floating-point results, logic to convert between integer and floating-point formats or between different floating-point precision formats (e.g., extended precision, double precision, single precision, half precision), leading zero and leading one detectors, and logic to deal with special floating-point numbers, such as denormal numbers, NANs and infinity.”).

As per Claim 13, the rejection of Claim 10 is incorporated and Henry further discloses: wherein the main processing circuit is configured to: obtain a data block to be computed and a computation instruction; (Par [0111], “At block 602, the processor 100, i.e., the architectural program running on the processor 100, writes the input values to the current hidden layer”) divide the data block into a distribution data block and a broadcasting data block according to the computation instruction; partition the distribution data block to obtain a plurality of basic data blocks; (Par [0252], “The activation function 2934 specifies the function applied to the accumulator 202 value 217 to generate the output 133 of the NPU 126. As described above and below in more detail, the activation functions 2934 include, but are not limited to: sigmoid; hyperbolic tangent; soft plus; rectify; divide by specified power of two; multiply by a user-specified reciprocal value to accomplish an effective division; pass-through full accumulator; and pass-through the accumulator as a canonical size” and par [0307], “The weight RAM buffer 3524 is coupled between the weight RAM 124 and media registers 118 for buffering transfers of data between them”.) distribute the plurality of basic data blocks to the plurality of basic processing circuits; (Par [0252] and par [0304], “The clock reduction logic 3504 is similar in many respects to the clock generation logic 3502 in that it includes a clock distribution network, or clock tree, that distributes the secondary clock signal to various blocks of the NNU 121”) and broadcast the broadcasting data block to the plurality of basic processing circuits. (Par [0252], “activation function is specified by the initiate instruction and applied in response to an output instruction, e.g., write AFU output instruction at address 4 of FIG. 4, in which embodiment the activation function instruction at address 3 of FIG. 4 is subsumed by the output instruction.” And see par [0445], “FIG. 49 are replaced by a single shared AFU 1112 that receives the four outputs 217 of the four accumulators 202 and generates four outputs to OUTBUF[0], OUTBUF[1], OUTBUF[2], and OUTBUF[3]. The NNU 121 of FIG. 52”).

As per Claim 14, the rejection of Claim 13 is incorporated and Henry further discloses: wherein the basic processing circuits are configured to: convert the basic data blocks and the broadcasting data block into data blocks of the fixed point data type; (Par [0071], “The rename unit 106 allocates, in program order”, Par [0191], “convolution kernel” and [0195], “As the NNU program performs the convolution, it writes back the resulting matrix to the weight RAM 124.” The convolution being the “inner product computations” as claimed.  And Par [0236-0237], “… Full Precision Fixed-Point Accumulation” “The floating-point unit automatically takes care of multiplying the mantissa, adding the exponents, and then normalizing the result back to a value of 0.8991.times.10.sup.59”) perform inner product computations between the basic data blocks and the broadcasting data block in the fixed point data type to obtain fixed point computation results; (Par [0195], Par [0236-0237], “… Full Precision Fixed-Point Accumulation”, par [0252], “The activation function 2934 specifies the function applied to the accumulator 202 value 217 to generate the output 133 of the NPU 126...” and par [0307], “The weight RAM buffer 3524 is coupled between the weight RAM 124 and media registers 118 for buffering transfers of data between them”.)
convert the computation results from the fixed point data type to the floating point data type; (Par [0236-0237], “… Full Precision Fixed-Point Accumulation” “The floating-point unit automatically takes care of multiplying the mantissa, adding the exponents, and then normalizing the result back to a value of 0.8991.times.10.sup.59” and Par [0244], “the fixed-point hardware assist performs the necessary scaling and saturating to convert the full-precision accumulated value to an output value using the user-specified indications of the number of fractional bits of the accumulated value and the desired number of fractional bits in the output value” and par [0245]).
and transfer the computation results in the floating point data type to the main processing circuit, (par [0308], transfer data between the RAM buffers) wherein the main processing circuit is configured to process the computation results to obtain an instruction result of the data block to be computed and the computation instruction. (Par [0094], “For each instruction of the program, all of the NPUs 126 perform the instruction in parallel.” And see par [0463], which also includes the specifics of the “parallel fashion” between the input and computed output of all the neurons, performing recurrent neural network computations.)


As per Claim 15, the rejection of Claim 2 is incorporated and Henry further discloses: wherein the integrated circuit chip apparatus includes a plurality of the branch processing circuits, wherein the main processing circuit are connected to the plurality of branch processing circuits respectively, and each branch processing circuit is connected to at least one basic processing circuit. (par [0068], “such as a branch target address, return address or exception vector” and par [0075], “The execution units 112 may also include integer units, media units, floating-point units and a branch unit.” And par [0342], doing a “feed forward”).

As per Claim 16, the rejection of Claim 1 is incorporated and Henry further discloses: wherein the data is one or more of a vector, a matrix, or an n-dimensional data block, where n is an integer larger than 2. (Par [0038], “…input vector or matrix to produce the output vector or matrix” and par [0061], “32-bit integer or smaller”)

As per Claim 17, the rejection of Claim 3 is incorporated and Henry further discloses: wherein the computation instruction is multiplication instruction, wherein the main processing circuit determines a multiplier data block as the broadcasting data block and a multiplicand data block as the distribution data block. (Par [0087], “Although FIG. 2 shows only a multiplier 242 and adder 244 in the ALU 204, preferably the ALU 204 includes other elements to perform the other operations described above. For example, preferably the ALU 204 includes a comparator (not shown) for comparing the accumulator 202 with a data/weight word and a mux (not shown) that selects the larger (maximum) of the two values indicated by the comparator for storage in the accumulator 202.” And par [0293], “... the ALU 204 need not include adders that would be needed in a floating-point implementation to add the exponents of the multiplicands for the multiplier 242”)

As per Claim 18, the rejection of Claim 3 is incorporated and Henry further discloses: wherein the computation instruction is a convolution instruction, wherein the main processing circuit determines an input data block as the broadcasting data block, and a convolution kernel as the distribution data block.  (Par [0191], “convolution kernel” and [0195], “As the NNU program performs the convolution, it writes back the resulting matrix to the weight RAM 124.” The convolution being the “inner product computations” as claimed.  And Par [0236-0237], “… Full Precision Fixed-Point Accumulation” “The floating-point unit automatically takes care of multiplying the mantissa, adding the exponents, and then normalizing the result back to a value of 0.8991.times.10.sup.59” and Par [0252], “The activation function 2934 specifies the function applied to the accumulator 202 value 217 to generate the output 133 of the NPU 126. As described above and below in more detail, the activation functions 2934 include, but are not limited to: sigmoid; hyperbolic tangent; soft plus; rectify; divide by specified power of two; multiply by a user-specified reciprocal value to accomplish an effective division; pass-through full accumulator; and pass-through the accumulator as a canonical size” and par [0307], “The weight RAM buffer 3524 is coupled between the weight RAM 124 and media registers 118 for buffering transfers of data between them”).

Conclusion
10.	The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure.
	Henry et al (US 20170102921) relates to an APPARATUS EMPLOYING USER-SPECIFIED BINARY POINT FIXED POINT ARITHMETIC. Specifically, embodiments are described in which the ALUs are integer units, but the activation function units include fixed-point arithmetic hardware assist, or acceleration. This enables the ALU portions to be smaller and faster, which facilitates having more ALUs within a given space on the die. This implies more neurons per die space, which is particularly advantageous in a neural network unit.
11.	Any inquiry concerning this communication or earlier communications from the examiner should be directed to ANGELICA RUIZ whose telephone number is (571)270-3158. The examiner can normally be reached M-F 10:00 am to 6:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre M Vital, can be reached at (571) 272-4215. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/ANGELICA RUIZ/Primary Examiner, Art Unit 2162