DETAILED ACTION
Notice of Pre-AIA or AIA Status
1.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

2.	Claims 1-20 are pending.

Information Disclosure Statement
3.	The information disclosure statements (IDSs) submitted on 9/2/2020, 6/23/2021, and 2/14/2022 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.

Drawings
4.	The drawings have been reviewed and are accepted as being in compliance with the provisions of 37 CFR 1.121.

Priority
5.	Acknowledgment is made of applicant’s claim for foreign priority under 35 U.S.C. 119(a)-(d). The certified copy was filed on 3/4/2020.

Double Patenting
6.	The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the claims at issue are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); and In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on a nonstatutory double patenting ground provided the reference application or patent either is shown to be commonly owned with this application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP §§ 706.02(l)(1) - 706.02(l)(3) for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/forms/. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to http://www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.

7.	Claims 1-20 are provisionally rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-20 of copending Application No. 16/721875. Although the claims at issue are not identical, they are not patentably distinct from each other, as shown by the claim comparison below.
This is a provisional nonstatutory double patenting rejection because the patentably indistinct claims have not in fact been patented.

Instant Application No. 17/010761, Claim 1:
1. An integrated circuit chip apparatus, comprising: a main processing circuit and a plurality of basic processing circuits, wherein the main processing circuit comprises a data type conversion circuit configured to convert data between a floating point data type and a fixed point data type: wherein the main processing circuit is configured to:
receive an input data block, a weight data block, and a multiplication instruction;
convert the input data block and the weight data block to an input data block of the fixed point type and a weight data block of the fixed point type, respectively, using the data type conversion circuit;
designate the input data block of the fixed point type as a distribution data block and the weight data block of the fixed point type as a broadcasting data block according to the multiplication instruction;
partition the distribution data block to obtain a plurality of basic data blocks;
distribute the plurality of basic data blocks to at least one of the plurality of basic processing circuits; and
broadcast the broadcasting data block to the plurality of basic processing circuits;
wherein the at least one of the plurality of basic processing circuits is configured to perform computations on the broadcasting data block and the received basic data blocks according to the fixed point type to obtain computation results, and transfer the computation results to the main processing circuit;
wherein the main processing circuit is configured to process the computation results to obtain an instruction result of the multiplication instruction.

Copending Application No. 16/721875, Claim 1:
An integrated circuit chip apparatus comprising: a main processing circuit and a plurality of basic processing circuits, wherein the main processing circuit or at least one of the plurality of basic processing circuits includes a data type conversion circuit configured to convert data between a floating point data type and a fixed point data type, the plurality of basic processing circuits are configured to perform a first set of neural network computations in parallel on data transferred by the main processing circuit, and transfer a plurality of computation results to the main processing circuit, and the main processing circuit is configured to perform a second set of neural network computations in series on the plurality of computation results.

As per Claim 4, wherein the basic processing circuits are configured to: perform inner product computations on the basic data blocks and the broadcasting data block in the fixed point data type to obtain the plurality of computation results.  

As per Claim 6, wherein the main processing circuit is configured to broadcast the broadcasting data block as a whole to the plurality of basic processing circuits.


As per Claim 17, wherein the computation instruction is multiplication instruction, wherein the main processing circuit determines a multiplier data block as the broadcasting data block and a multiplicand data block as the distribution data block.


Claim Rejections - 35 USC § 102
8.	The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
 (a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

9.	Claims 1-9 and 11-20 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Henry et al. (US 20170103305), hereinafter “Henry”.
As per Claim 1, Henry discloses:
An integrated circuit chip apparatus, comprising: a main processing circuit and a plurality of basic processing circuits, (Par [0067], “An IC is also referred to as a chip, a microchip, or a die” and par [0084], “The NPU 126 includes a register 205, a 2-input multiplexed register (mux-reg) 208, an arithmetic logic unit (ALU) 204, an accumulator 202, and an activation function unit (AFU) 212…” The IC is the main processing circuit, and the mux-regs included in the NNU (neural network unit, which performs the multiply-accumulate function) are the basic processing circuits.)
wherein the main processing circuit comprises a data type conversion circuit configured to convert data between a floating point data type and a fixed point data type: (Par [0090], “In one embodiment, the NPU 126 is pipelined. For example, the NPU 126 may include registers of the ALU 204, such as a register between the multiplier and the adder and/or other circuits of the ALU 204, and a register that holds the output of the AFU 212.” And par [0238], “…flow proceeds to block units typically require logic to perform rounding of floating-point results, logic to convert between integer and floating-point formats or between different floating-point precision formats (e.g., extended precision, double precision, single precision, half precision), leading zero and leading one detectors, and logic to deal with special floating-point numbers, such as denormal numbers, NANs and infinity.” “NPU” is the neural processing unit, “ALU” is the arithmetic logic unit, and “AFU” is the activation function unit.)
wherein the main processing circuit is configured to: receive an input data block, a weight data block, and a multiplication instruction; (See Figure 7, comprising data and weight RAMs, followed by an ALU, which includes a multiplier and an adder.)
convert the input data block and the weight data block to an input data block of the fixed point type and a weight data block of the fixed point type, respectively, using the data type conversion circuit; (Par [0238], “Additionally, flow proceeds to block units typically require logic to perform rounding of floating-point results, logic to convert between integer and floating-point formats or between different floating-point precision formats (e.g., extended precision, double precision, single precision, half precision), leading zero and leading one detectors, and logic to deal with special floating-point numbers, such as denormal numbers, NANs and infinity.”)
designate the input data block of the fixed point type as a distribution data block and the weight data block of the fixed point type as a broadcasting data block according to the multiplication instruction; (Par [0173-176], “weight of words received from the weight RAM 124”, various designations; see also Figures 19-20.)
partition the distribution data block to obtain a plurality of basic data blocks; (Par [0252] and par [0304], “The clock reduction logic 3504 is similar in many respects to the clock generation logic 3502 in that it includes a clock distribution network, or clock tree, that distributes the secondary clock signal to various blocks of the NNU 121”)
distribute the plurality of basic data blocks to at least one of the plurality of basic processing circuits; (Par [0252] and par [0304], “The clock reduction logic 3504 is similar in many respects to the clock generation logic 3502 in that it includes a clock distribution network, or clock tree, that distributes the secondary clock signal to various blocks of the NNU 121”)
and broadcast the broadcasting data block to the plurality of basic processing circuits; (Par [0252], “activation function is specified by the initiate instruction and applied in response to an output instruction, e.g., write AFU output instruction at address 4 of FIG. 4, in which embodiment the activation function instruction at address 3 of FIG. 4 is subsumed by the output instruction.” And see par [0445], “FIG. 49 are replaced by a single shared AFU 1112 that receives the four outputs 217 of the four accumulators 202 and generates four outputs to OUTBUF[0], OUTBUF[1], OUTBUF[2], and OUTBUF[3]. The NNU 121 of FIG. 52”)
wherein the at least one of the plurality of basic processing circuits is configured to perform computations on the broadcasting data block and the received basic data blocks according to the fixed point type to obtain computation results, (Par [02439], “fixed-point numbers are represented with an indication of the number of bits of storage that are fractional bits for an entire set of numbers, however, the indication is located in a single, shared storage that globally indicates the number of fractional bits for all the numbers of the entire set, e.g., a set of inputs to a series of operations, a set of accumulated values of the series, a set of outputs..” There is a plurality of NPUs that perform the series of computations; see also par [0245].)
and transfer the computation results to the main processing circuit; (Par [0293], “... the ALU 204 need not include adders that would be needed in a floating-point implementation to add the exponents of the multiplicands for the multiplier 242”)
wherein the main processing circuit is configured to process the computation results to obtain an instruction result of the multiplication instruction. (Par [0085], “The ALU 204 performs arithmetic and/or logical operations on its inputs to generate a result provided on its output.”).
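For illustration only, the data flow recited in Claim 1 may be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from the application or from Henry: the function names, the 8 fractional bits, the row-wise partitioning, and the final rescaling step are all hypothetical choices made to give the claimed steps (convert, designate, partition, distribute, broadcast, compute, process) a concrete form.

```python
FRAC_BITS = 8  # assumed number of fractional bits; the claim does not specify one


def to_fixed(x, frac_bits=FRAC_BITS):
    """Convert a float to a fixed-point integer (round to nearest)."""
    return int(round(x * (1 << frac_bits)))


def to_float(q, frac_bits=FRAC_BITS):
    """Convert a fixed-point integer back to a float."""
    return q / (1 << frac_bits)


def basic_circuit(basic_block, broadcast_block):
    """Hypothetical basic processing circuit: one integer inner product per row."""
    return [sum(a * b for a, b in zip(row, broadcast_block)) for row in basic_block]


def main_circuit(input_block, weight_block, num_basic=2):
    """Hypothetical main processing circuit for a matrix-times-vector instruction."""
    # Convert both blocks to the fixed point type, then designate the input as
    # the distribution data block and the weights as the broadcasting data block.
    distribution = [[to_fixed(v) for v in row] for row in input_block]
    broadcasting = [to_fixed(v) for v in weight_block]

    # Partition the distribution data block into basic data blocks
    # (assumption: contiguous groups of rows).
    size = max(1, -(-len(distribution) // num_basic))  # ceiling division
    basic_blocks = [distribution[i:i + size] for i in range(0, len(distribution), size)]

    # Distribute one basic data block per circuit; every circuit receives the
    # same broadcasting data block.
    results = [basic_circuit(block, broadcasting) for block in basic_blocks]

    # Process the returned results: a product of two fixed-point values carries
    # 2 * FRAC_BITS fractional bits, so rescale accordingly.
    return [to_float(p, 2 * FRAC_BITS) for block in results for p in block]
```

With a 3x2 input block `[[1.0, 2.0], [0.5, 0.25], [3.0, -1.0]]` and a weight block `[2.0, 4.0]`, the sketch returns the same values as an ordinary floating-point matrix-vector product, because every operand is exactly representable with 8 fractional bits.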

As per Claim 2, the rejection of Claim 1 is incorporated and Henry further discloses: wherein the input data block is a vector or a matrix; (Par [0038], “…input vector or matrix to produce the output vector or matrix” and par [0061], “32-bit integer or smaller”)

As per Claim 3, the rejection of Claim 1 is incorporated and Henry further discloses: wherein the weight data block is a vector or a matrix. (Par [0038], “…input vector or matrix to produce the output vector or matrix” and par [0061], “32-bit integer or smaller”)

As per Claim 4, the rejection of Claim 1 is incorporated and Henry further discloses: wherein: the at least one of the plurality of basic processing circuits is configured to perform multiplication on the broadcasting data block and the received basic data blocks according to the fixed point type to obtain products of the fixed point type, (Par [0191], “convolution kernel” and [0195], “As the NNU program performs the convolution, it writes back the resulting matrix to the weight RAM 124.” The convolution constitutes the “inner product computations”.) and transfer the products as computation results to the main processing circuit, the main processing circuit is configured to convert the products of the fixed point type to products of the floating point type using the data type conversion circuit, (Par [0090], “In one embodiment, the NPU 126 is pipelined. For example, the NPU 126 may include registers of the ALU 204, such as a register between the multiplier and the adder and/or other circuits of the ALU 204, and a register that holds the output of the AFU 212.” And par [0238], “…flow proceeds to block units typically require logic to perform rounding of floating-point results, logic to convert between integer and floating-point formats or between different floating-point precision formats (e.g., extended precision, double precision, single precision, half precision), leading zero and leading one detectors, and logic to deal with special floating-point numbers, such as denormal numbers, NANs and infinity.” “NPU” is the neural processing unit, “ALU” is the arithmetic logic unit, and “AFU” is the activation function unit.) accumulate the products of the floating point type to obtain accumulation results, and sort the accumulation results to obtain the instruction result. (Par [0236-0237], “… Full Precision Fixed-Point Accumulation” “The floating-point unit automatically takes care of multiplying the mantissa, adding the exponents, and then normalizing the result back to a value of 0.8991 × 10^59” and par [0252], “The activation function 2934 specifies the function applied to the accumulator 202 value 217 to generate the output 133 of the NPU 126. As described above and below in more detail, the activation functions 2934 include, but are not limited to: sigmoid; hyperbolic tangent; soft plus; rectify; divide by specified power of two; multiply by a user-specified reciprocal value to accomplish an effective division; pass-through full accumulator; and pass-through the accumulator as a canonical size” and par [0307], “The weight RAM buffer 3524 is coupled between the weight RAM 124 and media registers 118 for buffering transfers of data between them”).

As per Claim 5, the rejection of Claim 1 is incorporated and Henry further discloses: wherein: the at least one of the plurality of basic processing circuits is configured to perform inner product computations on the broadcasting data block and the received basic data blocks according to the fixed point type to obtain inner products of the fixed point type, and transfer the inner products as computation results to the main processing circuit, (Par [0191], “convolution kernel” and [0195], “As the NNU program performs the convolution, it writes back the resulting matrix to the weight RAM 124.” The convolution constitutes the “inner product computations” as claimed. And par [0236-0237], “… Full Precision Fixed-Point Accumulation” “The floating-point unit automatically takes care of multiplying the mantissa, adding the exponents, and then normalizing the result back to a value of 0.8991 × 10^59” and par [0252], “The activation function 2934 specifies the function applied to the accumulator 202 value 217 to generate the output 133 of the NPU 126. As described above and below in more detail, the activation functions 2934 include, but are not limited to: sigmoid; hyperbolic tangent; soft plus; rectify; divide by specified power of two; multiply by a user-specified reciprocal value to accomplish an effective division; pass-through full accumulator; and pass-through the accumulator as a canonical size” and par [0307], “The weight RAM buffer 3524 is coupled between the weight RAM 124 and media registers 118 for buffering transfers of data between them”.) the main processing circuit is configured to convert the inner products of the fixed point type to inner products of the floating point type using the data type conversion circuit, and sort the inner products to obtain the instruction result. (Par [0088], “The results 133 of all of the N NPUs 126 may be written back concurrently to either the data RAM 122 or to the weight RAM 124. Preferably, the AFU 212 is configured to perform multiple activation functions…” and par [0194-0195], “convolve a data matrix 2406 of a chunk of the data array 2404, the NPUs 126 repeatedly read, in order, the nine rows of the data RAM 122 that hold the convolution kernel 2042” and “As the NNU program performs the convolution, it writes back the resulting matrix to the weight RAM 124.”)

As per Claim 6, the rejection of Claim 1 is incorporated and Henry further discloses: further comprising a branch processing circuit, wherein the branch processing circuit is located between the main processing circuit and at least one basic processing circuit, wherein the branch processing circuit is configured to forward data between the main processing circuit and at least one basic processing circuit. (Par [0068], “such as a branch target address, return address or exception vector” and par [0075], “The execution units 112 may also include integer units, media units, floating-point units and a branch unit.” And par [0342], performing a “feed forward”. See the rejection of Claim 1 regarding the main and basic processing circuits.)

As per Claim 7, the rejection of Claim 1 is incorporated and Henry further discloses: wherein the main processing circuit is configured to broadcast the broadcasting data block as a whole to the plurality of basic processing circuits. (Par [0195], “As the NNU program performs the convolution, it writes back the resulting matrix to the weight RAM 124.”)

As per Claim 8, the rejection of Claim 1 is incorporated and Henry further discloses: wherein the main processing circuit is further configured to partition the broadcasting data block into a plurality of partial broadcasting data blocks, (Par [0214], “Alternatively, rather than specifying a pass through activation function, a divide activation function is specified that divides the accumulator 202 value 217 by a divisor, such as described herein, e.g., with respect to FIGS. 29A and 30, e.g., using one of the "dividers" 3014/3016 of FIG. 30.”) and sequentially broadcast the plurality of partial broadcasting data blocks to the plurality of basic processing circuits. (Par [0195], “FIG. 25, the architectural program writes the weight RAM 124 with the values of a data matrix 2406. As the NNU program performs the convolution, it writes back the resulting matrix to the weight RAM 124.” The convolution, being the “inner product computations” as claimed, goes back to the register. See also par [0307-308], “similarly, the data RAM buffer 3522 is coupled between the data RAM 122 and media registers 118 for buffering transfers of data between them. Preferably, the data RAM buffer 3522 is similar to one or more of the embodiments of the buffer 1704 of FIG. 17. Preferably, the portion of the data RAM buffer 3522…”).

As per Claim 9, the rejection of Claim 8 is incorporated and Henry further discloses: wherein the at least one of the plurality of basic processing circuits is configured to sequentially perform inner product processing on the partial broadcasting data blocks and the basic data blocks according to the fixed point type to obtain results of inner product processing, (Par [0195], “FIG. 25, the architectural program writes the weight RAM 124 with the values of a data matrix 2406. As the NNU program performs the convolution, it writes back the resulting matrix to the weight RAM 124.” The convolution constitutes the “inner product computations” as claimed.) and transfer the results of inner product processing to the main processing circuit. (Par [0307-308], “similarly, the data RAM buffer 3522 is coupled between the data RAM 122 and media registers 118 for buffering transfers of data between them. Preferably, the data RAM buffer 3522 is similar to one or more of the embodiments of the buffer 1704 of FIG. 17. Preferably, the portion of the data RAM buffer 3522…”).
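For illustration only, the sequential partial broadcasting recited in Claims 8 and 9 may be sketched as follows. This is a minimal sketch, not an implementation from the application or from Henry: the function names and the chunk size are hypothetical, the values are plain integers standing in for fixed-point data, and the choice to return each partial result separately (rather than accumulate it inside the basic processing circuit) is an assumption.

```python
def split_into_partial_blocks(broadcast_block, chunk):
    """Partition the broadcasting data block into partial broadcasting data blocks."""
    return [broadcast_block[i:i + chunk] for i in range(0, len(broadcast_block), chunk)]


def sequential_inner_products(basic_rows, broadcast_block, chunk=2):
    """Broadcast the partial blocks one at a time.

    For each partial block, every row of the basic data block is multiplied
    against the matching segment of that row. Returns, for each row, the list
    of partial inner products in broadcast order.
    """
    results = [[] for _ in basic_rows]
    offset = 0
    for partial in split_into_partial_blocks(broadcast_block, chunk):
        for r, row in enumerate(basic_rows):
            segment = row[offset:offset + len(partial)]
            results[r].append(sum(a * b for a, b in zip(segment, partial)))
        offset += len(partial)
    return results
```

Summing each row's partial results gives the same value as a single inner product over the whole broadcasting data block, which is one way the transferred results could be combined.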

As per Claim 11, the rejection of Claim 1 is incorporated and Henry further discloses: wherein the multiplication instruction is for performing a matrix-multiply-vector computation, and the main processing circuit is further configured to transfer data of at least one row of a matrix to a basic processing circuit at a time. (Par [0175], “Each mux-reg 208A receives its corresponding narrow data word 207A of one row of the D rows of the data RAM 122” and Par [0238], “some of the complexities of floating-point units include logic that performs exponent calculations associated with floating-point addition and multiplication/division (adders to add/subtract exponents of operands to produce resulting exponent value for floating-point multiplication/division, subtracters to determine subtract exponents of operands to determine binary point alignment shift amounts for floating-point addition), shifters that accomplish binary point alignment of the mantissas for floating-point addition”).

As per Claims 12-20, these are the method and device claims corresponding to system Claims 1-4, respectively, and are rejected for the same reasons set forth in connection with the rejections of Claims 1-4. Henry further discloses: (Par [0238]).

Allowable Subject Matter
10.	Claim 10 is objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Conclusion
11.	The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure.
	Henry et al. (US 20170102921), titled “APPARATUS EMPLOYING USER-SPECIFIED BINARY POINT FIXED POINT ARITHMETIC,” describes embodiments in which the ALUs are integer units, but the activation function units include fixed-point arithmetic hardware assist, or acceleration. This enables the ALU portions to be smaller and faster, which facilitates having more ALUs within a given space on the die. This implies more neurons per die space, which is particularly advantageous in a neural network unit.
12.	Any inquiry concerning this communication or earlier communications from the examiner should be directed to ANGELICA RUIZ whose telephone number is (571)270-3158. The examiner can normally be reached M-F 10:00 am to 6:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre M. Vital, can be reached at (571) 272-4215. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/ANGELICA RUIZ/
Primary Examiner, Art Unit 2162
July 30, 2022