DETAILED ACTION 
This non-final office action is responsive to the application filed 22 March 2018.
Claims 1-20 are presently pending.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 04 February 2020 is in compliance with the provisions of 37 CFR 1.97, except where lined through.  Accordingly, the information disclosure statement is being considered by the examiner, except for the lined through references.
The lined through reference fails to comply with 37 CFR 1.98(a)(3)(i) because it does not include a concise explanation of the relevance, as it is presently understood by the individual designated in 37 CFR 1.56(c) most knowledgeable about the content of the information, of each reference listed that is not in the English language.  It has been placed in the application file, but the information referred to therein has not been considered.

Claim Rejections - 35 USC § 102
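In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.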
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1, 3-5, 8-11, 13-15, and 18-20 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Falcon et al. (US 2016/0026912) (“Falcon”).
Regarding claim 1, Falcon teaches a micro-processor circuit, adapted to perform a neural network operation (Falcon, P[0083], “FIG. 10 illustrates a more detailed embodiment for implementing an example neural network, in accordance with embodiments of the present disclosure. In one embodiment, example CNN [convolutional neural network] 900 using a weight-shifting mechanism for CNNs may be implemented using a processing device 1000.” Falcon, P[0128], “a processing system may include any system that has a processor, such as, for example, a digital signal processor (DSP), a microcontroller, an application specific integrated circuit (ASIC), or a microprocessor.”), and comprising: 
a parameter generation module, receiving in parallel a plurality of input parameters and a plurality of weight parameters of the neural network operation (Falcon, PP[0085, 0088] and FIG. 11, “Execution cluster 1114 [a parameter generation module] may include a number of calculation circuits 1118, distribution logics 1116, 1122, and delay elements 1120. Distribution logic 1116 may include multiplexers to transmit x_i [a plurality of input parameters] to inputs of different calculation circuits 1118. Besides input signal x_i, distribution logic 1116 may also assign weight coefficients w_i, 1, …, N [a plurality of weight parameters] to different calculation circuits. … each of calculation circuits 1118 may accept sixteen input values in parallel to achieve modular and efficient computation.” Because FIG. 11 shows calculation circuits accepting inputs and weights together, it follows that weights may also be accepted in parallel.), and 
generating in parallel a plurality of sub-output parameters according to the input parameters and the weight parameters (Falcon, PP[0115-0126], “FIG. 14 is a flowchart of an example embodiment of a method 1400 for weight-shifting, in accordance with embodiments of the present disclosure. Method 1400 may illustrate operations performed by, for example, CNN 900, processing device 1000, or calculation circuit 1200. … At 1440, the scaled weights may be used to determine suitable calculations, such as convolution or dot-product, on the input [generating a plurality of sub-output parameters according to the input parameters and the weight parameters]. The previous results may also be used, if available. … Furthermore, method 1400 may be performed fully or in part in parallel with each other.”); 
a compute module, coupled to the parameter generation module, receiving in parallel the sub-output parameters, and summing the sub-output parameters to generate a summed parameter (Falcon, PP[0089, 0099], “FIG. 12 illustrates an example embodiment of a calculation circuit 1200 that may be used to implement fully or in part calculation circuit 1118. … FIG. 13A is a more detailed illustration of MAC unit 1210. Given N input values from input latches 1302, which in turn may come from input data 1202 and weights 1204, elements of input data 1202 and weights 1204 are multiplied pair-wise at 1304 and then added together in accumulators 1306.” Falcon, P[0081], “the multiplication and sum operations may be implemented in parallel on multi-core CPU or GPU,”); and
a truncation logic, coupled to the compute module, receiving the summed parameter, and performing a truncation operation based on the summed parameter to generate an output parameter of the neural network operation (Falcon, PP[0124, 0126], “At 1460, in another embodiment the results may be truncated. For example, the upper integer bits and lower fractional bits may be truncated according to an expected output format. At 1465, the result may be output as the determined calculated value associated with the layer. … Furthermore, method 1400 may be performed fully or in part in parallel with each other.”).
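For illustration only, the data path mapped above (pair-wise multiplication, summation, then truncation to the expected output format) may be sketched in software as follows. This is a minimal sketch and not Falcon's circuit: the function name mac_truncate, the use of signed integers to stand for 1.7 fixed-point values, and the saturation of out-of-range results (rather than simply discarding the upper integer bits) are assumptions made for the example.

```python
def mac_truncate(inputs, weights, frac_bits=7):
    """Multiply pair-wise, sum, and truncate back to a 1.frac_bits format.

    inputs, weights: signed integers read as fixed-point values with
    frac_bits fractional bits (e.g. 64 stands for 0.5 when frac_bits=7).
    """
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")

    # Sub-outputs: each product carries 2 * frac_bits fractional bits.
    products = [x * w for x, w in zip(inputs, weights)]

    # Summed parameter: accumulate all products.
    total = sum(products)

    # Truncation, lower part: drop the extra fractional bits
    # (arithmetic right shift, so the sign is preserved).
    shifted = total >> frac_bits

    # Truncation, upper part: ASSUMPTION for this sketch, clamp to the
    # representable range instead of discarding upper integer bits.
    lowest = -(1 << frac_bits)
    highest = (1 << frac_bits) - 1
    return max(lowest, min(highest, shifted))


# 0.5 * 0.5 + 0.25 * (-0.5) = 0.125, which is 16 in 1.7 format.
result = mac_truncate([64, 32], [64, -64])
```

In this sketch the output has the same width as each input (one sign bit and seven fractional bits), which matches the 1.7 input and output formats referenced in the mapping of claim 9.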

Regarding claim 3, Falcon teaches the micro-processor circuit as claimed in claim 1, wherein the parameter generation module encodes the weight parameters according to a value range of the weight parameters to generate a plurality of encoded weight parameters (Falcon, P[0094], “In one embodiment, for a given layer, the maximum and minimum values of weights 1204 may be determined. In another embodiment and based on such a determination, weights 1204 may be scaled up to meet a defined range. For example, if weights 1204 are given as positive and negative fractions less than one, then weights 1204 may be scaled up to the range (-1, 1).”), 
wherein the parameter generation module generates the sub-output parameters according to the input parameters and the encoded weight parameters (Falcon, FIG. 12 and PP[0093, 0097], “Weights 1204 may be stored in memory or storage of the processor until they are needed for use by calculation circuit 1200. Input data 1202 may be read from various input layers of for example, images. … Furthermore, after weights are scaled up for use in weights 1204, weight values may be truncated in order to preserve a desired lower precision. For example, if calculation circuit 1200 is to use weights with eight bits of precision, the bottom sixteen bits may be truncated from weights before they are provided as weights 1204. Calculation circuit 1200 may utilize these, for example, eight-bit weight values to perform dot-product, convolution, or other calculations for CNN.”).

Regarding claim 4, Falcon teaches the micro-processor circuit as claimed in claim 1, wherein if a value range of the weight parameters comprises two value types, the parameter generation module adopts a first encoding method to encode the weight parameters (Falcon, P[0094], “In one embodiment, for a given layer, the maximum and minimum values [example of two value types] of weights 1204 may be determined. In another embodiment and based on such a determination, weights 1204 may be scaled up to meet a defined range. For example, if weights 1204 are given as positive and negative [example of two value types] fractions less than one, then weights 1204 may be scaled up to the range (-1, 1) [disclosing a first encoding method to encode the weight parameters].”).

Regarding claim 5, Falcon teaches the micro-processor circuit as claimed in claim 4, wherein the parameter generation module takes an original code or a complement of one of the sub-input parameters as one of the sub-output parameters according to the encoded weight parameters generated according to the first encoding method (Falcon, PP[0099-0100], “If input data 1202 and weights 1204 are each eight-bits wide (and in 1.7 format, wherein a bit is used to represent the sign [one of the sub-input parameters] and seven bits are used to represent a fractional part of a fixed-point number), then there may be sixteen pairs of inputs from input latches 1302. Returning to FIG. 12, in one embodiment, MAC [multiply-and-accumulate] unit 1210 may output the results of convolution and dot product operations to latches 1212, 1214 [the output is according to the encoded weight parameters generated according to the first encoding method]. The output form may include a bit for the sign [an original code or a complement of one of the sub-input parameters as one of the sub-output parameters], two bits for the integer, and fourteen bits for the fractional part.”).

Regarding claim 8, Falcon teaches the micro-processor circuit as claimed in claim 1, wherein the compute module comprises a plurality of adder layers (Falcon, FIG. 12 and P[0089], “Calculation circuit 1200 may include [a plurality of adder layers], for example, a multiply-and-accumulate (MAC) unit 1210, a signal extension unit 1216, a 4:2 carry-save adder (CSA) 1218, a 24-bit wide adder 1220, and an activation function 1234.”), and 
each of the adder layers comprises a plurality of adders (Falcon, FIGS. 13A and 13B show that each of adder layers MAC and 24-bit wide adder comprises a plurality of adders.), and 
the adders are used for executing in parallel a plurality of adding operations (Falcon, P[0081], “the multiplication and sum operations may be implemented in parallel on multi-core CPU or GPU.”).

Regarding claim 9, Falcon teaches the micro-processor circuit as claimed in claim 1, wherein a bit width of the output parameter generated through the truncation operation is equal to a bit width of each of the input parameters (Falcon, FIG. 12 shows that the bit width of output data, or a bit width of the output parameter, in 1.7 format generated through truncate logic, or the truncation operation, is equal to the bit width of input data in 1.7 format, or a bit width of each of the input parameters.).

Regarding claim 10, Falcon teaches the micro-processor circuit as claimed in claim 1, wherein the micro-processor circuit executes a micro-instruction to complete the neural network operation (Falcon, P[0023], “The following description describes weight-shifting mechanism for reconfigurable processing units within or in association with a processor, virtual processor, package, computer system, or other processing apparatus. In one embodiment, such a weight-shifting mechanism may be used in convolution neural networks (CNN).” Falcon, P[0053], “in one embodiment, the decoder decodes a received instruction into one or more operations called ‘micro-instructions’ or ‘micro-operations’ (also called micro op or uops) that the machine may execute. In other embodiments, the decoder parses the instruction into an opcode and corresponding data and control fields that may be used by the micro-architecture to perform operations in accordance with one embodiment.”), 
a source operand of the micro-instruction comprises the input parameters and the weight parameters (Falcon, P[0032], “For example, in one embodiment, the bits in a 64-bit register may be organized as a source operand containing four separate 16-bit data elements, each of which represents a separate 16-bit value.” Falcon, P[0107], “For example, processing device 1000 may include registers for storing weights or input values as well as multiplexers to route values to appropriate multiplication circuits.”), and 
a destination operand of the micro-instruction comprises the output parameter of the neural network operation (Falcon, P[0032], “In one embodiment, a SIMD instruction specifies a single vector operation to be performed on two source vector operands to generate a destination vector operand (also referred to as a result vector operand) of the same or different size, with the same or different number of data elements, and in the same or different data element order.” Falcon, P[0123], “The truncated and scaled results may be stored in memory, a register, or otherwise sent to another calculation circuit.”).  

Regarding claims 11, 13-15, and 18-20, claims 11, 13-15, and 18-20 are directed to a method of performing a neural network operation, adapted to a micro-processor circuit comprising a parameter generation module, a compute module and a truncation logic, the method performing steps recited in claims 1, 3-5, and 8-10, respectively. Therefore, the rejections made to claims 1, 3-5, and 8-10 are applied to claims 11, 13-15, and 18-20.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 2, 6-7, 12, and 16-17 are rejected under 35 U.S.C. 103 as being unpatentable over Falcon in view of Zhu et al. (“Trained Ternary Quantization,” 23 February 2017, arXiv:1612.01064v3 [cs.LG], pp. 1-10) (“Zhu”).
Regarding claim 2, Falcon teaches the micro-processor circuit as claimed in claim 1, wherein … a bit width of the micro-processor circuit is greater than a sum of the bit widths of all the input parameters and the weight parameters (Falcon, FIG. 12 shows that a bit width of the micro-processor circuit, e.g., MAC Unit 1210 outputs a bit width of 19 bits, is greater than a sum of the bit widths of all the input parameters and the weight parameters, i.e., 8+8=16 bits).
Falcon does not teach the circuit, wherein a bit width of each of the input parameters is greater than a bit width of each of the weight parameters ….
However, Zhu teaches the circuit, wherein a bit width of each of the input parameters is greater than a bit width of each of the weight parameters (Zhu, p. 8, Section 6.1.2, “by substituting 32-bit weights with 2-bit ternary weights [a bit width of each of the weight parameters], our model is approximately 16x smaller than original 32-bit AlexNet.” Zhu, p. 5, Section 5, in one experiment, “CIFAR-10 is an image classification benchmark containing images of size 32×32 RGB pixels [a bit width of each of the input parameters] in a training set of 50000 and a test set of 10000.”).
Both Falcon and Zhu are directed to configuring weights for use in neural networks. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the bit widths in Falcon such that a bit width of each of the input parameters is greater than a bit width of each of the weight parameters, as disclosed in Zhu. Doing so “can reduce the precision of weights in neural networks to ternary values” to solve the problem of deploying large neural networks on “mobile devices with limited power budgets” (Zhu, p. 1, Abstract).

Regarding claim 6, Falcon teaches the micro-processor circuit as claimed in claim 1.
Falcon does not teach the circuit, wherein if a value range of the weight parameters comprises three value types, the parameter generation module adopts a second encoding method to encode the weight parameters.
However, Zhu teaches the circuit, wherein if a value range of the weight parameters comprises three value types, the parameter generation module adopts a second encoding method to encode the weight parameters (Zhu, pp. 3-4, Sections 4 and 4.1, Equation (6) shows that full-resolution weights that fall into one of three categories, where a value range of the weight parameters comprises three value types, are translated into quantized ternary weights by adopting a second encoding method to encode the weight parameters.).  
Both Falcon and Zhu are directed to configuring weights for use in neural networks. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the weight encoding scheme in Falcon to include a second encoding method if a value range of the weight parameters comprises three value types, as disclosed in Zhu. Doing so solves the problem of deploying large neural networks on mobile devices with limited power budgets (Zhu, p. 1, Abstract).

Regarding claim 7, Falcon in view of Zhu teaches the micro-processor circuit as claimed in claim 6.
Falcon further teaches the circuit, wherein the parameter generation module takes a zero code or an original code or a complement of one of the sub-input parameters as one of the sub-output parameters according to the encoded weight parameters generated according to the second encoding method (Falcon, PP[0099-0100], “If input data 1202 and weights 1204 are each eight-bits wide (and in 1.7 format, wherein a bit is used to represent the sign [one of the sub-input parameters] and seven bits are used to represent a fractional part of a fixed-point number), then there may be sixteen pairs of inputs from input latches 1302. Returning to FIG. 12, in one embodiment, MAC [multiply-and-accumulate] unit 1210 may output the results of convolution and dot product operations to latches 1212, 1214 [the output is according to the encoded weight parameters generated according to the given encoding method]. The output form may include a bit for the sign [a zero code or an original code or a complement of one of the sub-input parameters as one of the sub-output parameters], two bits for the integer, and fourteen bits for the fractional part.”).
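For illustration only, the claim 6 and claim 7 limitations (weights having three value types, and a sub-output that is a zero code, the original code, or the complement of an input) may be sketched in software as follows. This is a simplified sketch and not an implementation from Falcon or Zhu: the function names, the single symmetric threshold parameter, and the omission of Zhu's trained scaling factors are assumptions made for the example.

```python
def encode_ternary(weights, threshold):
    """Map each full-resolution weight to one of three codes: 1, 0, or -1.

    threshold is a hypothetical parameter for this sketch; weights whose
    magnitude does not exceed it are encoded as zero.
    """
    codes = []
    for w in weights:
        if w > threshold:
            codes.append(1)
        elif w < -threshold:
            codes.append(-1)
        else:
            codes.append(0)
    return codes


def sub_outputs(inputs, codes):
    """Select zero, the input itself, or its negation for each code.

    No multiplier is needed: code 0 gives a zero code, code 1 passes the
    original value through, and code -1 gives the (two's) complement,
    modeled here as arithmetic negation.
    """
    selected = []
    for x, c in zip(inputs, codes):
        if c == 0:
            selected.append(0)
        elif c > 0:
            selected.append(x)
        else:
            selected.append(-x)
    return selected


codes = encode_ternary([0.8, -0.02, -0.6, 0.05], threshold=0.1)
outputs = sub_outputs([5, 7, -3, 9], codes)
```

The two-value case of claims 4 and 5 corresponds to the same selection with the zero branch never taken.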

Regarding claims 12 and 16-17, claims 12 and 16-17 are directed to a method of performing a neural network operation, adapted to a micro-processor circuit comprising a parameter generation module, a compute module and a truncation logic, the method performing steps recited in claims 2 and 6-7, respectively. Therefore, the rejections made to claims 2 and 6-7 are applied to claims 12 and 16-17.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Diril et al. (US 2019/0205094) (“Diril”) teaches identifying multiplier groups for input-weight pairs based on precision levels (e.g., a number of significant bits, or a bit width) for weights (Diril, P[0037]).
Zhou et al. (“DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients,” 17 July 2016, arXiv:1606.06160v2 [cs.NE], pp. 1-14) (“Zhou”) teaches “a method to train convolutional neural networks that have low bitwidth weights” (Zhou, p. 1, Abstract).

Any inquiry concerning this communication or earlier communications from the examiner should be directed to CATHERINE F LEE, whose telephone number is (571)270-7487.  The examiner can normally be reached Monday through Friday, 10:00AM-6:00PM EDT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda Huang, can be reached at (571)270-7092.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/C.F.L./Examiner, Art Unit 2124
/MIRANDA M HUANG/Supervisory Patent Examiner, Art Unit 2124