Detailed Action
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Specification
The specification is objected to as failing to provide proper antecedent basis for the claimed subject matter.  See 37 CFR 1.75(d)(1) and MPEP § 608.01(o).  Correction of the following is required: 

In claim 13, the terms “the first circuit” and “the second circuit” appear in line 8, whereas earlier in the claim the elements are introduced as “a first multiply circuit” in line 2 and “a second multiply circuit” in line 5. Accordingly, the terms “the first circuit” and “the second circuit” are objected to as failing to provide proper antecedent basis for the claimed subject matter.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1 - 5, 8, and 10 - 20 are rejected under 35 U.S.C. 103 as being unpatentable over
Singh US PGPub: US 2019/0266485 A1 Aug. 29, 2019 in view of
Li US PGPub: US 2018/0046905 A1 Feb. 15, 2018.

Regarding claims 1, 17, 18, Singh discloses,

a neural processor and an electronic device (deep convolutional neural network DCNN processor – Figs. 3, 4, 7A, 7B), comprising: 

a buffer circuit, a memory storing a first set of instructions associated with a machine learning model and a buffer circuit configured to store input data associated with the machine learning model (the convolutional accelerator CA includes several internal memory buffers. The internal memory buffers may be formed as registers, flip flops, static or dynamic random access memory SRAM or DRAM, or in some other structural configuration – Figs. 3, 6, paragraph 0165) configured to store input data (the internal memory buffers may be formed using a multiport architecture that lets, for example, one device perform data “store” operations in the memory while another device performs data “read” operations in the memory – Figs. 3, 6, paragraph 0165); and

a neural engine circuit (an applications processor, a digital signal processor DSP cluster – Fig. 3/138, paragraphs 0077, 0110) coupled to the buffer circuit (at least one communication bus architecture communicatively coupling the applications processor, the DSP cluster, and the CAF to the on-board memory – paragraph 0077), 
the neural engine circuit comprising a first multiply circuit  and a neural processor (the arithmetic unit may include a first multiplexor circuit – paragraph 0071) and a second multiply circuit (the arithmetic unit may include a second multiplexor circuit – paragraph 0071), and the neural engine circuit configured to: 

receive, in a first mode (the arithmetic unit for deep learning acceleration 700 includes dedicated circuits to retrieve data, accept data, route data, multiply operands to produce products, add values to produce sums, shift values right or left by any number of places, combine data, serialize data, interleave data, and perform other like operations – Fig. 7A, paragraph 0177. Several modes of operation are supported by the arithmetic units for deep learning acceleration 700 – paragraph 0195), first input data (Fig. 7A/706A, paragraphs 0177, 0180) and second input data (Fig. 7A/706B, paragraphs 0177, 0180) from the buffer circuit (the arithmetic unit for deep learning acceleration 700 is arranged to receive constant vector data from a vector constant memory 706 – paragraph 0179), 
generate, in the first mode using the first multiply circuit, first output data of a first bit width by multiplying the first input data to a first kernel coefficient (a first constant input A 706A passes the constant vector data that is processed as the “A” operand in Equation 1 – Fig. 7A, paragraph 0180. The CA buffers 612, 614 may be 64 bytes, 128 bytes, 256 bytes or some other size – paragraph 0167. Scalar constants A, B, C, 710C may include bits, bytes, nibbles, words, or differently formed scalar values for application within the arithmetic unit for deep learning acceleration 700.  The scalar constants 710C may be provided into the deep learning accelerator 700 in any desirable relationship with the streaming data X, Y, provided at first and second stream inputs 702, 702, respectively, and with vector constant data provided at vector constant inputs A, B, C, 706A, 706B, 706C – paragraph 0189), 

generate, in the first mode using the second multiply circuit, second output data of the first bit width by multiplying the second input data to a second kernel coefficient (a second constant input B 706B passes the constant vector data that is processed as the “B” operand in Equation 1 – Fig. 7A, paragraph 0180. The scalar constants 710C may be provided into the deep learning accelerator 700 in any desirable relationship with the streaming data X, Y, provided at first and second stream inputs 702, 702, respectively, and with vector constant data provided at vector constant inputs A, B, C, 706A, 706B, 706C – paragraph 0189), 

receive, in the second mode, third input data from the buffer circuit (Fig. 7A/706C, paragraphs 0177, 0180), 

operate, in the second mode, the first multiply circuit with at least the second multiply circuit as a combined computation circuit to generate third output data (the CA adder tree 622 mathematically combines – e.g., sums, the incoming MAC unit data and batch data passed through the first CA input data port – paragraph 0169. Affine transformations may be applied to support deep learning sub-processes such as biasing, batch normalization, scaling, mean subtraction, element-wise addition, and other linear combinations of vector type operations such as max-average pooling, and the like – paragraph 0174) of the second bit width by multiplying the third input data to a third kernel coefficient (a third constant input C 706C passes the constant vector data that is processed as the “C” operand in Equation 1 – paragraph 0180. An output 712 of the arithmetic unit for deep learning acceleration 700 is arranged to pass sums, products, or other data generated in the arithmetic unit for deep learning acceleration 700. The output 712 may pass discrete values. Alternatively in these or other embodiments, the output 712 may pass a stream of values – Fig. 7A/712, paragraph 0190),

but does not disclose “bit width multiplication”.

Li teaches an artificial neural network that implements efficient data access control in a neural network hardware acceleration system. Specifically, it proposes an overall design of a device that can process data receiving, bit-width transformation, and data storing. By employing the technical disclosure, the neural network hardware acceleration system can avoid the data access process becoming the bottleneck in neural network computation (ABSTRACT, Figs. 1, 3, 5, 7, 8).

Li performs a bit-width transformation operation on said parameters and outputs the transformed parameters to said processing elements, PEs (paragraph 0062).

The arithmetic unit receives a (v, x) entry from the sparse matrix read unit and performs the multiply accumulate operation b_x = b_x + v × a_j. Index x is used to index an accumulator array (the destination activation registers) while v is multiplied by the activation value at the head of the activation queue (paragraph 0049).
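
For illustration only, the multiply accumulate operation quoted from Li, paragraph 0049, can be sketched as follows. The function name, the list-based accumulator, and the sample values are assumptions made for this sketch and are not taken from Li.

```python
def sparse_mac(entries, activation, num_outputs):
    """Accumulate b[x] = b[x] + v * a_j for each (v, x) entry.

    entries     -- (v, x) pairs: v is a nonzero weight, x its accumulator index
    activation  -- a_j, the activation value at the head of the queue
    num_outputs -- size of the accumulator array (hypothetical parameter)
    """
    b = [0] * num_outputs          # accumulator array, initially cleared
    for v, x in entries:
        b[x] = b[x] + v * activation
    return b


# Hypothetical sample data: three nonzero weights, two sharing index 0.
column_entries = [(2, 0), (-1, 3), (4, 0)]
result = sparse_mac(column_entries, activation=5, num_outputs=4)
print(result)  # [30, 0, 0, -5]
```

Entries sharing the same index x accumulate into the same register, which is why index 0 holds 2×5 + 4×5 = 30 in this example.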

There are multiple bit-width converters for parameters and vectors. Such design can fully exploit the usage of memory bandwidth and PEs' parallel computation capability (paragraph 0093).

It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the deep convolutional neural network (DCNN) processor of Singh (Singh, Figs. 3, 4, 7A, 7B) to incorporate the efficient data access control of Li's neural network hardware acceleration system, including the multiple bit-width converters for parameters and vectors (Li, ABSTRACT, Figs. 1, 3, 5, 7, 8, paragraphs 0049, 0062, 0093), in order to obtain an efficient data access control device for a neural network hardware acceleration system that provides parameters and input matrices for the PEs and stores the results of computation in a more efficient way (Li, paragraphs 0061, 0080).

Regarding claims 2, 14, Singh discloses,

the neural processor of claim 1, wherein the first and second input data correspond to data of a first layer of a convolutional neural network, and the third input data correspond to data of a second layer of the neural network different from the first layer (several modes of operation are supported by the arithmetic units for deep learning acceleration 700 – paragraph 0195. The DCNN is arranged in a plurality of “layers,” and different types of predictions are made at each layer – paragraph 0004. This second arithmetic unit depicted in FIG. 7F is configured differently from the first arithmetic unit depicted in FIG. 7E – paragraph 0229).

Regarding claims 3, 15, Singh discloses,

the neural processor of claim 2, wherein the second layer precedes the first layer in the convolutional neural network (deep convolutional neural network DCNN processor – Figs. 3, 4, 7A, 7B. The arithmetic unit solely dedicated to performance of a plurality of parallel operations, wherein each one of the plurality of parallel operations carries out a portion of a formula, the formula being: output=AX+BY+C – paragraphs 0069, 0077. DSPs 138 can operate concurrently – e.g., in parallel – paragraph 0115. Aspects of the embodiments can be modified, if necessary to employ concepts of the various patents, application and publications to provide yet further embodiments – paragraph 0262).

Regarding claim 4, Singh discloses,

the neural processor of claim 1, wherein the first multiply circuit and the second multiply circuit are configured to generate the first output and the second output in parallel (the arithmetic unit solely dedicated to performance of a plurality of parallel operations, wherein each one of the plurality of parallel operations carries out a portion of a formula, the formula being: output=AX+BY+C – paragraphs 0069, 0077. DSPs 138 can operate concurrently – e.g., in parallel – paragraph 0115).
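
For illustration only, the formula output=AX+BY+C cited from Singh can be sketched element-wise as below. The function name and sample operands are assumptions made for this sketch; the two products are computed one after the other here, whereas the cited passage describes them as portions of the formula carried out by parallel operations.

```python
def affine_combine(a, x, b, y, c):
    """Element-wise output = A*X + B*Y + C over equal-length sequences."""
    # The two products do not depend on each other, so each could be
    # formed by a separate multiplier; this sketch forms them in turn.
    ax = [ai * xi for ai, xi in zip(a, x)]
    by = [bi * yi for bi, yi in zip(b, y)]
    # The adder stage then sums both products with the constant C.
    return [p + q + ci for p, q, ci in zip(ax, by, c)]


# Hypothetical operands for a two-element vector.
output = affine_combine(a=[1, 2], x=[3, 4], b=[5, 6], y=[7, 8], c=[9, 10])
print(output)  # [47, 66]
```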

Regarding claims 5, 19, 20, Singh discloses,

the neural processor of claim 1, wherein the first multiply circuit comprises a plurality of adders (adder circuitry, the adder circuitry arranged as at least one adder circuit to perform at least some summation operations of the formula – paragraph 0070. Adder circuits – paragraph 0140) and a plurality of demultiplexers (the data switch 506 may include multiplexor logic, demultiplexor logic, or some other form of switching logic – paragraph 0150), at least one of the demultiplexers configured to switch to a first output line in the first mode and switch to a second output line in the second mode (several modes of operation are supported by the arithmetic units for deep learning acceleration 700 – paragraph 0195).

Regarding claim 8, Singh discloses,

the neural processor of claim 1, wherein the neural engine circuit is further configured to operate at a third bit width in a third mode (several modes of operation are supported by the arithmetic units for deep learning acceleration 700 – paragraph 0195).

Regarding claim 10, Singh discloses,

the neural processor of claim 1, wherein the first kernel coefficient and the second kernel coefficient have the same value (the arithmetic unit for deep learning acceleration 700 includes dedicated circuits to retrieve data, accept data, route data, multiply operands to produce products, add values to produce sums, shift values right or left by any number of places, combine data, serialize data, interleave data, and perform other like operations – Fig. 7A, paragraph 0177).

Regarding claim 11, Singh discloses,

the neural processor of claim 1, wherein the neural processor is configured to determine a mode of the neural engine circuit based on an instruction included in code associated with a convolutional neural network (several modes of operation are supported by the arithmetic units for deep learning acceleration 700 – paragraph 0195).

Regarding claim 12, Singh discloses all the claimed features,

but does not disclose the neural processor of claim 1, wherein the neural processor is configured to automatically determine a selection of bit width.

Singh briefly teaches that, where the goal of the neural network is to accurately predict whether or not a particular feature is included in an input data set, the CNN can be further directed to automatically adjust weighting values that are applied in a voting layer (paragraph 0054). Singh also teaches buffering a selected amount, e.g., one or more bits, of data passed from a data source (paragraph 0157).

At the same time, Li teaches an artificial neural network that implements efficient data access control in a neural network hardware acceleration system. Specifically, it proposes an overall design of a device that can process data receiving, bit-width transformation, and data storing. By employing the technical disclosure, the neural network hardware acceleration system can avoid the data access process becoming the bottleneck in neural network computation (ABSTRACT, Figs. 1, 3, 5, 7, 8).

Li performs a bit-width transformation operation on said parameters and outputs the transformed parameters to said processing elements, PEs (paragraph 0062).

The arithmetic unit receives a (v, x) entry from the sparse matrix read unit and performs the multiply accumulate operation b_x = b_x + v × a_j. Index x is used to index an accumulator array (the destination activation registers) while v is multiplied by the activation value at the head of the activation queue (paragraph 0049).

There are multiple bit-width converters for parameters and vectors. Such design can fully exploit the usage of memory bandwidth and PEs' parallel computation capability (paragraph 0093).

It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the deep convolutional neural network (DCNN) processor of Singh (Singh, Figs. 3, 4, 7A, 7B) to incorporate the efficient data access control of Li's neural network hardware acceleration system, including the multiple bit-width converters for parameters and vectors (Li, ABSTRACT, Figs. 1, 3, 5, 7, 8, paragraphs 0049, 0062, 0093), in order to obtain an efficient data access control device for a neural network hardware acceleration system that provides parameters and input matrices for the PEs and stores the results of computation in a more efficient way (Li, paragraphs 0061, 0080).

Regarding claim 13, Singh discloses,

a method (the convolutional accelerator CA includes several internal memory buffers. The internal memory buffers may be formed as registers, flip flops, static or dynamic random access memory SRAM or DRAM, or in some other structural configuration – Figs. 3, 6, paragraph 0165. The internal memory buffers may be formed using a multiport architecture that lets, for example, one device perform data “store” operations in the memory while another device performs data “read” operations in the memory – Figs. 3, 6, paragraph 0165) for operating a neural processor (deep convolutional neural network DCNN processor – Figs. 3, 4, 7A, 7B), the method comprising: 

generating, by a first multiply circuit (the arithmetic unit may include a first multiplexor circuit – paragraph 0071) of a neural engine circuit operating in a first mode (the arithmetic unit for deep learning acceleration 700 includes dedicated circuits to retrieve data, accept data, route data, multiply operands to produce products, add values to produce sums, shift values right or left by any number of places, combine data, serialize data, interleave data, and perform other like operations – Fig. 7A, paragraph 0177. Several modes of operation are supported by the arithmetic units for deep learning acceleration 700 – paragraph 0195), first output data of a first bit width by multiplying the first input data to a first kernel coefficient (a first constant input A 706A passes the constant vector data that is processed as the “A” operand in Equation 1 – Fig. 7A, paragraph 0180. The CA buffers 612, 614 may be 64 bytes, 128 bytes, 256 bytes or some other size – paragraph 0167. Scalar constants A, B, C, 710C may include bits, bytes, nibbles, words, or differently formed scalar values for application within the arithmetic unit for deep learning acceleration 700. The scalar constants 710C may be provided into the deep learning accelerator 700 in any desirable relationship with the streaming data X, Y, provided at first and second stream inputs 702, 702, respectively, and with vector constant data provided at vector constant inputs A, B, C, 706A, 706B, 706C – paragraph 0189); 

generating, by a second multiply circuit (the arithmetic unit may include a second multiplexor circuit – paragraph 0071) of the neural engine circuit operating in the first mode, second output data of the first bit width by multiplying the second input data to a second kernel coefficient (a second constant input B 706B passes the constant vector data that is processed as the “B” operand in Equation 1 – Fig. 7A, paragraph 0180. The scalar constants 710C may be provided into the deep learning accelerator 700 in any desirable relationship with the streaming data X, Y, provided at first and second stream inputs 702, 702, respectively, and with vector constant data provided at vector constant inputs A, B, C, 706A, 706B, 706C – paragraph 0189); and 

generating, by the first circuit and the second circuit operating as a part of a combined computation circuit (the CA adder tree 622 mathematically combines – e.g., sums, the incoming MAC unit data and batch data passed through the first CA input data port – paragraph 0169. Affine transformations may be applied to support deep learning sub-processes such as biasing, batch normalization, scaling, mean subtraction, element-wise addition, and other linear combinations of vector type operations such as max-average pooling, and the like – paragraph 0174), third output data of a second bit width by multiplying the third input data with a third kernel coefficient (a third constant input C 706C passes the constant vector data that is processed as the “C” operand in Equation 1 – paragraph 0180. An output 712 of the arithmetic unit for deep learning acceleration 700 is arranged to pass sums, products, or other data generated in the arithmetic unit for deep learning acceleration 700. The output 712 may pass discrete values. Alternatively in these or other embodiments, the output 712 may pass a stream of values – Fig. 7A/712, paragraph 0190),

but does not disclose “bit width multiplication”.

Li teaches an artificial neural network that implements efficient data access control in a neural network hardware acceleration system. Specifically, it proposes an overall design of a device that can process data receiving, bit-width transformation, and data storing. By employing the technical disclosure, the neural network hardware acceleration system can avoid the data access process becoming the bottleneck in neural network computation (ABSTRACT, Figs. 1, 3, 5, 7, 8).

Li performs a bit-width transformation operation on said parameters and outputs the transformed parameters to said processing elements, PEs (paragraph 0062).

The arithmetic unit receives a (v, x) entry from the sparse matrix read unit and performs the multiply accumulate operation b_x = b_x + v × a_j. Index x is used to index an accumulator array (the destination activation registers) while v is multiplied by the activation value at the head of the activation queue (paragraph 0049).

There are multiple bit-width converters for parameters and vectors. Such design can fully exploit the usage of memory bandwidth and PEs' parallel computation capability (paragraph 0093).

It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the deep convolutional neural network (DCNN) processor of Singh (Singh, Figs. 3, 4, 7A, 7B) to incorporate the efficient data access control of Li's neural network hardware acceleration system, including the multiple bit-width converters for parameters and vectors (Li, ABSTRACT, Figs. 1, 3, 5, 7, 8, paragraphs 0049, 0062, 0093), in order to obtain an efficient data access control device for a neural network hardware acceleration system that provides parameters and input matrices for the PEs and stores the results of computation in a more efficient way (Li, paragraphs 0061, 0080).

Regarding claim 16, Singh discloses,

the method of claim 14, further comprising: 

receiving, from code associated with a convolutional neural network (the arithmetic unit for deep learning acceleration 700 includes dedicated circuits to retrieve data, accept data, route data, multiply operands to produce products, add values to produce sums, shift values right or left by any number of places, combine data, serialize data, interleave data, and perform other like operations – Fig. 7A, paragraph 0177. Several modes of operation are supported by the arithmetic units for deep learning acceleration 700 – paragraph 0195), a first instruction to perform calculation of the first layer in the convolutional neural network using the first bit width (a first constant input A 706A passes the constant vector data that is processed as the “A” operand in Equation 1 – Fig. 7A, paragraph 0180. The CA buffers 612, 614 may be 64 bytes, 128 bytes, 256 bytes or some other size – paragraph 0167. Scalar constants A, B, C, 710C may include bits, bytes, nibbles, words, or differently formed scalar values for application within the arithmetic unit for deep learning acceleration 700.  The scalar constants 710C may be provided into the deep learning accelerator 700 in any desirable relationship with the streaming data X, Y, provided at first and second stream inputs 702, 702, respectively, and with vector constant data provided at vector constant inputs A, B, C, 706A, 706B, 706C – paragraph 0189); 

causing, based on the first instruction, the neural engine circuit in the first mode (a first constant input A 706A passes the constant vector data that is processed as the “A” operand in Equation 1 – Fig. 7A, paragraph 0180. The CA buffers 612, 614 may be 64 bytes, 128 bytes, 256 bytes or some other size – paragraph 0167. Scalar constants A, B, C, 710C may include bits, bytes, nibbles, words, or differently formed scalar values for application within the arithmetic unit for deep learning acceleration 700.  The scalar constants 710C may be provided into the deep learning accelerator 700 in any desirable relationship with the streaming data X, Y, provided at first and second stream inputs 702, 702, respectively, and with vector constant data provided at vector constant inputs A, B, C, 706A, 706B, 706C – paragraph 0189); 

receiving, from the code associated with the convolutional neural network, a second instruction to perform calculation of the second layer in the convolutional neural network using the second bit width (a second constant input B 706B passes the constant vector data that is processed as the “B” operand in Equation 1 – Fig. 7A, paragraph 0180. The scalar constants 710C may be provided into the deep learning accelerator 700 in any desirable relationship with the streaming data X, Y, provided at first and second stream inputs 702, 702, respectively, and with vector constant data provided at vector constant inputs A, B, C, 706A, 706B, 706C – paragraph 0189); and 

causing, based on the second instruction, the neural engine circuit in the second mode (the CA adder tree 622 mathematically combines – e.g., sums, the incoming MAC unit data and batch data passed through the first CA input data port – paragraph 0169. Affine transformations may be applied to support deep learning sub-processes such as biasing, batch normalization, scaling, mean subtraction, element-wise addition, and other linear combinations of vector type operations such as max-average pooling, and the like – paragraph 0174), 

receive, in the second mode, third input data from the buffer circuit (Fig. 7A/706C, paragraphs 0177, 0180),

but does not disclose “bit width calculation”.

Li teaches an artificial neural network that implements efficient data access control in a neural network hardware acceleration system. Specifically, it proposes an overall design of a device that can process data receiving, bit-width transformation, and data storing. By employing the technical disclosure, the neural network hardware acceleration system can avoid the data access process becoming the bottleneck in neural network computation (ABSTRACT, Figs. 1, 3, 5, 7, 8).

Li performs a bit-width transformation operation on said parameters and outputs the transformed parameters to said processing elements, PEs (paragraph 0062).

The arithmetic unit receives a (v, x) entry from the sparse matrix read unit and performs the multiply accumulate operation b_x = b_x + v × a_j. Index x is used to index an accumulator array (the destination activation registers) while v is multiplied by the activation value at the head of the activation queue (paragraph 0049).

There are multiple bit-width converters for parameters and vectors. Such design can fully exploit the usage of memory bandwidth and PEs' parallel computation capability (paragraph 0093).

It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the deep convolutional neural network (DCNN) processor of Singh (Singh, Figs. 3, 4, 7A, 7B) to incorporate the efficient data access control of Li's neural network hardware acceleration system, including the multiple bit-width converters for parameters and vectors (Li, ABSTRACT, Figs. 1, 3, 5, 7, 8, paragraphs 0049, 0062, 0093), in order to obtain an efficient data access control device for a neural network hardware acceleration system that provides parameters and input matrices for the PEs and stores the results of computation in a more efficient way (Li, paragraphs 0061, 0080).
Allowable Subject Matter
Claims 6, 7, and 9 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.

Ardywibowo US PGPub: US 2022/0101133 A1 Mar. 31, 2022.
Dynamic quantization for an energy efficient deep neural network (DNN) includes receiving, at a layer of the DNN during an inference stage, a layer input comprising content associated with a DNN input received at the DNN. The method also includes quantizing one or more parameters of a plurality of parameters associated with the layer based on the content of the layer input. The method further includes performing a task corresponding to the DNN input, the task performed with the one or more quantized parameters (ABSTRACT).
A system-on-a-chip (SOC) 100 may include a central processing unit (CPU) 102 or a multi-core CPU configured for dynamic bit-width quantization in accordance with certain aspects of the present disclosure. Variables (e.g., neural signals and synaptic weights), system parameters associated with a computational device (e.g., neural network with weights), delays, frequency bin information, and task information may be stored in a memory block associated with a neural processing unit (NPU) 108, in a memory block associated with a CPU 102, in a memory block associated with a graphics processing unit (GPU) 104, in a memory block associated with a digital signal processor (DSP) 106, in a memory block 118, or may be distributed across multiple blocks (Figs. 1, 4, 5, paragraph 0028).

Lee US PGPub: US 2020/0202199 A1 Jun. 25, 2020.
A neural network processing method and apparatus based on nested bit representation. The processing method includes obtaining first weights for a first layer of a source model of a first layer of a neural network, determining a bit-width for the first layer of the neural network, obtaining second weights for the first layer of the neural network by extracting at least one bit corresponding to the determined bit-width from each of the first weights for the first layer of a source model corresponding to the first layer of the neural network, and processing input data of the first layer of the neural network by executing the first layer of the neural network based on the obtained second weights (ABSTRACT).
The source model 110 may execute the neural network 120 with the first bit-width X1 corresponding to a low bit-width. In another example, in an example in which a high processing accuracy is beneficial or a high processing difficulty is beneficial, the source model 110 may execute the neural network 120 with the nth bit-width Xn corresponding to a high bit-width. A bit-precision corresponds to a bit-width, and thus a variable bit-width may indicate a variable bit-precision (Fig. 1, paragraphs 0074, 0075).
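
For illustration only, the extraction of lower bit-width weights from source-model weights summarized from Lee can be sketched as follows. It is an assumption of this sketch, not a statement of Lee's method, that the extracted bits are the most significant bits and that extraction is an arithmetic right shift; the function name and sample weights are likewise hypothetical.

```python
def extract_nested_weights(source_weights, source_bits, target_bits):
    """Keep the top `target_bits` bits of each signed `source_bits`-bit weight.

    Assumption: the nested representation stores the low-precision weight
    in the most significant bits of the high-precision weight.
    """
    if not 0 < target_bits <= source_bits:
        raise ValueError("target_bits must be in the range 1..source_bits")
    shift = source_bits - target_bits
    # Python's >> on int is an arithmetic shift, so the sign is preserved.
    return [w >> shift for w in source_weights]


# Hypothetical 8-bit source weights reduced to a 4-bit width.
first_weights = [100, -64, 7]
second_weights = extract_nested_weights(first_weights, source_bits=8, target_bits=4)
print(second_weights)  # [6, -4, 0]
```

Under this assumption, a layer can be executed at a lower bit-precision without storing a separate weight set for each bit-width.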

Henry US PGPub: US 2018/0165575 A1 Jun. 14, 2018.
In a neural network unit, each neural processing unit (NPU) of an array of N NPUs receives respective first and second upper and lower bytes of 2N bytes received from first and second RAMs. In a first mode, each NPU sign-extends the first upper byte to form a first 16-bit word and performs an arithmetic operation on the first 16-bit word and a second 16-bit word formed by the second upper and lower bytes. In a second mode, each NPU sign-extends the first lower byte to form a third 16-bit word and performs the arithmetic operation on the third 16-bit word and the second 16-bit word formed by the second upper and lower bytes. In a third mode, each NPU performs the arithmetic operation on a fourth 16-bit word formed by the first upper and lower bytes and the second 16-bit word formed by the second upper and lower bytes (ABSTRACT).
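
For illustration only, the operand formation in the three modes summarized from Henry's abstract can be sketched as follows. The function names are hypothetical, and the use of two's-complement signed 16-bit words is an assumption of this sketch; the arithmetic operation itself is left out because the abstract does not specify it.

```python
def sign_extend_byte(byte):
    """Sign-extend an 8-bit pattern (0..255) to a signed word value."""
    return byte - 0x100 if byte & 0x80 else byte


def word_from_bytes(upper, lower):
    """Form a signed 16-bit word from an upper and a lower byte."""
    value = (upper << 8) | lower
    return value - 0x10000 if value & 0x8000 else value


def form_operands(mode, first_upper, first_lower, second_upper, second_lower):
    """Return (first operand, second operand) for the selected mode."""
    # In every mode the second operand is the full 16-bit second word.
    second_word = word_from_bytes(second_upper, second_lower)
    if mode == 1:    # first mode: sign-extended first upper byte
        first_word = sign_extend_byte(first_upper)
    elif mode == 2:  # second mode: sign-extended first lower byte
        first_word = sign_extend_byte(first_lower)
    elif mode == 3:  # third mode: full word from both first bytes
        first_word = word_from_bytes(first_upper, first_lower)
    else:
        raise ValueError("mode must be 1, 2, or 3")
    return first_word, second_word


print(form_operands(1, 0xFF, 0x02, 0x00, 0x03))  # (-1, 3)
print(form_operands(2, 0xFF, 0x02, 0x00, 0x03))  # (2, 3)
print(form_operands(3, 0xFF, 0x02, 0x00, 0x03))  # (-254, 3)
```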

Lin US PGPub: US 2021/0097887 A1 Apr. 1, 2021.
A multimodel neural network, in which a deep neural network (DNN) is an artificial neural network (ANN) with multiple layers between the input and output layers. The DNN finds the correct mathematical manipulation to turn the input into the output, whether it be a linear relationship or a non-linear relationship (Figs. 1, 3, paragraphs 0002, 0024, 0025).

Chen US PGPub: US 2020/0097792 A1 Mar. 26, 2020.
A processing apparatus and processing method including: a computational circuit configured to compute the data to be computed, which includes performing acceleration computations on the data to be computed by using an adder circuit and a multiplier circuit; and a control circuit configured to control the memory and the computational circuit, which includes performing acceleration computations according to the data to be computed (ABSTRACT, Figs. 1/Data width adjustment circuit, 14/s1403, 24/s2403, paragraph 0010).

Yang US PGPub: US 2021/0201122 A1 Jul. 1, 2021.
A data processing method, apparatus, device, a storage medium, and a computer program product. The method includes: obtaining to-be-processed data input to a first calculating unit in a plurality of calculating units, wherein the to-be-processed data includes data of a first bit width; obtaining a processing parameter of the first calculating unit, wherein the processing parameter includes a parameter of a second bit width; and obtaining an output result of the first calculating unit based on the to-be-processed data and the processing parameter, wherein a bit width of to-be-processed data input to a second calculating unit in the plurality of calculating units is different from a bit width of the to-be-processed data input to the first calculating unit, and/or a bit width of a processing parameter input to the second calculating unit is different from a bit width of the processing parameter input to the first calculating unit (ABSTRACT, Figs. 1/11 – 14, 2/s202, 3/s304, 6, paragraphs 0007, 0095).

Contact Information
Any inquiry concerning this communication or earlier communications from the examiner should be directed to NIMESH PATEL whose telephone number is (571)270-1228. The examiner can normally be reached Monday thru Friday: 6:30 AM - 3:30 PM EST.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Rafael Perez-Gutierrez can be reached on 571-272-7915. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/NIMESH PATEL/Primary Examiner, Art Unit 2642