Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 8-14 are presented for examination.
Information Disclosure Statement
The information disclosure statements (IDS) filed 08/20/2019 and 09/10/2020 are in compliance with the provisions of 37 CFR 1.97 and 1.98. Accordingly, the information disclosure statements are being considered by the examiner.
Election/Restrictions
Applicant's election with traverse of Group II (claims 8-14) in the reply filed on 03/08/2022 is acknowledged. The traversal is on the ground(s) that “Applicant traverses the Restriction Requirement for the following reasons. It is respectfully submitted that it should be no undue burden on the Examiner to consider all claims in the single application. Accordingly, the Restriction Requirement should be overcome and withdrawn.” This is not found persuasive because applicant requests withdrawal of the restriction without explaining why examining all claims in a single application would impose no burden on the Examiner, and without giving any specific reason why the restriction should be withdrawn. As per MPEP 808.01, applicant is required to specifically point out the reason(s) on which he or she bases his or her conclusion(s) that a requirement to restrict is in error. A mere broad allegation that the requirement is in error does not comply with the requirement of 37 CFR 1.111. Thus, the required provisional election (see MPEP § 818.01(b)) becomes an election without traverse if accompanied by an incomplete traversal of the requirement for restriction.
The requirement is still deemed proper and is therefore made FINAL.
Claims 1-7 and 15-20 are withdrawn from further consideration pursuant to 37 CFR 1.142(b) as being drawn to nonelected inventions, there being no allowable generic or linking claim. Applicant timely traversed the restriction (election) requirement in the reply filed on 03/08/2022.

Priority
The following claimed benefit is acknowledged: the instant application, filed 08/20/2019, claims priority from provisional application 62721003, filed 08/22/2018.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claim 14 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 analysis:
In the instant case, claims 8-13 are directed to a computer readable storage medium, which does not fall within the four statutory categories (i.e., process, machine, manufacture, or composition of matter).
Specifically, claims 8-13 are directed toward a “computer readable storage medium”. The broadest reasonable interpretation of “computer readable storage medium” covers transitory propagating signals. The specification discloses, with reference to Fig. 8 on page 17, “Referring to FIG. 8, an embodiment of a computerized neural network system 7 according to this disclosure is shown to include an M-bit neural network accelerator 71, and a storage module 70 (a computer readable storage medium, such as flip flops, DRAM, SRAM, nonvolatile memory, a hard disk drive, a solid state drive, a cloud storage, etc.).” It is unclear whether other types of signals, e.g., a signal in wired transmission, are excluded from the storage medium.
Therefore, the broadest reasonable interpretation of claims 8-13 covers a transitory signal. When the broadest reasonable interpretation of a claim covers a signal per se, the claim must be rejected under 35 U.S.C. 101 as covering non-statutory subject matter. See In re Nuijten, 500 F.3d 1346, 1356-57 (Fed. Cir. 2007) (transitory embodiments are not directed to statutory subject matter); MPEP 9th Ed., § 2106.I. To overcome this rejection, applicant should insert --non-transitory-- before “computer readable storage medium”. Such an amendment is not considered new matter. See the “Subject Matter Eligibility of Computer Readable Media” memo dated January 26, 2010 (OG Cite: 1351 OG 212; OG Date: 23 Feb 2010).
  Claim 14 is directed to a system which falls into one of the four statutory categories (i.e., process, machine, manufacture, or composition of matter). 
Step 2A analysis:
Because the claims have been determined to fall within one of the four statutory categories (Step 1), it must be determined whether the claims are directed to a judicial exception (i.e., a law of nature, a natural phenomenon, or an abstract idea). In this case, the claims fall within the judicial exception of an abstract idea, specifically “Mental Processes/Concepts performed in the human mind (including an observation, evaluation, judgment, opinion)” and mathematical concepts.
Regarding claim 14: 
Step 2A: Prong 1 analysis:
-“A computerized system comprising a plurality of multipliers, and a plurality of adders coupled to said multipliers, said multipliers and said adders to cooperatively perform computation, wherein, for some data pieces each including multiple bits that respectively correspond to multiple bit orders and each being used in the computation of some of the multipliers, one of the bits that corresponds to the bit order of i represents 2^i in decimal when having a first bit value, and represents -2^i in decimal when having a second bit value, where N is a number of bits of the data piece, i is an integer, and (N-1)>=i>=0” (mathematical concept).
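For illustration only, the mathematical relationship recited in the limitation above can be sketched as follows. The function name `decode_signed_bits` and the choice of bit value 1 as the "first bit value" and bit value 0 as the "second bit value" are assumptions made for this sketch; the claim does not fix them.

```python
def decode_signed_bits(bits):
    """Decode a data piece whose bit of order i contributes +2**i or -2**i.

    bits[i] is the bit of order i (index 0 is the lowest order).
    Assumption: bit value 1 is the "first bit value" (+2**i) and
    bit value 0 is the "second bit value" (-2**i).
    """
    return sum((1 if b == 1 else -1) * (1 << i) for i, b in enumerate(bits))


# N = 3, so the bit orders are i = 0, 1, 2
print(decode_signed_bits([1, 1, 1]))  # +1 +2 +4
print(decode_signed_bits([0, 0, 0]))  # -1 -2 -4
print(decode_signed_bits([1, 0, 1]))  # +1 -2 +4
```

Under this encoding every bit contributes a nonzero term, so an N-bit data piece decodes to an odd integer between -(2^N - 1) and 2^N - 1.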
Step 2A: Prong 2 Analysis:
Claim 14 recites “A computerized system comprising a plurality of multipliers, and a plurality of adders coupled to said multipliers, said multipliers and said adders to cooperatively perform computation”. This limitation is recited at a high level of generality and amounts to no more than mere instructions to apply the judicial exception using generic computer components to perform the process as recited in the claim. (See MPEP 2106.05(f)).
Step 2B Analysis:
Claim 14 recites “A computerized system comprising a plurality of multipliers, and a plurality of adders coupled to said multipliers, said multipliers and said adders to cooperatively perform computation”. As discussed above, the additional limitation is recited at a high level of generality and amounts to no more than mere instructions to apply the judicial exception using generic computer components to perform the process as recited in the claim. (See MPEP 2106.05(f)).

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 8, 9, 11, 12, 13 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over EL-YANIV et al. (US 2017/0286830, hereinafter EL-YANIV) in view of Han et al. (“Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding”, Stanford University, Stanford, CA 94305, USA, hereinafter Han) and further in view of Lee et al. (“UNPU: A 50.6TOPS/W Unified Deep Neural Network Accelerator with 1b-to-16b Fully-Variable Weight Bit-Precision”, KAIST, Daejeon, Korea, hereinafter Lee).
Regarding claim 8, EL-YANIV teaches a computer program product comprising a neural network code that is stored on a computer readable storage medium, and that, when executed by a neural network accelerator, establishes a neural network having a plurality of sets of batch normalization parameters and a plurality of weights (EL-YANIV, [Par.0047], “Reference is now made to FIG. 1 which is a method for training a neural network having neurons with quantized activation functions for calculating quantized activation value connected by connections with quantized weight functions for calculating quantized weights, optionally binary, weights and referred to herein as a quantized neural network (QNN) for inference or otherwise analyzing new data”. Examiner’s note: the neural network has a plurality of neurons and connection weight values arranged in layers, and a batch normalization parameter is associated with each neuron in the neural network; therefore, the neural network includes a plurality of batch normalization parameters and a plurality of connection weights between the layers. [Par.0050-0051], “The neural network may be any DNN, including any feed-forward artificial neural network such as a convolutional neural network (CNN), fully connected neural network (FNN) and/or recurrent neural network (RNN). …During a training phase, as further indicated below, a floating point weight value and a quantized weight value are stored per connection and optionally, a batch normalization parameters are stored per neuron. Optionally, for an inference phase, the training phase outputs a neural network without floating point weight value per connection and only with a quantized weight value per connection. Optionally no a batch normalization value is stored per neuron in the outputted neural network.”),
[…]
wherein the sets of the batch normalization parameters respectively correspond to the different bitwidths (EL-YANIV, [Par.0084-0095], reproduced as image media_image1.png),
[…]
However, EL-YANIV does not teach said neural network being switchable among a plurality of bitwidth modes that respectively correspond to different bitwidths, and wherein in each of the bitwidth modes, each of the weights has one of the bitwidths that corresponds to the bitwidth mode; wherein, when executed by the neural network accelerator, said neural network operates in one of the bitwidth modes that corresponds to a bitwidth of the neural network accelerator, and one of the sets of the batch normalization parameters that corresponds to the bitwidth of the neural network accelerator is used by the neural network accelerator.
On the other hand, Han teaches said neural network being switchable among a plurality of bitwidth modes that respectively correspond to different bitwidths (Han, [page 3, section 3, second paragraph], “Weight sharing is illustrated in Figure 3. Suppose we have a layer that has 4 input neurons and 4 output neurons, the weight is a 4 X 4 matrix. On the top left is the 4 X 4 weight matrix, and on the bottom left is the 4 X 4 gradient matrix. The weights are quantized to 4 bins (denoted with 4 colors), all the weights in the same bin share the same value, thus for each weight, we then need to store only a small index into a table of shared weights. During update, all the gradients are grouped by the color and summed together, multiplied by the learning rate and subtracted from the shared centroids from last iteration. For pruned AlexNet, we are able to quantize to 8-bits (256 shared weights) for each CONV layers, and 5-bits (32 shared weights) for each FC layer without any loss of accuracy.” Examiner’s note: each layer has a different bit-width corresponding to a bitwidth mode; therefore, processing the neural network from one layer to another corresponds to switching among a plurality of bitwidth modes with respective bit-widths.),
and wherein in each of the bitwidth modes, each of the weights has one of the bitwidths that corresponds to the bitwidth mode (Han, [page 3, section 3, second paragraph], “Weight sharing is illustrated in Figure 3. Suppose we have a layer that has 4 input neurons and 4 output neurons, the weight is a 4 X 4 matrix. On the top left is the 4 X 4 weight matrix, and on the bottom left is the 4 X 4 gradient matrix. The weights are quantized to 4 bins (denoted with 4 colors), all the weights in the same bin share the same value, thus for each weight, we then need to store only a small index into a table of shared weights. During update, all the gradients are grouped by the color and summed together, multiplied by the learning rate and subtracted from the shared centroids from last iteration. For pruned AlexNet, we are able to quantize to 8-bits (256 shared weights) for each CONV layers, and 5-bits (32 shared weights) for each FC layer without any loss of accuracy.” Examiner’s note: each weight value corresponds to a bit-width mode, such as the 8-bit and 5-bit modes.);
EL-YANIV and Han are analogous art because they are in the same field of endeavor, namely quantization of weight values in neural networks.
Accordingly, it would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the neural network with a plurality of weights and batch normalization parameters taught by EL-YANIV in view of Han, such that the neural network is switchable among a plurality of bitwidth modes that respectively correspond to different bitwidths, and wherein in each of the bitwidth modes, each of the weights has one of the bitwidths that corresponds to the bitwidth mode. The modification would have been obvious because one of ordinary skill in the art would have been motivated to improve the performance of the training by reducing the storage requirement of the neural network without affecting its accuracy (Han, [Section 1, page 2, the fifth paragraph], “Our goal is to reduce the storage and energy required to run inference on such large networks so they can be deployed on mobile devices. To achieve this goal, we present “deep compression”: a three-stage pipeline (Figure 1) to reduce the storage required by neural network in a manner that preserves the original accuracy. First, we prune the networking by removing the redundant connections, keeping only the most informative connections. Next, the weights are quantized so that multiple connections share the same weight, thus only the codebook (effective weights) and the indices need to be stored. Finally, we apply Huffman coding to take advantage of the biased distribution of effective weights.”).
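For illustration only, the weight sharing described in the quoted Han passage (weights quantized to 4 bins, a small index stored per weight, and gradients grouped per bin, summed, scaled by the learning rate and subtracted from the shared centroids) can be sketched as follows. The function names, the weight values, the centroid values and the gradient values are all hypothetical and are not taken from Han's Figure 3; nearest-centroid assignment is an assumption of this sketch.

```python
def assign_bins(weights, centroids):
    """Return, for each weight, the index of the nearest shared centroid."""
    return [min(range(len(centroids)), key=lambda k: abs(w - centroids[k]))
            for w in weights]


def update_centroids(centroids, indices, grads, lr):
    """Group gradients by bin, sum them, scale by lr, subtract from centroids."""
    summed = [0.0] * len(centroids)
    for idx, g in zip(indices, grads):
        summed[idx] += g
    return [c - lr * s for c, s in zip(centroids, summed)]


# Hypothetical values: 8 weights shared across 4 bins (a 2-bit index per weight)
weights = [1.9, -1.1, 1.4, 0.1, 0.0, -0.2, -0.9, 2.1]
centroids = [-1.0, 0.0, 1.5, 2.0]
indices = assign_bins(weights, centroids)
print(indices)

grads = [0.1] * len(weights)
print(update_centroids(centroids, indices, grads, lr=0.5))
```

Only the four centroid values and the per-weight indices need to be stored, which is the storage reduction the quoted passage relies on.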
However, El-Yaniv and Han do not teach wherein, when executed by the neural network accelerator, said neural network operates in one of the bitwidth modes that corresponds to a bitwidth of the neural network accelerator, and one of the sets of the batch normalization parameters that corresponds to the bitwidth of the neural network accelerator is used by the neural network accelerator.
On the other hand, Lee teaches wherein, when executed by the neural network accelerator, said neural network operates in one of the bitwidth modes that corresponds to a bitwidth of the neural network accelerator (Lee, [first page, left column, second paragraph, Fig. 13.3.1], “In this paper, we present a unified neural processing unit (UNPU) supporting CLs, RLs, and FCLs with fully-variable weight bit-precision from 1b to 16b. As shown in Fig. 13.3.1, the reuse of input features (IFs) is more efficient than the reuse of weights under low-weight bit-precision and the operations of CLs become identical to those of RLs and FCLs when the IFs of the CLs are vectorized into a 1-dimensional vector so that the hardware can be fully shared in the UNPU by IF reuse. Moreover, the lookup-table-based bit-serial PE (LBPE) is implemented for energy-optimal DNN operations with variable-weight bit-precisions from 1b to 16b through iterations of 1b weight operations. Furthermore, an aligned feature loader (AFL) minimizes the amount of off-chip memory accesses required to fetch IFs by exploiting the data locality among convolution operations.”),
and one of the sets of the batch normalization parameters that corresponds to the bitwidth of the neural network accelerator is used by the neural network accelerator (Lee, [first page, first paragraph, right column], “Figure 13.3.4 shows the architecture of the LBPE. The key idea of the LBPE is that partial-sums are repeatedly calculated during the weight bit-serial MAC operation. A LBPE consists of 4 PE clusters, adder trees to accumulate the results of each PE cluster, and shift-and-add logic for bit-serial multiplications. Each PE cluster contains 4 look-up-table (LUT) modules and a controller that determines whether the value from LUTs is added or subtracted. In the LUT module, a table with 8 entries is used, supporting 3-way MAC for multi-bit multiplication and 4-way MAC for 1b multiplication. The LUT is updated after IFs load into the AFLs, and IF values are reused for all output channels of the layer currently being processed. The 1b weight Psums are fetched from the LUT prepared in advance and accumulated for MAC operation. The LUT can fetch 12 Psums in parallel so that a total of 48×12 Psums (64×12 for 1b case) can be calculated simultaneously on a LBPE in 1 cycle. With the help of table-based operations, the LBPE improves energy efficiency more than conventional bit-serial PEs [4]. When IFs are reused 1024 times, the energy-consumption of LBPEs, including the LUT update, is reduced by 23.1%, 27.2%, 41.0%, and 53.6%, for the case of 16b, 8b, 4b, and 1b weight operations, respectively, compared with fixed-point MAC units under the same throughput conditions.” Examiner’s note: Fig. 13.3.4 shows that each batch normalization parameter corresponds to a specific bit-width value (bit-width mode).).
EL-YANIV, Han and Lee are analogous art because they are in the same field of endeavor, namely quantization of weight values in neural networks.
Accordingly, it would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of EL-YANIV and Han in view of Lee such that, when executed by the neural network accelerator, said neural network operates in one of the bitwidth modes that corresponds to a bitwidth of the neural network accelerator, and one of the sets of the batch normalization parameters that corresponds to the bitwidth of the neural network accelerator is used by the neural network accelerator. The modification would have been obvious because one of ordinary skill in the art would have been motivated to improve the performance of the training by using lower weight bit-precision to achieve better throughput and higher energy efficiency (Lee, [first page, first paragraph], “Although the PEs for RLs can be reconfigured into PEs for CLs or vice versa, only a partial reconfiguration was possible resulting in marginal performance improvement. Moreover, previous works [1-2] supported a limited set of weight bit precisions, such as either 4b or 8b or 16b. However, lower weight bit-precisions can achieve better throughput and higher energy efficiency, and the optimal bit-precision can be varied according to different accuracy/performance requirements. Therefore, a unified DNN accelerator with fully-variable weight bit-precision is required for the energy-optimal operation of DNNs within a mobile environment.”).
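For illustration only, the weight bit-serial multiply-accumulate with shift-and-add logic described in the quoted Lee passages can be sketched in software as follows. This is not Lee's look-up-table hardware; the function name `bit_serial_dot`, the two's complement weight encoding, and the sample values are assumptions of this sketch.

```python
def bit_serial_dot(features, weights, n_bits):
    """Dot product computed one weight bit-plane at a time (shift-and-add).

    Assumption: weights are signed integers that fit in n_bits two's
    complement, so the top bit carries a weight of -2**(n_bits - 1).
    """
    mask = (1 << n_bits) - 1
    acc = 0
    for b in range(n_bits):
        # 1-bit partial sum: add every feature whose weight has bit b set
        psum = sum(x for x, w in zip(features, weights)
                   if ((w & mask) >> b) & 1)
        if b == n_bits - 1:
            acc -= psum << b  # sign bit plane is subtracted
        else:
            acc += psum << b  # other bit planes are added
    return acc


features = [3, -1, 4, 2]
weights = [5, -3, -8, 6]
# The same routine runs at different weight bit-precisions
print(bit_serial_dot(features, weights, n_bits=4))
print(bit_serial_dot(features, weights, n_bits=8))
```

The number of loop iterations equals the weight bit-precision, which is why lower precisions need fewer 1-bit weight operations per result.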
Regarding claim 9, El-Yaniv as modified in view of Han teaches the computer program product of Claim 8, wherein said neural network is an N-bit neural network, where N is a positive integer, and each of the weights of said neural network is composed of N bits; wherein, for each of the bit width modes, the corresponding one of the different bit widths is smaller than or equal to N (Han, [page 3, section 3, second paragraph],  “Weight sharing is illustrated in Figure 3. Suppose we have a layer that has 4 input neurons and 4 output neurons, the weight is a 4 X 4 matrix. On the top left is the 4 X 4 weight matrix, and on the bottom left is the 4 X 4 gradient matrix. The weights are quantized to 4 bins (denoted with 4 colors), all the weights in the same bin share the same value, thus for each weight, we then need to store only a small index into a table of shared weights. During update, all the gradients are grouped by the color and summed together, multiplied by the learning rate and subtracted from the shared centroids from last iteration. For pruned AlexNet, we are able to quantize to 8-bits (256 shared weights) for each CONV layers, and 5-bits (32 shared weights) for each FC layer without any loss of accuracy.”);
EL-YANIV and Han are analogous art because they are in the same field of endeavor, namely quantization of weight values in neural networks.
Accordingly, it would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the neural network with a plurality of weights and batch normalization parameters taught by EL-YANIV in view of Han, such that the neural network is an N-bit neural network, where N is a positive integer, and each of the weights of said neural network is composed of N bits; wherein, for each of the bit width modes, the corresponding one of the different bit widths is smaller than or equal to N. The modification would have been obvious because one of ordinary skill in the art would have been motivated to improve the performance of the training by reducing the storage requirement of the neural network without affecting its accuracy (Han, [Section 1, page 2, the fifth paragraph], “Our goal is to reduce the storage and energy required to run inference on such large networks so they can be deployed on mobile devices. To achieve this goal, we present “deep compression”: a three-stage pipeline (Figure 1) to reduce the storage required by neural network in a manner that preserves the original accuracy. First, we prune the networking by removing the redundant connections, keeping only the most informative connections. Next, the weights are quantized so that multiple connections share the same weight, thus only the codebook (effective weights) and the indices need to be stored. Finally, we apply Huffman coding to take advantage of the biased distribution of effective weights.”).
However, El-Yaniv and Han do not teach wherein the neural network accelerator is an M-bit neural network accelerator of which the bitwidth is M, where M is a positive integer that is equal to one of the different bitwidths that respectively correspond to the bitwidth modes, and M<N; and wherein, the neural network is caused by the neural network accelerator to operate in said one of the bitwidth modes that corresponds to a bitwidth of M by narrowing, for some of the plurality of weights of the neural network, the weights from N bits to M bit(s), where for each of the some of the plurality of weights, the M bit(s) is (are) related to the most significant M bit(s) of the weight, and the neural network is executed by the neural network accelerator using one of the sets of the batch normalization parameters that corresponds to the bitwidth of M.
On the other hand, Lee teaches wherein the neural network accelerator is an M-bit neural network accelerator of which the bitwidth is M, where M is a positive integer that is equal to one of the different bitwidths that respectively correspond to the bitwidth modes, and M<N; and wherein, the neural network is caused by the neural network accelerator to operate in said one of the bitwidth modes that corresponds to a bitwidth of M by narrowing, for some of the plurality of weights of the neural network, the weights from N bits to M bit(s), where for each of the some of the plurality of weights, the M bit(s) is (are) related to the most significant M bit(s) of the weight, and the neural network is executed by the neural network accelerator using one of the sets of the batch normalization parameters that corresponds to the bitwidth of M (Lee, [first page, first paragraph, right column], “Figure 13.3.4 shows the architecture of the LBPE. The key idea of the LBPE is that partial-sums are repeatedly calculated during the weight bit-serial MAC operation. A LBPE consists of 4 PE clusters, adder trees to accumulate the results of each PE cluster, and shift-and-add logic for bit-serial multiplications. Each PE cluster contains 4 look-up-table (LUT) modules and a controller that determines whether the value from LUTs is added or subtracted. In the LUT module, a table with 8 entries is used, supporting 3-way MAC for multi-bit multiplication and 4-way MAC for 1b multiplication. The LUT is updated after IFs load into the AFLs, and IF values are reused for all output channels of the layer currently being processed. The 1b weight Psums are fetched from the LUT prepared in advance and accumulated for MAC operation. The LUT can fetch 12 Psums in parallel so that a total of 48×12 Psums (64×12 for 1b case) can be calculated simultaneously on a LBPE in 1 cycle. With the help of table-based operations, the LBPE improves energy efficiency more than conventional bit-serial PEs [4]. When IFs are reused 1024 times, the energy-consumption of LBPEs, including the LUT update, is reduced by 23.1%, 27.2%, 41.0%, and 53.6%, for the case of 16b, 8b, 4b, and 1b weight operations, respectively, compared with fixed-point MAC units under the same throughput conditions.” Examiner’s note: the neural network accelerator is obtained by reducing the bit width from 16 bits to 1 bit in order to improve the accuracy.).
EL-YANIV, Han and Lee are analogous art because they are in the same field of endeavor, namely quantization of weight values in neural networks.
Accordingly, it would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of EL-YANIV and Han in view of Lee such that the neural network accelerator is an M-bit neural network accelerator of which the bitwidth is M, where M is a positive integer that is equal to one of the different bitwidths that respectively correspond to the bitwidth modes, and M<N; and wherein, the neural network is caused by the neural network accelerator to operate in said one of the bitwidth modes that corresponds to a bitwidth of M by narrowing, for some of the plurality of weights of the neural network, the weights from N bits to M bit(s), where for each of the some of the plurality of weights, the M bit(s) is (are) related to the most significant M bit(s) of the weight, and the neural network is executed by the neural network accelerator using one of the sets of the batch normalization parameters that corresponds to the bitwidth of M. The modification would have been obvious because one of ordinary skill in the art would have been motivated to improve the performance of the training by using lower weight bit-precision to achieve better throughput and higher energy efficiency (Lee, [first page, first paragraph], “Although the PEs for RLs can be reconfigured into PEs for CLs or vice versa, only a partial reconfiguration was possible resulting in marginal performance improvement. Moreover, previous works [1-2] supported a limited set of weight bit precisions, such as either 4b or 8b or 16b. However, lower weight bit-precisions can achieve better throughput and higher energy efficiency, and the optimal bit-precision can be varied according to different accuracy/performance requirements. Therefore, a unified DNN accelerator with fully-variable weight bit-precision is required for the energy-optimal operation of DNNs within a mobile environment.”).
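For illustration only, the claim 9 limitation discussed above (narrowing weights from N bits to their most significant M bits and using the batch normalization set that corresponds to bitwidth M) can be sketched as follows. The function names, the unsigned weight encoding, simple truncation as the way the M bits are "related to" the most significant bits, and every value in `bn_sets` are assumptions of this sketch and are not taken from the application or the cited references.

```python
def narrow_weight(weight, n_bits, m_bits):
    """Keep the most significant m_bits of an unsigned n_bits weight code."""
    if not 0 < m_bits <= n_bits:
        raise ValueError("require 0 < m_bits <= n_bits")
    return weight >> (n_bits - m_bits)


# Hypothetical batch normalization sets keyed by bitwidth:
# (mean, variance, scale, shift)
bn_sets = {
    8: (0.00, 1.00, 1.0, 0.0),
    4: (0.02, 0.90, 1.1, 0.0),
    2: (0.05, 0.70, 1.3, 0.1),
}


def configure(weights, n_bits, accel_bits):
    """Narrow the weights to the accelerator bitwidth and pick its BN set."""
    narrowed = [narrow_weight(w, n_bits, accel_bits) for w in weights]
    return narrowed, bn_sets[accel_bits]


# N = 8 bit network run on an M = 4 bit accelerator
narrowed, bn = configure([0b10110101, 0b01001110], n_bits=8, accel_bits=4)
print([bin(w) for w in narrowed])
print(bn)
```

Keeping one parameter set per bitwidth lets the same stored network be run by accelerators of different bitwidths without retraining at load time.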
Regarding claim 11, El-Yaniv as modified in view of Han teaches the computer program product of Claim 9, wherein one of the N bits that corresponds to a bit order of i represents 2^i in decimal when having a first bit value, and represents -2^i in decimal when having a second bit value, where i is an integer, and (N-1)>=i>=0 (Han, [page 3, section 3, second paragraph], “Weight sharing is illustrated in Figure 3. Suppose we have a layer that has 4 input neurons and 4 output neurons, the weight is a 4 X 4 matrix. On the top left is the 4 X 4 weight matrix, and on the bottom left is the 4 X 4 gradient matrix. The weights are quantized to 4 bins (denoted with 4 colors), all the weights in the same bin share the same value, thus for each weight, we then need to store only a small index into a table of shared weights. During update, all the gradients are grouped by the color and summed together, multiplied by the learning rate and subtracted from the shared centroids from last iteration. For pruned AlexNet, we are able to quantize to 8-bits (256 shared weights) for each CONV layers, and 5-bits (32 shared weights) for each FC layer without any loss of accuracy.” Examiner’s note: the N bits (8 bits) correspond to the bit orders i of the CONV layer with 256 shared weights.).
EL-YANIV and Han are analogous art because they are in the same field of endeavor, namely quantization of weight values in neural networks.
Accordingly, it would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the neural network with a plurality of weights and batch normalization parameters taught by EL-YANIV in view of Han, such that one of the N bits that corresponds to a bit order of i represents 2^i in decimal when having a first bit value, and represents -2^i in decimal when having a second bit value, where i is an integer, and (N-1)>=i>=0. The modification would have been obvious because one of ordinary skill in the art would have been motivated to improve the performance of the training by reducing the storage requirement of the neural network without affecting its accuracy (Han, [Section 1, page 2, the fifth paragraph], “Our goal is to reduce the storage and energy required to run inference on such large networks so they can be deployed on mobile devices. To achieve this goal, we present “deep compression”: a three-stage pipeline (Figure 1) to reduce the storage required by neural network in a manner that preserves the original accuracy. First, we prune the networking by removing the redundant connections, keeping only the most informative connections. Next, the weights are quantized so that multiple connections share the same weight, thus only the codebook (effective weights) and the indices need to be stored. Finally, we apply Huffman coding to take advantage of the biased distribution of effective weights.”).
Regarding claim 12, El-yaniv teaches a computerized neural network system, comprising: a storage module storing the computer program product of Claim 8, and a neural network accelerator coupled to said storage module, and configured to execute the neural network code of the computer program product (El-yaniv, [Par.0019], “The system comprises a storage comprising a neural network model having a plurality of neurons each associated with a quantized activation function adapted to output a quantized activation value selected from a first finite set, the plurality of neurons are arranged in a plurality of layers and being connected by a plurality of connections each associated with a quantized connection weight function adapted to output a quantized connection weight value selected from a second finite set, at least one processor coupled to the storage for executing a code comprising”).
Regarding claim 13, El-Yaniv teaches the computerized neural network system of Claim 12, further comprising a server computer and a device remotely coupled to said server computer through a communication network, wherein said storage module is within said server computer, and said neural network accelerator is within said device and is remotely coupled to said storage module through the communication network (El-yaniv, [Par.0042], “The computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider)”).
Regarding claim 14, El-yaniv as modified in view of Han teaches a computerized system comprising a plurality of multipliers, and a plurality of adders coupled to said multipliers, said multipliers and said adders to cooperatively perform computation, wherein, for some data pieces each including multiple bits that respectively correspond to multiple bit orders and each being used in the computation of some of the multipliers (Han, [page 3, section 3, second paragraph] “Weight sharing is illustrated in Figure 3. Suppose we have a layer that has 4 input neurons and 4 output neurons, the weight is a 4 X 4 matrix. On the top left is the 4 X 4 weight matrix, and on the bottom left is the 4 X 4 gradient matrix. The weights are quantized to 4 bins (denoted with 4 colors), all the weights in the same bin share the same value, thus for each weight, we then need to store only a small index into a table of shared weights.”),
one of the bits that corresponds to the bit order of i represents 2^i in decimal when having a first bit value, and represents -2^i in decimal when having a second bit value, where N is a number of bits of the data piece, i is an integer, and (N-1)>= i>=0 (Han, [page 3, section 3, second paragraph] “During update, all the gradients are grouped by the color and summed together, multiplied by the learning rate and subtracted from the shared centroids from last iteration. For pruned AlexNet, we are able to quantize to 8-bits (256 shared weights) for each CONV layers, and 5-bits (32 shared weights) for each FC layer without any loss of accuracy.” Examiner’s note: the N bits (8 bits) correspond to the bit orders i of the CONV layer weights, which are quantized to 256 shared weights.).
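For illustration only, the weight sharing described in the quoted passages of Han can be sketched as follows. The sketch assigns each weight to its nearest entry in a table of shared weights and computes the index width needed to address that table. The function names and the example values are hypothetical, and nearest-entry assignment is used here as a simplification, not as the clustering procedure of the reference.

```python
def index_bits(num_shared_weights):
    """Number of bits needed to index a table of shared weights."""
    if num_shared_weights < 1:
        raise ValueError("the table must have at least one entry")
    return max(1, (num_shared_weights - 1).bit_length())


def assign_indices(weights, codebook):
    """Map each weight to the index of the nearest shared weight."""
    return [
        min(range(len(codebook)), key=lambda k: abs(w - codebook[k]))
        for w in weights
    ]


def reconstruct(indices, codebook):
    """Recover the effective weights from the stored indices."""
    return [codebook[k] for k in indices]


# Hypothetical table of 4 shared weights (4 bins) and 4 example weights.
codebook = [-1.0, 0.0, 1.5, 2.0]
indices = assign_indices([2.09, -0.98, 1.48, 0.09], codebook)
```

With 256 shared weights the index width is 8 bits, and with 32 shared weights it is 5 bits, which agrees with the figures quoted from Han.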
EL-YANIV and Han are analogous art because they are in the same field of endeavor, namely quantization of weight values in a neural network.
Accordingly, it would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the neural network with a plurality of weights and batch normalization parameters taught by EL-YANIV’s method, in view of Han, by having a computerized system comprising a plurality of multipliers, and a plurality of adders coupled to said multipliers, said multipliers and said adders to cooperatively perform computation, wherein, for some data pieces each including multiple bits that respectively correspond to multiple bit orders and each being used in the computation of some of the multipliers, one of the bits that corresponds to the bit order of i represents 2^i in decimal when having a first bit value, and represents -2^i in decimal when having a second bit value, where N is a number of bits of the data piece, i is an integer, and (N-1)>= i>=0. The modification would have been obvious because one of ordinary skill in the art would have been motivated to improve the performance of the training by reducing the storage requirement of the neural network without affecting its accuracy (Han, [Section 1, page 2, the fifth paragraph], “Our goal is to reduce the storage and energy required to run inference on such large networks so they can be deployed on mobile devices. To achieve this goal, we present “deep compression”: a three-stage pipeline (Figure 1) to reduce the storage required by neural network in a manner that preserves the original accuracy. First, we prune the networking by removing the redundant connections, keeping only the most informative connections. Next, the weights are quantized so that multiple connections share the same weight, thus only the codebook (effective weights) and the indices need to be stored. Finally, we apply Huffman coding to take advantage of the biased distribution of effective weights.”).
Claim(s) 10 is/are rejected under 35 U.S.C. 103 as being unpatentable over EL-YANIV et al. (Patent No. US20170286830– hereinafter, EL-YANIV)  in view of Han et al. (DEEP COMPRESSION: COMPRESSING DEEP NEURAL NETWORKS WITH PRUNING, TRAINED QUANTIZATION AND HUFFMAN CODING- Stanford University, Stanford, CA 94305, USA -hereinafter, Han) and further in view of Lee et al. (UNPU: A 50.6TOPS/W Unified Deep Neural Network Accelerator with 1b-to-16b Fully-Variable Weight Bit-Precision- KAIST, Daejeon, Korea - hereinafter, Lee) and further in view of Balakrishnan et al (Pub. No.: 20180019732-hereinafter, Balakrishnan). 
Regarding claim 10, El-yaniv as modified in view of Han, Lee and Balakrishnan teaches the computer program product of Claim 9, wherein the weight is narrowed from the N bits to the M bit (s) by directly truncating the least significant (N-M) bit (s) of the weight (Balakrishnan, [Par.0030], “Second, the weight h is truncated to a weight h.sub.HM including the high and medium bits and where the least significant bit of h.sub.HM is rounded. That is, if the most significant bit of h.sub.L is one, a one is added in the least significant bit position of h.sub.HM. Using the above example weight, h is 010100011001, the bottom four bits are removed, which truncates to 01010001. Because the most significant bit of h.sub.L is 1, rounding adds a one in the least significant bit position to yield 01010010. This is expressed mathematically in Equation 2: h.sub.HM=round(h,4) (2)”).
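For illustration only, the difference between directly truncating the least significant (N-M) bits and the truncate-then-round operation in the quoted passage of Balakrishnan can be sketched as follows, using the 12-bit example weight from Par. 0030. The function names are hypothetical, and the weight is assumed to be an unsigned integer.

```python
def truncate_lsbs(weight, n_bits, m_bits):
    """Narrow an N-bit weight to M bits by dropping the least
    significant (N - M) bits, with no rounding."""
    if not 0 < m_bits <= n_bits:
        raise ValueError("require 0 < M <= N")
    return weight >> (n_bits - m_bits)


def round_lsbs(weight, n_bits, m_bits):
    """Narrow an N-bit weight to M bits, adding one in the least
    significant kept position when the top dropped bit is 1.
    The result can need M + 1 bits if every kept bit is already 1."""
    if not 0 < m_bits <= n_bits:
        raise ValueError("require 0 < M <= N")
    drop = n_bits - m_bits
    if drop == 0:
        return weight
    # Adding half of the dropped range carries into the kept bits
    # exactly when the most significant dropped bit is set.
    return (weight + (1 << (drop - 1))) >> drop


h = 0b010100011001                 # example weight from Par. 0030
truncated = truncate_lsbs(h, 12, 8)  # 0b01010001
rounded = round_lsbs(h, 12, 8)       # 0b01010010
```

The truncated value matches the intermediate result 01010001 in the quoted passage, and the rounded value matches the final result 01010010.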
EL-YANIV, Han, Lee and Balakrishnan are analogous art because they are in the same field of endeavor, namely quantization of weight values in a neural network.
Accordingly, it would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of EL-YANIV and Han, further in view of Lee and Balakrishnan, by having the weight narrowed from the N bits to the M bit (s) by directly truncating the least significant (N-M) bit (s) of the weight. The modification would have been obvious because one of ordinary skill in the art would have been motivated to improve the performance of the training by avoiding the addition of significant noise to the final output (Balakrishnan, [Par.0035], “Therefore, the output of dithered quantizer 602 is the 9 most significant bits of ADC output x(n) 601. Mathematics, simulations and/or experimentation determine the allowable level of truncation that avoids the addition of significant noise to the final output.”).
Conclusion
The prior art made of record and not relied upon, which is considered pertinent to applicant’s disclosure, is provided below.
Mellempudi et al. (NPL: Ternary Neural Networks with Fine-Grained Quantization -hereinafter, Mellempudi) teaches the effect of training the batch normalization parameters with quantized weight values.
Su et al. (NPL: Redundancy-reduced MobileNet Acceleration on Reconfigurable Logic For ImageNet Classification- hereinafter, Su) teaches improving the power efficiency during the training of the neural network by using a low bit precision.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to EM N TRIEU whose telephone number is (571)272-5747.  The examiner can normally be reached on 7:30 - 5:00 M-TH.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Omar Fernandez Rivas can be reached on (571) 272-2589. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/E.T./Examiner, Art Unit 2128  

/OMAR F FERNANDEZ RIVAS/Supervisory Patent Examiner, Art Unit 2128