DETAILED ACTION
This is the first office action regarding application number 17/551,572, filed December 15, 2021.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

Claim Objections
Claims 1, 8, 10, and 16 are objected to because of the following informalities:
Claims 1 and 10: The conjunction word “that” is missing in the following limitation: “… to data values corresponding to the set of activations that are adjusted based on an activation weighted entropy …”. Appropriate correction is required.
Claims 8 and 16: The conjunction word “that” is missing from the following limitation: “… to data values corresponding to the set of weights that are adjusted based on a weight weighted entropy …”. Appropriate correction is required.

Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.
Claims 1-16 are rejected on the ground of nonstatutory double patenting as being unpatentable over Claims 1-2, 5-12, and 15-18 of U.S. Patent No. 11,250,320 (Lee, Junhaeng; Yoo, Sungjoo; and Park, Eunhyeok). The bolded text in the comparison between the instant application and the issued patent indicates a merged claim limitation that expresses the same scope in both. Although the claims at issue are not identical, they are not patentably distinct from each other because the issued patent discloses all of the features and limitations in the instant application, thereby making the claims in the instant application obvious over the issued patent. While certain terms in the instant application are now replaced with synonyms as part of applicant’s amended claims, under their broadest reasonable interpretation these synonyms still convey the same context and scope as the corresponding claims from the issued patent.
Instant Claim (application #17/551,572)
Patent Claim (U.S. Patent 11,250,320 / application #15/880,690)
Applicant: Samsung Electronics Co., Ltd., Seoul National University R&DB Foundation
Applicant: Samsung Electronics Co., Ltd., Seoul National University R&DB Foundation
Inventors: Junhaeng Lee, Sungjoo Yoo, Eunhyeok Park
Inventors: Junhaeng Lee, Sungjoo Yoo, Eunhyeok Park
Filed: December 15, 2021
Filed: January 26, 2018


Claim 1
Claim 1
A processor-implemented neural network method, the method comprising:
A processor-implemented neural network method, the method comprising:
obtaining a set of floating point data processed in a layer included in a neural network;
obtaining a set of floating point data processed in a layer included in a neural network;
determining a weighted entropy based on data values included in the set of floating point data;
determining a weighted entropy based on data values included in the set of floating point data;
adjusting quantization levels assigned to the data values based on the weighted entropy;
adjusting quantization levels assigned to the data values based on the weighted entropy;
quantizing the data values included in the set of floating point data in accordance with the adjusted quantization levels;
quantizing the data values included in the set of floating point data in accordance with the adjusted quantization levels;
implementing the neural network using the quantized data values and based on input data provided to the neural network; and
implementing the neural network using the quantized data values and based on input data provided to the neural network; and
indicating a result of the implementation,
indicating a result of the implementation,

wherein, the set of floating point data includes a set of weights, and 

the determining of the weighted entropy comprises:

grouping the set of weights into a plurality of clusters;

determining respective relative frequencies for each of the grouped clusters by dividing a total number of weights included in each of the respective grouped clusters by a total number of weights included in the set of weights;

determining respective representative importances of each of the grouped clusters based on sizes of weights included in each of the grouped clusters; and

determining the weighted entropy based on the respective relative frequencies and the respective representative importances,

wherein the respective representative importances are average values of importances corresponding to the weights included in each of the grouped clusters, each of the importances being quadratically proportional to the size of corresponding weight, and
wherein, the set of floating point data includes a set of activations, activation quantization levels assigned, using an entropy-based logarithm data representation-based quantization method, to data values corresponding to the set of activations that are adjusted based on an activation weighted entropy, and the data values corresponding to the set of activations are quantized in accordance with the adjusted activation quantization levels.
wherein, the set of floating point data includes a set of activations, activation quantization levels assigned, using an entropy-based logarithm data representation-based quantization method, to data values corresponding to the set of activations are adjusted based on an activation weighted entropy, and the data values corresponding to the set of activations are quantized in accordance with the adjusted activation quantization levels.


Claim 2
Claim 2
The method of claim 1,
The method of claim 1,
wherein the determining of the weighted entropy includes applying a weighting factor based on determined sizes of the data values to a determined distribution of the data values included in the set of floating point data.
wherein the determining of the weighted entropy includes applying a weighting factor based on determined sizes of the data values to a determined distribution of the data values included in the set of floating point data.


Claim 3
Claim 5
The method of claim 1, wherein the determining of the activation weighted entropy comprises:
The method of claim 1, wherein the determining of the activation weighted entropy comprises:
determining respective relative activation frequencies for each of the activation quantization levels by dividing a total number of activations included in each of the respective activation quantization levels by a total number of activations included in the set of activations;
determining respective relative activation frequencies for each of the activation quantization levels by dividing a total number of activations included in each of the respective activation quantization levels by a total number of activations included in the set of activations;
determining respective activation data values corresponding to each of the activation quantization levels as respective representative activation importances of each of the activation quantization levels; and
determining respective activation data values corresponding to each of the activation quantization levels as respective representative activation importances of each of the activation quantization levels; and
determining the activation weighted entropy based on the respective relative activation frequencies and the respective representative activation importances.
determining the activation weighted entropy based on the respective relative activation frequencies and the respective representative activation importances.


Claim 4
Claim 6
The method of claim 3,
The method of claim 5,
wherein the adjusting of the activation quantization levels comprises adjusting the activation quantization levels assigned to the respective activation data values by adjusting a value corresponding to a first activation quantization level among the activation quantization levels and a size of an interval between the activation quantization levels in a direction of increasing the activation weighted entropy.
wherein the adjusting of the activation quantization levels comprises adjusting the activation quantization levels assigned to the respective activation data values by adjusting a value corresponding to a first activation quantization level among the activation quantization levels and a size of an interval between the activation quantization levels in a direction of increasing the activation weighted entropy.


Claim 5
Claim 7
The method of claim 3,
The method of claim 5,
wherein the adjusting of the activation quantization levels comprises adjusting a log base, which is controlling of the activation quantization levels, in a direction that maximizes the activation weighted entropy.
wherein the adjusting of the activation quantization levels comprises adjusting a log base, which is controlling of the activation quantization levels, in a direction that maximizes the activation weighted entropy.


Claim 6
Claim 8
The method of claim 1,
The method of claim 1,


By virtue of dependency, Claim 8 includes all limitations from independent Claim 1; see above Claim 1 mapping.
wherein, the obtaining the set of floating point data, the determining of the weighted entropy, the adjusting of the quantization levels, and the quantizing of the data values included in the set of floating point data are performed with respect to each of a plurality of layers included in the neural network, with respective adjusted quantization levels being optimized and assigned for each of the plurality of layers.



wherein, the obtaining, determining, adjusting, and quantizing are performed with respect to each of a plurality of layers included in the neural network, with respective adjusted quantization levels being optimized and assigned for each of the plurality of layers.



	
According to independent claim 1, the “obtaining” step is in reference to the limitation “obtaining a set of floating point data …”; the “determining” step is in reference to the limitation “determining a weighted entropy …”; the “adjusting” step is in reference to the limitation “adjusting quantization levels …”; and the “quantizing” step is in reference to the limitation “quantizing the data values included in the set of floating point data …”.


Claim 7
Claim 9
The method of claim 1,
The method of claim 1,
wherein the implementing of the neural network comprises training the neural network based on the quantized data values.
wherein the implementing of the neural network comprises training the neural network based on the quantized data values.


Claim 8
Claim 1
The method of claim 1, 


By virtue of dependency, Claim 8 includes all limitations from Claim 1; refer to above Claim 1 mapping.
A processor-implemented neural network method, the method comprising:

… determining a weighted entropy based on data values included in the set of floating point data;

… adjusting quantization levels assigned to the data values based on the weighted entropy;

… quantizing the data values included in the set of floating point data in accordance with the adjusted quantization levels;
wherein the set of floating point data includes a set of weights,
… wherein, the set of floating point data includes a set of weights, and 

the determining of the weighted entropy comprises:

grouping the set of weights into a plurality of clusters;

determining respective relative frequencies for each of the grouped clusters by dividing a total number of weights included in each of the respective grouped clusters by a total number of weights included in the set of weights;

determining respective representative importances of each of the grouped clusters based on sizes of weights included in each of the grouped clusters; and

determining the weighted entropy based on the respective relative frequencies and the respective representative importances,

wherein the respective representative importances are average values of importances corresponding to the weights included in each of the grouped clusters, each of the importances being quadratically proportional to the size of corresponding weight, …
weight quantization levels assigned, using an entropy-based clustering-based quantization method, to data values corresponding to the set of weights that are adjusted based on a weight weighted entropy, and the data values corresponding to the set of weights are quantized in accordance with the adjusted activation quantization levels.


Under its broadest reasonable interpretation, the term “quantization” broadly indicates a process or series of steps that restricts a set of input data values to a representation identified by a discrete set of values, and hence this limitation broadly recites identifying, assigning, and adjusting a set of weights into a discrete set of floating-point weight values using a weight entropy-based clustering-based quantization method. The corresponding limitations in the issued patent recite determining a weighted entropy on a set of floating point data, adjusting quantization levels, and quantizing the associated data values corresponding to a set of floating-point weight values, using a clustering method that groups the set of weights into a plurality of clusters, where each cluster is represented by an average value of the weights grouped in that cluster (hence representing a set of quantized levels for each group of weights). A person having ordinary skill in the art would understand that this clustering method of applying weighted entropy to identify groups of weights represented by average values of the weights in each group broadly recites an entropy-based clustering-based quantization method, and hence this recited limitation from the instant application is functionally equivalent in scope to the highlighted limitations recited in the issued patent.



Claim 9
Claim 10
A non-transitory computer-readable medium storing instructions, 
A non-transitory computer-readable medium storing instructions, 
which when executed by a processor, cause the processor to implement the method of claim 1.
which when executed by a processor, cause the processor to implement the method of claim 1.


Claim 10
Claim 11
A neural network apparatus, the apparatus comprising:
A neural network apparatus, the apparatus comprising:
a processor configured to:
a processor configured to:
obtain a set of floating point data processed in a layer included in a neural network;
obtain a set of floating point data processed in a layer included in a neural network;
determine a weighted entropy based on data values included in the set of floating point data;
determine a weighted entropy based on data values included in the set of floating point data;
adjust quantization levels assigned to the data values based on the weighted entropy;
adjust quantization levels assigned to the data values based on the weighted entropy;
quantize the data values included in the set of floating point data in accordance with the adjusted quantization levels;
quantize the data values included in the set of floating point data in accordance with the adjusted quantization levels;
implement the neural network using the quantized data values and based on input data provided to the neural network; and
implement the neural network using the quantized data values and based on input data provided to the neural network; and
indicate a result of the implementation,
indicate a result of the implementation,

wherein, the set of floating point data includes a set of weights, and 

the processor is further configured to:

group the set of weights into a plurality of clusters;

determine respective relative frequencies for each of the grouped clusters by dividing a total number of weights included in each of the respective grouped clusters by a total number of weights included in the set of weights;

determine respective representative importances of each of the grouped clusters based on sizes of weights included in each of the grouped clusters; and

determine the weighted entropy based on the respective relative frequencies and the respective representative importances,

wherein the respective representative importances are average values of importances corresponding to the weights included in each of the grouped clusters, each of the importances being quadratically proportional to the size of corresponding weight, and
wherein, the set of floating point data includes a set of activations, activation quantization levels assigned, using an entropy-based logarithm data representation-based quantization method, to data values corresponding to the set of activations that are adjusted based on an activation weighted entropy, and the data values corresponding to the set of activations are quantized in accordance with the adjusted activation quantization levels.
wherein, the set of floating point data includes a set of activations, activation quantization levels assigned, using an entropy-based logarithm data representation-based quantization method, to data values corresponding to the set of activations are adjusted based on an activation weighted entropy, and the data values corresponding to the set of activations are quantized in accordance with the adjusted activation quantization levels.


Claim 11
Claim 12
The apparatus of claim 10,
The apparatus of claim 11,
wherein the determining of the weighted entropy includes applying a weighting factor based on determined sizes of the data values to a determined distribution of the data values included in the set of floating point data.
wherein the determining of the weighted entropy includes applying a weighting factor based on determined sizes of the data values to a determined distribution of the data values included in the set of floating point data.


Claim 12
Claim 15
The apparatus of claim 10, wherein, for the determining of the activation weighted entropy, the processor is configured to:
The apparatus of claim 11, wherein, for the determining of the activation weighted entropy, the processor is further configured to:
determine respective relative activation frequencies for each of the activation quantization levels by dividing a total number of activations included in each of the respective activation quantization levels by a total number of activations included in the set of activations;
determine respective relative activation frequencies for each of the activation quantization levels by dividing a total number of activations included in each of the respective activation quantization levels by a total number of activations included in the set of activations;
determine respective activation data values corresponding to each of the activation quantization levels as respective representative activation importances of each of the activation quantization levels; and
determine respective activation data values corresponding to each of the activation quantization levels as respective representative activation importances of each of the activation quantization levels; and
determine the activation weighted entropy based on the respective relative activation frequencies and the respective representative activation importances.
determine the activation weighted entropy based on the respective relative activation frequencies and the respective representative activation importances.


Claim 13
Claim 16
The apparatus of claim 12,
The apparatus of claim 15,
wherein the processor is configured to adjust the activation quantization levels assigned to the respective activation data values by adjusting a value corresponding to a first activation quantization level among the activation quantization levels and a size of an interval between the activation quantization levels in a direction of increasing the activation weighted entropy.
wherein the processor is configured to adjust the activation quantization levels assigned to the respective activation data values by adjusting a value corresponding to a first activation quantization level among the activation quantization levels and a size of an interval between the activation quantization levels in a direction of increasing the activation weighted entropy.


Claim 14
Claim 17
The apparatus of claim 12, 
The apparatus of claim 15,
wherein the processor is configured to adjust the activation quantization levels by adjusting a log base, which is controlling of the activation quantization levels, in a direction that maximizes the activation weighted entropy.
wherein the processor is configured to adjust the activation quantization levels by adjusting a log base, which is controlling of the activation quantization levels, in a direction that maximizes the activation weighted entropy.


Claim 15
Claim 18
The apparatus of claim 10,
The apparatus of claim 11,


By virtue of dependency, Claim 18 includes all limitations from independent Claim 11; see above Claim 11 mapping.
wherein the processor is further configured to perform the obtaining the set of floating point data, the determining of the weighted entropy, the adjusting of the quantization levels, and the quantizing of the data values included in the set of floating point data with respect to each of a plurality of layers included in the neural network, with respective adjusted quantization levels being optimized and assigned for each of the plurality of layers.
wherein the processor is further configured to perform the obtaining, determining, adjusting, and quantizing with respect to each of a plurality of layers included in the neural network, with respective adjusted quantization levels being optimized and assigned for each of the plurality of layers.




According to independent claim 11, the “obtaining” step is in reference to the limitation “obtaining a set of floating point data …”; the “determining” step is in reference to the limitation “determining a weighted entropy …”; the “adjusting” step is in reference to the limitation “adjusting quantization levels …”; and the “quantizing” step is in reference to the limitation “quantizing the data values included in the set of floating point data …”.


Claim 16
Claim 1
The apparatus of claim 10, 



By virtue of dependency, Claim 16 includes all limitations from Claim 10; refer to above Claim 10 mapping.
A neural network apparatus, the apparatus comprising:

… obtain a set of floating point data processed in a layer included in a neural network; …

… determine a weighted entropy based on data values included in the set of floating point data; …

… adjust quantization levels assigned to the data values based on the weighted entropy; …

… quantize the data values included in the set of floating point data in accordance with the adjusted quantization levels;
wherein the set of floating point data includes a set of weights,
… wherein, the set of floating point data includes a set of weights, and

the processor is further configured to:

group the set of weights into a plurality of clusters;

determine respective relative frequencies for each of the grouped clusters by dividing a total number of weights included in each of the respective grouped clusters by a total number of weights included in the set of weights;

determine respective representative importances of each of the grouped clusters based on sizes of weights included in each of the grouped clusters; and

determine the weighted entropy based on the respective relative frequencies and the respective representative importances,

wherein the respective representative importances are average values of importances corresponding to the weights included in each of the grouped clusters, each of the importances being quadratically proportional to the size of corresponding weight, …
weight quantization levels assigned, using an entropy-based clustering-based quantization method, to data values corresponding to the set of weights that are adjusted based on a weight weighted entropy, and the data values corresponding to the set of weights are quantized in accordance with the adjusted activation quantization levels.


Under its broadest reasonable interpretation, the term “quantization” broadly indicates a process or series of steps that restricts a set of input data values to a representation identified by a discrete set of values, and hence this limitation broadly recites identifying, assigning, and adjusting a set of weights into a discrete set of floating-point weight values using a weight entropy-based clustering-based quantization method. The corresponding limitations in the issued patent recite determining a weighted entropy on a set of floating point data, adjusting quantization levels, and quantizing the associated data values corresponding to a set of floating-point weight values, using a clustering method that groups the set of weights into a plurality of clusters, where each cluster is represented by an average value of the weights grouped in that cluster (hence representing a set of quantized levels for each group of weights). A person having ordinary skill in the art would understand that this clustering method of applying weighted entropy to identify groups of weights represented by average values of the weights in each group broadly recites an entropy-based clustering-based quantization method, and hence this recited limitation from the instant application is functionally equivalent in scope to the highlighted limitations recited in the issued patent.
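
For illustration only, the weighted-entropy determination recited in the charted patent claims (grouping weights into clusters, taking relative frequencies, and averaging importances that are quadratic in weight size) might be sketched as follows. This sketch is not taken from the application or the issued patent. The function name `weighted_entropy`, the equal-size clustering by magnitude, and the combining formula S = -Σ I_n · P_n · log(P_n) are all assumptions; only the relative-frequency ratio and the quadratic importance follow the claim language.

```python
import math


def weighted_entropy(weights, num_clusters):
    """Illustrative weighted entropy of a set of weights (assumed formula)."""
    if not weights or num_clusters < 1:
        raise ValueError("need at least one weight and one cluster")

    # Assumption: clusters are formed by sorting on magnitude and splitting
    # the sorted list into roughly equal-sized contiguous groups.
    ordered = sorted(weights, key=abs)
    total = len(ordered)
    bounds = [round(i * total / num_clusters) for i in range(num_clusters + 1)]

    entropy = 0.0
    for lo, hi in zip(bounds, bounds[1:]):
        cluster = ordered[lo:hi]
        if not cluster:
            continue
        # Relative frequency: weights in this cluster / weights in the set.
        p = len(cluster) / total
        # Representative importance: average of per-weight importances,
        # each importance being quadratic in the weight's size.
        importance = sum(w * w for w in cluster) / len(cluster)
        # Assumption: weighted entropy S = -sum(I_n * P_n * log(P_n)).
        entropy -= importance * p * math.log(p)
    return entropy
```

Under these assumptions, a single cluster always yields zero, because its relative frequency is 1 and log(1) = 0; splitting the weights into more clusters can only add non-negative terms.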



Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claims 8 and 16 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement.
The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.
Regarding Claims 8 and 16,
Both claims recite the following limitation: “… the data values corresponding to the set of weights are quantized in accordance with the adjusted activation quantization levels”, but the specification fails to disclose any method or series of steps in which the data values corresponding to the set of weights are quantized in accordance with adjusted activation quantization levels. Examiner points out that Applicant’s specification paragraphs [0075]-[0089] broadly describe a method (described as a clustering-based quantization method, [0087]-[0089]) of quantizing the set of weights through a grouping of weights into a plurality of clusters, in accordance with a respective size ([0075]: “When the set of floating point data is the set of weights, the neural network apparatus may group the set of weights into a plurality of clusters. When it is necessary to classify the weights into N quantization levels, the neural network apparatus may classify each of the weights in accordance with a respective size and map each of the weights into one of N clusters. For example, the neural network apparatus may group the set of weights into N clusters $C_0, \ldots, C_{N-1}$.”). This respective size is further applied to determine a respective importance for each of the grouped clusters, where this respective importance is used to determine a representative weight that is used for quantization ([0080]: “… the neural network apparatus may determine a representative importance of each of the grouped clusters based on the sizes of the respective weights included in each of the grouped clusters … may be a mathematical representation of respective effects of the weights of the grouped cluster on the final output.”; and [0085]: “The neural network apparatus may determine respective weights corresponding to the respective importance of each of the grouped clusters, e.g., as respective representative weights of each of the grouped clusters, and quantize each of the weights included in each of the grouped clusters into the corresponding representative weight of each of the grouped clusters.”). Hence, rather than applying adjusted activation quantization levels to quantize data values corresponding to the set of weights, Applicant’s specification explicitly indicates that the quantizing of these weight data values is in accordance with a respective size determined for each of the clusters, and thus Applicant’s specification fails to disclose any method or series of steps in which the quantizing of a set of weights is based on an adjusted activation quantization level. The specification must describe and support the claims such that the public is informed of the boundaries of what constitutes infringement of the patent, and such that it can be determined whether the claimed invention meets all the criteria for patentability by distinctly claiming the subject matter which the inventor regards as the invention. See MPEP 2163. Given that there is no basis or support for this limitation, this newly introduced claim limitation fails to comply with the written description requirement.
For the purposes of examination, this limitation will be interpreted according to what is supported in the Applicant’s specification, where the data values corresponding to the set of weights are quantized in accordance to a respective size (“… the data values corresponding to the set of weights are quantized in accordance with a respective size”).

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1-16 are rejected under 35 U.S.C. 103 as being unpatentable over Hwang et al., Fixed-point feedforward deep neural network design using weights +1, 0, and -1, 2014 IEEE Workshop on Signal Processing Systems (SiPS), October 20-22 2014 [hereafter referred to as Hwang] in view of Guiasu, Silviu, Grouping Data by Using the Weighted Entropy, Journal of Statistical Planning and Inference 15 (1986) Elsevier Science Publishers B.V., 1986 [hereafter referred to as Guiasu], in further view of Miyashita et al., Convolutional Neural Networks using Logarithmic Data Representation, arXiv:1603.01025v2, March 17 2016 [hereafter referred to as Miyashita].
Regarding Claim 1, 
Hwang teaches
A processor-implemented neural network method (Examiner’s note: Hwang teaches implementing deep neural networks using hardware (VLSI) and software running on embedded computing systems, including quantizing processes that reduce the word-length of weights and activation signals in deep neural networks. A person having ordinary skill in the art would understand that an embedded computing system implementing a neural network using hardware interconnections and arithmetic units, and associated software to perform the training procedure would have a processor and associated memory storing executable instructions to perform the quantized processes described in Hwang (Hwang p.1 Section I. Introduction: “Implementation of deep neural networks using VLSI or embedded computing systems is needed for real-time and low-power applications …”; and p.5 Section V. Concluding Remarks: “… We have developed a training procedure to reduce the word-length of weights and that of signals in deep neural networks. … The signal word-length that affects the complexity of interconnection and arithmetic units can also be reduced to 3 bits without sacrificing the performance much … This research is useful for not only hardware based implementations but also real-time software development.”).), the method comprising: 
obtaining a set of floating point data processed in a layer included in a neural network (Examiner’s note: Hwang teaches quantizing a feedforward deep neural network with multiple hidden layers using 3-bit fixed-point ternary weights, where the floating-point values for the extracted weights and activation signals at each layer are being quantized at each neural network layer. Hwang further teaches extracting from each layer k a signal vector yk and weight matrix Wk to determine the corresponding next layer signal vector and weight matrix, where the signal vector of the current layer is based on applying the bias, weight matrix of the current layer, and the activation function. Hence, the identification of these signal vectors and weight matrices at each layer of a floating-point based neural network corresponds to a process of obtaining a set of floating-point data from each layer in a neural network (Hwang p.1 Figure 1 and Abstract: “Feedforward deep neural networks that employ multiple hidden layers show high performance in many applications, but they demand complex hardware for implementation. The hardware complexity can be much lowered by minimizing the word-length of weights and signals … The designed fixed-point networks with ternary weights (+1, 0, and -1) and 3-bit signal show only negligible performance loss when compared to the floating-point counterparts.”; p.1 Section I. Introduction: “… In a general feedforward deep neural network with multiple hidden layers as depicted in Fig. 1, each layer k has a signal vector yk, which is propagated to the next layer by multiplying the weight matrix Wk+1, adding biases bk+1, and applying the activation function φk+1 (·) as follows: yk+1 = φk+1 (Wk+1 yk + bk+1). … each weight matrix between two layers demands N1 ×N2 weights, where N1 and N2 are the number of units for the anterior layer and the posterior layer, respectively …The number of output signals and that of biases are both N2. 
… we propose a high performance fixed-point optimization method that can greatly reduce the word-length of weights and signals for implementing DNNs. The proposed scheme allows design of DNNs for real-world problems only with ternary (+1, 0, and -1) weights and 2 or 3 bits of fixed-point signals.”; p.2 Section II. Direct Quantization with Exhaustive Search: “… The floating-point weights are obtained by employing unsupervised greedy learning with restricted Boltzmann machines (RBMs) as pre-training … 1) Prepare a fully trained floating-point weights. …”; p.3 col.1 1st-3rd paragraphs (Section II): “… we maintain both the high-precision and low-precision weights and signals … we also store high-precision weights for adaptation … The low-precision weights are obtained by quantizing the high-precision weights …”; p.4 2nd paragraph (Section IV.A.): “… The original miss classification rate for the test set was … with floating-point arithmetic …”; and p.5 Section V. Concluding Remarks: “… We find that the performance gap between the floating-point and fixed-point networks shrinks as the number of units in each layer increases …”).); 
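For illustration only, the propagation rule quoted above from Hwang, yk+1 = φk+1 (Wk+1 yk + bk+1), can be sketched in a few lines of Python. The sigmoid activation, the function names, and the example values are assumptions chosen for this sketch and are not taken from the reference.

```python
import math

def sigmoid(x):
    # Logistic activation, assumed here as the activation function.
    return 1.0 / (1.0 + math.exp(-x))

def propagate(W, y, b):
    # W has N2 rows and N1 columns, y has N1 entries, b has N2 entries.
    # Returns the N2 output signals of the posterior layer.
    return [sigmoid(sum(w_ij * y_j for w_ij, y_j in zip(row, y)) + b_i)
            for row, b_i in zip(W, b)]

# Hypothetical example with ternary weights (+1, 0, -1): N1 = 3, N2 = 2.
W = [[1.0, -1.0, 0.0], [0.0, 1.0, 1.0]]
y = [0.5, 0.5, 0.0]
b = [0.0, -0.5]
y_next = propagate(W, y, b)
```

In this sketch the weight matrix holds N1 × N2 entries and the layer produces N2 output signals, matching the counts given in the quoted passage.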
determining a … entropy based on data values included in the set of floating point data (Examiner’s note: Under its broadest reasonable interpretation, the term “entropy” broadly indicates a measure of information, and hence this limitation broadly recites determining an amount of information present in a set of floating point data. Hwang teaches identifying and determining an initial grouping for weights and activation signals based on their complexity, range, and quantization sensitivity, where the complexity, range, and quantization sensitivity are interpreted as measurements of information content (“entropy”) present in the data, and the initial grouping represents the number of quantization points for the set of weights and activation signals. Hence this process that determines an initial set of groupings for the set of weights and activation signals based on measurements such as complexity, range, and quantization sensitivity corresponds to a process for determining an entropy based on data values included in the set of floating point data (Hwang p.2 Section II. Direct Quantization with Exhaustive Search: “… A deep neural network usually contains millions of weights and thousands of internal signals. Since applying a different data format for each weight or signal is too complex, it is needed to group them according to their range and the quantization sensitivity [13]. In a deep neural network with several layers, it is convenient to separate each layer for the grouping. Among the weights in each layer, we notice that the biases need to have high precision because their range is usually much larger than that of other weights. Assigning a high precision fixed-point format, such as 8 bits, to the biases does not increase the hardware complexity much because the number of them is small. 
The quantization sensitivity can also be determined from simulations that apply quantized weights for a specific group while using the floating-point data type for other groups [13]. We found that the quantization sensitivity of signals in the hidden layers is mostly the same and very low, but that in the input of the network depends on applications very much. … Once the number of quantization points for each weight matrix is given, the goal is to minimize the output error of the network …”).); 
adjusting quantization levels assigned to the data values based on the … entropy (Examiner’s note: Under its broadest reasonable interpretation, the term “quantization” broadly indicates a process or series of steps that restrict a set of input data values into a representation identified by a discrete set of values, and hence this limitation broadly recites adjusting a discrete set of values representing quantization levels based on the identified entropy. As indicated earlier, Hwang teaches identifying and determining an initial grouping for weights and activation signals based on their complexity, range, and quantization sensitivity, where the complexity, range, and quantization sensitivity are interpreted as measurements of information content (“entropy”) present in the data. Hwang further teaches performing a greedy algorithm to determine the optimum quantization step size by performing and testing iterative adjustments around the initial step size to minimize the L2-based output error of the network, where for the case of classification, minimizing the output error of the network involves minimizing a mean cross-entropy through application of a modified backpropagation algorithm that accumulates the weight changes learned over each quantized step adjustment. This process is continued for each layer in the network. From the earlier teachings of Hwang, the activation signal vector extracted from each layer is based on the weight matrix of the current layer, and hence this process of quantizing the weights at each layer through a modified backpropagation algorithm also quantizes the activation signal vector. Hence this process of performing iterative adjustments around the initial step size to determine the optimum quantization step size that minimizes the L2-based output error of the network through a backpropagation algorithm corresponds to a process for adjusting quantization levels assigned to the data values based on an entropy (Hwang p.1 Section I. 
Introduction; p.2 Section II. Direct Quantization with Exhaustive Search: “… the optimum step size is initially determined by using an L2-error minimizing approach that is similar to Lloyd-Max quantization, and then the quantization step size is fine tuned by using exhaustive search. … the greedy approach is applied as follows: 1) Prepare a fully trained floating-point weights. 2) Quantize all input data and signals of hidden layers. 3) … try several step sizes around the initial step size and measure the output error of the network with the training set. The initial step size is determined using the L2-error minimizing approach. 4) Choose the step size that minimizes the output error and quantize the weights. 5) Perform the third and fourth steps for the next layer until it reaches the last layer. … The output error can be the mean cross-entropy for classification …”; and pp.2-3 Section III. Retrain with Error Backpropagation on Quantized Domain: “… we use a fixed-point optimization scheme that retrains the quantized neural network. This method reapplies the error backpropagation that is modified to deal with quantized weights and signals … The error backpropagation is basically a gradient descent method that minimizes the output error … we need to modify the backpropagation algorithm to properly accumulate the small amount of weight changes … The low-precision weights are obtained by quantizing the high-precision weights and used in the forward and backward steps of the backpropagation algorithm. … We further modify the backpropagation algorithm to quantize the signals or the outputs of the units … For the sigmoid activation function in (2), the derivative is usually calculated by ϕ'(x) = ϕ(x)(1 - ϕ(x)) = y(1 - y) where ϕ(x) is the activation function and y = ϕ(x) is the output signal of the unit …”).); 
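As an illustration only of the quoted Hwang passages (not code from the reference), the following Python sketch checks the identity ϕ'(x) = y(1 - y) and shows a greedy step-size search of the kind described in the quoted steps 3) and 4). The function names, the candidate step-size factors, and the use of a squared error on the weights themselves (in place of the network output error measured with the training set) are assumptions made to keep the sketch short.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_derivative_from_output(y):
    # Derivative of the sigmoid expressed through the unit output y = sigmoid(x).
    return y * (1.0 - y)

def quantize_ternary(weights, step):
    # Map each weight to one of three levels: -step, 0, or +step.
    quantized = []
    for w in weights:
        level = math.floor(w / step + 0.5)   # round to the nearest integer level
        level = max(-1, min(1, level))       # clamp to the ternary range
        quantized.append(level * step)
    return quantized

def squared_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def greedy_step_search(weights, initial_step,
                       factors=(0.5, 0.75, 1.0, 1.25, 1.5)):
    # Try several step sizes around the initial one and keep the step size
    # with the lowest error (assumed proxy: squared error on the weights).
    candidates = [initial_step * f for f in factors]
    return min(candidates,
               key=lambda s: squared_error(weights, quantize_ternary(weights, s)))
```

A fuller version would repeat the search layer by layer and score each candidate by the network output error, as the quoted procedure states.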
quantizing the data values included in the set of floating point data in accordance with the adjusted quantization levels (Examiner’s note: As indicated earlier, Hwang teaches performing a modified backpropagation algorithm that accumulates the weight changes learned over each quantized step adjustment, where these weight changes are used to quantize the set of weights and activation signals at each network layer. Hwang further teaches that these weight changes learned over the updates are used to quantize the stored high-precision weights into the low-precision weights. From the earlier teachings of Hwang, the activation signal vector extracted from each layer is based on the weight matrix of the current layer, and hence this process of quantizing the weights at each layer through a modified backpropagation algorithm also quantizes the corresponding activation signal vector produced at each neural network layer. Hence this modified backpropagation process that quantizes the high-precision weights into low-precision weights, and uses these quantized weight values to further quantize the corresponding activation signal values corresponds to a process for quantizing the data values included in the set of floating point data in accordance with the adjusted quantization levels (Hwang p.1 Section I. Introduction; p.2 Section II. Direct Quantization with Exhaustive Search; and pp.2-3 Section III. Retrain with Error Backpropagation on Quantized Domain).); 
implementing the neural network using the quantized data values and based on input data provided to the neural network (Examiner’s note: As indicated earlier, Hwang teaches performing a modified backpropagation algorithm that accumulates the weight changes learned over each quantized step adjustment, where these weight changes are used to quantize the set of weights and activation signals at each network layer (Hwang p.1 Section I. Introduction; p.2 Section II. Direct Quantization with Exhaustive Search; and pp.2-3 Section III. Retrain with Error Backpropagation on Quantized Domain: “… The mini-batch based backpropagation algorithm [14] updates the weight wij, the synaptic strength from the unit j to the unit i … We further modify the backpropagation algorithm to quantize the signals or the outputs of the units. … The overall algorithm is summarized in Fig. 3.”). Hwang additionally teaches applying this quantization strategy to two neural networks that perform handwritten digit recognition and phoneme recognition, respectively, and evaluating the respective performance for each neural network, such that the respective application of the quantization strategy and performance evaluation (e.g., classification rate for handwritten digit recognition; frame-level phone error rate for phoneme recognition) for each neural network correspond to respective implementations of a neural network using the quantized data values and based on input data provided to the neural network (Hwang p.3 Section IV. Evaluation: “The proposed quantization strategy is evaluated with two neural network examples: handwritten digit recognition and phoneme recognition …”; pp.3-4 Tables I and II, Section IV.A. Handwritten Digit Recognition: 1) MNIST Database: The MNIST database consists of 28 by 28 grey level images of handwritten digits. A training set has 60,000 examples and a test set has 10,000 examples. 
… 2) Neural Network Configuration: … The input layer has 784 units, which is followed by two 500-unit and one 2,000-unit hidden layers. … 3) Training: … we ran 100 epochs of the backpropagation with stochastic gradient descent using the mini-batch size of 100, the fixed learning rate of 0.1, and the momentum of 0.9. Also, the same parameters are used for the proposed retraining algorithm. 4) Experimental Results: The experimental results with various weight and signal quantization are summarized in TABLE I. The original miss classification rate for the test set was 0.97% with floating-point arithmetic. The direct quantization shows a miss rate of 4.28% with 3-point weights, and 1.20% with 7-point weights. On the other hand, the retraining approach shows the result that is quite close to the original one. The miss rate with 3-point weights and 3-bit signal quantization is 1.08%. Applying 8-bit signal quantization does not show significant difference compared to 3-bit signal quantization, which means 3 bits for signal word-length is enough for hidden layers. … we can notice that the retrain algorithm helps reducing not only the word-length of weights but also that of signals … Note that weights are changes in both direction … This clearly shows that fixed-point optimization with retraining is not just adjustment of quantization boundaries, but slightly moving weights around the quantization boundaries in both directions.”; and pp.4-5 Section IV.B. Phoneme Recognition).), and 
indicating a result of the implementation (Examiner’s note: As indicated earlier, Hwang teaches applying the quantization strategy to two neural networks that perform handwritten digit recognition and phoneme recognition, respectively, and evaluating the respective performance for each neural network, where the respective performance evaluation (e.g., classification rate for handwritten digit recognition; frame-level phone error rate for phoneme recognition) as well as the associated quantization findings correspond to respective results of the implementation (Hwang p.3 Section IV. Evaluation; pp.3-4 Section IV.A. Handwritten Digit Recognition; and pp.4-5 Section IV.B. Phoneme Recognition).),
wherein, the set of floating point data includes a set of activations (Examiner’s note: As indicated earlier, Hwang teaches quantizing a feedforward deep neural network with multiple hidden layers using 3-bit fixed-point ternary weights, where the floating-point values for the extracted weights and activation signals at each layer are being quantized. Hwang further teaches extracting from each layer k a signal vector yk and weight matrix Wk to determine the corresponding next layer signal vector and weight matrix, where the signal vector of the current layer is based on applying the bias, weight matrix of the current layer, and the activation function. Hence, the identification of these high-precision floating-point signal vectors and weight matrices at each layer of a floating-point based neural network corresponds to a process of obtaining a set of floating-point activation data from each layer in a neural network (Hwang p.1 Figure 1 and Abstract; p.1 Section I. Introduction; p.2 Section II. Direct Quantization with Exhaustive Search; p.3 col.1 1st-3rd paragraphs (Section II); p.4 2nd paragraph (Section IV.A.); and p.5 Section V. Concluding Remarks).) … 
… activation quantization levels assigned, using an entropy-based … quantization method, to data values corresponding to the set of activations (Examiner’s note: Under its broadest reasonable interpretation, this limitation broadly recites assigning and adjusting a set of activations based on an entropy-based quantization method. As indicated earlier, Hwang teaches identifying and determining an initial grouping for weights and activation signals based on their complexity, range, and quantization sensitivity, where the complexity, range, and quantization sensitivity are interpreted as measurements of information content (“entropy”) present in the data. Hwang further teaches performing a greedy algorithm to determine the optimum quantization step size by performing and testing iterative adjustments around the initial step size to minimize the L2-based output error of the network, where for the case of classification, minimizing the output error of the network involves minimizing a mean cross-entropy through application of a modified backpropagation algorithm that accumulates the weight changes learned over each quantized step adjustment. This process of performing iterative adjustments around the initial step size to determine the optimum quantization step size that minimizes the L2-based output error of the network through a backpropagation algorithm corresponds to a process for adjusting quantization levels assigned to the data values based on an entropy. This process is continued for each layer in the network. From the earlier teachings of Hwang, the activation signal vector extracted from each layer is based on the weight matrix of the current layer, and hence this process of quantizing the weights at each layer through a modified backpropagation algorithm also quantizes the corresponding activation signal vector produced at each neural network layer. 
Hence this process that assigns and adjusts a set of quantization levels for a set of weights based on an optimum quantization step size that minimizes the mean cross-entropy output error of the network, and further backpropagates these learned and adjusted weight quantization levels to further quantize the corresponding activation signal vectors at each neural network layer also corresponds to a process that assigns and adjusts a set of activations based on an entropy-based quantization method (Hwang p.1 Section I. Introduction; p.2 Section II. Direct Quantization with Exhaustive Search; and pp.2-3 Section III. Retrain with Error Backpropagation on Quantized Domain).) …
… the data values corresponding to the set of activations are quantized in accordance with the adjusted activation quantization levels (Examiner’s note: As indicated earlier, Hwang teaches performing a modified backpropagation algorithm that accumulates the weight changes learned over each quantized step adjustment, where these weight changes are used to quantize the set of weights and activation signals at each network layer. Hwang further teaches that these weight changes learned over the updates are used to quantize the stored high-precision weights into the low-precision weights. From the earlier teachings of Hwang, the activation signal vector extracted from each layer is based on the weight matrix of the current layer, and hence this process of quantizing the weights at each layer through a modified backpropagation algorithm also quantizes the corresponding activation signal vector produced at each neural network layer. Hence this modified backpropagation process that quantizes the high-precision weights into low-precision weights, and uses these quantized weight values to further quantize the corresponding activation signal values corresponds to a process for quantizing the data values included in the set of floating point data in accordance with the adjusted activation quantization levels (Hwang p.1 Section I. Introduction; p.2 Section II. Direct Quantization with Exhaustive Search; and pp.2-3 Section III. Retrain with Error Backpropagation on Quantized Domain).) …
While Hwang teaches determining an initial grouping for weights and activation signals based on their complexity, range, and quantization sensitivity, where the complexity, range, and quantization sensitivity are interpreted as measurements of information content (“entropy”) present in the data, Hwang does not explicitly teach
… determining a weighted entropy based on data values … 
… adjusting … based on the weighted entropy …
… data values … that are adjusted based on an activation weighted entropy …
Guiasu teaches
… determining a weighted entropy based on data values (Examiner’s note: Guiasu teaches a grouping algorithm involving grouping data into class intervals based on data complexity, where additional factors such as information content and class homogeneity are further determined and applied to identify each grouping of data, where the information content and class homogeneity represent weighting factors on the information initially contained in the set of data (representing a “weighted entropy”). Guiasu further teaches groups of data elements in a data set X, and measuring the weighted entropy to identify the amount of information for each group of data within X, where the weighted entropy I(P_n) is based on a relative frequency p(X_i) (expressed as a ratio of the number of data elements per partition/class and the total number of data elements X) and a weight w(X_i) (expressed as a ratio of the sum of all element values per partition/class and the total number of data elements per partition/class). When combined with the teachings found in the Hwang reference, this data set X represents the initial groupings of a set of floating point data (i.e., a set of floating point weight values and a set of floating point activation values) in which the weighted entropy metric can be further calculated for each subset of floating point data, where the subset of floating-point weights are initially grouped into a set of classes based on complexity, which also results in the subset of floating-point activations being grouped into another set of classes (as these activations are based on the corresponding floating-point weights through dot product multiplication). Hence this process of grouping weight and activation data values based on data complexity and measuring information content I(P_n) for each group of data through calculation of a weighted entropy metric corresponds to a process for determining a weighted entropy based on data values included in the set of floating point data (Guiasu p.63 Section 1. Introduction: “Grouping data is a way of coping with complexity. It is well known that when the raw data are grouped in classes a certain amount of information is lost, since no distinction is made between observations falling into the same class. The larger the class interval is, the greater is the amount of information lost. … In the choice of a class interval a reasonable compromise must be reached between information content and class homogeneity. The aim of the paper is to show how the weighted entropy, a generalization of Shannon's entropy from information theory, may be used to balance the amount of information and the degree of homogeneity associated to a partition of data in classes.”; and p.64 eqs. 2.1, 2.2, 2.3, and pp.63-64 Section 2. Information balance for weighted data:
[Image: media_image1.png, greyscale]
).) …
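To make the quantities in the description above concrete, the following Python sketch computes three things for a partition of data into classes. The first is the relative frequency p(X_i). The second is a class weight w(X_i), taken as the sum of the element values in the class divided by the number of elements in the class. The third is a weighted entropy of the commonly used form -Σ w(X_i) p(X_i) log p(X_i). Guiasu eqs. 2.1-2.3 are not reproduced in the text here, so this formula, the natural logarithm, and the function name are assumptions for illustration rather than a restatement of the reference.

```python
import math

def weighted_entropy(classes):
    """Weighted entropy of a partition (assumed form: -sum w_i * p_i * log p_i).

    classes: list of lists of non-negative values, one inner list per class.
    """
    total = sum(len(c) for c in classes)
    result = 0.0
    for c in classes:
        if not c:
            continue                 # an empty class contributes nothing
        p = len(c) / total           # relative frequency p(X_i)
        w = sum(c) / len(c)          # class weight w(X_i): mean element value
        result -= w * p * math.log(p)
    return result

# Hypothetical data: one coarse class, then two successively finer partitions.
one_class = weighted_entropy([[1.0, 1.0, 3.0, 3.0]])
two_classes = weighted_entropy([[1.0, 1.0], [3.0, 3.0]])
four_classes = weighted_entropy([[1.0], [1.0], [3.0], [3.0]])
```

Under these assumptions, the single-class partition has a weighted entropy of 0 and each finer partition in the example has a larger value.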
… adjusting … based on the weighted entropy (Examiner’s note: Guiasu further teaches determining an optimal number of partitions P_n by identifying a bound that balances the measured information content I(P_n) (“weighted entropy”) with the measured degree of homogeneity H(P_n) within each partition. Guiasu additionally teaches identifying and selecting finer partitions than the initial partition, where each selected finer partition successively replaces the earlier identified partition, with its weighted entropy increasing from 0 while, at the same time, the degree of homogeneity decreases towards 0. Hence, when combined with the teachings in the Hwang reference, this process of selecting finer partitions than the initial partition in order to reach a bound that balances the weighted entropy with the degree of homogeneity within each partition corresponds to a process that adjusts the quantization levels based on the weighted entropy (Guiasu p.64 4th paragraph-p.65 4th paragraph: “… The grouping of the initial raw data set X in the subsets of partition (2.1) is characterized by the information balance, where I(X) … measures the amount of information contained by the initial raw data … I(P_n) … measures the amount of information contained by the class making up the partition P_n, and H(P_n) … measures the degree of homogeneity of the partition P_n … If P* is a finer partition than P then we have I(P*) ≥ I(P) and H(P*) ≤ H(P) … Remark. Successively replacing a partition by a finer one, I(P_n
                    
                
            )  will increase from 0 … while H                
                    (
                    
                        
                            P
                        
                        
                            n
                        
                    
                    )
                
             will decrease from I(X) to 0.”; p.66 Section 3. The trade-off between information and homogeneity: “By choosing a partition                 
                    
                        
                            P
                        
                        
                            n
                        
                    
                
            , we settle a trade-off between the amount of information supplied by the selected classes and the degree of homogeneity of these classes …”).) …
… data values … that are adjusted based on an activation weighted entropy (Examiner’s note: As indicated earlier, Guiasu further teaches determining an optimal number of partitions $P_n$ by identifying a bound that balances the measured information content I($P_n$) (“weighted entropy”) with the measured degree of homogeneity H($P_n$) within each partition. Guiasu additionally teaches identifying and selecting finer partitions than the initial partition, where each selected finer partition successively replaces the earlier identified partition, with its weighted entropy increasing from 0, at the same time the degree of homogeneity decreases towards 0. Hence, when combined with the teachings in the Hwang reference, this process of selecting finer partitions than the initial partition in order to reach a bound that balances the weighted entropy with the degree of homogeneity within each partition corresponds to a process where the assigned quantization levels are adjusted based on an activation weighted entropy (Guiasu pp.64 4th paragraph-p.65 4th paragraph: “… The grouping of the initial raw data set X in the subsets of partition (2.1) is characterized by the information balance, where I(X) … measures the amount of information contained by the initial raw data … I($P_n$) … measures the amount of information contained by the class making up the partition $P_n$, and H($P_n$) … measures the degree of homogeneity of the partition $P_n$ … If $P^*$ is a finer partition than $P$ then we have I($P^*$) ≥ I($P$) and H($P^*$) ≤ H($P$) … Remark. Successively replacing a partition by a finer one, I($P_n$) will increase from 0 … while H($P_n$) will decrease from I(X) to 0.”; p.66 Section 3. The trade-off between information and homogeneity: “By choosing a partition $P_n$, we settle a trade-off between the amount of information supplied by the selected classes and the degree of homogeneity of these classes …”).) …
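For illustration only, the refinement property quoted from Guiasu (information content rising from 0 while homogeneity falls from I(X) to 0) can be checked numerically with a short Python sketch. Nothing here is taken from Guiasu, Hwang, or the application: the formula I(P_n) = -Σ w(X_i) p(X_i) log2 p(X_i), the choice of each element's magnitude as its weight, the balance H(P_n) = I(X) - I(P_n) with I(X) evaluated over singleton classes, and the names `weighted_entropy` and `homogeneity` are all assumptions made for the example.

```python
import math

def weighted_entropy(partition):
    """Assumed form: I(P_n) = -sum_i w(X_i) * p(X_i) * log2 p(X_i).

    p(X_i) is the relative frequency of class X_i. w(X_i) is the mean weight
    of the class, where each element's weight is taken to be its absolute
    value (an assumption made only for this illustration).
    """
    total = sum(len(cls) for cls in partition)
    entropy = 0.0
    for cls in partition:
        if not cls:
            continue
        p = len(cls) / total
        w = sum(abs(v) for v in cls) / len(cls)
        entropy -= w * p * math.log2(p)
    return entropy

def homogeneity(partition):
    """Assumed information balance: H(P_n) = I(X) - I(P_n).

    I(X) is evaluated on the partition that puts every element in its own class.
    """
    singletons = [[v] for cls in partition for v in cls]
    return weighted_entropy(singletons) - weighted_entropy(partition)

data = [0.1, 0.2, 0.4, 0.8, 1.6, 3.2]
one_class = [data]
coarse = [data[:3], data[3:]]
fine = [data[:2], data[2:3], data[3:5], data[5:]]
raw = [[v] for v in data]

for name, part in [("one class", one_class), ("coarse", coarse),
                   ("fine", fine), ("raw", raw)]:
    print(f"{name:9s} I = {weighted_entropy(part):.4f}  H = {homogeneity(part):.4f}")
```

Under these assumptions, each successive refinement of the same data yields a larger I and a smaller H, which is the trade-off the quoted Section 3 describes.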
Hwang and Guiasu are analogous art since both teach methods for grouping/partitioning data elements in a data set based on data complexity.
It would have been obvious to a person having ordinary skill in the art before the effective filing date to take the step of determining entropy based on the floating point data set of Hwang and enhance it with the step of determining weighted entropy based on the floating point data set of Guiasu as a way to perform partitioning of data elements in a data set (i.e., determining quantization levels). The motivation to combine is taught in Guiasu, where the weighted entropy represents a class grouping that balances information loss (i.e., accuracy loss resulting from quantization) and class element variation (i.e., class homogeneity), and where having variation and diversity in the data is advantageous in the context of neural network training because it improves the performance and classification result of a trained network (Guiasu p.63 Section 1. Introduction: “Grouping data is a way of coping with complexity. It is well known that when the raw data are grouped in classes a certain amount of information is lost, since no distinction is made between observations falling into the same class. The larger the class interval is, the greater is the amount of information lost. On the other hand, if too many distinct classes are used, the presentation of information is somewhat misleading because conspicuous irregularities merely reflect the accidents of sampling. In the choice of a class interval a reasonable compromise must be reached between information content and class homogeneity. The aim of the paper is to show how the weighted entropy, a generalization of Shannon's entropy from information theory, may be used to balance the amount of information and the degree of homogeneity associated to a partition of data in classes.”).
However, Hwang in view of Guiasu does not teach
… using an entropy-based logarithm data representation-based quantization method, to data values corresponding to the set of activations … 
Miyashita teaches
… using an entropy-based logarithm data representation-based quantization method, to data values corresponding to the set of activations (Examiner’s note: Under its broadest reasonable interpretation in light of Applicant’s specification paragraphs [0126]-[0128] and Figure 6, this limitation broadly recites a weighted entropy-based quantization process as being defined as a process that determines and adjusts quantization levels as log scale index values, and applies these quantized log scale index values in an activation function. Miyashita teaches quantizing neural network activations by calculating the logarithmic activations (using LogQuant) for each neural layer input x into a series of base-2 log levels of $\tilde{x}$ (i.e., $2^{\tilde{x}}$) according to the logarithmic quantization and computation described in Miyashita pp.2-3 Section 3.1, and applying these quantized base-2 log levels in a ReLU function. By Applicant’s definition established in Applicant’s specification paragraph [0126]-[0128] and Figure 6, this process is defined as an entropy-based logarithm data representation-based quantization method. Miyashita teaches as part of the quantization process a clipping function is applied to the quantized activations $\tilde{x}$ in addition to incorporating a full scale range (FSR) offset factor, where the FSR handles the activation range variations between neural network layers, such that the application of these clipped quantized activations with FSR offset represent adjustments to the data values corresponding to the set of quantized activation levels (Miyashita p.1 Abstract: “… we propose a new data representation that enables state-of-the-art networks to be encoded to 3 bits with negligible loss in classification performance. To perform this, we take advantage of the fact that the weights and activations in a trained network naturally have non-uniform distributions. Using non-uniform, base-2 logarithmic representation to encode weights, communicate activations, and perform dot-products enables networks to 1) achieve higher classification accuracies than fixed-point at the same resolution and 2) eliminate bulky digital multipliers.”; pp.2-3 Section 3.1 Proposed Method 1.: “… The first proposed method in Figure 1(b) is to transform one operand to its log representation, convert the resulting transformation back to the linear domain, and multiply this by the other operand. This is simply $w^{T}x \simeq \sum_{i=1}^{n} w_i \times 2^{\tilde{x}_i} = \sum_{i=1}^{n} \mathrm{Bitshift}(w_i, \tilde{x}_i),$ where $\tilde{x}_i = \mathrm{Quantize}(\log_2(x_i))$, Quantize(∙) quantizes ∙ to an integer, and Bitshift(a,b) is the function that bitshifts a value a by an integer b in fixed-point arithmetic. In floating-point, this operations is simply an addition of b with the exponent part of a … Quantizing the activations and weights in the log-domain (log2(x) and log2(w)) instead of x and w is also motivated by leveraging structure of the non-uniform distributions of x and w …”; pp.4-5, eqs. 5-7, Tables 1 and 2, and Section 4.1. Logarithmic Representation of Activations: “… we describe the logarithmic quantization layer LogQuant that performs the element-wise operation as follows:
 
[Image: media_image2.png (equations defining the LogQuant element-wise operation)]

These layers perform the logarithmic quantization and computation as detailed in Section 3.1. Tables 1 and 2 illustrate the addition of these layers to the models. The quantizer has a specified full scale range, and this range in linear scale is $2^{\mathrm{FSR}}$ … This offset parameter is chosen to properly handle the variation of activation ranges from layer to layer … Note that since we assume applying quantization after reLU function, x is 0 or positive and then we use unsigned format without sign bit for activations.”).) …
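As a rough illustration of the base-2 logarithmic quantization and shift-based dot product described in the quoted Section 3.1, a minimal Python sketch follows. It is not Miyashita's LogQuant layer: the equations defining that layer are in an image not reproduced in this action, so the clip window `[fsr - 2**bitwidth, fsr - 1]`, the default `bitwidth` and `fsr` values, the treatment of zero activations, and the function names are all assumptions made for the example.

```python
import math

def log_quantize_exponent(x, bitwidth=3, fsr=5):
    """Return the clipped integer exponent round(log2(x)), or None when x == 0.

    The clip window [fsr - 2**bitwidth, fsr - 1] is an assumption for this
    sketch; it gives 2**bitwidth representable exponents below 2**fsr.
    Python's round() resolves exact .5 ties to the nearest even integer.
    """
    if x < 0:
        raise ValueError("activations are assumed non-negative (post-ReLU)")
    if x == 0:
        return None
    lo, hi = fsr - 2 ** bitwidth, fsr - 1
    exponent = round(math.log2(x))
    return max(lo, min(hi, exponent))

def log_domain_dot(weights, activations, bitwidth=3, fsr=5):
    """Approximate w^T x as sum_i w_i * 2**x_tilde_i using exponent shifts."""
    total = 0.0
    for w, x in zip(weights, activations):
        e = log_quantize_exponent(x, bitwidth, fsr)
        if e is not None:
            # math.ldexp(w, e) computes w * 2**e, the floating-point analogue
            # of a bit shift: e is added to the exponent part of w.
            total += math.ldexp(w, e)
    return total

weights = [1.5, -2.0, 0.25]
activations = [8.0, 0.3, 0.0]
exact = sum(w * x for w, x in zip(weights, activations))
approx = log_domain_dot(weights, activations)
print("exact:", exact, "log-domain approximation:", approx)
```

The sketch replaces each multiplication by an exponent adjustment, which is the property the quoted passage relies on to avoid digital multipliers.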
Hwang in view of Guiasu and Miyashita are analogous art since both teach methods of quantizing weights and activations in neural networks.
It would have been obvious to a person having ordinary skill in the art before the effective filing date to take the entropy-based data representation-based quantization method of Hwang in view of Guiasu and enhance it with the entropy-based logarithm data representation-based quantization method of Miyashita as a way to quantize activations in neural networks. The motivation to combine is taught in Miyashita, as quantization introduces a form of compression of the data set without significant deterioration in performance of the neural network, thus allowing for these computations to be performed on lower-precision platforms such as mobile or embedded platforms (Miyashita p.1 Abstract: “Recent advances in convolutional neural networks have considered model complexity and hardware efficiency to enable deployment onto embedded systems and mobile devices. For example, it is now well-known that the arithmetic operations of deep networks can be encoded down to 8-bit fixed-point without significant deterioration in performance. However, further reduction in precision down to as low as 3-bit fixed-point results in significant losses in performance. In this paper we propose a new data representation that enables state-of-the-art networks to be encoded to 3 bits with negligible loss in classification performance.”; and p.1 Section 1. Introduction: “In order for these large networks to run in real-time applications such as for mobile or embedded platforms, it is often necessary to use low-precision arithmetic and apply compression techniques.”; and p.5 col.1 Section 4.1 Logarithmic Representation of Activations: “… Using only 3 bits to represent the activations for both logarithmic and linear quantizations, the top-5 accuracy is still very close to that of the original, unquantized model encoded at floating-point 32b. However, logarithmic representations tolerate a large dynamic range of FSRs. 
For example, using 4b log, we can obtain 3 order of magnitude variations in the full scale without a significant loss of top-5 accuracy.”).
Regarding Claim 2, 
Hwang in view of Guiasu, in further view of Miyashita teaches
The method of claim 1, wherein the weighted entropy is determined by applying a weighting factor based on determined sizes of the data values to a determined distribution of the data values included in the set of floating point data (Examiner’s note: Under its broadest reasonable interpretation in light of Applicant’s specification paragraph [0007], this limitation broadly recites determining a weighted entropy based on applying a weighting factor representing a value based on a determined data size to a determined data distribution. As indicated earlier, Guiasu teaches a grouping algorithm involving grouping data into class intervals based on data complexity, where additional factors such as information content and class homogeneity are further determined and applied to identify each grouping of data, where the information content and class homogeneity represent weighting factors on the information initially contained in the set of data (representing a “weighted entropy”). Guiasu further teaches groups of data elements in a data set X, and measuring the weighted entropy to identify the amount of information for each group of data within X, where the weighted entropy I($P_n$) is based on a relative frequency p($X_i$) and a weight w($X_i$), where this weight w($X_i$) is expressed as a ratio based on a determined data size (i.e., sum of all data elements) of a partition/class to a determined data distribution (i.e., the total number of data elements) within the class, thus corresponding to a weighting factor as recited under the broadest reasonable interpretation of Applicant’s limitation. When combined with the teachings found in the Hwang reference, this data set X represents the initial groupings of a set of floating point data including a set of floating point weight values and a set of floating point activation values, where the floating-point weights are initially grouped into a set of classes based on complexity, and hence this process of grouping weight data values based on data complexity and measuring information content I($P_n$) for each group of data based on a weight w($X_i$) corresponds to a process for determining a weighted entropy based on applying a weighting factor representing a value based on a determined data size to a determined data distribution (Guiasu p.63 Section 1. Introduction; and p.64 eqs. 2.1, 2.2, 2.3, and pp.63-64 Section 2. Information balance for weighted data).).
Regarding Claim 3, 
Hwang in view of Guiasu, in further view of Miyashita teaches
The method of claim 1, wherein the determining of the activation weighted entropy comprises: 
determining respective relative activation frequencies for each of the quantization levels by dividing a total number of activations included in each of the respective quantization levels by a total number of activations included in the set of activations (Examiner’s note: As indicated earlier, Guiasu teaches a grouping algorithm involving grouping data into class intervals based on data complexity, where additional factors such as information content and class homogeneity are further determined and applied to identify each grouping of data, where the information content and class homogeneity represent weighting factors on the information initially contained in the set of data (representing a “weighted entropy”). Guiasu further teaches groups of data elements in a data set X, and measuring the weighted entropy to identify the amount of information for each group of data within X, where the weighted entropy I($P_n$) is based on a relative frequency p($X_i$) and a weight w($X_i$), where this relative frequency p($X_i$) is expressed as a ratio based on dividing a total number of data elements in each partition/class by a total number of data elements X in all partitions/classes. When combined with the teachings found in the Hwang reference, applying this weighted entropy calculation taught in Guiasu by computing the relative frequency for each activation quantization level results in the determination of respective relative activation frequencies for each of the quantization levels, where each relative activation frequency p($X_i$) is expressed as a ratio based on dividing a total number of activations in each quantization level by a total number of activations in all quantization levels (Guiasu p.63 Section 1. Introduction; and p.64 eqs. 2.1, 2.2, 2.3, and pp.63-64 Section 2. Information balance for weighted data).); 
determining respective activation data values corresponding to each of the activation quantization levels as respective representative activation importances of each of the quantization levels (Examiner’s note: As indicated earlier, Guiasu teaches a grouping algorithm involving grouping data into class intervals based on data complexity, where additional factors such as information content and class homogeneity are further determined and applied to identify each grouping of data, where the information content and class homogeneity represent weighting factors on the information initially contained in the set of data (representing a “weighted entropy”). Guiasu further teaches groups of data elements in a data set X, and measuring the weighted entropy to identify the amount of information for each group of data within X, where the weighted entropy I($P_n$) is based on a relative frequency p($X_i$) and a weight w($X_i$), where this weight w($X_i$) is expressed as a ratio based on a determined data size (i.e., sum of all data elements) of a partition/class to a determined data distribution (i.e., the total number of data elements) within the class, where each of these non-negative weights are assigned to each data element such that these weights are directly proportional to their importance (thus representing respective representative importance values). When combined with the teachings found in the Hwang reference, applying this weighted entropy calculation taught in Guiasu by computing and assigning the weights for each activation quantization level (where each respective weight is directly proportional to the importance of the activation quantization level) results in the determination of respective representative activation importances for each of the quantization levels (Guiasu p.63 Section 1. Introduction; and p.64 eqs. 2.1, 2.2, 2.3, and pp.63-64 Section 2. Information balance for weighted data).); and 
determining the activation weighted entropy based on the respective relative activation frequencies and the respective representative activation importances (Examiner’s note: As indicated earlier, Guiasu teaches a grouping algorithm involving grouping data into class intervals based on data complexity, where additional factors such as information content and class homogeneity are further determined and applied to identify each grouping of data, where the information content and class homogeneity represent weighting factors on the information initially contained in the set of data (representing a “weighted entropy”). Guiasu further teaches groups of data elements in a data set X, and measuring the weighted entropy to identify the amount of information for each group of data within X, where the weighted entropy I($P_n$) is based on a relative frequency p($X_i$) and a weight w($X_i$). As established in the earlier recited limitations, when combined with the teachings found in the Hwang reference, applying this weighted entropy calculation taught in Guiasu results in the relative frequency p($X_i$) representing the respective relative activation frequencies, with the weight w($X_i$) representing the respective representative activation importances for each quantization level (Guiasu p.63 Section 1. Introduction; and p.64 eqs. 2.1, 2.2, 2.3, and pp.63-64 Section 2. Information balance for weighted data).).
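For illustration only, the weighted entropy calculation described in the note above can be sketched as follows. The sketch assumes the commonly used weighted entropy form $-\sum_i w(X_i)\,p(X_i)\log_2 p(X_i)$; the exact expressions of Guiasu eqs. 2.1-2.3 are not reproduced in this action, so that form, the function name `weighted_entropy`, the `level_of` callback, and the sample activation values are all assumptions. The weight w($X_i$) is taken as the mean value of the elements in a class, and the activations are assumed to be non-negative.

```python
import math
from collections import defaultdict

def weighted_entropy(values, level_of):
    """Return -sum_i w_i * p_i * log2(p_i) over the occupied quantization levels.

    values   : non-negative activation values (assumed, e.g. post-ReLU)
    level_of : hypothetical callback mapping a value to its quantization level
    """
    groups = defaultdict(list)
    for v in values:
        groups[level_of(v)].append(v)

    total = len(values)
    entropy = 0.0
    for members in groups.values():
        p = len(members) / total           # relative frequency p(X_i)
        w = sum(members) / len(members)    # importance w(X_i): class mean (assumption)
        entropy -= w * p * math.log2(p)
    return entropy

# Hypothetical activations grouped by a hypothetical unit-width level assignment.
activations = [0.1, 0.2, 0.9, 1.1, 1.2, 2.5, 2.6, 3.9]
print(weighted_entropy(activations, lambda v: int(v)))
```

Under these assumptions, a level that is both frequently occupied and holds large values contributes most to the total, which is the sense in which the weights act as importances.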
Regarding Claim 4, 
Hwang in view of Guiasu, in further view of Miyashita teaches
The method of claim 3, wherein the adjusting of the activation quantization levels comprises adjusting the quantization levels assigned to the respective activation data values by adjusting a value corresponding to a first quantization level among the quantization levels and a size of an interval between the quantization levels in a direction of increasing the weighted entropy (Examiner’s note: Under its broadest reasonable interpretation, the term “entropy” broadly indicates a measure of information, and hence this limitation broadly recites applying the logarithm data representation-based quantization method by adjusting a value corresponding to a first quantization level among quantization levels and a size of an interval between quantization levels to maximize the amount of information. As indicated earlier, Miyashita teaches quantizing neural network activations by calculating the logarithmic activations (using LogQuant) for each neural layer input x into a series of base-2 log levels of $\tilde{x}$ (i.e., $2^{\tilde{x}}$) according to the logarithmic quantization and computation described in Miyashita pp.2-3 Section 3.1, and applying these quantized base-2 log levels in a ReLU function. As indicated earlier, Miyashita also teaches a quantizer that performs this quantization process that includes applying a clipping function to the quantized activations $\tilde{x}$ in addition to incorporating a full scale range (FSR) offset factor. The FSR offset handles the activation range variations between neural network layers, such that the application of these clipped quantized activations with FSR offset represent adjustments to the data values corresponding to the set of quantized activation levels. As shown in Miyashita p.4 Tables 1 and 2, the usage of the FSR offset is applied at each LogQuant quantized level (based on the output activations received from each ReLU layer in the neural network), where each FSR-based offset represents the size of an interval between quantization levels, and the output activations x received from each ReLU function at each neural network layer represents a value corresponding to a quantization level among the quantization levels. Miyashita additionally teaches that the quantizer (applying the log quantization process) minimizes the quantization error by allowing many smaller activation values to be more finely represented, which in turn improves classification accuracy, where this denser distribution of these smaller activation values represents a direction of maximizing the information present in the groups of activation and weight data at each neural network layer (and hence represents a scenario of maximizing the weighted entropy for the activation and weight data at each neural network layer). 
Hence, this process of applying log quantization (represented by a quantizer which performs adjustments based on a FSR based offset and output activations x received from each ReLU function at each neural network layer) that in turn minimizes the quantization error by producing a denser distribution of smaller activation values corresponds to a process adjusting a value corresponding to a first quantization level among quantization levels and a size of an interval between quantization levels to maximize the amount of information (i.e., weighted entropy) to improve classification accuracy (Miyashita p.1 Abstract; pp.2-3 Section 3.1 Proposed Method 1; pp.4-5, eqs. 5-7 and Section 4.1. Logarithmic Representation of Activations: “… Tables 1 and 2 illustrate the addition of these layers to the models. The quantizer has a specified full scale range, and this range in linear scale is $2^{\mathrm{FSR}}$, where we express this as simply FSR … The FSR values for each layer are shown in Tables 1 and 2; they show fsr added by an offset parameter. This offset parameter is chosen to properly handle the variations of activation ranges from layer to layer using 100 images from the training set. The fsr is a parameter which is global to the network and is tuned to perform the experiments to measure the effect of FSR on classification accuracy.”; and p.5 Figures 2 and 3, and col.1 2nd paragraph: “Figure 2 illustrates the effect of the quantizer on activations following the conv2_2 layer used in VGG16 … the log-quantized distribution illustrates how the log-encoded activations are uniformly equalized across many output bins … Many smaller activation values are more finely represented by log quantization … The total quantization error $\frac{1}{N}\lVert \mathrm{Quantize}(x) - x \rVert_1$, where Quantize(∙) is LogQuant(∙) … x is the vectorized activations of size N, is less for the log-quantized case. This result is illustrated in Figure 3”).).  
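As a rough illustration of the log quantization discussed above, the following sketch rounds the base-2 logarithm of an activation, clamps the resulting exponent to a window positioned by a full scale range value, and evaluates the mean L1 quantization error $\frac{1}{N}\lVert \mathrm{Quantize}(x) - x \rVert_1$. The function names, the default `bitwidth` and `fsr` values, and the exact clamping window are assumptions made for this sketch; they are not taken from Miyashita's equations, which are only cited, not reproduced, in this action.

```python
import math

def log_quant(x, bitwidth=4, fsr=5):
    """Quantize a non-negative activation to a power of two.

    The exponent window [fsr - 2**bitwidth, fsr - 1] is an assumed
    simplification of the clipping step; out-of-range exponents are clamped.
    """
    if x <= 0:
        return 0.0
    exponent = round(math.log2(x))           # nearest integer log level
    lo, hi = fsr - 2 ** bitwidth, fsr - 1    # assumed exponent window
    exponent = max(lo, min(hi, exponent))
    return 2.0 ** exponent

def mean_l1_error(xs, quantize):
    """(1/N) * ||Quantize(x) - x||_1 for a list of activations."""
    return sum(abs(quantize(x) - x) for x in xs) / len(xs)

# Hypothetical activations; a per-layer offset would simply shift `fsr`.
sample = [0.0, 0.3, 1.0, 3.0, 1000.0]
print([log_quant(x) for x in sample])
print(mean_l1_error(sample, log_quant))
```

Shifting `fsr` moves the whole exponent window up or down, which is one way to picture how a per-layer offset adapts the quantizer to that layer's activation range.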
Regarding Claim 5, 
Hwang in view of Guiasu, in further view of Miyashita teaches
The method of claim 3, wherein the adjusting of the activation quantization levels comprises adjusting a log base, which is controlling of the quantization levels, in a direction that maximizes the weighted entropy (Examiner’s note: Under its broadest reasonable interpretation, the term “entropy” broadly indicates a measure of information, and hence this limitation broadly recites applying the logarithm data representation-based quantization method by adjusting a log base to maximize the amount of information. Miyashita teaches performing the same log quantization procedure on the activations and weights for each layer of a convolutional neural network, and changing the dot product representation of each layer from base-2 to base-√2, where the dot product representation includes the sum of corresponding quantized weight and activation signals. Miyashita additionally teaches that in general, applying the log quantization process minimizes the quantization error by allowing many smaller activation values to be more finely represented, where this denser distribution of these smaller activation values represents a direction of maximizing the information present in the groups of activation and weight data at each neural network layer (and hence represents a scenario of maximizing the weighted entropy for the activation and weight data at each neural network layer) (Miyashita p.5 Figure 2 and col.1 2nd paragraph). Miyashita further teaches that the classification accuracy of a convolutional neural network is affected by the quantization error, and that by moving from base-2 to a finer granularity such as base-√2 the quantization errors are also reduced, improving the classification accuracy (Miyashita p.3 col.2 and Section 3.3. Accumulation in log domain 1st paragraph: “… let $s_n = w_1 x_1 + \ldots + w_n x_n$, $\tilde{s}_n = \log_2(s_n)$, and $\tilde{p}_i = \tilde{w}_i + \tilde{x}_i$ … for n in general, $\tilde{s}_n \simeq \max(\tilde{p}_{n-1}, \tilde{p}_n) + \mathrm{Bitshift}(1, -\lvert \lfloor \tilde{p}_{n-1} \rfloor - \tilde{p}_n \rvert)$ (4) …”; and p.6 Section 4.3 Logarithmic Representation of Weights of Convolutional Layers: “We now represent the convolutional layers using the same procedure. We keep the representation of activations at 4b log and the representation of weights of FC layers at 4b log … We also perform the dot products using two different bases: 2, √2. Note that there is no additional overhead for log base-√2 as it is computed with the same equation shown in Equation 4.”). When combined with the earlier teachings of Miyashita, applying log quantization and changing the log base not only reduces the quantization error but also results in denser distribution of these smaller activation values, and hence this process of adjusting the log base for a log quantization procedure applied on a convolutional neural network also maximizes the amount of information present in the activations and weights in the neural network (i.e., weighted entropy) to improve classification accuracy (Miyashita p.7 Figure 6 and pp.6-7 Section 4.3 Logarithmic Representation of Weights of Convolutional Layers: “… Table 5 shows the classification results. The results illustrate an approximate 6% drop in performance from floating point down to 5b base-2 but a relatively minor 1.7% drop for 5b base-√2. … The distributions of quantization errors for both 5b base-2 and 5b base-√2 are shown in Figure 6. The total quantization error on the weights, $\frac{1}{N}\lVert \mathrm{Quantize}(x) - x \rVert_1$, where, x is the vectorized weights of size N, is 2x smaller for base-√2 than for base-2.”).).  
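The effect of changing the log base can be pictured with a small sketch that quantizes the same values to the nearest power of 2 and to the nearest power of √2, then compares the mean L1 error $\frac{1}{N}\lVert \mathrm{Quantize}(x) - x \rVert_1$. The function names and the four sample weights are hypothetical. The samples were hand-picked to lie close to odd powers of √2 so that the difference is easy to see; they are not data from Miyashita, and the size of the gap on real weights would differ.

```python
import math

def log_quant_base(x, base):
    """Quantize x to the nearest integer power of `base` (nearest in the log domain)."""
    if x == 0:
        return 0.0
    exponent = round(math.log2(abs(x)) / math.log2(base))
    return math.copysign(base ** exponent, x)

def mean_l1_error(xs, base):
    """(1/N) * ||Quantize(x) - x||_1 for the given log base."""
    return sum(abs(log_quant_base(x, base) - x) for x in xs) / len(xs)

# Hypothetical weights chosen near odd powers of sqrt(2) (illustrative only).
weights = [0.7, 1.4, 2.8, 5.6]

err_base2 = mean_l1_error(weights, 2.0)
err_sqrt2 = mean_l1_error(weights, math.sqrt(2.0))
print(err_base2, err_sqrt2)
```

Because every power of 2 is also an even power of √2, the base-√2 grid contains the base-2 grid plus one extra level between each pair of neighbors, which is why values falling between powers of 2 are represented more closely.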
Regarding Claim 6, 
Hwang in view of Guiasu, in further view of Miyashita teaches
The method of claim 1, wherein, the obtaining of the set of floating point data, the determining of the weighted entropy, the adjusting of the quantization levels, and the quantizing of the data values included in the set of floating point data are performed with respect to each of a plurality of layers included in the neural network (Examiner’s note: Under its broadest reasonable interpretation, this limitation broadly recites performing the corresponding limitations recited in independent Claim 1 on each of a plurality of layers in a neural network. As indicated earlier, Hwang teaches quantizing a feedforward deep neural network with multiple hidden layers using 3-bit fixed-point ternary weights, where the floating-point values for the extracted weights and activation signals at each layer are being quantized at each neural network layer. As indicated earlier, Hwang teaches determining an initial grouping for weights and activation signals based on their complexity, range, and quantization sensitivity, where the complexity, range, and quantization sensitivity are interpreted as measurements of information content (“entropy”) present in the data. As indicated earlier, Hwang teaches performing a modified backpropagation algorithm that accumulates the weight changes learned over each quantized step adjustment, where these weight changes are used to quantize the set of weights and activation signals at each network layer (Hwang p.1 Section I. Introduction; p.2 Section II. Direct Quantization with Exhaustive Search; and pp.2-3 Section III. Retrain with Error Backpropagation on Quantized Domain). 
A person having ordinary skill in the art would understand that the quantization process for a neural network that involves performing a modified backpropagation algorithm to accumulate and learn the weight changes over each quantized step adjustment also involves obtaining a set of floating point data, adjusting the quantization levels, and quantizing each of the activation signals and weights associated with each neural network layer. Furthermore, as indicated in the corresponding claim limitations of recited claim 1, the combination of the teachings from Guiasu and Miyashita with Hwang further teaches the additional specific features recited within each respective limitation, and hence this limitation is also rejected under similar rationale and motivations found for each corresponding limitation recited in Claim 1.) …
… with respective adjusted quantization levels being optimized and assigned for each of the plurality of layers (Examiner’s note: As indicated earlier, Hwang teaches identifying and determining an initial grouping for weights and activation signals based on their complexity, range, and quantization sensitivity, where the complexity, range, and quantization sensitivity are interpreted as measurements of information content (“entropy”) present in the data. Hwang further teaches performing a greedy algorithm to determine the optimum quantization step size by performing and testing iterative adjustments around the initial step size to minimize the L2-based output error of the network, where for the case of classification, minimizing the output error of the network involves minimizing a mean cross-entropy through application of a modified backpropagation algorithm that accumulates the weight changes learned over each quantized step adjustment. This process of performing iterative adjustments around the initial step size to determine the optimum quantization step size that minimizes the L2-based output error of the network through a backpropagation algorithm corresponds to a process for adjusting quantization levels assigned to the data values based on an entropy. This process is continued for each layer in the network. From the earlier teachings of Hwang, the activation signal vector extracted from each layer is based on the weight matrix of the current layer, and hence this process of quantizing the weights at each layer through a modified backpropagation algorithm also quantizes the corresponding activation signal vector produced at each neural network layer. 
Hence this process that assigns a set of quantization levels for a set of weights based on an optimum quantization step size, and further backpropagates these learned and adjusted weight quantization levels to further quantize the corresponding activation signal vectors at each neural network layer also corresponds to a process that uses these optimum quantization step sizes to further adjust the quantization levels that are assigned for each neural network level (Hwang p.1 Section I. Introduction; p.2 Section II. Direct Quantization with Exhaustive Search; and pp.2-3 Section III. Retrain with Error Backpropagation on Quantized Domain).).  
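A layer-by-layer step-size search of the kind described above can be sketched as follows. This is a simplified stand-in: the note describes a search that minimizes the L2-based output error of the network, whereas the sketch minimizes the L2 error between each layer's weights and their quantized values. The layer names, weight values, the mean-absolute-value initial step, and the candidate grid around it are all assumptions for illustration.

```python
def quantize_uniform(w, step, levels=3):
    """Round w to the nearest multiple of `step`, limited to `levels` symmetric levels."""
    half = (levels - 1) // 2
    k = max(-half, min(half, round(w / step)))
    return k * step

def l2_error(weights, step):
    """Sum of squared differences between the weights and their quantized values."""
    return sum((w - quantize_uniform(w, step)) ** 2 for w in weights)

def best_step(weights, initial_step, n_candidates=11, spread=0.5):
    """Test evenly spaced step sizes around `initial_step` and keep the best one."""
    candidates = [
        initial_step * (1 - spread + 2 * spread * i / (n_candidates - 1))
        for i in range(n_candidates)
    ]
    return min(candidates, key=lambda s: l2_error(weights, s))

# Hypothetical per-layer weights; each layer receives its own step size.
layers = {
    "hidden1": [0.9, -1.1, 0.05, 0.4, -0.95],
    "hidden2": [0.2, -0.25, 0.01, 0.3, -0.18],
}
initial = {name: sum(abs(w) for w in ws) / len(ws) for name, ws in layers.items()}
steps = {name: best_step(ws, initial[name]) for name, ws in layers.items()}
print(steps)
```

Running the search separately per layer is what gives each layer its own adjusted quantization levels, since layers with different weight ranges settle on different step sizes.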
Regarding Claim 7, 
Hwang in view of Guiasu, in further view of Miyashita teaches
The method of claim 1, wherein the implementing of the neural network comprises training the neural network based on the quantized data values (Examiner’s note: As indicated earlier, Hwang teaches performing a modified backpropagation algorithm that accumulates the weight changes learned over each quantized step adjustment, where these weight changes are used to quantize the set of weights and activation signals at each network layer. Hwang further teaches that these weight changes learned over the updates are used to quantize the stored high-precision weights into the low-precision weights. From the earlier teachings of Hwang, the activation signal vector extracted from each layer is based on the weight matrix of the current layer, and hence this process of quantizing the weights at each layer through a modified backpropagation algorithm also quantizes the corresponding activation signal vector produced at each neural network layer. As indicated earlier, Hwang additionally teaches applying the quantization strategy that includes this backpropagation algorithm to two neural networks that perform handwritten digit recognition and phoneme recognition, respectively, and evaluating the respective performance for each neural network, where the respective performance evaluation (e.g., classification rate for handwritten digit recognition; frame-level phone error rate for phoneme recognition) as well as the associated quantization findings correspond to respective results of the implementation. Hence this process of performing the recited quantization strategy that includes applying modified backpropagation process on two neural network examples taught in Hwang corresponds to an implementation that includes performing a neural network training process based on the quantized weights and activation values (Hwang p.1 Section I. Introduction; p.2 Section II. Direct Quantization with Exhaustive Search; pp.2-3 Section III. Retrain with Error Backpropagation on Quantized Domain; p.3 Section IV. Evaluation; pp.3-4 Section IV.A. Handwritten Digit Recognition; and pp.4-5 Section IV.B. Phoneme Recognition).).  
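The retraining idea summarized above, in which updates are accumulated in stored high-precision weights while the network itself uses low-precision weights, can be sketched for a single linear neuron. This is an illustrative toy under stated assumptions, not Hwang's algorithm as published: the ternary quantizer, the squared-error loss, the learning rate, and all numeric values are hypothetical.

```python
def ternary(w, step):
    """Quantize w to one of {-step, 0, +step}."""
    k = max(-1, min(1, round(w / step)))
    return k * step

def train_step(w_fp, x, target, step, lr=0.1):
    """One update of a linear neuron y = sum(q_i * x_i) trained on 0.5 * (y - target)**2."""
    w_q = [ternary(w, step) for w in w_fp]      # forward pass uses quantized weights
    y = sum(q * xi for q, xi in zip(w_q, x))
    err = y - target                            # dLoss/dy for the squared-error loss
    # The gradient is computed with the quantized weights, but the update is
    # accumulated in the high-precision copy so that small changes are not lost.
    return [w - lr * err * xi for w, xi in zip(w_fp, x)]

w_fp = [0.2, -0.6]                              # hypothetical high-precision weights
for _ in range(20):
    w_fp = train_step(w_fp, x=[1.0, 1.0], target=0.5, step=0.5)

w_q = [ternary(w, 0.5) for w in w_fp]           # low-precision weights after training
print(w_fp, w_q)
```

Keeping the high-precision copy matters because a single update is usually smaller than one quantization step, so applying it directly to the ternary weights would often change nothing.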
Regarding Claim 8, 
Hwang in view of Guiasu, in further view of Miyashita teaches
The method of claim 1, 
wherein the set of floating point data includes a set of weights (Examiner’s note: As indicated earlier, Hwang teaches quantizing a feedforward deep neural network with multiple hidden layers using 3-bit fixed-point ternary weights, where the floating-point values for the extracted weights and activation signals at each layer are being quantized. Hwang further teaches extracting from each layer k a signal vector yk and weight matrix Wk to determine the corresponding next layer signal vector and weight matrix, where the signal vector of the current layer is based on applying the bias, weight matrix of the current layer, and the activation function. Hence, the identification of these high-precision floating-point signal vectors and weight matrices at each layer of a floating-point based neural network corresponds to a process of obtaining a set of floating-point weight data from each layer in a neural network (Hwang p.1 Figure 1 and Abstract; p.1 Section I. Introduction; p.2 Section II. Direct Quantization with Exhaustive Search; p.3 col.1 1st-3rd paragraphs (Section II); p.4 2nd paragraph (Section IV.A.); and p.5 Section V. Concluding Remarks).) …
… weight quantization levels assigned, using an entropy-based … quantization method, to data values corresponding to the set of weights (Examiner’s note: Under its broadest reasonable interpretation, this limitation broadly recites assigning and adjusting a set of weights based on an entropy-based quantization method. As indicated earlier, Hwang teaches identifying and determining an initial grouping for weights and activation signals based on their complexity, range, and quantization sensitivity, where the complexity, range, and quantization sensitivity are interpreted as measurements of information content (“entropy”) present in the data. Hwang further teaches performing a greedy algorithm to determine the optimum quantization step size by performing and testing iterative adjustments around the initial step size to minimize the L2-based output error of the network, where for the case of classification, minimizing the output error of the network involves minimizing a mean cross-entropy through application of a modified backpropagation algorithm that accumulates the weight changes learned over each quantized step adjustment. This process of performing iterative adjustments around the initial step size to determine the optimum quantization step size that minimizes the L2-based output error of the network through a backpropagation algorithm corresponds to a process for adjusting quantization levels assigned to the data values based on an entropy. This process is continued for each layer in the network. Hence this process that assigns and adjusts a set of quantization levels for a set of weights based on an optimum quantization step size that minimizes the mean cross-entropy output error of the network also corresponds to a process that assigns and adjusts a set of weights based on an entropy-based quantization method (Hwang p.1 Section I. Introduction; p.2 Section II. Direct Quantization with Exhaustive Search; and pp.2-3 Section III. Retrain with Error Backpropagation on Quantized Domain).) …
… using an entropy-based clustering-based quantization method (Examiner’s note: As indicated earlier, Guiasu teaches a grouping algorithm involving grouping data into class intervals based on data complexity, where additional factors such as information content and class homogeneity are further determined and applied to identify each grouping of data, where the information content and class homogeneity represent weighting factors on the information initially contained in the set of data (representing a “weighted entropy”). Guiasu further teaches groups of data elements in a data set X, and measuring the weighted entropy to identify the amount of information for each group of data within X, where the weighted entropy I(P_n) is based on a relative frequency p(X_i) (expressed as a ratio of the number of data elements per partition/class and the total number of data elements X) and a weight w(X_i) (expressed as a ratio of the sum of all element values per partition/class and the total number of data elements per partition/class). As indicated earlier, Guiasu further teaches determining an optimal number of partitions P_n by identifying a bound that balances the measured information content I(P_n) (“weighted entropy”) with the measured degree of homogeneity H(P_n) within each partition. As indicated earlier, Guiasu additionally teaches identifying and selecting finer partitions than the initial partition, where each selected finer partition successively replaces the earlier identified partition, with its weighted entropy increasing from 0, at the same time the degree of homogeneity decreases towards 0. When combined with the teachings found in the Hwang reference, this data set X represents the initial groupings of a set of floating point data including a subset of floating point weight values, where these subset of floating-point weights are initially grouped into a set of classes based on complexity, and further adjusted to reach a bound that balances the weighted entropy with the degree of homogeneity within each partition. Hence, this process of grouping weight data values based on data complexity and measuring information content I(P_n) for each group of weight data values through calculation of a weighted entropy metric for further determination and adjustment of the groups of weight data values corresponds to a process that uses an entropy based clustering based quantization method to further determine the groupings of floating-point weight data values (Guiasu p.63 Section 1. Introduction; and p.64 eqs. 2.1, 2.2, 2.3, and pp.63-64 Section 2. Information balance for weighted data; and p.64 4th paragraph-p.65 4th paragraph).) …
… data values … that are adjusted based on a weight weighted entropy (Examiner’s note: As indicated earlier, Guiasu further teaches determining an optimal number of partitions P_n by identifying a bound that balances the measured information content I(P_n) (“weighted entropy”) with the measured degree of homogeneity H(P_n) within each partition. Guiasu additionally teaches identifying and selecting finer partitions than the initial partition, where each selected finer partition successively replaces the earlier identified partition, with its weighted entropy increasing from 0, at the same time the degree of homogeneity decreases towards 0. Hence, when combined with the teachings in the Hwang reference, this process of selecting finer partitions than the initial partition in order to reach a bound that balances the weighted entropy with the degree of homogeneity within each partition corresponds to a process where the assigned quantization levels are adjusted based on an activation weighted entropy (Guiasu pp.64 4th paragraph-p.65 4th paragraph: “… The grouping of the initial raw data set X in the subsets of partition (2.1) is characterized by the information balance, where I(X) … measures the amount of information contained by the initial raw data … I(P_n) … measures the amount of information contained by the class making up the partition P_n, and H(P_n) … measures the degree of homogeneity of the partition P_n … If P* is a finer partition than P then we have I(P*) ≥ I(P) and H(P*) ≤ H(P) … Remark. Successively replacing a partition by a finer one, I(P_n) will increase from 0 … while H(P_n) will decrease from I(X) to 0.”; p.66 Section 3. The trade-off between information and homogeneity: “By choosing a partition P_n, we settle a trade-off between the amount of information supplied by the selected classes and the degree of homogeneity of these classes …”).) …
… the data values corresponding to the set of weights are quantized in accordance with a respective size (Examiner’s note: As established earlier, this limitation exhibits a 112(a) lack of written description issue, and hence for purposes of examination, this limitation will be interpreted as broadly reciting that the data values corresponding to a set of weights are quantized in accordance to a respective size. As indicated earlier, Guiasu teaches a grouping algorithm involving grouping data into class intervals based on data complexity, where additional factors such as information content and class homogeneity are further determined and applied to identify each grouping of data, where the information content and class homogeneity represent weighting factors on the information initially contained in the set of data (representing a “weighted entropy”). Guiasu further teaches groups of data elements in a data set X, and measuring the weighted entropy to identify the amount of information for each group of data within X, where the weighted entropy I(P_n) is based on a relative frequency p(X_i) (expressed as a ratio of the number of data elements per partition/class and the total number of data elements X) and a weight w(X_i) (expressed as a ratio of the sum of all element values per partition/class and the total number of data elements per partition/class), where the weight w(X_i) of a group of data that is based on a sum of all element values and the total number of data elements is interpreted as providing an indication of a respective size for each group of data. As indicated earlier, Guiasu further teaches determining an optimal number of partitions P_n by identifying a bound that balances the measured information content I(P_n) (“weighted entropy”) with the measured degree of homogeneity H(P_n) within each partition. As indicated earlier, Guiasu additionally teaches identifying and selecting finer partitions than the initial partition, where each selected finer partition successively replaces the earlier identified partition, with its weighted entropy increasing from 0, at the same time the degree of homogeneity decreases towards 0. Hence, this process of measuring information content I(P_n) for each group of weight data values through calculation of a weighted entropy metric that uses a respective size for each group of weight data values, such that the weighted entropy further determines the adjustment of the groups of weight data values, corresponds to a process that quantizes the weight data values included in the set of floating point data in accordance to a respective size (Guiasu p.63 Section 1. Introduction; and p.64 eqs. 2.1, 2.2, 2.3, and pp.63-64 Section 2. Information balance for weighted data; and p.64 4th paragraph-p.65 4th paragraph).).
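For illustration only, the weighted entropy calculation attributed to Guiasu in the analysis above may be sketched as follows. The sketch assumes the commonly used weighted entropy form, the negative sum over classes of w(X_i) · p(X_i) · log p(X_i), which is not reproduced from eqs. 2.1-2.3 of the reference. It takes p(X_i) as the class's share of all data elements and w(X_i) as the mean element value of the class, as characterized above. The function name and the sample data are hypothetical, and element values are assumed to be non-negative.

```python
import math
from typing import Sequence


def weighted_entropy(classes: Sequence[Sequence[float]]) -> float:
    """Weighted entropy of a partition, ASSUMING the form -sum_i w_i * p_i * log(p_i).

    p_i: number of elements in class i divided by the total number of elements.
    w_i: sum of the element values in class i divided by the class size (the mean).
    Element values are assumed non-negative (an assumption of this sketch).
    """
    total = sum(len(c) for c in classes)
    if total == 0:
        raise ValueError("partition contains no data elements")
    entropy = 0.0
    for c in classes:
        if not c:
            continue  # an empty class contributes nothing
        p = len(c) / total   # relative frequency p(X_i)
        w = sum(c) / len(c)  # weight w(X_i)
        entropy -= w * p * math.log(p)
    return entropy


# A single class carries no weighted entropy; splitting the data raises it.
coarse = weighted_entropy([[1.0, 1.0, 3.0, 3.0]])
finer = weighted_entropy([[1.0, 1.0], [3.0, 3.0]])
```

In this hypothetical data set, placing all four elements in a single class gives a weighted entropy of 0, and splitting them into two classes gives a positive value, consistent with the statement above that the weighted entropy increases from 0 as finer partitions are selected.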
Regarding Claim 9, 
	Claim 9 recites a non-transitory computer-readable medium storing instructions which, when executed by a processor, cause the processor to implement the claim limitations recited in Claim 1, and hence is rejected under similar rationale and motivations provided by Hwang, Guiasu, and Miyashita as indicated in Claim 1. In addition, Hwang teaches implementing deep neural networks using hardware (VLSI) and software running on embedded computing systems, including quantizing processes that reduce the word-length of weights and activation signals in deep neural networks. A person having ordinary skill in the art would understand that an embedded computing system implementing a neural network using hardware interconnections and arithmetic units, and associated software to perform the training procedure, would have a processor and associated memory storing executable instructions to perform the quantized processes described in Hwang (Hwang p.1 Section I. Introduction: “Implementation of deep neural networks using VLSI or embedded computing systems is needed for real-time and low-power applications …”; and p.5 Section V. Concluding Remarks: “… We have developed a training procedure to reduce the word-length of weights and that of signals in deep neural networks. … The signal word-length that affects the complexity of interconnection and arithmetic units can also be reduced to 3 bits without sacrificing the performance much … This research is useful for not only hardware based implementations but also real-time software development.”).
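For illustration only, the word-length reduction and step-size adjustment attributed to Hwang may be sketched as below. This is a simplified sketch and not Hwang's procedure: it minimizes the L2 error between the original and quantized values rather than the L2-based output error of the network, and it omits retraining by backpropagation. The function names, the candidate scale factors, and the sample values are hypothetical.

```python
from typing import List, Sequence


def quantize(values: Sequence[float], step: float, bits: int) -> List[float]:
    """Uniform symmetric quantizer: nearest multiple of `step`, clipped to the signed range."""
    max_level = 2 ** (bits - 1) - 1  # e.g. levels -3..3 for a 3-bit word
    out = []
    for v in values:
        level = round(v / step)
        level = max(-max_level, min(max_level, level))
        out.append(level * step)
    return out


def l2_error(values: Sequence[float], quantized: Sequence[float]) -> float:
    """Sum of squared differences between original and quantized values."""
    return sum((v - q) ** 2 for v, q in zip(values, quantized))


def search_step(values: Sequence[float], initial_step: float, bits: int,
                scales: Sequence[float] = (0.5, 0.75, 1.0, 1.25, 1.5)) -> float:
    """Test candidate step sizes around `initial_step`; keep the one with the least L2 error.

    The candidate scale factors are an assumption of this sketch.
    """
    best_step = initial_step
    best_err = l2_error(values, quantize(values, initial_step, bits))
    for s in scales:
        step = initial_step * s
        err = l2_error(values, quantize(values, step, bits))
        if err < best_err:
            best_step, best_err = step, err
    return best_step


weights = [-0.75, -0.25, 0.0, 0.25, 0.75]
best = search_step(weights, initial_step=0.5, bits=3)
```

With these sample values, a step size of 0.25 represents every value exactly within the 3-bit range, so the search moves from the initial step size of 0.5 to 0.25.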
Regarding Claim 10,
	Claim 10 recites a neural network apparatus comprising a processor configured to perform claim limitations that are similar in scope to the corresponding claim limitations recited in Claim 1, and hence is rejected under similar rationale and motivations provided by Hwang, Guiasu, and Miyashita as indicated in Claim 1.
Regarding Claim 11,
	Claim 11 recites the apparatus of Claim 10, further comprising claim limitations that are similar in scope to the corresponding claim limitations recited in Claim 2, and hence is rejected under similar rationale provided by Hwang, Guiasu, and Miyashita as indicated in Claim 2, in view of the rejections applied to Claim 10.
Regarding Claim 12,
	Claim 12 recites the apparatus of Claim 10, further comprising claim limitations that are similar in scope to the corresponding claim limitations recited in Claim 3, and hence is rejected under similar rationale provided by Hwang, Guiasu, and Miyashita as indicated in Claim 3, in view of the rejections applied to Claim 10.
Regarding Claim 13,
	Claim 13 recites the apparatus of Claim 12, further comprising claim limitations that are similar in scope to the corresponding claim limitations recited in Claim 4, and hence is rejected under similar rationale provided by Hwang, Guiasu, and Miyashita as indicated in Claim 4, in view of the rejections applied to Claim 12.
Regarding Claim 14,
	Claim 14 recites the apparatus of Claim 12, further comprising claim limitations that are similar in scope to the corresponding claim limitations recited in Claim 5, and hence is rejected under similar rationale provided by Hwang, Guiasu, and Miyashita as indicated in Claim 5, in view of the rejections applied to Claim 12.
Regarding Claim 15,
	Claim 15 recites the apparatus of Claim 10, further comprising claim limitations that are similar in scope to the corresponding claim limitations recited in Claim 6, and hence is rejected under similar rationale provided by Hwang, Guiasu, and Miyashita as indicated in Claim 6, in view of the rejections applied to Claim 10.
Regarding Claim 16,
	Claim 16 recites the apparatus of Claim 10, further comprising claim limitations that are similar in scope to the corresponding claim limitations recited in Claim 8, and hence is rejected under similar rationale provided by Hwang, Guiasu, and Miyashita as indicated in Claim 8, in view of the rejections applied to Claim 10.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to WILLIAM WAI YIN KWAN whose telephone number is 303-297-4332.  The examiner can normally be reached on Monday-Friday 8:00am - 4:30pm PT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li Zhen, can be reached at 571-272-3768. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/WILLIAM WAI YIN KWAN/Examiner, Art Unit 2121   


/Li B. Zhen/Supervisory Patent Examiner, Art Unit 2121