DETAILED ACTION
This is the first office action regarding application number 15/880,690, filed January 26, 2018.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

Specification
The disclosure is objected to because of the following informality: paragraphs [0051]-[0056] refer throughout to a “neural network 10” in Figure 7, but Figure 7 does not contain any element explicitly labeled with the number “10” or as a “neural network”. Appropriate correction is required.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


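Claims 6-11 and 19-24 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.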
Regarding Claim 6, 
The term "determining respective relative frequencies for each of the grouped clusters by respectively dividing a total number of weights included in each of the grouped clusters by a total number of weights included in the set of weights" in claim 6 is indefinite because it is unclear how the term “respectively dividing” modifies the phrase “a total number of weights included in each of the grouped clusters”. Per the Merriam-Webster dictionary, the term “include” means “to take in or comprise as part of a whole or group”; applied to the claim language in claim 6, the phrase is interpreted as “a total number of weights taking in each of the grouped clusters”, which is essentially a different way of stating the original “set of weights”. The specification only re-phrases the same claim language, and hence does not provide any further clarification of the scope of the term in question. For purposes of examination, this term will be interpreted as "determining respective relative frequencies for each of the grouped clusters by dividing a total number of weights included in each of the [respective] grouped clusters by a total number of weights included in the set of weights".
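For illustration only, the interpretation adopted above for purposes of examination can be sketched as a per-cluster count divided by the total count. The function name `relative_frequencies` and the sample cluster assignment are hypothetical and do not come from the application; this is a minimal sketch of the arithmetic, not the applicant's implementation.

```python
from collections import Counter


def relative_frequencies(cluster_ids):
    """Return {cluster: weights_in_cluster / weights_in_set}.

    `cluster_ids` holds one cluster label per weight (hypothetical input format).
    """
    total = len(cluster_ids)
    if total == 0:
        raise ValueError("the set of weights is empty")
    counts = Counter(cluster_ids)
    # Each cluster's own count is divided by the size of the whole set.
    return {cluster: n / total for cluster, n in counts.items()}


# Hypothetical assignment of six weights to three clusters.
freqs = relative_frequencies([0, 0, 0, 1, 1, 2])
```

Under this reading each cluster receives its own quotient, and the quotients sum to one across the clusters.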
Regarding Claims 7-8,
Claims 7-8 are dependent claims of Claim 6, and hence are also rejected as being indefinite under 35 U.S.C. 112(b) or pre-AIA  35 U.S.C. 112, second paragraph, by virtue of dependency.
Regarding Claim 9, 
The term "determining respective relative frequencies for each of the quantization levels by respectively dividing a total number of activations included in each of the quantization levels by a total number of activations included in the set of activations" in claim 9 is indefinite because it is unclear how the term “respectively dividing” modifies the phrase “a total number of activations included in each of the quantization levels”. Per the Merriam-Webster dictionary, the term “include” means “to take in or comprise as part of a whole or group”; applied to the claim language in claim 9, the phrase is interpreted to mean the “total number of activations taking in each of the quantization levels”, which is essentially a different way of stating the original “set of activations”. The specification only re-phrases the same claim language, and hence does not provide any further clarification of the scope of the term in question. For purposes of examination, this term will be interpreted as "determining respective relative frequencies for each of the quantization levels by dividing a total number of activations included in each of the [respective] quantization levels by a total number of activations included in the set of activations".
Regarding Claims 10-11,
Claims 10-11 are dependent claims of Claim 9, and hence are also rejected as being indefinite under 35 U.S.C. 112(b) or pre-AIA  35 U.S.C. 112, second paragraph, by virtue of dependency.
Regarding Claim 19, 
The term "determine respective relative frequencies for each of the grouped clusters by respectively dividing a total number of weights included in each of the grouped clusters by a total number of weights included in the set of weights" in claim 19 is indefinite because it is unclear how the term “respectively dividing” modifies the phrase “a total number of weights included in each of the grouped clusters”. Per the Merriam-Webster dictionary, the term “include” means “to take in or comprise as part of a whole or group”; applied to the claim language in claim 19, the phrase is interpreted as “a total number of weights taking in each of the grouped clusters”, which is essentially a different way of stating the original “set of weights”. The specification only re-phrases the same claim language, and hence does not provide any further clarification of the scope of the term in question. For purposes of examination, this term will be interpreted as "determine respective relative frequencies for each of the grouped clusters by dividing a total number of weights included in each of the [respective] grouped clusters by a total number of weights included in the set of weights".
Regarding Claims 20-21,
Claims 20-21 are dependent claims of Claim 19, and hence are also rejected as being indefinite under 35 U.S.C. 112(b) or pre-AIA  35 U.S.C. 112, second paragraph, by virtue of dependency.
Regarding Claim 22, 
The term "determine respective relative frequencies for each of the quantization levels by respectively dividing a total number of activations included in each of the quantization levels by a total number of activations included in the set of activations" in claim 22 is indefinite because it is unclear how the term “respectively dividing” modifies the phrase “a total number of activations included in each of the quantization levels”. Per the Merriam-Webster dictionary, the term “include” means “to take in or comprise as part of a whole or group”; applied to the claim language in claim 22, the phrase is interpreted to mean the “total number of activations taking in each of the quantization levels”, which is essentially a different way of stating the original “set of activations”. The specification only re-phrases the same claim language, and hence does not provide any further clarification of the scope of the term in question. For purposes of examination, this term will be interpreted as "determine respective relative frequencies for each of the quantization levels by dividing a total number of activations included in each of the [respective] quantization levels by a total number of activations included in the set of activations".
Regarding Claims 23-24,
Claims 23-24 are dependent claims of Claim 22, and hence are also rejected as being indefinite under 35 U.S.C. 112(b) or pre-AIA  35 U.S.C. 112, second paragraph, by virtue of dependency.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


When considering subject matter eligibility under 35 U.S.C. 101, it must be determined whether the claim is directed to one of the four statutory categories of invention, i.e., process, machine, manufacture, or composition of matter (Step 1). If the claim does fall within one of the statutory categories, the second step in the analysis is to determine whether the claim is directed to a judicial exception (Step 2A). The Step 2A analysis is broken into two prongs. In the first prong (Step 2A, Prong 1), it is determined whether or not the claims recite a judicial exception (e.g., mathematical concepts, mental processes, certain methods of organizing human activity). If it is determined in Step 2A, Prong 1 that the claims recite a judicial exception, the analysis proceeds to the second prong (Step 2A, Prong 2), where it is determined whether or not the claims integrate the judicial exception into a practical application. If it is determined at Step 2A, Prong 2 that the claims do not integrate the judicial exception into a practical application, the analysis proceeds to determining whether the claim is a patent-eligible application of the exception (Step 2B). If an abstract idea is present in the claim, any element or combination of elements in the claim must be sufficient to ensure that the claim integrates the judicial exception into a practical application, or else amounts to significantly more than the abstract idea itself. Applicant is advised to consult MPEP 2106 for more details of the analysis.
Claims 1, 3-12, 14-15, and 17-26 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more than the abstract idea itself, and hence is not patent-eligible subject matter. 
Regarding Claim 1, 
Step 1: The claim recites a processor-implemented neural network method, therefore it falls into one of the four statutory categories (i.e., process, machine, article of manufacture, or composition of matter).
Step 2A Prong 1: This claim recites the following abstract ideas:
determining a weighted entropy based on data values included in the set of floating point data (Under its broadest reasonable interpretation, this claim element recites a judicial exception, as determining a weighted entropy based on data values represents a mathematical relationship, as the weighted entropy is a form of organizing information and manipulating information through mathematical correlations. See MPEP 2106.04(a)(2)(I-A), example iv.); 
adjusting quantization levels assigned to the data values based on the weighted entropy (Under its broadest reasonable interpretation, this claim element recites a judicial exception, as adjusting quantization levels assigned to the data values based on the weighted entropy represents a mental step (observations, judgments, evaluations, opinions) that is implementable in the human mind, with aid of pen and paper. See MPEP 2106.04(a)(2)(III).); and 
quantizing the data values included in the set of floating point data in accordance with the adjusted quantization levels (Under its broadest reasonable interpretation, this claim element recites a judicial exception, as quantizing data values in accordance with the adjusted quantization levels represents a mathematical relationship, where the act of quantizing is considered a form of organizing information and manipulating information through mathematical correlations. See MPEP 2106.04(a)(2)(I-A), example iv.).  
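For illustration only, the quantizing step recited above can be sketched as mapping each floating point value onto one of a fixed set of levels. The claim does not state how a value is matched to a level; the nearest-level rule, the function name `quantize`, and the sample values below are assumptions made for this sketch.

```python
def quantize(values, levels):
    """Map each value to the closest entry of `levels`.

    The nearest-level rule is an assumption; the claim language only
    requires quantizing "in accordance with" the adjusted levels.
    """
    if not levels:
        raise ValueError("at least one quantization level is required")
    return [min(levels, key=lambda level: abs(value - level)) for value in values]


# Hypothetical floating point data and hypothetical adjusted levels.
quantized = quantize([0.12, -0.48, 0.91], [-0.5, 0.0, 0.5, 1.0])
```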
Step 2A Prong 2: This claim further recites:
obtaining a set of floating point data processed in a layer included in a neural network (This claim element is directed to gathering data, which is an insignificant extra-solution activity for use in a claimed process. See MPEP 2106.05(g). This additional element does not add a meaningful limitation to the claim, and hence does not integrate the judicial exception into a practical application.); …
Step 2B: This claim further recites:
obtaining a set of floating point data processed in a layer included in a neural network (This claim element is directed to storing and retrieving information in memory, which is a well-known, understood, routine, conventional activity, and hence does not add significantly more than the judicial exception, alone or in combination with other elements in the claim. See MPEP 2106.05(d)(II), list 1, example iv.); …
Regarding Claim 3, 
Step 1: The claim recites the method of claim 1, therefore it falls into one of the four statutory categories (i.e., process, machine, article of manufacture, or composition of matter).
Step 2A Prong 1: This claim is a dependent claim of Claim 1, and hence inherits the same abstract ideas mentioned above. This claim further recites the following abstract idea:
wherein the weighted entropy is determined by applying a weighting factor based on determined sizes of the data values to a determined distribution of the data values included in the set of floating point data (Under its broadest reasonable interpretation, this claim element recites a judicial exception, as applying a weighting factor represents a mathematical relationship, where the weighting factor being based on determined data value sizes and determined data distribution is considered a form of organizing information and manipulating information through mathematical correlations. See MPEP 2106.04(a)(2)(I-A), example iv.).  
Step 2A Prong 2: This claim does not recite any additional elements to be further analyzed at this step.
Step 2B: This claim does not recite any additional elements to be further analyzed at this step.
Regarding Claim 4, 
Step 1: The claim recites the method of claim 1, therefore it falls into one of the four statutory categories (i.e., process, machine, article of manufacture, or composition of matter).
Step 2A Prong 1: This claim is a dependent claim of Claim 1, and hence inherits the same abstract ideas mentioned above.
Step 2A Prong 2: This claim further recites:
wherein the set of floating point data are a set of activations processed in the layer (This claim element places an additional limitation on the type of floating point data (set of activations), as well as generally linking the method to a technological environment. Type definitions and a general association to a technological environment do not further integrate the judicial exception into a practical application. See MPEP 2106.05(h).).  
Step 2B: This claim further recites:
wherein the set of floating point data are a set of activations processed in the layer (As analyzed in Step 2A Prong 2, type definitions and a general association to a technological environment do not further integrate the judicial exception into a practical application. See MPEP 2106.05(h). Hence this claim element does not add significantly more than the judicial exception, alone or in combination with other elements in the claim.).  
Regarding Claim 5, 
Step 1: The claim recites the method of claim 1, therefore it falls into one of the four statutory categories (i.e., process, machine, article of manufacture, or composition of matter).
Step 2A Prong 1: This claim is a dependent claim of Claim 1, and hence inherits the same abstract ideas mentioned above.
Step 2A Prong 2: This claim further recites:
wherein the set of floating point data are a set of weights processed in the layer (This claim element places an additional limitation on the type of floating point data (set of weights), as well as generally linking the method to a technological environment. Type definitions and a general association to a technological environment do not further integrate the judicial exception into a practical application. See MPEP 2106.05(h).).  
Step 2B: This claim further recites:
wherein the set of floating point data are a set of weights processed in the layer (As analyzed in Step 2A Prong 2, type definitions and a general association to a technological environment do not further integrate the judicial exception into a practical application. See MPEP 2106.05(h). Hence this claim element does not add significantly more than the judicial exception, alone or in combination with other elements in the claim.).  
Regarding Claim 6, 
Step 1: The claim recites the method of claim 1, therefore it falls into one of the four statutory categories (i.e., process, machine, article of manufacture, or composition of matter).
Step 2A Prong 1: This claim is a dependent claim of Claim 1, and hence inherits the same abstract ideas mentioned above. This claim further recites the following abstract ideas:
the determining of the weighted entropy comprises:
grouping the set of weights into a plurality of clusters (Under its broadest reasonable interpretation, this claim element recites a judicial exception, as grouping the set of weights into a plurality of clusters represents a mental step (observations, judgments, evaluations, opinions) that is implementable in the human mind, with aid of pen and paper. See MPEP 2106.04(a)(2)(III).);
determining respective relative frequencies for each of the grouped clusters by dividing a total number of weights included in each of the [respective] grouped clusters by a total number of weights included in the set of weights (Under its broadest reasonable interpretation, this claim element recites a judicial exception, as determining respective relative frequencies for each of the grouped clusters by dividing a total number of weights in each of the grouped clusters by a total number of weights represents a mathematical calculation. See MPEP 2106.04(a)(2)(I-C).);
determining respective representative importances of each of the grouped clusters based on sizes of weights included in each of the grouped clusters (Under its broadest reasonable interpretation, this claim element recites a judicial exception, as determining …); and
determining the weighted entropy based on the respective relative frequencies and the respective representative importances (Under its broadest reasonable interpretation, this claim element recites a judicial exception, as determining the weighted entropy represents a mathematical relationship, where expressing the weighted entropy based on the respective relative frequencies and respective representative importances is considered a form of organizing information and manipulating information through mathematical correlations. See MPEP 2106.04(a)(2)(I-A), example iv.).   
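For illustration only, the sequence of claim elements listed above (grouping, relative frequencies, representative importances, weighted entropy) can be sketched numerically. Neither the claim language quoted here nor this action states a formula. The sketch assumes the commonly used form S = -sum(I_n * P_n * log(P_n)), with P_n the relative frequency of cluster n and I_n taken as the mean squared weight of the cluster; both choices, and the function name `weighted_entropy`, are assumptions.

```python
import math


def weighted_entropy(clusters):
    """Weighted entropy of already-grouped weights.

    `clusters` is a list of lists of weights. Assumed form:
    S = -sum(I_n * P_n * log(P_n)), which is not taken from the claims.
    """
    total = sum(len(cluster) for cluster in clusters)
    if total == 0:
        raise ValueError("the set of weights is empty")
    entropy = 0.0
    for cluster in clusters:
        if not cluster:
            continue  # an empty cluster contributes nothing
        p = len(cluster) / total  # relative frequency of the cluster
        # Representative importance: mean squared weight (assumption).
        importance = sum(w * w for w in cluster) / len(cluster)
        entropy -= importance * p * math.log(p)
    return entropy


# Hypothetical grouping of four weights into three clusters.
s = weighted_entropy([[0.1, -0.1], [0.5], [1.0]])
```

With a single cluster the relative frequency is 1 and the sketch returns zero, since log(1) = 0.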
Step 2A Prong 2: This claim further recites:
the set of floating point data is a set of weights (This claim element places an additional limitation on the type of floating point data (set of weights), as well as generally linking the method to a technological environment. Type definitions and a general association to a technological environment do not further integrate the judicial exception into a practical application. See MPEP 2106.05(h).), …
Step 2B: This claim further recites:
the set of floating point data is a set of weights (As analyzed in Step 2A Prong 2, type definitions and a general association to a technological environment do not further integrate the judicial exception into a practical application. See MPEP 2106.05(h). Hence this claim element does not add significantly more than the judicial exception, alone or in combination with other elements in the claim.), …  
Regarding Claim 7, 
Step 1: The claim recites the method of claim 6, therefore it falls into one of the four statutory categories (i.e., process, machine, article of manufacture, or composition of matter).
Step 2A Prong 1: This claim is a dependent claim of Claim 6, and hence inherits the same abstract ideas mentioned above. This claim further recites the following abstract idea:
quantizing the weights included in each of the grouped clusters respectively into the corresponding representative weight for each of the grouped clusters (Under its broadest reasonable interpretation, this claim element recites a judicial exception, as quantizing the weights into the corresponding representative weight for each cluster represents a mathematical relationship, where the act of quantizing is considered a form of organizing information and manipulating information through mathematical correlations. See MPEP 2106.04(a)(2)(I-A), example iv.).
Step 2A Prong 2: This claim further recites:
determining respective weights corresponding to the respective representative importances of each of the grouped clusters as a corresponding representative weight for each of the grouped clusters (This claim element identifies the respective weights corresponding to the respective representative importances of each of the grouped clusters as a corresponding representative weight, as well as generally linking the method to a technological environment. Type definitions and a general association to a technological environment do not further integrate the judicial exception into a practical application. See MPEP 2106.05(h).); …
Step 2B: This claim further recites:
determining respective weights corresponding to the respective representative importances of each of the grouped clusters as a corresponding representative weight for each of the grouped clusters (As analyzed in Step 2A Prong 2, type definitions and a general association to a technological environment do not further integrate the judicial exception into a practical application. See MPEP 2106.05(h). Hence this claim element does not add significantly more than the judicial exception, alone or in combination with other elements in the claim.); … 
Regarding Claim 8, 
Step 1: The claim recites the method of claim 6, therefore it falls into one of the four statutory categories (i.e., process, machine, article of manufacture, or composition of matter).
Step 2A Prong 1: This claim is a dependent claim of Claim 6, and hence inherits the same abstract ideas mentioned above. This claim further recites the following abstract idea:
adjusting the quantization levels assigned to the data values by adjusting boundaries of each of the clusters in a direction that increases the weighted entropy (Under its broadest reasonable interpretation, this claim element recites a judicial exception, as adjusting quantization levels represents a mathematical relationship, where adjusting boundaries of each cluster to increase the value of the weighted entropy is considered a form of organizing information and manipulating information through mathematical correlations. See MPEP 2106.04(a)(2)(I-A), example iv.).
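For illustration only, adjusting cluster boundaries "in a direction that increases the weighted entropy" can be sketched as a greedy search over split positions in a sorted list of weights. The claim does not say how the boundaries are moved; the one-index greedy shifts, the weighted-entropy form S = -sum(I_n * P_n * log(P_n)) with mean squared weight as importance, and all names below are assumptions made for this sketch.

```python
import math


def weighted_entropy(sorted_weights, bounds):
    """Assumed form S = -sum(I_n * P_n * log(P_n)) over clusters cut at `bounds`."""
    edges = [0, *bounds, len(sorted_weights)]
    total = len(sorted_weights)
    s = 0.0
    for lo, hi in zip(edges, edges[1:]):
        n = hi - lo
        if n == 0:
            continue
        p = n / total  # relative frequency of this cluster
        importance = sum(w * w for w in sorted_weights[lo:hi]) / n  # assumption
        s -= importance * p * math.log(p)
    return s


def adjust_boundaries(sorted_weights, bounds, max_passes=100):
    """Shift each boundary by one index whenever that raises the weighted entropy."""
    bounds = list(bounds)
    best = weighted_entropy(sorted_weights, bounds)
    for _ in range(max_passes):
        improved = False
        for k in range(len(bounds)):
            lo = bounds[k - 1] if k > 0 else 0
            hi = bounds[k + 1] if k + 1 < len(bounds) else len(sorted_weights)
            for step in (-1, 1):
                trial = bounds.copy()
                trial[k] += step
                if not lo < trial[k] < hi:
                    continue  # keep every cluster non-empty
                score = weighted_entropy(sorted_weights, trial)
                if score > best:
                    bounds, best, improved = trial, score, True
        if not improved:
            break  # no single shift increases the entropy any further
    return bounds, best


# Hypothetical sorted weights and initial boundaries (three clusters).
weights = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
start = weighted_entropy(weights, [1, 2])
new_bounds, new_score = adjust_boundaries(weights, [1, 2])
```

Because a shift is accepted only when the score strictly increases, the returned score is never lower than the starting score.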
Step 2A Prong 2: This claim does not recite any additional elements to be further analyzed at this step.
Step 2B: This claim does not recite any additional elements to be further analyzed at this step.
Regarding Claim 9, 
Step 1: The claim recites the method of claim 1, therefore it falls into one of the four statutory categories (i.e., process, machine, article of manufacture, or composition of matter).
Step 2A Prong 1: This claim is a dependent claim of claim 1, and hence inherits the same abstract ideas mentioned above. This claim further recites the following abstract ideas:
determining respective relative frequencies for each of the quantization levels by dividing a total number of activations included in each of the [respective] quantization levels by a total number of activations included in the set of activations (Under its broadest reasonable interpretation, this claim element recites a judicial exception, as determining respective relative frequencies for each of the quantization levels by dividing a total …); …
determining the weighted entropy based on the respective relative frequencies and the respective representative importances (Under its broadest reasonable interpretation, this claim element recites a judicial exception, as determining the weighted entropy represents a mathematical relationship, where expressing the weighted entropy based on the respective relative frequencies and respective representative importances is considered a form of organizing information and manipulating information through mathematical correlations. See MPEP 2106.04(a)(2)(I-A), example iv.).  
Step 2A Prong 2: This claim further recites:
the set of floating point data is a set of activations (This claim element places an additional limitation on the type of floating point data (by identifying it as a set of activations), as well as generally linking the method to a technological environment. Type definitions and a general association to a technological environment do not further integrate the judicial exception into a practical application. See MPEP 2106.05(h).), and 
the quantization levels are assigned using an entropy-based logarithm data representation-based quantization method (This claim element is considered a form of applying mere instructions on a generic computer to implement a judicial exception. See MPEP 2106.05(f). This additional element does not add a meaningful limitation to the claim, and hence does not integrate the judicial exception into a practical application.), …
determining respective data values corresponding to each of the quantization levels as respective representative importances of each of the quantization levels (This claim element identifies the respective data values corresponding to each of the quantization levels as respective representative importances of each of the quantization levels, as well as generally linking the method to a technological environment. Type definitions and a general association to a technological environment do not further integrate the judicial exception into a practical application. See MPEP 2106.05(h).); …
Step 2B: This claim further recites:
the set of floating point data is a set of activations (As analyzed in Step 2A Prong 2, type definitions and a general association to a technological environment do not further integrate the judicial exception into a practical application. See MPEP 2106.05(h). Hence this claim element does not add significantly more than the judicial exception, alone or in combination with other elements in the claim.), and 
the quantization levels are assigned using an entropy-based logarithm data representation-based quantization method (As analyzed in Step 2A Prong 2, applying mere instructions on a generic computer to implement a judicial exception does not integrate the judicial exception into a practical application. See MPEP 2106.05(f). Hence this claim element does not add significantly more than the judicial exception, alone or in combination with other elements in the claim.), …
determining respective data values corresponding to each of the quantization levels as respective representative importances of each of the quantization levels (As analyzed in Step 2A Prong 2, type definitions and a general association to a technological environment do not further integrate the judicial exception into a practical application. See MPEP 2106.05(h). Hence this claim element does not add significantly more than the judicial exception, alone or in combination with other elements in the claim.); …
Regarding Claim 10, 
Step 1: The claim recites the method of claim 9, therefore it falls into one of the four statutory categories (i.e., process, machine, article of manufacture, or composition of matter).
Step 2A Prong 1: This claim is a dependent claim of Claim 9, and hence inherits the same abstract ideas mentioned above. This claim further recites the following abstract idea:
adjusting the quantization levels assigned to the respective data values by adjusting a value corresponding to a first quantization level among the quantization levels and a size of an interval between the quantization levels in a direction of increasing the weighted entropy (Under its broadest reasonable interpretation, this claim element recites a judicial exception, as adjusting quantization levels represents a mathematical relationship, where adjusting a value corresponding to a first quantization level and a size of an interval between the quantization levels to increase the value of the weighted entropy is considered a form of organizing information and manipulating information through mathematical correlations. See MPEP 2106.04(a)(2)(I-A), example iv.).
Step 2A Prong 2: This claim does not recite any additional elements to be further analyzed at this step.
Step 2B: This claim does not recite any additional elements to be further analyzed at this step.
Regarding Claim 11, 
Step 1: The claim recites the method of claim 9, therefore it falls into one of the four statutory categories (i.e., process, machine, article of manufacture, or composition of matter).
Step 2A Prong 1: This claim is a dependent claim of Claim 9, and hence inherits the same abstract ideas mentioned above. This claim further recites the following abstract idea:
adjusting a log base, which is controlling of the quantization levels, in a direction that maximizes the weighted entropy (Under its broadest reasonable interpretation, this claim element recites a judicial exception, as adjusting a log base that controls the quantization levels represents a mathematical relationship, where adjusting a log base to maximize the weighted entropy is considered a form of organizing information and manipulating information through mathematical correlations. See MPEP 2106.04(a)(2)(I-A), example iv.).
Step 2A Prong 2: This claim does not recite any additional elements to be further analyzed at this step.
Step 2B: This claim does not recite any additional elements to be further analyzed at this step.
Regarding Claim 12, 
Step 1: The claim recites the method of claim 1, therefore it falls into one of the four statutory categories (i.e., process, machine, article of manufacture, or composition of matter).
Step 2A Prong 1: This claim is a dependent claim of claim 1, and hence inherits the same abstract ideas mentioned above. This claim further recites the following abstract ideas:
[the] … determining … [are performed with respect to each of a plurality of layers included in the neural network] (Under its broadest reasonable interpretation, this claim element recites a judicial exception, as determining a weighted entropy based on data values for each of the plurality of layers represents a mathematical relationship, as the weighted entropy is a form of organizing information and manipulating information through mathematical correlations. See MPEP 2106.04(a)(2)(I-A), example iv.) …
[the] … adjusting … [are performed with respect to each of a plurality of layers included in the neural network] (Under its broadest reasonable interpretation, this claim element recites a judicial exception, as adjusting quantization levels assigned to the data values for each of the plurality of layers based on the weighted entropy represents a mental step (observations, judgments, evaluations, opinions) that is implementable in the human mind, with aid of pen and paper. See MPEP 2106.04(a)(2)(III).) …
[the] … quantizing are performed with respect to each of a plurality of layers included in the neural network (Under its broadest reasonable interpretation, this claim element recites a judicial exception, as quantizing data values in accordance with the adjusted quantization levels for each of the plurality of layers represents a mathematical relationship, where the act of quantizing is considered a form of organizing information and manipulating information through mathematical correlations. See MPEP 2106.04(a)(2)(I-A), example iv.),
with respective adjusted quantization levels being optimized and assigned for each of the plurality of layers (Under its broadest reasonable interpretation, this claim element recites a judicial exception, as optimizing and assigning respective adjusted quantization levels for each of the plurality of layers represents a mental step (observations, judgments, evaluations, opinions) that is implementable in the human mind, with aid of pen and paper. See MPEP 2106.04(a)(2)(III).).
Step 2A Prong 2: This claim further recites:
the obtaining … [are performed with respect to each of a plurality of layers included in the neural network] (This claim element is directed to gathering data for each of the plurality of layers, which is an insignificant extra-solution activity for use in a claimed process. See MPEP 2106.05(g). This additional element does not add a meaningful limitation to the claim, and hence does not integrate the judicial exception into a practical application.).
Step 2B: This claim further recites:
the obtaining … [are performed with respect to each of a plurality of layers included in the neural network] (This claim element is directed to storing and retrieving information in memory, which is a well-known, understood, routine, conventional activity, and hence does not add significantly more than the judicial exception, alone or in combination with other elements in the claim. See MPEP 2106.05(d)(II), list 1, example iv.).
Regarding Claim 14, 
Step 1: The claim recites a computer-readable recording medium storing instructions. While the specification does recite examples of non-transitory computer-readable recording medium, it does not explicitly exclude transitory computer-readable recording medium from this category (see specification, paragraph [0198]). Under broadest reasonable interpretation, this claim element may include the category of transitory computer-readable recording medium; therefore this claim does not fall into one of the four statutory categories (i.e., process, machine, article of manufacture, or composition of matter), and hence is rejected as being ineligible subject matter.
Regarding Claim 15, 
Step 1: The claim recites a neural network apparatus, therefore it falls into one of the four statutory categories (i.e., process, machine, article of manufacture, or composition of matter).
Step 2A Prong 1: This claim recites the following mental processes:
determine a weighted entropy based on data values included in the set of floating point data (Under its broadest reasonable interpretation, this claim element recites a judicial exception, as determining a weighted entropy based on data values represents a mathematical relationship, as the weighted entropy is a form of organizing information and manipulating information through mathematical correlations. See MPEP 2106.04(a)(2)(I-A), example iv.); 
adjust quantization levels assigned to the data values based on the weighted entropy (Under its broadest reasonable interpretation, this claim element recites a judicial exception, as adjusting quantization levels assigned to the data values based on the weighted entropy represents a mental step (observations, judgments, evaluations, opinions) that is implementable in the human mind, with aid of pen and paper. See MPEP 2106.04(a)(2)(III).); and 
quantize the data values included in the set of floating point data in accordance with the adjusted quantization levels (Under its broadest reasonable interpretation, this claim element recites a judicial exception, as quantizing data values in accordance with the adjusted quantization levels represents a mathematical relationship, where the act of quantizing is considered a form of organizing information and manipulating information through mathematical correlations. See MPEP 2106.04(a)(2)(I-A), example iv.).  
Step 2A Prong 2: This claim further recites:
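a processor configured to (This claim element identifies the processor performing operations specific to the neural network apparatus, as well as generally linking the method to a technological environment. Type definitions and a general association to a technological environment do not further integrate the judicial exception into a practical application. See MPEP 2106.05(h).): 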
obtain a set of floating point data processed in a layer included in a neural network (This claim element is directed to gathering data, which is an insignificant extra-solution activity for use in a claimed process. See MPEP 2106.05(g). This additional element does not add a meaningful limitation to the claim, and hence does not integrate the judicial exception into a practical application.); …
Step 2B: This claim further recites:
a processor configured to (As analyzed in Step 2A Prong 2, type definitions and a general association to a technological environment do not further integrate the judicial exception into a practical application. See MPEP 2106.05(h). Hence this claim element does not add significantly more than the judicial exception, alone or in combination with other elements in the claim.): 
obtain a set of floating point data processed in a layer included in a neural network (This claim element is directed to storing and retrieving information in memory, which is a well-known, understood, routine, conventional activity, and hence does not add significantly more than the judicial exception, alone or in combination with other elements in the claim. See MPEP 2106.05(d)(II), list 1, example iv.); …
Regarding Claim 17, 
Step 1: The claim recites the apparatus of claim 15, therefore it falls into one of the four statutory categories (i.e., process, machine, article of manufacture, or composition of matter).
Step 2A Prong 1: This claim is a dependent claim of claim 15, and hence inherits the same abstract ideas mentioned above. This claim further recites the following abstract idea:
wherein the weighted entropy is determined by applying a weighting factor based on determined sizes of the data values to a determined distribution of the data values included in the set of floating point data (Under its broadest reasonable interpretation, this claim element recites a judicial exception, as applying a weighting factor represents a mathematical relationship, where the weighting factor being based on determined data value sizes and determined data distribution is considered a form of organizing information and manipulating information through mathematical correlations. See MPEP 2106.04(a)(2)(I-A), example iv.).  
Step 2A Prong 2: This claim does not recite any additional elements to be further analyzed at this step.
Step 2B: This claim does not recite any additional elements to be further analyzed at this step.
Regarding Claim 18, 
Step 1: The claim recites the apparatus of claim 15, therefore it falls into one of the four statutory categories (i.e., process, machine, article of manufacture, or composition of matter).
Step 2A Prong 1: This claim is a dependent claim of claim 15, and hence inherits the same abstract ideas mentioned above.
Step 2A Prong 2: This claim further recites:
wherein the set of floating point data comprises a set of activations processed in the layer or a set of weights processed in the layer (This claim element places an additional limitation on the type of floating point data (set of activations processed in the layer or set of weights processed in the layer), as well as generally linking the method to a technological environment. Type definitions and a general association to a technological environment do not further integrate the judicial exception into a practical application. See MPEP 2106.05(h).).  
Step 2B: This claim further recites:
wherein the set of floating point data comprises a set of activations processed in the layer or a set of weights processed in the layer (As analyzed in Step 2A Prong 2, type definitions and a general association to a technological environment do not further integrate the judicial exception into a practical application. See MPEP 2106.05(h). Hence this claim element does not add significantly more than the judicial exception, alone or in combination with other elements in the claim.).  
Regarding Claim 19, 
Step 1: The claim recites the apparatus of claim 15, therefore it falls into one of the four statutory categories (i.e., process, machine, article of manufacture, or composition of matter).
Step 2A Prong 1: This claim is a dependent claim of Claim 15, and hence inherits the same abstract ideas mentioned above. This claim further recites the following abstract ideas:
group the set of weights into a plurality of clusters (Under its broadest reasonable interpretation, this claim element recites a judicial exception, as grouping the set of weights into a plurality of clusters represents a mental step (observations, judgments, evaluations, opinions) that is implementable in the human mind, with aid of pen and paper. See MPEP 2106.04(a)(2)(III).);
determine respective relative frequencies for each of the grouped clusters by dividing a total number of weights included in each of the [respective] grouped clusters by a total number of weights included in the set of weights (Under its broadest reasonable interpretation, this claim element recites a judicial exception, as determining respective relative frequencies for each of the grouped clusters by dividing a total number of weights in each of the grouped clusters by a total number of weights represents a mathematical calculation. See MPEP 2106.04(a)(2)(I-C).); 
determine respective representative importances of each of the grouped clusters based on sizes of weights included in each of the grouped clusters (Under its broadest reasonable interpretation, this claim element recites a judicial exception, as determining respective representative importances of each cluster represents a mathematical relationship, where the respective representative importances being based on weight size is considered a form of organizing information and manipulating information through mathematical correlations. See MPEP 2106.04(a)(2)(I-A), example iv.); and 
determine the weighted entropy based on the respective relative frequencies and the respective representative importances (Under its broadest reasonable interpretation, this claim element recites a judicial exception, as determining the weighted entropy represents a mathematical relationship, where expressing the weighted entropy based on the respective relative frequencies and respective representative importances is considered a form of organizing information and manipulating information through mathematical correlations. See MPEP 2106.04(a)(2)(I-A), example iv.).   
Step 2A Prong 2: This claim further recites:
the processor is further configured to (This claim element identifies the processor performing operations specific to the neural network apparatus, as well as generally linking the method to a technological environment. Type definitions and a general association to a technological environment do not further integrate the judicial exception into a practical application. See MPEP 2106.05(h).): …
the set of floating point data is a set of weights (This claim element places an additional limitation on the type of floating point data (set of weights), as well as generally linking the method to a technological environment. Type definitions and a general association to a technological environment do not further integrate the judicial exception into a practical application. See MPEP 2106.05(h).), …
Step 2B: This claim further recites:
the processor is further configured to (As analyzed in Step 2A Prong 2, type definitions and a general association to a technological environment do not further integrate the judicial exception into a practical application. See MPEP 2106.05(h). Hence this claim element does not add significantly more than the judicial exception, alone or in combination with other elements in the claim.): …
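the set of floating point data is a set of weights (As analyzed in Step 2A Prong 2, type definitions and a general association to a technological environment do not further integrate the judicial exception into a practical application. See MPEP 2106.05(h). Hence this claim element does not add significantly more than the judicial exception, alone or in combination with other elements in the claim.), …  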
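For illustration only, the calculation recited in claim 19 can be sketched in Python as follows. The sketch assumes the weighted entropy takes the form S = -Σ I_n P_n log P_n, and it assumes, as a hypothetical choice that the claim language does not fix, that the representative importance I_n of a cluster is the mean squared magnitude of the weights in that cluster. The function names are illustrative and do not appear in the application.

```python
import math


def relative_frequencies(clusters):
    """P_n = (number of weights in cluster n) / (number of weights in the set)."""
    total = sum(len(c) for c in clusters)
    return [len(c) / total for c in clusters]


def representative_importances(clusters):
    """I_n based on weight sizes; mean squared magnitude is an ASSUMED choice."""
    return [sum(w * w for w in c) / len(c) if c else 0.0 for c in clusters]


def weighted_entropy(clusters):
    """S = -sum_n I_n * P_n * log(P_n); empty clusters contribute nothing."""
    freqs = relative_frequencies(clusters)
    imps = representative_importances(clusters)
    return -sum(i * p * math.log(p) for i, p in zip(imps, freqs) if p > 0)


# Two clusters of two weights each: P = [0.5, 0.5], I = [1.0, 4.0].
example = [[1.0, -1.0], [2.0, 2.0]]
print(weighted_entropy(example))
```

Under these assumptions the example evaluates to 2.5 × ln 2, since each cluster holds half of the weights and the importances are 1 and 4.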
Regarding Claim 20, 
Step 1: The claim recites the apparatus of claim 19, therefore it falls into one of the four statutory categories (i.e., process, machine, article of manufacture, or composition of matter).
Step 2A Prong 1: This claim is a dependent claim of Claim 19, and hence inherits the same abstract ideas mentioned above. This claim further recites the following abstract idea:
quantize the weights included in each of the grouped clusters respectively into the corresponding representative weight for each of the grouped clusters (Under its broadest reasonable interpretation, this claim element recites a judicial exception, as quantizing the weights into the corresponding representative weight for each cluster represents a mathematical relationship, where the act of quantizing is considered a form of organizing information and manipulating information through mathematical correlations. See MPEP 2106.04(a)(2)(I-A), example iv.).  
Step 2A Prong 2: This claim further recites:
the processor is further configured to (This claim element identifies the processor performing operations specific to the neural network apparatus, as well as generally linking the method to a technological environment. Type definitions and a general association to a technological environment do not further integrate the judicial exception into a practical application. See MPEP 2106.05(h).): …
determine respective weights corresponding to the respective representative importances of each of the grouped clusters as a corresponding representative weight for each of the grouped clusters (This claim element identifies respective weights corresponding to the respective representative importances of each of the grouped clusters as a corresponding representative weight, as well as generally linking the method to a technological environment. Type definitions and a general association to a technological environment do not further integrate the judicial exception into a practical application. See MPEP 2106.05(h).); … 
Step 2B: This claim further recites:
the processor is further configured to (As analyzed in Step 2A Prong 2, type definitions and a general association to a technological environment do not further integrate the judicial exception into a practical application. See MPEP 2106.05(h). Hence this claim element does not add significantly more than the judicial exception, alone or in combination with other elements in the claim.): …
determine respective weights corresponding to the respective representative importances of each of the grouped clusters as a corresponding representative weight for each of the grouped clusters (As analyzed in Step 2A Prong 2, type definitions and a general association to a technological environment do not further integrate the judicial exception into a practical application. See MPEP 2106.05(h). Hence this claim element does not add significantly more than the judicial exception, alone or in combination with other elements in the claim.); … 
Regarding Claim 21, 
Step 1: The claim recites the apparatus of claim 19, therefore it falls into one of the four statutory categories (i.e., process, machine, article of manufacture, or composition of matter).
Step 2A Prong 1: This claim is a dependent claim of Claim 19, and hence inherits the same abstract ideas mentioned above. This claim further recites the following abstract idea:
adjust the quantization levels assigned to the data values by adjusting boundaries of each of the clusters in a direction that increases the weighted entropy (Under its broadest reasonable interpretation, this claim element recites a judicial exception, as adjusting quantization levels represents a mathematical relationship, where adjusting boundaries of each cluster to increase the value of the weighted entropy is considered a form of organizing information and manipulating information through mathematical correlations. See MPEP 2106.04(a)(2)(I-A), example iv.).
Step 2A Prong 2: This claim further recites:
the processor is further configured to (This claim element identifies the processor performing operations specific to the neural network apparatus, as well as generally linking the method to a technological environment. Type definitions and a general association to a technological environment do not further integrate the judicial exception into a practical application. See MPEP 2106.05(h).): …
Step 2B: This claim further recites:
the processor is further configured to (As analyzed in Step 2A Prong 2, type definitions and a general association to a technological environment do not further integrate the judicial exception into a practical application. See MPEP 2106.05(h). Hence this claim element does not add significantly more than the judicial exception, alone or in combination with other elements in the claim.): …
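For illustration only, the boundary adjustment recited in claim 21 can be sketched as a greedy search. The sketch assumes the weights are sorted by magnitude, that cluster boundaries are indices into that sorted list, and that each boundary is moved by one position whenever the move increases a weighted entropy of the assumed form S = -Σ I_n P_n log P_n, with I_n taken as the mean squared weight of cluster n. The search strategy, function names, and parameters are hypothetical and are not taken from the application.

```python
import math


def weighted_entropy(sorted_w, bounds):
    """Weighted entropy of clusters sorted_w[b_i:b_{i+1}] (assumed form)."""
    total = len(sorted_w)
    s = 0.0
    for lo, hi in zip(bounds, bounds[1:]):
        cluster = sorted_w[lo:hi]
        p = len(cluster) / total                           # relative frequency
        imp = sum(w * w for w in cluster) / len(cluster)   # assumed importance
        s -= imp * p * math.log(p)
    return s


def adjust_boundaries(weights, n_clusters, max_passes=50):
    """Greedily shift boundaries one index at a time while S increases."""
    if not 1 <= n_clusters <= len(weights):
        raise ValueError("need 1 <= n_clusters <= number of weights")
    w = sorted(weights, key=abs)
    n = len(w)
    # Start from (roughly) equal-sized clusters; every cluster is non-empty.
    bounds = [i * n // n_clusters for i in range(n_clusters + 1)]
    best = weighted_entropy(w, bounds)
    for _ in range(max_passes):
        improved = False
        for i in range(1, n_clusters):
            for delta in (-1, 1):
                cand = bounds[i] + delta
                if bounds[i - 1] < cand < bounds[i + 1]:   # keep clusters non-empty
                    trial = bounds[:i] + [cand] + bounds[i + 1:]
                    s = weighted_entropy(w, trial)
                    if s > best + 1e-12:
                        bounds, best, improved = trial, s, True
        if not improved:
            break
    return w, bounds, best
```

A call such as `adjust_boundaries([0.1, -0.2, 0.05, 0.9, -1.1, 0.3, 0.4, -0.6], 2)` returns the sorted weights, the final boundary indices, and the resulting weighted entropy, which is never lower than the value at the starting boundaries.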
Regarding Claim 22, 
Step 1: The claim recites the apparatus of claim 15, therefore it falls into one of the four statutory categories (i.e., process, machine, article of manufacture, or composition of matter).
Step 2A Prong 1: This claim is a dependent claim of claim 15, and hence inherits the same abstract ideas mentioned above. This claim further recites the following abstract ideas:
determine respective relative frequencies for each of the quantization levels by dividing a total number of activations included in each of the [respective] quantization levels by a total number of activations included in the set of activations (Under its broadest reasonable interpretation, this claim element recites a judicial exception, as determining respective relative frequencies for each of the quantization levels by dividing a total number of activations included in each of the quantization levels by a total number of activations represents a mathematical calculation. See MPEP 2106.04(a)(2)(I-C).); …
determine the weighted entropy based on the respective relative frequencies and the respective representative importances (Under its broadest reasonable interpretation, this claim element recites a judicial exception, as determining the weighted entropy represents a mathematical relationship, where expressing the weighted entropy based on the respective relative frequencies and respective representative importances is considered a form of organizing information and manipulating information through mathematical correlations. See MPEP 2106.04(a)(2)(I-A), example iv.).  
Step 2A Prong 2: This claim further recites:
the set of floating point data is a set of activations (This claim element places an additional limitation on the type of floating point data (by identifying it as a set of activations), as well as generally linking the method to a technological environment. Type definitions and a general association to a technological environment do not further integrate the judicial exception into a practical application. See MPEP 2106.05(h).), and 
the quantization levels are assigned using an entropy-based logarithm data representation-based quantization method (This claim element is considered a form of applying mere instructions on a generic computer to implement a judicial exception. See MPEP 2106.05(f). This additional element does not add a meaningful limitation to the claim, and hence does not integrate the judicial exception into a practical application.), and
the processor is further configured to (This claim element identifies the processor performing operations specific to the neural network apparatus, as well as generally linking the method to a technological environment. Type definitions and a general association to a technological environment do not further integrate the judicial exception into a practical application. See MPEP 2106.05(h).): …
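determine respective data values corresponding to each of the quantization levels as respective representative importances of each of the quantization levels (This claim element identifies the respective data values corresponding to each of the quantization levels as respective representative importances of each of the quantization levels, as well as generally linking the method to a technological environment. Type definitions and a general association to a technological environment do not further integrate the judicial exception into a practical application. See MPEP 2106.05(h).); …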
Step 2B: This claim further recites:
the set of floating point data is a set of activations (As analyzed in Step 2A Prong 2, type definitions and a general association to a technological environment do not further integrate the judicial exception into a practical application. See MPEP 2106.05(h). Hence this claim element does not add significantly more than the judicial exception, alone or in combination with other elements in the claim.), and 
the quantization levels are assigned using an entropy-based logarithm data representation-based quantization method (As analyzed in Step 2A Prong 2, applying mere instructions on a generic computer to implement a judicial exception does not integrate the judicial exception into a practical application. See MPEP 2106.05(f). Hence this claim element does not add significantly more than the judicial exception, alone or in combination with other elements in the claim.), and
the processor is further configured to (As analyzed in Step 2A Prong 2, type definitions and a general association to a technological environment do not further integrate the judicial exception into a practical application. See MPEP 2106.05(h). Hence this claim element does not add significantly more than the judicial exception, alone or in combination with other elements in the claim.): …
determine respective data values corresponding to each of the quantization levels as respective representative importances of each of the quantization levels (As analyzed in Step 2A Prong 2, type definitions and a general association to a technological environment do not further integrate the judicial exception into a practical application. See MPEP 2106.05(h). Hence this claim element does not add significantly more than the judicial exception, alone or in combination with other elements in the claim.); …
Regarding Claim 23, 
Step 1: The claim recites the apparatus of claim 22, therefore it falls into one of the four statutory categories (i.e., process, machine, article of manufacture, or composition of matter).
Step 2A Prong 1: This claim is a dependent claim of Claim 22, and hence inherits the same abstract ideas mentioned above. This claim further recites the following abstract idea:
adjust the quantization levels assigned to the respective data values by adjusting a value corresponding to a first quantization level among the quantization levels and a size of an interval between the quantization levels in a direction of increasing the weighted entropy (Under its broadest reasonable interpretation, this claim element recites a judicial exception, as adjusting quantization levels represents a mathematical relationship, where adjusting a value corresponding to a first quantization level and a size of an interval between the quantization level to increase the value of the weighted entropy is considered a form of organizing information and manipulating information through mathematical correlations. See MPEP 2106.04(a)(2)(I-A), example iv.).
Step 2A Prong 2: This claim further recites:
the processor is further configured to (This claim element identifies the processor performing operations specific to the neural network apparatus, as well as generally linking the method to a technological environment. Type definitions and a general association to a technological environment do not further integrate the judicial exception into a practical application. See MPEP 2106.05(h).): …
Step 2B: This claim further recites:
the processor is further configured to (As analyzed in Step 2A Prong 2, type definitions and a general association to a technological environment do not further integrate the judicial exception into a practical application. See MPEP 2106.05(h). Hence this claim element does not add significantly more than the judicial exception, alone or in combination with other elements in the claim.): …
Regarding Claim 24, 
Step 1: The claim recites the apparatus of claim 22, therefore it falls into one of the four statutory categories (i.e., process, machine, article of manufacture, or composition of matter).
Step 2A Prong 1: This claim is a dependent claim of claim 22, and hence inherits the same abstract ideas mentioned above. This claim further recites the following abstract ideas:
adjust the quantization levels by adjusting a log base, which is controlling of the quantization levels, in a direction that maximizes the weighted entropy (Under its broadest reasonable interpretation, this claim element recites a judicial exception, as adjusting a log base that controls the quantization levels represents a mathematical relationship, where adjusting a log base to maximize the weighted entropy is considered a form of organizing information and manipulating information through mathematical correlations. See MPEP 2106.04(a)(2)(I-A), example iv.).
Step 2A Prong 2: This claim further recites:
the processor is further configured to (This claim element identifies the processor performing operations specific to the neural network apparatus, as well as generally linking the method to a technological environment. Type definitions and a general association to a technological environment do not further integrate the judicial exception into a practical application. See MPEP 2106.05(h).): …
Step 2B: This claim further recites:
the processor is further configured to (As analyzed in Step 2A Prong 2, type definitions and a general association to a technological environment do not further integrate the judicial exception into a practical application. See MPEP 2106.05(h). Hence this claim element does not add significantly more than the judicial exception, alone or in combination with other elements in the claim.): …
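For illustration only, the log-base adjustment recited in claims 22 and 24 can be sketched as a search over candidate bases. The sketch assumes positive activations normalized to (0, 1], quantization levels with values base^(-k) for k = 0 … L-1, assignment of each activation to the nearest level in the log domain, and a weighted entropy of the assumed form S = -Σ I_k P_k log P_k in which the importance I_k is the value of level k. All names, the level layout, and the candidate-list search are hypothetical and are not taken from the application.

```python
import math


def log_quantize_index(a, base, n_levels):
    """Index of the nearest level base**(-k) in the log domain, clamped."""
    k = round(-math.log(a) / math.log(base))
    return min(max(k, 0), n_levels - 1)


def weighted_entropy_for_base(activations, base, n_levels):
    """Weighted entropy of the level occupancy for one candidate log base."""
    counts = [0] * n_levels
    for a in activations:
        counts[log_quantize_index(a, base, n_levels)] += 1
    total = len(activations)
    s = 0.0
    for k, c in enumerate(counts):
        if c == 0:
            continue                      # empty levels contribute nothing
        p = c / total                     # relative frequency of level k
        importance = base ** (-k)         # ASSUMED: level value as importance
        s -= importance * p * math.log(p)
    return s


def pick_log_base(activations, candidates, n_levels):
    """Return the candidate base that gives the largest weighted entropy."""
    return max(
        candidates,
        key=lambda b: weighted_entropy_for_base(activations, b, n_levels),
    )
```

For example, `pick_log_base([1.0, 0.5, 0.25, 0.125], [1.5, 2.0, 4.0], 4)` evaluates the three candidate bases and returns whichever yields the highest weighted entropy under the stated assumptions.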
Regarding Claim 25, 
Step 1: The claim recites the apparatus of claim 15, therefore it falls into one of the four statutory categories (i.e., process, machine, article of manufacture, or composition of matter).
Step 2A Prong 1: This claim is a dependent claim of claim 15, and hence inherits the same abstract ideas mentioned above. This claim further recites the following abstract ideas:
[perform the] … determining … [are performed with respect to each of a plurality of layers included in the neural network] (Under its broadest reasonable interpretation, this claim element recites a judicial exception, as determining a weighted entropy based on data values for each of the plurality of layers represents a mathematical relationship, as the weighted entropy is a form of organizing information and manipulating information through mathematical correlations. See MPEP 2106.04(a)(2)(I-A), example iv.) …
[perform the] … adjusting … [are performed with respect to each of a plurality of layers included in the neural network] (Under its broadest reasonable interpretation, this claim element recites a judicial exception, as adjusting quantization levels assigned to the data values for each of the plurality of layers based on the weighted entropy represents a mental step (observations, judgments, evaluations, opinions) that is implementable in the human mind, with aid of pen and paper. See MPEP 2106.04(a)(2)(III).) …
[perform the] … quantizing are performed with respect to each of a plurality of layers included in the neural network (Under its broadest reasonable interpretation, this claim element recites a judicial exception, as quantizing data values in accordance with the adjusted quantization levels for each of the plurality of layers represents a mathematical relationship, where the act of quantizing is considered a form of organizing information and manipulating information through mathematical correlations. See MPEP 2106.04(a)(2)(I-A), example iv.), 
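with respective adjusted quantization levels being optimized and assigned for each of the plurality of layers (Under its broadest reasonable interpretation, this claim element recites a judicial exception, as optimizing and assigning respective adjusted quantization levels for each of the plurality of layers represents a mental step (observations, judgments, evaluations, opinions) that is implementable in the human mind, with aid of pen and paper. See MPEP 2106.04(a)(2)(III).).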
Step 2A Prong 2: This claim further recites:
the processor is further configured to (This claim element identifies the processor performing operations specific to the neural network apparatus, as well as generally linking the method to a technological environment. Type definitions and a general association to a technological environment do not further integrate the judicial exception into a practical application. See MPEP 2106.05(h).): …
perform the obtaining … [are performed with respect to each of a plurality of layers included in the neural network] (This claim element is directed to gathering data for each of the plurality of layers, which is an insignificant extra-solution activity for use in a claimed process. See MPEP 2106.05(g). This additional element does not add a meaningful limitation to the claim, and hence does not integrate the judicial exception into a practical application.).
Step 2B: This claim further recites:
the processor is further configured to (As analyzed in Step 2A Prong 2, type definitions and a general association to a technological environment do not further integrate the judicial exception into a practical application. See MPEP 2106.05(h). Hence this claim element does not add significantly more than the judicial exception, alone or in combination with other elements in the claim.): …
perform the obtaining … [are performed with respect to each of a plurality of layers included in the neural network] (This claim element is directed to storing and retrieving information in memory, which is a well-known, understood, routine, conventional activity, and hence does not add significantly more than the judicial exception, alone or in combination with other elements in the claim. See MPEP 2106.05(d)(II), list 1, example iv.).
Regarding Claim 26, 
Step 1: The claim recites the apparatus of claim 15, therefore it falls into one of the four statutory categories (i.e., process, machine, article of manufacture, or composition of matter).
Step 2A Prong 1: This claim is a dependent claim of claim 15, and hence inherits the same abstract ideas mentioned above. This claim further recites the following abstract ideas:
[to implement the] … determining (Under its broadest reasonable interpretation, this claim element recites a judicial exception, as determining a weighted entropy based on data values represents a mathematical relationship, as the weighted entropy is a form of organizing information and manipulating information through mathematical correlations. See MPEP 2106.04(a)(2)(I-A), example iv.), …   
[to implement the] … adjusting (Under its broadest reasonable interpretation, this claim element recites a judicial exception, as adjusting quantization levels assigned to the data values based on the weighted entropy represents a mental step (observations, judgments, evaluations, opinions) that is implementable in the human mind, with aid of pen and paper. See MPEP 2106.04(a)(2)(III).)...  
[to implement the] … quantizing (Under its broadest reasonable interpretation, this claim element recites a judicial exception, as quantizing data values in accordance with the adjusted quantization levels represents a mathematical relationship, where the act of quantizing is considered a form of organizing information and manipulating information through mathematical correlations. See MPEP 2106.04(a)(2)(I-A), example iv.).  
Step 2A Prong 2: This claim further recites:
a non-transitory memory storing instructions, which when executed by the processor, control the processor (This claim element is considered a form of applying mere instructions on a generic computer to implement a judicial exception. See MPEP 2106.05(f). This additional element does not add a meaningful limitation to the claim, and hence does not integrate the judicial exception into a practical application.) …
to implement the obtaining (This claim element is directed to gathering data, which is an insignificant extra-solution activity for use in a claimed process. See MPEP 2106.05(g). This additional element does not add a meaningful limitation to the claim, and hence does not integrate the judicial exception into a practical application.), …
Step 2B: This claim further recites:
a non-transitory memory storing instructions, which when executed by the processor, control the processor (As analyzed in Step 2A Prong 2, applying mere instructions on a generic computer to implement a judicial exception does not integrate the judicial exception into a practical application. See MPEP 2106.05(f). Hence this claim element does not add significantly more than the judicial exception, alone or in combination with other elements in the claim.) …
to implement the obtaining (This claim element is directed to storing and retrieving information in memory, which is a well-known, understood, routine, conventional activity, and hence does not add significantly more than the judicial exception, alone or in combination with other elements in the claim. See MPEP 2106.05(d)(II), list 1, example iv.), …

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1-8, 12-21, and 25-26 are rejected under 35 U.S.C. 103 as being unpatentable over Hwang et al., Fixed-point feedforward deep neural network design using weights +1, 0, and -1, 2014 IEEE Workshop on Signal Processing Systems SiPS, 2014, 6 pages [hereafter referred as Hwang] in view of Guiasu, Silviu, Grouping Data by Using the Weighted Entropy, Journal of Statistical Planning and Inference 15 (1986) Elsevier Science Publishers B.V., 1986, pp.63-69 [hereafter referred as Guiasu].
Regarding Claim 1, Hwang teaches
A processor-implemented neural network method, the method comprising: 
obtaining a set of floating point data processed in a layer included in a neural network ([Hwang p.1 col.1 Abstract: floating-point-based feedforward deep neural networks with multiple hidden layers (“a layer included in a neural network”), with floating-point values for weights and activation signals (“a set of floating point data”) being quantized through direct quantization to ternary weights and 3-bit signals, with further refinement through backpropagation based retraining (“Feedforward deep neural networks that employ multiple hidden layers show high performance in many applications, but they demand complex hardware for implementation. The hardware complexity can be much lowered by minimizing the word-length of weights and signals, but direct quantization for fixed-point network design does not yield good results. We optimize the fixed-point design by employing backpropagation based retraining. The designed fixed-point networks with ternary weights (+1, 0, and -1) and 3-bit signal show only negligible performance loss when compared to the floating-point counterparts.”).] [Hwang p.1 col.1-col.2 Section 1. Introduction 2nd paragraph: extracting from each layer k a signal vector yk and weight matrix Wk (“obtaining a set of floating point data processed in a layer included in a neural network”) (“In a general feedforward deep neural network with multiple hidden layers as depicted in Fig. 1, each layer k has a signal vector yk, which is propagated to the next layer by multiplying the weight matrix Wk+1, adding biases bk+1, and applying the activation function φk+1 (·) as follows: yk+1 = φk+1 (Wk+1 yk + bk+1). … In fully-connected feedforward deep neural networks, each weight matrix between two layers demands N1 ×N2 weights, where N1 and N2 are the number of units for the anterior layer and the posterior layer, respectively. Considering a network employing hidden layers with 1,024 units, each hidden layer demands about one mega weights. 
The number of output signals and that of biases are both N2.”).]); 
determining a … entropy based on data values included in the set of floating point data ([Hwang p.2 col.1 Section II. Direct Quantization with Exhaustive Search: determining an initial grouping for weights and activation signals based on their complexity, range, and quantization sensitivity, which are interpreted as measurements of information content (“entropy”) present in the data (“determining a … entropy based on data values included in the set of floating point data”) (“A deep neural network usually contains millions of weights and thousands of internal signals. Since applying a different data format for each weight or signal is too complex, it is needed to group them according to their range and the quantization sensitivity [13]. In a deep neural network with several layers, it is convenient to separate each layer for the grouping. Among the weights in each layer, we notice that the biases need to have high precision because their range is usually much larger than that of other weights. Assigning a high precision fixed-point format, such as 8 bits, to the biases does not increase the hardware complexity much because the number of them is small. The quantization sensitivity can also be determined from simulations that apply quantized weights for a specific group while using the floating-point data type for other groups [13]. We found that the quantization sensitivity of signals in the hidden layers is mostly the same and very low, but that in the input of the network depends on applications very much.”).]); 
adjusting quantization levels assigned to the data values based on the … entropy ([Hwang p.1 col.2 2nd full paragraph (Section 1. Introduction): quantizing weights and activation signals into (+1, 0, -1) weight categories and 2- or 3-bit fixed point signals, respectively (“quantization levels assigned to the data values”) (“In this paper, we propose a high performance fixed-point optimization method that can greatly reduce the word-length of weights and signals for implementing DNNs. The proposed scheme allows design of DNNs for real-world problems only with ternary (+1, 0, and -1) weights and 2 or 3 bits of fixed-point signals.”).] [Hwang p.2 col.1 1st full paragraph (Section 1. Introduction): performing a two phase approach for quantizing data, which includes performing a direct quantization as a baseline, followed by a retraining of the network using the quantized values (“The paper is organized as follows. In Section II, we describe a direct quantization approach as a baseline. Section III contains the proposed scheme that retrains the network after fixed-point quantization. Both the direct and the proposed quantization schemes are evaluated in Section IV.”).] [Hwang p.2 col.2 1st full paragraph (Section II. Direct Quantization with Exhaustive Search): performing an initial direct quantization with an optimal search, performing iterative adjustments to identify the quantization step size boundary (e.g. range of weight values for each quantization level) (“adjusting quantization levels assigned to the data values based on the … entropy”) to minimize the output error of the network (“… the optimum step size is initially determined by using an L2-error minimizing approach that is similar to Lloyd-Max quantization, and then the quantization step size is fine tuned by using exhaustive search. … To reduce the search dimension, the greedy approach is applied as follows: 1) Prepare a fully trained floating-point weights. 2) Quantize all input data and signals of hidden layers. 3) Starts with the weight quantizer between the input layer and the first hidden layer, try several step sizes around the initial step size and measure the output error of the network with the training set. The initial step size is determined using the L2-error minimizing approach. 4) Choose the step size that minimizes the output error and quantize the weights. 5) Perform the third and fourth steps for the next layer until it reaches the last layer.”).]); and 
quantizing the data values included in the set of floating point data in accordance with the adjusted quantization levels ([Hwang p.2 col.2 1st full paragraph (Section II. Direct Quantization with Exhaustive Search): performing an initial direct quantization with an optimal search, performing iterative adjustments to identify the quantization step size boundary (e.g. range of weight values for each quantization level) to minimize the output error of the network, and re-quantize the weights to find the best quantization level (“quantizing the data values included in the set of floating point data in accordance with the adjusted quantization levels”) (“… the optimum step size is initially determined by using an L2-error minimizing approach that is similar to Lloyd-Max quantization, and then the quantization step size is fine tuned by using exhaustive search. … To reduce the search dimension, the greedy approach is applied as follows: 1) Prepare a fully trained floating-point weights. 2) Quantize all input data and signals of hidden layers. 3) Starts with the weight quantizer between the input layer and the first hidden layer, try several step sizes around the initial step size and measure the output error of the network with the training set. The initial step size is determined using the L2-error minimizing approach. 4) Choose the step size that minimizes the output error and quantize the weights. 5) Perform the third and fourth steps for the next layer until it reaches the last layer.”).]).  
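For illustration only, the step-size search quoted from Hwang above can be sketched as follows. This is a minimal sketch under stated assumptions, not Hwang's implementation: the function names, the candidate grid (`num_candidates`, `span`), and the use of the L2 error on the weights themselves (rather than the output error of the network on the training set, which is what the quoted passage measures) are all hypothetical simplifications.

```python
import numpy as np

def quantize_ternary(weights, step):
    """Map each weight to -step, 0, or +step (a uniform 3-level quantizer)."""
    # Round to the nearest multiple of step, then clip to the three levels.
    levels = np.clip(np.round(weights / step), -1, 1)
    return levels * step

def search_step_size(weights, initial_step, num_candidates=11, span=0.5):
    """Try several step sizes around initial_step and keep the lowest-error one.

    Assumption: the error is the L2 error on the weights, used here as a
    stand-in for the network output error described in the quoted passage.
    """
    candidates = initial_step * np.linspace(1 - span, 1 + span, num_candidates)
    errors = [np.sum((weights - quantize_ternary(weights, s)) ** 2)
              for s in candidates]
    best = int(np.argmin(errors))
    return candidates[best], errors[best]
```

In a layer-by-layer setting, this search would be repeated for each layer's weight matrix in turn, as in steps 3) through 5) of the quoted passage.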
However, Hwang does not teach
[determining] a weighted entropy [based on data values] 
…
[adjusting] … based on the weighted entropy …
Guiasu teaches
[determining] a weighted entropy [based on data values] ([Guiasu p.63 Section 1. Introduction: grouping data based on data complexity (“entropy”), and taking into account information content and class homogeneity, where the information content and class homogeneity represent weighting factors on the entropy (“weighted entropy”); in the context of performing quantization of floating point weights and activation signals, the grouping of data into classes, factoring in information content and class homogeneity is interpreted as grouping the weights or activation signals into assigned quantization levels based on weighted entropy (“determining a weighted entropy”) (“Grouping data is a way of coping with complexity. It is well known that when the raw data are grouped in classes a certain amount of information is lost, since no distinction is made between observations falling into the same class. The larger the class interval is, the greater is the amount of information lost. On the other hand, if too many distinct classes are used, the presentation of information is somewhat misleading because conspicuous irregularities merely reflect the accidents of sampling. In the choice of a class interval a reasonable compromise must be reached between information content and class homogeneity. The aim of the paper is to show how the weighted entropy, a generalization of Shannon's entropy from information theory, may be used to balance the amount of information and the degree of homogeneity associated to a partition of data in classes.”).] [Guiasu p.65 Section 2. Information balance for weighted data: referring to equations 2.1, 2.2, 2.3, with X_i representing groups of data elements from the data set X, the equation for weighted entropy is based on the relative frequency p(X_i) of a data set X and a weighting factor w(X_i) based on the values of the data set, with p(X_i) expressed as a ratio of the number of data elements per partition/class and the total number of data elements X, and w(X_i) expressed as a ratio of the sum of all element values per partition/class and the total number of data elements per partition/class; 

[Image: media_image1.png]

in the context of performing quantization of floating point weights and activation signals, the data set X represents either the set of floating point weights or the set of floating point activation signals, and the values of the data set represent the weight size or activation signal value (“[determining] a weighted entropy [based on data values]”).]) …
[adjusting] … based on the weighted entropy ([Guiasu p.66 Section 3. The trade-off between information and homogeneity: referring to p.65 equation 2.3 (“weighted entropy”), p.65 equation 2.6 (information content of the partition/class based on the weighted entropy), p.65 equation 2.7 (degree of homogeneity of the partition/class), and p.66 equation 3.1, one way to find an optimal number of partitions (P_n) is shown in equation 3.1, which finds the right balance (“adjusting”) of the measurement of information content I(P_n) and the measurement of the degree of homogeneity H(P_n) within each partition; in the context of performing quantization of floating point weights and activation signals, each partition/class represents a quantization level, and finding the right balance represents adjusting quantization levels assigned to the data values, where the adjusting is based on the information content (which is based on the weighted entropy) and the degree of homogeneity (“[adjusting] … based on the weighted entropy”).]) …
Hwang and Guiasu are analogous art since both teach the partitioning of data elements in a data set.
It would have been obvious to a person having ordinary skill in the art before the effective filing date to take the step of determining entropy based on the floating point data set of Hwang and enhance it with the step of determining weighted entropy based on the floating point data set of Guiasu as a way to perform partitioning of data elements in a data set (i.e., determining quantization levels). The motivation to combine is driven by the fact that using weighted entropy to determine groupings is a known measurement that can be applied to produce an expected outcome with a reasonable chance of success, as it produces a predictable result (i.e., determining an optimal number of groups and group boundaries for any set of data elements) that can be further used in other applications (in the case of Hwang,  determining an optimal number of quantization levels and the step size/range of data elements within each quantization level). Additionally, Guiasu teaches that the weighted entropy represents a class grouping that balances information loss (i.e., accuracy loss resulting from quantization) and class element variation (i.e., class homogeneity) to offset the quantization, which is advantageous in training a neural network as having variation improves the performance and classification result in a trained network ([Guiasu p.63 Section 1. Introduction: “Grouping data is a way of coping with complexity. It is well known that when the raw data are grouped in classes a certain amount of information is lost, since no distinction is made between observations falling into the same class. The larger the class interval is, the greater is the amount of information lost. On the other hand, if too many distinct classes are used, the presentation of information is somewhat misleading because conspicuous irregularities merely reflect the accidents of sampling. 
In the choice of a class interval a reasonable compromise must be reached between information content and class homogeneity. The aim of the paper is to show how the weighted entropy, a generalization of Shannon's entropy from information theory, may be used to balance the amount of information and the degree of homogeneity associated to a partition of data in classes.”]).
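For illustration only, the weighted-entropy quantities characterized above can be sketched as follows. This is a minimal sketch under stated assumptions: it uses the general weighted-entropy form -Σ w(X_i) p(X_i) log p(X_i), since equations 2.1-2.3 of Guiasu are reproduced only as an image in this action; p(X_i) is the number of elements in a class divided by the total number of elements, and w(X_i) is taken as the mean magnitude of the elements in the class. The function name, the `boundaries` parameter, and the use of absolute values (so that the weighting factor stays non-negative for signed weights) are hypothetical choices and are not taken from Hwang or Guiasu.

```python
import numpy as np

def weighted_entropy(values, boundaries):
    """Weighted entropy -sum_i w_i * p_i * log(p_i) of a partition of values."""
    values = np.asarray(values, dtype=float)
    # Assign each value to a class; boundaries are ascending class edges.
    class_ids = np.digitize(values, boundaries)
    total = 0.0
    for c in np.unique(class_ids):
        members = values[class_ids == c]
        p = members.size / values.size   # relative frequency of the class
        w = np.abs(members).mean()       # assumed weighting: mean magnitude
        total -= w * p * np.log(p)
    return total
```

Under this sketch, moving the class boundaries changes both p(X_i) and w(X_i), so comparing the returned value across candidate boundary sets is one way the grouping could be adjusted.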
Regarding Claim 2, Hwang in view of Guiasu teaches
The method of claim 1, further comprising 
implementing the neural network using the quantized data values and based on input data provided to the neural network ([Hwang p.3 col.2 last paragraph – p.4 col.1 first paragraph (IV.A. Handwritten Digit Recognition): retraining a neural network with the quantization algorithm using input data from MNIST database (“implementing the neural network using the quantized data values and based on input data provided to the neural network”), and evaluating the retrained network to produce a classification result and analysis of the miss classification rate (“1) MNIST Database: The MNIST database consists of 28 by 28 grey level images of handwritten digits. A training set has 60,000 examples and a test set has 10,000 examples. This database has been widely used for evaluation of various image classifiers … 2) Neural Network Configuration: … The input layer has 784 units, which is followed by two 500-unit and one 2,000-unit hidden layers. The output layer has 10 units, which correspond to 10 target digit labels. All layers contain logistic units. 3) Training: The network is pre-trained with unsupervised greedy RBM learning. Each RBM is trained by 50 epochs of 10-step contrastive-divergence based stochastic gradient descent with the mini-batch size of 100, the fixed learning rate of 0.1, and the momentum of 0.9. Then, we ran 100 epochs of the backpropagation with stochastic gradient descent using the mini-batch size of 100, the fixed learning rate of 0.1, and the momentum of 0.9. Also, the same parameters are used for the proposed retraining algorithm. 4) Experimental Results: The experimental results with various weight and signal quantization are summarized in TABLE I. The original miss classification rate for the test set was 0.97% with floating-point arithmetic. The direct quantization shows a miss rate of 4.28% with 3-point weights, and 1.20% with 7-point weights. On the other hand, the retraining approach shows the result that is quite close to the original one. 
The miss rate with 3-point weights and 3-bit signal quantization is 1.08%. Applying 8-bit signal quantization does not show significant difference compared to 3-bit signal quantization, which means 3 bits for signal word-length is enough for hidden layers.”).]), and 
indicating a result of the implementation ([Hwang p.3 col.2 last paragraph – p.4 col.1 first paragraph (IV.A. Handwritten Digit Recognition): retraining a neural network with the quantization algorithm using input data from MNIST database, and evaluating the retrained network to produce a classification result and analysis of the miss classification rate (“indicating a result of the implementation”) (“1) MNIST Database: The MNIST database consists of 28 by 28 grey level images of handwritten digits. A training set has 60,000 examples and a test set has 10,000 examples. This database has been widely used for evaluation of various image classifiers … 2) Neural Network Configuration: … The input layer has 784 units, which is followed by two 500-unit and one 2,000-unit hidden layers. The output layer has 10 units, which correspond to 10 target digit labels. All layers contain logistic units. 3) Training: The network is pre-trained with unsupervised greedy RBM learning. Each RBM is trained by 50 epochs of 10-step contrastive-divergence based stochastic gradient descent with the mini-batch size of 100, the fixed learning rate of 0.1, and the momentum of 0.9. Then, we ran 100 epochs of the backpropagation with stochastic gradient descent using the mini-batch size of 100, the fixed learning rate of 0.1, and the momentum of 0.9. Also, the same parameters are used for the proposed retraining algorithm. 4) Experimental Results: The experimental results with various weight and signal quantization are summarized in TABLE I. The original miss classification rate for the test set was 0.97% with floating-point arithmetic. The direct quantization shows a miss rate of 4.28% with 3-point weights, and 1.20% with 7-point weights. On the other hand, the retraining approach shows the result that is quite close to the original one. The miss rate with 3-point weights and 3-bit signal quantization is 1.08%. 
Applying 8-bit signal quantization does not show significant difference compared to 3-bit signal quantization, which means 3 bits for signal word-length is enough for hidden layers.”).]).  
Regarding Claim 3, Hwang in view of Guiasu teaches
The method of claim 1, wherein the weighted entropy is determined by applying a weighting factor based on determined sizes of the data values to a determined distribution of the data values included in the set of floating point data ([Hwang p.1 col.1-col.2 Section 1. Introduction 2nd paragraph: extracting from each layer k a signal vector yk and weight matrix Wk (“In a general feedforward deep neural network with multiple hidden layers as depicted in Fig. 1, each layer k has a signal vector yk, which is propagated to the next layer by multiplying the weight matrix Wk+1, adding biases bk+1, and applying the activation function φk+1 (·) as follows: yk+1 = φk+1 (Wk+1 yk + bk+1). … In fully-connected feedforward deep neural networks, each weight matrix between two layers demands N1 ×N2 weights, where N1 and N2 are the number of units for the anterior layer and the posterior layer, respectively. Considering a network employing hidden layers with 1,024 units, each hidden layer demands about one mega weights. The number of output signals and that of biases are both N2.”).] [Guiasu p.65 Section 2. Information balance for weighted data: referring to equations 2.1, 2.2, 2.3, and in the context of performing quantization of floating point weights, with X_i representing groups of data elements from the data set X (i.e., a set of floating point weights, and their corresponding values in the data set representing weight sizes), the equation for weighted entropy is based on the relative frequency p(X_i) of a data set X and a weighting factor w(X_i) based on the values of the data set, with p(X_i) expressed as a ratio of the number of data elements per partition/class and the total number of data elements X, and w(X_i) expressed as a ratio of the sum of all element values per partition/class (“determined sizes of the data values”) and the total number of data elements per partition/class (“determined distribution of the data values included in the set”), with w(X_i) representing the weighting ratio for the set of floating point weights (“the weighted entropy is determined by applying a weighting factor based on determined sizes of the data values to a determined distribution of the data values included in the set of floating point data”).

[Image: media_image1.png]

]).  
Regarding Claim 4, Hwang in view of Guiasu teaches
The method of claim 1, wherein the set of floating point data are a set of activations processed in the layer ([Hwang p.1 col.1-col.2 Section 1. Introduction 2nd paragraph: extracting from each layer k a signal vector yk (“the set of floating point data are a set of activations processed in the layer”) and weight matrix Wk (“In a general feedforward deep neural network with multiple hidden layers as depicted in Fig. 1, each layer k has a signal vector yk, which is propagated to the next layer by multiplying the weight matrix Wk+1, adding biases bk+1, and applying the activation function φk+1 (·) as follows: yk+1 = φk+1 (Wk+1 yk + bk+1). … In fully-connected feedforward deep neural networks, each weight matrix between two layers demands N1 ×N2 weights, where N1 and N2 are the number of units for the anterior layer and the posterior layer, respectively. Considering a network employing hidden layers with 1,024 units, each hidden layer demands about one mega weights. The number of output signals and that of biases are both N2.”).]).  
Regarding Claim 5, Hwang in view of Guiasu teaches
The method of claim 1, wherein the set of floating point data are a set of weights processed in the layer ([Hwang p.1 col.1-col.2 Section 1. Introduction 2nd paragraph: extracting from each layer k a signal vector yk  and weight matrix Wk (“the set of floating point data are a set of weights processed in the layer”) (“In a general feedforward deep neural network with multiple hidden layers as depicted in Fig. 1, each layer k has a signal vector yk, which is propagated to the next layer by multiplying the weight matrix Wk+1, adding biases bk+1, and applying the activation function φk+1 (·) as follows: yk+1 = φk+1 (Wk+1 yk + bk+1). … In fully-connected feedforward deep neural networks, each weight matrix between two layers demands N1 ×N2 weights, where N1 and N2 are the number of units for the anterior layer and the posterior layer, respectively. Considering a network employing hidden layers with 1,024 units, each hidden layer demands about one mega weights. The number of output signals and that of biases are both N2.”).]).  
Regarding Claim 6, Hwang in view of Guiasu teaches
The method of claim 1, wherein, 
the set of floating point data is a set of weights ([Hwang p.1 col.1-col.2 Section 1. Introduction 2nd paragraph: extracting from each layer k a signal vector yk  and weight matrix Wk (“the set of floating point data is a set of weights”) (“In a general feedforward deep neural network with multiple hidden layers as depicted in Fig. 1, each layer k has a signal vector yk, which is propagated to the next layer by multiplying the weight matrix Wk+1, adding biases bk+1, and applying the activation function φk+1 (·) as follows: yk+1 = φk+1 (Wk+1 yk + bk+1). … In fully-connected feedforward deep neural networks, each weight matrix between two layers demands N1 ×N2 weights, where N1 and N2 are the number of units for the anterior layer and the posterior layer, respectively. Considering a network employing hidden layers with 1,024 units, each hidden layer demands about one mega weights. The number of output signals and that of biases are both N2.”).]), and 
the determining of the weighted entropy comprises: 
grouping the set of weights into a plurality of clusters ([Guiasu p.65 Section 2. Information balance for weighted data: referring to equations 2.1, 2.2, 2.3, and in the context of performing quantization of floating point weights, with X_i representing groups of data elements from the data set X (i.e., a set of floating point weights, and their corresponding values in the data set representing weight sizes) (“grouping the set of weights into a plurality of clusters”), the equation for weighted entropy is based on the relative frequency p(X_i) of a data set X and a weighting factor w(X_i) that is based on the values of the data set, with p(X_i) expressed as a ratio of the number of data elements per partition/class and the total number of data elements X, and w(X_i) expressed as a ratio of the sum of all element values per partition/class and the total number of data elements per partition/class, with w(X_i) representing the weighting ratio for the set of floating point weights.]); 
determining respective relative frequencies for each of the grouped clusters by dividing a total number of weights included in each of the [respective] grouped clusters by a total number of weights included in the set of weights ([Guiasu p.65 Section 2. Information balance for weighted data: referring to equations 2.1, 2.2, 2.3, and in the context of performing quantization of floating point weights, with X_i representing groups of data elements from the data set X (“grouped clusters”) (i.e., a set of floating point weights, and their corresponding values in the data set representing weight sizes), the equation for weighted entropy is based on the relative frequency p(X_i) (“respective relative frequencies”) of a data set X and a weighting factor w(X_i) based on the values of the data set, with p(X_i) expressed as a ratio of the number of data elements per partition/class and the total number of data elements X (“determining respective relative frequencies for each of the grouped clusters by dividing a total number of weights included in each of the [respective] grouped clusters by a total number of weights included in the set of weights”), and w(X_i) expressed as a ratio of the sum of all element values per partition/class and the total number of data elements per partition/class.
    [Figure: media_image1.png]
	]); 
determining respective representative importances of each of the grouped clusters based on sizes of weights included in each of the grouped clusters ([Guiasu p.65 Section 2. Information balance for weighted data: referring to equations 2.1, 2.2, 2.3, and in the context of performing quantization of floating point weights, with X_i representing groups of data elements from the data set X (“grouped clusters”) (i.e., a set of floating point weights, and their corresponding values in the data set representing weight sizes), the equation for weighted entropy is based on the relative frequency p(X_i) of a data set X and a weighting factor w(X_i) based on the values of the data set (“respective representative importances”), with p(X_i) expressed as a ratio of the number of data elements per partition/class and the total number of data elements X, and w(X_i) expressed as a ratio of the sum of all element values per partition/class and the total number of data elements per partition/class (“determining respective representative importances of each of the grouped clusters based on sizes of weights included in each of the grouped clusters”).
    [Figure: media_image1.png]
	]); and 
determining the weighted entropy based on the respective relative frequencies and the respective representative importances ([Guiasu p.65 Section 2. Information balance for weighted data: referring to equations 2.1, 2.2, 2.3, and in the context of performing quantization of floating point weights, with X_i representing groups of data elements from the data set X (“grouped clusters”) (i.e., a set of floating point weights, and their corresponding values in the data set representing weight sizes), the equation for weighted entropy is based on the relative frequency p(X_i) of a data set X (“respective relative frequencies”) and a weighting factor w(X_i) based on the values of the data set (“respective representative importances”), with p(X_i) expressed as a ratio of the number of data elements per partition/class and the total number of data elements X, and w(X_i) expressed as a ratio of the sum of all element values per partition/class and the total number of data elements per partition/class; hence, the weighted entropy in equation 2.3 is a function of the respective relative frequencies p_k and relative importances w_k (“determining the weighted entropy based on the respective relative frequencies and the respective representative importances”).]).   
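The three quantities mapped above for Claim 6 can be illustrated with a minimal sketch. It assumes the weighted entropy takes the commonly cited form H_w = -Σ w_k p_k log p_k, which the action itself does not reproduce, and that the weight values passed in are non-negative magnitudes ("sizes"). The function names are hypothetical and are not taken from Guiasu, Hwang, or the application.

```python
import math

def relative_frequencies(clusters):
    """p_k: number of weights in cluster k divided by the total number of weights."""
    total = sum(len(c) for c in clusters)
    return [len(c) / total for c in clusters]

def representative_importances(clusters):
    """w_k: sum of the weight values in cluster k divided by the cluster's count."""
    return [sum(c) / len(c) for c in clusters]

def weighted_entropy(clusters):
    """Assumed form: H_w = -sum_k w_k * p_k * log(p_k), over non-empty clusters."""
    clusters = [c for c in clusters if c]  # skip empty clusters to avoid division by zero
    p = relative_frequencies(clusters)
    w = representative_importances(clusters)
    return -sum(wk * pk * math.log(pk) for wk, pk in zip(w, p))

# Two clusters of two weights each: p = [0.5, 0.5], w = [1.0, 2.0]
example = [[1.0, 1.0], [2.0, 2.0]]
h = weighted_entropy(example)
```

In this example each cluster holds half of the weights, so the result equals 1.5 × ln 2 under the assumed formula.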
Regarding Claim 7, Hwang in view of Guiasu teaches
The method of claim 6, wherein the quantizing comprises: 
determining respective weights corresponding to the respective representative importances of each of the grouped clusters as a corresponding representative weight for each of the grouped clusters ([Guiasu p.65 Section 2. Information balance for weighted data: referring to equations 2.1, 2.2, 2.3, and in the context of performing quantization of floating point weights, with X_i representing groups of data elements from the data set X (“grouped clusters”) (i.e., a set of floating point weights, and their corresponding values in the data set representing weight sizes), the equation for weighted entropy is based on the relative frequency p(X_i) of a data set X and a weighting factor w(X_i) based on the values of the data set (“respective representative importances of each of the grouped clusters”), with p(X_i) expressed as a ratio of the number of data elements per partition/class and the total number of data elements X, and w(X_i) expressed as a ratio of the sum of all element values per partition/class and the total number of data elements per partition/class; hence, the weighted entropy in equation 2.3 is a function of the respective relative frequencies p_k and relative importances w_k.] [Guiasu p.68 Section 4. Numerical example: referring to Table 1, listing the set of partitions (“grouped clusters”) for a set of N=95 items (in the context of a set of floating point data, these items are interpreted as a set of weight values), selecting the start and end weight values belonging in each partition (“determining respective weights corresponding to the respective representative importances of each of the grouped clusters”) to identify the group cluster (“as a corresponding representative weight for each of the grouped clusters”), and placing each of the 95 items into the appropriate partitions.]); and 
quantizing the weights included in each of the grouped clusters respectively into the corresponding representative weight for each of the grouped clusters ([Guiasu p.65 Section 2. Information balance for weighted data: referring to equations 2.1, 2.2, 2.3, and in the context of performing quantization of floating point weights, with X_i representing groups of data elements from the data set X (“grouped clusters”) (i.e., a set of floating point weights, and their corresponding values in the data set representing weight sizes), the equation for weighted entropy is based on the relative frequency p(X_i) of a data set X and a weighting factor w(X_i) based on the values of the data set, with p(X_i) expressed as a ratio of the number of data elements per partition/class and the total number of data elements X, and w(X_i) expressed as a ratio of the sum of all element values per partition/class and the total number of data elements per partition/class; hence, the weighted entropy in equation 2.3 is a function of the respective relative frequencies p_k and relative importances w_k.] [Guiasu p.68 Section 4. Numerical example: referring to Table 1, listing the set of partitions (“grouped clusters”) for a set of N=95 items (in the context of a set of floating point data, these items are interpreted as a set of weight values), selecting the start and end weight values belonging in each partition to identify the group cluster (“corresponding representative weight”), and placing each of the 95 items into the appropriate partitions (“quantizing the weights included in each of the grouped clusters respectively into the corresponding representative weight for each of the grouped clusters”).]).  
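As an illustration of the Claim 7 mapping, the following sketch places each weight into a partition defined by start and end values and then replaces it with one representative value per partition. Using the partition mean as the representative weight is an assumption made for the example; the function name and the boundary format are likewise hypothetical.

```python
from bisect import bisect_right

def quantize_to_representatives(weights, boundaries):
    """Assign each weight to a cluster and map it to that cluster's representative.

    boundaries: sorted interior cut points; k cut points define k + 1 clusters.
    The representative is assumed here to be the cluster mean.
    """
    clusters = [[] for _ in range(len(boundaries) + 1)]
    for w in weights:
        clusters[bisect_right(boundaries, w)].append(w)
    # Empty clusters have no representative (None).
    reps = [sum(c) / len(c) if c else None for c in clusters]
    quantized = [reps[bisect_right(boundaries, w)] for w in weights]
    return quantized, reps

quantized, reps = quantize_to_representatives([0.1, 0.3, 0.6, 0.8], [0.5])
```

With a single cut point at 0.5, the first two weights share one representative and the last two share another.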
Regarding Claim 8, Hwang in view of Guiasu teaches
The method of claim 6, wherein the adjusting comprises 
adjusting the quantization levels assigned to the data values by adjusting boundaries of each of the clusters in a direction that increases the weighted entropy ([Hwang p.4 col.1 Section IV.A. Handwritten Digit Recognition: referring to Table II, comparing quantization adjustments of the step size 𝚫 before and after backpropagation retraining, where the retraining involves adjusting the quantization boundaries (“adjusting the quantization levels assigned to the data values by adjusting boundaries of each of the clusters”) as well as moving weights around the quantization boundaries in both directions; the act of moving weights around the quantization boundaries affects the distribution within a class which can increase the weighted entropy for that class (“in a direction that increases the weighted entropy”).]).  
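The Claim 8 limitation, as read above, can be pictured as a simple local search that moves cluster boundaries only when the move raises the weighted entropy. This sketch is not the retraining procedure of Hwang; it is a swapped-in greedy search, with an assumed entropy form H_w = -Σ w_k p_k log p_k, non-negative weight magnitudes, and hypothetical names and step size.

```python
import math
from bisect import bisect_right

def weighted_entropy(values, boundaries):
    """Assumed form: -sum_k w_k * p_k * log(p_k) over non-empty clusters."""
    clusters = [[] for _ in range(len(boundaries) + 1)]
    for v in values:
        clusters[bisect_right(boundaries, v)].append(v)
    n = len(values)
    h = 0.0
    for c in clusters:
        if c:
            p = len(c) / n          # relative frequency of the cluster
            w = sum(c) / len(c)     # representative importance (mean magnitude)
            h -= w * p * math.log(p)
    return h

def adjust_boundaries(values, boundaries, step=0.05, max_iters=100):
    """Greedily shift each boundary by +/- step while the weighted entropy increases."""
    best = sorted(boundaries)
    best_h = weighted_entropy(values, best)
    for _ in range(max_iters):
        improved = False
        for i in range(len(best)):
            for delta in (-step, step):
                cand = list(best)
                cand[i] += delta
                if cand != sorted(cand):   # boundaries must stay ordered
                    continue
                h = weighted_entropy(values, cand)
                if h > best_h:             # accept only strict improvements
                    best, best_h, improved = cand, h, True
        if not improved:
            break
    return best, best_h

values = [0.1, 0.2, 0.3, 0.9]
new_boundaries, new_h = adjust_boundaries(values, [0.15])
```

Because only strictly improving moves are accepted, the returned entropy is never lower than the entropy of the starting boundaries.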
Regarding Claim 12, Hwang in view of Guiasu teaches
The method of claim 1, wherein, 
the obtaining … [are performed with respect to each of a plurality of layers included in the neural network] ([Hwang p.1 col.1-col.2 Section 1. Introduction 2nd paragraph: extracting from each layer k a signal vector y_k and weight matrix W_k (“the obtaining … [are performed with respect to each of a plurality of layers included in the neural network]”) (“In a general feedforward deep neural network with multiple hidden layers as depicted in Fig. 1, each layer k has a signal vector y_k, which is propagated to the next layer by multiplying the weight matrix W_{k+1}, adding biases b_{k+1}, and applying the activation function φ_{k+1}(·) as follows: y_{k+1} = φ_{k+1}(W_{k+1} y_k + b_{k+1}). … In fully-connected feedforward deep neural networks, each weight matrix between two layers demands N_1 × N_2 weights, where N_1 and N_2 are the number of units for the anterior layer and the posterior layer, respectively. Considering a network employing hidden layers with 1,024 units, each hidden layer demands about one mega weights. The number of output signals and that of biases are both N_2.”).]), 
[the] determining … [are performed with respect to each of a plurality of layers included in the neural network] ([Hwang p.2 col.1 Section II. Direct Quantization with Exhaustive Search: determining an initial grouping for weights and activation signals based on their complexity, range, and quantization sensitivity, which are interpreted as measurements of information content present in the data (“A deep neural network usually contains millions of weights and thousands of internal signals. Since applying a different data format for each weight or signal is too complex, it is needed to group them according to their range and the quantization sensitivity [13]. In a deep neural network with several layers, it is convenient to separate each layer for the grouping. Among the weights in each layer, we notice that the biases need to have high precision because their range is usually much larger than that of other weights. Assigning a high precision fixed-point format, such as 8 bits, to the biases does not increase the hardware complexity much because the number of them is small. The quantization sensitivity can also be determined from simulations that apply quantized weights for a specific group while using the floating-point data type for other groups [13]. We found that the quantization sensitivity of signals in the hidden layers is mostly the same and very low, but that in the input of the network depends on applications very much.”).] [Guiasu p.63 Section 1. 
Introduction: grouping data based on data complexity (“entropy”), and taking into account information content and class homogeneity, where the information content and class homogeneity represents weighting factors on the entropy (“weighted entropy”); in the context of performing quantization of floating point weights and activation signals, the grouping of data into classes, factoring in information content and class homogeneity is interpreted as grouping the weights or activation signals into assigned quantization levels based on weighted entropy (“determining a weighted entropy”) (“Grouping data is a way of coping with complexity. It is well known that when the raw data are grouped in classes a certain amount of information is lost, since no distinction is made between observations falling into the same class. The larger the class interval is, the greater is the amount of information lost. On the other hand, if too many distinct classes are used, the presentation of information is somewhat misleading because conspicuous irregularities merely reflect the accidents of sampling. In the choice of a class interval a reasonable compromise must be reached between information content and class homogeneity. The aim of the paper is to show how the weighted entropy, a generalization of Shannon's entropy from information theory, may be used to balance the amount of information and the degree of homogeneity associated to a partition of data in classes.”).] [Guiasu p.65 Section 2. Information balance for weighted data: referring to equations 2.1, 2.2, 2.3, with X_i representing groups of data elements from the data set X (i.e., a set of floating point weights or a set of floating point activations, and their corresponding values in the data set representing weight sizes or activation values), the equation for weighted entropy is based on the relative frequency p(X_i) of a data set X and a weighting factor w(X_i) based on the values of the data set, with p(X_i) expressed as a ratio of the number of data elements per partition/class and the total number of data elements X, and w(X_i) expressed as a ratio of the sum of all element values per partition/class and the total number of data elements per partition/class; 

    [Figure: media_image1.png]

in the context of performing quantization of floating point weights and activation signals, the data set X represents either the set of floating point weights or the set of floating point activation signals (“[the] determining … [are performed with respect to each of a plurality of layers included in the neural network]”).]), 
[the] adjusting … [are performed with respect to each of a plurality of layers included in the neural network] ([Hwang p.1 col.2 2nd full paragraph (Section 1. Introduction): quantizing weights and activation signals into (+1, 0, -1) weight categories and 2- or 3-bit fixed point signals, respectively (“quantization levels assigned to the data values”) (“In this paper, we propose a high performance fixed-point optimization method that can greatly reduce the word-length of weights and signals for implementing DNNs. The proposed scheme allows design of DNNs for real-world problems only with ternary (+1, 0, and -1) weights and 2 or 3 bits of fixed-point signals.”).] [Hwang p.2 col.1 1st full paragraph (Section 1. Introduction): performing a two phase approach for quantizing data, which includes performing a direct quantization as a baseline, followed by a retraining of network using the quantized values (“The paper is organized as follows. In Section II, we describe a direct quantization approach as a baseline. Section III contains the proposed scheme that retrains the network after fixed-point quantization. Both the direct and the proposed quantization schemes are evaluated in Section IV.”).] [Hwang p.2 col.2 1st full paragraph (Section II. Direct Quantization with Exhaustive Search): performing an initial direct quantization with an optimal search, performing iterative adjustments to identify the quantization step size boundary (e.g. range of weight values for each quantization level) (“adjusting quantization levels”) to minimize the output error of the network, and re-quantize the weights to find the best quantization level (“… the optimum step size is initially determined by using an L2-error minimizing approach that is similar to Lloyd-Max quantization, and then the quantization step size is fine tuned by using exhaustive search. … To reduce the search dimension, the greedy approach is applied as follows: 1) Prepare a fully trained floating-point weights. 
2) Quantize all input data and signals of hidden layers. 3) Starts with the weight quantizer between the input layer and the first hidden layer, try several step sizes around the initial step size and measure the output error of the network with the training set. The initial step size is determined using the L2-error minimizing approach. 4) Choose the step size that minimizes the output error and quantize the weights. 5) Perform the third and fourth steps for the next layer until it reaches the last layer.”).] [Guiasu p.66 Section 3. The trade-off between information and homogeneity: referring to p.65 equation 2.3 (“weighted entropy”), p.65 equation 2.6 (information content of the partition/class based on the weighted entropy), p.65 equation 2.7 (degree of homogeneity of the partition/class), and p.66 equation 3.1, one way to find an optimal number of partitions (P_n) is shown in equation 3.1, which involves finding the right balance (“adjusting”) of the measurement of information content I(P_n) and the measurement of the degree of homogeneity H(P_n) within each partition; in the context of performing quantization of floating point weights and activation signals, each partition/class represents a quantization level, and finding the right balance represents adjusting quantization levels assigned to the data values, where the adjusting is based on the information content (which is based on the weighted entropy) and the degree of homogeneity (“[the] adjusting … [are performed with respect to each of a plurality of layers included in the neural network]”).]), and 
[the] quantizing are performed with respect to each of a plurality of layers included in the neural network ([Hwang p.2 col.2 1st full paragraph (Section II. Direct Quantization with Exhaustive Search): performing an initial direct quantization with an optimal search, performing iterative adjustments to identify the quantization step size boundary (e.g. range of weight values for each quantization level) to minimize the output error of the network, and re-quantize the weights to find the best quantization level (“[the] quantizing are performed with respect to each of a plurality of layers included in the neural network”) (“… the optimum step size is initially determined by using an L2-error minimizing approach that is similar to Lloyd-Max quantization, and then the quantization step size is fine tuned by using exhaustive search. … To reduce the search dimension, the greedy approach is applied as follows: 1) Prepare a fully trained floating-point weights. 2) Quantize all input data and signals of hidden layers. 3) Starts with the weight quantizer between the input layer and the first hidden layer, try several step sizes around the initial step size and measure the output error of the network with the training set. The initial step size is determined using the L2-error minimizing approach. 4) Choose the step size that minimizes the output error and quantize the weights. 5) Perform the third and fourth steps for the next layer until it reaches the last layer.”).]), 
with respective adjusted quantization levels being optimized and assigned for each of the plurality of layers ([Guiasu p.66 Section 3. The trade-off between information and homogeneity: referring to p.65 equation 2.3 (“weighted entropy”), p.65 equation 2.6 (information content of the partition/class based on the weighted entropy), p.65 equation 2.7 (degree of homogeneity of the partition/class), and p.66 equation 3.1, one way to find an optimal number of partitions (P_n) is shown in equation 3.1, which involves finding the right balance (“adjusting”) of the measurement of information content I(P_n) and the measurement of the degree of homogeneity H(P_n) within each partition; in the context of performing quantization of floating point weights and activation signals, each partition/class represents a quantization level, and finding the right balance (“optimized”) represents adjusting quantization levels assigned to the data values (“with respective adjusted quantization levels being optimized and assigned for each of the plurality of layers”), where the adjusting is based on the information content (which is based on the weighted entropy) and the degree of homogeneity.]).
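For illustration only, the greedy, layer-by-layer step-size search quoted from Hwang Section II may be sketched as follows. This is a minimal sketch under stated assumptions, not Hwang's implementation: the function names (quantize, initial_step_l2, greedy_layerwise_search), the candidate factors, and the caller-supplied output_error callable are hypothetical, the L2-minimizing initialization is assumed to be an alternating Lloyd-Max-style update, and Hwang's step 2 (quantizing input data and hidden-layer signals) is omitted.

```python
import numpy as np

def quantize(w, step, levels=3):
    """Uniform symmetric quantizer with an odd number of levels (3 -> {-step, 0, +step})."""
    half = (levels - 1) // 2
    return step * np.clip(np.round(w / step), -half, half)

def initial_step_l2(w, levels=3, iters=20):
    """Assumed Lloyd-Max-style alternating minimization of ||w - step * z||^2."""
    half = (levels - 1) // 2
    step = np.mean(np.abs(w)) + 1e-12            # arbitrary positive starting point
    for _ in range(iters):
        z = np.clip(np.round(w / step), -half, half)
        denom = np.sum(z * z)
        if denom == 0:                           # every weight rounded to zero
            break
        step = np.sum(w * z) / denom             # least-squares step for fixed z
    return float(step)

def greedy_layerwise_search(float_weights, output_error, levels=3,
                            factors=(0.5, 0.75, 1.0, 1.25, 1.5)):
    """Quantize layers front to back, keeping the step size with the lowest output error.

    output_error is a hypothetical callable: it takes the list of per-layer
    weight arrays and returns the network output error on the training set.
    """
    current = [w.copy() for w in float_weights]
    chosen = []
    for k, w in enumerate(float_weights):
        base = initial_step_l2(w, levels)
        best_step, best_err = None, np.inf
        for f in factors:                        # try step sizes around the initial one
            trial = list(current)
            trial[k] = quantize(w, base * f, levels)
            err = output_error(trial)
            if err < best_err:
                best_step, best_err = base * f, err
        current[k] = quantize(w, best_step, levels)   # fix this layer, move to the next
        chosen.append(best_step)
    return current, chosen
```

In this sketch each layer is fixed before the next one is searched, which keeps the search one-dimensional per layer, consistent with the stated purpose of the greedy approach ("to reduce the search dimension").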
Regarding Claim 13, Hwang in view of Guiasu teaches
The method of claim 1, further comprising: 
training the neural network based on the quantized data values ([Hwang p.1 col.2 2nd paragraph – p.2 col.1 (Section 1. Introduction): using retraining backpropagation algorithm to train the neural network using the quantized weights and activation signals (“training the neural network”) (“In this paper, we propose a high performance fixed-point optimization method that can greatly reduce the word-length of weights and signals for implementing DNNs. The proposed scheme allows design of DNNs for real-world problems only with ternary (+1, 0, and -1) weights and 2 or 3 bits of fixed-point signals. The developed training algorithm also retrains fixed-point networks using backpropagation but employs several effective and practical techniques, such as elaborate signal grouping through range and sensitivity analysis, quantization of both weights and signals, optimum quantization parameter search, and consideration of deep neural networks.”).] [Hwang p.3 col.1 1st - 2nd full paragraphs (Section III. Retrain with Error Backpropagation on Quantized Domain): referring to Figure 3, using the retrain backpropagation algorithm to train the neural network based on the quantized weight and activation signal values (“training the neural network based on the quantized data values”) (“The mini-batch based backpropagation algorithm [14] updates the weight wij, the synaptic strength from the unit j to the unit i, by 
[Equation image media_image2.png: Hwang equation (3), the mini-batch weight update]
 where E is the output error, α is the learning rate, δi is the error signal of the unit i, yj is the output signal of the unit j, and < · > averages the value over a mini-batch. Note that (3) cannot be directly applied to update the low-precision weights because the amount of update, α <δiyj >, is much smaller than the quantization step size, Δ. Thus, we also store high precision weights for adaptation. The high precision weights are used for accumulating errors and generating quantized ones. The low-precision weights are obtained by quantizing the high-precision weights and used in the forward and backward steps of the backpropagation algorithm. We further modify the backpropagation algorithm to quantize the signals or the outputs of the units. … The overall algorithm is summarized in Fig. 3.”).]); and 
implementing the trained neural network based on input data, and indicating a result of the implementation ([Hwang p.3 col.2 last paragraph – p.4 col.1 first paragraph (IV.A. Handwritten Digit Recognition): retraining a neural network with the quantization algorithm using input data from MNIST database (“implementing the trained neural network based on input data”), and evaluating the retrained network to produce a classification result and analysis of the miss classification rate (“indicating a result of the implementation”) (“1) MNIST Database: The MNIST database consists of 28 by 28 grey level images of handwritten digits. A training set has 60,000 examples and a test set has 10,000 examples. This database has been widely used for evaluation of various image classifiers … 2) Neural Network Configuration: … The input layer has 784 units, which is followed by two 500-unit and one 2,000-unit hidden layers. The output layer has 10 units, which correspond to 10 target digit labels. All layers contain logistic units. 3) Training: The network is pre-trained with unsupervised greedy RBM learning. Each RBM is trained by 50 epochs of 10-step contrastive-divergence based stochastic gradient descent with the mini-batch size of 100, the fixed learning rate of 0.1, and the momentum of 0.9. Then, we ran 100 epochs of the backpropagation with stochastic gradient descent using the mini-batch size of 100, the fixed learning rate of 0.1, and the momentum of 0.9. Also, the same parameters are used for the proposed retraining algorithm. 4) Experimental Results: The experimental results with various weight and signal quantization are summarized in TABLE I. The original miss classification rate for the test set was 0.97% with floating-point arithmetic. The direct quantization shows a miss rate of 4.28% with 3-point weights, and 1.20% with 7-point weights. On the other hand, the retraining approach shows the result that is quite close to the original one. 
The miss rate with 3-point weights and 3-bit signal quantization is 1.08%. Applying 8-bit signal quantization does not show significant difference compared to 3-bit signal quantization, which means 3 bits for signal word-length is enough for hidden layers.”).]).  
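For illustration only, the retraining scheme quoted from Hwang Section III (high-precision weights kept for accumulation, low-precision weights used in the forward and backward steps) may be sketched as follows. This is a minimal sketch under stated assumptions, not Hwang's algorithm of Fig. 3: the names ternary and retrain_step are hypothetical, the network is reduced to a single linear layer with a squared-error loss, signal quantization is omitted, and the update is written as a subtraction of the averaged gradient rather than in Hwang's error-signal notation.

```python
import numpy as np

def ternary(w, step):
    """Quantize to the three points {-step, 0, +step}."""
    return step * np.clip(np.round(w / step), -1, 1)

def retrain_step(w_hp, x, target, step, lr=0.1):
    """One mini-batch update on a single linear layer (illustrative simplification).

    w_hp is the high-precision weight copy that accumulates small updates;
    the quantized weights are regenerated from it on every step.
    """
    w_q = ternary(w_hp, step)            # low-precision weights used in the forward pass
    y = x @ w_q                          # linear output units
    delta = y - target                   # dE/dy for E = 0.5 * squared error
    grad = x.T @ delta / x.shape[0]      # gradient averaged over the mini-batch
    w_hp = w_hp - lr * grad              # update the high-precision copy only
    return w_hp, ternary(w_hp, step)
```

A single update that is much smaller than the step size leaves the quantized weights unchanged while still moving the high-precision copy, which is why the quoted passage stores both sets of weights.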
Regarding Claim 14, Hwang teaches
A computer-readable recording medium storing instructions, which when executed by a processor, cause the processor ([Hwang p.1 col.1 Section I. Introduction, 1st paragraph: implementing deep neural networks using hardware (VLSI) or software running on embedded computing systems (“Implementation of deep neural networks using VLSI or embedded computing systems is needed for real-time and low-power applications. … It is, therefore, very important for efficient implementations to reduce the word-length of weights and internal signals.”).] [Hwang p.5 col.2 Section V. Concluding Remarks: applying the quantization methods to hardware and software development; application of these methods on software systems requires the use of a generic computer which contains a processor and non-transitory computer-readable medium storing instructions (“which when executed by a processor, cause the processor … ”) (“We have developed a training procedure to reduce the word-length of weights and that of signals in deep neural networks. The proposed procedure yields superior results compared to the direct quantization method, especially when only 3-point (+1, 0, and -1) weights are used. The signal word-length that affects the complexity of interconnection and arithmetic units can also be reduced to 3 bits without sacrificing the performance much. We find that the performance gap between the floating-point and fixed-point networks shrinks as the number of units in each layer increases. Also it is shown that dropout can be employed together to generalize the network and further increase the performance. This research is useful for not only hardware based implementations but also real-time software development.”).]), 
to implement the method of claim 1 (This claim element is similar in scope to Claim 1, and hence is rejected under similar rationale.).  
Regarding Claim 15, Hwang teaches
A neural network apparatus, the apparatus comprising: 
a processor ([Hwang p.1 col.1 Section I. Introduction, 1st paragraph: implementing deep neural networks using hardware (VLSI) or software running on embedded computing systems (“Implementation of deep neural networks using VLSI or embedded computing systems is needed for real-time and low-power applications. … It is, therefore, very important for efficient implementations to reduce the word-length of weights and internal signals.”).] [Hwang p.5 col.2 Section V. Concluding Remarks: applying the quantization methods to hardware and software development; application of these methods on software systems requires the use of a generic computer which contains a processor and non-transitory computer-readable medium storing instructions (“a processor”) (“We have developed a training procedure to reduce the word-length of weights and that of signals in deep neural networks. The proposed procedure yields superior results compared to the direct quantization method, especially when only 3-point (+1, 0, and -1) weights are used. The signal word-length that affects the complexity of interconnection and arithmetic units can also be reduced to 3 bits without sacrificing the performance much. We find that the performance gap between the floating-point and fixed-point networks shrinks as the number of units in each layer increases. Also it is shown that dropout can be employed together to generalize the network and further increase the performance. This research is useful for not only hardware based implementations but also real-time software development.”).])
configured to: 
obtain a set of floating point data processed in a layer included in a neural network ([Hwang p.1 col.1 Abstract: floating-point-based feedforward deep neural networks with multiple hidden layers (“a layer included in a neural network”), with floating-point values for weights and activation signals (“a set of floating point data”) being quantized through direct quantization (“Feedforward deep neural networks that employ multiple hidden layers show high performance in many applications, but they demand complex hardware for implementation. The hardware complexity can be much lowered by minimizing the word-length of weights and signals, but direct quantization for fixed-point network design does not yield good results. We optimize the fixed-point design by employing backpropagation based retraining. The designed fixed-point networks with ternary weights (+1, 0, and -1) and 3-bit signal show only negligible performance loss when compared to the floating-point counterparts.”).] [Hwang p.1 col.1-col.2 Section 1. Introduction 2nd paragraph: extracting from each layer k a signal vector yk and weight matrix Wk (“obtain a set of floating point data processed in a layer included in a neural network”) (“In a general feedforward deep neural network with multiple hidden layers as depicted in Fig. 1, each layer k has a signal vector yk, which is propagated to the next layer by multiplying the weight matrix Wk+1, adding biases bk+1, and applying the activation function φk+1 (·) as follows: yk+1 = φk+1 (Wk+1 yk + bk+1). … In fully-connected feedforward deep neural networks, each weight matrix between two layers demands N1 ×N2 weights, where N1 and N2 are the number of units for the anterior layer and the posterior layer, respectively. Considering a network employing hidden layers with 1,024 units, each hidden layer demands about one mega weights. The number of output signals and that of biases are both N2.”).]); 
determine a … entropy based on data values included in the set of floating point data ([Hwang p.2 col.1 Section II. Direct Quantization with Exhaustive Search: determining an initial grouping for weights and activation signals based on their complexity, range, and quantization sensitivity, which are interpreted as measurements of information content (“entropy”) present in the data (“determine a … entropy based on data values included in the set of floating point data”) (“A deep neural network usually contains millions of weights and thousands of internal signals. Since applying a different data format for each weight or signal is too complex, it is needed to group them according to their range and the quantization sensitivity [13]. In a deep neural network with several layers, it is convenient to separate each layer for the grouping. Among the weights in each layer, we notice that the biases need to have high precision because their range is usually much larger than that of other weights. Assigning a high precision fixed-point format, such as 8 bits, to the biases does not increase the hardware complexity much because the number of them is small. The quantization sensitivity can also be determined from simulations that apply quantized weights for a specific group while using the floating-point data type for other groups [13]. We found that the quantization sensitivity of signals in the hidden layers is mostly the same and very low, but that in the input of the network depends on applications very much.”).]); 
adjust quantization levels assigned to the data values based on the … entropy ([Hwang p.1 col.2 2nd full paragraph (Section 1. Introduction): quantizing weights and activation signals into (+1, 0, -1) weight categories and 2- or 3-bit fixed point signals, respectively (“quantization levels assigned to the data values”) (“In this paper, we propose a high performance fixed-point optimization method that can greatly reduce the word-length of weights and signals for implementing DNNs. The proposed scheme allows design of DNNs for real-world problems only with ternary (+1, 0, and -1) weights and 2 or 3 bits of fixed-point signals.”).] [Hwang p.2 col.1 1st full paragraph (Section 1. Introduction): performing a two phase approach for quantizing data, which includes performing a direct quantization as a baseline, followed by a retraining of network using the quantized values (“The paper is organized as follows. In Section II, we describe a direct quantization approach as a baseline. Section III contains the proposed scheme that retrains the network after fixed-point quantization. Both the direct and the proposed quantization schemes are evaluated in Section IV.”).] [Hwang p.2 col.2 1st full paragraph (Section II. Direct Quantization with Exhaustive Search): performing an initial direct quantization with an optimal search, performing iterative adjustments to identify the quantization step size boundary (e.g. range of weight values for each quantization level) (“adjust quantization levels assigned to the data values based on the … entropy”) to minimize the output error of the network, and re-quantize the weights to find the best quantization level (“… the optimum step size is initially determined by using an L2-error minimizing approach that is similar to Lloyd-Max quantization, and then the quantization step size is fine tuned by using exhaustive search. 
… To reduce the search dimension, the greedy approach is applied as follows: 1) Prepare a fully trained floating-point weights. 2) Quantize all input data and signals of hidden layers. 3) Starts with the weight quantizer between the input layer and the first hidden layer, try several step sizes around the initial step size and measure the output error of the network with the training set. The initial step size is determined using the L2-error minimizing approach. 4) Choose the step size that minimizes the output error and quantize the weights. 5) Perform the third and fourth steps for the next layer until it reaches the last layer.”).]); and 
quantize the data values included in the set of floating point data in accordance with the adjusted quantization levels ([Hwang p.2 col.2 1st full paragraph (Section II. Direct Quantization with Exhaustive Search): performing an initial direct quantization with an optimal search, performing iterative adjustments to identify the quantization step size boundary (e.g. range of weight values for each quantization level) to minimize the output error of the network, and re-quantize the weights to find the best quantization level (“quantize the data values included in the set of floating point data in accordance with the adjusted quantization levels”) (“… the optimum step size is initially determined by using an L2-error minimizing approach that is similar to Lloyd-Max quantization, and then the quantization step size is fine tuned by using exhaustive search. … To reduce the search dimension, the greedy approach is applied as follows: 1) Prepare a fully trained floating-point weights. 2) Quantize all input data and signals of hidden layers. 3) Starts with the weight quantizer between the input layer and the first hidden layer, try several step sizes around the initial step size and measure the output error of the network with the training set. The initial step size is determined using the L2-error minimizing approach. 4) Choose the step size that minimizes the output error and quantize the weights. 5) Perform the third and fourth steps for the next layer until it reaches the last layer.”).]).  
However, Hwang does not teach
[determine] a weighted entropy [based on data values] 
…
[adjust] … based on the weighted entropy …
Guiasu teaches
[determine] a weighted entropy [based on data values] ([Guiasu p.63 Section 1. Introduction: grouping data based on data complexity (“entropy”), and taking into account information content and class homogeneity, where the information content and class homogeneity represents weighting factors on the entropy (“weighted entropy”); in the context of performing quantization of floating point weights and activation signals, the grouping of data into classes, factoring in information content and class homogeneity is interpreted as grouping the weights or activation signals into assigned quantization levels based on weighted entropy (“determine a weighted entropy”) (“Grouping data is a way of coping with complexity. It is well known that when the raw data are grouped in classes a certain amount of information is lost, since no distinction is made between observations falling into the same class. The larger the class interval is, the greater is the amount of information lost. On the other hand, if too many distinct classes are used, the presentation of information is somewhat misleading because conspicuous irregularities merely reflect the accidents of sampling. In the choice of a class interval a reasonable compromise must be reached between information content and class homogeneity. The aim of the paper is to show how the weighted entropy, a generalization of Shannon's entropy from information theory, may be used to balance the amount of information and the degree of homogeneity associated to a partition of data in classes.”).] [Guiasu p.65 Section 2. Information balance for weighted data: referring to equations 2.1, 2.2, 2.3, with X_i representing groups of data elements from the data set X (i.e., a set of floating point weights or a set of floating point activations, and their corresponding values in the data set representing weight sizes or activation values), the equation for weighted entropy is based on the relative frequency p(X_i) of a data set X and a weighting factor w(X_i) based on the values of the data set, with p(X_i) expressed as a ratio of the number of data elements per partition/class and the total number of data elements X, and w(X_i) expressed as a ratio of the sum of all element values per partition/class and the total number of data elements per partition/class; 

[Image media_image1.png: referenced equations from Guiasu p.65 Section 2]

in the context of performing quantization of floating point weights and activation signals, the data set X represents either the set of floating point weights or the set of floating point activation signals, and the values of the data set represent the weight size or activation signal value (“[determine] a weighted entropy [based on data values]”).]) …
[adjust] … based on the weighted entropy ([Guiasu p.66 Section 3. The trade-off between information and homogeneity: referring to p.65 equation 2.3 (“weighted entropy”), p.65 equation 2.6 (information content of the partition/class based on the weighted entropy), p.65 equation 2.7 (degree of homogeneity of the partition/class), and p.66 equation 3.1, one way to find an optimal number of partitions (P_n) is shown in equation 3.1, which involves finding the right balance (“adjust”) of the measurement of information content I(P_n) and the measurement of the degree of homogeneity H(P_n) within each partition; in the context of performing quantization of floating point weights and activation signals, each partition/class represents a quantization level, and finding the right balance represents adjusting quantization levels assigned to the data values, where the adjusting is based on the information content (which is based on the weighted entropy) and the degree of homogeneity (“[adjust] … based on the weighted entropy”).]) …
Hwang and Guiasu are analogous art because both teach the partitioning of data elements in a data set.
It would have been obvious to a person having ordinary skill in the art before the effective filing date to take the step of determining entropy based on the floating point data set of Hwang and enhance it with the step of determining weighted entropy based on the floating point data set of Guiasu as a way to perform partitioning of data elements in a data set (i.e., determining quantization levels). The motivation to combine is that using weighted entropy to determine groupings is a known technique that can be applied with a reasonable expectation of success, as it produces a predictable result (i.e., determining an optimal number of groups and group boundaries for any set of data elements) that can be further used in other applications (in the case of Hwang, determining an optimal number of quantization levels and the step size/range of data elements within each quantization level). The motivation to combine is also taught in Guiasu ([Guiasu p.63 Section 1. Introduction: “Grouping data is a way of coping with complexity. It is well known that when the raw data are grouped in classes a certain amount of information is lost, since no distinction is made between observations falling into the same class. The larger the class interval is, the greater is the amount of information lost. On the other hand, if too many distinct classes are used, the presentation of information is somewhat misleading because conspicuous irregularities merely reflect the accidents of sampling. In the choice of a class interval a reasonable compromise must be reached between information content and class homogeneity. The aim of the paper is to show how the weighted entropy, a generalization of Shannon's entropy from information theory, may be used to balance the amount of information and the degree of homogeneity associated to a partition of data in classes.”]).
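For illustration only, the weighted entropy as characterized above (a relative frequency p(X_i) and a weighting factor w(X_i) per partition/class) may be computed as sketched below. This is a minimal sketch under stated assumptions: the function name weighted_entropy and the boundaries parameter are hypothetical, and the combining formula S = -sum_i w(X_i) p(X_i) log p(X_i) is the standard weighted-entropy form, assumed here to correspond to Guiasu equation 2.3, which is available in this action only as an image.

```python
import numpy as np

def weighted_entropy(values, boundaries):
    """Weighted entropy of a partition of 1-D data into classes.

    values: non-negative data values (e.g. weight magnitudes or activations).
    boundaries: increasing interior cut points defining the classes.
    """
    values = np.asarray(values, dtype=float)
    labels = np.digitize(values, boundaries)     # class index of each element
    total = values.size
    s = 0.0
    for i in range(len(boundaries) + 1):
        members = values[labels == i]
        if members.size == 0:
            continue                              # an empty class contributes nothing
        p = members.size / total                  # relative frequency of the class
        w = members.sum() / members.size          # weighting factor: mean value in the class
        s -= w * p * np.log(p)
    return s
```

For example, the values 1, 1, 3, 3 split at 2 give two classes with p = 0.5 each and weighting factors 1 and 3, so S = -(1 + 3)(0.5)(log 0.5) = 2 log 2; a single class containing all elements gives S = 0.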
Regarding Claim 16, Hwang in view of Guiasu teaches
The apparatus of claim 15, wherein the processor is further configured to 
implement the neural network using the quantized data values and based on input data provided to the neural network (This claim limitation is similar in scope to a corresponding claim element in Claim 2, and hence is rejected under similar rationale.), and 
indicate a result of the implementation (This claim limitation is similar in scope to a corresponding claim element in Claim 2, and hence is rejected under similar rationale.).  
Regarding Claim 17, Hwang in view of Guiasu teaches
The apparatus of claim 15, wherein the weighted entropy is determined by applying a weighting factor based on determined sizes of the data values to a determined distribution of the data values included in the set of floating point data (This claim limitation is similar in scope to Claim 3, and hence is rejected under similar rationale.).  
Regarding Claim 18, Hwang in view of Guiasu teaches
The apparatus of claim 15, wherein the set of floating point data comprises a set of activations processed in the layer or a set of weights processed in the layer (This claim limitation is similar in scope to Claims 4 and 5, and hence is rejected under similar rationale.).  
Regarding Claim 19, Hwang in view of Guiasu teaches
The apparatus of claim 15, wherein, 
the set of floating point data is a set of weights (This claim limitation is similar in scope to a corresponding claim element in Claim 6, and hence is rejected under similar rationale.), and
the processor is further configured to:
group the set of weights into a plurality of clusters (This claim limitation is similar in scope to a corresponding claim element in Claim 6, and hence is rejected under similar rationale.); 
determine respective relative frequencies for each of the grouped clusters by dividing a total number of weights included in each of the [respective] grouped clusters by a total number of weights included in the set of weights (This claim limitation is similar in scope to a corresponding claim element in Claim 6, and hence is rejected under similar rationale.); 
determine respective representative importances of each of the grouped clusters based on sizes of weights included in each of the grouped clusters (This claim limitation is similar in scope to a corresponding claim element in Claim 6, and hence is rejected under similar rationale.); and 
determine the weighted entropy based on the respective relative frequencies and the respective representative importances (This claim limitation is similar in scope to a corresponding claim element in Claim 6, and hence is rejected under similar rationale.).  
Regarding Claim 20, Hwang in view of Guiasu teaches
The apparatus of claim 19, wherein the processor is further configured to: 
determine respective weights corresponding to the respective representative importances of each of the grouped clusters as a corresponding representative weight for each of the grouped clusters (This claim limitation is similar in scope to a corresponding claim element in Claim 7, and hence is rejected under similar rationale.); and 
quantize the weights included in each of the grouped clusters respectively into the corresponding representative weight for each of the grouped clusters (This claim limitation is similar in scope to a corresponding claim element in Claim 7, and hence is rejected under similar rationale.).  
Regarding Claim 21, Hwang in view of Guiasu teaches
The apparatus of claim 19, wherein the processor is further configured to 
adjust the quantization levels assigned to the data values by adjusting boundaries of each of the clusters in a direction that increases the weighted entropy (This claim limitation is similar in scope to a corresponding claim element in Claim 8, and hence is rejected under similar rationale.).  
Regarding Claim 25, Hwang in view of Guiasu teaches
The apparatus of claim 15, wherein the processor is further configured to 
perform the obtaining [with respect to each of a plurality of layers included in the neural network] (This claim limitation is similar in scope to a corresponding claim element in Claim 12, and hence is rejected under similar rationale.), 
[perform the] … determining [with respect to each of a plurality of layers included in the neural network] (This claim limitation is similar in scope to a corresponding claim element in Claim 12, and hence is rejected under similar rationale.), 
[perform the] … adjusting [with respect to each of a plurality of layers included in the neural network] (This claim limitation is similar in scope to a corresponding claim element in Claim 12, and hence is rejected under similar rationale.), and 
[perform the] … quantizing with respect to each of a plurality of layers included in the neural network (This claim limitation is similar in scope to a corresponding claim element in Claim 12, and hence is rejected under similar rationale.), 
with respective adjusted quantization levels being optimized and assigned for each of the plurality of layers (This claim limitation is similar in scope to a corresponding claim element in Claim 12, and hence is rejected under similar rationale.).  
Regarding Claim 26, Hwang in view of Guiasu teaches
The apparatus of claim 15, further comprising 
a non-transitory memory storing instructions, which when executed by the processor, control the processor ([Hwang p.1 col.1 Section I. Introduction, 1st paragraph: implementing deep neural networks using hardware (VLSI) or software running on embedded computing systems (“Implementation of deep neural networks using VLSI or embedded computing systems is needed for real-time and low-power applications. … It is, therefore, very important for efficient implementations to reduce the word-length of weights and internal signals.”).] [Hwang p.5 col.2 Section V. Concluding Remarks: applying the quantization methods to hardware and software development; application of these methods on software systems requires the use of a generic computer which contains a processor and non-transitory computer-readable medium storing instructions (“a non-transitory memory storing instructions, which when executed by a processor, control the processor … ”) (“We have developed a training procedure to reduce the word-length of weights and that of signals in deep neural networks. The proposed procedure yields superior results compared to the direct quantization method, especially when only 3-point (+1, 0, and -1) weights are used. The signal word-length that affects the complexity of interconnection and arithmetic units can also be reduced to 3 bits without sacrificing the performance much. We find that the performance gap between the floating-point and fixed-point networks shrinks as the number of units in each layer increases. Also it is shown that dropout can be employed together to generalize the network and further increase the performance. This research is useful for not only hardware based implementations but also real-time software development.”).])
to implement the obtaining ([Hwang p.1 col.1-col.2 Section 1. Introduction 2nd paragraph: extracting from each layer k a signal vector yk and weight matrix Wk (“to implement the obtaining”) (“In a general feedforward deep neural network with multiple hidden layers as depicted in Fig. 1, each layer k has a signal vector yk, which is propagated to the next layer by multiplying the weight matrix Wk+1, adding biases bk+1, and applying the activation function φk+1 (·) as follows: yk+1 = φk+1 (Wk+1 yk + bk+1). … In fully-connected feedforward deep neural networks, each weight matrix between two layers demands N1 ×N2 weights, where N1 and N2 are the number of units for the anterior layer and the posterior layer, respectively. Considering a network employing hidden layers with 1,024 units, each hidden layer demands about one mega weights. The number of output signals and that of biases are both N2.”).]), 
[to implement the] … determining ([Hwang p.2 col.1 Section II. Direct Quantization with Exhaustive Search: determining an initial grouping for weights and activation signals based on their complexity, range, and quantization sensitivity, which are interpreted as measurements of information content present in the data (“A deep neural network usually contains millions of weights and thousands of internal signals. Since applying a different data format for each weight or signal is too complex, it is needed to group them according to their range and the quantization sensitivity [13]. In a deep neural network with several layers, it is convenient to separate each layer for the grouping. Among the weights in each layer, we notice that the biases need to have high precision because their range is usually much larger than that of other weights. Assigning a high precision fixed-point format, such as 8 bits, to the biases does not increase the hardware complexity much because the number of them is small. The quantization sensitivity can also be determined from simulations that apply quantized weights for a specific group while using the floating-point data type for other groups [13]. We found that the quantization sensitivity of signals in the hidden layers is mostly the same and very low, but that in the input of the network depends on applications very much.”).] [Guiasu p.63 Section 1. Introduction: grouping data based on data complexity, and taking into account information content and class homogeneity, where the information content and class homogeneity represents weighting factors on the entropy; in the context of performing quantization of floating point weights and activation signals, the grouping of data into classes, factoring in information content and class homogeneity is interpreted as grouping the weights or activation signals into assigned quantization levels based on weighted entropy (“Grouping data is a way of coping with complexity. 
It is well known that when the raw data are grouped in classes a certain amount of information is lost, since no distinction is made between observations falling into the same class. The larger the class interval is, the greater is the amount of information lost. On the other hand, if too many distinct classes are used, the presentation of information is somewhat misleading because conspicuous irregularities merely reflect the accidents of sampling. In the choice of a class interval a reasonable compromise must be reached between information content and class homogeneity. The aim of the paper is to show how the weighted entropy, a generalization of Shannon's entropy from information theory, may be used to balance the amount of information and the degree of homogeneity associated to a partition of data in classes.”).] [Guiasu p.65 Section 2. Information balance for weighted data: referring to equations 2.1, 2.2, 2.3, with X_i representing groups of data elements from the data set X (i.e., a set of floating point weights or a set of floating point activations, and their corresponding values in the data set representing weight sizes or activation signal values), the equation for weighted entropy is based on the relative frequency p(X_i) of a data set X and a weighting factor w(X_i) based on the values of the data set, with p(X_i) expressed as a ratio of the number of data elements per partition/class and the total number of data elements X, and w(X_i) expressed as a ratio of the sum of all element values per partition/class and the total number of data elements per partition/class; 
[Image: media_image1.png, 433 x 610, greyscale]
in the context of performing quantization of floating point weights and activation signals, the data set X represents either the set of floating point weights or the set of floating point activation signals, and the values of the data set represent the weight size or activation signal value (“[to implement the] … determining”).]), 
[to implement the] … adjusting ([Hwang p.1 col.2 2nd full paragraph (Section 1. Introduction): quantizing weights and activation signals into (+1, 0, -1) weight categories and 2- or 3-bit fixed point signals, respectively (“In this paper, we propose a high performance fixed-point optimization method that can greatly reduce the word-length of weights and signals for implementing DNNs. The proposed scheme allows design of DNNs for real-world problems only with ternary (+1, 0, and -1) weights and 2 or 3 bits of fixed-point signals.”).] [Hwang p.2 col.1 1st full paragraph (Section 1. Introduction): performing a two  (“The paper is organized as follows. In Section II, we describe a direct quantization approach as a baseline. Section III contains the proposed scheme that retrains the network after fixed-point quantization. Both the direct and the proposed quantization schemes are evaluated in Section IV.”).] [Hwang p.2 col.2 1st full paragraph (Section II. Direct Quantization with Exhaustive Search): performing an initial direct quantization with an optimal search, performing iterative adjustments to identify the quantization step size boundary (e.g. range of weight values for each quantization level) to minimize the output error of the network, and re-quantize the weights to find the best quantization level (“… the optimum step size is initially determined by using an L2-error minimizing approach that is similar to Lloyd-Max quantization, and then the quantization step size is fine tuned by using exhaustive search. … To reduce the search dimension, the greedy approach is applied as follows: 1) Prepare a fully trained floating-point weights. 2) Quantize all input data and signals of hidden layers. 3) Starts with the weight quantizer between the input layer and the first hidden layer, try several step sizes around the initial step size and measure the output error of the network with the training set. 
The initial step size is determined using the L2-error minimizing approach. 4) Choose the step size that minimizes the output error and quantize the weights. 5) Perform the third and fourth steps for the next layer until it reaches the last layer.”).] [Guiasu p.66 Section 3. The trade-off between information and homogeneity: referring to p.65 equation 2.3 (“weighted entropy”), p.65 equation 2.6 (information content of the partition/class based on the weighted entropy), p.65 equation 2.7 (degree of homogeneity of the partition/class), and p.66 equation 3.1, one way to find an optimal number of partitions (P_n) is shown in equation 3.1, which involves finding the right balance of the measurement of information content I(P_n) and the measurement of the degree of homogeneity H(P_n) within each partition; in the context of performing quantization of floating point weights and activation signals, each partition/class represents a quantization level, and finding the right balance represents adjusting quantization levels assigned to the data values, where the adjusting is based on the information content (which is based on the weighted entropy) and the degree of homogeneity (“[to implement the] … adjusting”).]), and 
[to implement the] … quantizing ([Hwang p.2 col.2 1st full paragraph (Section II. Direct Quantization with Exhaustive Search): performing an initial direct quantization with an optimal search, performing iterative adjustments to identify the quantization step size boundary (e.g. range of weight values for each quantization level) to minimize the output error of the network, and re-quantize the weights to find the best quantization level (“[to implement the] … quantizing”) (“… the optimum step size is initially determined by using an L2-error minimizing approach that is similar to Lloyd-Max quantization, and then the quantization step size is fine tuned by using exhaustive search. … To reduce the search dimension, the greedy approach is applied as follows: 1) Prepare a fully trained floating-point weights. 2) Quantize all input data and signals of hidden layers. 3) Starts with the weight quantizer between the input layer and the first hidden layer, try several step sizes around the initial step size and measure the output error of the network with the training set. The initial step size is determined using the L2-error minimizing approach. 4) Choose the step size that minimizes the output error and quantize the weights. 5) Perform the third and fourth steps for the next layer until it reaches the last layer.”).]).  
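For illustration only, the greedy step-size search quoted from Hwang above (try several step sizes around an initial step size, keep the one with the lowest error, then quantize) could be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Hwang: the function names, the candidate factors, and the use of the L2 error on the weights themselves (in place of the network output error on the training set that Hwang measures) are all hypothetical simplifications.

```python
def quantize_ternary(weights, step):
    """Map each weight to -step, 0, or +step (ternary levels, assumed form)."""
    out = []
    for w in weights:
        q = max(-1, min(1, round(w / step)))  # clip the level index to {-1, 0, +1}
        out.append(q * step)
    return out


def l2_error(original, quantized):
    """Sum of squared differences; a stand-in for the network output error."""
    return sum((a - b) ** 2 for a, b in zip(original, quantized))


def search_step(weights, initial_step, factors=(0.5, 0.75, 1.0, 1.25, 1.5)):
    """Try several step sizes around initial_step and return the best one."""
    candidates = [initial_step * f for f in factors]
    return min(candidates,
               key=lambda s: l2_error(weights, quantize_ternary(weights, s)))


if __name__ == "__main__":
    layer_weights = [-1.0, 0.0, 1.0]
    best = search_step(layer_weights, initial_step=1.0)
    print(best, quantize_ternary(layer_weights, best))
```

In a per-layer procedure of the kind Hwang describes, a search like this would be repeated for each layer in turn, starting from the layer nearest the input.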
Claims 9-11 and 22-24 are rejected under 35 U.S.C. 103 as being unpatentable over Hwang et al., Fixed-point feedforward deep neural network design using weights +1, 0, and -1, 2014 IEEE Workshop on Signal Processing Systems SiPS, 2014, 6 pages [hereafter referred to as Hwang] in view of Guiasu, Silviu, Grouping Data by Using the Weighted Entropy, Journal of Statistical Planning and Inference 15 (1986) Elsevier Science Publishers B.V., 1986, pp.63-69.
Regarding Claim 9, Hwang in view of Guiasu as applied to Claim 1 teaches
The method of claim 1, wherein, 
the set of floating point data is a set of activations ([Hwang p.1 col.1-col.2 Section 1. Introduction 2nd paragraph: extracting from each layer k a signal vector yk (“the set of floating point data is a set of activations”) and weight matrix Wk (“In a general feedforward deep neural network with multiple hidden layers as depicted in Fig. 1, each layer k has a signal vector yk, which is propagated to the next layer by multiplying the weight matrix Wk+1, adding biases bk+1, and applying the activation function φk+1 (·) as follows: yk+1 = φk+1 (Wk+1 yk + bk+1). … In fully-connected feedforward deep neural networks, each weight matrix between two layers demands N1 ×N2 weights, where N1 and N2 are the number of units for the anterior layer and the posterior layer, respectively. Considering a network employing hidden layers with 1,024 units, each hidden layer demands about one mega weights. The number of output signals and that of biases are both N2.”).]), and 
the quantization levels are assigned using an entropy-based … data representation-based quantization method ([Hwang p.2 col.1 Section II. Direct Quantization with Exhaustive Search: determining an initial grouping (“quantization levels”) for weights and activation signals based on their complexity, range, and quantization sensitivity, which are interpreted as measurements of information content (“entropy”) present in the data (“the quantization levels are assigned using an entropy-based … data representation-based quantization method”) (“A deep neural network usually contains millions of weights and thousands of internal signals. Since applying a different data format for each weight or signal is too complex, it is needed to group them according to their range and the quantization sensitivity [13]. In a deep neural network with several layers, it is convenient to separate each layer for the grouping. Among the weights in each layer, we notice that the biases need to have high precision because their range is usually much larger than that of other weights. Assigning a high precision fixed-point format, such as 8 bits, to the biases does not increase the hardware complexity much because the number of them is small. The quantization sensitivity can also be determined from simulations that apply quantized weights for a specific group while using the floating-point data type for other groups [13]. We found that the quantization sensitivity of signals in the hidden layers is mostly the same and very low, but that in the input of the network depends on applications very much.”).]), 
wherein the determining of the weighted entropy comprises: 
determining respective relative frequencies for each of the quantization levels by dividing a total number of activations included in each of the [respective] quantization levels by a total number of activations included in the set of activations ([Guiasu p.65 Section 2. Information balance for weighted data: referring to equations 2.1, 2.2, 2.3, and in the context of performing quantization of floating point activations, with X_i representing groups of data elements from the data set X (“quantization levels”) (i.e., a set of floating point activations, and their corresponding values in the data set representing activation values), the equation for weighted entropy is based on the relative frequency p(X_i) (“respective relative frequencies”) of a data set X and a weighting factor w(X_i) based on the values of the data set, with p(X_i) expressed as a ratio of the number of data elements per partition/class and the total number of data elements X (“determining respective relative frequencies for each of the quantization levels by ”), and w(X_i) expressed as a ratio of the sum of all element values per partition/class and the total number of data elements per partition/class.
[Image: media_image1.png, 433 x 610, greyscale]
]); 
determining respective data values corresponding to each of the quantization levels as respective representative importances of each of the quantization levels ([Guiasu p.65 Section 2. Information balance for weighted data: referring to equations 2.1, 2.2, 2.3, and in the context of performing quantization of floating point activations, with X_i representing groups of data elements from the data set X (“quantization levels”) (i.e., a set of floating point activations, and their corresponding values in the data set representing activation values), the equation for weighted entropy is based on the relative frequency p(X_i) of a data set X and a weighting factor w(X_i) based on the values of the data set (“respective representative importances”), with p(X_i) expressed as a ratio of the number of data elements per partition/class and the total number of data elements X, and w(X_i) expressed as a ratio of the sum of all element values per partition/class and the total number of data elements per partition/class (“determining respective data values corresponding to each of the quantization levels as respective representative importances of each of the quantization levels”).
[Image: media_image1.png, 433 x 610, greyscale]
]); and 
determining the weighted entropy based on the respective relative frequencies and the respective representative importances ([Guiasu p.65 Section 2. Information balance for weighted data: referring to equations 2.1, 2.2, 2.3, and in the context of performing quantization of floating point activations, with X_i representing groups of data elements from the data set X (“quantization levels”) (i.e., a set of floating point activations, and their corresponding values in the data set representing activation values), the equation for weighted entropy is based on the relative frequency p(X_i) of a data set X (“respective relative frequencies”) and a weighting factor w(X_i) based on the values of the data set (“respective representative importances”), with p(X_i) expressed as a ratio of the number of data elements per partition/class and the total number of data elements X, and w(X_i) expressed as a ratio of the sum of all element values per partition/class and the total number of data elements per partition/class; hence, the weighted entropy in equation 2.3 is a function of the respective relative frequencies pk and relative importances wk (“determining the weighted entropy based on the respective relative frequencies and the respective representative importances”).]).
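For illustration only, the computation characterized above (a relative frequency p per quantization level, a weighting factor w per quantization level, and a weighted entropy built from both) could be sketched as follows. This is a minimal sketch under stated assumptions: it uses the commonly cited weighted-entropy form S = -Σ w_k p_k log p_k, takes w_k as the mean of the values assigned to level k, and assumes non-negative values (for example, activations). The function name and the example data are hypothetical and are not taken from Guiasu or Hwang.

```python
import math
from collections import defaultdict


def weighted_entropy(values, levels):
    """Weighted entropy of values grouped by their assigned level index.

    values: list of non-negative floats (assumed, e.g. activations)
    levels: list of ints, levels[i] is the level assigned to values[i]
    """
    groups = defaultdict(list)
    for v, q in zip(values, levels):
        groups[q].append(v)

    total = len(values)
    entropy = 0.0
    for members in groups.values():
        p = len(members) / total          # relative frequency of the level
        w = sum(members) / len(members)   # representative importance (mean value)
        entropy -= w * p * math.log(p)
    return entropy


if __name__ == "__main__":
    activations = [1.0, 1.0, 2.0, 2.0]
    assigned = [0, 0, 1, 1]
    print(weighted_entropy(activations, assigned))
```

With every value assigned to a single level, p is 1 and the result is zero, which matches the quoted observation that coarse grouping loses information.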
Hwang in view of Guiasu does not teach
… using an entropy-based logarithm data representation-based quantization method, 
Miyashita teaches
… using an entropy-based logarithm data representation-based quantization method ([Miyashita p.1 col.1 Abstract: representing weights and activations in a neural network by using non-uniform logarithmic representation (“… it is now well-known that the arithmetic operations of deep networks can be encoded down to 8-bit fixed-point without significant deterioration in performance. However, further reduction in precision down to as low as 3-bit fixed-point results in significant losses in performance. In this paper we propose a new data representation that enables state-of-the-art networks to be encoded to 3 bits with negligible loss in classification performance. To perform this, we take advantage of the fact that the weights and activations in a trained network naturally have non-uniform distributions. Using non-uniform, base-2 logarithmic representation to encode weights, communicate activations, and perform dot-products enables networks to 1) achieve higher classification accuracies than fixed-point at the same resolution and 2) eliminate bulky digital multipliers.”).] [Miyashita p.4 col.1-col.2 Section 4.1. Logarithmic Representation of Activations: referring to equations 5, 6, and 7, quantizing the activations using base-2 log to define the quantization levels \hat{x}, according to the method described in Section 3.1 (“… using an entropy-based logarithm data representation-based quantization method”) (“In similar spirit to that of (Gupta et al., 2015), we describe the logarithmic quantization layer LogQuant that performs the element-wise operation as follows: 
[Image: media_image3.png, 250 x 539, greyscale]
These layers perform the logarithmic quantization and computation as detailed in Section 3.1.”).]), 
Hwang in view of Guiasu and Miyashita are analogous art since both teach methods of quantizing activations in neural networks.
It would have been obvious to a person having ordinary skill in the art before the effective filing date to take the entropy-based data representation-based quantization method of Hwang in view of Guiasu and enhance it with the entropy-based logarithm data representation-based quantization method of Miyashita as a way to quantize activations in neural networks. The motivation to combine is taught in Miyashita, as quantization introduces a form of compression of the data set without significant deterioration in performance of the neural network, thus allowing for these computations to be performed on lower-precision platforms such as mobile or embedded platforms, expanding the range of applications that use neural networks ([Miyashita p.1 col.1 Abstract: “Recent advances in convolutional neural networks have considered model complexity and hardware efficiency to enable deployment onto embedded systems and mobile devices. For example, it is now well-known that the arithmetic operations of deep networks can be encoded down to 8-bit fixed-point without significant deterioration in performance. However, further reduction in precision down to as low as 3-bit fixed-point results in significant losses in performance. In this paper we propose a new data representation that enables state-of-the-art networks to be encoded to 3 bits with negligible loss in classification performance.”] [Miyashita p.1 col.2 Section 1. Introduction: “In order for these large networks to run in real-time applications such as for mobile or embedded platforms, it is often necessary to use low-precision arithmetic and apply compression techniques.”] [Miyashita p.5 col.1 Section 4.1 Logarithmic Representation of Activations: “Using only 3 bits to represent the activations for both logarithmic and linear quantizations, the top-5 accuracy is still very close to that of the original, unquantized model encoded at floating-point 32b. 
However, logarithmic representations tolerate a large dynamic range of FSRs. For example, using 4b log, we can obtain 3 order of magnitude variations in the full scale without a significant loss of top-5 accuracy.”]).
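For illustration only, a base-2 logarithmic quantizer with a full scale range parameter, of the general kind discussed above, could be sketched as follows. This is a simplified sketch under stated assumptions and does not reproduce Miyashita's equations 5 to 7 (which appear only as an image above): the function name, the clipping window, and the handling of values below the window are hypothetical choices made for the example.

```python
import math


def log_quantize(x, bits=3, fsr=0):
    """Quantize a non-negative activation to a power of two.

    bits: number of bits, giving 2**bits exponent levels (assumed layout)
    fsr:  largest representable exponent, so the full scale is 2**fsr
    """
    if x <= 0:
        return 0.0
    max_exp = fsr
    min_exp = fsr - 2 ** bits + 1
    exponent = round(math.log2(x))                   # nearest integer exponent
    exponent = max(min_exp, min(max_exp, exponent))  # clip to the window
    return 2.0 ** exponent


if __name__ == "__main__":
    for value in (0.0, 0.3, 5.0):
        print(value, log_quantize(value, bits=3, fsr=0), log_quantize(value, bits=3, fsr=3))
```

Raising `fsr` shifts the whole window of representable levels upward, which is one way to picture the per-layer offset that the quoted passage describes.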
Regarding Claim 10, Hwang in view of Guiasu, in further view of Miyashita teaches
The method of claim 9, wherein the adjusting comprises 
adjusting the quantization levels assigned to the respective data values by adjusting a value corresponding to a first quantization level among the quantization levels and a size of an interval between the quantization levels in a direction of increasing the weighted entropy ([Miyashita p.4 col.1-col.2 Section 4.1 Logarithmic Representation of Activations: referring to equations 5, 6, 7, and Table 2, using FSR parameter (full scale range) to handle the variation of activation ranges from layer to layer (“adjusting the quantization levels assigned to the respective data values by adjusting a value corresponding to a first quantization level among the quantization levels”) (“Tables 1 and 2 illustrate the addition of these layers to the models. The quantizer has a specified full scale range, and this range in linear scale is 2^{FSR}, where we express this as simply FSR … The FSR values for each layer are shown in Tables 1 and 2; they show fsr added by an offset parameter. This offset parameter is chosen to properly handle the variations of activation ranges from layer to layer using 100 images from the training set. The fsr is a parameter which is global to the network and is tuned to perform the experiments to measure the effect of FSR on classification accuracy.”).] [Miyashita p.6 col.2-p.7 col.1 Section 4.3 Logarithmic Representation of Weights of Convolutional Layers: referring to Figure 6, changing the log base from base-2 to base-√2 moves towards finer quantization granularity (“adjusting the quantization levels assigned to the respective data values by adjusting … a size of an interval between the quantization levels”), with a finer quantization granularity resulting in narrowing the distributions of elements within each quantization level, thus increasing the weighted entropy (“in a direction of increasing the weighted entropy”) (“We now represent the convolutional layers using the same procedure. We keep the representation of activations at 4b log and the representation of weights of FC layers at 4b log, and compare our log method with the linear reference and ideal floating point. We also perform the dot products using two different bases: 2, √2. Note that there is no additional overhead for log base-√2 as it is computed with the same equation shown in Equation 4. … Table 5 shows the classification results. The results illustrate an approximate 6% drop in performance from floating point down to 5b base-2 but a relatively minor 1.7% drop for 5b base-√2. … The distributions of quantization errors for both 5b base-2 and 5b base-√2 are shown in Figure 6. The total quantization error on the weights, (1/N) ||Quantize(x) - x||_1, where, x is the vectorized weights of size N, is 2x smaller for base-√2 than for base-2.”).]).  
Regarding Claim 11, Hwang in view of Guiasu, in further view of Miyashita teaches
The method of claim 9, wherein the adjusting comprises 
adjusting a log base, which is controlling of the quantization levels, in a direction that maximizes the weighted entropy ([Miyashita p.6 col.2-p.7 col.1 Section 4.3 Logarithmic Representation of Weights of Convolutional Layers: referring to Figure 6, changing the log base from base-2 to base-√2 moves towards finer quantization granularity (“adjusting a log base, which is controlling of the quantization levels”), with a finer quantization granularity resulting in narrowing the distributions of elements within each quantization level, thus increasing the weighted entropy (“in a direction that maximizes the weighted entropy”) (“We now represent the convolutional layers using the same procedure. We keep the representation of activations at 4b log and the representation of weights of FC layers at 4b log, and compare our log method with the linear reference and ideal floating point. We also perform the dot products using two different bases: 2, √2. Note that there is no additional overhead for log base-√2 as it is computed with the same equation shown in Equation 4. … Table 5 shows the classification results. The results illustrate an approximate 6% drop in performance from floating point down to 5b base-2 but a relatively minor 1.7% drop for 5b base-√2. … The distributions of quantization errors for both 5b base-2 and 5b base-√2 are shown in Figure 6. The total quantization error on the weights, (1/N) ||Quantize(x) - x||_1, where, x is the vectorized weights of size N, is 2x smaller for base-√2 than for base-2.”).]).  
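For illustration only, the effect of changing the log base on the total quantization error (1/N) ||Quantize(x) - x||_1 quoted above could be sketched as follows. This is a minimal sketch under stated assumptions: the quantizer rounds to the nearest integer power of the chosen base with no bit-width limit or clipping, and the function names and sample values are hypothetical rather than drawn from Miyashita.

```python
import math


def quantize_log_base(x, base):
    """Round a positive value to the nearest integer power of base (in log domain)."""
    exponent = round(math.log(x, base))
    return base ** exponent


def mean_abs_error(values, base):
    """(1/N) * L1 norm of the quantization error for the given log base."""
    return sum(abs(quantize_log_base(v, base) - v) for v in values) / len(values)


if __name__ == "__main__":
    sample = [0.3, 0.7, 1.5, 3.0, 6.0]
    print("base 2      :", mean_abs_error(sample, 2.0))
    print("base sqrt(2):", mean_abs_error(sample, math.sqrt(2.0)))
```

On this sample the base-√2 quantizer places levels twice as densely in the log domain, so its mean absolute error is lower than for base 2. The sketch does not compute the weighted entropy itself.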
Regarding Claim 22, Hwang in view of Guiasu as applied to Claim 15 teaches
The apparatus of claim 15, wherein, 
the set of floating point data is a set of activations ([Hwang p.1 col.1-col.2 Section 1. Introduction 2nd paragraph: extracting from each layer k a signal vector yk (“the set of floating point data is a set of activations”) and weight matrix Wk (“In a general feedforward deep neural network with multiple hidden layers as depicted in Fig. 1, each layer k has a signal vector yk, which is propagated to the next layer by multiplying the weight matrix Wk+1, adding biases bk+1, and applying the activation function φk+1 (·) as follows: yk+1 = φk+1 (Wk+1 yk + bk+1). … In fully-connected feedforward deep neural networks, each weight matrix between two layers demands N1 ×N2 weights, where N1 and N2 are the number of units for the anterior layer and the posterior layer, respectively. Considering a network employing hidden layers with 1,024 units, each hidden layer demands about one mega weights. The number of output signals and that of biases are both N2.”).]), and 
the quantization levels are assigned using an entropy-based … data representation-based quantization method ([Hwang p.2 col.1 Section II. Direct Quantization with Exhaustive Search: determining an initial grouping (“quantization levels”) for weights and activation signals based on their complexity, range, and quantization sensitivity, which are interpreted as measurements of information content (“entropy”) present in the data (“the quantization levels are assigned using an entropy-based … data representation-based quantization method”) (“A deep neural network usually contains millions of weights and thousands of internal signals. Since applying a different data format for each weight or signal is too complex, it is needed to group them according to their range and the quantization sensitivity [13]. In a deep neural network with several layers, it is convenient to separate each layer for the grouping. Among the weights in each layer, we notice that the biases need to have high precision because their range is usually much larger than that of other weights. Assigning a high precision fixed-point format, such as 8 bits, to the biases does not increase the hardware complexity much because the number of them is small. The quantization sensitivity can also be determined from simulations that apply quantized weights for a specific group while using the floating-point data type for other groups [13]. We found that the quantization sensitivity of signals in the hidden layers is mostly the same and very low, but that in the input of the network depends on applications very much.”).]), and 
the processor is further configured to: 
determine respective relative frequencies for each of the quantization levels by dividing a total number of activations included in each of the [respective] quantization levels by a total number of activations included in the set of activations ([Guiasu p.65 Section 2. Information balance for weighted data: referring to equations 2.1, 2.2, 2.3, and in the context of performing quantization of floating point activations, with $X_i$ representing groups of data elements from the data set X (“quantization levels”) (i.e., a set of floating point activations, and their corresponding values in the data set representing activation values), the equation for weighted entropy is based on the relative frequency $p(X_i)$ (“respective relative frequencies”) of a data set X and a weighting factor $w(X_i)$ based on the values of the data set, with $p(X_i)$ expressed as a ratio of the number of data elements per partition/class and the total number of data elements X (“determine respective relative frequencies for each of the quantization levels by …”), and $w(X_i)$ expressed as a ratio of the sum of all element values per partition/class and the total number of data elements per partition/class.
[Image: media_image1.png]
]); 
determine respective data values corresponding to each of the quantization levels as respective representative importances of each of the quantization levels ([Guiasu p.65 Section 2. Information balance for weighted data: referring to equations 2.1, 2.2, 2.3, and in the context of performing quantization of floating point activations, with $X_i$ representing groups of data elements from the data set X (“quantization levels”) (i.e., a set of floating point activations, and their corresponding values in the data set representing activation values), the equation for weighted entropy is based on the relative frequency $p(X_i)$ of a data set X and a weighting factor $w(X_i)$ based on the values of the data set (“respective representative importances”), with $p(X_i)$ expressed as a ratio of the number of data elements per partition/class and the total number of data elements X, and $w(X_i)$ expressed as a ratio of the sum of all element values per partition/class and the total number of data elements per partition/class (“determine respective data values corresponding to each of the quantization levels as respective representative importances of each of the quantization levels”).
[Image: media_image1.png]
]); and 
determine the weighted entropy based on the respective relative frequencies and the respective representative importances ([Guiasu p.65 Section 2. Information balance for weighted data: referring to equations 2.1, 2.2, 2.3, and in the context of performing quantization of floating point activations, with $X_i$ representing groups of data elements from the data set X (“quantization levels”) (i.e., a set of floating point activations, and their corresponding values in the data set representing activation values), the equation for weighted entropy is based on the relative frequency $p(X_i)$ of a data set X (“respective relative frequencies”) and a weighting factor $w(X_i)$ based on the values of the data set (“respective representative importances”), with $p(X_i)$ expressed as a ratio of the number of data elements per partition/class and the total number of data elements X, and $w(X_i)$ expressed as a ratio of the sum of all element values per partition/class and the total number of data elements per partition/class; hence, the weighted entropy in equation 2.3 is a function of the respective relative frequencies pk and relative importances wk (“determine the weighted entropy based on the respective relative frequencies and the respective representative importances”).]).
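For illustration only, the three determinations mapped above (relative frequency per level, representative importance per level, and the resulting weighted entropy) can be sketched as follows. This is a minimal sketch under stated assumptions: the weighted entropy is taken in the commonly cited form −Σ w_k·p_k·log p_k, which is assumed to correspond to equation 2.3 of Guiasu (reproduced in this action only as an image); the function name `weighted_entropy`, the `level_of` callback, and the sample activations are hypothetical.

```python
import math
from collections import defaultdict

def weighted_entropy(values, level_of):
    """Group `values` by quantization level and return (entropy, freq, importance).

    freq[k]       = (number of values in level k) / (total number of values)
    importance[k] = (sum of values in level k) / (number of values in level k)
    entropy       = -sum_k importance[k] * freq[k] * log(freq[k])  (assumed form)
    """
    groups = defaultdict(list)
    for v in values:
        groups[level_of(v)].append(v)
    total = len(values)
    freq = {k: len(g) / total for k, g in groups.items()}
    importance = {k: sum(g) / len(g) for k, g in groups.items()}
    entropy = -sum(importance[k] * freq[k] * math.log(freq[k]) for k in groups)
    return entropy, freq, importance

# Hypothetical activations split into two levels at a threshold of 0.5.
activations = [0.1, 0.2, 0.6, 0.7, 0.8, 0.9]
entropy, freq, importance = weighted_entropy(activations, lambda v: 0 if v < 0.5 else 1)
```

In this sketch the relative frequency is a pure count ratio and the importance is the mean value within each level, mirroring the ratios described in the mapping above.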
Hwang in view of Guiasu does not teach
… using an entropy-based logarithm data representation-based quantization method, 
Miyashita teaches
… using an entropy-based logarithm data representation-based quantization method ([Miyashita p.1 col.1 Abstract: representing weights and activations in a neural network by using non-uniform logarithmic representation (“… it is now well-known that the arithmetic operations of deep networks can be encoded down to 8-bit fixed-point without significant deterioration in performance. However, further reduction in precision down to as low as 3-bit fixed-point results in significant losses in performance. In this paper we propose a new data representation that enables state-of-the-art networks to be encoded to 3 bits with negligible loss in classification performance. To perform this, we take advantage of the fact that the weights and activations in a trained network naturally have non-uniform distributions. Using non-uniform, base-2 logarithmic representation to encode weights, communicate activations, and perform dot-products enables networks to 1) achieve higher classification accuracies than fixed-point at the same resolution and 2) eliminate bulky digital multipliers.”).] [Miyashita p.4 col.1-col.2 Section 4.1. Logarithmic Representation of Activations: referring to equations 5, 6, and 7, quantizing the activations using base-2 log to define the quantization levels $\hat{x}$, according to the method described in Section 3.1 (“… using an entropy-based logarithm data representation-based quantization method”) (“In similar spirit to that of (Gupta et al., 2015), we describe the logarithmic quantization layer LogQuant that performs the element-wise operation as follows: 
[Image: media_image3.png]
These layers perform the logarithmic quantization and computation as detailed in Section 3.1.”).]).
Hwang in view of Guiasu and Miyashita are analogous art since both teach methods of quantizing activations in neural networks.
It would have been obvious to a person having ordinary skill in the art before the effective filing date to take the entropy-based data representation-based quantization method of Hwang in view of Guiasu and enhance it with the entropy-based logarithm data representation-based quantization method of Miyashita as a way to quantize activations in neural networks. The motivation to combine is taught in Miyashita, as quantization introduces a form of compression of the data set without significant deterioration in performance of the neural network, thus allowing for these computations to be performed on lower-precision platforms such as mobile or embedded platforms, expanding the range of applications that use neural networks ([Miyashita p.1 col.1 Abstract: “Recent advances in convolutional neural networks have considered model complexity and hardware efficiency to enable deployment onto embedded systems and mobile devices. For example, it is now well-known that the arithmetic operations of deep networks can be encoded down to 8-bit fixed-point without significant deterioration in performance. However, further reduction in precision down to as low as 3-bit fixed-point results in significant losses in performance. In this paper we propose a new data representation that enables state-of-the-art networks to be encoded to 3 bits with negligible loss in classification performance.”] [Miyashita p.1 col.2 Section 1. Introduction: “In order for these large networks to run in real-time applications such as for mobile or embedded platforms, it is often necessary to use low-precision arithmetic and apply compression techniques.”] [Miyashita p.5 col.1 Section 4.1 Logarithmic Representation of Activations: “Using only 3 bits to represent the activations for both logarithmic and linear quantizations, the top-5 accuracy is still very close to that of the original, unquantized model encoded at floating-point 32b. However, logarithmic representations tolerate a large dynamic range of FSRs. For example, using 4b log, we can obtain 3 order of magnitude variations in the full scale without a significant loss of top-5 accuracy.”]).
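For illustration only, a base-2 logarithmic quantization of non-negative activations with a limited bit width and a full-scale range (FSR) can be sketched as follows. This is a minimal sketch under stated assumptions: the function name `log_quant`, its parameters, and the simple clamping of the exponent to the range [fsr − 2^bitwidth, fsr − 1] are hypothetical simplifications and are not represented to reproduce equations 5, 6, and 7 of Miyashita, which appear in this action only as an image.

```python
import math

def log_quant(x, bitwidth, fsr):
    """Quantize a non-negative activation to a power of two.

    The exponent is rounded in the log2 domain and clamped so that only
    2**bitwidth distinct exponents, ending just below `fsr`, are used.
    Hypothetical simplification for illustration.
    """
    if x <= 0.0:
        return 0.0
    max_exp = fsr - 1
    min_exp = fsr - 2 ** bitwidth
    exponent = round(math.log2(x))
    exponent = max(min_exp, min(max_exp, exponent))
    return 2.0 ** exponent

# Hypothetical activations quantized with 3 bits and fsr = 0,
# so the allowed exponents run from -8 to -1.
samples = [0.0, 1e-6, 0.3, 5.0]
quantized = [log_quant(v, bitwidth=3, fsr=0) for v in samples]
```

Because the levels are spaced by powers of two, a few bits cover several orders of magnitude, which is consistent with the dynamic-range observation quoted above.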
Regarding Claim 23, Hwang in view of Guiasu, in further view of Miyashita teaches
The apparatus of claim 22, wherein the processor is further configured to 
adjust the quantization levels assigned to the respective data values by adjusting a value corresponding to a first quantization level among the quantization levels and a size of an interval between the quantization levels in a direction of increasing the weighted entropy (This claim limitation is similar in scope to a corresponding claim element in Claim 10, and hence is rejected under similar rationale.).  
Regarding Claim 24, Hwang in view of Guiasu, in further view of Miyashita teaches
The apparatus of claim 22, wherein the processor is further configured to 
adjust the quantization levels by adjusting a log base, which is controlling of the quantization levels, in a direction that maximizes the weighted entropy (This claim limitation is similar in scope to a corresponding claim element in Claim 11, and hence is rejected under similar rationale.).  

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to WILLIAM WAI YIN KWAN whose telephone number is 303-297-4332.  The examiner can normally be reached on Monday-Friday 8:00am - 4:30pm PT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/WILLIAM WAI YIN KWAN/
Examiner, Art Unit 2121



/Li B. Zhen/
Supervisory Patent Examiner, Art Unit 2121