Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function;
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function.
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitation(s) is/are:
“hardware logic configured to” in claims 19 and 20.
“DNN configured to” in claims 1 and 11.
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 1-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.

Regarding claim 1, “forming a higher number of disjoint subsets” is indefinite. One of ordinary skill in the art would not be able to determine whether the higher number of disjoint subsets is relative to the previous number of disjoint subsets or relative to the threshold number of layers; in the latter case, a higher number of disjoint subsets could actually be less than the previous number of disjoint subsets, which would contradict the former interpretation. “A higher number” is a relative term for which the claim provides no comparative baseline. In the interest of further examination, a higher number of disjoint subsets is interpreted as being relative to the number of layers.

Regarding claim 4, “wherein the plurality of layers do not include a first layer of the DNN and/or a last layer of the DNN” is indefinite. With respect to claim 1, one of ordinary skill in the art would expect that both the input and output layers are necessary to create a hardware implementation of the DNN. It is unclear how the hardware DNN would operate without the expected input and output layers. In the interest of further examination, this is interpreted as not using the first and/or last layer of the DNN for processing. Examiner suggests as a possible correction: “wherein the processing of the plurality of layers does not include a first layer of the DNN and/or a last layer of the DNN”.

The remaining claims are rejected based on their dependence on the rejected claims.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-2, 5, 7, 8, 11, 13, and 15-20 are rejected under 35 U.S.C. 103 as being unpatentable over Courbariaux (“TRAINING DEEP NEURAL NETWORKS WITH LOW PRECISION MULTIPLICATIONS”, 2015) in view of Alstrom (US5857177A), and further in view of Montufar (“On the Number of Linear Regions of Deep Neural Networks”, 2014).

Regarding claim 1, Courbariaux teaches A computer-implemented method of selecting a fixed point number format for representing values input to, and/or output from, ([p. 3 Sec. 4] "Fixed point formats consist in a signed mantissa and a global scaling factor shared between all fixed point variables. The scaling factor can be seen as the position of the radix point" [p. 4 Sec. 5] "The dynamic fixed point format (Williamson, 1991) is a variant of the fixed point format in which there are several scaling factors instead of a single global one").
a plurality of layers of a Deep Neural Network “DNN” for use in configuring a hardware implementation of the DNN, the method comprising: ([p. 1 Sec. 1] "The training of deep neural networks is very often limited by hardware." [p. 4 Sec. 7] "A Maxout network is a multi-layer neural network" [p. 5 Sec. 8] "We train Maxout networks" [p. 8 Sec. 11] "We have shown that: Very low precision multipliers are sufficient for training deep neural networks.").
receiving an instantiation of the DNN ([p. 5 Sec. 8] "we use the same hyperparameters as in this section to train Maxout networks with low precision multiplications").
configured to represent the values of each of the plurality of layers using one or more initial fixed point number formats for that layer, ([p. 4 Sec. 5] "In practice, we associate each layer’s weights, bias, weighted sum, outputs (post-nonlinearity) and the respective gradients vectors and matrices with a different scaling factor. Those scaling factors are initialized with a global value").
each initial fixed point number format comprising an exponent ([p. 3 Sec. 4] "fixed point format can also be seen as a floating point format with a unique shared fixed exponent").
and a mantissa bit length; ([p. 3 Sec. 4] "Fixed point formats consist in a signed mantissa and a global scaling factor shared between all fixed point variables").
forming a plurality of disjoint subsets from the plurality of layers; ([p. 4 Sec. 5] "With dynamic fixed point, a few grouped variables share a scaling factor which is updated from time to time to reflect the statistics of values in the group." Grouped variables sharing a scaling factor are interpreted as synonymous with a disjoint subset. Sec. 5 explicitly teaches that the grouped variables represent layer parameters.).
for each subset of the plurality of subsets, (See Algorithm 2 in Sec. 5. Scaling factors are handled individually.).
iteratively adjusting the fixed point number formats for the layers ([p. 8 Sec. 9.3] "We update the scaling factors once every 10000 examples").
in the subset to fixed point number formats with a next lowest mantissa bit length ([p. 3 Sec. 4] "The scaling factor is typically a power of two for computational efficiency (the scaling multiplications are replaced with shifts)." See also Algorithm 2, where the scaling factor is reduced by half when the overflow condition is satisfied. Dividing by 2 is interpreted as synonymous with multiplying by 0.5, such that a shift to the next lowest bit length is expected.).
until the output error of the instantiation of the DNN exceeds an error threshold; (See Algorithm 2 in Sec. 5, determining whether the overflow rate of M > rmax. The overflow rate is interpreted as synonymous with the error, and rmax is interpreted as synonymous with the error threshold.).
outputting the fixed point number formats for the plurality of layers. (See Algorithm 2: "ensure: an updated scaling factor"). 
However, Courbariaux does not explicitly teach in response to determining that the subsets comprise greater than a lower threshold number of layers, forming a higher number of disjoint subsets from the plurality of layers and repeating the iterative adjusting; and 
in response to determining that the subsets comprise less than or equal to the lower threshold number of layers,  

Alstrom, in the same field of endeavor, teaches in response to determining that the subsets comprise greater than a lower threshold number of layers, forming a higher number of disjoint subsets from the plurality of layers and repeating the iterative adjusting; and ([Col. 2 l. 13-21] "The object of the invention is achieved in that the number of firings from a network region, preferably the output region, determines the size of the threshold values so that, if the number of firings exceeds a certain value, the threshold value signal is increased" [Col. 3 l. 11-25] "The threshold value can hereby be controlled so that the number of firing neurons approaches a number of the same order as the number of layers. This means that the number of firing neurons will be small with respect to the total number of neurons of the network, but will be slightly larger than the number of neuron layers in the network" Firings from a network region are interpreted as synonymous with disjoint subsets. Alstrom explicitly teaches a threshold value based on the number of subsets relative to the number of layers, and teaches the subsets forming a higher number than the number of layers.).
in response to determining that the subsets comprise less than or equal to the lower threshold number of layers, ([Col. 3 l. 5-10] "The total signal from a region which can e.g. be identified with the output region, is detected. If this exceeds the unit signal, i.e. if more than one output unit fires, then the threshold value increases and if the total signal is below the unit signal (no signal), then the threshold value decreases" [Col. 3 l. 11-25] "The threshold value can hereby be controlled so that the number of firing neurons approaches a number of the same order as the number of layers. This means that the number of firing neurons will be small with respect to the total number of neurons of the network, but will be slightly larger than the number of neuron layers in the network" Alstrom teaches that the threshold value may be set relative to the number of subsets and the number of layers. Alstrom further teaches outputting a signal if the threshold is not exceeded.).

Alstrom and Courbariaux are both directed to hardware neural network implementations. It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the hardware neural network implementation of Courbariaux with that of Alstrom. Alstrom teaches the advantages of having a threshold number of layers to control a subset of the neural network. Alstrom teaches that setting the threshold to match the number of layers, such that there is close to a 1:1 ratio of firing neurons to layers, increases the sensitivity to changes in the dynamic conditions of the external system. To reinforce the combination of Alstrom and Courbariaux, Montufar, in the same field of endeavor, teaches a method of creating disjoint subsets in maxout networks with the explicit intent to create a 1:1 ratio of subsets to layers ([p. 4 Sec. 2.3] “Here, the idea to construct a function with many linear regions is to use the first L-1 hidden layers to identify many input-space neighborhoods, mapping all of them to the activation neighborhoods PL of the (L-1)-th hidden layer, each of which belongs to a distinct linear region of the last hidden layer.”).
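For reference only, the method of claim 1 as interpreted in this action (together with the reversal step recited in claim 2) may be sketched as follows. This is a hypothetical Python illustration, not applicant's implementation: `error_fn` is an assumed stand-in for running the instantiation of the DNN on test input data and measuring its output error, each fixed point number format is reduced to its mantissa bit length, subsets are assumed to be contiguous runs of layers, and the "higher number" of subsets is assumed, purely for illustration, to be double the previous number (capped at the number of layers). All function and parameter names are illustrative.

```python
from typing import Callable, Dict, List

Formats = Dict[str, int]  # hypothetical: layer name -> mantissa bit length


def contiguous_subsets(layers: List[str], n: int) -> List[List[str]]:
    """Split an ordered list of layers into n disjoint, contiguous groups."""
    n = max(1, min(n, len(layers)))
    size, extra = divmod(len(layers), n)
    groups, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)
        groups.append(layers[start:end])
        start = end
    return groups


def select_formats(
    layers: List[str],
    initial_bits: Formats,
    error_fn: Callable[[Formats], float],
    error_threshold: float,
    lower_threshold: int,
    min_bits: int = 1,
) -> Formats:
    bits = dict(initial_bits)
    n_subsets = 1
    while True:
        subsets = contiguous_subsets(layers, n_subsets)
        for subset in subsets:
            # Step every layer in the subset down to the next lowest mantissa
            # bit length until the output error would exceed the threshold.
            while all(bits[name] > min_bits for name in subset):
                trial = dict(bits)
                for name in subset:
                    trial[name] -= 1
                if error_fn(trial) > error_threshold:
                    break  # reverse the adjustment by keeping `bits` unchanged
                bits = trial
        # Output the formats once no subset has more than `lower_threshold`
        # layers (or the subsets cannot be made any smaller).
        if max(len(s) for s in subsets) <= lower_threshold or n_subsets >= len(layers):
            return bits
        n_subsets *= 2  # assumed "higher number" of disjoint subsets
```

As a usage example under the same assumptions, four layers starting at 8 bits, a toy error of the sum of 2^-b over the layers, an error threshold of 0.1, and a lower threshold of one layer end with the first two layers at 5 bits and the last two at 6 bits.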

Regarding claim 2, the combination of Courbariaux, Alstrom, and Montufar teaches The method of claim 1, wherein iteratively adjusting the fixed point number formats for the layers in the subset to fixed point number formats with the next lowest mantissa bit length comprises: determining a fixed point number format with the next lowest mantissa bit length for the fixed point number formats for each layer of the subset; (Courbariaux [p. 3 Sec. 4] "The scaling factor is typically a power of two for computational efficiency (the scaling multiplications are replaced with shifts)." See also Algorithm 2, where the scaling factor is reduced by half when the overflow condition is satisfied. Dividing by 2 is interpreted as synonymous with multiplying by 0.5, such that a shift to the next lowest bit length is expected.).
adjusting the fixed point number formats used by the instantiation of the DNN for each layer in the subset to the determined fixed point number formats with the next lowest mantissa bit length; (Courbariaux, see Algorithms 1 and 2. Algorithm 1 shows quantizing all parameters of each layer, and Algorithm 2 shows that the quantization involves bit shifting.).
determining an output of the adjusted instantiation of the DNN in response to test input data; (Courbariaux [p. 5 Sec. 8 Table 4] "Test set error rates of single and half floating point formats, fixed and dynamic fixed point formats on the permutation invariant (PI) MNIST, MNIST (with convolutions, no distortions), CIFAR-10 and SVHN datasets.").
determining an output error of the adjusted instantiation of the DNN; (Courbariaux [p. 5 Sec. 8 Table 4] "Test set error rates of single and half floating point formats, fixed and dynamic fixed point formats on the permutation invariant (PI) MNIST, MNIST (with convolutions, no distortions), CIFAR-10 and SVHN datasets.").
in response to determining that the output error exceeds the error threshold, reversing the adjustment of the instantiation of the DNN; and (Courbariaux, see Algorithm 2. Checking whether the overflow rate is greater than rmax and doubling the scaling factor is interpreted as synonymous with reversing the adjustment (halving the scaling factor).).
in response to determining that the output error does not exceed the error threshold, repeating the determining the fixed point number formats, adjusting the fixed point number formats, (Courbariaux [p. 4 Sec. 5] "During the training, we update those scaling factors at a given frequency, following the policy described in Algorithm 2." Algorithm 2 shows that the determination and adjustment of the fixed point number format occur after comparing the error to the threshold. The determination and adjustment performed at each update are interpreted as synonymous with repeating the determination.).
determining the output, and determining the output error. (Courbariaux [p. 5 Sec. 8 Table 4] "Test set error rates of single and half floating point formats, fixed and dynamic fixed point formats on the permutation invariant (PI) MNIST, MNIST (with convolutions, no distortions), CIFAR-10 and SVHN datasets." Courbariaux explicitly teaches determining a final test error, which is temporally subsequent to the updating and is therefore interpreted as synonymous with “in response to.” Algorithm 2 also teaches using the update error to determine the bit width.).
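The overflow-driven scaling factor policy relied on in the mapping of claims 1 and 2 can be illustrated with a short sketch. This is a hypothetical Python/NumPy paraphrase of the update discussed in connection with Courbariaux's Algorithm 2, assuming that a value is represented as a signed mantissa of a given bit width times a power-of-two scaling factor, and that the overflow rate is the fraction of values whose magnitude exceeds the largest representable value. The function names and the parameter `r_max` are illustrative and are not taken from the reference.

```python
import numpy as np


def max_representable(scale: float, bits: int) -> float:
    """Largest magnitude of a signed `bits`-bit mantissa times `scale`."""
    return (2 ** (bits - 1) - 1) * scale


def quantize(values: np.ndarray, scale: float, bits: int) -> np.ndarray:
    """Round to the nearest multiple of `scale`, saturating the mantissa."""
    lo, hi = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return np.clip(np.round(values / scale), lo, hi) * scale


def overflow_rate(values: np.ndarray, scale: float, bits: int) -> float:
    """Fraction of values whose magnitude exceeds the representable range."""
    return float(np.mean(np.abs(values) > max_representable(scale, bits)))


def update_scale(values: np.ndarray, scale: float, bits: int, r_max: float) -> float:
    """One overflow-driven update of a power-of-two scaling factor."""
    if overflow_rate(values, scale, bits) > r_max:
        return scale * 2  # too many overflows: widen the range by one bit
    if overflow_rate(2 * values, scale, bits) <= r_max:
        return scale / 2  # values would still fit at half the scale: gain precision
    return scale  # otherwise leave the scaling factor unchanged
```

Under these assumptions, doubling the scaling factor trades one bit of precision for range, and halving it does the reverse, which corresponds to the two adjustments discussed in the mapping above.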

Regarding claim 5, the combination of Courbariaux, Alstrom, and Montufar teaches
The method of claim 1, wherein a first adjustment of the fixed point number formats is made for all of the subsets before a second adjustment of the fixed point number formats is made for any of the subsets. (Courbariaux [p. 4 Sec. 5] "With dynamic fixed point, a few grouped variables share a scaling factor which is updated from time to time to reflect the statistics of values in the group...During the training, we update those scaling factors at a given frequency, following the policy described in Algorithm 2." Courbariaux explicitly teaches updating all of the scaling factors at a given frequency.). 

Regarding claim 7, the combination of Courbariaux, Alstrom, and Montufar teaches
The method of claim 1, wherein there is an initial fixed point number format for input data values of at least one layer of the plurality of layers and there is an initial fixed point number format for weights of at least one layer of the plurality of layers, and iteratively adjusting the fixed point number formats for the layers in the subset to fixed point number formats with the next lowest mantissa bit length until the output error of the instantiation of the DNN exceeds the error threshold comprises: (Courbariaux [p. 4 Sec. 5] "In practice, we associate each layer’s weights, bias, weighted sum, outputs (post-nonlinearity) and the respective gradients vectors and matrices with a different scaling factor. Those scaling factors are initialized with a global value" See also Algorithm 1 where the inputs and parameters are reduced from an initial fixed point number format. See Algorithm 2 for the bit length being reduced in response to the error threshold.).
iteratively adjusting the fixed point number formats for the input data values for the layers in the subset to fixed point number formats with the next lowest mantissa bit length until the output error of the instantiation of the DNN exceeds the error threshold; and (Courbariaux See Algorithm 1 where the inputs and parameters are reduced from an initial fixed point number format. See Algorithm 2 for the bit length being reduced in response to the error threshold.).
subsequent to iteratively adjusting the fixed point number formats for the input data values, iteratively adjusting the fixed point number formats for the weights for the layers in the subset to fixed point number formats with the next lowest mantissa bit length until the output error of the instantiation of the DNN exceeds the error threshold. (Courbariaux See Algorithm 1 where the precision of the weighted sums is reduced subsequent to the inputs. See Algorithm 2 for the bit length being reduced in response to the error threshold.). 

Regarding claim 8, the combination of Courbariaux, Alstrom, and Montufar teaches
The method of claim 7, wherein there is an initial fixed point number format for output data values of at least one layer of the plurality of layers, and iteratively adjusting the fixed point number formats for the layers in the subset to a fixed point number format with the next lowest mantissa bit length until the output error of the instantiation of the DNN exceeds the error threshold further comprises: (Courbariaux [p. 4 Sec. 5] "In practice, we associate each layer’s weights, bias, weighted sum, outputs (post-nonlinearity) and the respective gradients vectors and matrices with a different scaling factor. Those scaling factors are initialized with a global value" See Algorithm 2 for the bit length being reduced in response to the error threshold.).
subsequent to iteratively adjusting the fixed point number formats for the input data values, iteratively adjusting the fixed point number formats for the output data values for the layers in the subset to fixed point number formats with the next lowest mantissa bit length until the output error of the instantiation of the DNN exceeds the error threshold. (Courbariaux, see Algorithm 1: the precision of the outputs is reduced at the end of the algorithm, subsequent to the inputs, parameters, and weights.).

Regarding claim 13, the combination of Courbariaux, Alstrom, and Montufar teaches
The method of claim 1, wherein the lower threshold number of layers is greater than one. (Alstrom "FIG. 1, the neural network is layered, the network units 1 in one layer being connected to a plurality of network units 1 in the adjacent layers" Alstrom explicitly teaches that the threshold reaches the total number of layers and further teaches that the preferred embodiment of the neural network contains multiple layers as shown in FIG. 1.  This is further interpreted as synonymous with a lower threshold number of layers being greater than one.). 

Regarding claim 15, the combination of Courbariaux, Alstrom, and Montufar teaches
The method of claim 1, wherein the values input to and/or output from the plurality of layers comprise one or more of input data values, output data values, weights and biases. (Courbariaux [p. 4 Sec. 5] "In practice, we associate each layer’s weights, bias, weighted sum, outputs (post-nonlinearity) and the respective gradients vectors and matrices with a different scaling factor"). 

Regarding claim 16, the combination of Courbariaux, Alstrom, and Montufar teaches
The method of claim 1, wherein the DNN is a convolutional neural network. (Courbariaux [p. 4 Sec. 7] "which corresponds to a maxout unit when k = 2 and one of the filters is forced at 0 (Goodfellow et al., 2013a). Combined with dropout, a very effective regularization method (Hinton et al., 2012), maxout networks achieved state-of-the-art results on a number of benchmarks (Goodfellow et al., 2013a), both as part of fully connected feedforward deep nets and as part of deep convolutional nets" [p. 5 Sec. 8.1] "The second model consists in three convolutional maxout hidden layers...This is the same procedure as in Goodfellow et al. (2013a), except that we do not train our model on the validation examples"). 

Regarding claim 17, the combination of Courbariaux, Alstrom, and Montufar teaches
The method of claim 1, further comprising configuring a hardware implementation of the DNN to represent values of at least one of the plurality of layers using a fixed point number format output for the at least one layer. (Courbariaux [p. 5 Sec. 8] "Table 4: Test set error rates of single and half floating point formats, fixed and dynamic fixed point formats on the permutation invariant (PI) MNIST, MNIST (with convolutions, no distortions), CIFAR-10 and SVHN datasets" Table 4 shows that at least one of the DNN implementations used a fixed point number format for the entire model.).

Regarding claim 18, the combination of Courbariaux, Alstrom, and Montufar teaches
A non-transitory computer readable storage medium having stored thereon computer readable instructions that, when executed at a computer system, cause the computer system to perform the method as set forth in claim 1. (Courbariaux [p. 9 Sec. 12] "We thank the developers of Theano (Bergstra et al., 2010; Bastien et al., 2012), a Python library which allowed us to easily develop a fast and optimized code for GPU. We also thank the developers of Pylearn2 (Goodfellow et al., 2013b), a Python library built on the top of Theano which allowed us to easily interface the datasets with our Theano code" Courbariaux explicitly teaches that the instructions are run on a GPU.).

Regarding claim 19, the combination of Courbariaux, Alstrom, and Montufar teaches
A hardware implementation of a Deep Neural Network “DNN” comprising: hardware logic configured to: (Courbariaux [p. 1 Sec. 1] "The training of deep neural networks is very often limited by hardware." [p. 4 Sec. 7] "A Maxout network is a multi-layer neural network" [p. 5 Sec. 8] "We train Maxout networks" [p. 8 Sec. 11] "We have shown that: Very low precision multipliers are sufficient for training deep neural networks.").
receive input data values, a set of weights or a set of biases for a layer of the DNN; (Courbariaux [p. 4 Sec. 7] "A Maxout network is a multi-layer neural network that uses maxout units in its hidden layers. A maxout unit outputs the maximum of a set of k dot products between k weight vectors and the input vector of the unit").
receive information indicating a fixed point number format for the input data values, the set of weights, or the set of biases of the layer, the fixed point number format for the input data values, the set of weights, or the set of (Courbariaux [p. 4 Sec. 5] "With dynamic fixed point, a few grouped variables share a scaling factor which is updated from time to time to reflect the statistics of values in the group. In practice, we associate each layer’s weights, bias, weighted sum, outputs (post-nonlinearity) and the respective gradients vectors and matrices with a different scaling factor. Those scaling factors are initialized with a global value").
biases of the layer having been selected in accordance with the method as set forth in claim 1; (Courbariaux [p. 4 Sec. 7] "where h^l is the vector of activations at layer l and weight vectors w^l_{i,j} and biases b^l_{i,j} are the parameters of the j-th filter of unit i on layer l").
interpret the input data values, the set of weights or the set of biases based on the fixed point number format for the input data values, the set of weights or the set of biases of the layer; and (Courbariaux [p. 4 Sec. 5] "With dynamic fixed point, a few grouped variables share a scaling factor which is updated from time to time to reflect the statistics of values in the group. In practice, we associate each layer’s weights, bias, weighted sum, outputs (post-nonlinearity) and the respective gradients vectors and matrices with a different scaling factor. Those scaling factors are initialized with a global value").
process the interpreted input data values, the set of weights or the set of biases in accordance with the layer to generate output data values for the layer. (Courbariaux [p. 4 Sec. 7] "A Maxout network is a multi-layer neural network that uses maxout units in its hidden layers. A maxout unit outputs the maximum of a set of k dot products between k weight vectors and the input vector of the unit (e.g., the output of the previous layer):"). 

Regarding claim 20, the combination of Courbariaux, Alstrom, and Montufar teaches
The hardware implementation of a DNN of claim 19, wherein the hardware logic is further configured to: receive information indicating a fixed point number format for the output data values of the layer, the fixed point number format for the output data values of the layer having been selected in accordance with the method as set forth in claim 1; and (Courbariaux [p. 4 Sec. 5] "With dynamic fixed point, a few grouped variables share a scaling factor which is updated from time to time to reflect the statistics of values in the group. In practice, we associate each layer’s weights, bias, weighted sum, outputs (post-nonlinearity) and the respective gradients vectors and matrices with a different scaling factor. Those scaling factors are initialized with a global value").
convert the output data values for the layer into the fixed point number format for the output data values of the layer. (Courbariaux, see Algorithm 2, reducing the precision of the output data.).
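The interpret, process, and convert steps recited in claims 19 and 20 can be illustrated with a small software emulation. This is a hypothetical sketch, assuming each fixed point number format is an exponent plus a mantissa bit length (value = mantissa × 2^exponent) and assuming a fully connected layer; the function names are illustrative. It uses floating point arithmetic for readability, whereas a hardware datapath would typically operate on the integer mantissas directly.

```python
import numpy as np


def interpret(mantissas: np.ndarray, exponent: int) -> np.ndarray:
    """Interpret integer mantissas in a format with a shared exponent."""
    return mantissas.astype(np.float64) * 2.0 ** exponent


def to_fixed_point(values: np.ndarray, exponent: int, bits: int) -> np.ndarray:
    """Convert real values to saturated `bits`-bit signed mantissas."""
    lo, hi = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return np.clip(np.round(values / 2.0 ** exponent), lo, hi).astype(np.int64)


def fully_connected_layer(x_m, x_exp, w_m, w_exp, b_m, b_exp, out_exp, out_bits):
    """Interpret inputs, weights and biases, process the layer, and convert
    the output data values into the output fixed point number format."""
    x = interpret(x_m, x_exp)
    w = interpret(w_m, w_exp)
    b = interpret(b_m, b_exp)
    y = x @ w + b  # layer processing on the interpreted values
    return to_fixed_point(y, out_exp, out_bits)
```

For example, input mantissas [4, -2] with exponent -2 are interpreted as [1.0, -0.5]; with weights [[1.0], [2.0]] and a bias of 0.75, the output 0.75 is converted to the mantissa 6 in an 8-bit format with exponent -3.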

Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over the combination of Courbariaux, Alstrom, and Montufar, and further in view of Young (US 2018/0165574 A1).

Regarding claim 3, the combination of Courbariaux, Alstrom, and Montufar teaches The method of claim 1, further comprising identifying a sequence of the plurality of layers wherein each layer is preceded in the sequence by any layer of the plurality of layers on which it depends, (Courbariaux [p. 4 Sec. 7] "A Maxout network is a multi-layer neural network that uses maxout units in its hidden layers. A maxout unit outputs the maximum of a set of k dot products between k weight vectors and the input vector of the unit (e.g., the output of the previous layer):"). However, the combination of Courbariaux, Alstrom, and Montufar does not explicitly teach and wherein each of the subsets comprises a contiguous set of layers in the sequence.

Young, in the related art of hardware-implemented neural networks, teaches and wherein each of the subsets comprises a contiguous set of layers in the sequence. ([¶0024] "Some neural networks pool outputs from one or more neural network layers to generate pooled values that are used as inputs to subsequent neural network layers" Pooled layers are interpreted as synonymous with a type of subset.).

Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the hardware neural network implementations of Courbariaux and Alstrom with that of Young. The combination would have been obvious because a person of ordinary skill in the art would have recognized that grouping layers in sequential order is a predictable design choice. Young explains the advantages of the disclosed pooling layers ([¶0009] “Particular embodiments of the subject matter described in this specification can be implemented so as to realize one or more of the following advantages. An output tensor corresponding to an average pooling neural network layer can be generated in hardware by a special-purpose hardware circuit, even where the hardware circuit cannot directly process an input tensor to perform average pooling”).

Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over the combination of Courbariaux, Alstrom, and Montafur and in further view of Botros (“Hardware Implementation of an Artificial Neural Network”, 1993). 

Regarding claim 4, the combination of Courbariaux, Alstrom, and Montafur teaches The method of claim 1. However, the combination of Courbariaux, Alstrom, and Montafur does not explicitly teach wherein the plurality of layers do not include a first layer of the DNN and/or a last layer of the DNN.

Botros, in a related field of endeavor, teaches wherein the plurality of layers do not include a first layer of the DNN and/or a last layer of the DNN. ([p. 1253] "The input layer does no processing but simply buffers the data" Botros teaches a hardware implementation of an artificial neural network and explains that the first layer is not necessary for processing.  Recreating a DNN without explicitly copying a first layer would therefore lead to an expected and obvious outcome.). 

Botros and Courbariaux are both directed to a hardware implementation of a neural network. It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to exclude the first layer of the neural network disclosed in Courbariaux and Alstrom from processing, because Botros teaches that the first layer is not necessary for processing. One of ordinary skill in the art would recognize from FIG. 3 of Botros that, in a hardware implementation, the input layer can be reduced to a simple input buffer and therefore need not be included in the processing portion of a hardware neural network implementation.

Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over the combination of Courbariaux, Alstrom, and Montafur and in further view of Chung (US10167800B1).

Regarding claim 6, the combination of Courbariaux, Alstrom, and Montafur teaches The method of claim 1, wherein all iterative adjustments of the fixed point number formats for the layers in a first subset (Courbariaux, see Algorithm 2. The group scaling factor update is interpreted as an iterative adjustment of the fixed point number format for the layers in a first subset.). However, the combination of Courbariaux, Alstrom, and Montafur does not explicitly teach that they are completed before a first adjustment of the fixed point number formats for the layers in a second subset.

Chung, in a related field of endeavor, teaches that they are completed before a first adjustment of the fixed point number formats for the layers in a second subset. ([Col. 15 l. 60] "The method may further include a step (e.g., step 920) including first processing a first subset of the training vector data to determine a first shared exponent for representing values in the first subset of the training vector data in a block-floating point format and second processing a second subset of the training vector data to determine a second shared exponent for representing values in the second subset" While Courbariaux implicitly teaches that the groups are updated in a sequence, Chung explicitly teaches updating the second subset subsequent to the first.). 

Chung and Courbariaux are both directed to quantizing neural networks. It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the exponent in Courbariaux with that of Chung. The combination would have been obvious because a person of ordinary skill in the art would have recognized from Chung the advantages of determining a shared exponent independently for each block ([Col. 5 l. 39] “The matrix-vector multiplier may use integer arithmetic, however, in the form of block floating point techniques for expanded dynamic range. This may advantageously result in a processor that communicates with the outside world in floating point and transparently implements internal integer arithmetic when necessary.” [Col. 5 l. 63] “Advantageously, individual members of the block can be operated on with integer arithmetic. Moreover, the shared exponent for each block is determined independently, which may advantageously allow for a higher dynamic range.”).

Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over the combination of Courbariaux, Alstrom, and Montafur and in further view of El-Yaniv (US 2017/0286830 A1). 

Regarding claim 9, the combination of Courbariaux, Alstrom, and Montafur teaches The method of claim 1.  However, the combination of Courbariaux, Alstrom, and Montafur does not explicitly teach wherein the DNN is a classification network and the output error is a top-1 classification accuracy or a top-5 classification accuracy of an output of the instantiation of the DNN in response to test input data.  

El-Yaniv, who teaches a related hardware neural network implementation, teaches wherein the DNN is a classification network and the output error is a top-1 classification accuracy or a top-5 classification accuracy of an output of the instantiation of the DNN in response to test input data. ([¶0105] "The above described training method was applied to tackle the task of binarizing both weights and activations by employing the AlexNet and GoogleNet architectures. This implementation achieved 36.1% top-1 and 60.1% top-5 accuracies using AlexNet and 47.1% top-1 and 69.1% top-5 accuracies using GoogleNet.").

El-Yaniv and Courbariaux are both directed towards quantizing a neural network.  It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to determine a top-1 and/or top-5 accuracy in the quantized neural network suggested by Courbariaux. El-Yaniv teaches that top-1 and top-5 accuracies are a well-known metric in the art for determining neural network prediction performance.  Furthermore, El-Yaniv shows that said accuracies can be used in a quantized neural network for determining prediction performance and that using top-1 and top-5 accuracies leads to an obvious and expected outcome.

Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over the combination of Courbariaux, Alstrom, and Montafur and in further view of He 

Regarding claim 10, the combination of Courbariaux, Alstrom, and Montafur teaches The method of claim 1. While Courbariaux implicitly teaches wherein the DNN is a classification network and the output error is a sum of L1 differences between SoftMax normalised logits of an output of the instantiation of the DNN in response to test input data and SoftMax normalised logits of a baseline output ([p. 5 Sec. 8.1] “It consists in two fully connected maxout layers followed by a softmax layer. The second model consists in three convolutional maxout hidden layers (with spatial max pooling on top of the maxout layers) followed by a densely connected softmax layer”), the combination of Courbariaux, Alstrom, and Montafur does not explicitly teach wherein the DNN is a classification network and the output error is a sum of L1 differences between SoftMax normalised logits of an output of the instantiation of the DNN in response to test input data and SoftMax normalised logits of a baseline output.

He, who teaches in the related art of training a deep neural network, teaches wherein the DNN is a classification network and the output error is a sum of L1 differences between SoftMax normalised logits of an output of the instantiation of the DNN in response to test input data and SoftMax normalised logits of a baseline output. ([¶0038] "a Softmax classifier may be used. For predicting real-valued quantities, the loss function may use regression-based methods. For example, in one embodiment, the loss function measures the loss between the predicted quantity and the ground truth before measuring the L2 squared norm, or L1 norm of the difference").

He and Courbariaux are both directed towards training a deep neural network. It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the softmax layer taught in Courbariaux with the detailed method taught in He. The combination would have been obvious because a person of ordinary skill in the art would recognize that a softmax layer is typically used to output a probability distribution, and that an L1 norm is commonly used to measure the difference between such outputs. He shows that using L1-norm differences of the softmax outputs would lead to obvious and expected results in a deep neural network.

Regarding claim 11, the combination of Courbariaux, Alstrom, Montafur, and He teaches The method of claim 10, further comprising generating the baseline output by applying the test input data to an instantiation of the DNN configured to represent values input to and output from each layer of the DNN using a floating point number format. (Courbariaux [p. 5 Sec. 8 Table 4] "Test set error rates of single and half floating point formats, fixed and dynamic fixed point formats on the permutation invariant (PI) MNIST, MNIST (with convolutions, no distortions), CIFAR-10 and SVHN datasets...It serves as a baseline to evaluate the degradation brought by lower precision").

Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over the combination of Courbariaux, Alstrom, and Montafur and in further view of Shynk (“Performance Surfaces of a Single-Layer Perceptron”, 1990).

Regarding claim 12, the combination of Courbariaux, Alstrom, and Montafur teaches The method of claim 1.  However, the combination of Courbariaux, Alstrom, and Montafur does not explicitly teach wherein the lower threshold number of layers is one.  

Shynk, who teaches in the related art of a quantized neural network, teaches The method of claim 1, wherein the lower threshold number of layers is one. ([p. 1 Sec. 1] "The perceptron is a linear combiner that quantizes its output to one of two discrete values…For a single layer perceptron..." Shynk explicitly teaches a single-layer perceptron. Alstrom teaches the threshold reaching the number of layers; therefore, upon substituting the single-layer perceptron of Shynk for the multilayer perceptron of Alstrom, it would be obvious that the lower threshold number of layers would be one.).

Shynk and Courbariaux are both directed towards quantizing neural networks. It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to substitute the multi-layer perceptron in Courbariaux with a single-layer perceptron described in Shynk. As motivation, Shynk teaches that a single-layer perceptron is well known in the art and would lead to obvious and expected outcomes.

Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over the combination of Courbariaux, Alstrom, and Montafur and in further view of Montufar (“On the Number of Linear Regions of Deep Neural Networks”, 2014).

Regarding claim 14, the combination of Courbariaux, Alstrom, and Montafur teaches The method of claim 1.  However, the combination of Courbariaux, Alstrom, and Montafur does not explicitly teach wherein forming a higher number of disjoint subsets from the plurality of layers comprises: dividing the layers in each subset into a plurality of disjoint subsets and/or forming twice as many disjoint subsets from the plurality of layers.  

Montufar, who teaches in the related art of determining subsets in a deep neural network, teaches wherein forming a higher number of disjoint subsets from the plurality of layers comprises: dividing the layers in each subset into a plurality of disjoint subsets and/or forming twice as many disjoint subsets from the plurality of layers. ("The proof is done by counting the number of regions for a suitable choice of network parameters. The idea of the construction is to divide the first L - 1 layers of the network into n0 independent parts").

Montufar and Courbariaux are both directed towards determining subsets in a deep neural network. It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to substitute the subset determination of Montufar for the fixed point grouping in Courbariaux. As motivation for the substitution, Montufar states ([Abstract] “This paper investigates the complexity of such compositional maps and contributes new theoretical results regarding the advantage of depth for neural networks with piecewise linear activation functions. In particular, our analysis is not specific to a single family of models, and as an example, we employ it for rectifier and maxout networks”). The substitution is considered relevant because Courbariaux explicitly teaches using maxout networks.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SIDNEY VINCENT BOSTWICK whose telephone number is (571)272-4720.  The examiner can normally be reached on M-F 7:30am-5:00pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda Huang can be reached on (571)270-7092.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/SB/Examiner, Art Unit 2124

/MIRANDA M HUANG/Supervisory Patent Examiner, Art Unit 2124