Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments
Applicant’s arguments with respect to the independent claims have been considered but are moot because the arguments are directed to amended limitation(s) that has/have not been previously examined.

Examiner’s Note
Applicant is strongly requested to identify in the Remarks the supporting paragraph(s) of the specification, with a clear explanation, for each limitation of any amended or new claim(s), so that the Examiner can interpret the claims clearly and definitely.

Priority
Acknowledgment is made of applicant's claim for the present application filed on 02/09/2018.

Claim Objections
Claim(s) 8 is/are objected to because of the following informalities: it appears that “the particular set of weights” in line 6 should read “the set of weights”. Appropriate correction is required. In addition, claim 22 is objected to for the same reason. 

Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.


Claim(s) 9, 23 is/are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.
Claim(s) 9 recite(s) “identifying a pair of quantiles corresponding to the anchor point, wherein the pair of quantiles are included in the set of quantiles of the current values of the set of weights of the neural network at the training iteration; identifying a subset of the set of weights of the neural network, wherein the current value of each weight included in the subset is between the pair of quantiles corresponding to the anchor point;”. However, the specification appears to be silent with regard to “identifying a pair of quantiles corresponding to the anchor point” and “the pair of quantiles corresponding to the anchor point”. Instead, paragraph 60 of the specification says “In some other implementations, the system determines a set of quantiles of the weight values of the neural network and determines the anchor points based on the quantiles.” That is, the disclosed system only determines anchor points based on the quantiles. These limitations change the scope of the claimed invention without support from the specification; the claim is therefore rejected under 35 U.S.C. 112(a) for lack of written description. In addition, claim(s) 23 is/are rejected for the same reason.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


(Note: Hereinafter, if a limitation has brackets (i.e. [·]) around claim language, the bracketed claim language has not yet been taught by the current prior art reference but will be taught by another prior art reference afterwards.)

Claim(s) 1, 4-5, 8, 10-11, 14-16, 19-20, 22, 24 is/are rejected under 35 U.S.C. 103 as being unpatentable over Ullrich et al. (SOFT WEIGHT-SHARING FOR NEURAL NETWORK COMPRESSION) in view of Chen et al. (Compressing Convolutional Neural Networks) further in view of Somers et al. (Quantile regression for modelling distributions of profit and loss)

Regarding claim 1
Ullrich teaches 
A computer-implemented method for neural network compression, the method comprising:
receiving a neural network; 
(Ullrich, [fig(s) 1-3] [algorithm 1] [sec(s) 4] “We retrain pre-trained neural networks with soft weight-sharing and factorized Dirac posteriors. Hence we optimize 

[image: media_image1.png]

via gradient descent, specifically using Adam (Kingma & Ba, 2014). The KL divergence reduces to the prior because the entropy term does not depend on any trainable parameters. Note that, similar to (Nowlan & Hinton, 1992) we weigh the log-prior contribution to the gradient by a factor of τ = 0.005. In the process of retraining the weights, the variances, means, and mixing proportions of all but one component are learned.”;)

identifying a set of multiple weights of the neural network; 
(Ullrich, [fig(s) 1-3] [algorithm 1] “w ← initialize network weights with pre-trained network weights” [sec(s) 4] “We retrain pre-trained neural networks with soft weight-sharing and factorized Dirac posteriors. Hence we optimize 

[image: media_image1.png]

via gradient descent, specifically using Adam (Kingma & Ba, 2014). The KL divergence reduces to the prior because the entropy term does not depend on any trainable parameters. Note that, similar to (Nowlan & Hinton, 1992) we weigh the log-prior contribution to the gradient by a factor of τ = 0.005. In the process of retraining the weights, the variances, means, and mixing proportions of all but one component are learned.”;)

determining initial values of a set of multiple anchor points based on initial values of the set of weights of the neural network; 
(Ullrich, [fig(s) 1-3] [algorithm 1] “θ = {µj, σj, πj}_{j=1}^{J} ← initialize mixture parameters (see Sec. 4.2)” [sec(s) 4] “In principle, we follow the method proposed by Nowlan & Hinton (1992). We distribute the means of the 16 non-fixed components evenly over the range of the pre-trained weights. The variances will be initialized such that each Gaussian has significant probability mass in its region. A good orientation for setting a good initial variance is weight decay rate the original network has been trained on. The trainable mixing proportions are initialized evenly πj = (1 − πj=0)/J. We also experimented with other approaches such as distributing the means such that each component assumes an equal amount of probability. We did not observe any significant improvement over the simpler initialization procedure.”; e.g., “means” may read on “anchor points”.)
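
For illustration only, the initialization quoted above (means distributed evenly over the range of the pre-trained weights) can be sketched as follows. This is a minimal sketch and not code from Ullrich; the function name `init_means_evenly`, the sample weights, and the choice to place a mean at each end of the range are assumptions made here.

```python
def init_means_evenly(weights, num_components):
    """Spread num_components means evenly over [min(weights), max(weights)].

    Assumption: the first and last means sit exactly on the range endpoints.
    """
    if num_components < 2:
        raise ValueError("need at least two components to span a range")
    lo, hi = min(weights), max(weights)
    step = (hi - lo) / (num_components - 1)
    return [lo + k * step for k in range(num_components)]


# Hypothetical pre-trained weight values, used only for this example.
pretrained = [-0.8, -0.1, 0.0, 0.3, 0.8]
means = init_means_evenly(pretrained, 5)
```

Under this reading, each mean serves as the initial value of one anchor point.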

training the neural network by, at each of multiple training iterations, performing operations comprising:
adjusting current values of the set of weights of the neural network at the training iteration by [backpropagating] gradients of a loss function, wherein the loss function comprises:
(Ullrich, [fig(s) 1-3] [algorithm 1] 
“
[image: media_image2.png]
”
[sec(s) 4] “We retrain pre-trained neural networks with soft weight-sharing and factorized Dirac posteriors. Hence we optimize 

[image: media_image1.png]

via gradient descent, specifically using Adam (Kingma & Ba, 2014). The KL divergence reduces to the prior because the entropy term does not depend on any trainable parameters. Note that, similar to (Nowlan & Hinton, 1992) we weigh the log-prior contribution to the gradient by a factor of τ = 0.005. In the process of retraining the weights, the variances, means, and mixing proportions of all but one component are learned.”;)

a first loss function term based on a prediction accuracy of the neural network; and 
a second loss function term based on a similarity of the current values of the set of weights of the neural network at the training iteration to current values of the set of anchor points at the training iteration; and
(Ullrich, [fig(s) 1-3] [algorithm 1] 
“w ← initialize network weights with pre-trained network weights
[image: media_image2.png]
”
[sec(s) 4] “We retrain pre-trained neural networks with soft weight-sharing and factorized Dirac posteriors. Hence we optimize 

[image: media_image1.png]

via gradient descent, specifically using Adam (Kingma & Ba, 2014). The KL divergence reduces to the prior because the entropy term does not depend on any trainable parameters. Note that, similar to (Nowlan & Hinton, 1992) we weigh the log-prior contribution to the gradient by a factor of τ = 0.005. In the process of retraining the weights, the variances, means, and mixing proportions of all but one component are learned.” [sec 2] “Model compression was first discussed in the context of information theory. The minimum description length (MDL) principle identifies the best hypothesis to be the one that best compresses the data. More specifically, it minimizes the cost to describe the model (complexity cost LC) and the misfit between model and data (error cost LE) (Rissanen, 1978; 1986). It has been shown that variational learning can be reinterpreted as an MDL problem (Wallace, 1990; Hinton & Van Camp, 1993; Honkela & Valpola, 2004; Graves, 2011). In particular, given data D = {X = {xn}_{n=1}^{N}, T = {tn}_{n=1}^{N}}, a set of parameters w = {wi}_{i=1}^{I} that describes the model and an approximation q(w) of the posterior p(w|D), the variational lower bound, also known as negative variational free energy, L(q(w), w) can be decomposed in terms of error and complexity losses … Following Nowlan & Hinton (1992) we will model the prior p(w) as a mixture of Gaussians, 
[image: media_image3.png]
 (5) We learn the mixture parameters µj, σj, πj via maximum likelihood simultaneously with the network weights.”; e.g., “
[image: media_image4.png]
” and “
[image: media_image5.png]
” may read on “similarity of the current values of the set of weights of the neural network at the training iteration to current values of the set of anchor points at the training iteration” since the Gaussian normal distributions take a difference between a weight and a mean. For more details, please refer to “Normal distribution” (https://en.wikipedia.org/w/index.php?title=Normal_distribution&oldid=824560527) and “Gaussian function” (https://en.wikipedia.org/w/index.php?title=Gaussian_function&oldid=824354212) on Wikipedia.)
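
For illustration only, the reasoning that a Gaussian mixture prior measures how close each weight is to the component means can be sketched as follows. This is a minimal sketch under stated assumptions, not code from Ullrich; the function names and the numeric values are hypothetical. The penalty is the negative log of the mixture density, which depends on each weight only through the differences (w − µj).

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution; depends on x only through (x - mean)."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def neg_log_mixture_prior(weights, means, stds, mixing):
    """Sum over weights of -log(sum_j pi_j * N(w | mu_j, sigma_j))."""
    total = 0.0
    for w in weights:
        density = sum(p * gaussian_pdf(w, m, s)
                      for p, m, s in zip(mixing, means, stds))
        total -= math.log(density)
    return total


# Hypothetical two-component mixture and two candidate weight settings.
means, stds, mixing = [-1.0, 1.0], [0.1, 0.1], [0.5, 0.5]
near_penalty = neg_log_mixture_prior([-1.0, 1.0], means, stds, mixing)
far_penalty = neg_log_mixture_prior([-0.5, 0.5], means, stds, mixing)
```

Weights that sit on the means receive a smaller penalty than weights that sit between them, which is the sense in which the term reflects similarity of weights to means.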

adjusting the current values of the set of anchor points at the training iteration based on the current values of the set of weights of the neural network at the training iteration, the adjusting comprising: 
(Ullrich, [fig(s) 1-3] [algorithm 1] 
“
[image: media_image2.png]
”
[sec(s) 4] “We retrain pre-trained neural networks with soft weight-sharing and factorized Dirac posteriors. Hence we optimize 

[image: media_image1.png]

via gradient descent, specifically using Adam (Kingma & Ba, 2014). The KL divergence reduces to the prior because the entropy term does not depend on any trainable parameters. Note that, similar to (Nowlan & Hinton, 1992) we weigh the log-prior contribution to the gradient by a factor of τ = 0.005. In the process of retraining the weights, the variances, means, and mixing proportions of all but one component are learned.” [sec 2] “The approach naturally encourages quantization because in order to optimize the cross-entropy the weights will cluster tightly around the cluster means, while the cluster means themselves move to some optimal location driven by LE.”;)

determining a set of [quantiles] of the current values of the set of weights of the neural network at the training iteration; and 
adjusting the current values of the set of anchor points at the training iteration based on the set of [quantiles] of the current values of the set of weights of the neural network at the training iteration; and
(Ullrich, [fig(s) 1-3] [algorithm 1] 
“θ = {µj, σj, πj}_{j=1}^{J} ← initialize mixture parameters (see Sec. 4.2) 

[image: media_image2.png]
” [sec(s) 2] “The approach naturally encourages quantization because in order to optimize the cross-entropy the weights will cluster tightly around the cluster means, while the cluster means themselves move to some optimal location driven by LE. The effect might even be so strong that it is beneficial to have a Gamma hyper-prior on the variances of the mixture components to prevent the components from collapsing. Furthermore, note that, mixture components merge when there is not enough pressure from the error loss to keep them separated because weights are attracted by means and means are attracted by weights hence means also attract each other. In that way the network learns how many quantization intervals are necessary. We demonstrate that behaviour in Figure 3.”
[sec(s) 4] “We retrain pre-trained neural networks with soft weight-sharing and factorized Dirac posteriors. Hence we optimize 

[image: media_image1.png]

via gradient descent, specifically using Adam (Kingma & Ba, 2014). The KL divergence reduces to the prior because the entropy term does not depend on any trainable parameters. Note that, similar to (Nowlan & Hinton, 1992) we weigh the log-prior contribution to the gradient by a factor of τ = 0.005. In the process of retraining the weights, the variances, means, and mixing proportions of all but one component are learned.”; e.g., “means” may read on “anchor points”.)

quantizing the current values of the set of weights of the neural network, comprising, for each weight of the set of weights of the neural network: 
determining an anchor point in the set of anchor points corresponding to the weight; and 
setting the current value of the weight to the current value of the corresponding anchor point.
(Ullrich, [fig(s) 1-3] [algorithm 1] 
“
[image: media_image6.png]
”
[sec(s) 4] “After re-training we set each weight to the mean of the component that takes most responsibility for it i.e. we quantize the weights. Before quantizing, however, there might be redundant components as explained in section 2. To eliminate those we follow Adhikari & Hollmen (2012) by computing the KL divergence between all components.”; e.g., “mean” may read on “anchor point”. In addition, e.g., “set each weight to the mean of the component that takes most responsibility for it” may read on “setting the current value of the weight to the current value of the corresponding anchor point”.)
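
For illustration only, the quantization step quoted above (setting each weight to the mean of the component that takes most responsibility for it) can be sketched as follows. This is a minimal sketch, not code from Ullrich; the function names and values are hypothetical, and the responsibility is left unnormalized because the normalizing factor is the same for every component and does not change which component is largest.

```python
import math


def responsibility(w, mean, std, mix):
    """Unnormalized responsibility pi_j * N(w | mu_j, sigma_j) of one component."""
    z = (w - mean) / std
    return mix * math.exp(-0.5 * z * z) / std


def quantize(weights, means, stds, mixing):
    """Replace each weight by the mean of its most responsible component."""
    quantized = []
    for w in weights:
        best = max(range(len(means)),
                   key=lambda j: responsibility(w, means[j], stds[j], mixing[j]))
        quantized.append(means[best])
    return quantized


# Hypothetical three-component mixture with equal spreads and proportions.
result = quantize([-0.9, 0.05, 1.2],
                  [-1.0, 0.0, 1.0],
                  [0.2, 0.2, 0.2],
                  [1 / 3, 1 / 3, 1 / 3])
```

After this step every weight takes one of the mean values, which corresponds to setting the weight to the value of its anchor point.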

	However, Ullrich does not appear to distinctly disclose:
adjusting current values of the set of weights of the neural network at the training iteration by [backpropagating] gradients of a loss function, wherein the loss function comprises:
determining a set of [quantiles] of the current values of the set of weights of the neural network at the training iteration; and 
adjusting the current values of the set of anchor points at the training iteration based on the set of [quantiles] of the current values of the set of weights of the neural network at the training iteration; and

(Note: Hereinafter, if a limitation has one or more underlines, the underlined claim language is taught by the current prior art reference, while the non-underlined claim language has already been taught by one or more previous prior art references.)

Chen teaches
adjusting current values of the set of weights of the neural network at the training iteration by backpropagating gradients of a loss function, wherein the loss function comprises: 
(Chen, [fig(s) ] [sec(s) 1] “During training, the hashed weights can be learned with simple back-propagation [2]—the gradient of a hash bucket value is the sum of gradients of all hashed frequency components in that bucket.” [sec(s) 3] “Let L be the loss function adopted for training. Using standard back-propagation, we can derive the gradient w.r.t. filter parameters in the spatial domain”; Note that Ullrich teaches “adjusting current values of the set of weights of the neural network at the training iteration by [backpropagating] gradients of a loss function, wherein the loss function comprises”.)

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the neural network compression system of Ullrich with the backpropagation of Chen.
Doing so would yield the lowest generalization error rates on almost all classification tasks, using the most effective compression method among several compression schemes.
(Chen, [sec(s) 1] “We evaluate our compression scheme on eight deep learning image benchmark data sets and compare against four competitive baselines. Although all compression schemes lead to lower test accuracy as the compression increases, our FreshNets method is by far the most effective compression method and yields the lowest generalization error rates on almost all classification tasks.”)

However, the combination of Ullrich, Chen does not appear to distinctly disclose:
determining a set of [quantiles] of the current values of the set of weights of the neural network at the training iteration; and 
adjusting the current values of the set of anchor points at the training iteration based on the set of [quantiles] of the current values of the set of weights of the neural network at the training iteration; and

Somers teaches
determining a set of quantiles of the current values of the set of weights of the neural network at the training iteration; and 
adjusting the current values of the set of anchor points at the training iteration based on the set of quantiles of the current values of the set of weights of the neural network at the training iteration; and
(Somers, [fig(s) 1-3] [sec(s) 2] “Quantile regression is based on the quantile loss function, ρ, defined for a given percentile τ ∈ (0, 1) by 
[image: media_image7.png]
, (1) where I is an indicator function. It is sometimes called the check or tick function because of its shape. The quantile loss function ρτ(y - 6.2) is plotted in Fig. 1. This loss function has the key property that minimising the expectation E ρτ(Y - a) with respect to a gives the τth quantile, F^{-1}(τ), of Y. … Kernel quantile regression finds the estimate of the quantile a = a(x0) for a grid of values of x0, that minimizes 
[image: media_image8.png]
, for given τ and kernel weights wj that depend on the grid point.” [sec(s) 3] “The integral may be approximated by numerical integration at the quantiles of H using a trapezium-like rule. The quantiles s0, s1,...,sk correspond to a grid of k + 1 equi-percentile intervals with constant probability π apart. … 
[image: media_image9.png]
 (7) denote the midpoints and j indexes the relevant quantile. Fig. 3 illustrates this computation.”; 
Note that Ullrich teaches “determining a set of [quantiles] of the current values of the set of weights of the neural network at the training iteration; and adjusting the current values of the set of anchor points at the training iteration based on the set of [quantiles] of the current values of the set of weights of the neural network at the training iteration”.)
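
For illustration only, the property quoted from Somers (that minimising the expected quantile loss with respect to a gives the τth quantile) can be sketched as follows. The equation image is not reproduced in this action, so the sketch assumes the standard check-function form ρτ(u) = u(τ − I(u < 0)); the function names and the sample are hypothetical, and the search is restricted to the sample points themselves.

```python
def check_loss(u, tau):
    """Standard check (tick) loss: u * (tau - I(u < 0))."""
    indicator = 1.0 if u < 0 else 0.0
    return u * (tau - indicator)


def quantile_by_loss(sample, tau):
    """Sample point a that minimizes the total check loss over the sample."""
    return min(sample,
               key=lambda a: sum(check_loss(y - a, tau) for y in sample))


# Hypothetical sample; with tau = 0.5 the minimizer is the median.
sample = [1.0, 2.0, 3.0, 4.0, 5.0]
median = quantile_by_loss(sample, 0.5)
```

With τ = 0.5 the loss is half the absolute error, so the minimizer on this sample is its middle value.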

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the neural network compression system of Ullrich, Chen with the anchor points between the quantiles of Somers. 
Doing so would lead to realistic and robust estimates suitable for use in financial applications based on distribution estimation.
(Somers, [fig(s) 3] [sec(s) 1] “A quantile regression model explicitly estimates the distribution of house prices realised and thus leads to realistic and robust estimates suitable for use in Basel II applications.”)

Regarding claim 4
The combination of Ullrich, Chen, Somers teaches claim 1. 

determining the initial values of the set of anchor points based on the initial values of the set of weights comprises: (see the rejections of claim 1)

Ullrich further teaches
fitting a mixture model to a distribution of the initial values of the set of weights; and 
determining the initial value of each anchor point in the set of anchor points based on parameters of components of the mixture model.
(Ullrich, [fig(s) 1-3] [algorithm 1] “θ = {µj, σj, πj}_{j=1}^{J} ← initialize mixture parameters (see Sec. 4.2)” [sec 2] “Following Nowlan & Hinton (1992) we will model the prior p(w) as a mixture of Gaussians, 
[image: media_image3.png]
 (5) We learn the mixture parameters µj, σj, πj via maximum likelihood simultaneously with the network weights.” [sec(s) 4] “In principle, we follow the method proposed by Nowlan & Hinton (1992). We distribute the means of the 16 non-fixed components evenly over the range of the pre-trained weights. The variances will be initialized such that each Gaussian has significant probability mass in its region. A good orientation for setting a good initial variance is weight decay rate the original network has been trained on. The trainable mixing proportions are initialized evenly πj = (1 − πj=0)/J. We also experimented with other approaches such as distributing the means such that each component assumes an equal amount of probability. We did not observe any significant improvement over the simpler initialization procedure.” [sec 1] “By fitting the mixture components alongside the weights, the weights tend to concentrate very tightly around a number of cluster components, while the cluster centers optimize themselves to give the network high predictive accuracy. Compression is achieved because we only need to encode K cluster means (in full precision) in addition to the assignment of each weight to one of these J values (using log(J) bits per weight).”; e.g., “means” may read on “anchor points”.) 

Regarding claim 5
The combination of Ullrich, Chen, Somers teaches claim 4.

Ullrich further teaches 
the mixture model is a Gaussian mixture model, and 
the initial value of each anchor point is determined based on mean parameters of one or more components of the Gaussian mixture model.
(Ullrich, [fig(s) 1-3] [algorithm 1] “θ = {µj, σj, πj}_{j=1}^{J} ← initialize mixture parameters (see Sec. 4.2)” [sec 2] “Following Nowlan & Hinton (1992) we will model the prior p(w) as a mixture of Gaussians, 
[image: media_image3.png]
 (5) We learn the mixture parameters µj, σj, πj via maximum likelihood simultaneously with the network weights.” [sec(s) 4] “In principle, we follow the method proposed by Nowlan & Hinton (1992). We distribute the means of the 16 non-fixed components evenly over the range of the pre-trained weights. The variances will be initialized such that each Gaussian has significant probability mass in its region. … After re-training we set each weight to the mean of the component that takes most responsibility for it i.e. we quantize the weights.” [sec 1] “By fitting the mixture components alongside the weights, the weights tend to concentrate very tightly around a number of cluster components, while the cluster centers optimize themselves to give the network high predictive accuracy. Compression is achieved because we only need to encode K cluster means (in full precision) in addition to the assignment of each weight to one of these J values (using log(J) bits per weight).”; e.g., “means” may read on “anchor points”.) 

Regarding claim 8
The combination of Ullrich, Chen, Somers teaches claim 1.

at each of the multiple training iterations, adjusting the current values of the set of anchor points at the training iteration based on the set of quantiles of the current values of the set of weights of the neural network at the training iteration comprises, (see the rejections of claim 1) 

Ullrich further teaches 
for each of one or more of the anchor points: 
determining a new value for the anchor point based on a [mid-]point between a respective pair of [quantiles] from the set of [quantiles] of the current values of the particular set of weights of the neural network at the training iteration; and 
updating the current value of the anchor point to be the new value for the anchor point.
(Ullrich, [fig(s) 1-3] [algorithm 1] 
“θ = {µj, σj, πj}_{j=1}^{J} ← initialize mixture parameters (see Sec. 4.2) 

[image: media_image2.png]
” [sec(s) 2] “The approach naturally encourages quantization because in order to optimize the cross-entropy the weights will cluster tightly around the cluster means, while the cluster means themselves move to some optimal location driven by LE. The effect might even be so strong that it is beneficial to have a Gamma hyper-prior on the variances of the mixture components to prevent the components from collapsing. Furthermore, note that, mixture components merge when there is not enough pressure from the error loss to keep them separated because weights are attracted by means and means are attracted by weights hence means also attract each other. In that way the network learns how many quantization intervals are necessary. We demonstrate that behaviour in Figure 3.”
[sec(s) 4] “We retrain pre-trained neural networks with soft weight-sharing and factorized Dirac posteriors. Hence we optimize 

[image: media_image1.png]

via gradient descent, specifically using Adam (Kingma & Ba, 2014). The KL divergence reduces to the prior because the entropy term does not depend on any trainable parameters. Note that, similar to (Nowlan & Hinton, 1992) we weigh the log-prior contribution to the gradient by a factor of τ = 0.005. In the process of retraining the weights, the variances, means, and mixing proportions of all but one component are learned.”; e.g., “means” may read on “anchor points”.)

Somers further teaches
for each of one or more of the anchor points: 
determining a new value for the anchor point based on a mid-point between a respective pair of quantiles from the set of quantiles of the current values of the particular set of weights of the neural network at the training iteration; and 
updating the current value of the anchor point to be the new value for the anchor point.
(Somers, [fig(s) 1-3] [sec(s) 3] “The integral may be approximated by numerical integration at the quantiles of H using a trapezium-like rule. The quantiles s0, s1,...,sk correspond to a grid of k + 1 equi-percentile intervals with constant probability π apart. … 
[image: media_image9.png]
 (7) denote the midpoints and j indexes the relevant quantile. Fig. 3 illustrates this computation.”; Note that Ullrich teaches “determining a new value for the anchor point based on a [mid-]point between a respective pair of [quantiles] from the set of [quantiles] of the current values of the particular set of weights of the neural network at the training iteration”.)
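
For illustration only, the combination relied upon for this limitation (a new anchor point value taken as the mid-point between a pair of quantiles of the current weight values) can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Ullrich, Somers, or the specification; the function names, the floor-index quantile rule, and the sample values are hypothetical.

```python
def equi_percentile_quantiles(values, k):
    """Return k + 1 empirical quantiles at probabilities 0, 1/k, ..., 1.

    Assumption: a quantile is read off the sorted values at a floor index.
    """
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(j * (n - 1)) // k] for j in range(k + 1)]


def midpoints(quantiles):
    """Mid-point between each pair of consecutive quantiles."""
    return [(a + b) / 2.0 for a, b in zip(quantiles, quantiles[1:])]


# Hypothetical current weight values at one training iteration.
current_weights = [8.0, 0.0, 3.0, 5.0, 1.0, 7.0, 2.0, 6.0, 4.0]
quantiles = equi_percentile_quantiles(current_weights, 4)
new_anchor_values = midpoints(quantiles)
```

Each entry of `new_anchor_values` would then replace the current value of one anchor point.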

The combination of Ullrich, Chen is combinable with Somers for the same rationale as set forth above with respect to claim 1.

Regarding claim 10
The combination of Ullrich, Chen, Somers teaches claim 1.

Ullrich further teaches 
the neural network is pre-trained to perform a prediction task.
(Ullrich, [fig(s) 1-3] [algorithm 1] [sec(s) 4] “We retrain pre-trained neural networks with soft weight-sharing and factorized Dirac posteriors. Hence we optimize 

[image: media_image1.png]

via gradient descent, specifically using Adam (Kingma & Ba, 2014). The KL divergence reduces to the prior because the entropy term does not depend on any trainable parameters. Note that, similar to (Nowlan & Hinton, 1992) we weigh the log-prior contribution to the gradient by a factor of τ = 0.005. In the process of retraining the weights, the variances, means, and mixing proportions of all but one component are learned.”;)

Regarding claim 11
The claim is a system claim corresponding to the method claim 1, and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejection of the method claim. 
Note that Ullrich teaches one or more computers and one or more storage devices (Ullrich, [sec ACKNOWLEDGEMENTS] “code”; e.g., “code” may read on “one or more computers” and “one or more storage devices” since code is run on a computer.).

Regarding claim 14
The claim is a system claim corresponding to the method claim 4, and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejection of the method claim. 

Regarding claim 15
The claim is a system claim corresponding to the method claim 5, and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejection of the method claim. 

Regarding claim 16
The claim is a computer storage media claim corresponding to the method claim 1, and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejection of the method claim. 
Note that Ullrich teaches one or more computers and one or more computer storage media (Ullrich, [sec ACKNOWLEDGEMENTS] “code”; e.g., “code” may read on “one or more computers” and “one or more computer storage media” since code is stored on a storage medium and run on a computer.).

Regarding claim 19
The claim is a computer storage media claim corresponding to the method claim 4, and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejection of the method claim. 

Regarding claim 20
The claim is a computer storage media claim corresponding to the method claim 5, and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejection of the method claim. 

Regarding claim 22
The claim is a computer storage media claim corresponding to the method claim 8, and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejection of the method claim. 

Regarding claim 24
The claim is a computer storage media claim corresponding to the method claim 10, and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejection of the method claim. 

Claim(s) 3, 13, 18 is/are rejected under 35 U.S.C. 103 as being unpatentable over Ullrich et al. (SOFT WEIGHT-SHARING FOR NEURAL NETWORK COMPRESSION) in view of Chen et al. (Compressing Convolutional Neural Networks) further in view of Somers et al. (Quantile regression for modelling distributions of profit and loss) further in view of Grønlund et al. (Fast Exact k-Means, k-Medians and Bregman Divergence Clustering in 1D)

Regarding claim 3
The combination of Ullrich, Chen, Somers teaches claim 1. 

Ullrich further teaches 
the second loss function term comprises 
a sum, over the set of weights, [of a minimum] distance between the current value of the weight and the current value of a corresponding anchor point.
(Ullrich, [fig(s) 1-3] [algorithm 1] 
“
    [image: media_image2.png]
”
[sec(s) 4] “We retrain pre-trained neural networks with soft weight-sharing and factorized Dirac posteriors. Hence we optimize 

    [image: media_image1.png]

via gradient descent, specifically using Adam (Kingma & Ba, 2014). The KL divergence reduces to the prior because the entropy term does not depend on any trainable parameters. Note that, similar to (Nowlan & Hinton, 1992) we weigh the log-prior contribution to the gradient by a factor of τ = 0.005. In the process of retraining the weights, the variances, means, and mixing proportions of all but one component are learned.” [sec 2] “Following Nowlan & Hinton (1992) we will model the prior p(w) as a mixture of Gaussians, 
    [image: media_image3.png]
 (5) We learn the mixture parameters µj, σj, πj via maximum likelihood simultaneously with the network weights.”;)

However, the combination of Ullrich, Chen, Somers does not appear to distinctly disclose:
the second loss function term comprises 
a sum, over the set of weights, [of a minimum] distance between the current value of the weight and the current value of a corresponding anchor point.

Grønlund teaches
the second loss function term comprises 
a sum, over the set of weights, of a minimum distance between the current value of the weight and the current value of a corresponding anchor point.
(Grønlund, [fig(s) 1-2] [sec(s) 1] “Formally, given n points x1, . . . , xn with weights w1, . . . , wn, find centroids M = {µ1, ..., µk} ⊂ R minimizing the cost 
    [image: media_image10.png]
 … Given X = {x1, ..., xn} ⊂ R and a non-negative real number λ ∈ R+, compute the optimal regularized clustering: 
    [image: media_image11.png]
 Somewhat surprisingly, it takes only O(n) time to find the solution to the regularized k-Means if the input is sorted” [sec(s) 2] “Formally, the problem is as follows: Given X = {x1, ..., xn} ⊂ R and λ, compute the optimal regularized clustering: 
    [image: media_image12.png]
 If we set λ = 0 the optimal clustering has cost zero and use a cluster for each input point. If we let λ increase towards infinity, the optimal number of clusters used in the optimal solution monotonically decreases towards one (zero clusters is not well defined).”; Note that Ullrich teaches “the second loss function term comprises a sum, over the set of weights, [of a minimum] distance between the current value of the weight and the current value of a corresponding anchor point”.)

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the neural network compression system of Ullrich, Chen, Somers with the sum of minimum distances of Grønlund. 
Doing so would lead to reducing the computational complexity while maintaining the classification accuracy when the neural network is compressed.
(Grønlund, [fig(s) 1-2] [sec(s) 1] “Somewhat surprisingly, it takes only O(n) time to find the solution to the regularized k-Means if the input is sorted.”)
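
For illustration only, the quantity recited in the limitation (a sum, over the set of weights, of a minimum distance between each weight and an anchor point) may be sketched as follows. The function name, the sample values, and the use of squared distance are assumptions made for this example and are not taken from Ullrich or Grønlund.

```python
def sum_min_distance(weights, anchors):
    """Sum, over all weights, of the squared distance to the nearest anchor.

    Squared distance is an assumed choice of distance for this sketch.
    """
    return sum(min((w - a) ** 2 for a in anchors) for w in weights)


# Hypothetical weight values and anchor points.
weights = [-0.9, -0.1, 0.0, 0.2, 1.1]
anchors = [-1.0, 0.0, 1.0]

# Each weight contributes only its distance to the closest anchor.
penalty = sum_min_distance(weights, anchors)
print(penalty)
```

A term of this form becomes smaller as the weights move toward their nearest anchor points, which is the clustering behaviour the rejection relies on.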

Regarding claim 13
The claim is a system claim corresponding to the method claim 3, and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejection of the method claim. 

Regarding claim 18
The claim is a computer storage media claim corresponding to the method claim 3, and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejection of the method claim. 

Claim(s) 6 is/are rejected under 35 U.S.C. 103 as being unpatentable over Ullrich et al. (SOFT WEIGHT-SHARING FOR NEURAL NETWORK COMPRESSION) in view of Chen et al. (Compressing Convolutional Neural Networks) further in view of Somers et al. (Quantile regression for modelling distributions of profit and loss) further in view of Wang et al. (Diverse and Accurate Image Description Using a Variational Auto-Encoder with an Additive Gaussian Encoding Space)

Regarding claim 6
The combination of Ullrich, Chen, Somers teaches claim 5.

Ullrich further teaches 
the components of the Gaussian mixture model are [restricted] to have a fixed standard deviation.
(Ullrich, [fig(s) 1-3, 5] [algorithm 1] 
“
    [image: media_image13.png]
” [sec 2] “The effect might even be so strong that it is beneficial to have a Gamma hyper-prior on the variances of the mixture components to prevent the components from collapsing. Furthermore, note that, mixture components merge when there is not enough pressure from the error loss to keep them separated because weights are attracted by means and means are attracted by weights hence means also attract each other. In that way the network learns how many quantization intervals are necessary. We demonstrate that behaviour in Figure 3.” [sec B] “In our experiments we set the desired variance of the mixture components to 0.05. This corresponds to λ ∗ = 1/(0.05)2 = 400. We show the effect of different choices for the variance of the Gamma distribution in Figure 5.”;) 

However, the combination of Ullrich, Chen, Somers does not appear to distinctly disclose:
the components of the Gaussian mixture model are [restricted] to have a fixed standard deviation.

Wang teaches
the components of the Gaussian mixture model are restricted to have a fixed standard deviation.
(Wang, [sec(s) 2-3] “We can model p(z|c) as a Gaussian mixture with weights ck and components with means µk and standard deviations σk: 
    [image: media_image14.png]
 (3) where ck is defined as the weights above and µk represents the mean vector of the k-th component. In practice, for all components, we use the same standard deviation σ. … It is not directly tractable to optimize Eq. (2) with the above GMM prior. We therefore approximate the KL divergence stochastically [12]. In each step during training, we first draw a discrete component k according to the cluster probability c(I), and then sample z from the resulting Gaussian component. Then we have 

    [image: media_image15.png]
(4) We plug the above KL term into Eq. (2) to obtain an objective function, which we optimize w.r.t. the encoder and decoder parameters φ and θ using stochastic gradient descent (SGD). In principle, the prior parameters µk and σk can also be trained, but we obtained good results by keeping them fixed (the means are drawn randomly and all standard deviations are set to the same constant, as will be further explained in Section 4).”; Note that Ullrich teaches “the components of the Gaussian mixture model are [restricted] to have a fixed standard deviation”.)

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the neural network compression system of Ullrich, Chen, Somers with the fixed standard deviation of Wang. 
Doing so would lead to achieving good results for clustering by keeping the standard deviations fixed during the training. 
(Wang, [sec(s) 2-3] “Clearly, the prior has to change based on the content of the image. However, because of the need to efficiently compute the KL-divergence in closed form, it still needs to have a simple structure, ideally a Gaussian or a mixture of Gaussians. … In principle, the prior parameters µk and σk can also be trained, but we obtained good results by keeping them fixed (the means are drawn randomly and all standard deviations are set to the same constant, as will be further explained in Section 4).”)
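
For illustration only, a Gaussian mixture whose components are restricted to one fixed standard deviation may be sketched as follows. The function name, the means, the mixing proportions, and the value of `SIGMA` are hypothetical values chosen for this example and are not taken from Ullrich or Wang.

```python
import math


def gmm_log_density(w, means, mixing, sigma):
    """Log density of a scalar w under a 1-D Gaussian mixture.

    Every component uses the same fixed sigma; only the means and
    mixing proportions differ between components.
    """
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    density = sum(
        pi_k * norm * math.exp(-0.5 * ((w - mu_k) / sigma) ** 2)
        for mu_k, pi_k in zip(means, mixing)
    )
    return math.log(density)


SIGMA = 0.05                 # assumed constant, never updated in training
means = [-0.5, 0.0, 0.5]     # hypothetical component means
mixing = [0.25, 0.5, 0.25]   # hypothetical mixing proportions

near = gmm_log_density(0.01, means, mixing, SIGMA)  # close to a mean
far = gmm_log_density(0.25, means, mixing, SIGMA)   # between two means
print(near, far)
```

With the standard deviation held constant, a weight near a component mean receives a higher log density than a weight lying between two means, so only the means and mixing proportions remain to be learned.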

Regarding claim 21
The claim is a computer storage media claim corresponding to the method claim 6, and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejection of the method claim. 

Claim(s) 9, 23 is/are rejected under 35 U.S.C. 103 as being unpatentable over Ullrich et al. (SOFT WEIGHT-SHARING FOR NEURAL NETWORK COMPRESSION) in view of Chen et al. (Compressing Convolutional Neural Networks) further in view of Somers et al. (Quantile regression for modelling distributions of profit and loss) further in view of Caragea et al. (US 10,776,368 B1) further in view of Mayou et al. (Compute the average for quantiles)

Regarding claim 9
The combination of Ullrich, Chen, Somers teaches claim 1.

at each of the multiple training iterations, adjusting the current values of the set of anchor points at the training iteration based on the set of quantiles of the current values of the set of weights of the neural network at the training iteration comprises, (see the rejections of claim 1)

Ullrich further teaches 
for each of one or more of the anchor points: 
identifying [a pair of quantiles] corresponding to the anchor point, wherein [the pair of quantiles are included in the set of quantiles] of the current values of the set of weights of the neural network at the training iteration; 
(Ullrich, [fig(s) 1-3] [algorithm 1] 
“
    [image: media_image2.png]
”
[sec(s) 4] “We retrain pre-trained neural networks with soft weight-sharing and factorized Dirac posteriors. Hence we optimize 

    [image: media_image1.png]

via gradient descent, specifically using Adam (Kingma & Ba, 2014). The KL divergence reduces to the prior because the entropy term does not depend on any trainable parameters. Note that, similar to (Nowlan & Hinton, 1992) we weigh the log-prior contribution to the gradient by a factor of τ = 0.005. In the process of retraining the weights, the variances, means, and mixing proportions of all but one component are learned.” [sec 2] “The approach naturally encourages quantization because in order to optimize the cross-entropy the weights will cluster tightly around the cluster means, while the cluster means themselves move to some optimal location driven by LE.”;)

identifying a subset of the set of weights of the neural network, wherein the current value of each weight included in the subset is between [the pair of quantiles] corresponding to the anchor point; 
(Ullrich, [fig(s) 1-3] [algorithm 1] 
“
    [image: media_image2.png]
”
[sec(s) 2] “The approach naturally encourages quantization because in order to optimize the cross-entropy the weights will cluster tightly around the cluster means, while the cluster means themselves move to some optimal location driven by LE. The effect might even be so strong that it is beneficial to have a Gamma hyper-prior on the variances of the mixture components to prevent the components from collapsing. Furthermore, note that, mixture components merge when there is not enough pressure from the error loss to keep them separated because weights are attracted by means and means are attracted by weights hence means also attract each other. In that way the network learns how many quantization intervals are necessary. We demonstrate that behaviour in Figure 3.”
[sec(s) 4] “We retrain pre-trained neural networks with soft weight-sharing and factorized Dirac posteriors. Hence we optimize 

    [image: media_image1.png]

via gradient descent, specifically using Adam (Kingma & Ba, 2014). The KL divergence reduces to the prior because the entropy term does not depend on any trainable parameters. Note that, similar to (Nowlan & Hinton, 1992) we weigh the log-prior contribution to the gradient by a factor of τ = 0.005. In the process of retraining the weights, the variances, means, and mixing proportions of all but one component are learned.”)

determining a new value for the anchor point based on an [average] of the current values of the weights included in the subset of the set of weights of the neural network; and 
updating the current value of the anchor point to be the new value for the anchor point.
(Ullrich, [fig(s) 1-3] [algorithm 1] 
“w ← initialize network weights with pre-trained network weights
    [image: media_image2.png]
”
[sec(s) 4] “We retrain pre-trained neural networks with soft weight-sharing and factorized Dirac posteriors. Hence we optimize 

    [image: media_image1.png]

via gradient descent, specifically using Adam (Kingma & Ba, 2014). The KL divergence reduces to the prior because the entropy term does not depend on any trainable parameters. Note that, similar to (Nowlan & Hinton, 1992) we weigh the log-prior contribution to the gradient by a factor of τ = 0.005. In the process of retraining the weights, the variances, means, and mixing proportions of all but one component are learned.” [sec 2] “Following Nowlan & Hinton (1992) we will model the prior p(w) as a mixture of Gaussians, 
    [image: media_image3.png]
 (5) We learn the mixture parameters µj, σj, πj via maximum likelihood simultaneously with the network weights. … The approach naturally encourages quantization because in order to optimize the cross-entropy the weights will cluster tightly around the cluster means, while the cluster means themselves move to some optimal location driven by LE. The effect might even be so strong that it is beneficial to have a Gamma hyper-prior on the variances of the mixture components to prevent the components from collapsing. Furthermore, note that, mixture components merge when there is not enough pressure from the error loss to keep them separated because weights are attracted by means and means are attracted by weights hence means also attract each other. In that way the network learns how many quantization intervals are necessary. We demonstrate that behaviour in Figure 3.”; e.g., “means” may read on “anchor points”.)

Somers further teaches 
identifying [a pair of] quantiles corresponding to the anchor point, wherein [the pair of] quantiles are included in the set of quantiles of the current values of the set of weights of the neural network at the training iteration; 
identifying a subset of the set of weights of the neural network, wherein the current value of each weight included in the subset is between [the pair of] quantiles corresponding to the anchor point; 
 (Somers, [fig(s) 1-3] [sec(s) 2] “Quantile regression is based on the quantile loss function, q, defined for a given percentile τ ∈ (0, 1) by 
    [image: media_image7.png]
, (1) where I is an indicator function. It is sometimes called the check or tick function because of its shape. The quantile loss function ρτ(y - 6.2) is plotted in Fig. 1. This loss function has the key property that minimising the expectation Eρτ(Y - a) with respect to a gives the τth quantile, F-1(τ), of Y. … Kernel quantile regression finds the estimate of the quantile a = a(x0) for a grid of values of x0, that minimizes 
    [image: media_image8.png]
, for given τ and kernel weights wj that depend on the grid point.” [sec(s) 3] “The integral may be approximated by numerical integration at the quantiles of H using a trapezium-like rule. The quantiles s0, s1,...,sk correspond to a grid of k + 1 equi-percentile intervals with constant probability π apart. … 
    [image: media_image9.png]
 (7) denote the midpoints and j indexes the relevant quantile. Fig. 3 illustrates this computation.”; Note that Ullrich teaches “identifying [a pair of quantiles] corresponding to the anchor point, wherein [the pair of quantiles are included in the set of quantiles] of the current values of the set of weights of the neural network at the training iteration; identifying a subset of the set of weights of the neural network, wherein the current value of each weight included in the subset is between [the pair of quantiles] corresponding to the anchor point”.)
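
For illustration only, the property quoted from Somers (that minimising the expected quantile loss with respect to a yields the τth quantile) may be sketched as follows. The check function is written in its standard form u · (τ − I(u < 0)); the function names and the sample values are hypothetical and are not taken from Somers.

```python
def quantile_loss(u, tau):
    """Check (pinball) loss: u * tau if u >= 0, else u * (tau - 1)."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def expected_loss(a, sample, tau):
    """Average quantile loss of the residuals y - a over the sample."""
    return sum(quantile_loss(y - a, tau) for y in sample) / len(sample)


# Hypothetical sample of nine observations.
sample = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
tau = 0.25

# Search the sample values for the candidate with the smallest loss.
best = min(sample, key=lambda a: expected_loss(a, sample, tau))
print(best)
```

Here the minimiser is the third smallest observation, which is an empirical 0.25 quantile of the nine sample values.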

The combination of Ullrich, Chen is combinable with Somers for the same rationale as set forth above with respect to claim 1.

However, the combination of Ullrich, Chen, Somers does not appear to explicitly teach:
identifying [a pair of] quantiles corresponding to the anchor point, wherein [the pair of] quantiles are included in the set of quantiles of the current values of the set of weights of the neural network at the training iteration; 
identifying a subset of the set of weights of the neural network, wherein the current value of each weight included in the subset is between [the pair of] quantiles corresponding to the anchor point; 
determining a new value for the anchor point based on an [average] of the current values of the weights included in the subset of the set of weights of the neural network; and 

Caragea teaches
identifying a pair of quantiles corresponding to the anchor point, wherein the pair of quantiles are included in the set of quantiles of the current values of the set of weights of the neural network at the training iteration; 
identifying a subset of the set of weights of the neural network, wherein the current value of each weight included in the subset is between [the pair of] quantiles corresponding to the anchor point; 
(Caragea [figs 1] “quantile boundary values 132” and “quantile boundary values 134” [fig(s) 7] “Deriving a cardinality value for the predicate according to boundary values of quantile(s) that include the predicate and were determined according to an approximate quantile summary generated for the column of the database table 720” [fig(s) 8] “Evaluate the quantiles to identify boundary values of one or more of the quantiles that include the predicate 830” [col 2, ln 56– col 4, ln 46] “Query engine 100 may implement cardinality estimation 110 to derive cardinality values for predicate(s) of query 102 from quantile boundary values 112 that include the predicate of the query directed to the column for which AQS 130 is generated. For example, point predicate 152 may be a predicate directed to the column that specifies a single column value. The column value specified by the predicate or that otherwise satisfies the predicate criteria may be compared with quantile boundary values to determine that point predicate 152 is included in the first quantile. Predicates may be ranges of column values in some embodiments. For example range predicate 154 may be a range of column values that satisfy the predicate in the column, in some embodiments. Range predicates be within a single quantile or span multiple quantiles as illustrated in FIG. 1.” See also [col 15, ln 1– col 18, ln 29]; e.g., “determine that point predicate 152 is included in the first quantile” along with “Range predicates be within a single quantile or span multiple quantiles” read(s) on “identifying a pair of quantiles corresponding to the anchor point” and “the pair of quantiles corresponding to the anchor point”. 
In addition, Caragea does not appear to explicitly teach but suggests “the current value of each weight included in the subset is between [the pair of] quantiles” based on “determine that point predicate 152 is included in the first quantile” along with “Range predicates be within a single quantile or span multiple quantiles” and fig 1 with multiple predicates.
Note that the combination of Ullrich, Somers teaches “identifying [a pair of] quantiles corresponding to the anchor point, wherein [the pair of] quantiles are included in the set of quantiles of the current values of the set of weights of the neural network at the training iteration; identifying a subset of the set of weights of the neural network, wherein the current value of each weight included in the subset is between [the pair of] quantiles corresponding to the anchor point;”.)

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the neural network compression system of Ullrich, Chen, Somers with the pair of quantiles corresponding to a point of Caragea. 
Doing so would lead to improving the cost estimation of different operations, resulting in improving the accuracy of query planning.
(Caragea [col 1, ln 6– col 2, ln 55] “Planning the performance of a query is often implemented in order to select the most cost efficient way to perform the query. The cost to perform different operations for the query may be estimated so that different operations or configurations of operations may be selected to provide the optimal query plan. Techniques that improve the cost estimation of different operations can improve the accuracy of query planning and thus are desirable.”;)

However, the combination of Ullrich, Chen, Somers, Caragea does not appear to explicitly teach:
identifying a subset of the set of weights of the neural network, wherein the current value of each weight included in the subset is between [the pair of] quantiles corresponding to the anchor point; 
determining a new value for the anchor point based on an [average] of the current values of the weights included in the subset of the set of weights of the neural network; and 

Mayou teaches
identifying a subset of the set of weights of the neural network, wherein the current value of each weight included in the subset is between the pair of quantiles corresponding to the anchor point; 
determining a new value for the anchor point based on an average of the current values of the weights included in the subset of the set of weights of the neural network; and 
(Mayou, [pp. 1-2] 
“
    [image: media_image16.png]
 … 

    [image: media_image17.png]

So if you simply need average of values between quantile(x, 0.25) and quantile(x, 0.5):

    [image: media_image18.png]
”; e.g., “averageQuantile” along with “getChunkOfVector” read(s) on “identifying a subset of the set of weights” and “wherein the current value of each weight included in the subset is between the pair of quantiles” since a mean is calculated based on the values which are within a range. 
Note that Ullrich teaches “identifying a subset of the set of weights of the neural network, wherein the current value of each weight included in the subset is between [the pair of] quantiles corresponding to the anchor point; determining a new value for the anchor point based on an [average] of the current values of the weights included in the subset of the set of weights of the neural network”.)

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the neural network compression system of Ullrich, Chen, Somers, Caragea with the averages between quantiles of Mayou. 
Doing so would lead to realistic and robust estimates suitable for use in financial applications based on distribution estimation.
(Somers, [fig(s) 3] [sec(s) 1] “A quantile regression model explicitly estimates the distribution of house prices realised and thus leads to realistic and robust estimates suitable for use in Basel II applications.”)
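
For illustration only, the limitation of claim 9 (identifying a pair of quantiles, selecting the weights lying between them, and setting the anchor point to their average) may be sketched as follows. This is a Python sketch and not the code shown in Mayou; the function names, the linear-interpolation quantile rule, the equal-probability spacing of the quantiles, and the sample weights are all assumptions made for this example.

```python
def quantile(sorted_vals, q):
    """Linearly interpolated empirical quantile of an already sorted list."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1.0 - frac) + sorted_vals[hi] * frac


def update_anchors(weights, num_anchors):
    """Set each anchor to the mean of the weights between its quantile pair."""
    ordered = sorted(weights)
    anchors = []
    for k in range(num_anchors):
        # Pair of quantiles assumed to bound the k-th anchor point.
        low = quantile(ordered, k / num_anchors)
        high = quantile(ordered, (k + 1) / num_anchors)
        # Subset of weights whose current value lies between the pair.
        subset = [w for w in ordered if low <= w <= high]
        if subset:
            anchors.append(sum(subset) / len(subset))
        else:
            # No weight falls in the interval: fall back to its midpoint.
            anchors.append((low + high) / 2.0)
    return anchors


# Hypothetical current weight values at one training iteration.
weights = [5.0, 1.0, 8.0, 3.0, 2.0, 7.0, 4.0, 6.0]
anchors = update_anchors(weights, 2)
print(anchors)
```

With two anchor points, the lower half of the sorted weights averages to 2.5 and the upper half to 6.5, so each anchor point tracks the centre of its own quantile interval.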

Regarding claim 23
The claim is a computer storage media claim corresponding to the method claim 9, and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejection of the method claim. 

Prior Art
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Piketty et al. (How Progressive is the U.S. Federal Tax System? A Historical and International Perspective) teaches average tax rates for different quantile groups.
Gan et al. (Enhancing short-term probabilistic residential load forecasting with quantile long–short-term memory) teaches average quantile scores.
Swamy et al. (US 2018/0137564 A1) teaches average annual incomes based on quartiles.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SEHWAN KIM whose telephone number is (571)270-7409. The examiner can normally be reached Mon - Thu 7:00 AM - 5:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Michael J Huntley can be reached on (303) 297-4307. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/S.K./Examiner, Art Unit 2129    10/8/2022
/MICHAEL J HUNTLEY/Supervisory Patent Examiner, Art Unit 2129