DETAILED ACTION
This non-final rejection is responsive to the amendments and remarks in the request for continued examination filed 28 June 2021.
Claims 1 and 16 are amended. No claims have been added, cancelled, or withdrawn. Therefore, claims 1-20 are presently pending.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 28 June 2021 has been entered.

Response to Arguments
Applicant’s arguments with respect to claim(s) 1-20 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 1-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.
Regarding claim 1, the amendments to claim 1 recite the limitation “wherein the sign of the weight being updated is opposite of the weight.” However, the only relevant support the Examiner could find in the Specification regarding this limitation states, “The sign of the weight updates is always the opposite of the weights” (Specification, P[0038]). The Examiner could not find support for the amended limitation; so, for purposes of examination, this limitation is being interpreted as “wherein the sign of the change of the weight is opposite of the weight.”
Claims 2-15 are rejected for their dependency on an indefinite claim.
Claim 16 recites limitations similar to those recited in claim 1 and is rejected for the same reasons.
Claims 17-20 are rejected for their dependency on an indefinite claim.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
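A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.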

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1-6, 8, and 11-20 are rejected under 35 U.S.C. 103 as being unpatentable over Blundell et al. (“Weight Uncertainty in Neural Networks,” 2015, Proceedings of the 32nd International Conference on Machine Learning, JMLR: W&CP volume 37, 10 pages) (“Blundell”) in view of David et al. (US 2019/0108436) (“David”).
Regarding claim 1, Blundell teaches a method comprising: 
providing a deep neural networks (DNN) model comprising a plurality of nodes (Blundell, pp. 1-2, Section 1 and Figure 1, “All weights in our neural networks are represented by probability distributions over possible values, rather than having a single fixed value as is the norm (see Figure 1). … the proposed method trains an ensemble of networks, where each network has its weights drawn from a shared, learnt probability distribution.” Figure 1 shows a DNN model comprising a plurality of nodes.); 
sampling a change of a weight of a plurality of weights that corresponds to the plurality of nodes based on a distribution function (Blundell, pp. 1-2, Section 1 and Figure 1, “the proposed method trains an ensemble of networks, where each network has its weights drawn from a shared, learnt probability distribution.” Blundell, p. 2, Section 2, “If w are given a Gaussian prior, this yields L2 regularisation (or weight decay). If w are given a Laplace prior, then L1 regularisation is recovered.” Blundell, p. 4, Section 3.2, “Suppose that the variational posterior is a diagonal Gaussian distribution, then a sample of the weights w can be obtained by sampling a unit Gaussian, shifting it by a mean μ and scaling by a standard deviation σ.” Steps 1-7 of the optimization disclose sampling a change of a weight of a plurality of weights that corresponds to the plurality of nodes based on a distribution function.); 
updating the weight with the change of the weight multiplied by a sign of the weight (Blundell, p. 2, Section 2, “If w are given a Gaussian prior, this yields L2 regularisation (or weight decay). If w are given a Laplace prior, then L1 regularisation is recovered.” L1 regularisation discloses updating the weight with the change of the weight multiplied by a sign of the weight.); and 
training the DNN model by iterating the steps of sampling the change and updating the weight (Blundell, p. 4, Section 3.2, Steps 1-7 of the optimization disclose training the DNN model by iterating the steps of sampling the change and updating the weight. Blundell, p. 4, Section 3.3, “the prior [is] more amenable to use during optimisation by stochastic gradient descent and avoids the need for prior parameter optimisation based upon training data.”)
….
Blundell is inexplicit in disclosing the method,
…
wherein the sign of the weight being updated [i.e., the change of the weight] is opposite of the weight, and a positive weight is decreasingly updated and a negative weight is increasingly updated, and 
wherein the plurality of weights has a high rate of sparsity after the training.  
However, David teaches the method,
wherein the sign of the weight being updated [i.e., the change of the weight] is opposite of the weight, and a positive weight is decreasingly updated and a negative weight is increasingly updated (David, P[0078], “Some embodiments of the invention may prune neuron connections using L1 regularization during neural network training in each of one or more iterations (e.g., in addition to weight correcting updates such as backpropagation). The weights wij of the neural network may be updated to weights wij′ in each training iteration, for example, as follows: w′ij =wij−sgn(wij)*d where d is a ‘weight decay’ parameter (typically a very small number) and sgn is the sign function.”), and 
wherein the plurality of weights has a high rate of sparsity after the training (David, P[0076], “Several embodiments are provided for inducing sparsity during training,” including L1 regularization. David, P[0019], “Embodiments of the invention provide a novel system and method to generate a sparse neural network by pruning weak synapse connections during the training phase.”).  
While Blundell discloses using L1-regularization of weights during neural network training, Blundell does not explicitly disclose information about the sign of the weight change being opposite of the weight. However, David is also directed to using L1-regularization of weights during neural network training and explicitly discloses that L1-regularization involves the sign of the weight change being opposite that of the weight, which also induces sparsity during training (David, P[0076]). It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Blundell to incorporate the teachings of L1-regularization, as taught by David, to yield predictable results of inducing sparsity during training when using L1-regularization for weights. 
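For illustration only, the update quoted from David, P[0078] (w′ij = wij − sgn(wij)*d) can be sketched as follows. This is a minimal sketch, not the implementation of either reference: the function names and the sample values are hypothetical, and the decay parameter d is held fixed here rather than sampled from a distribution.

```python
def sgn(x: float) -> int:
    """Sign function: returns -1, 0, or +1."""
    return (x > 0) - (x < 0)


def l1_decay_step(weights, d):
    """Apply w' = w - sgn(w) * d to every weight (update quoted from David, P[0078]).

    The change applied to each weight, -sgn(w) * d, has the opposite sign of
    the weight itself, so positive weights decrease and negative weights increase.
    """
    return [w - sgn(w) * d for w in weights]


# Hypothetical example values, chosen to be exactly representable in binary.
weights = [0.5, -0.25, 0.0]
updated = l1_decay_step(weights, d=0.125)
print(updated)  # positive weight decreased, negative weight increased, zero unchanged
```

In this sketch a weight that is exactly zero is left unchanged because sgn(0) = 0.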

Regarding claim 2, Blundell in view of David teaches the method of claim 1.
David further teaches the method, wherein the deep neural networks are convolutional neural networks (CNN) (David, P[0024], “The method is agnostic to the type of neural network and can be applied to any neural network architecture, for example, including but not limited to, fully connected, partially connected, convolutional, recurrent, etc., and results in significant sparsity without adversely affecting the network accuracy.”).  
Blundell is directed to an algorithm for learning the weights of a neural network but does not disclose applying the method to convolutional neural networks explicitly. However, David is also directed to learning weights of a neural network and discloses wherein the method can be applied to any neural network architecture including convolutional neural networks. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the neural networks model in Blundell to be a convolutional neural networks model, as disclosed in David, to yield the predictable result of learning weights of a convolutional neural networks model.

Regarding claim 3, Blundell in view of David teaches the method of claim 1.
David further teaches the method, wherein the deep neural networks are Recurrent Neural Networks (RNN) (David, P[0024], “The method is agnostic to the type of neural network and can be applied to any neural network architecture, for example, including but not limited to, fully connected, partially connected, convolutional, recurrent, etc., and results in significant sparsity without adversely affecting the network accuracy.”).  
Blundell is directed to an algorithm for learning the weights of a neural network but does not disclose applying the method to recurrent neural networks explicitly. However, David is also directed to learning weights of a neural network and discloses wherein the method can be applied to any neural network architecture including recurrent neural networks. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the neural networks model in Blundell to be a recurrent neural networks model, as disclosed in David, to yield the predictable result of learning weights of a recurrent neural networks model.

Regarding claim 4, Blundell in view of David teaches the method of claim 1.
Blundell further teaches the method, wherein the distribution function is an exponential decay function (Blundell, p. 2, Section 2, “If w are given a Gaussian prior, this yields L2 regularisation (or weight decay). If w are given a Laplace prior, then L1 regularisation is recovered.” The disclosed priors teach wherein the distribution function is an exponential decay function.).

Regarding claim 5, Blundell in view of David teaches the method of claim 1.
Blundell further teaches the method, wherein the weights are updated during training using a stochastic gradient descent (SGD) algorithm (Blundell, p. 4, Section 3.3, “the prior [is] more amenable to use during optimisation by stochastic gradient descent and avoids the need for prior parameter optimisation based upon training data.”).

Regarding claim 6, Blundell in view of David teaches the method of claim 5.
Blundell further teaches the method, wherein an amount of change in a step of updating the weight is small enough not to invalidate the convergence of the SGD algorithm (Blundell, p. 6, Section 5.1, “Figure 2 shows the learning curves on the test set for Bayes by Backprop, dropout and SGD on a network with two layers of 1200 rectified linear units. As can be seen, SGD converges the quickest, initially obtaining a low test error and then overfitting. Bayes by Backprop and dropout converge at similar rates…. Eventually Bayes by Backprop converges on a better test error than dropout after 600 epochs.” Because SGD converges, it is implicit that an amount of change in a step of updating the weight is small enough not to invalidate the convergence of the SGD algorithm.).

Regarding claim 8, Blundell in view of David teaches the method of claim 1.
Blundell further teaches the method, further comprising: combining a mixture of multiple distributions of weights for weight clustering (Blundell, p. 4, Section 3.2, “We propose using a scale mixture of two Gaussian densities as the prior. Each density is zero mean, but differing variances.”).
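For illustration only, a scale mixture of two zero-mean Gaussian densities with differing variances, as quoted above, can be sketched as a weighted sum of two normal densities. The mixing weight pi_mix and the function and parameter names are assumptions introduced for this sketch and are not taken from the quoted passage.

```python
import math


def gaussian_pdf(w, sigma):
    """Density of a zero-mean Gaussian with standard deviation sigma, evaluated at w."""
    return math.exp(-0.5 * (w / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def scale_mixture_prior(w, pi_mix, sigma1, sigma2):
    """Mixture of two zero-mean Gaussians that differ only in variance.

    pi_mix (assumed name) is the weight on the first component, in [0, 1].
    """
    return pi_mix * gaussian_pdf(w, sigma1) + (1.0 - pi_mix) * gaussian_pdf(w, sigma2)


# Hypothetical values: one wide component and one narrow component.
density_at_zero = scale_mixture_prior(0.0, pi_mix=0.5, sigma1=1.0, sigma2=0.1)
print(density_at_zero)
```

With one narrow and one wide component, the narrow component concentrates prior mass near zero while the wide component still permits large weights.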

Regarding claim 11, Blundell in view of David teaches the method of claim 1.
David further teaches the method, wherein the DNN model is implemented in a software framework of a computer system (David, P[0096], “Memory 515 may store data 517 including a training dataset and data representing a plurality of weights of the neural network. Data 517 may also include code (e.g., software code) or logic, e.g., to enable storage and retrieval of data 517 according to embodiments of the invention.”).  
While Blundell discloses training and implementing neural network models, Blundell does not explicitly disclose implementing the neural network models in a software framework. However, David is also directed to training and implementing neural network models and explicitly discloses implementing the neural network models in a software framework. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Blundell to incorporate the software framework, as taught by David, to yield predictable results of implementing a neural network model in a software framework.

Regarding claim 12, Blundell in view of David teaches the method of claim 1.
David further teaches the method, wherein the computer system includes a random number generating hardware that generates a weight perturbation and applies the weight perturbation to the weight (David, P[0086], “Weights may be set to zero with either a fixed small probability (fully-random zeroing), or with a probability proportional to their current value (partially-random zeroing). In the latter case of partially-random zeroing the smaller the weight, the larger the probability of it becoming zero.” David, P[0118], “Embodiments of the invention may include an article … which, when executed by a processor or controller (e.g., processor 556 of FIG. 5), carry out methods disclosed herein.”).  
Both Blundell and David are directed to training and implementing neural network models. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Blundell to include a random number generating hardware that generates a weight perturbation and applies the weight perturbation to the weight, as disclosed in David. Doing so allows for random zeroing, which induces sparsity during training (David, PP[0076, 0086]); this is advantageous, because sparse neural networks result in speed-up on any hardware compared to non-sparse networks (David, P[0023]).
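For illustration only, the two random zeroing modes quoted from David, P[0086] can be sketched as follows. This is a sketch under stated assumptions: a seeded software generator stands in for the random number generating hardware, and the particular probability formula used for partially-random zeroing (largest for the smallest weights, zero for the largest weight) is an assumption, since the quoted passage does not give one.

```python
import random


def fully_random_zeroing(weights, p, rng):
    """Set each weight to zero with the same fixed probability p."""
    return [0.0 if rng.random() < p else w for w in weights]


def partially_random_zeroing(weights, p_max, rng):
    """Set smaller weights to zero with higher probability.

    Assumed formula: p_i = p_max * (1 - |w_i| / max|w|), so the largest-magnitude
    weight is never zeroed and a weight near zero is zeroed with probability near p_max.
    """
    max_abs = max((abs(w) for w in weights), default=0.0)
    if max_abs == 0.0:
        return list(weights)  # nothing to scale against; leave weights unchanged
    out = []
    for w in weights:
        p = p_max * (1.0 - abs(w) / max_abs)
        out.append(0.0 if rng.random() < p else w)
    return out


# Hypothetical example values; the seed only makes the run repeatable.
rng = random.Random(0)
weights = [2.0, 0.5, -0.1]
print(fully_random_zeroing(weights, 0.1, rng))
print(partially_random_zeroing(weights, 0.5, rng))
```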

Regarding claim 13, Blundell in view of David teaches the method of claim 11.
David further teaches the method, wherein the computer system includes an image sensor or a camera for receiving an image input (David, P[0098], “In the application of facial recognition, a device may use the sparse neural network to efficiently perform facial recognition to trigger the device to unlock itself or a physical door when a match is detected. In the application of security, a security camera system may use the sparse neural network to efficiently detect a security breach and sound an alarm or other security measure.”).  
Both Blundell and David are directed to training and implementing neural network models. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Blundell to include an image sensor or camera for receiving an image input, as disclosed in David. Doing so allows the neural network model to be used in the application of facial recognition or security (David, P[0098]).

Regarding claim 14, Blundell in view of David teaches the method of claim 12.
David further teaches the method, wherein the DNN model is applied to a computer-vision application including image classification, image segmentation, and object detection (David, P[0098], “In the application of facial recognition, a device may use the sparse neural network to efficiently perform facial recognition [image classification] to trigger the device to unlock itself or a physical door when a match is detected. In the application of security, a security camera system may use the sparse neural network to efficiently detect a security breach and sound an alarm or other security measure. In the application of autonomous driving, a vehicle computer may use the sparse neural network to control driving operations, e.g., to steer away to avoid a detected object [image segmentation and object detection].”).  
Both Blundell and David are directed to training and implementing neural network models. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Blundell to apply the DNN model to a computer-vision application, as disclosed in David, to yield predictable results of applying a neural network model to real life applications of deep learning. This is advantageous in applications such as facial recognition, security, and autonomous driving (David, P[0098]).

Regarding claim 15, Blundell in view of David teaches the method of claim 12.
David further teaches the method, wherein the DNN model is applied to autonomous driving, augmented reality (AR), or virtual reality (VR) (David, PP[0097-0098], “Local endpoint device(s) 550 may each include one or more memories 558 for storing a sparse neural network according to a data representation (e.g., 206 of FIG. 2, 306 of FIG. 3, or 406 of FIG. 4) provided in some embodiments of the invention. … In various applications, local endpoint device(s) 550 is part of a system for image recognition, computer vision, virtual or augmented reality, speech recognition, text understanding, or other applications of deep learning. … In the application of autonomous driving, a vehicle computer may use the sparse neural network to control driving operations, e.g., to steer away to avoid a detected object.”).  
Both Blundell and David are directed to training and implementing neural network models. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Blundell to apply the DNN model to autonomous driving, augmented reality, or virtual reality, as disclosed in David, to yield predictable results of applying a neural network model to real life applications of deep learning. This is advantageous in applications such as facial recognition, security, and autonomous driving (David, P[0098]).

Regarding claims 16-20, claims 16-20 are directed to a computer system comprising an image repository including a plurality of images and a processor configured to run a deep neural networks (DNN) model comprising a plurality of nodes, wherein the processor performs steps similar to those recited in claims 1 and 12-15. Therefore, the rejections made to claims 1 and 12-15 are applied to claims 16-20.
In addition, David further teaches a computer system comprising an image repository including a plurality of images and a processor configured to run a deep neural networks (DNN) model and to update the image repository (David, P[0042], “CNN 400 may have an input layer that represents a color image and has three color-channels (e.g., red, green and blue channels).” David, P[0096], “Remote server 510 may have a memory 515 for storing a neural network and a processor 516 for training and/or predicting based on the neural network. Remote server 510 may prune a dense neural network (e.g., 100 of FIG. 1) to generate a sparse neural network (e.g., 200 of FIG. 1), or may initially generate or receive a sparse neural network. In some embodiments, remote server 510 may have specialized hardware including a large memory 515 for storing a neural network and a specialized processor 516 (e.g., a GPU), for example, when a dense neural network is used. Memory 515 may store data 517 including a training dataset and data representing a plurality of weights of the neural network.” David, P[0097], “Local endpoint device(s) 550 may each include one or more processor(s) 556 for training, and/or executing prediction based on, the weights of the sparse neural network stored in memory 558. During prediction, the neural network is run forward once. During training, a neural network is run twice, once forward to generate an output and once backwards for error correction (e.g., backpropagation).”).

Claims 7 and 9-10 are rejected under 35 U.S.C. 103 as being unpatentable over Blundell in view of David, further in view of Kim et al. (“Deep neural network with weight sparsity control and pre-training extracts hierarchical features and enhances classification performance: Evidence from whole-brain resting state functional connectivity patterns of schizophrenia,” 2016, Neuroimage, pp. 1-54) (“Kim”).
Regarding claim 7, Blundell in view of David teaches the method of claim 1.
Neither Blundell nor David teaches the method, further comprising: determining whether pre-trained weights are available, and using the pre-trained weights.
However, Kim teaches the method, further comprising: determining whether pre-trained weights are available, and using the pre-trained weights (Kim, p. 8, Pre-training of DNN weights for initialization, “Pre-training of DNN weights as opposed to random initialization has proven its utility to circumvent a local minimum and thus enhances the classification performance. … the trained weights and bias terms from these AEs were stacked and used as initial weights of the DNN in the subsequent fine-tuning phase using the target output and actual output of the input sample.” It is implicit that it is determined that pre-trained weights are available and then used.). 
Both the combination of Blundell and David and the disclosure of Kim are directed to training sparse neural networks. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the combination to determine the availability of, and to use, pre-trained weights, as disclosed in Kim. Doing so is advantageous, because “[p]re-training of DNN weights as opposed to random initialization has proven its utility to circumvent a local minimum and thus enhances the classification performance” (Kim, p. 8, Pre-training of DNN weights for initialization).

Regarding claim 9, Blundell in view of David teaches the method of claim 1.
Neither Blundell nor David teaches the method, wherein a predetermined number of iterations is performed.
However, Kim teaches the method, wherein a predetermined number of iterations is performed (Kim, pp. 8-9, Pre-training of DNN weights for initialization, “The total number of epochs was set to 1,000 to allow for convergence of the weights.”).
Both the combination of Blundell and David and the disclosure of Kim are directed to training sparse neural networks. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the combination to perform a predetermined number of iterations, as disclosed in Kim. Doing so is advantageous, because it allows for the full convergence of the weights during training (Kim, pp. 8-9, Pre-training of DNN weights for initialization).

Regarding claim 10, Blundell in view of David teaches the method of claim 1.
Neither Blundell nor David teaches the method, wherein the iteration continues until a predetermined rate of sparsity is achieved.
However, Kim teaches the method, wherein the iteration continues until a predetermined rate of sparsity is achieved (Kim, p. 7, DNN training with sparsity control of weights, “Note that [the L1-norm regularization parameter] was adaptively controlled to reach a target sparsity level of weights between [two subsequent] layers.”).  
Both the combination of Blundell and David and the disclosure of Kim are directed to training sparse neural networks. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the combination to achieve a predetermined rate of sparsity, as disclosed in Kim. Doing so is advantageous, because it “permits the systematic evaluation of associations between (1) the degrees of weight sparsity in each of the hidden layers during the training phase, and (2) the consequent classification accuracies during the test phase” (Kim, p. 7, Proposed scheme for sparsity control of DNN weights).
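For illustration only, iterating a sign-based decay step until either a predetermined rate of sparsity or a predetermined number of iterations is reached can be sketched as follows. This is not Kim's adaptive control of the L1-norm regularization parameter; it is a simplified stand-in. The names, the fixed step d, and the clipping of weights with magnitude at most d to exactly zero are assumptions made for this sketch.

```python
def sgn(x: float) -> int:
    """Sign function: returns -1, 0, or +1."""
    return (x > 0) - (x < 0)


def sparsity(weights):
    """Fraction of weights that are exactly zero."""
    return sum(1 for w in weights if w == 0.0) / len(weights)


def decay_until_sparse(weights, d, target, max_iters):
    """Repeat w' = w - sgn(w) * d until sparsity >= target or max_iters is reached.

    Assumption: a weight with |w| <= d is clipped to zero so that it cannot
    oscillate across zero. Returns the final weights and the iterations used.
    """
    w = list(weights)
    for it in range(max_iters):
        if sparsity(w) >= target:
            return w, it
        w = [0.0 if abs(x) <= d else x - sgn(x) * d for x in w]
    return w, max_iters


# Hypothetical example values, chosen to be exactly representable in binary.
final_weights, iterations = decay_until_sparse(
    [1.0, 0.25, -0.125, 0.5], d=0.125, target=0.5, max_iters=1000
)
print(final_weights, iterations)
```

The max_iters argument corresponds to a predetermined number of iterations, and the target argument corresponds to a predetermined rate of sparsity; whichever condition is met first ends the loop.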

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Li et al. (US 2019/0050734) is directed to a dynamic compression method of deep neural networks by a pruning operation for reaching a target compression ratio.
Srinivas et al. (“Training Sparse Neural Networks,” 21 November 2016, arXiv:1611.06694v1 [cs.CV], 7 pages) is directed to training sparse neural networks.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to CATHERINE F LEE whose telephone number is (571)270-7487. The examiner can normally be reached Monday through Friday, 10:00AM-6:00PM PST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda Huang, can be reached at (571)270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/C.F.L./Examiner, Art Unit 2124
/MIRANDA M HUANG/Supervisory Patent Examiner, Art Unit 2124