DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This non-final office action is responsive to the application filed on 01/08/2018.
Claims 1-20 are pending and have been examined.
Claims 1-20 are rejected.

Information Disclosure Statement
Applicant’s Information Disclosure Statements, filed 01/08/2018, have been received, entered into the record, and considered. See attached form PTO-1449.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1, 4, 7-11, 13, 16, and 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over Achterhold et al. (US 2020/0342315 A1), hereafter referred to as Achterhold, in view of Yu et al. (US 8,700,552 B2), hereafter referred to as Yu.

Regarding claim 1, Achterhold teaches a method comprising: 
providing a deep neural networks (DNN) model comprising a plurality of layers, each layer of the plurality of layers includes a plurality of nodes (Achterhold: paragraph [0033], Fig. 1, “Deep neural network 10 is made up of a plurality of layers 12, each of which includes a plurality of neurons 11”, neurons read nodes);
sampling a change of a weight for each of a plurality of weights based on a distribution function, each weight of the plurality of weights corresponds to each node of the plurality of nodes (Achterhold: paragraph [0041], “distribution functions 20 may be used for one selected subset each of the weights of deep neural network 10”. Paragraph [0050], “distribution function may be used, since methods such as, for example, a Monte Carlo sampling”. Paragraph [0053], “A change variable of the weights…the values of the weights are adapted as a function of adapted posterior distribution function 24”. Paragraph [0004], “The deep neural network includes a plurality of layers and connections including weights”, here, a Monte Carlo sampling and a change variable of the weights are representing as sampling a change of a weight, and layers include nodes which include weights);
updating the weight with the change of the weight multiplied by a sign of the weight (Achterhold: paragraph [0053], “A change variable of the weights…the values of the weights are adapted”. Paragraph [0034], “Each weight preferably has a value between including −1 and 1 and the output variable of the neuron is weighted by a multiplication by this weight”, here, the neuron is weighted by a multiplication is representing as updating the weight); and
training the DNN model by iterating the steps of sampling the change and updating the weight (Achterhold: paragraph [0021], “training of the deep neural network is repeated multiple times”. Paragraph [0007], “training the deep neural network in such a way that the deep neural network detects an object as a function of the training input variable of the deep neural network. During the training of the deep neural network, at least one value of one of the weights is adapted”, weight is adapted reads updating the weight),
Achterhold does not distinctly disclose wherein the plurality of weights has a high rate of sparsity after the training.
However, Yu teaches:
wherein the plurality of weights has a high rate of sparsity after the training (Yu: col 4, lines 58-64, “weight magnitudes are in top q are considered in further training…when the degree of sparseness is high (i.e., q is small)”. Col 4, lines 41-42, “where q is a threshold value for the maximal number of nonzero weights allowed”).
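It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine the method for optimizing deep neural networks of Achterhold with the high rate of sparsity of weights of Yu so that continued training of the deep neural network tends to converge much faster than the original training (Yu: col 4, lines 64-65).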
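
For illustration only, the top-q selection described in the cited passage of Yu (only the q largest-magnitude weights remain nonzero) might be sketched in Python as follows. The helper names `keep_top_q` and `sparsity_rate` and the flat-list representation of the weights are assumptions made for this sketch; they are not taken from Yu, Achterhold, or the application.

```python
def keep_top_q(weights, q):
    """Zero every weight except the q largest in magnitude (hypothetical helper)."""
    if q <= 0:
        return [0.0] * len(weights)
    # Indices sorted by descending magnitude; the first q survive.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    keep = set(order[:q])
    return [w if i in keep else 0.0 for i, w in enumerate(weights)]


def sparsity_rate(weights):
    """Fraction of weights that are exactly zero."""
    return sum(1 for w in weights if w == 0.0) / len(weights)


# A small q relative to the number of weights gives a high degree of sparseness.
pruned = keep_top_q([0.9, -0.05, 0.3, -0.7, 0.01], q=2)
```

In this sketch a smaller q yields a higher sparsity rate, which matches the relationship stated in the quoted passage ("when the degree of sparseness is high (i.e., q is small)").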

Regarding claim 4, Achterhold in view of Yu teaches the method of claim 1 as cited above and Achterhold further teaches:
wherein the distribution function is an exponential decay function (Achterhold: paragraph [0050], “deviation of prior distribution function 20 relative to posterior distribution function 24. Each shifted deviation is subsequently weighted with the aid of a weighting function, in particular, of a Gaussian function”; here, the Gaussian function is interpreted as an exponential decay function).
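
For illustration only, one possible reading of the claim language quoted above (a change sampled from an exponential decay distribution, then applied to the weight multiplied by the sign of the weight) might be sketched in Python as follows. This is not the applicant's implementation. The direction of the update (subtracting the signed change so that weights shrink toward zero), the clamp at zero, the `rate` parameter, and all function names are assumptions of this sketch, and the loss-driven training step is omitted.

```python
import random


def sample_change(rate, rng):
    """Draw a non-negative change from an exponential distribution (assumed form)."""
    return rng.expovariate(rate)


def sign(w):
    """Return -1, 0, or 1 according to the sign of w."""
    return (w > 0) - (w < 0)


def shrink_step(weights, rate, rng):
    """One update per weight: w <- w - delta * sign(w), clamped at zero.

    The subtraction and the clamp are assumptions; the claim language only
    says the weight is updated with the change multiplied by its sign.
    """
    updated = []
    for w in weights:
        delta = sample_change(rate, rng)
        new_w = w - delta * sign(w)
        # If the step reaches or overshoots zero, pin the weight at zero.
        if new_w * w <= 0:
            new_w = 0.0
        updated.append(new_w)
    return updated


rng = random.Random(0)
weights = [0.8, -0.5, 0.02, -0.01]
for _ in range(200):
    weights = shrink_step(weights, rate=50.0, rng=rng)
```

Under these assumptions, repeated iterations drive small weights to exactly zero, which is one way a high rate of sparsity could arise after training.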

Regarding claim 7, Achterhold in view of Yu teaches the method of claim 1 as cited above and Achterhold further teaches:
determining whether pre-trained weights are available, and using the pre-trained weights (Achterhold: paragraph [0011], “the prior distribution function may be an assumed distribution function of the predefinable discrete values of one weight or of all weights before the training”. Paragraph [0012], “already known created deep neural functions, it is possible to reuse a piece of information about the distribution of the weight values”; here, the predefinable discrete values of the weights are interpreted as pre-trained weights).

Regarding claim 8, Achterhold in view of Yu teaches the method of claim 1 as cited above and Achterhold further teaches:
combining a mixture of multiple distributions of weights for weight clustering (Achterhold: paragraph [0012], “multiple weights are able to be combined to form a filter, these weights may be assigned the same prior distribution function”. Paragraph [0041], “prior distribution function 20 may, for example, be carried out with the aid of a cluster analysis of the weight values”).

Regarding claim 9, Achterhold in view of Yu teaches the method of claim 1 as cited above and Achterhold further teaches:
wherein a predetermined number of iterations is performed (Achterhold: paragraph [0021], “the sequence of the steps of ascertaining the variable characterizing the cost function and of the training of the deep neural network is repeated multiple times until an abort criterion is met”).

Regarding claim 10, Achterhold in view of Yu teaches the method of claim 1 as cited above and Achterhold further teaches:
wherein the iteration continues until a predetermined rate of sparsity is achieved (Achterhold: paragraph [0021], “the training of the deep neural network is repeated multiple times until an abort criterion is met. The abort criterion may, for example, be a predefinable number of repetitions of the sequence of the steps”, a predefinable number of repetitions reads a predetermined rate of sparsity).
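
For illustration only, the two stopping conditions recited in claims 9 and 10 (a predetermined number of iterations, and a predetermined rate of sparsity) might be sketched in Python as follows. The function `train_until`, its parameters, and the stand-in `toy_step` are hypothetical and are not taken from Achterhold, Yu, or the application.

```python
def train_until(weights, step, max_iters=None, target_sparsity=None):
    """Repeat `step` until an iteration cap or a target sparsity rate is reached."""
    if max_iters is None and target_sparsity is None:
        raise ValueError("at least one stopping condition is required")
    iters = 0
    while True:
        if max_iters is not None and iters >= max_iters:
            break
        zero_fraction = sum(1 for w in weights if w == 0.0) / len(weights)
        if target_sparsity is not None and zero_fraction >= target_sparsity:
            break
        weights = step(weights)
        iters += 1
    return weights, iters


def toy_step(ws):
    # Hypothetical stand-in for one training iteration:
    # move each weight 0.1 toward zero, pinning small weights at zero.
    return [0.0 if abs(w) <= 0.1 else w - 0.1 * (1 if w > 0 else -1) for w in ws]


start = [0.25, -0.15, 0.05, 0.95]
_, fixed_count_iters = train_until(start, toy_step, max_iters=3)
sparse_weights, sparsity_iters = train_until(start, toy_step, target_sparsity=0.5)
```

In this sketch the iteration-count condition and the sparsity-rate condition are checked independently, so either may end the loop first.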

Regarding claim 11, Achterhold in view of Yu teaches the method of claim 1 as cited above and Achterhold further teaches:
wherein the DNN model is implemented in a software framework of a computer system (Achterhold: paragraph [0001], “a method for creating a deep neural network, to a computer program and to a device”. Paragraph [0024], “computer program including instructions which, when executed on a computer…and a machine-readable memory element, on which the computer program is stored”; here, the computer program including instructions is interpreted as a software framework of a computer system).

Regarding claim 13, Achterhold in view of Yu teaches the method of claim 11 as cited above and Achterhold further teaches:
wherein the computer system includes an image sensor or a camera for receiving an image input (Achterhold: paragraph [0023], “the method may be used in order to create deep neural networks, which may be operated on a mobile processing unit. A mobile processing unit, in particular, mobile telephones or cameras”, cameras are used in order to receive an input image).

Regarding claim 16, Achterhold teaches a computer system comprising: 
an image depository including a plurality of images (Achterhold: paragraph [0032], Fig. 1, “system 01 includes a processing unit 16, which includes a memory element 17”. Paragraph [0023], “A mobile processing unit, in particular, mobile telephones or cameras are characterized by limited memory space”, a processing unit or mobile processing unit, in particular, mobile telephones or cameras is representing as an image depository and memory element holds the images); and
a processor configured to run a deep neural networks (DNN) model comprising a plurality of layers, each layer of the plurality of layers includes a plurality of nodes (Achterhold: paragraph [0032], Fig. 1, “Processing unit 16 may be connected to deep neural network 10”. Paragraph [0033], Fig. 1, “Deep neural network 10 is made up of a plurality of layers 12, each of which includes a plurality of neurons 11”, neurons read nodes),
wherein the processor is further configured to: 
sample a change of a weight for each of a plurality of weights based on a distribution function, each weight of the plurality of weights corresponds to each node of the plurality of nodes (Achterhold: paragraph [0041], “distribution functions 20 may be used for one selected subset each of the weights of deep neural network 10”. Paragraph [0050], “distribution function may be used, since methods such as, for example, a Monte Carlo sampling”. Paragraph [0053], “A change variable of the weights…the values of the weights are adapted as a function of adapted posterior distribution function 24”. Paragraph [0004], “The deep neural network includes a plurality of layers and connections including weights”; here, the Monte Carlo sampling and the change variable of the weights are interpreted as sampling a change of a weight, and the layers include nodes, which include weights);
update the weight with the change of the weight multiplied by a sign of the weight (Achterhold: paragraph [0053], “A change variable of the weights…the values of the weights are adapted”. Paragraph [0034], “Each weight preferably has a value between including −1 and 1 and the output variable of the neuron is weighted by a multiplication by this weight”, here, the neuron is weighted by a multiplication is representing as update the weight); and
train the DNN model by iterating the steps of sampling the change and updating the weight (Achterhold: paragraph [0021], “training of the deep neural network is repeated multiple times”. Paragraph [0007], “training the deep neural network in such a way that the deep neural network detects an object as a function of the training input variable of the deep neural network. During the training of the deep neural network, at least one value of one of the weights is adapted”, weight is adapted reads updating the weight),
Achterhold does not distinctly disclose wherein the plurality of weights has a high rate of sparsity after the training,
However, Yu teaches:
wherein the plurality of weights has a high rate of sparsity after the training (Yu: col 4, lines 58-64, “weight magnitudes are in top q are considered in further training…when the degree of sparseness is high (i.e., q is small)”. Col 4, lines 41-42, “where q is a threshold value for the maximal number of nonzero weights allowed”),
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine the method for optimizing deep neural networks of Achterhold with the high rate of sparsity of weights of Yu so that continued training of the deep neural network tends to converge much faster than the original training (Yu: col 4, lines 64-65),
and wherein the processor is further configured to receive an image, classify or segment the image, or detect an object within an image and update the image repository (Achterhold: [0023], “deep neural networks, which may be operated on a mobile processing unit. A mobile processing unit, in particular, mobile telephones or cameras are characterized by limited memory space … In addition to object detection, the deep neural network may alternatively be trained and/or used for classification, semantic segmentation or regression”, a mobile processing unit, in particular, cameras with memory space is representing as an image repository, and when object is detected and received by the processor then the repository is updated with the image).

Regarding claim 18, Achterhold in view of Yu teaches the computer system of claim 16 as cited above and Achterhold further teaches:
comprising an image sensor or a camera for receiving an image input (Achterhold: paragraph [0023], “the method may be used in order to create deep neural networks, which may be operated on a mobile processing unit. A mobile processing unit, in particular, mobile telephones or cameras”, cameras are used in order to receive an input image).

Regarding claim 19, Achterhold in view of Yu teaches the computer system of claim 18 as cited above and Achterhold further teaches:
wherein the processor trains the DNN model using the input image and updates the images stored in the image repository (Achterhold: paragraph [0023], “an input variable of the deep neural network is ascertained after the training of the deep neural network. An object is then detected with the aid of the trained deep neural network… the method may be used in order to create deep neural networks, which may be operated on a mobile processing unit. A mobile processing unit, in particular, mobile telephones or cameras”, a mobile processing unit, in particular, a camera is representing as a processor with image repository which store, process, manipulate or update images, and an input variable reads an image input).

Regarding claim 20, Achterhold in view of Yu teaches the computer system of claim 16 as cited above and Achterhold further teaches:
wherein the DNN model is applied to autonomous driving, augmented reality (AR), or virtual reality (VR) (Achterhold: paragraph [0023], “the trained deep neural network as a function of the ascertained input variable and subsequently an at least semiautonomous machine is advantageously activated as a function of the detected object. An at least semiautonomous machine may, for example, be a robot, in particular, a vehicle”, semiautonomous machine, for example, a vehicle is representing as autonomous driving, and augmented reality (AR) or virtual reality (VR) is a function to detect object).

Claims 2-3 and 5-6 are rejected under 35 U.S.C. 103 as being unpatentable over Achterhold in view of Yu as applied to claim 1 above, and further in view of Jaganathan et al. (US 2019/0197401 A1), hereafter referred to as Jaganathan. 

Regarding claim 2, Achterhold in view of Yu teaches the method of claim 1 as cited above. Although Achterhold describes convolutional layers in paragraph [0031] and Fig. 4, Achterhold in view of Yu does not distinctly disclose:
wherein the deep neural networks are convolutional neural networks (CNN).
However, Jaganathan teaches:
wherein the deep neural networks are convolutional neural networks (CNN) (Jaganathan: paragraph [0164], “Convolutional neural networks (CNNs) and recurrent neural networks (RNNs) are components of deep neural networks”).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine the method for optimizing deep neural networks (DNN) of Achterhold in view of Yu with the convolutional neural networks of Jaganathan to train deep convolutional neural networks (CNNs) to improve the accuracy for image classification and object detection (Jaganathan: paragraph [0141]).

Regarding claim 3, Achterhold in view of Yu teaches the method of claim 1 as cited above, but does not distinctly disclose:
wherein the deep neural networks are recurrent neural networks (RNN).
However, Jaganathan teaches:
wherein the deep neural networks are recurrent neural networks (RNN) (Jaganathan: paragraph [0164], “Convolutional neural networks (CNNs) and recurrent neural networks (RNNs) are components of deep neural networks”). 
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine the method for optimizing deep neural networks (DNN) of Achterhold in view of Yu with the recurrent neural networks of Jaganathan in order to utilize sequential information of input data with cyclic connections (Jaganathan: paragraph [0164]) and to capture long-range dependencies in sequential data of varying lengths (Jaganathan: paragraph [0167]).

Regarding claim 5, Achterhold in view of Yu teaches the method of claim 1 as cited above, although, in paragraph [0053], Achterhold describes a gradient descent method (Achterhold: paragraph [0053], “During the training of deep neural network 10…the weights is preferably ascertained using an optimization method, in particular, a gradient descent method”), but does not distinctly disclose:
wherein the weights are updated during training using a stochastic gradient descent (SGD) algorithm.
However, Jaganathan teaches:
wherein the weights are updated during training using a stochastic gradient descent (SGD) algorithm (Jaganathan: paragraph [0165], “the weight parameters are updated using optimization algorithms based on stochastic gradient descent”). 
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to incorporate the stochastic gradient descent (SGD) algorithm of Jaganathan into the method for optimizing deep neural networks (DNN) of Achterhold in view of Yu in order to minimize the training error, where the backward pass uses the chain rule to backpropagate error signals and compute gradients with respect to all weights (Jaganathan: paragraph [0165]).

Regarding claim 6, Achterhold in view of Yu teaches the method of claim 5 as cited above. In paragraphs [0003] and [0053], Achterhold describes a smaller amount of change of weight and a gradient descent method (Achterhold: paragraph [0003], “the deep neural network is characterized by a smaller number of different weights”. Paragraph [0053], “A change variable of the weights is preferably ascertained using an optimization method, in particular, a gradient descent method”), but does not distinctly disclose:
wherein an amount of change in each step of updating the weight is small enough not to invalidate the convergence of the SGD algorithm.
However, Jaganathan teaches:
wherein an amount of change in each step of updating the weight is small enough not to invalidate the convergence of the SGD algorithm (Jaganathan: paragraph [0103], “The gradient descent optimization is performed by updating the weights according to:
$$v_{t+1} = \mu v_t - \alpha \frac{1}{n} \sum_{i=1}^{N} \nabla_{w_t} Q(z_t, w_t)$$
$$w_{t+1} = w_t + v_{t+1}$$
”.
Paragraph [0104], “In the equations above, α is the learning rate. Also, the loss is computed as the average over a set of n data pairs. The computation is terminated when the learning rate α is small enough upon linear convergence”). 
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to incorporate the method for optimizing deep neural networks (DNN) of Achterhold in view of Yu with updating the weights using the stochastic gradient descent (SGD) algorithm of Jaganathan to calculate the cost function in order to avoid costly calculations of covariance matrices to decorrelate and whiten the data at every layer and step, and to normalize the distribution of each input feature in each layer (Jaganathan: paragraph [0150]), as well as to get a large ‘visual’ field of one output node at a relatively low computational cost (Jaganathan: paragraph [0143]).
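
For illustration only, the momentum update quoted from Jaganathan paragraph [0103] (a velocity term scaled by μ, a learning rate α applied to the averaged gradient, and the weight advanced by the new velocity) might be sketched in Python as follows. The function names, the toy quadratic loss, and the choice to average over the number of gradients supplied are assumptions of this sketch and are not taken from Jaganathan.

```python
def momentum_step(w, v, grads, mu, alpha):
    """v_next = mu * v - alpha * mean(grads); w_next = w + v_next."""
    n = len(grads)          # assumed: n is the number of per-sample gradients
    dim = len(w)
    mean_grad = [sum(g[k] for g in grads) / n for k in range(dim)]
    v_next = [mu * v[k] - alpha * mean_grad[k] for k in range(dim)]
    w_next = [w[k] + v_next[k] for k in range(dim)]
    return w_next, v_next


def toy_grads(w, samples):
    # Assumed toy loss Q(z, w) = 0.5 * (w - z)^2 per sample; its gradient is w - z.
    return [[w[0] - z] for z in samples]


w, v = [0.0], [0.0]
samples = [1.0, 2.0, 3.0]
for _ in range(300):
    w, v = momentum_step(w, v, toy_grads(w, samples), mu=0.9, alpha=0.1)
```

With this toy loss the weight settles near the sample mean; the size of each step is governed by α and μ, which is the quantity the quoted passage ties to convergence.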

Claims 12, 14-15, and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Achterhold in view of Yu as applied to claims 1 and 16 above, and further in view of Jabri et al. (US 5,640,494 A), hereafter referred to as Jabri.

Regarding claim 12, Achterhold in view of Yu teaches the method of claim 1 as cited above and Achterhold further teaches:
wherein the computer system includes a random number generating hardware configured to (Achterhold: paragraph [0001], “a deep neural network, to a computer program and to a device, each of which is configured to”. Paragraph [0059], “the deep neural network has been randomly initialized, the value of the weights are randomly distributed”, the value of the weights of deep neural network is randomly initialized or randomly distributed is representing as the computer system includes a random number generator), but does not distinctly disclose:
generate a weight perturbation and apply the weight perturbation to the weight.
However, Jabri teaches:
generate a weight perturbation and apply the weight perturbation to the weight (Jabri: col 2, line 49, “applies perturbations to the strength factors of weights”. Col 3, line 19, “This technique, called ‘weight perturbation’”. Col 6, lines 48-49, “applies a perturbation to the weight currently being considered for modification and repropagates”. Col 6, lines 55-56, “the strength of the perturbation that has been applied to the weight”, the technique, called ‘weight perturbation’ reads generate a weight perturbation). 
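It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine the method for optimizing deep neural networks of Achterhold in view of Yu with the weight perturbation technique of Jabri (Jabri: col 7, lines 45-48).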
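
For illustration only, the general weight perturbation technique referred to in the cited passages of Jabri (perturb one weight, repropagate, and use the resulting change in error to adjust that weight) might be sketched in Python as follows. This is a generic finite-difference form, not Jabri's circuit. The function names, the fixed perturbation strength `pert`, the learning rate `lr`, and the toy error surface are assumptions of this sketch; a hardware random number generator supplying the perturbation is not modeled.

```python
def weight_perturbation_step(weights, error_fn, pert=1e-3, lr=0.1):
    """Update every weight from the error change caused by perturbing it alone."""
    base_error = error_fn(weights)
    estimates = []
    for i in range(len(weights)):
        perturbed = list(weights)
        perturbed[i] += pert                          # apply the perturbation to one weight
        delta_e = error_fn(perturbed) - base_error    # re-evaluate and compare errors
        estimates.append(delta_e / pert)              # finite-difference gradient estimate
    return [w - lr * g for w, g in zip(weights, estimates)]


def toy_error(ws):
    # Hypothetical error surface with its minimum at (1.0, -2.0).
    return (ws[0] - 1.0) ** 2 + (ws[1] + 2.0) ** 2


ws = [0.0, 0.0]
for _ in range(100):
    ws = weight_perturbation_step(ws, toy_error)
```

The sketch needs only forward evaluations of the error, which is the property that distinguishes weight perturbation from backpropagation-based updates.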

Regarding claim 14, Achterhold in view of Yu in view of Jabri teaches the method of claim 12 as cited above and Achterhold further teaches:
wherein the DNN model is applied to a computer-vision application including image classification, image segmentation, and object detection (Achterhold: paragraph [0023], “deep neural networks, which may be operated on a mobile processing unit. A mobile processing unit, in particular, mobile telephones or cameras… In addition to object detection, the deep neural network may alternatively be trained and/or used for classification, semantic segmentation or regression”, a mobile processing unit with cameras is representing as a computer-vision).

Regarding claim 15, Achterhold in view of Yu in view of Jabri teaches the method of claim 12 as cited above and Achterhold further teaches:
wherein the DNN model is applied to autonomous driving, augmented reality (AR), or virtual reality (VR) (Achterhold: paragraph [0023], “the trained deep neural network as a function of the ascertained input variable and subsequently an at least semiautonomous machine is advantageously activated as a function of the detected object. An at least semiautonomous machine may, for example, be a robot, in particular, a vehicle”, semiautonomous machine, for example, a vehicle is representing as autonomous driving, and augmented reality (AR) or virtual reality (VR) is a function to detect object).

Regarding claim 17, Achterhold in view of Yu teaches the computer system of claim 16 as cited above and Achterhold further teaches:
comprising a random number generating hardware configured to (Achterhold: paragraph [0001], “a deep neural network, to a computer program and to a device, each of which is configured to”. Paragraph [0059], “the deep neural network has been randomly initialized, the value of the weights are randomly distributed”, the value of the weights of deep neural network is randomly initialized or randomly distributed is representing as a random number generating hardware), but does not distinctly disclose:
generate a weight perturbation and apply the weight perturbation to the weight.
However, Jabri teaches:
generate a weight perturbation and apply the weight perturbation to the weight (Jabri: col 2, line 49, “applies perturbations to the strength factors of weights”. Col 3, line 19, “This technique, called ‘weight perturbation’”. Col 6, lines 48-49, “applies a perturbation to the weight currently being considered for modification and repropagates”. Col 6, lines 55-56, “the strength of the perturbation that has been applied to the weight”, the technique, called ‘weight perturbation’ reads generate a weight perturbation). 
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine the computer system with image depository of Achterhold in view of Yu with the random number generating hardware of Jabri (Jabri: col 7, lines 45-48).

Prior Art
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. This includes:
US 10,891,538 B2 which describes sparse convolutional neural network accelerator.
US 10,223,610 B1 which describes system and method for detection and classification of findings in images.
US 2020/0234137 A1 which describes efficient neural networks with elaborate matrix structures in machine learning environments.
US 2020/0234130 A1 which describes slimming of neural networks in machine learning environments.
US 2016/0006543 A1 which describes low complexity error correction.
US 2020/0349946 A1 which describes utterance classifier.
WO 2017/207138 A1 which describes method of training a deep neural network.
EP 3 091 486 A2 which describes method and system for approximating deep neural networks for anatomical object detection.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MD S BARKAT whose telephone number is 303-297-4302.  The examiner can normally be reached on Monday-Friday 8:30 AM - 5:00 PM CT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Alexey Shmatov can be reached on 571-270-3428.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/M.S.B./Examiner, Art Unit 2123
/ALEXEY SHMATOV/Supervisory Patent Examiner, Art Unit 2123