Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.



Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1, 2, 5, 6, 8, 9, 12, 13, 15, 16, 19, and 20 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Yan, U.S. PAP 2019/0392323 A1.

Regarding claim 1, Yan teaches a computer implemented method for domain specific model compression (systems, methods, and computer readable medium for accelerating the inference speed of a deep neural network DNN, see abstract), the method comprising:

applying, by the one or more computer processors, the weighting parameter to an output vector of the candidate operation (the compression method 200 generates an output of a compressed DNN with the same architecture but different parameter values, see par. [0032]);
performing, by the one or more computer processors, a regularization of the application of the weighting parameter to the output vector (The activation regularizer module 130 is configured to provide the definition and computation of a penalty value (and its derivative) that measures the level of activation of all neurons of the network, by minimizing which one can sparsify the number of active neurons in the network, see par. [0028]; the activation regularizer 130 imposed on the output of j-th layer, see par. [0033]); 
compressing, by the one or more computer processors, the neural network model according to a result of the regularization (the compression process 230 at step 320 evaluates gradient of the objective by backpropagation through the network model 110 and the data reader 140. The compression process 230 computes the derivatives of F(W) with respect to all the parameters W, and the parameter update at step 330 changes the values of W according to the obtained gradient and an update equation determined by the optimizer, see par. [0033]);
and providing, by the one or more computer processors, the neural network model after compression (At step 240, the compression method 200 generates an output of a compressed DNN with the same architecture but different parameter values, see par. [0032]).
Regarding claim 2, Yan teaches the computer implemented method according to claim 1, wherein the neural network model comprises a language processing model (FIG. 8 is a graphical diagram 530 demonstrating the present disclosure to the application of language Translation, where a Recurrent Neural Network (RNN, a special type of DNN) takes English word tokens as input and outputs Chinese characters, see par. [0046]).
Regarding claim 5, Yan teaches the computer implemented method according to claim 1, further comprising training, by the one or more computer processors, the neural network using labeled domain data (reader module 140 to provide input data from the input preprocessing pipeline 473 to the network model module 110 and correct labels to the loss function module 120, see par. [0035]).
Regarding claim 6, Yan teaches the computer implemented method according to claim 1, further comprising reducing, by the one or more computer processors, a neural network model attention head weighting value to zero (preserves only values of the top-r elements with largest values and suppresses the remaining to be zeros, see par. [0030]).
Regarding claim 8, Yan teaches a computer program product for domain specific model compression, the computer program product comprising one or more computer readable storage devices and stored program instructions on the one or more computer readable storage devices (systems, methods, and computer readable medium for accelerating the inference speed of a deep neural network DNN, see abstract), the stored program instructions comprising:
program instructions to provide a weighting parameter for a candidate operation of a neural network model (obtaining input data for the DNN from data sources, see par. [0028]; a training data set of correct input-output pairs for a DNN, measuring and improving the predictive 
program instructions to apply the weighting parameter to an output vector of the candidate operation (the compression method 200 generates an output of a compressed DNN with the same architecture but different parameter values, see par. [0032]);
program instructions to perform a regularization of the application of the weighting parameter to the output vector (The activation regularizer module 130 is configured to provide the definition and computation of a penalty value (and its derivative) that measures the level of activation of all neurons of the network, by minimizing which one can sparsify the number of active neurons in the network, see par. [0028]; the activation regularizer 130 imposed on the output of j-th layer, see par. [0033]); 
program instructions to compress the neural network model according to a result of the regularization (the compression process 230 at step 320 evaluates gradient of the objective by backpropagation through the network model 110 and the data reader 140. The compression process 230 computes the derivatives of F(W) with respect to all the parameters W, and the parameter update at step 330 changes the values of W according to the obtained gradient and an update equation determined by the optimizer, see par. [0033]);
and program instructions to provide the neural network model after compression (At step 240, the compression method 200 generates an output of a compressed DNN with the same architecture but different parameter values, see par. [0032]).  

Regarding claim 9, Yan teaches the computer program product according to claim 8, wherein the neural network model comprises a language processing model (FIG. 8 is a graphical diagram 530 demonstrating the present disclosure to the application of language Translation, where a Recurrent Neural Network (RNN, a special type of DNN) takes English word tokens as input and outputs Chinese characters, see par. [0046]).
Regarding claim 12, Yan teaches the computer program product according to claim 8, further comprising program instructions to train the neural network using labeled domain data (reader module 140 to provide input data from the input preprocessing pipeline 473 to the network model module 110 and correct labels to the loss function module 120, see par. [0035]).
Regarding claim 13, Yan teaches the computer program product according to claim 8, further comprising program instructions to reduce a neural network model attention head weighting value to zero (preserves only values of the top-r elements with largest values and suppresses the remaining to be zeros, see par. [0030]).
Regarding claim 15, Yan teaches a computer system for domain specific model compression (systems, methods, and computer readable medium for accelerating the inference speed of a deep neural network DNN, see abstract), the computer system comprising:
one or more computer processors (computer processor, see par. [0048]);
one or more computer readable storage devices (main memory, see par. [0048]); and stored program instructions on the one or more computer readable storage devices for execution by the one or more computer processors (program structures, see par. [0055]), the stored program instructions comprising:
program instructions to provide a weighting parameter for a candidate operation of a neural network model (obtaining input data for the DNN from data sources, see par. [0028]; a training data set of correct input-output pairs for a DNN, measuring and improving the predictive 
program instructions to apply the weighting parameter to an output vector of the candidate operation (the compression method 200 generates an output of a compressed DNN with the same architecture but different parameter values, see par. [0032]);
program instructions to perform a regularization of the application of the weighting parameter to the output vector (The activation regularizer module 130 is configured to provide the definition and computation of a penalty value (and its derivative) that measures the level of activation of all neurons of the network, by minimizing which one can sparsify the number of active neurons in the network, see par. [0028]; the activation regularizer 130 imposed on the output of j-th layer, see par. [0033]); 
program instructions to compress the neural network model according to a result of the regularization (the compression process 230 at step 320 evaluates gradient of the objective by backpropagation through the network model 110 and the data reader 140. The compression process 230 computes the derivatives of F(W) with respect to all the parameters W, and the parameter update at step 330 changes the values of W according to the obtained gradient and an update equation determined by the optimizer, see par. [0033]);
and program instructions to provide the neural network model after compression (At step 240, the compression method 200 generates an output of a compressed DNN with the same architecture but different parameter values, see par. [0032]).  
Regarding claim 16, Yan teaches the computer system according to claim 15, wherein the neural network model comprises a language processing model (FIG. 8 is a graphical diagram 530 demonstrating the present disclosure to the application of language Translation,
Regarding claim 19, Yan teaches the computer system according to claim 15, further comprising program instructions to train the neural network using labeled domain data (reader module 140 to provide input data from the input preprocessing pipeline 473 to the network model module 110 and correct labels to the loss function module 120, see par. [0035]).
Regarding claim 20, Yan teaches the computer system according to claim 15, further comprising program instructions to reduce a neural network model attention head weighting value to zero (preserves only values of the top-r elements with largest values and suppresses the remaining to be zeros, see par. [0030]).
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 3, 4, 7, 10, 11, 14, 17, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Yan, U.S. PAP 2019/0392323, in view of Zhang, U.S. PAP 2018/0165554 A1.

Regarding claim 3, Yan teaches the computer implemented method according to claim 1 but does not teach training, by the one or more computer processors, the neural network using unlabeled domain data.
In the same field of endeavor, Zhang teaches a method and system for modelling textual data that includes a linear classifier trained on labeled data, where the model also takes advantage of unlabeled data sets to improve performance (see par. [0048]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Yan with those of Zhang for the benefit of improving model performance (see Zhang, par. [0048]).
Regarding claim 4, Yan teaches the computer implemented method according to claim 1 but does not teach training, by the one or more computer processors, the neural network with two objectives, wherein one objective comprises a domain classification task.
In the same field of endeavor, Zhang teaches that a first classifier is trained on the labeled data 102, and a set of classifier weights is derived 103. The weights are then transferred for use by an autoencoder by defining a stochastic posterior probability distribution on the set of weights 104, with an approximated marginalized loss function 105. A second classifier is trained based on the representation of the autoencoder 106. In use, a system employing the autoencoder receives unlabeled data 107 and generates classifications of the received data 108.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Yan with those of Zhang for the benefit of improving model performance (see Zhang, par. [0048]).

Regarding claim 7, Yan teaches the computer implemented method according to claim 1 but does not teach: training, by the one or more computer processors, the neural network with two objectives, wherein one objective comprises a domain classification task; and training, by the one or more computer processors, the neural network using labeled data.
In the same field of endeavor, Zhang teaches that a first classifier is trained on the labeled data 102, and a set of classifier weights is derived 103. The weights are then transferred for use by an autoencoder by defining a stochastic posterior probability distribution on the set of weights 104, with an approximated marginalized loss function 105. A second classifier is trained based on the representation of the autoencoder 106. In use, a system employing the autoencoder receives unlabeled data 107 and generates classifications of the received data 108.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Yan with those of Zhang for the benefit of improving model performance (see Zhang, par. [0048]).


Regarding claim 10, Yan teaches the computer program product according to claim 8 but does not teach program instructions to train the neural network using unlabeled domain data.
In the same field of endeavor, Zhang teaches a method and system for modelling textual data that includes a linear classifier trained on labeled data, where the model also takes advantage of unlabeled data sets to improve performance (see par. [0048]).
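It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Yan with those of Zhang for the benefit of improving model performance (see Zhang, par. [0048]).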

Regarding claim 11, Yan teaches the computer program product according to claim 8 but does not teach program instructions to train the neural network with two objectives, wherein one objective comprises a domain classification task.
In the same field of endeavor, Zhang teaches that a first classifier is trained on the labeled data 102, and a set of classifier weights is derived 103. The weights are then transferred for use by an autoencoder by defining a stochastic posterior probability distribution on the set of weights 104, with an approximated marginalized loss function 105. A second classifier is trained based on the representation of the autoencoder 106. In use, a system employing the autoencoder receives unlabeled data 107 and generates classifications of the received data 108.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Yan with those of Zhang for the benefit of improving model performance (see Zhang, par. [0048]).

Regarding claim 14, Yan teaches the computer program product according to claim 8 but does not teach program instructions to: train the neural network with two objectives, wherein one objective comprises a domain classification task; and train the neural network using labeled data.
In the same field of endeavor, Zhang teaches that a first classifier is trained on the labeled data 102, and a set of classifier weights is derived 103. The weights are then transferred for use by an autoencoder by defining a stochastic posterior probability distribution on the set of weights 104, with an approximated marginalized loss function 105. A second classifier is trained based on the representation of the autoencoder 106. In use, a system employing the autoencoder receives unlabeled data 107 and generates classifications of the received data 108.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Yan with those of Zhang for the benefit of improving model performance (see Zhang, par. [0048]).


Regarding claim 17, Yan teaches the computer system according to claim 15 but does not teach program instructions to train the neural network using unlabeled domain data.
In the same field of endeavor, Zhang teaches a method and system for modelling textual data that includes a linear classifier trained on labeled data, where the model also takes advantage of unlabeled data sets to improve performance (see par. [0048]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Yan with those of Zhang for the benefit of improving model performance (see Zhang, par. [0048]).

Regarding claim 18, Yan teaches the computer system according to claim 15 but does not teach program instructions to train the neural network with two objectives, wherein one objective comprises a domain classification task.
In the same field of endeavor, Zhang teaches that a first classifier is trained on the labeled data 102, and a set of classifier weights is derived 103. The weights are then transferred for use by an autoencoder by defining a stochastic posterior probability distribution on the set of weights 104, with an approximated marginalized loss function 105. A second classifier is trained based on the representation of the autoencoder 106. In use, a system employing the autoencoder receives unlabeled data 107 and generates classifications of the received data 108.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Yan with those of Zhang for the benefit of improving model performance (see Zhang, par. [0048]).
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. The pertinent prior art is listed on the attached form PTO-892.
Chai ’461 teaches training a deep neural network (DNN) for reduced computational resource requirements, see abstract.
Son ’867 teaches a lightened neural network that applies regularization to its layers to compress the model, see par. [0062].
Zhu “improving DNN sparsity through Decorrelation Regularization” teaches reducing the complexity of neural networks using regularization.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Michael Ortiz-Sanchez, whose telephone number is (571)270-3711. The examiner can normally be reached Monday-Friday, 9AM-6PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is 
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta, can be reached at 571-272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/MICHAEL ORTIZ-SANCHEZ/Primary Examiner, Art Unit 2656