DETAILED ACTION
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-2, 5, 8-9, 12, 15-16 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Hashimoto et al. (US 20180121799) in view of Lin et al. (US 20190065486).
Regarding claim 1, Hashimoto discloses a method comprising: 
obtaining, using at least one processor of an electronic device, a base natural language understanding (NLU) model comprising a word embedding layer, the word embedding layer associated with at least one training utterance (¶32 NLP; ¶136-137 model 100 has five sub-epochs corresponding to five NLP tasks. In each sub-epoch, a batch of training examples TE_1 . . . TE_n corresponding to the current layer's training data is processed; current state of the embedding parameters of the current layer and of the current state of the embedding parameters of all the underlying layers);
calculating, using the at least one processor, a regularization loss value for use in a determination of an intent detection loss, wherein the regularization loss value reveals an effect of word embeddings on intent determination of the training utterance (¶136-140 The two regularization terms regularize by penalizing growth in a magnitude of weights in coefficient matrices applied to the underlying layers and that successively regularize all changes in the weights in the coefficient matrices applied to the underlying layers); and 
retraining, using the at least one processor, the word embedding layer of the base NLU model using the intent detection loss to obtain a retrained NLU model (¶118-119  the disclosed successive regularization allows model 100 to continuously train the lower-level tasks without catastrophic forgetting; the optimization iterates over each full training dataset).
While Hashimoto discloses natural language processing, Hashimoto fails to specifically teach a base natural language understanding (NLU) model.
	Lin teaches a base natural language understanding (NLU) model (¶23 One or more parameters of the trained decoder circuitry are transferred to the NLPS, where the NLPS is then trained using the compressed word embeddings to improve the correctness of the responses or actions determined by the NLPS; ¶34 NLPS 300 includes natural language understanding (NLU) circuitry 305 operably connected to a storage device 310 that stores compressed word embeddings). Lin also teaches the word embedding layer associated with at least one training utterance (¶50 training process includes the decoder circuitry 600 receiving one or more compressed word embeddings; one or more decompressed word embeddings 610 may not equal or match a corresponding original uncompressed word embedding).
Therefore, it would have been obvious to one with ordinary skill in the art before the effective filing date of the invention to have implemented the teaching of a base natural language understanding (NLU) model, and the word embedding layer associated with at least one training utterance, from Lin into the method as disclosed by Hashimoto. The motivation for doing this is to improve the correctness of the responses or actions by the models.

Regarding claim 2, the combination of Hashimoto and Lin discloses the method of claim 1, further comprising: determining, using the at least one processor, a regularization coefficient for use in the determination of the intent detection loss (Hashimoto ¶136-140 The two regularization terms regularize by penalizing growth in a magnitude of weights in coefficient matrices applied to the underlying layers and that successively regularize all changes in the weights in the coefficient matrices applied to the underlying layers; ¶199 the regularization term that penalizes growth in a magnitude of weights in coefficient matrices applied to the underlying layers is λ‖W_(m)‖^2); and determining, using the at least one processor, the intent detection loss based on a cross entropy loss, the regularization coefficient, and the regularization loss value (Hashimoto ¶136-140 The regularized pass down training is regularized by constraining a training objective function, having a fitness function with at least two regularization terms. The two regularization terms regularize by penalizing growth in a magnitude of weights in coefficient matrices applied to the underlying layers and that successively regularize all changes in the weights in the coefficient matrices applied to the underlying layers. In one implementation, the fitness function is cross-entropy loss.).
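For illustration only, the loss determination mapped above can be sketched as a cross-entropy term plus a regularization coefficient multiplied by a squared weight norm, in the form λ‖W‖^2 cited from Hashimoto ¶199. The additive combination, the function names, and the numeric values below are assumptions made for this sketch and are not taken from either reference or from the claims.

```python
import math


def cross_entropy(probs, target_index):
    """Negative log-likelihood of the correct class."""
    return -math.log(probs[target_index])


def l2_penalty(weights):
    """Sum of squared weights, i.e. the squared norm ||W||^2."""
    return sum(w * w for row in weights for w in row)


def regularized_loss(probs, target_index, weights, lam):
    """Cross-entropy loss plus lam times the regularization loss value.

    Assumption: the terms are combined additively.
    """
    return cross_entropy(probs, target_index) + lam * l2_penalty(weights)


# Hypothetical values, chosen only to exercise the functions.
probs = [0.7, 0.2, 0.1]              # predicted class probabilities
weights = [[0.5, -0.5], [1.0, 0.0]]  # small stand-in weight matrix
loss = regularized_loss(probs, 0, weights, lam=0.01)
```

Here `lam` plays the role of the regularization coefficient and `l2_penalty(weights)` plays the role of the regularization loss value.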

Regarding claim 5, the combination of Hashimoto and Lin discloses the method of claim 1, wherein the regularization loss value is calculated based on a plurality of embedding weights of a word embedding matrix associated with the at least one training utterance (Hashimoto ¶136 constraining a training objective function, having a fitness function with at least two regularization terms. The two regularization terms regularize by penalizing growth in a magnitude of weights in coefficient matrices applied to the underlying layers and that successively regularize all changes in the weights in the coefficient matrices applied to the underlying layers).

Regarding claims 8-9 and 12 (drawn to a device):
The proposed combination of Hashimoto and Lin, explained in the rejection of method claims 1-2 and 5, renders obvious the steps of the device of claims 8-9 and 12 because these steps occur in the operation of the proposed combination as discussed above. Thus, arguments similar to those presented above for claims 1-2 and 5 are equally applicable to claims 8-9 and 12. See further Hashimoto ¶242.

Regarding claims 15-16 and 19 (drawn to a CRM):
The proposed combination of Hashimoto and Lin, explained in the rejection of method claims 1-2 and 5, renders obvious the steps of the computer readable medium of claims 15-16 and 19 because these steps occur in the operation of the proposed combination as discussed above. Thus, arguments similar to those presented above for claims 1-2 and 5 are equally applicable to claims 15-16 and 19. See further Hashimoto ¶242.

Claims 6, 13 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Hashimoto and Lin as applied to claims 5, 12 and 19 above, and further in view of Dally et al. (US 20180046900).
Regarding claim 6, the combination of Hashimoto and Lin discloses the method of claim 5, but fails to teach setting, using the at least one processor, at least one of the embedding weights to zero when the at least one embedding weight has an absolute value smaller than a predetermined threshold.
Dally teaches setting, using the at least one processor, at least one of the embedding weights to zero when the at least one embedding weight has an absolute value smaller than a predetermined threshold (¶52 any weight with an absolute value that is close to zero (e.g. below a defined threshold) is set to zero).
Therefore, it would have been obvious to one with ordinary skill in the art before the effective filing date of the invention to have implemented the teaching of setting, using the at least one processor, at least one of the embedding weights to zero when the at least one embedding weight has an absolute value smaller than a predetermined threshold from Dally into the method as disclosed by the combination of Hashimoto and Lin. The motivation for doing this is to improve accuracy of the neural network.
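For illustration only, the thresholding step cited from Dally ¶52 (a weight whose absolute value falls below a defined threshold is set to zero) can be sketched as below. The function name, the threshold of 0.05, and the sample matrix are hypothetical and are not drawn from Dally or from the application.

```python
def prune_small_weights(weights, threshold):
    """Return a copy of the matrix in which every weight whose absolute
    value is smaller than the threshold is replaced by zero."""
    return [[0.0 if abs(w) < threshold else w for w in row] for row in weights]


# Hypothetical 2x2 embedding matrix and threshold.
embedding = [[0.02, -0.8], [-0.03, 0.4]]
pruned = prune_small_weights(embedding, threshold=0.05)
```

The comparison is strict, so a weight whose absolute value equals the threshold is left unchanged in this sketch.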

Regarding claim 13 (drawn to a device):
The proposed combination of Hashimoto, Lin and Dally, explained in the rejection of method claim 6, renders obvious the steps of the device of claim 13 because these steps occur in the operation of the proposed combination as discussed above. Thus, arguments similar to those presented above for claim 6 are equally applicable to claim 13. See further Hashimoto ¶242.

Regarding claim 20 (drawn to a CRM):
The proposed combination of Hashimoto, Lin and Dally, explained in the rejection of method claim 6, renders obvious the steps of the computer readable medium of claim 20 because these steps occur in the operation of the proposed combination as discussed above. Thus, arguments similar to those presented above for claim 6 are equally applicable to claim 20. See further Hashimoto ¶242.
	

Claims 7 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Hashimoto and Lin as applied to claims 1 and 8 above, and further in view of Miura et al. (US 20180268823).
Regarding claim 7, the combination of Hashimoto and Lin discloses the method of claim 1, further comprising: allowing identification of one or more words of a natural language input based on an importance of the one or more words in relation to determining an intent of the natural language input using the retrained NLU model (Lin ¶35-36 The NLU circuitry 305 may analyze the semantic features of the text input 330 and access the storage device 310 to obtain the compressed word embeddings 315 for some or all of the words in the text input 330), but fails to teach wherein the one or more words are configured to be presented on a display differently from a remainder of the natural language input.

Miura teaches wherein the one or more words are configured to be presented on a display differently from a remainder of the natural language input (¶200 The important words 4231 are displayed in a mode different from one for other words. For example, the important words 4231 may be displayed in letters thicker than other letters or in a color different from one for other letters. In addition, the important words 4231 may be underlined.).
Therefore, it would have been obvious to one with ordinary skill in the art before the effective filing date of the invention to have implemented the teaching of wherein the one or more words are configured to be presented on a display differently from a remainder of the natural language input from Miura into the method as disclosed by the combination of Hashimoto and Lin. The motivation for doing this is to improve user experience by emphasizing important words.
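For illustration only, presenting important words differently from the remainder of an input can be sketched as below. Miura ¶200 describes thicker letters, a different color, or underlining; the asterisk markers here are a plain-text stand-in. The per-word importance scores, the threshold, and the function name are hypothetical and are not taken from any cited reference.

```python
def emphasize(words, scores, threshold):
    """Wrap each word whose importance score meets the threshold in
    asterisks; leave the remaining words as they are."""
    marked = [
        f"*{word}*" if score >= threshold else word
        for word, score in zip(words, scores)
    ]
    return " ".join(marked)


# Hypothetical utterance with made-up importance scores.
rendered = emphasize(["book", "a", "flight"], [0.9, 0.1, 0.8], threshold=0.5)
```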

Regarding claim 14 (drawn to a device):
The proposed combination of Hashimoto, Lin and Miura, explained in the rejection of method claim 7, renders obvious the steps of the device of claim 14 because these steps occur in the operation of the proposed combination as discussed above. Thus, arguments similar to those presented above for claim 7 are equally applicable to claim 14. See further Hashimoto ¶242.

Allowable Subject Matter
Claims 3-4, 10-11 and 17-18 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
 wherein the regularization coefficient is determined by: adding a plurality of pseudo words to the at least one training utterance, wherein the pseudo words are randomly generated and a quantity of the pseudo words is equal to a quantity of words in the at least one training utterance; and performing an iterative process in which the base NLU model is trained in multiple iterations, each iteration comprising an updated value for the regularization coefficient.
Claim 4 is dependent on claim 3, claim 11 is dependent on claim 10, and claim 18 is dependent on claim 17 and would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
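For illustration only, the pseudo-word limitation quoted above (randomly generated pseudo words equal in quantity to the words of the training utterance) can be sketched as below. Appending the pseudo words at the end, building them from random lowercase letters, and the fixed length of six characters are assumptions of this sketch; the iterative update of the regularization coefficient is not shown.

```python
import random
import string


def add_pseudo_words(utterance, rng, length=6):
    """Return the utterance's words followed by an equal number of
    randomly generated pseudo words."""
    words = utterance.split()
    pseudo = [
        "".join(rng.choice(string.ascii_lowercase) for _ in range(length))
        for _ in words
    ]
    return words + pseudo


# Hypothetical training utterance; the seed only makes the run repeatable.
rng = random.Random(0)
augmented = add_pseudo_words("set an alarm", rng)
```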

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to KEVIN KY whose telephone number is (571)272-7648. The examiner can normally be reached Monday-Friday 9-5PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Chan Park, can be reached at 571-272-7409. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and 





/KEVIN KY/Primary Examiner, Art Unit 2669