DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1-3, 5-6, 8, 10-13, 15-16 and 18 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Larson et al. (hereinafter Larson) (US 10,607,598).
Regarding claim 1:  The structural elements of apparatus claim 11 perform all of the steps of method claim 1.  Thus, claim 1 is rejected for the same reasons discussed in the rejection of claim 11. 
Regarding claim 2:  Larson satisfies all the elements of claim 1.  The structural elements of apparatus claim 12 perform all of the steps of method claim 2.  Thus, claim 2 is rejected for the same reasons discussed in the rejection of claim 12. 
Regarding claim 3:  Larson satisfies all the elements of claim 1.  The structural elements of apparatus claim 13 perform all of the steps of method claim 3.  Thus, claim 3 is rejected for the same reasons discussed in the rejection of claim 13. 
Regarding claim 5:  Larson satisfies all the elements of claim 1.  The structural elements of apparatus claim 15 perform all of the steps of method claim 5.  Thus, claim 5 is rejected for the same reasons discussed in the rejection of claim 15. 
Regarding claim 6:  Larson satisfies all the elements of claim 1.  The structural elements of apparatus claim 16 perform all of the steps of method claim 6.  Thus, claim 6 is rejected for the same reasons discussed in the rejection of claim 16. 
Regarding claim 8:  Larson satisfies all the elements of claim 1.  Larson further discloses wherein the first input data (Fig. 1, input text data 101), the second input data (Fig. 1, input to model trainer for speech processing 150 from output data 141), the first output data (Fig. 1, output data 141), and the second output data (Fig. 1, output data 141 paths for speech processing) are sequence data (FIG. 2D provides an example 240 of the second modification pipeline 130. As indicated by the example 240, the second modification pipeline 130 includes a BERT language model 243, BERT embeddings 241, and a ranking algorithm 245. The BERT language model 243 may be configured with a multi-embedding input layer that, based on one or more words received as input, determines word-piece tokens, positional tokens, and sequence tokens; sums the various tokens, transfers the summing result(s) to fully connected transfer layers. Based on the output of the BERT language model 243, the ranking algorithm 245 may, based on the BERT embeddings 241, determine k nearest neighbors based on a look-up table. The look-up table may have been determined based on the BERT embeddings 241. For example, the BERT embeddings may have been based on a dictionary of 80,000 words. The covariance of the 80,000 
Regarding claim 10:  Larson satisfies all the elements of claim 1.  Arguments analogous to those stated in the rejection of claim 1 are applicable.  A non-transitory computer-readable storage medium storing instructions is inherently taught as evidenced by computing device 401 (Fig. 4) and various memories stored therein.
Regarding claim 11:  Larson discloses a processor (Fig. 4, processor 411) configured to acquire first output data (Fig. 1, output data 141) of a student model (Fig. 2D BERT language model 243) for first input data (Fig. 1, input text data 101) and second output data (Fig. 1, output data 141 paths for speech processing) of a teacher model (Fig. 1, model trainer for speech processing 150) for second input data (Fig. 1, input to model trainer for speech processing 150 from output data 141) and to train the student model such that the first output data (Fig. 1, output data 141) and the second output data (Fig. 1, output data 141) are not distinguished from each other (Fig. 1, model trainer for speech processing 150); and a memory (Fig. 4, memory 421) configured to store a parameter of the student model (Fig. 2D BERT language model 243), wherein the student model (Fig. 2D BERT language model 243) and the teacher model (Fig. 1, model trainer for speech processing 150) have different structures (Fig. 1, different tasks as shown in different paths from output data 141).
Regarding claim 12:  Larson satisfies all the elements of claim 11.  Larson further discloses wherein the student model (Fig. 2D BERT language model 243) and the teacher model (Fig. 1, model trainer for speech processing 150) are configured to process different tasks (BERT language model 243 stems from second modification pipeline 130 and it is clear in Fig. 1 that different tasks are done based upon which path is followed).
Regarding claim 13:  Larson satisfies all the elements of claim 11.  Larson further discloses wherein the first input data (Fig. 1, text data 101) and the second input data (Fig. 1, input to model trainer for speech processing 150 from output data 141) are different types of data (As compared to the input text data 101, the text data indicated by the output data 141 may be different from the input text data 101. For example, as compared to the input text data 101, text data indicated by the output data 141 may include one or more additional words, one or more additional characters, one or more different words, or one or more different characters. These additional or different words/characters may cause, for example, one or more spelling mistakes, one or more homophones, or one or more semantic changes to manifest in the text data indicated by the output data 141. Further, as compared to the input text data 101, the output data 141 may indicate two or more modified versions of the input text data 101. For example, if the input text data 101 includes a single sentence, the output data 141 may include two or more sentences. Each of the two or more sentences may be different from the single sentence of the input text data 101 and different from the other sentences of the two or more sentences., col. 4, ln. 25-43).
Regarding claim 15:  Larson satisfies all the elements of claim 11.  Larson further discloses wherein the processor (Fig. 4, processor 411) is further configured to train the student model (Fig. 2D BERT language model 243) such that the first output data (Fig. 1, output data 141) and the second output data (Fig. 1, output data 141) are not distinguished from each other by a 
Regarding claim 16:  Larson satisfies all the elements of claim 11.  Larson further discloses wherein the first output data and the second output data are a same type of data (As a brief overview, the example framework 100 illustrates a process where input text data 101 may be processed to determine output data 141. The output data 141 may be provided, as input, to one or more speech processing tasks. The one or more speech processing tasks may include, for example, a training task performed by a model trainer 150, a validation task performed by a model validator 160, a classification task performed by a classifier 170, a testing task performed by a model tester 180, and a natural language understanding task performed by natural language engine 190., col. 4, ln. 12-24).
Regarding claim 18:  Larson satisfies all the elements of claim 11.  Larson further discloses a memory (Fig. 4, memory 421) configured to store a student model (Fig. 2D BERT language model 243), a teacher model (Fig. 1, model trainer for speech processing 150), and a discriminator model (FIG. 2E provides an example 250 of a model that may be trained by the model trainer 150, validated by a model validator 160, or tested by a model tester 180. The model may be configured to output data associated with text summarization, question answering, natural language inference, or the like. As shown in the example 250, the model may be a character-based convolutional neural network (CNN) long short-term memory (LSTM) architecture 251. The character-based CNN LSTM architecture 251 may be configured to receive, as input, text data at the character level. Further, the character-based CNN LSTM .
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
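The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.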


Claims 4, 7, 14, 17 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Larson in view of Chaudhuri (US 2020/0279279 A1).
Regarding claim 4:  Larson satisfies all the elements of claim 1.  The structural elements of apparatus claim 14 perform all of the steps of method claim 4.  Thus, claim 4 is rejected for the same reasons discussed in the rejection of claim 14. 
Regarding claim 7:  Larson satisfies all the elements of claim 1.  The structural elements of apparatus claim 17 perform all of the steps of method claim 7.  Thus, claim 7 is rejected for the same reasons discussed in the rejection of claim 17. 
Regarding claim 14:  Larson satisfies all the elements of claim 11.  Larson further discloses the first input data (Fig. 1, text data 101) and the second input data (Fig. 1, input to model trainer for speech processing 150 from output data 141).
	Larson fails to specifically address that the first input data and the second input data are unlabeled data.
	Chaudhuri discloses unlabeled data (The named entity recognition and disambiguation module (2306) locates and classifies elements mentioned in unstructured text into pre-defined categories such as names of persons, organizations, locations, expressions of times, quantities, monetary values, percentages etc. and determines the identity of entities mentioned in text based on the context. Most algorithms use either hand-crafted, linguistic grammar-based techniques or supervised machine learning based methods where a large amount of labeled training data is used to train the model so that it can classify unlabeled texts later., par. 147).

Regarding claim 17:  Larson satisfies all the elements of claim 11.  Larson further discloses wherein the student model (Fig. 2D BERT language model 243); and the teacher model (Fig. 1, model trainer for speech processing 150); that outputs text data based on an expression of a domain (At step 310, the one or more computing devices may train a second modification pipeline. Training may be performed so that the second modification pipeline is configured to perform a determination that processes its input based on one or more levels of semantic similarity. For example, the example 240 is shown as including a BERT language model 243, BERT embeddings 241, and a ranking algorithm 245. The BERT language model 243 may be trained on Book-Corpus and English Wikipedia corpora, totaling 3.3 billion words. The BERT embeddings 241 may be calculated and the look-up table associated with the ranking algorithm 245 may be determined (e.g., by sampling the five nearest neighbors). After training, the second modification pipeline may determine its output based on one or more nearest neighbors., col. 16, ln. 3-17).
	Larson fails to specifically address the models being a speech recognition model and a language model.
	Chaudhuri discloses a speech recognition model (FIG. 27 is a block diagram of an embodiment of a speech recognition module., par. 41) and a language model (Grammar (also called, language model) provides all the words and phrases that a user might say at any point in speech., par. 94).
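For illustration only, the nearest-neighbor look-up table that the cited Larson passage (col. 16, ln. 3-17) associates with the ranking algorithm 245 can be sketched as follows. This is a minimal sketch under stated assumptions, not Larson's implementation: the toy two-dimensional embeddings, the helper names `cosine` and `build_lookup`, and the choice of cosine similarity as the ranking measure are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors (assumed ranking measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def build_lookup(embeddings, k):
    """Map each word to its k most similar other words, most similar first."""
    table = {}
    for word, vec in embeddings.items():
        scored = [
            (cosine(vec, other_vec), other)
            for other, other_vec in embeddings.items()
            if other != word
        ]
        scored.sort(reverse=True)
        table[word] = [other for _, other in scored[:k]]
    return table


# Hypothetical toy embeddings; a real table would be built from model embeddings.
toy_embeddings = {
    "car": [1.0, 0.1],
    "auto": [0.9, 0.2],
    "tree": [0.0, 1.0],
    "bush": [0.1, 0.9],
}
lookup = build_lookup(toy_embeddings, k=1)
```

With these toy vectors, each word's single nearest neighbor is the other word pointing in roughly the same direction. The cited passage mentions sampling five neighbors, which corresponds to `k=5` on a vocabulary of realistic size.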

Regarding claim 20:  Larson satisfies all the elements of claim 18.  Larson further discloses the first input data (Fig. 1, input text data 101) and the second output data (Fig. 1, output data 141 paths for speech processing).
	Larson fails to specifically address that the first input data and the second output data comprise unlabeled data.
	Chaudhuri discloses unlabeled data (The named entity recognition and disambiguation module (2306) locates and classifies elements mentioned in unstructured text into pre-defined categories such as names of persons, organizations, locations, expressions of times, quantities, monetary values, percentages etc. and determines the identity of entities mentioned in text based on the context. Most algorithms use either hand-crafted, linguistic grammar-based techniques or supervised machine learning based methods where a large amount of labeled training data is used to train the model so that it can classify unlabeled texts later., par. 147).
	It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify Larson such that the first input data and the second output data comprise unlabeled data in order to allow the model to predict the polarity of unlabeled text later as taught by Chaudhuri (par. 145).
8.	Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Larson in view of Zhang et al. (hereinafter Zhang) (US 2020/0293875 A1).
Regarding claim 9:  Larson satisfies all the elements of claim 1.  Larson further discloses to which the first output data (Fig. 1, output data 141) and the second output data (Fig. 1, output data 141 paths for speech processing) are distinguished from each other and training (As a brief overview, the example framework 100 illustrates a process where input text data 101 may be processed to determine output data 141. The output data 141 may be provided, as input, to one or more speech processing tasks. The one or more speech processing tasks may include, for example, a training task performed by a model trainer 150, a validation task performed by a model validator 160, a classification task performed by a classifier 170, a testing task performed by a model tester 180, and a natural language understanding task performed by natural language engine 190., col. 4, ln. 12-24) the student model (Fig. 2D BERT language model 243).
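	Larson fails to specifically address determining an adversarial loss based on a degree to which the first output data and the second output data are distinguished from each other, and training the student model to reduce the adversarial loss.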
	Zhang discloses determining an adversarial loss based on a degree (The generator 124 and discriminator 126 are trained through an iterative machine learning process which involves minimizing losses defined for the neural network models of the generator 124 and discriminator 126. For the generator 124, an adversarial loss, damage loss, identity loss, and cycle loss are the primary loss functions that are minimized through a machine learning process., par. 51); to reduce the adversarial loss (The generator 124 and discriminator 126 are trained through an iterative machine learning process which involves minimizing losses defined for the neural network models of the generator 124 and discriminator 126. For the generator 124, an adversarial loss, damage loss, identity loss, and cycle loss are the primary loss functions that are minimized through a machine learning process., par. 51). 
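For illustration only, the notion of an adversarial loss that shrinks as a discriminator becomes less able to tell student outputs from teacher outputs can be sketched as follows. This is a minimal sketch assuming the standard generator-side loss, the mean of -log D(x) over student outputs. It is not taken from the application, Larson, or Zhang, and the logistic discriminator and the names `discriminator_prob` and `adversarial_loss` are hypothetical.

```python
import math


def discriminator_prob(output, weights, bias):
    """Hypothetical logistic discriminator: probability that `output` came from the teacher."""
    z = sum(w * x for w, x in zip(weights, output)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def adversarial_loss(student_outputs, weights, bias, eps=1e-12):
    """Mean of -log D(x) over student outputs.

    The value is small when the discriminator mistakes student outputs for
    teacher outputs, and large when it distinguishes them confidently.
    """
    losses = [
        -math.log(discriminator_prob(out, weights, bias) + eps)
        for out in student_outputs
    ]
    return sum(losses) / len(losses)


# Assumed discriminator that treats a large first coordinate as "teacher-like".
weights, bias = [1.0, 0.0], 0.0
teacher_like = [[3.0, 0.0], [2.5, 1.0]]      # student outputs that fool the discriminator
easily_spotted = [[-3.0, 0.0], [-2.5, 1.0]]  # student outputs that are readily distinguished

loss_fooling = adversarial_loss(teacher_like, weights, bias)
loss_spotted = adversarial_loss(easily_spotted, weights, bias)
```

Training the student to reduce this quantity pushes its outputs toward the region the discriminator attributes to the teacher. When the discriminator is at chance (probability 0.5 for every input), the loss equals log 2.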
9.	Claim 19 is rejected under 35 U.S.C. 103 as being unpatentable over Larson in view of Kim (US 10,079,022).
Regarding claim 19:  Larson satisfies all the elements of claim 18.  Larson further discloses the student model (Fig. 2D BERT language model 243) and the teacher model (Fig. 1, model trainer for speech processing 150).
	Larson fails to specifically address that a number of hidden layers of the student model is less than a number of hidden layers of the teacher model.
	Kim discloses wherein a number of hidden layers (When the n-best method is initially applied, and the number of states in the higher token becomes less than the number of n-best states, the voice recognition server 200 transmits a binary information array of the corresponding states to the voice recognition terminal 100. Further, the voice recognition terminal 100 transmits the scores of the corresponding states to the voice recognition server 200 using the received binary information array.  When only the acoustic model scores corresponding to the candidate information are transmitted to the voice recognition server 200 in this way, the amount of additional data may be variably reduced.  Third, the voice recognition terminal 100 may select n-best state scores of the last hidden layer from among the calculated acoustic model scores., col. 8, ln. 26-40); is lesser than a number of hidden layers (When the n-best method is initially applied, and the number of states in the higher token becomes less than the number of n-best 
	It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to configure the student model with fewer hidden layers than the teacher model in order to reduce the amount of data transmitted as taught by Kim (col. 7, ln. 39-43).
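For illustration only, the relationship recited in claim 19 (a student model having fewer hidden layers than the teacher model) can be sketched as follows. This is a minimal sketch assuming fully connected networks described only by their layer widths. The `MLPSpec` class and the specific widths are hypothetical and do not describe the models of the application, Larson, or Kim.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MLPSpec:
    """Fully connected network described by layer widths: input, hidden..., output."""

    layer_sizes: List[int]

    def num_hidden_layers(self) -> int:
        # Every layer except the input and output layers counts as hidden.
        return max(len(self.layer_sizes) - 2, 0)

    def num_parameters(self) -> int:
        # Weights plus biases for each pair of consecutive layers.
        pairs = zip(self.layer_sizes, self.layer_sizes[1:])
        return sum(n_in * n_out + n_out for n_in, n_out in pairs)


# Assumed example sizes: a deeper teacher and a shallower student.
teacher = MLPSpec([40, 512, 512, 512, 512, 30])
student = MLPSpec([40, 256, 256, 30])

student_is_shallower = student.num_hidden_layers() < teacher.num_hidden_layers()
```

In this sketch the shallower student also has fewer parameters to store, which is the usual practical motivation for such a configuration.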
Conclusion
10.	Any inquiry concerning this communication or earlier communications from the examiner should be directed to CHARLOTTE M BAKER whose telephone number is (571)272-7459.  The examiner can normally be reached on Mon - Fri 8:00-5:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, NAY A MAUNG can be reached on (571)272-7778.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.







/CHARLOTTE M BAKER/Primary Examiner, Art Unit 2664