DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 03/15/2021 was filed after the filing date of the instant application on 02/14/2020. The submission fails to comply with the provisions of 37 CFR 1.97. However, the information disclosure statement is being considered by the examiner, who manually entered the reference numbers.
Note: The information disclosure statement (IDS) filed 03/15/2021 fails to comply with the provisions of 37 CFR 1.97, 1.98 and MPEP § 609 because an unknown IDS form has been used for listing U.S. patents, U.S. publications, foreign patent documents, and non-patent literature. See MPEP § 609, 609.05(a).
In the next IDS filing, use of form PTO/SB/08A and 08B, “Information Disclosure Statement,” is recommended as a means to provide the required list of information as set forth in 37 CFR 1.98(a)(1). Applicants are encouraged to use the USPTO form PTO/SB/08A and 08B when preparing an information disclosure statement because this form is updated by the Office. The form PTO/SB/08A and 08B will enable applicants to comply with the requirement to list each item of information being submitted and to provide the Office with a uniform listing of citations and with a

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1-2 and 7-15 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Ravi (US-PGPUB 2018/0336472 A1 hereinafter “Ravi”, now U.S. Patent No. 10,748,066 B2).

As for claim 1, Ravi discloses a computing system (Fig.1, Projection Neural Network System 100), comprising: one or more processors (Fig.1, Projection Neural Network System 100, ¶ [37], Figure 1 depicts a projection neural network system 100 which is implemented as computer programs on one or more computers, ¶ [98], including a programmable processor, multiple processors, etc., and ¶ [102], based on general or special purpose microprocessors); and one or more non-transitory computer-readable media that collectively store (Fig.1, Projection Neural Network System 100, ¶ [37], system 100 which is implemented as computer programs, and ¶ [103], devices for storing programs and data include memory devices, e.g., EPROM, flash memory, etc.): a pre-trained projection network (Fig.1, 100, Projection  training the projection neural network, updates the trainer network parameters, etc., ¶ [38], system 100 includes a projection neural network 102 which can be a feed-forward neural network, a recurrent neural network, etc., and ¶ [83], pre-trained projection network in which the values of the trainer network parameters are updated one or more times), configured to receive a language input comprising one or more units of text (Fig.1, 100, 102, Projection Neural Network Input 104, Fig.2, 200, Projection Layer Input 110, Fig.4, 400, Receive Projection Layer Input 402, ¶ [39], projection neural network 102 is configured to receive a projection neural network input 104, ¶ [42]-[45], the input to the pre-trained projection neural network 102 is a sequence of text in one language, a sequence representing a spoken utterance, etc.) and to dynamically generate an intermediate representation from the language input (Fig.1, 100, Projection Neural Network Input 104, Projection Neural Network Output 106, Fig.2, Fig.4, 400, Generate Projection Layer Output 406, ¶ [39], projection neural network 102 generates a projection network output 106 from the input 104, ¶ [43], generating an intermediate representation in any appropriate numerical format, e.g., vectors, etc., and ¶ [50], [53], comprising the numerical feature vectors), the projection network (Fig.1, 100, Projection Neural Network 102, Figs.2-5) comprising: a sequence of one or more projection layers (Fig.1, 100, 102, Projection Neural Network Input 104, Projection Layer Input 110, Projection Layer 108, Projection Layer Output 112, Projection Neural Network Output 106, ¶ [46], Figure 1 depicts the projection neural network 102 including a sequence of one or more projection layers 110, 108, 112, etc.) wherein each projection layer is configured to receive a layer input (Fig.2, 200, Projection Layer Input 110) and apply a plurality of projection layer functions (Fig.2, 200, Projection Layer Function(s) 202-206) to the layer input to generate a projection layer output (Fig.2, Projection Function Output(s) 208-212, Projection Layer Output 112, ¶ [47], Figure 2 depicts each projection layer function (202-206) receiving a layer input 110 to generate a projection layer output 112); and a sequence of one or more intermediate layers (Fig.1, Projection Layer Input 110, Fig.2, 200, Projection Layer Input 110) configured to receive the projection layer output generated by a last projection layer in the sequence of one or more projection layers and to generate one or more intermediate layer outputs (Fig.1, Projection Layer 108, Fig.2, 200, Projection Layer Function(s) 202-206), wherein the intermediate representation comprises the intermediate layer output (Fig.1, Projection Layer Output 112, Fig.2, 200, Projection Layer Function Output(s) 208-212) generated by a last intermediate layer in the sequence of one or more intermediate layers (Fig.1, Fig.2, Projection Layer Parameters 214, Figs.3-5, ¶ [37]-[41], and [50]-[53], the projection layer concatenates the projection function outputs and applies the projection layer parameters 214 (e.g., a parameter matrix and a bias vector) to the concatenated projection function outputs); instructions that, when executed by the one or more processors, cause the computing system to perform operations (Figs.1-5, ¶ [37], [103], computer programs, devices for storing programs and data include memory devices, etc.), the operations comprising: obtaining the language input (Fig.1, Projection Neural Network 102, Projection Neural Network Input 104, Fig.2, ¶ [17], [38], [83], the pre-trained projection neural network 102, ¶ [42], the input to the pre-trained projection neural network 102 is a sequence of text in one language); inputting the language input into the pre-trained projection layer receives the projection layer input); and receiving the intermediate representation as an output of the pre-trained projection network (Fig.4, Generate Projection Layer Output 406, ¶ [74], the projection layer generates the projection layer output y as: y = W·x+b).
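
The cited computation (projection function outputs are concatenated, then the projection layer parameters are applied as y = W·x+b) can be illustrated with a minimal sketch. It assumes NumPy; the function name `projection_layer_output`, the placeholder projection functions, and the toy values of `W` and `b` are hypothetical and do not appear in Ravi.

```python
import numpy as np

def projection_layer_output(layer_input, projection_functions, W, b):
    # Apply every projection function to the same layer input.
    function_outputs = [f(layer_input) for f in projection_functions]
    # Concatenate the projection function outputs into one vector x.
    x = np.concatenate(function_outputs)
    # Apply the projection layer parameters (parameter matrix and bias vector): y = W.x + b.
    return W @ x + b

# Hypothetical toy setup: two placeholder projection functions, each emitting 2 values.
projection_functions = [
    lambda v: v[:2],  # placeholder only, not a function taken from Ravi
    lambda v: v[2:],  # placeholder only, not a function taken from Ravi
]
W = np.array([[1.0, 0.0, 1.0, 0.0],
              [0.0, 2.0, 0.0, 2.0]])
b = np.array([0.5, -0.5])
y = projection_layer_output(np.array([1.0, 2.0, 3.0, 4.0]), projection_functions, W, b)
```

In this toy case the concatenated vector x equals the input, so y is simply the affine map of the four input values.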

As for claims 2 and 7, Ravi discloses wherein: the one or more non-transitory computer-readable media further collectively store (Figs.1-5, ¶ [103], storing programs and data) a machine-learned prediction model configured to receive the intermediate representation and to generate a prediction from the intermediate representation (Figs.1-5, ¶ [3], machine learning models employ layers of nonlinear units to predict an output); and the operations further comprise: inputting the intermediate representation into the machine-learned prediction model; and receiving the prediction as an output of the machine-learned prediction model (Figs.1-5, ¶ [3], [29], machine learned models, trained prediction accuracy, [83], and [88]), wherein one or both: (1) the projection network was previously trained using an unsupervised learning technique and at least the machine-learned prediction model was trained using a supervised learning technique (Figs.1-2, Fig.3, 300, Training Data 304, Training Input 302, Trainer Network 306, Figs.4-5, ¶ [29], projection network can be trained to achieve a performance level, ¶ [52], supervised or semi-supervised machine learning techniques); or (2) the projection network was previously trained using a first set of training data comprising a first plurality of training examples and at least the machine-learned prediction model was trained using a second, different set of training data comprising a second plurality 

As for claims 8-10, Ravi discloses wherein the projection network further comprises a feature extraction layer configured to receive the language input and generate a feature vector that comprises features extracted from the language input (Figs.1-5, ¶ [39], [42]-[45], [53], projection neural network 102 is configured to receive a projection neural network input 104, the input is a sequence of text in one language, etc., the feature vectors associated with the nodes), wherein the layer input for a first projection layer of the one or more projection layers comprises the feature vector, and wherein the features extracted from the language input comprise one or more of the following: skip-grams; n-grams; part of speech tags; dependency relationships; knowledge graph information; or contextual information (Figs.1-5, ¶ [51], a graph is a data structure represented by a set of nodes, numerical feature vectors, a set of labels “tags”, dependency relationships such as relationships between entities, etc., and [53]), wherein, for each projection layer, the plurality of projection layer functions are precomputed and held static (Figs.1-5, ¶ [73], projection layers are predetermined “precomputed” and fixed “static” before the projection network is trained, and [85], [89]), and wherein, for each projection layer, the plurality of projection layer functions are modeled using locality sensitive hashing (¶ [67], locality sensitive hashing functions).
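
A projection function that is precomputed, held static, and modeled using locality sensitive hashing may be sketched as follows. This is an illustration under stated assumptions, not Ravi's implementation: it uses the common random-hyperplane (sign) form of locality sensitive hashing, and the names `make_lsh_projection`, `hyperplanes`, and `seed` are hypothetical. The output is a binary vector of 0s and 1s, with 1 emitted for a positive dot product.

```python
import numpy as np

def make_lsh_projection(input_dim, num_bits, seed):
    # Draw the random hyperplanes once; they are never updated afterwards,
    # which models a "precomputed and static" projection function.
    rng = np.random.default_rng(seed)
    hyperplanes = rng.standard_normal((num_bits, input_dim))

    def project(layer_input):
        # One bit per hyperplane: 1 if the dot product is positive, else 0.
        return (hyperplanes @ layer_input > 0).astype(np.float64)

    return project

project = make_lsh_projection(input_dim=8, num_bits=4, seed=0)
x = np.arange(1.0, 9.0)
bits = project(x)
```

Because each bit depends only on the sign of a dot product, rescaling the input by a positive factor leaves the bits unchanged, and rebuilding the function from the same seed reproduces it exactly.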

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 3-4, and 16-18 are rejected under 35 U.S.C. 103 as being unpatentable over Ravi in view of Malon et al. (US-PGPUB 2014/0236577 A1 hereinafter “Malon”).

As for claims 3-4, Ravi discloses everything claimed as applied above (see claim 1 above). Ravi also discloses machine learning models for processing common and compute-intensive parts of machine learning training or production, i.e., inference, workload (Ravi - Figs.1-5, ¶ [105]-[106]), and encoders such as the projection functions, which are each encoded as sparse matrices (Ravi - ¶ [15]). However, Ravi is silent about an autoencoder model and a decoder model, although the use of the autoencoder and decoder is well known in the art as described below in one of many class G06F/704 references. In the same field of communication technology, Malon discloses systems and methods for word representation in a neural probabilistic language model (Malon – Fig.1, Original Language Model 12, Figs.2-9, Abstract, ¶ [24]), wherein the system uses a recursive neural network that includes an autoencoder 103 and a decoder 106, trained in combination with each other (Malon – Fig.5, E (autoencoder) 103, D (decoder) 106, Fig.6 depicts an exemplary training process, and Fig.8 depicts an example for the operation of the encoders and decoders, ¶ [31]-[32], [38]), and configured to receive the representation and to generate a reconstructed language input based on the representation (Malon – Figs.1-9, ¶ [31], trained for reconstruction, the reconstructed feature vectors and the originals, and ¶ [41], the decoder D restores the inputs from the compressed vector), wherein the autoencoder model is trained to maximize a probability of the reconstructed language input matching the language input on a token-by-token basis (Malon – Figs.1-9, ¶ [21]-[23], when applied 
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify Ravi by providing the above-described features, as taught by Malon, for the advantages of freedom from explicit rules, types, and exact string matching, as per the teachings of Malon.

As for claim 16, Ravi discloses a computer-implemented method to pre-train a projection network (Figs.1-2, Fig.3, 300, Training Data 304, Training Input 302, Trainer Network 306, Projection Network 102, Figs.4-5) comprising one or more projection layers and one or more intermediate layers (Fig.1, Projection Neural Network 102, Projection Layer Input 110, Projection Layer 108, Fig.2, 200, Figure 1 depicts a projection neural network 102 comprising one or more projection layer inputs 110 and one or more intermediate projection layers 108, and ¶ [39], [46]), each projection layer configured to apply one or more projection functions (Fig.1, Fig.2, 200, Projection Layer Function(s) 202-206, Fig.4, Generate Projection Function Outputs - 404) to project a layer input (Fig.1, Fig.2, 200, Projection Layer Input 110, Fig.4, Receive Projection Layer Input - 402) into a different dimensional space (Figs.1-2, Fig.4, ¶ [63], the projection layer receives a projection layer input (step 402), ¶ [64], processes the projection layer input to generate a respective projection function output (step 404), each projection function generates a corresponding projection function output by mapping the projection layer input to a different dimensional space, and the projection function outputs may belong to a much lower-dimensional space than the projection layer input), the projection network configured to receive an input and to generate an intermediate representation for the input (Fig.1, Projection Neural Network System 100, Projection Neural Network 102, Projection Neural Network Input 104, Projection Layer Input 110, Projection Layer 108, Projection Layer Output 112, Projection Neural Network Output 106, Figure 1 and ¶ [31]-[48] depict the projection network 102 configured to receive an input 104 and to generate an intermediate representation for the input such as an output 106, inter alia, and Figure 2, ¶ [50]), the method comprising: accessing, by one or more computing devices (Fig.1, Projection Neural Network System 100, Projection Neural Network 102, ¶ [37], implemented as one or more computers), a set of training data comprising a plurality of example inputs; inputting, by the one or more computing devices, each of the plurality of example inputs into the projection network (Fig.3, 300, Training Data 304, Training Input 302, Projection Neural Network 102, Fig.5, 500, Obtain Training Example 502, ¶ [77], the training input is from a set of training data (step 502), the system randomly samples training examples from the set of training data, and ¶ [78], provides the training example as input to the projection neural network 102); receiving, by the one or more computing devices, a respective intermediate representation for each of the plurality of example inputs as an output of the projection network (Fig.3, 300, Training Data 304, Training Input 302, Projection Neural Network 102, Projection Network Output 106, Fig.5, 500, Obtain Training Example 502, Generate Projection Network Output for the Training Input 504, ¶ [55], projection network 102 processes the training input 302 in accordance with current values of projection network parameters to generate a projection network output 106, and ¶ [77]-[78], receiving the example inputs in step 502, and processing the example inputs in accordance with current values of projection network parameters to generate a projection network output for the training input in step 504); inputting, by the one or more computing devices, each respective intermediate representation into the projection layer configured to reconstruct inputs based on intermediate representations (Fig.1, Projection Neural Network Input 104, Projection Layer Input 110, Projection Layer 108, Projection Layer Output 112, Projection Neural Network Output 106, ¶ [42], inputting a sequence of text in one language into the projection network to generate a score representing an estimated likelihood that the reconstructed text in another language is a proper translation “decoding” of the input text into the other language, and Fig.4, Generate Projection Function Outputs 404, ¶ [64], generate “reconstruct” a corresponding projection function output by mapping “decoding” the projection layer input to a different space); receiving, by the one or more computing devices, a respective reconstructed input for each of the plurality of example inputs as an output of the projection neural network system (Fig.5, Obtain Training Example 502, Generate Projection Network Output for the Training Input 504, ¶ [77]-[78], the system obtains a batch of multiple training examples from the training data, and a respective reconstructed input for each of the example inputs as an output such as a projection network output for the training input (504), and ¶ [79]); and learning, by the one or more computing devices, one or more parameter values (Fig.5, Update the Current Value of the Trainer Network Parameters and the Projection Network Parameters - 510) for the one or more intermediate layers of the projection network based at least in part on a comparison of each respective reconstructed input to the corresponding example input (Figs.1-5, ¶ [17], a projection network output for the training input is reconstructed in accordance with the projection layer parameters and multiple trainer neural network parameters, ¶ [29], [77], learn to mimic the predictions of the trainer network and thereby reconstruct predictions that are nearly as accurate as each respective reconstructed input to the corresponding example input of the trainer network, and ¶ [72]).
Ravi also discloses machine learning models for processing common and compute-intensive parts of machine learning training or production, i.e., inference, workload (Ravi - Figs.1-5, ¶ [105]-[106]); furthermore, encoders such as the projection functions are each encoded as sparse matrices (Ravi - ¶ [15]). However, Ravi is silent about a decoder model, although the use of the decoder is well known in the art as described below in one of many class G06F/704 references. In the same field of communication technology, Malon discloses systems and methods for word representation in a neural probabilistic language model (Malon – Fig.1, Original Language Model 12, Figs.2-9, Abstract, ¶ [24]), wherein the system uses a recursive neural network that includes a decoder 106 (Malon – Fig.5, D (decoder) 106, Fig.6 depicts an exemplary training process, and Fig.8 depicts an example for the operation of the decoders, ¶ [31]-[32], [38]), and configured to receive the trained for reconstruction, the reconstructed feature vectors and the originals, and ¶ [41], the decoder D restores the inputs from the compressed vector). Malon further teaches advantages that include using meaning representations of the questions and supporting sentences to be free from explicit rules, question and answer types, and exact string matching (Malon – ¶ [11]).
Since Ravi and Malon are analogous in the art because they are from the same field of endeavor, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to use the known technique of a decoder in a neural network. One of ordinary skill in the art would have recognized that the results of the combination were predictable since the use of that known technique provides the rationale to arrive at a conclusion of obviousness. See KSR International Co. v. Teleflex Inc., 82 USPQ2d 1385 (U.S. 2007).
In addition, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify Ravi by providing the above-described features of Malon, because the combination of the disclosures taken as a whole suggests that incorporating the decoder would enhance Ravi’s projection neural network system for the purpose of freedom from explicit rules, types, and exact string matching, as per the teachings of Malon.

As for claims 17-18, Ravi and Malon disclose wherein learning, by the one or more computing devices, the one or more parameter values for the one or more  jointly training of the projection neural network 102 and the trainer network 306, and see Malon – Figs.1-9, ¶ [41]-[52], decoder and encoder are trained together to minimize reconstruction error), by the one or more computing devices, the projection network and the decoder to maximize a probability of each respective reconstructed input matching the corresponding example input on a token-by-token basis (Malon – Figs.1-9, ¶ [21]-[23], when applied recursively, starting with token vectors from a neural probabilistic language model, ¶ [41]-[52], decoder and encoder are trained together to minimize reconstruction error, and  ¶ [59], matching the language input on a token-by-token basis such as extract answers from support sentences by classifying each token as a word to be included in the answer or not), and further comprising, after learning the one or more parameter values: providing, by the one or more computing devices, the projection network for use as a transferable natural language representation generator (Ravi – Figs.1-4, Fig.5, Provide the Trained Values of the Network Parameters for use in Processing Network Inputs – 516, ¶ [42], input in one language and outputting scores for estimated likelihood proper translation into other language, ¶ [91], [93], the projection network (i.e., as defined by the trained values of the projection network parameters) may be deployed to a resource constrained environment (e.g., a mobile device), and ¶ [105]-[110], for production, integrated .

Claims 5-6, and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Ravi in view of HASHIMOTO et al. (US-PGPUB 2018/0121799 A1 hereinafter “HASHIMOTO”).

As for claims 5-6, Ravi discloses everything claimed as applied above (see claim 1 above). Ravi also discloses the pre-trained projection network was previously trained (Figs.1-2, Fig.3, 300, Training Data 304, Training Input 302, Trainer Network 306, Projection Network 102, Figs.4-5) as the projection neural network configured to receive an input word and to predict a plurality of context words surrounding the input word (Figs.1-5, ¶ [42], input to the projection neural network 102 is a sequence of text “words” to estimate likelihood “predict” the set of pieces of text “a plurality of context words surrounding the input word” is a proper translation of the input text, and ¶ [43], [45]), wherein the projection neural network was trained using an objective function that includes a regularization term that provides a penalty that has a magnitude that is positively correlated with a sum of a cosine similarity between the respective intermediate representations produced by the projection network for each pair of words in a training batch (Ravi - Figs.1-5, ¶ [66], a positive objective function such as projection vectors results in positive values, and outputs value 1 in response to receiving a positive input, and ¶ [67], a locality sensitive hashing 
Ravi does not explicitly disclose a skip-gram model, although the use of the skip-gram model is well known in the art as described below in one of many class G06F/704 references.
In the same field of communication technology, HASHIMOTO discloses multi-task learning using a neural network model (HASHIMOTO – Fig.1, Joint-Many Task Neural Network Model 100, Figs.9-11), and using a skip-gram model configured to receive an input word and to predict a plurality of context words surrounding the input word (HASHIMOTO – Fig.1, Fig.2A, Word Embedder 202, Character Embedding Space 208, 1-gram Embedding 212 … 4-gram Embedding 218, ¶ [55], word embedder 202 uses a skip-gram model to train the word embedding matrix, and ¶ [56], [62]), wherein the skip-gram model was trained using an objective function that includes a regularization term that provides a penalty that has a magnitude that is positively correlated with a sum of a cosine similarity between the respective intermediate representations produced by the projection network for each pair of words in a training batch (HASHIMOTO, ¶ [56], the character embedder 206 uses a skip-gram model to train, learned using the skip-gram objective function as the word vectors, ¶ [62], n-gram embeddings are also trained similarly, ¶ [144], cosine based normalizers, and sum based normalizers).
Since Ravi and HASHIMOTO are analogous in the art because they are from the same field of endeavor, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to use the known technique of using the International Co. v. Teleflex Inc., 82 USPQ2d 1385 (U.S. 2007).
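
The regularization term recited for claims 5-6 (a penalty whose magnitude grows with the sum of cosine similarities between the intermediate representations of each pair of words in a training batch) can be made concrete with a short sketch. It assumes NumPy and one simple reading of the claim language, namely a weighted sum over unordered pairs; the names `cosine_penalty`, `weight`, and `eps` are hypothetical and are not taken from the application, Ravi, or HASHIMOTO.

```python
import numpy as np

def cosine_penalty(representations, weight=1.0, eps=1e-12):
    # representations: array of shape (batch_size, dim), one row per word.
    norms = np.linalg.norm(representations, axis=1, keepdims=True)
    # Normalize each row; eps guards against division by zero for all-zero rows.
    unit = representations / np.maximum(norms, eps)
    # Pairwise cosine similarities between all rows.
    sims = unit @ unit.T
    # Sum each unordered pair (i < j) exactly once.
    upper = np.triu_indices(len(representations), k=1)
    return weight * float(sims[upper].sum())

orthogonal_pair = np.array([[1.0, 0.0], [0.0, 1.0]])
parallel_pair = np.array([[1.0, 0.0], [2.0, 0.0]])
penalty_orthogonal = cosine_penalty(orthogonal_pair)
penalty_parallel = cosine_penalty(parallel_pair)
```

Under this reading, representations that point in the same direction are penalized most heavily, while mutually orthogonal representations incur no penalty.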

As for claim 19, Ravi discloses a computer-implemented method to pre-train a projection network (Figs.1-2, Fig.3, 300, Training Data 304, Training Input 302, Trainer Network 306, Projection Network 102, Figs.4-5) comprising one or more projection layers and one or more intermediate layers (Fig.1, Projection Neural Network 102, Projection Layer Input 110, Projection Layer 108, Fig.2, 200, Figure 1 depicts a projection neural network 102 comprising one or more projection layer inputs 110 and one or more intermediate projection layers 108, and ¶ [39], [46]), each projection layer configured to apply one or more projection functions (Fig.1, Fig.2, 200, Projection Layer Function(s) 202-206, Fig.4, Generate Projection Function Outputs - 404) to project a layer input (Fig.1, Fig.2, 200, Projection Layer Input 110, Fig.4, Receive Projection Layer Input - 402) into a different dimensional space (Figs.1-2, Fig.4, ¶ [63], the projection layer receives a projection layer input (step 402), ¶ [64], processes the projection layer input to generate a respective projection function output (step 404), each projection function generates a corresponding projection function output by mapping the projection layer input to a different dimensional space, and the projection function outputs may belong to a much lower-dimensional space than the projection layer input), the projection network configured to receive an input and to generate an intermediate representation for the input (Fig.1, Projection , Projection Neural Network 102, Projection Neural Network Input 104, Projection Layer Input 110, Projection Layer 108, Projection Layer Output 112, Projection Neural Network Output 106, Figure 1 and ¶ [31]-[48] depict the projection network 102 configured to receive an input 104 and to generate an intermediate representation for the input such as an output 106, inter alia, and Figure 2, ¶ [50]), the method comprising: accessing, by one or more computing devices (Fig.1, Projection Neural Network System 100, Projection Neural Network 102, ¶ [37], implemented as one or more computers), a set of training data comprising a plurality of input words (Fig.3, 300, Training Data 304, Training Input 302, Projection Neural Network 102, Fig.5, 500, Obtain Training Example 502, ¶ [77], the training input is from a set of training data (step 502)), wherein a respective set of [ground truth] context words are associated with each of the plurality of input words (Fig.1, 100, Projection Neural Network 102, ¶ [42], the input is a sequence of text “context words” associated with each of a set of pieces of text, and ¶ [43]); inputting, by the one or more computing devices, each of the plurality of input words into the projection network (Fig.3, 300, Training Data 304, Training Input 302, Projection Neural Network 102, Fig.5, 500, Obtain Training Example 502, ¶ [77], the training input is from a set of training data (step 502), the system randomly samples training input “words” from the set of training data, and ¶ [78], provides the training input to the projection neural network 102); receiving, by the one or more computing devices, a respective intermediate representation (Fig.1, Projection Layer Input 110, Projection Layer 108, Fig.2, 200, Projection Layer Function(s) 202-206) for each of the plurality of input words as an output of the projection network (Fig.1, Projection Network Output 106, Fig.2, Projection Layer Parameters 214, Figs.3-5, ¶ [37]-[41], generate a projection neural network output 106 from a projection network input 104, and [50]-[53], the projection layer concatenates “intermediate” the projection function outputs and applies the projection layer parameters 214 (e.g., a parameter matrix and a bias vector) to the concatenated projection function outputs); determining, by the one or more computing devices, a set of predicted context words for each of the plurality of input words based at least in part on the respective intermediate representation for each of the plurality of input words (Figs.1-5, ¶ [52], trained by machine learning techniques to make predictions, and ¶ [42], input to the projection neural network 102 is a sequence of text “input words” to estimate likelihood “predict” the set of pieces of text “a set of context words” is a proper translation of the input text, and ¶ [43], [45]); and learning, by the one or more computing devices, one or more parameter values (Fig.5, Update the Current Value of the Trainer Network Parameters and the Projection Network Parameters - 510) for the one or more intermediate layers of the projection network based at least in part on a comparison, for each input word, of the respective set of predicted context words to the respective set of [ground truth] context words (Figs.1-5, ¶ [17], a projection network output for the training input is predicted in accordance with the projection layer parameters and multiple trainer neural network parameters, ¶ [29], [42], [77], learn to mimic the predictions of the trainer network and thereby reconstruct predictions that are nearly as accurate as each respective reconstructed input to the corresponding example input of the trainer network, and ¶ [72]).

In the same field of communication technology, HASHIMOTO discloses multi-task learning using a neural network model (HASHIMOTO – Fig.1, Joint-Many Task Neural Network Model 100, Figs.9-11), and using ground truth context words (HASHIMOTO – Figs.1-11, ¶ [98], model 100 uses ground truth child-parent pairs to train, and ¶ [99]-[100]).
Since Ravi and HASHIMOTO are analogous in the art because they are from the same field of endeavor, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to use the known technique of using the ground truth. One of ordinary skill in the art would have recognized that the results of the combination were predictable since the use of that known technique provides the rationale to arrive at a conclusion of obviousness. See KSR International Co. v. Teleflex Inc., 82 USPQ2d 1385 (U.S. 2007).

As for claim 20, Ravi and HASHIMOTO disclose wherein learning, by the one or more computing devices, the one or more parameter values comprises optimizing, by the one or more computing devices, a negative sampling objective function (Ravi – Figs.1-5, ¶ [66], a negative sampling objective function such as projection vectors results in negative values, and outputs value 0 in response to receiving a negative input, as an example, the projection function output is a binary representation, i.e., vector with components consisting of 0s and 1s, and see 

Conclusion
The prior art made of record and not relied upon, listed below and in the attached PTO-892 form, is considered pertinent to applicant's disclosure.
MANDT et al. (US-PGPUB 2019/0393903 A1) efficient encoding, and decoding sequence using variational autoencoders (see Fig.1).
Ravi et al. (US-PGPUB 2020/0042596 A1) on-device neural networks for natural language understanding (see Fig.1).
Manukian et al. (U.S. Patent No. 5,276,771) rapidly converging projective neural network (see Fig.14).

	
Any inquiry concerning this communication or earlier communications from the examiner should be directed to KHAI N NGUYEN whose telephone number is (571)270-3141. The examiner can normally be reached IFP.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, AHMAD MATAR can be reached on (571)272-7488. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.


KHAI N. NGUYEN
Primary Examiner
Art Unit 2652



/Khai N. Nguyen/Primary Examiner, Art Unit 2652
03/09/2022