DETAILED ACTION
EXAMINER’S AMENDMENT
1.	An examiner’s amendment to the record appears below. Should the changes and/or additions be unacceptable to applicant, an amendment may be filed as provided by 37 CFR 1.312. To ensure consideration of such an amendment, it MUST be submitted no later than the payment of the issue fee.

2.	Authorization for this examiner’s amendment was given in an interview with Mr. Joseph Probst on 07/28/2022.

3.	The application has been amended as follows:
In the claims:
1. (Currently Amended) A computing system, comprising:
one or more processors; and
one or more non-transitory computer-readable media that collectively store: 
a pre-trained projection neural network configured to receive a language input comprising one or more units of text and to dynamically generate an intermediate representation from the language input, the projection neural network comprising:
a sequence of one or more projection layers, wherein each projection layer is configured to receive a layer input and apply a plurality of projection layer functions to the layer input to generate a projection layer output; and
a sequence of one or more intermediate layers configured to receive the projection layer output generated by a last projection layer in the sequence of one or more projection layers and to generate one or more intermediate layer outputs, wherein the intermediate representation comprises the intermediate layer output generated by a last intermediate layer in the sequence of one or more intermediate layers; 
instructions that, when executed by the one or more processors, cause the computing system to perform operations, the operations comprising:
obtaining the language input;
inputting the language input into the pre-trained projection neural network; and
receiving the intermediate representation as an output of the pre-trained projection neural network.
2. (Original) The computing system of claim 1, wherein:
the one or more non-transitory computer-readable media further collectively store a machine-learned prediction model configured to receive the intermediate representation and to generate a prediction from the intermediate representation; and
the operations further comprise:
inputting the intermediate representation into the machine-learned prediction model; and
receiving the prediction as an output of the machine-learned prediction model.
3. (Currently Amended) The computing system of claim 1, wherein the pre-trained projection neural network was previously trained as part of an autoencoder model, the autoencoder model comprising:
the pre-trained projection neural network configured to receive the language input and to generate the intermediate representation; and
a decoder model configured to receive the intermediate representation and to generate a reconstructed language input based on the intermediate representation.
4. The computing system of claim 3, wherein the autoencoder model is trained to maximize a probability of the reconstructed language input matching the language input on a token-by-token basis.
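For illustration only, the token-by-token objective recited above can be sketched as a summed log-probability that training would seek to maximize. The function name, the dictionary-per-position representation of the decoder output, and the probability floor are assumptions made for this sketch and do not appear in the record.

```python
import math

def reconstruction_log_likelihood(token_distributions, target_tokens):
    """Sum of log-probabilities the decoder assigns to each original token.

    token_distributions: one dict per position mapping token -> probability
    (a hypothetical stand-in for the decoder output).
    target_tokens: the tokens of the original language input.
    """
    total = 0.0
    for distribution, token in zip(token_distributions, target_tokens):
        probability = distribution.get(token, 0.0)
        # Floor the probability so log() is defined for unseen tokens.
        total += math.log(max(probability, 1e-12))
    return total

decoder_output = [{"a": 0.5, "b": 0.5}, {"a": 0.25, "b": 0.75}]
score = reconstruction_log_likelihood(decoder_output, ["a", "b"])
```

A higher score means the reconstructed input matches the original input more closely at each token position.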
5. (Currently Amended) The computing system of claim 1, wherein the pre-trained projection neural network was previously trained as a projection skip-gram model configured to receive an input word and to predict a plurality of context words surrounding the input word.
6. (Currently Amended) The computing system of claim 5, wherein the projection skip-gram model was trained using an objective function that includes a regularization term that provides a penalty that has a magnitude that is positively correlated with a sum of a cosine similarity between the respective intermediate representations produced by the projection neural network for each pair of words in a training batch.
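For illustration only, one regularization term of the kind recited above might be computed as follows. The linear scaling by `weight`, the use of unordered pairs, and the treatment of zero vectors are assumptions of this sketch; the claim requires only that the penalty be positively correlated with the summed cosine similarity.

```python
import math
from itertools import combinations

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def pairwise_cosine_penalty(representations, weight=1.0):
    """Penalty that grows with the summed cosine similarity over all word pairs."""
    total = sum(cosine_similarity(a, b)
                for a, b in combinations(representations, 2))
    return weight * total

# Intermediate representations for a batch of three words (made-up values).
batch = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
penalty = pairwise_cosine_penalty(batch)
```

In this batch only the first and third representations are identical, so they alone contribute to the penalty; the orthogonal pairs contribute nothing.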
7. (Currently Amended) The computing system of claim 2, wherein one or both:
the projection neural network was previously trained using an unsupervised learning technique and at least the machine-learned prediction model was trained using a supervised learning technique; or
the projection neural network was previously trained using a first set of training data comprising a first plurality of training examples and at least the machine-learned prediction model was trained using a second, different set of training data comprising a second plurality of training examples.
8. (Currently Amended) The computing system of claim 1, wherein the projection neural network further comprises a feature extraction layer configured to receive the language input and generate a feature vector that comprises features extracted from the language input, wherein the layer input for a first projection layer of the one or more projection layers comprises the feature vector, and wherein the features extracted from the language input comprise one or more of the following: skip-grams; n-grams; part of speech tags; dependency relationships; knowledge graph information; or contextual information.
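For illustration only, two of the feature types listed above (n-grams and skip-grams) could be extracted and folded into a fixed-length feature vector along the following lines. The function names, the skip-gram definition used here, and the CRC32 hashing into count buckets are assumptions of this sketch, not details taken from the record.

```python
import zlib

def char_ngrams(text, n):
    """All contiguous character n-grams of `text`."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def word_skip_bigrams(tokens, max_skip):
    """Ordered token pairs separated by at most `max_skip` intervening tokens."""
    pairs = []
    for i in range(len(tokens)):
        for j in range(i + 1, min(i + max_skip + 2, len(tokens))):
            pairs.append((tokens[i], tokens[j]))
    return pairs

def hashed_feature_vector(features, size):
    """Count vector built by hashing each string feature into one of `size` buckets."""
    vector = [0] * size
    for feature in features:
        bucket = zlib.crc32(feature.encode("utf-8")) % size
        vector[bucket] += 1
    return vector

tokens = ["a", "b", "c"]
features = char_ngrams("cat", 2) + ["_".join(p) for p in word_skip_bigrams(tokens, 1)]
feature_vector = hashed_feature_vector(features, size=16)
```

The resulting `feature_vector` plays the role of the layer input for the first projection layer.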
9. (Original) The computing system of claim 1, wherein, for each projection layer, the plurality of projection layer functions are precomputed and held static.
10. (Original) The computing system of claim 1, wherein, for each projection layer, the plurality of projection layer functions are modeled using locality sensitive hashing.
11. (Original) The computing system of claim 1, the operations further comprise:
dynamically computing the plurality of projection layer functions at inference time using one or more seeds.
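For illustration only, computing projection functions at inference time from seeds might look like the sketch below, in which each seed deterministically regenerates a set of random projection vectors so that no projection matrix needs to be stored. The Gaussian draw, the function names, and the 0/1 output encoding are assumptions of this sketch.

```python
import random

def projection_vectors_from_seed(seed, dim, num_vectors):
    """Regenerate the same random projection vectors every time from `seed`."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)]
            for _ in range(num_vectors)]

def project_with_seeds(layer_input, seeds, bits_per_function):
    """Apply one seeded projection function per seed; each yields a list of bits."""
    outputs = []
    for seed in seeds:
        vectors = projection_vectors_from_seed(
            seed, len(layer_input), bits_per_function)
        bits = [1 if sum(x * w for x, w in zip(layer_input, v)) >= 0 else 0
                for v in vectors]
        outputs.append(bits)
    return outputs

layer_input = [0.2, -1.3, 0.7, 0.0]
first_run = project_with_seeds(layer_input, seeds=[1, 2], bits_per_function=8)
second_run = project_with_seeds(layer_input, seeds=[1, 2], bits_per_function=8)
```

Because the vectors are a pure function of the seed, repeated calls with the same seeds produce identical outputs.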
12. (Original) The computing system of claim 1, wherein the projection neural network performs natural language processing without initializing, loading, or storing any feature or vocabulary weight matrices.
13. (Original) The computing system of claim 1, wherein, for each projection layer, each projection function is associated with a respective set of projection vectors, and wherein applying each projection function to the layer input comprises:
for each projection vector: 
determining a dot product between the layer input and the projection vector; 
when the dot product is negative, assigning a first value to a corresponding position in the projection function output; and 
when the dot product is positive, assigning a second value to the corresponding position in the projection function output. 
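For illustration only, the sign-of-dot-product steps recited above can be sketched as follows. The choice of 0 and 1 as the first and second values, and the handling of a dot product of exactly zero (not addressed by the claim, and grouped with the positive case here), are assumptions of this sketch.

```python
def apply_projection_function(layer_input, projection_vectors,
                              first_value=0, second_value=1):
    """Map a layer input to one output position per projection vector."""
    output = []
    for vector in projection_vectors:
        # Dot product between the layer input and this projection vector.
        dot = sum(x * w for x, w in zip(layer_input, vector))
        # Negative -> first value; otherwise -> second value.
        # (A dot product of exactly zero is treated as positive: an assumption.)
        output.append(first_value if dot < 0 else second_value)
    return output

# Hand-picked projection vectors so the result is easy to verify.
vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
bits = apply_projection_function([1.0, -2.0], vectors)
```

Here the three dot products are 1.0, -2.0, and -1.0, so `bits` is `[1, 0, 0]`.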
14. (Original) The computing system of claim 1, wherein, for each projection layer, the projection functions are each encoded as sparse matrices and are used to generate a binary representation from the layer input.
15. (Original) The computing system of claim 1, wherein the intermediate representation comprises a numerical feature vector.
16. (Currently Amended) A computer-implemented method to pre-train a projection neural network comprising one or more projection layers and one or more intermediate layers, each projection layer configured to apply one or more projection functions to project a layer input into a different dimensional space, the projection neural network configured to receive an input and to generate an intermediate representation for the input, the method comprising:
accessing, by one or more computing devices, a set of training data comprising a plurality of example inputs;
inputting, by the one or more computing devices, each of the plurality of example inputs into the projection neural network;
receiving, by the one or more computing devices, a respective intermediate representation for each of the plurality of example inputs as an output of the projection neural network; 
inputting, by the one or more computing devices, each respective intermediate representation into a decoder model configured to reconstruct inputs based on intermediate representations; 
receiving, by the one or more computing devices, a respective reconstructed input for each of the plurality of example inputs as an output of the decoder model; and
learning, by the one or more computing devices, one or more parameter values for the one or more intermediate layers of the projection neural network based at least in part on a comparison of each respective reconstructed input to the corresponding example input.
17. (Currently Amended) The computer-implemented method of claim 16, wherein learning, by the one or more computing devices, the one or more parameter values for the one or more intermediate layers of the projection neural network based at least in part on the comparison of each respective reconstructed input to the corresponding example input comprises jointly training, by the one or more computing devices, the projection neural network and the decoder to maximize a probability of each respective reconstructed input matching the corresponding example input on a token-by-token basis.
18. (Currently Amended) The computer-implemented method of claim 16, further comprising, after learning the one or more parameter values:
providing, by the one or more computing devices, the projection neural network for use as a transferable natural language representation generator.
19. (Currently Amended) A computer-implemented method to pre-train a projection neural network comprising one or more projection layers and one or more intermediate layers, each projection layer configured to apply one or more projection functions to project a layer input into a different dimensional space, the projection neural network configured to receive an input and to generate an intermediate representation for the input, the method comprising:
accessing, by one or more computing devices, a set of training data comprising a plurality of input words, wherein a respective set of ground truth context words are associated with each of the plurality of input words;
inputting, by the one or more computing devices, each of the plurality of input words into the projection neural network;
receiving, by the one or more computing devices, a respective intermediate representation for each of the plurality of input words as an output of the projection neural network; 
determining, by the one or more computing devices, a set of predicted context words for each of the plurality of input words based at least in part on the respective intermediate representation for each of the plurality of input words; and
learning, by the one or more computing devices, one or more parameter values for the one or more intermediate layers of the projection neural network based at least in part on a comparison, for each input word, of the respective set of predicted context words to the respective set of ground truth context words.
20. (Original) The computer-implemented method of claim 19, wherein learning, by the one or more computing devices, the one or more parameter values comprises optimizing, by the one or more computing devices, a negative sampling objective function.
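For illustration only, a negative sampling objective in the commonly used skip-gram form can be sketched as below: ground truth context words are pushed toward the input word's intermediate representation and randomly sampled non-context words are pushed away. The record does not specify the exact form of the objective, so the loss expression, the function names, and the vector arguments are all assumptions of this sketch.

```python
import math

def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def negative_sampling_loss(input_repr, context_vectors, negative_vectors):
    """Loss to minimize for one input word.

    input_repr: intermediate representation of the input word.
    context_vectors: output vectors of ground truth context words.
    negative_vectors: output vectors of sampled non-context words.
    """
    loss = 0.0
    for context in context_vectors:
        loss -= log_sigmoid(dot(input_repr, context))
    for negative in negative_vectors:
        loss -= log_sigmoid(-dot(input_repr, negative))
    return loss

word = [1.0, 0.0]
good_fit = negative_sampling_loss(word, [[2.0, 0.0]], [[-2.0, 0.0]])
poor_fit = negative_sampling_loss(word, [[-2.0, 0.0]], [[2.0, 0.0]])
```

The loss is lower when the context vectors align with the input representation and the sampled negatives do not, which is the direction in which the parameter values would be adjusted.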
/QUYNH H NGUYEN/
Primary Examiner, Art Unit 2652