Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Detailed Action
This office action is responsive to the Amendments and Remarks filed on 13 September 2021.  As directed by the Amendment, claims 1, 10, and 12 have been amended, claim 21 has been canceled, and claim 22 has been added.  Claims 1-10, 12-20, and 22 are pending in the application.

Objections to the Specification
The disclosure is objected to because of the following informalities: The word “LANGAUGE” in the title of the application is misspelled and should be replaced with “LANGUAGE.”
Appropriate correction is required.

Claim Objections
Claims 1 and 12 are objected to because of the following informalities:  
In claim 1, line 2, there should be a space between the terms “entries” and “based”.
In claim 12, line 3, the phrase “in a neural network mode” should instead read “in a neural network model”, based on the language of the instant specification and of independent claims 1 and 10.


Response to Arguments
The arguments presented on pages 8-10 of the Remarks filed on 13 September 2021 have been fully considered by the Examiner.  These arguments are not persuasive with respect to the independent claims, but the Examiner notes that the subject matter of dependent claim 22 has been identified as “Allowable Subject Matter” below.
On page 9 of the Remarks, the Applicant states:

    [Image media_image1.png: excerpt from page 9 of the Applicant's Remarks; not reproduced here]

	The Examiner notes that the present amendments to representative independent claim 1 are broader than the proposed amendments that the Examiner indicated would distinguish over the previously cited references during the interview conducted on 03 The participants discussed a potential amendment to the claims regarding the feature disclosed in at least paragraphs [0031] and [0034] of the instant disclosure, in which the embedding matrix entries representing the class memberships of vocabulary words are set to a constant value K, determined empirically by selecting an arbitrary value for K, training the model for a portion of the set of training data, and comparing its predictions to training data held in reserve. The Examiner indicated that such an amendment would overcome the previously cited art of record.”
	
Of the above-discussed features, substantially only “selecting an arbitrary value for K” was incorporated into the independent claims, in the form of the limitation “determining a non-zero value for initializing entries.”
The Examiner contends that the previously cited combination of Luan and Bai fairly reads on the amended limitations of representative independent claim 1.  In particular and as described in more detail in the rejections below, the system and method of Bai generate a sparse binary vector for each word in a training vocabulary, wherein each element of the vector corresponds to the word’s membership or non-membership in a particular word class.  Vector entries corresponding to a class of which the word is a member are initialized to “1”, while entries corresponding to classes of which the word is not a member are initialized to “0”.  (Bai, pg. 99, § IIIC "Sentence Feature Representation," "Here we proposed a method to transfer the word vectors into a sentence vector" [Vector VT, where aij = 1 [corresponds to claimed “non-zero value”] if the word vi belongs to class tj, and aij = 0 otherwise])


	The arguments regarding the remaining claims are based upon the claims’ similarity to claim 1, or upon their dependence from their respective base claim.  These arguments are not addressed separately here.


Allowable Subject Matter
Claim 22 is objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Claim 22 recites [t]he method of claim 1, wherein determining the non-zero value is performed empirically by selecting an arbitrary initial value and training the neural network model for a portion of a set of training data, then comparing predictions of the neural network model to training data held in reserve.
As discussed in the rejections below, the combination of Luan, Bai, and Malon teaches [t]he method of claim 1.  Also as discussed above in the “Response to Arguments” section and in the rejections below, Bai teaches selecting a non-zero value (in the case of Bai, this value is “1”) to initialize vector entries corresponding to a word class of which a word is a member.  As Bai’s choice of “1” for a non-zero value serves only to distinguish such entries from those entries of “0” corresponding to classes of which the word is not a member, this fairly reads on the claimed “selecting an arbitrary initial value,” as non-zero values other than “1” could also be chosen to distinguish from entries that are initialized to “0”.
However, the combination of Luan, Bai, and Malon as applied to claim 1 does not teach or disclose wherein determining the non-zero value is performed empirically
-or-
training the neural network model for a portion of a set of training data, then comparing predictions of the neural network model to training data held in reserve.
	In the initialization method as presently claimed, a non-zero value for initialization is determined empirically by first choosing an arbitrary value as an initialization value, followed by using the resulting initialized matrix of vectors to train a neural network, and then comparing the prediction results of the neural network to a portion of the training data that has been held in reserve.  This method of first selecting an arbitrary initialization value for non-zero entries in an embedding matrix, training a network, and validating its results allows for an “empirical” determination of an appropriate non-zero 
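For illustration only, the empirical determination recited in claim 22 (select an arbitrary value, train on a portion of the training data, compare predictions to data held in reserve) might be sketched in Python as follows. The loop over several candidate values, the callables train_fn and accuracy_fn, the toy stand-ins, and the toy data are all assumptions introduced for this sketch; none of them is taken from the claims, the instant specification, or the cited references, and no real neural network is trained.

```python
def choose_nonzero_value(candidates, train_part, held_out, train_fn, accuracy_fn):
    """Return the candidate K with the best held-out accuracy, and that accuracy."""
    best_k, best_acc = None, -1.0
    for k in candidates:
        model = train_fn(k, train_part)      # train using only a portion of the data
        acc = accuracy_fn(model, held_out)   # compare predictions to reserved data
        if acc > best_acc:
            best_k, best_acc = k, acc
    return best_k, best_acc


# Hypothetical stand-ins so the sketch runs; they do no real learning.
def toy_train(k, train_part):
    # Placeholder: the "model" is just the initialization value K itself.
    return k


def toy_accuracy(model, held_out):
    # Predict label 1 when K * feature reaches a fixed threshold of 1.0.
    hits = sum(1 for x, label in held_out if int(model * x >= 1.0) == label)
    return hits / len(held_out)


data = [(0.2, 0), (0.4, 0), (0.6, 1), (0.8, 1), (0.3, 0), (0.7, 1)]
train_part, held_out = data[:2], data[2:]   # hold the last four examples in reserve
best_k, best_acc = choose_nonzero_value([1.0, 2.0, 10.0], train_part, held_out,
                                        toy_train, toy_accuracy)
```

The point of the sketch is only the ordering of steps: the value is fixed before training, and its suitability is judged afterward against data the model did not see during training.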

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.

The following claim limitations are being interpreted under 35 U.S.C. 112(f):
“an initializing module configured to initialize”, “a training module configured to train”, and “a language processing module configured to perform a language processing task”, all first recited in claim 12.
and
“a class module configured to determine a first set of classes”, first recited in claim 13.


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1, 9-10, 12, and 20 are rejected under 35 U.S.C. § 103 as being unpatentable over Luan et al., "Efficient learning for spoken language understanding tasks with word embedding based pre-training," Interspeech 2015, Mitsubishi Electric Research Laboratories TR2015-097, hereinafter "Luan" (previously cited) in view of Bai et al., "A Sentiment Analysis Method for Facial Expression Generation in Human-Robot Interactive Communication," 2014 International Conference on Virtual Reality and Visualization, hereinafter "Bai" (previously cited) and Malon et al. (US 2014/0236578, hereinafter “Malon”) (previously cited).

Regarding claim 1, Luan discloses [a] method for language processing, comprising: […] in a neural network model; […] initializing weights in the neural network model using the word embedding matrix; (Luan, pg. 3, Col. 2, ¶ 3, "our approach uses word embedding as an additional input layer of a neural network, which mitigates the overtraining problem.  To accomplish this, we initialize the affine transformation of the first layer by using a word embedding matrix estimated from a large-scale general corpus with unsupervised training methods (pre-training)" [corresponds to initializing weights in a neural network model using a word embedding matrix.]) training the neural network, based on a training corpus; (Ibid., "Then, the entire SLU [Spoken Language Understanding] network is trained with the annotated training data [corresponds to a training corpus], with word embeddings fine-tuned to the SLU task.";  Luan, § 3.1 "Feed-forward architecture" and Fig. 1, "XBOW is a Bag-of-Words vector of the input utterance with dimension of vocabulary size V, Phi is a word embedding matrix initially learned from word2vec with dimensions n x V, where n is the dimension of the word embedding.  Eq. 6 is an affine transformation.  W is the weight matrix between hidden layer and output layer.  Fine-tuning is achieved by updating phi together with W." [Fine-tuning the network (corresponds to training the neural network) involves updating the values of phi (corresponds to the word embedding matrix), along with the values of the inter-layer weight matrix W.  Therefore, fine-tuning uses an annotated training corpus to update the weights that were initialized with the binary values from the word embedding matrix.  Note also that Luan is using the same word2vec tool as described below in Bai to generate word embedding matrices]) and performing a language processing task using the neural network language model. (Luan, § 4.1 "Intention understanding" and Tables 1 and 2, "The experimental results are shown in Table 1. By using the word2vec features, the performance was improved from the baseline system by 2.7% (dev.) and 2.4% (eval) (absolute reduction in error). With the fine-tuning, the performance further improves to 3.3% (dev.) and 3.6% (eval), absolutely. This result confirms the effectiveness of fine-tuning for our intention understanding task."
[the tables show the results of detecting intentions and goals from input utterances with a neural network model using a baseline "bag of words" input, a BoW input with the embedding matrix from word2vec, and a BoW input with the embedding matrix from word2vec and subsequent fine-tuning using an annotated training corpus as described above.])
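For illustration only, the first-layer computation quoted from Luan above, in which a Bag-of-Words vector XBOW of vocabulary size V is passed through an n x V word embedding matrix Phi, might be sketched in Python as follows. The function name and the numeric values are hypothetical, and the sketch assumes a plain matrix-vector product with no bias term; Luan's remaining layers and its fine-tuning procedure are not shown.

```python
def embed_bag_of_words(phi, x_bow):
    """Multiply an n x V embedding matrix by a length-V bag-of-words count vector."""
    return [sum(weight * count for weight, count in zip(row, x_bow)) for row in phi]


# Hypothetical values: n = 2 embedding dimensions, V = 3 vocabulary words.
phi = [
    [0.5, -1.0, 0.0],
    [2.0, 0.0, 1.0],
]
x_bow = [1, 0, 2]   # word 0 occurs once, word 2 occurs twice in the utterance

hidden_input = embed_bag_of_words(phi, x_bow)
```

Because each column of Phi is the embedding of one vocabulary word, the product is the count-weighted sum of the embeddings of the words in the utterance, which is why the initial values of Phi act as initial first-layer weights.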

Luan does not explicitly disclose determining a non-zero value for initializing entries[ ]based on training data 
-or-
initializing a first portion of a word embedding matrix, which is associated with pre-determined word classes based on word clustering, such that matrix entries associated with a class of which a word is a member are initialized to the non-zero value and other entries are initialized to zero.
Bai teaches determining a non-zero value for initializing entries[ ]based on training data (Bai, pg. 99, § IIIB “Word Representation” “A tool (https://code.google.com/p/word2vec/) was utilized to represent words, which takes a text corpus as input and produces the word vectors as output. It first constructs a vocabulary from the training text data [corresponds to claimed “training data”] and then learns vector representation of words.”;
Bai, pg. 99, § IIIC "Sentence Feature Representation," "Here we proposed a method to transfer the word vectors into a sentence vector" [Vector VT, where aij = 1 [corresponds to claimed “non-zero value”] if the word vi belongs to class tj, and aij = 0 otherwise]) [The training text data contains information on whether a particular vocabulary word does or does not belong to a particular word class, which naturally lends itself to a binary vector representation wherein an entry of “1” for a vector element indicates that the vocabulary word does belong to the particular word class, while an entry of “0” for a vector element indicates that the vocabulary word does not belong to the particular word class.  This choice of “1” for the claimed “non-zero value for initialized entries” is based on the binary nature of the training data used (corresponds to claimed “based on training data”).]
-and-
initializing a first portion of a word embedding matrix, which is associated with pre-determined word classes based on word clustering, such that matrix entries associated with a class of which a word is a member are initialized to the non-zero value and other entries are initialized to zero; (Bai, § IIIC "Sentence Feature Representation," "Here we proposed a method to transfer the word vectors into a sentence vector" [Vector VT, where aij = 1 [corresponds to claimed “non-zero value”] if the word vi belongs to class tj, and aij = 0 otherwise]; Ibid., "The implicit word classes were used to be the features of a sentence.  Assume S = {s1, s2, ..., sm} is a set of sentences." […] “Let’s denote the set of word cluster result by T, and T = {t1, t2, …, tn}, [where] n is the number of word classes” “And V = {v1, v2, ..., vn} is the vocabulary, which is a set of unique words in S.  Therefore, the word2vec tool was used to word clustering in V firstly.” [That is, word2vec is used to cluster words across vocabulary V, identifying word classes for each word in the vocabulary.  Vector VT contains a cell for each word/class combination, each cell containing a one or a zero depending on whether or not the word belongs to the class, respectively])
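For illustration only, the word/class representation attributed to Bai above (aij = 1 if word vi belongs to class tj, and aij = 0 otherwise) might be sketched in Python as follows. The function name, the toy vocabulary, the class labels, and the nonzero parameter are hypothetical and do not appear in Bai; the sketch also assumes that each word belongs to exactly one class.

```python
def class_membership_matrix(vocab, classes, word_to_class, nonzero=1.0):
    """Build a |V| x |T| matrix whose entry (i, j) is `nonzero` if word i is in class j, else 0."""
    class_index = {name: j for j, name in enumerate(classes)}
    matrix = []
    for word in vocab:
        row = [0.0] * len(classes)                       # non-member entries stay zero
        row[class_index[word_to_class[word]]] = nonzero  # member entry gets the non-zero value
        matrix.append(row)
    return matrix


# Hypothetical toy data: three words clustered into two classes.
vocab = ["happy", "glad", "sad"]
classes = ["t1", "t2"]
word_to_class = {"happy": "t1", "glad": "t1", "sad": "t2"}

VT = class_membership_matrix(vocab, classes, word_to_class)
```

Note that two words in the same class ("happy" and "glad" here) receive identical rows under this representation.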
Bai is analogous art, as it is in the field of natural language processing.
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to use the word2vec tool as disclosed in Luan to produce an embedding matrix of words and their corresponding classes as taught by Bai, the benefit being that using a distributed representation of words such as an embedding matrix avoids the “curse of dimensionality,” where model inputs become very large as the size of the input vocabulary grows, as cited by Bai at pg. 99, § IIIB “Word Representation,” “In order to fight the curse of dimensionality, we employed the 

The combination of Luan and Bai further does not teach initializing a second portion of the word embedding matrix, which is not associated with the pre-determined word classes, with random values.
Malon teaches initializing a second portion of the word embedding matrix, which is not associated with the pre-determined word classes, with random values (Malon, Figures 1 and 2 and ¶ [0024] “In FIG. 1 words are entered into an original language model database 12 which are fed to an n-dimensional vector 14. The same word is provided to a randomizer 22 that generates an m-dimensional vector 24. The result is an (n+m) dimensional vector 26 that includes the original part and the random part.” [The word vectors of Bai are populated with ones and zeros denoting the word’s membership or non-membership in each of several classes.  These individual word vectors may each be appended with the random vector entries of Malon as shown in Malon, Fig. 1.  In essence, the word vector creation of Bai serves as the “original language model database” 12 as depicted in Fig. 1 of Malon. When the individual word vectors (with appended random values as described in Malon) are combined into the embedding matrix VT as described in Bai, the leftmost n columns in the embedding matrix (where n is the number of classes) will be associated with pre-determined word classes, while the rightmost m columns will contain random values as described in Malon.])
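For illustration only, the combination described above (n class-membership columns followed by m randomly initialized columns) might be sketched in Python as follows. The function name, the seed, the value range, and the toy rows are hypothetical; Malon's randomizer 22 is not specified at this level of detail in the passages quoted above.

```python
import random


def append_random_columns(class_rows, m, seed=0, scale=0.1):
    """Append m random values to each n-dimensional row, giving (n + m)-dimensional rows."""
    rng = random.Random(seed)   # fixed seed only so the sketch is repeatable
    return [row + [rng.uniform(-scale, scale) for _ in range(m)] for row in class_rows]


# Hypothetical class-membership rows (n = 2); the first two words share a class.
class_rows = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]

embedding = append_random_columns(class_rows, m=3)
```

The leftmost n entries of every row are unchanged, while the appended entries make the first two rows distinguishable even though their class-membership portions are identical.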

Malon is analogous art, as it is in the field of machine learning using word vectors to train a neural network classifier.

It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to incorporate the randomly initialized appended vector entries of Malon with the word vectors and embedding matrix of Bai, the benefit being that by increasing the dimensionality of the word vectors and initializing the added vector elements with random values, the system is able to distinguish between two word vectors that would otherwise be identical if compared based solely on their class membership values, as recited by Malon in Fig. 2 [showing two words whose vectors are identical based on the original language model, but distinct from each other once random values have been appended to the end of each vector] and ¶ [0024] “Thus, in the resulting model, the first n dimensions [correspond to claimed “associated with pre-determined word classes”] always match the original model, but the remaining m [corresponds to claimed “random values”] can be used to distinguish or identify any word, including rare words.”

Claim 10 recites similar limitations as claim 1, and is rejected under the same rationale as applied to claim 1 above.

Regarding claim 9, the combination of references as applied to claim 1 above teaches [t]he method of claim 1.  Further, Bai teaches […] the word embedding matrix […] (Bai, § IIIC "Sentence Feature Representation," "Here we proposed a method to transfer the word vectors into a sentence vector" [Vector VT, where aij = 1 [corresponds to claimed predetermined non-zero value] if the word vi belongs to class tj, and aij = 0 otherwise]; Ibid., "The implicit word classes were used to be the features of a sentence.  Assume S = {s1, s2, ..., sm} is a set of sentences.  And V = {v1, v2, ..., vn} is the vocabulary, which is a set of unique words in S.  Therefore, the word2vec tool was used to word clustering in V firstly." [that is, word2vec was used to create the embedding matrix VT of words and their corresponding classes])
Malon teaches wherein the first portion […] includes a first set of columns and the second portion of the word embedding matrix includes a distinct second set of columns. (Malon, Figures 1 and 2 and ¶ [0024] “In FIG. 1 words are entered into an original language model database 12 which are fed to an n-dimensional vector 14. The same word is provided to a randomizer 22 that generates an m-dimensional vector 24. The result is an (n+m) dimensional vector 26 that includes the original part and the random part.” 
[The word vectors of Bai are populated with ones and zeros denoting the word’s membership or non-membership in each of several classes.  These individual word vectors may each be appended with the random vector entries of Malon as shown in Malon, Fig. 1.  In essence, the word vector creation of Bai serves as the “original language model database” 12 as depicted in Fig. 1 of Malon.]
[When the individual word vectors (with appended random values as described in Malon) are combined into the embedding matrix VT as described in Bai, the leftmost n columns in the embedding matrix (where n is the number of classes) will be associated with pre-determined word classes (corresponds to claimed “first set of columns”), while the rightmost m columns will contain random values as described in Malon (corresponds to claimed “distinct second set of columns”).])

Claim 20 recites similar limitations as claim 9, and is rejected under the same rationale as applied to claim 9 above.

Regarding claim 12, Luan discloses [a] system for language processing, comprising: […] and to initialize weights in a neural network model using the word embedding matrix; This element is being interpreted under 35 U.S.C. 112(f) as the neural network of Fig. 2, implemented in hardware or software, and the initializing procedure of ¶ [0035] (Luan, pg. 3, Col. 2, ¶ 3, "our approach uses word embedding as an additional input layer of a neural network, which mitigates the overtraining problem.  To accomplish this, we initialize the affine transformation of the first layer by using a word embedding matrix estimated from a large-scale general corpus with unsupervised training methods (pre-training) [corresponds to initializing weights in a neural network model.]) a training module configured to train the neural network, based on a training corpus; This element is being interpreted as the neural network of Fig. 2, Ibid., "Then, the entire SLU [Spoken Language Understanding] network is trained with the annotated training data [corresponds to a training corpus], with word embeddings fine-tuned to the SLU task.";  Luan, § 3.1 "Feed-forward architecture" and Fig. 1, "XBOW is a Bag-of-Words vector of the input utterance with dimension of vocabulary size V, Phi is a word embedding matrix initially learned from word2vec with dimensions n x V, where n is the dimension of the word embedding.  Eq. 6 is an affine transformation.  W is the weight matrix between hidden layer and output layer.  Fine-tuning is achieved by updating phi together with W. [Fine-tuning the network (corresponds to training the neural network) involves updating the values of phi (corresponds to the word embedding matrix), along with the values of the inter-layer weight matrix W.  Therefore, fine-tuning uses an annotated training corpus to update the weights that were initialized with the binary values from the word embedding matrix.  
Note also that Luan is using the same word2vec tool as described below in Bai to generate word embedding matrices] and a language processing module configured to perform a language processing task using the neural network language model. This element is being interpreted as the neural network of Fig. 2, implemented in hardware or software, and the training steps of Fig. 5 Block 510 and described in ¶ [0035] (Luan, § 4.1 "Intention understanding" and Tables 1 and 2, "The experimental results are shown in Table 1. By using the word2vec features, the performance was improved from the baseline system by 2.7% (dev.) and 2.4% (eval) (absolute reduction in error). With the fine-tuning, the performance further improves to 3.3% (dev.) and 3.6% (eval), absolutely. This result confirms the effectiveness of fine-tuning for our intention understanding task." [the tables show the results of detecting intentions and goals from input utterances with a neural network model using a baseline "bag of words" input, a BoW input with the embedding matrix from word2vec, and a BoW input with the embedding matrix from word2vec and subsequent fine-tuning using an annotated training corpus as described above.])

While Luan as discussed above discloses in a neural network mode[l], Luan does not explicitly disclose an initializing module configured to determine a non-zero value for initializing entries based on training data[…], to initialize a first portion of a word embedding matrix, which is associated with pre-determined word classes based on word clustering, such that matrix entries associated with a class of which a word is a member are initialized to the non-zero value and other entries are initialized to zero.
Bai teaches an initializing module configured to determine a non-zero value for initializing entries based on training data […], (Bai, § IIIC "Sentence Feature Representation," "Here we proposed a method to transfer the word vectors into a sentence vector" [Vector VT, where aij = 1 [corresponds to claimed non-zero value] if the word vi belongs to class tj, and aij = 0 otherwise]) [The choice of “0” and “1” for initialization values is based on the binary word/class relationships in the training data – either a word is a member of a particular word class (leading to an initialization value of “1”), or the word is not a member of the particular word class (leading to an initialization value of “0”)]
to initialize a word embedding matrix which is associated with pre-determined word classes based on word clustering, such that matrix entries associated with a class of which a word is a member are initialized to the non-zero value and other entries are initialized to zero; This element is being interpreted under 35 U.S.C. 112(f) as the processing system of Fig. 8 and the steps of Fig. 5, Block 506 described in ¶ [0033] of the instant specification. (Bai, § IIIC "Sentence Feature Representation," "Here we proposed a method to transfer the word vectors into a sentence vector" [Vector VT, where aij = 1 [corresponds to claimed predetermined non-zero value] if the word vi belongs to class tj, and aij = 0 otherwise]; Ibid., "The implicit word classes were used to be the features of a sentence.  Assume S = {s1, s2, ..., sm} is a set of sentences." […] “Let’s denote the set of word cluster result by T, and T = {t1, t2, …, tn}, [where] n is the number of word classes” “And V = {v1, v2, ..., vn} is the vocabulary, which is a set of unique words in S.  Therefore, the word2vec tool was used to word clustering in V firstly.” [That is, word2vec is used to cluster words across vocabulary V, identifying word classes for each word in the vocabulary.  Vector VT contains a cell for each word/class combination, each cell containing a zero or a one depending on whether the word belongs to the class])
Bai is analogous art, as it is in the field of natural language processing.
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to use the word2vec tool as disclosed in Luan to produce an embedding matrix of words and their corresponding classes as taught by Bai, the benefit being that using a distributed representation of words such as an embedding matrix avoids the “curse of dimensionality,” where model inputs become 

The combination of Luan and Bai does not teach and to initialize a second portion of the word embedding matrix, which is not associated with the pre-determined word classes, with random values.
Malon teaches to initialize a second portion of the word embedding matrix, which is not associated with the pre-determined word classes, with random values (Malon, Figures 1 and 2 and ¶ [0024] “In FIG. 1 words are entered into an original language model database 12 which are fed to an n-dimensional vector 14. The same word is provided to a randomizer 22 that generates an m-dimensional vector 24. The result is an (n+m) dimensional vector 26 that includes the original part and the random part.” [The word vectors of Bai are populated with ones and zeros denoting the word’s membership or non-membership in each of several classes.  These individual word vectors may each be appended with the random vector entries of Malon as shown in Malon, Fig. 1.  In essence, the word vector creation of Bai serves as the “original language model database” 12 as depicted in Fig. 1 of Malon. When the individual word vectors (with appended random values as described in Malon) are combined into the embedding matrix VT as described in Bai, the leftmost n columns in the embedding matrix (where n is the number of classes) will be associated with pre-determined word classes, while the rightmost m columns will contain random values as described in Malon.])

Malon is analogous art, as it is in the field of machine learning using word vectors to train a neural network classifier.

It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to incorporate the randomly initialized appended vector entries of Malon with the word vectors and embedding matrix of Bai, the benefit being that by increasing the dimensionality of the word vectors and initializing the added vector elements with random values, the system is able to distinguish between two word vectors that would otherwise be identical if compared based solely on their class membership values, as recited by Malon in Fig. 2 [showing two words whose vectors are identical based on the original language model, but distinct from each other once random values have been appended to the end of each vector] and ¶ [0024] “Thus, in the resulting model, the first n dimensions [correspond to claimed “associated with pre-determined word classes”] always match the original model, but the remaining m [corresponds to claimed “random values”] can be used to distinguish or identify any word, including rare words.”



Claims 2-3 and 13-14 are rejected under 35 U.S.C. § 103 as being unpatentable over Luan, Bai, and Malon and further in view of Yao et al., "Recurrent Neural Networks for Language Understanding," INTERSPEECH 2013, 25-29 August 2013, Lyon, France, pp. 2524-2528, hereinafter “Yao” (previously cited).

Regarding claim 2, the combination of references as applied to claim 1 above teaches [t]he method of claim 1.  
The above combination does not teach further comprising determining a first set of classes using a first classification method to use as the pre-determined word classes. 
Yao teaches further comprising determining a first set of classes using a first classification method to use as the pre-determined word classes (Yao, § 2.3 “we have considered the addition of information to the system via classes derived from the Wikipedia dataset. To do this, we sub-sampled 70M words from Wikipedia, and used the clustering criterion of Brown et al. [32] to group the words into 200 classes. These were then used rather than the named entity labels.”)
Yao is analogous art, as it is in the field of using neural networks in language processing.
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to modify the embedding matrices of Luan and Bai with the word classes of Yao, the benefit being that “The ATIS dataset provides two forms of input besides the words themselves: named-entity tags for the words, and syntactic labels. Previous researchers [35] have found it useful to exploit this 

Regarding claim 13, the combination of references as applied to claim 12 above teaches [t]he system of claim 12.  
The above combination does not explicitly teach further comprising a class module configured to determine a first set of classes using a first classification method to use as the pre-determined word classes.
Yao teaches further comprising a class module configured to determine a first set of classes using a first classification method to use as the pre-determined word classes.  This element is being interpreted under 35 U.S.C. 112(f) as the processing system of Fig. 8 and the class determination steps as described in ¶ [0037] of the instant disclosure. (Yao, § 2.3 “we have considered the addition of information to the system via classes derived from the Wikipedia dataset. To do this, we sub-sampled 70M words from Wikipedia, and used the clustering criterion of Brown et al. [32] to group the words into 200 classes. These were then used rather than the named entity labels.”)

Regarding claim 3, the combination of references as applied to claim 2 above teaches [t]he method of claim 2.  Further, Yao teaches wherein determining the first set of classes comprises performing Brown clustering on the training corpus. (§ 2.3 classes are derived from the Wikipedia dataset; 70 million words from the dataset 

Claim 14 recites similar limitations as claim 3, and is rejected under the same rationale as applied to claim 3 above.


Claims 4-6 and 15-17 are rejected under 35 U.S.C. § 103 as being unpatentable over the combination of Luan, Bai, Malon, and Yao in further view of Lee (US 2017/0154033) (previously cited).

Regarding claim 4, the combination of references as applied to claim 2 above teaches [t]he method of claim 2.  
The above combination does not teach wherein determining the first set of classes comprises providing a named entity list. 
Lee teaches wherein determining the first set of classes comprises providing a named entity list (Lee, ¶ [0105] “the method of collecting training data is not particularly limited.  The training data collector 710 can receive and collect a plurality of word sequences and corresponding class sequences from an external device, or it can receive a plurality of word sequences from an external device, and generate corresponding class sequences through a named entity recognition scheme or a part-of-speech tagging scheme using a dictionary or other resource.”)

It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to modify the word embedding matrix of Luan and Bai with the word classification sources of Lee, as providing a list of classified training data is one of several known methods of collecting training data, as cited by Lee at ¶ [0105], “the training data collector 710 can simply receive and collect a plurality of word sequences and class sequences corresponding to each of the word sequences from an external device, or can receive a plurality of word sequences from an external device and generate class sequences corresponding to each of the word sequences through a named entity recognition scheme or a part-of-speech tagging scheme using a dictionary or other resource.”

Claim 15 recites similar limitations as claim 4, and is rejected under the same rationale as applied to claim 4 above.

Regarding claim 5, the combination of references as applied to claim 2 above teaches [t]he method of claim 2.  
The above combination does not teach further comprising determining a second set of classes using a second classification method that is different from the first classification method, said first and second sets of classes being used together as the pre-determined word classes.  
Lee teaches further comprising determining a second set of classes using a second classification method that is different from the first classification method, said first and second sets of classes being used together as the pre-determined word classes (Lee, ¶ [0105] “the method of collecting training data is not particularly limited.  Classes may already be specified in the input data, or they may be defined by a named entity recognition scheme or a part-of-speech tagging scheme using a dictionary or other resource”) [pre-classified data, data classified by a named-entity recognition scheme, and data classified by a part-of-speech tagging scheme are all alternative classification methods].

Claim 16 recites similar limitations as claim 5, and is rejected under the same rationale as applied to claim 5 above.

Regarding claim 6, the combination of references as applied to claim 5 above teaches [t]he method of claim 5.  Further, Lee teaches wherein initializing the word embedding matrix comprises initializing one entry for each word for the first set of classes and one entry for each word for the second set of classes to the non-zero value. (Lee, ¶ [0108] “In this example, the input and the target may be expressed as one-hot vectors. For example, the input is expressed as a one-hot vector having a size of a word dictionary for which a location value of the word is 1 and other values are 0, and the target is expressed as a one-hot vector having a size of a class to be handled for which a location value of the class is 1 and other values are 0.” [the training matrix of one-hot input vectors and one-hot class vectors is operable to initialize each word's vector with a single "1" value, regardless of how the classes were determined])
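
For illustration only, the one-hot representation quoted from Lee at ¶ [0108] can be sketched as follows. This is a minimal sketch and not Lee's implementation: the vocabulary, the class list, the word-to-class mapping, and the function names `one_hot` and `encode` are hypothetical and do not appear in any cited reference.

```python
def one_hot(index, size):
    """Return a list of length `size` holding 1 at `index` and 0 elsewhere."""
    if not 0 <= index < size:
        raise ValueError("index out of range")
    vec = [0] * size
    vec[index] = 1
    return vec


# Hypothetical toy data, assumed for illustration (not taken from Lee).
vocabulary = ["boston", "denver", "monday", "flight"]
classes = ["CITY", "DAY", "OTHER"]
word_to_class = {
    "boston": "CITY",
    "denver": "CITY",
    "monday": "DAY",
    "flight": "OTHER",
}


def encode(word):
    """Return (input, target): the input is sized to the word dictionary,
    the target is sized to the number of classes, as in the quoted passage."""
    inp = one_hot(vocabulary.index(word), len(vocabulary))
    tgt = one_hot(classes.index(word_to_class[word]), len(classes))
    return inp, tgt


inp, tgt = encode("monday")
```

In this sketch each word's input vector and target vector contain exactly one non-zero entry, regardless of how the class assignment in `word_to_class` was obtained.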

Claim 17 recites similar limitations as claim 6, and is rejected under the same rationale as applied to claim 6 above.

Claims 7-8 and 18-19 are rejected under 35 U.S.C. § 103 as being unpatentable over Luan, Bai, and Malon, and further in view of Li et al., “Max-Margin Zero-Shot Learning for Multi-class Classification,” Proceedings of the 18th International Conference on Artificial Intelligence and Statistics (AISTATS), 2015, hereinafter “Li” (previously cited).

Regarding claim 7, the combination of references as applied to claim 1 above teaches [t]he method of claim 1.  
The above combination does not teach further comprising reducing a dimensionality of the word embedding matrix if the dimensionality is greater than a maximum size of the neural network. 
Li teaches further comprising reducing a dimensionality of the word embedding matrix if the dimensionality is greater than a maximum size of the neural network (Li, Pg. 628, § 3.1 describes a label matrix Y containing row vectors representing training inputs, with each row vector containing a single "1" entry corresponding to the assigned class for that instance; Li, Pg. 630, § 4.1, dimension reduction is performed on all of the input data matrices using PCA (Principal Component Analysis)).

It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to modify the embedding matrix of Luan and Bai with the dimensionality reduction of Li, the benefit being that dimensionality reduction allows for a substantial decrease in vector size, as cited in Li, Pg. 630, Col. 2, lines 17-20, “The feature vector of each image provided within this data set is a 9,751-dimensional vector, including local texture, HOG, edge and color descriptors. We performed dimension reduction on the long vectors using PCA and obtained 500-dimensional vectors for all the images.” [a reduction in vector dimensionality from 9,751 to 500]
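
For illustration only, dimensionality reduction by principal component analysis of the general kind described in Li, § 4.1, can be sketched as follows. This is a minimal sketch under stated assumptions and is not Li's implementation: the function name `pca_reduce`, the random toy data, and the sizes (50 dimensions reduced to 5) are hypothetical, whereas Li reports a reduction from 9,751 to 500 dimensions.

```python
import numpy as np


def pca_reduce(X, k):
    """Project the rows of X onto the top-k principal components."""
    X = np.asarray(X, dtype=float)
    # Center each feature so the components capture variance about the mean.
    centered = X - X.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing
    # singular value (that is, by decreasing explained variance).
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:k].T


# Hypothetical toy data: 20 feature vectors with 50 dimensions each.
rng = np.random.default_rng(0)
features = rng.normal(size=(20, 50))
reduced = pca_reduce(features, 5)  # 20 vectors with 5 dimensions each
```

Each row of `reduced` is a lower-dimensional representation of the corresponding row of `features`, with the retained dimensions ordered by the amount of variance they capture.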

Claim 18 recites similar limitations as claim 7, and is rejected under the same rationale as applied to claim 7 above.

Regarding claim 8, the combination of references as applied to claim 7 above teaches [t]he method of claim 7.  Further, Li teaches wherein reducing the dimensionality of the word embedding matrix is performed using principal component analysis (Li, Pg. 628, § 3.1 describes a label matrix Y containing row vectors representing training inputs, with each row vector containing a single "1" entry corresponding to the assigned class for that instance; Li, Pg. 630, § 4.1, “dimension reduction is performed on all of the input data matrices using PCA (Principal Component Analysis)”).

Claim 19 recites similar limitations as claim 8, and is rejected under the same rationale as applied to claim 8 above.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 


Contact Information
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SCOTT R GARDNER whose telephone number is (469)295-9128. The examiner can normally be reached 8:00am - 5:00pm M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ann J Lo, can be reached at 571-272-9767. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.