DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statement(s) (IDS) submitted on July 17, 2020 is/are being considered by the examiner.

Specification
The title of the invention is not descriptive.  A new title is required that is clearly indicative of the invention to which the claims are directed. 
Appropriate correction is required.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 3 and 11 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or, for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.
Where applicant acts as his or her own lexicographer to specifically define a term of a claim contrary to its ordinary meaning, the written description must clearly redefine the claim term and set forth the uncommon definition so as to put one reasonably skilled in the art on notice that the applicant intended to so redefine that claim term. Process Control Corp. v. HydReclaim Corp., 190 F.3d 1350, 1357, 52 USPQ2d 1029, 1033 (Fed. Cir. 1999). The term “plus,” as indicated by the symbol “+,” in claims 3 and 11 is used by the claim to “represent embedding stitching,” while the accepted meaning is “addition or summation of two values.” The term is indefinite because the specification does not clearly redefine the term.
For examination purposes, the plus symbol is treated as having its accepted meaning.
Appropriate correction is required.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1, 2, 4, 8-10, 12, 16, and 17 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Trask (U.S. Pat. App. Pub. No. 2016/0247061, hereinafter Trask).

Regarding claim 1, Trask discloses A method based on a neural network model, comprising: (“a neural network… trained using a “skip-gram” training style.”; Trask, ¶¶ [0045]) acquiring a plurality of training samples (“One or more training examples are run, where for each training example, the words surrounding the focus term are input into their respective input nodes corresponding to the appropriate partitions.”; Trask, ¶¶ [0042]), each of the training samples comprising an identifier of an input word, an identifier of an output word, and position information, (Each of the training samples includes “c^(i/j), is the location specific representation (partition j) for the word {identifier of an input word} at window position j {position information} relative to the focus word w {identifier of an output word}.”; Trask, ¶¶ [0045]) wherein in one of the training samples, the output word is a context of the input word (The training samples include a “location specific representation for the word at window position j relative to the focus word w” where the focus word {output word} is determined in light of the location specific representation of a word based on a position relative to {a context of...} the focus word {the input word}. 
Further, it is noted that skip gram models necessarily include context between the input word and output word, as the question being asked in a skip gram model is probability of a word being nearby to another word (i.e., the context).; Trask, ¶¶ [0045], [0042]), and the position information indicates a relative position between the output word and the input word (the “window position j {position information}” of the location specific representation c^(i/j) {the output word} is “relative to the focus word w {relative… between the output word and the input word}”; Trask, ¶¶ [0045]); calling a position relation-based Continuous Skip-gram Model (Skip-Gram), with each of the training samples as an input, to obtain an output result (“In some embodiments, a neural network of the present disclosure can be trained using a ‘skip-gram’ training style” where “one partition will receive [a focus term] as input … [and] corresponding output nodes will then generate an output.”; Trask, ¶¶ [0045]-[0046]), the output result comprising a word matrix embedding of the input word, a word matrix embedding of the output word and a position matrix embedding of the position information (The output is “probability that the focus term {output word} is a given value based on the word {input word} in the position {position information}” where the output of the nodes can be represented by a matrix, thus each of the above can be a matrix embedding of the input word, the output word and the position information.; Trask, ¶¶ [0046], [0030]); and updating the position relation-based Skip-Gram based on the output result to train the position relation-based Skip-Gram (“The actual output is compared to the preferred output {...based on the output result}, and then back-propagated through the neural network to update the synapse matrices according to any back-propagation algorithm, as is known in the art of neural networks {...updating the position relation-based Skip-Gram}.” which “can be repeated on 
additional sequences in the linguistic corpus, until the network is sufficiently accurate {...to train the position relation-based Skip-Gram.}.”; Trask, ¶¶ [0046]).

Regarding claim 2, Trask discloses performing sampling on a training text; ("word2vec’s skip-gram architecture to learn 100 dimensional feature representations for words in the corpus by using hierarchical softmax and negative sampling (with 10 samples) for the focus word representations."; Trask, ¶¶ [0088]) performing word segmentation on the sampled training text to obtain a plurality of words; ("Pre-processing consists of lower-casing, separating words from punctuation, and removing HTML tags. After learning word representations, stop-words are filtered out using NLTK’s [4] stop word corpus and review-level representations are created by averaging the features for each remaining word in the review."; Trask, ¶¶ [0088]) searching for an identifier of each of the plurality of words in a dictionary; ("In some embodiments, the neural network 200 has an input layer 210 of nodes that encodes this input using “one-hot” encoding. That is, one input node exists for each linguistic unit in a dictionary of linguistic units that could be used,"; Trask, ¶¶ [0032]) selecting identifiers of two words that are within a predetermined distance range, (As disclosed with reference to an embodiment, "FIG. 4 depicts a windowed partitioned embodiment having two partitions, one for the word immediately preceding the focus term (p=+1) and one for the word immediately following the focus term (p=−1). The network is shown here, analyzing the phrase “SEE SPOT RUN,” where the focus term is “SPOT.” Here, three hidden nodes are used for the p=+1 partition, and three hidden nodes are used for the p=−1 partition. 
Again, as a result of inputting “SEE” to the p=+1 partition, and “RUN” to the p=−1 partition, the network predicts that the focus term is ‘SPOT.’"; Trask, ¶¶ [0033]) wherein an identifier of one of the two words is selected as an identifier of the input word and an identifier of the other one of the two words is selected as an identifier of the output word; (Using the example of "See spot run," "See" can be treated as the input word and spot may be treated as the output word; Trask, ¶¶ [0033]) determining the first position of the output word relative to the center word and the second position of the input word relative to the center word; and (All positions for all words, indicated with reference to the example as "p=+1 partition...p=−1 partition" and others, are indicated relative to an arbitrary center word. As such, the first position of the output word, at p=0, is relative to the center word, and the second position of the input word at p=+1 is relative to the center word.; Trask, ¶¶ [0033]) generating the training samples according to the identifier of the input word, the identifier of the output word and the position information (Using the same example, the training samples in the skip gram model use “SPOT” as "the focus term", where "one partition will receive as input ‘SEE’, and one partition will receive as input ‘RUN’" to create the "location specific representation (partition j) for the word at window position j relative to the focus word w."; Trask, ¶¶ [0033], [0046]).
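For illustration only, the sample-generation steps mapped above can be sketched as follows. This is a minimal sketch and is not code from Trask or from the application; the function name make_samples, the window parameter, the dictionary contents, and the sign convention (a negative offset means the input word precedes the output word) are assumptions made for the example.

```python
def make_samples(tokens, vocab, window=1):
    """Return (input_id, output_id, relative_position) triples.

    The output word is the focus word; the input word is a neighbour
    inside the window. All names here are hypothetical.
    """
    samples = []
    for t, focus in enumerate(tokens):
        for offset in range(-window, window + 1):
            k = t + offset
            # Skip the focus word itself and positions outside the text.
            if offset == 0 or k < 0 or k >= len(tokens):
                continue
            samples.append((vocab[tokens[k]], vocab[focus], offset))
    return samples


# Hypothetical dictionary mapping each word to an identifier.
vocab = {"see": 0, "spot": 1, "run": 2}
samples = make_samples(["see", "spot", "run"], vocab, window=1)
```

With the focus word "spot" (identifier 1), the sketch yields one sample for "see" at offset -1 and one for "run" at offset +1.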

Regarding claim 4, the rejection of claim 1 is incorporated. Trask further discloses wherein updating the position relation-based Skip-Gram based on the output result comprises: determining, based on the output result, a probability that the output word in the training sample appears at a position corresponding to the relative position; and (“The output of the neural network, which correlates with the percent chance that the output word is the focus term, is then compared to a preferred output where the focus term (“SPOT”) is 100% (or a corresponding maximum output value) and the output for all other linguistic units is 0% (or a corresponding minimum output value).”; Trask, ¶¶ [0046]) updating, based on the probability, the weight matrix, the bias embedding, the word matrix embedding and the position matrix embedding of the position relation-based Skip-Gram according to a training target (“The actual output is compared to the preferred output, and then back-propagated through the neural network to update the synapse matrices according to any back-propagation algorithm, as is known in the art of neural networks.”; Trask, ¶¶ [0046]).
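For illustration only, the comparison of the actual output with the preferred output described in the cited passage can be sketched as follows. The softmax normalization, the function names, and the example scores are assumptions of this sketch; the cited passage states only that any back-propagation algorithm may be used.

```python
import math


def softmax(scores):
    """Convert raw output-node scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def output_error(scores, target_index):
    """Difference between the actual output and the preferred (one-hot) output.

    The preferred output is 1.0 for the focus term and 0.0 for every other
    linguistic unit. With a softmax output and a cross-entropy loss (an
    assumption here), this difference is the error signal that
    back-propagation would pass to earlier layers.
    """
    probs = softmax(scores)
    target = [1.0 if i == target_index else 0.0 for i in range(len(scores))]
    return [p - t for p, t in zip(probs, target)]


# Hypothetical scores for a three-word dictionary; index 1 is the focus term.
error = output_error([0.0, 2.0, 0.0], target_index=1)
```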

Regarding claim 8, the rejection of claim 1 is incorporated. Trask further discloses further comprising: acquiring a word to be predicted; and (“Using the baseline scores for comparison, the present inventors evaluate cross-language models on the Spanish corpus. {acquiring a word to be predicted}”; Trask, ¶¶ [0095]) calling the trained position relation-based Continuous Skip-Gram to predict a context of the word to be predicted (“the models” as trained on an English corpus “are able to classify polarity {predict the context of the word to be predicted} with 80% accuracy,”; Trask, ¶¶ [0095]).

Regarding claim 9, Trask discloses An apparatus based on a neural network model, comprising (the computer 700 incorporating the “neural network… trained using a ‘skip-gram’ training style.”; Trask, ¶¶ [0097], [0045]): a processor (“processing unit 702”; Trask, ¶¶ [0097]); and a memory configured to store instructions executable by the processor, wherein the processor is configured to: (“a system memory 704, and a system bus 706 that couples the memory 704 to the processing unit 702. The computer 700 further includes a mass storage device 712 for storing program modules 714. The program modules 714 can include computer-executable modules for performing the one or more functions associated with FIGS. 1-6.”; Trask, ¶¶ [0097]) acquire a plurality of training samples (“One or more training examples are run, where for each training example, the words surrounding the focus term are input into their respective input nodes corresponding to the appropriate partitions.”; Trask, ¶¶ [0042]), each of the training samples comprising an identifier of an input word, an identifier of an output word, and position information, (Each of the training samples includes “c^(i/j), is the location specific representation (partition j) for the word {identifier of an input word} at window position j {position information} relative to the focus word w {identifier of an output word}.”; Trask, ¶¶ [0045]) wherein in one of the training samples, the output word is a context of the input word (The training samples include a “location specific representation for the word at window position j relative to the focus word w” where the focus word {output word} is determined in light of the location specific representation of a word based on a position relative to {a context of...} the focus word {the input word}. 
Further, it is noted that skip gram models necessarily include context between the input word and output word, as the question being asked in a skip gram model is probability of a word being nearby to another word (i.e., the context).; Trask, ¶¶ [0045], [0042]), and the position information indicates a relative position between the output word and the input word (the “window position j {position information}” of the location specific representation c^(i/j) {the output word} is “relative to the focus word w {relative… between the output word and the input word}”; Trask, ¶¶ [0045]); call a position relation-based Continuous Skip-gram Model (Skip-Gram), with each of the training samples as an input, to obtain an output result (“In some embodiments, a neural network of the present disclosure can be trained using a ‘skip-gram’ training style” where “one partition will receive [a focus term] as input … [and] corresponding output nodes will then generate an output.”; Trask, ¶¶ [0045]-[0046]), the output result comprising a word matrix embedding of the input word, a word matrix embedding of the output word and a position matrix embedding of the position information (The output is “probability that the focus term {output word} is a given value based on the word {input word} in the position {position information}” where the output of the nodes can be represented by a matrix, thus each of the above can be a matrix embedding of the input word, the output word and the position information.; Trask, ¶¶ [0046], [0030]); and update the position relation-based Skip-Gram based on the output result to train the position relation-based Skip-Gram (“The actual output is compared to the preferred output {...based on the output result}, and then back-propagated through the neural network to update the synapse matrices according to any back-propagation algorithm, as is known in the art of neural networks {...updating the position relation-based Skip-Gram}.” which “can be repeated on 
additional sequences in the linguistic corpus, until the network is sufficiently accurate {...to train the position relation-based Skip-Gram.}.”; Trask, ¶¶ [0046]).

Regarding claim 10, the rejection of claim 9 is incorporated. Claim 10 is substantially the same as claim 2 and is therefore rejected under the same rationale as above.

Regarding claim 12, the rejection of claim 9 is incorporated. Claim 12 is substantially the same as claim 4 and is therefore rejected under the same rationale as above.

Regarding claim 16, the rejection of claim 9 is incorporated. Claim 16 is substantially the same as claim 8 and is therefore rejected under the same rationale as above.

Regarding claim 17, Trask discloses A non-transitory computer-readable storage medium having stored therein instructions that, when executed by at least one processor of an apparatus based on a neural network model, cause the apparatus to: (“The computer 700 further includes a mass storage device 712 for storing program modules 714. The program modules 714 can include computer-executable modules for performing the one or more functions associated with FIGS. 1-6,” where the mass storage device can be a “computer-readable storage media … [which] does not include transitory signals.”; Trask, ¶¶ [0097]-[0098]) acquire a plurality of training samples (“One or more training examples are run, where for each training example, the words surrounding the focus term are input into their respective input nodes corresponding to the appropriate partitions.”; Trask, ¶¶ [0042]), each of the training samples comprising an identifier of an input word, an identifier of an output word, and position information, (Each of the training samples includes “c^(i/j), is the location specific representation (partition j) for the word {identifier of an input word} at window position j {position information} relative to the focus word w {identifier of an output word}.”; Trask, ¶¶ [0045]) wherein in one of the training samples, the output word is a context of the input word (The training samples include a “location specific representation for the word at window position j relative to the focus word w” where the focus word {output word} is determined in light of the location specific representation of a word based on a position relative to {a context of...} the focus word {the input word}.
Further, it is noted that skip gram models necessarily include context between the input word and output word, as the question being asked in a skip gram model is probability of a word being nearby to another word (i.e., the context).; Trask, ¶¶ [0045], [0042]), and the position information indicates a relative position between the output word and the input word (the “window position j {position information}” of the location specific representation c^(i/j) {the output word} is “relative to the focus word w {relative… between the output word and the input word}”; Trask, ¶¶ [0045]); call a position relation-based Continuous Skip-gram Model (Skip-Gram), with each of the training samples as an input, to obtain an output result (“In some embodiments, a neural network of the present disclosure can be trained using a ‘skip-gram’ training style” where “one partition will receive [a focus term] as input … [and] corresponding output nodes will then generate an output.”; Trask, ¶¶ [0045]-[0046]), the output result comprising a word matrix embedding of the input word, a word matrix embedding of the output word and a position matrix embedding of the position information (The output is “probability that the focus term {output word} is a given value based on the word {input word} in the position {position information}” where the output of the nodes can be represented by a matrix, thus each of the above can be a matrix embedding of the input word, the output word and the position information.; Trask, ¶¶ [0046], [0030]); and update the position relation-based Skip-Gram based on the output result to train the position relation-based Skip-Gram (“The actual output is compared to the preferred output {...based on the output result}, and then back-propagated through the neural network to update the synapse matrices according to any back-propagation algorithm, as is known in the art of neural networks {...updating the position relation-based Skip-Gram}.” which “can be repeated on 
additional sequences in the linguistic corpus, until the network is sufficiently accurate {...to train the position relation-based Skip-Gram.}.”; Trask, ¶¶ [0046]).

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 3 and 11 are rejected under 35 U.S.C. 103 as being unpatentable over Trask in view of Li (U.S. Pat. App. Pub. No. 2020/0184339, hereinafter Li) and Gu (CN Pub. No. 107239444 A, hereinafter Gu).

Regarding claim 3, Trask discloses wherein the position relation-based Skip-Gram comprises an input layer, a hidden layer and an output layer that are sequentially connected (Discloses a skip-gram model for a neural network which "can comprise a plurality of layers of neural nodes (i.e., “neurons”). In some embodiments, a neural network can comprise an input layer, a hidden layer, and an output layer."; Trask, ¶¶ [0024]). However, Trask fails to expressly recite the hidden layer having a weight matrix W and a bias embedding B, W∈R^(Nx(N+M)), B∈R^(N+M), where R represents a dictionary, N represents dimensions of a word matrix embedding of the input word and a word matrix embedding of the output word, M represents dimension of the position matrix embedding, and + represents embedding stitching.
Li teaches systems and methods for text classification. (Li, ¶ [0002]). Regarding claim 3, Li teaches the hidden layer having a weight matrix W and a bias embedding B, W∈R^(Nx(N+M)), B∈R^(N+M), (Discloses a topic sparse encoder which can be employed for skip-gram models, including a hidden layer in which W∈R^(Dw×V) is a weight matrix and b∈R^(Dw) is a hidden bias vector.; Li, ¶¶ [0053], Table 1) where R represents a dictionary, (R represents a set including a dictionary; Li, ¶¶ [0053], Table 1) N represents dimensions of a word matrix embedding of the input word and a word matrix embedding of the output word, (Dw represents the dimensions of the word related embeddings {where word related embeddings includes dimensions of a word matrix embedding of the input word and a word matrix embedding of the output word}; Li, ¶¶ [0046], Table 1).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the neural network language modeling systems of Trask to incorporate the teachings of Li to include the hidden layer having a weight matrix W and a bias embedding B, W∈R^(Nx(N+M)), B∈R^(N+M), where R represents a dictionary, N represents dimensions of a word matrix embedding of the input word and a word matrix embedding of the output word. The systems described in Li can “exploit both entity and topic modeling” to “extract discriminative representations of questions from a limited number of words” which can “improve the representation learning of questions,” as recognized by Li. (Li, ¶ [0026]). However, Trask and Li fail to expressly recite N represents dimensions of a word matrix embedding of the input word and a word matrix embedding of the output word, M represents dimension of the position matrix embedding, and + represents embedding stitching.
Gu teaches “a fusion of speech and position information of word vector training method and system.” (Gu, pg. 1, para 2). Regarding claim 3, Gu teaches N represents dimensions of a word matrix embedding of the input word and a word matrix embedding of the output word, ("word-speech relation weight matrix M, wherein the row dimension of matrix M is the word property marking the type of centralized speech, the elements in the matrix M is the row of the element corresponding to the part of speech of the word with the column of the element corresponding to the occurrence probability of the part of speech of the word {dimensions of a word matrix embedding of the input word and a word matrix embedding of the output word}"; Gu, pg. 2, para 13) M represents dimension of the position matrix embedding, ("The speech information marking for modelling to construct word associated weight matrix M and the relative position i to the corresponding words for speech modeling construct associated with the position corresponding to the position weight matrix Mi"; Gu, pg. 2, paras 8 and 13) and + represents embedding stitching (The position weight matrix Mi is combined, thus the operation performed is embedding stitching; Gu, pg. 2, paras 8 and 13).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the neural network language modeling systems of Trask, as modified by the text classification systems of Li, to incorporate the teachings of Gu to include N represents dimensions of a word matrix embedding of the input word and a word matrix embedding of the output word, M represents dimension of the position matrix embedding, and + represents embedding stitching. The invention of Gu “provide[s] a fusion of speech and position information of word vector training method and system, so as to solve the technical problem of fusion of speech granularity the research work of part of speech information,” as recognized by Gu. (Gu, Pg. 2, para 4).
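For illustration only, the distinction drawn in this Office action between "+" as summation and "+" as embedding stitching can be sketched as follows. Reading "embedding stitching" as vector concatenation is an assumption of this sketch, as are the function names and the example values; the code is not taken from the application or from any cited reference.

```python
def stitch(word_vec, pos_vec):
    """Concatenate an N-dimensional word vector with an M-dimensional
    position vector, giving a vector of dimension N + M (assumed reading
    of "embedding stitching")."""
    return list(word_vec) + list(pos_vec)


def elementwise_sum(u, v):
    """Ordinary summation of two vectors; requires equal dimensions."""
    if len(u) != len(v):
        raise ValueError("summation requires equal dimensions")
    return [a + b for a, b in zip(u, v)]


word_vec = [0.1, 0.2, 0.3]  # N = 3, hypothetical values
pos_vec = [0.5, 0.6]        # M = 2, hypothetical values

stitched = stitch(word_vec, pos_vec)  # dimension N + M = 5
```

Under this reading, stitching changes the dimension of the result, whereas summation preserves it and is undefined when N and M differ.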

Regarding claim 11, the rejection of claim 9 is incorporated. Claim 11 is substantially the same as claim 3 and is therefore rejected under the same rationale as above.

Claims 5, 7, 13, and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Trask as applied to claims 4 and 12 above, and further in view of Giridhari (U.S. Pat. App. Pub. No. 2018/0357531, hereinafter Giridhari).

Regarding claim 5, the rejection of claim 4 is incorporated. Trask discloses all of the elements of the current invention as stated above. However, Trask fails to expressly recite wherein updating, based on the probability, the weight matrix, the bias embedding, the word matrix embedding and the position matrix embedding of the position relation-based Skip-Gram according to the training target comprises: calculating a log-likelihood function according to the probability that the output word in the training sample appears at the position corresponding to the relative position; and updating, by taking maximization of the log-likelihood function as the training target, the weight matrix, the bias embedding, the word matrix embedding and the position matrix embedding of the position relation-based Skip-Gram.
Giridhari teaches systems and methods for “text classification and feature selection.” (Giridhari, ¶ [0001]). Regarding claim 5, Giridhari teaches wherein updating, based on the probability, the weight matrix, the bias embedding, the word matrix embedding and the position matrix embedding of the position relation-based Skip-Gram according to the training target comprises: calculating a log-likelihood function according to the probability that the output word in the training sample appears at the position corresponding to the relative position; and (The system "minimizes the difference between dot product of word vectors {… that the output word in the training sample} and the logarithm of words co-occurrence probability {a log likelihood function according to the probability…}," where co-occurrence indicates appearing at the position corresponding to the relative position; Giridhari, ¶¶ [0004]) updating, by taking maximization of the log-likelihood function as the training target, the weight matrix, the bias embedding, the word matrix embedding and the position matrix embedding of the position relation-based Skip-Gram (The system includes that "vectors are updated by maximizing the likelihood L using stochastic gradient ascent" where "the document vectors and the word vectors are learned jointly {… by taking maximization of the log-likelihood function as the training target}"; Giridhari, ¶¶ [0024]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the neural network language modeling systems of Trask to incorporate the teachings of Giridhari to include wherein updating, based on the probability, the weight matrix, the bias embedding, the word matrix embedding and the position matrix embedding of the position relation-based Skip-Gram according to the training target comprises: calculating a log-likelihood function according to the probability that the output word in the training sample appears at the position corresponding to the relative position; and updating, by taking maximization of the log-likelihood function as the training target, the weight matrix, the bias embedding, the word matrix embedding and the position matrix embedding of the position relation-based Skip-Gram. The text classification systems described in Giridhari can provide for “improved classification accuracy” and “maximize the prediction probability of the co-occurrence of words,” as recognized by Giridhari. (Giridhari, ¶ [0022]).

Regarding claim 7, the rejection of claim 5 is incorporated. Trask and Giridhari disclose all of the elements of the current invention as stated above. However, Trask fails to expressly recite wherein updating, by taking the maximization of the log-likelihood function as the training target, the weight matrix, the bias embedding, the word matrix embedding and the position matrix embedding of the position relation-based Skip-Gram comprises: updating, using a stochastic gradient descent learning algorithm, the weight matrix, the bias embedding, the word matrix embedding and the position matrix embedding of the position relation-based Skip-Gram.
The relevance of Giridhari is disclosed above with relation to claim 5. Regarding claim 7, Giridhari further teaches wherein updating, by taking the maximization of the log-likelihood function as the training target, the weight matrix, the bias embedding, the word matrix embedding and the position matrix embedding of the position relation-based Skip-Gram comprises: updating, using a stochastic gradient descent learning algorithm, the weight matrix, the bias embedding, the word matrix embedding and the position matrix embedding of the position relation-based Skip-Gram ("The word vectors are updated by maximizing the likelihood L using stochastic gradient ascent" where "The document vectors and the word vectors are learned jointly."; Giridhari, ¶¶ [0024]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the neural network language modeling systems of Trask to incorporate the teachings of Giridhari to include wherein updating, by taking the maximization of the log-likelihood function as the training target, the weight matrix, the bias embedding, the word matrix embedding and the position matrix embedding of the position relation-based Skip-Gram comprises: updating, using a stochastic gradient descent learning algorithm, the weight matrix, the bias embedding, the word matrix embedding and the position matrix embedding of the position relation-based Skip-Gram. The text classification systems described in Giridhari can provide for “improved classification accuracy” and “maximize the prediction probability of the co-occurrence of words,” as recognized by Giridhari. (Giridhari, ¶ [0022]).
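For illustration only, maximizing a log-likelihood by stochastic gradient ascent, which is equivalent to stochastic gradient descent on the negative log-likelihood, can be sketched as follows. The one-parameter logistic model, the data, the learning rate, and all names are assumptions chosen to keep the sketch short; it is not the formula recited in claims 6 and 14 and is not code from Giridhari.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood(theta, data):
    """Sum of ln p(y | x) over all samples for a one-parameter logistic model."""
    total = 0.0
    for x, y in data:
        p = sigmoid(theta * x)
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total


def stochastic_gradient_ascent(data, lr=0.1, steps=200, seed=0):
    """Update theta one randomly drawn sample at a time.

    Adding lr * gradient of the log-likelihood is the same update as
    subtracting lr * gradient of the negative log-likelihood.
    """
    rng = random.Random(seed)
    theta = 0.0
    for _ in range(steps):
        x, y = rng.choice(data)
        grad = (y - sigmoid(theta * x)) * x  # d/d(theta) of ln p(y | x)
        theta += lr * grad
    return theta


# Hypothetical samples: label 1 for positive samples, 0 for negative samples.
data = [(1.0, 1), (2.0, 1), (-1.0, 0), (-2.0, 0)]
theta = stochastic_gradient_ascent(data)
```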

Regarding claim 13, the rejection of claim 12 is incorporated. Claim 13 is substantially the same as claim 5 and is therefore rejected under the same rationale as above.

Regarding claim 15, the rejection of claim 13 is incorporated. Claim 15 is substantially the same as claim 7 and is therefore rejected under the same rationale as above.


Allowable Subject Matter
Claims 6 and 14 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
The following is an examiner’s statement of reasons for indicating allowable subject matter: 
Regarding claims 6 and 14, the closest prior art of record, Trask, teaches the combination of limitations recited in claims 1 and 4. Further, Giridhari teaches the combination of limitations recited in dependent claim 5.
However, none of the prior art references of record, either alone or in combination, teaches, suggests, or makes obvious the combination of limitations as recited in the independent claims.
More specifically, the limitations of “wherein calculating the log-likelihood function according to the probability that the output word in the training sample appears at the position corresponding to the relative position comprises: calculating the log-likelihood function by using the following formula:  

[Formula image: media_image1.png (log-likelihood function)]

where $(w_{t+i}, w_t, pos_i)$ is a positive sample, $(w_j, w_t, pos_{c+1})$ is a negative sample, T is a number of training samples, t is a serial number of an identifier of an output word, $w_t$ is an identifier of a t-th output word, 2c is a size of a context window of an output word, -c and c respectively represent positions of two ends in the context window, c+1 represents a position outside the context window, the position of $w_t$ within the context window of the identifier of the t-th output word is 0, the position of $w_{t+i}$ within the context window of the identifier of the t-th output word is i, $pos_i$ represents a position of $w_{t+i}$ relative to $w_t$, n is a number of negative samples in a training sample using $w_t$ as the identifier of the output word, $w_j$ is an identifier of an input word in a j-th negative sample in the training sample using $w_t$ as the identifier of the output word, $pos_{c+1}$ represents a position of $w_j$ relative to $w_t$, ln represents a log function having a base e, $p(w_t \mid w_{t+i}, pos_i)$ is a probability that an identifier of an output word at a position corresponding to $pos_i$ is $w_t$ when the identifier of the input word is $w_{t+i}$, and $p(w_t \mid w_j, pos_{c+1})$ is a probability that an identifier of an output word at a position corresponding to $pos_{c+1}$ is $w_t$ when the identifier of the input word is $w_j$, where
[Formula image: media_image2.png]
exp represents an exponential function having a base e; $\vec{w}_k^{T}$ is a transposed matrix of $\vec{w}_k$, $\vec{w}_k$ is a word matrix embedding of an input word corresponding to $w_k$, and $w_k$ represents an identifier of an input word in any negative sample; $\vec{w}_{t+i}^{T}$ is a transposed matrix of $\vec{w}_{t+i}$, $\vec{w}_{t+i}$ is a word matrix embedding of an input word corresponding to $w_{t+i}$, $\vec{pos}_i$ is a position matrix embedding corresponding to $pos_i$, $\vec{w}_j^{T}$ is a transposed matrix of $\vec{w}_j$, $\vec{w}_j$ is a word matrix embedding of an input word corresponding to $w_j$, $\vec{pos}_{c+1}$ is a position matrix embedding corresponding to $pos_{c+1}$; and • represents matrix point multiplication” is not taught by the prior art of record.
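
For orientation only, the sample structure described in the quoted limitation can be sketched as follows. The sketch builds positive samples as (input word, output word, relative position i) triples for each position i from -c to c other than 0, and negative samples as (input word, output word, c+1) triples, with n negatives per output word. It does not reproduce the claimed formula, which is contained in the formula images. Several choices are assumptions not found in the record: negatives are drawn uniformly at random from a vocabulary of hypothetical size `vocab_size`, windows are truncated at the ends of the word sequence, a drawn negative is not checked against the true context words, and the function name `build_samples` is invented for the example.

```python
import random


def build_samples(word_ids, c, n, vocab_size, rng):
    """Return (positives, negatives) as lists of (input_id, output_id, position)."""
    positives, negatives = [], []
    for t, w_t in enumerate(word_ids):
        # Positive samples: every in-window neighbour at relative position i != 0.
        for i in range(-c, c + 1):
            if i == 0 or not 0 <= t + i < len(word_ids):
                continue  # assumption: truncate the window at sequence edges
            positives.append((word_ids[t + i], w_t, i))
        # Negative samples: n input words tagged with position c + 1,
        # i.e. a position outside the context window.
        for _ in range(n):
            w_j = rng.randrange(vocab_size)  # assumption: uniform draw
            negatives.append((w_j, w_t, c + 1))
    return positives, negatives


if __name__ == "__main__":
    pos, neg = build_samples([3, 1, 4, 1, 5], c=1, n=2, vocab_size=10,
                             rng=random.Random(0))
    print(len(pos), "positive samples;", len(neg), "negative samples")
```

With c = 1 and a five-word sequence, the two end words each contribute one positive sample and the three interior words contribute two each, and every negative sample carries the out-of-window position c + 1 = 2.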
	

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Non-Patent Literature to Guthrie et al. (David Guthrie, Ben Allison, Wei Liu, Louise Guthrie, and Yorick Wilks. 2006. A Closer Look at Skip-gram Modelling. In Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06), Genoa, Italy. European Language Resources Association (ELRA), hereinafter Guthrie) discloses various methods of using skip-gram modeling to overcome data sparsity.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Sean E. Serraguard whose telephone number is (313) 446-6627. The examiner can normally be reached 07:00-17:00 M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel C. Washburn can be reached on (571) 272-5551. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/Sean E Serraguard/Patent Examiner, Art Unit 2657                                                                                                                                                                                                        

/DANIEL C WASHBURN/Supervisory Patent Examiner, Art Unit 2657