Detailed Action
This action is in response to Applicant's communications filed 15 June 2022.
Claims 1, 16, and 20 were amended.  No claims were withdrawn.  No claims were added.  Therefore, claims 1-3 and 5-20 are pending in this application.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 15 June 2022 has been entered.
 
Response to Amendments/Arguments
Applicant's arguments, filed 15 June 2022, regarding the rejections of claims 1-3 and 5-20 under 35 USC 103 have been fully considered but are not persuasive.
Applicant argues (Remarks, pp. 7-8) that the newly added claim language "determines, based on the user-entered word, whether to carry out an online training" in independent claims 1, 16, and 20 is not taught by the prior art combination of Coccaro, Mikolov, Shokouhi, and Pilehvar, because the prior art does not teach or suggest a step of determining, based on the user-entered word, whether to carry out an online training.  Examiner disagrees, as Mikolov teaches this limitation.  Examiner notes that the "online training" in the claim limitations refers to adding a new word to the vocabulary or updating an embedding in the vocabulary.  The claims do not limit how the determination is made; thus, the broadest reasonable interpretation of the determining step includes making the determination based on the frequency of the word.  Mikolov teaches that updating the embeddings of frequent words has little impact ("the vector representations of frequent words do not change significantly after training on several million examples" sec. 2.3, p. 4), so subsampling frequent words can improve training speed ("We also found that the subsampling of the frequent words results in both faster training and significantly better representations of uncommon words." sec. 7, p. 8).  To determine whether to carry out an online training for each word, Mikolov "used a simple subsampling approach: each word w_i in the training set is discarded with probability computed by the formula $P(w_i) = 1 - \sqrt{\frac{t}{f(w_i)}}$ (5) where f(w_i) is the frequency of word w_i and t is a chosen threshold, typically around $10^{-5}$. We chose this subsampling formula because it aggressively subsamples words whose frequency is greater than t while preserving the ranking of the frequencies." (sec. 2.3, pp. 4-5).  Therefore, Mikolov teaches the determining step of the amended claim limitations.
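For illustration only, the quoted subsampling formula can be read as a per-word gate that decides whether an occurrence of a word is used for a training update.  The sketch below is a minimal Python rendering under stated assumptions: the function names are hypothetical, clipping negative probabilities to zero (so words rarer than t are always kept) is an assumption of this sketch, and it is not presented as Mikolov's or Applicant's implementation.

```python
import math
import random
from collections import Counter
from typing import Optional


def discard_probability(freq: float, t: float = 1e-5) -> float:
    """Discard probability P(w_i) = 1 - sqrt(t / f(w_i)) from Mikolov eq. (5).

    `freq` is the relative frequency f(w_i) of the word. The formula goes
    negative for words rarer than t; clipping at 0 (never discard) is an
    assumption of this sketch.
    """
    if freq <= 0:
        raise ValueError("frequency must be positive")
    return max(0.0, 1.0 - math.sqrt(t / freq))


def keep_for_training(word: str, counts: Counter, total: int,
                      t: float = 1e-5,
                      rng: Optional[random.Random] = None) -> bool:
    """Hypothetical per-occurrence gate: True means use this word for an update."""
    rng = rng if rng is not None else random.Random()
    p_discard = discard_probability(counts[word] / total, t)
    return rng.random() >= p_discard
```

Under this reading, a very frequent word is skipped for most of its occurrences, while a word at or below the threshold frequency is always used for training, which is the frequency-based determination relied upon above.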
The rejection of the dependent claims for depending from rejected claims is maintained.
For the aforementioned reasons, claims 1-3, 5-20 are rejected under 35 USC 103.
Applicant's arguments, filed 16 March 2022, regarding the rejections of claims 1-3 and 5-20 under 35 USC 103 are directed to newly amended claims and are addressed in the current rejection.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1-3, 5-7, 11-14, and 16-20 are rejected under 35 U.S.C. 103 as being unpatentable over Coccaro et al. (US 2014/0278379, hereinafter "Coccaro") in view of Mikolov et al. (Distributed Representations of Words and Phrases and their Compositionality, hereinafter "Mikolov"), Shokouhi (Learning to Personalize Query Auto-Completion, hereinafter "Shokouhi"), and Pilehvar et al. (From Senses to Texts: An All-In-One Graph-Based Approach For Measuring Semantic Similarity, hereinafter "Pilehvar").

Regarding Claim 1,
Coccaro teaches a data input system at an electronic device for inputting text items to the electronic device ("Computing device 500 includes a processor 502, memory 504, a storage device 506" [0061]), comprising:
a store ("memory 504" [0061]) holding a vocabulary (FIG. 3, Semantic Model 330) of embeddings of text items ("vectors of semantic context models" [0020]), each embedding being a numerical encoding of a text item ("A variety of different types of semantic context information can be used, such as vectors of semantic context models (e.g., LSA models, latent dirichlet allocation (LDA) models) that represent semantic contexts (e.g., likelihood of words and/or topics appearing within a semantic context) and/or distances between words and semantic contexts." [0020]; each vector teaches the embedding being a numerical encoding of a text item, wherein the vector is defined by the values of a plurality of dimensions [0005]) that indicates weights for analysis of the text item in a neural network ("As indicated by the two connections 216 and 218 that are depicted as being darker/thicker, weights (w0 and w 1) can be associated with the connections 216 and 218 based, at least in part, on training of the neural network using appropriate training data. Weights can indicate a level of association between two nodes" [0037]);
a processor ("processor 502" [0061]) which: 
receives user input comprising one or more context text items (FIG. 1, Local Context: "Is the" 106; "Such a local context can include any of an appropriate number of preceding words uttered by the speaker (user A 102), such as 1 word, 2 words, 3 words, and/or 5 words." [0023]);
implements the neural network, the neural network trained to produce a prediction of a next text item in the sequence given the context text items and the vocabulary ("The computer system 100 can receive and used the local context 106 and the semantic context 108 for the dialog between user A 102 and user B 104 to determine probabilities that each of a vocabulary of words is likely to be a next word uttered by user A 102 in the dialog. As indicated by step A (122), the computer system 100 can access a neural network that includes an input layer, one or more hidden layers, and an output layer." [0030]);
implements online training to change the vocabulary (FIG. 3, Semantic Context Generator 325, Network 304; the semantic context generator is online due to being connected to a network; "The semantic context generator 325 may update and/or generate new semantic models periodically and/or on request. The semantic context generator 325 can additionally generate vectors and values to provide as semantic context input to neural networks using the semantic models 330, such as by generating a context vector from an LSA model and/or identifying distances for word vectors from the context vector." [0048]; this teaches updating the embeddings in the vocabulary); and 
propagates results to a final layer of the neural network (FIG. 2B; "In particular, the neural network 230 takes as input for a second portion 232 of nodes SC0-SCV in the input layer 202 distances 234 for words V from a semantic context. For instance, referring to the LSA example described above with regard to FIG. 1, vectors for each of the words V can be compared to a context vector for the T×K matrix to determine a distance for each of the words" [0041]; "As indicated by step C (126), the computer system 100 can generate probability values for candidate words by propagating, through the connections between nodes in the neural network, the values applied to the input layer of the neural network through the hidden layer(s) and to the output layer. As discussed above, the connections between nodes in the neural network can be weighted based on training for the neural network, which can cause the values that are generated at the output layer to be varied and based on the local and semantic contexts. The values of the nodes at the output layer can indicate probabilities that words corresponding to the nodes are likely to be a next word that will be uttered by the user A 102." [0032]).
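To make the quoted Coccaro passages ([0032], [0041]) concrete, the following is a minimal Python sketch of an input layer with a local-context portion and a semantic-context portion (one distance per vocabulary word), propagated through one hidden layer to an output layer of next-word probabilities.  The tanh activation, the softmax output, the layer sizes, and all function names are assumptions made for illustration; Coccaro does not specify them in the quoted text.

```python
import math
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def matvec(W: Matrix, x: Sequence[float]) -> List[float]:
    """Weighted sums along the connections into each node of the next layer."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(z: Sequence[float]) -> List[float]:
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]


def predict_next(local_ctx: Sequence[float], semantic_dists: Sequence[float],
                 W_hidden: Matrix, W_out: Matrix) -> List[float]:
    """Return one probability per vocabulary word for being the next word."""
    # Input layer: a local-context portion followed by a semantic-context
    # portion holding one distance per vocabulary word (cf. Coccaro [0041]).
    x = list(local_ctx) + list(semantic_dists)
    # Hidden layer: weighted connections followed by a nonlinearity (assumed tanh).
    hidden = [math.tanh(v) for v in matvec(W_hidden, x)]
    # Output layer: one node per vocabulary word, normalized to probabilities.
    return softmax(matvec(W_out, hidden))


if __name__ == "__main__":
    W_h = [[0.5] * 6, [-0.3] * 6]  # 2 hidden nodes, 6 input nodes (illustrative)
    W_o = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # 3 vocabulary words
    print(predict_next([1.0, 0.0, 0.0], [0.2, 0.9, 0.1], W_h, W_o))
```

The weights here are arbitrary; in Coccaro they result from training, so the output values vary with the local and semantic contexts as described in [0032].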

Coccaro does not explicitly teach determines, based on the user-entered word, whether to carry out an online training, computes, when determined to carry out the online training, one of: an embedding added to the vocabulary as a sequence of the one or more context text items followed by the new text item, or an update to an embedding already in the vocabulary for the sequence of the context items followed by the new text item.
Coccaro does not explicitly teach receives user input comprising one or more context text items followed by a new text item that is a user-entered word, the user-entered word being entered via individual letters, and comparing the new text item and the predicted next text item for learning.
Coccaro does not explicitly teach the user-entered word including one or more rare words, proper nouns, idiosyncrasies, or combinations thereof, wherein the new text item is not initially being in the vocabulary of the neural network, and wherein the user-entered word does not exist in the store.

Mikolov teaches determines, based on the user-entered word, whether to carry out an online training ("the vector representations of frequent words do not change significantly after training on several million examples" sec. 2.3, p. 4; "we used a simple subsampling approach: each word w_i in the training set is discarded with probability computed by the formula $P(w_i) = 1 - \sqrt{\frac{t}{f(w_i)}}$ (5) where f(w_i) is the frequency of word w_i and t is a chosen threshold, typically around $10^{-5}$. We chose this subsampling formula because it aggressively subsamples words whose frequency is greater than t while preserving the ranking of the frequencies." sec. 2.3, pp. 4-5; "We also found that the subsampling of the frequent words results in both faster training and significantly better representations of uncommon words." sec. 7, p. 8), 
computes, when determined to carry out the online training, one of: an embedding added to the vocabulary as a sequence of the one or more context text items followed by the new text item, or an update to an embedding already in the vocabulary for the sequence of the context items followed by the new text item ("many phrases have a meaning that is not a simple composition of the meanings of its individual words. To learn vector representation for phrases, we first find words that appear frequently together, and infrequently in other contexts... This way, we can form many reasonable phrases without greatly increasing the size of the vocabulary" sec. 4, pp. 5-6; the vector representation for phrases teaches adding embeddings for text items by including the preceding context items; it is noted that the claim language requires computing only one of the two recited alternatives).
Coccaro and Mikolov are analogous art because both are directed towards representing language in vector space. It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the semantic context word predictor of Coccaro with the phrase vector representations and subsampling of Mikolov.  The modification would have been obvious because one of ordinary skill in the art would be motivated to improve the training time and quality of the vectors, as suggested by Mikolov (Mikolov: Abstract, p.1; sec. 7, p. 8).
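The phrase-forming step quoted from Mikolov sec. 4 (finding words that "appear frequently together, and infrequently in other contexts") can be illustrated with a count-based bigram score of the discounted form (count(w_i w_j) - δ) / (count(w_i) × count(w_j)) described in that section.  The Python sketch below is a minimal illustration only: the discount δ, the threshold value, the underscore-joined phrase token, and the function names are hypothetical choices for this example.

```python
from collections import Counter
from typing import List, Set, Tuple

Bigram = Tuple[str, str]


def find_phrases(tokens: List[str], delta: float = 1.0,
                 threshold: float = 0.1) -> Set[Bigram]:
    """Return bigrams whose discounted co-occurrence score exceeds threshold."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    phrases: Set[Bigram] = set()
    for (a, b), n_ab in bigrams.items():
        # High when a and b co-occur often relative to their separate counts;
        # delta discounts bigrams that occur only rarely.
        score = (n_ab - delta) / (unigrams[a] * unigrams[b])
        if score > threshold:
            phrases.add((a, b))
    return phrases


def merge_phrases(tokens: List[str], phrases: Set[Bigram]) -> List[str]:
    """Replace each detected bigram with a single phrase token."""
    out: List[str] = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in phrases:
            out.append(tokens[i] + "_" + tokens[i + 1])
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out
```

Each merged token (e.g., a context word followed by the next word) can then receive its own vector, which corresponds to the examiner's reading of adding an embedding for a sequence of context items followed by a text item.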

Shokouhi teaches receives user input comprising one or more context text items followed by a new text item that is a user-entered word, the user-entered word being entered via individual letters ("Following each new character entered in the query box, search engines filter suggestions that match the updated prefix, and suggest the top-ranked candidates to the user." sec. 1, p. 103), and 
comparing the new text item and the predicted next text item for learning ("We propose a similar labelling strategy for personalized auto-completion; we start by sampling a set of impressions from search logs. For each sampled impression, we assume that the query that was eventually submitted by the user is the only right (or the most relevant) suggestion that should have been suggested right after the first key-stroke and all the way until submission. With this assumption in mind, we decompose each sampled query into all prefixes that lead to it and for each case we obtain all query candidates that match in the auto-completion trie.... For each pair of prefix and auto-completion list constructed this way, we assign positive label to the query submitted by the user at the end (if it appears in the list) and zero label to others... Once the training data is collected as described above, we can apply virtually any existing learning-to-rank algorithm for training a personalized auto-completion ranker." sec. 3, pp. 106-107).
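The labelling strategy quoted from Shokouhi can be sketched in a few lines of Python: each submitted query is decomposed into all of its prefixes, the candidates matching each prefix are collected, and the submitted query receives a positive label while the other candidates receive zero.  In this minimal sketch, a linear prefix filter over a candidate list stands in for the auto-completion trie, and the `top_k` cutoff and function name are assumptions for illustration.

```python
from typing import List, Tuple

Row = Tuple[str, List[Tuple[str, int]]]


def label_impression(submitted: str, candidates: List[str],
                     top_k: int = 10) -> List[Row]:
    """Build (prefix, [(candidate, label), ...]) rows for one impression."""
    rows: List[Row] = []
    for end in range(1, len(submitted) + 1):
        prefix = submitted[:end]  # every prefix leading to the submitted query
        # A linear prefix filter stands in for the auto-completion trie.
        matches = [c for c in candidates if c.startswith(prefix)][:top_k]
        # Positive label for the submitted query (if listed), zero otherwise.
        rows.append((prefix, [(c, int(c == submitted)) for c in matches]))
    return rows
```

The resulting rows compare the item the user actually entered against the suggested items at every keystroke, which is the comparison relied upon for the "comparing ... for learning" limitation.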
Coccaro and Shokouhi are analogous art because both are directed towards machine learning to provide a prediction of a next word. It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the semantic context word prediction of the Coccaro/Mikolov combination with the auto-completion learning method of Shokouhi.  The modification would have been obvious because one of ordinary skill in the art would be motivated to personalize auto-completion and outperform popularity-based rankers, as suggested by Shokouhi (Shokouhi: Abstract).

Pilehvar teaches the user-entered word including one or more rare words, proper nouns, idiosyncrasies, or combinations thereof, wherein the new text item is not initially being in the vocabulary of the neural network, and wherein the user-entered word does not exist in the store ("This can be particularly problematic when measuring semantic similarity of text pairs that contain many OOV words, such as infrequent named entities, acronyms or jargon. In order to alleviate this issue, we propose two novel techniques for handling OOV terms while measuring the semantic similarity of textual items." sec. 5.4, p. 110; infrequent teaches rare words, named entities teaches proper nouns, and jargon teaches idiosyncrasies; OOV words teaches the new text item not initially being in the vocabulary of the neural network and the user-entered word not existing in the store; "For the case of our example, we introduce, in each of the two semantic signatures, new dimensions corresponding to their missing terms, i.e., Steve_Ballmer_n and Microsoft_n for h1 and Microsoft_n for h2. Fig. 7 illustrates our direct OOV handling for the sentence h2. We set the associated weights of the newly-introduced dimensions to 0.5 so as to guarantee their placement among the top dimensions in their corresponding signatures. We utilize this approach for handling OOV entries in our text-level experiments (Section 6.4) and show that it can provide considerable performance improvement on datasets containing many OOV entries." sec. 5.4.1, p. 110).
Coccaro and Pilehvar are analogous art because both are directed towards representing language in vector space.  It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the semantic context word predictor of the Coccaro/Mikolov/Shokouhi combination with the OOV term handling of Pilehvar.  The modification would have been obvious because one of ordinary skill in the art would be motivated to improve the performance on datasets containing OOV entries, as discussed by Pilehvar ("We utilize this approach for handling OOV entries in our text-level experiments (Section 6.4) and show that it can provide considerable performance improvement on datasets containing many OOV entries." sec. 5.4.1, p. 110).
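The direct OOV handling quoted from Pilehvar sec. 5.4.1 amounts to adding, for each term missing from the vocabulary, a new dimension in the semantic signature with weight 0.5.  A minimal Python sketch follows, assuming a signature is represented as a mapping from dimension names to weights; the representation, the example terms in the usage note, and the function name are hypothetical, and only the 0.5 weight comes from the quoted text.

```python
from typing import Dict, Iterable, Set

OOV_WEIGHT = 0.5  # weight quoted from Pilehvar sec. 5.4.1


def add_oov_dimensions(signature: Dict[str, float], terms: Iterable[str],
                       vocabulary: Set[str]) -> Dict[str, float]:
    """Return a copy of `signature` with a new dimension for each OOV term."""
    updated = dict(signature)  # leave the original signature unchanged
    for term in terms:
        if term not in vocabulary:
            updated[term] = OOV_WEIGHT
    return updated
```

For example, a hypothetical signature {"software_n": 0.2, "company_n": 0.1} for a text containing the OOV term "Microsoft_n" gains a "Microsoft_n" dimension of weight 0.5, placing it at the top of the signature, consistent with Pilehvar's stated rationale for choosing that weight.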

Regarding Claim 2,
The Coccaro/Mikolov/Shokouhi/Pilehvar combination teaches the system of claim 1.  Coccaro further teaches wherein the online training is configured to determine whether a text item in the user input is a new text item by any one or more of: checking if an embedding of the text item is available in the vocabulary ("The semantic context generator 325 can generate contexts using a variety of techniques, such as LSA, which indicates the likelihood that particular words will appear in various contexts, and LDA, which indicates the likelihood that particular topics will appear in various contexts. The semantic context generator 325 can store generated semantic models in a semantic model repository 330. The semantic context generator 325 may update and/or generate new semantic models periodically and/or on request. The semantic context generator 325 can additionally generate vectors and values to provide as semantic context input to neural networks using the semantic models 330, such as by generating a context vector from an LSA model and/or identifying distances for word vectors from the context vector." [0048]; identifying distances between word vectors and context vectors indicates whether an embedding of the text item is available in the vocabulary, e.g., when the distance is zero; it is noted that the claim language requires only one of the recited features).

Regarding Claim 3,
The Coccaro/Mikolov/Shokouhi/Pilehvar combination teaches the system of claim 1.  Coccaro further teaches wherein the online training is configured to change the vocabulary by selecting one of a plurality of possible lengths of an embedding associated with the new text item ("Portions of one or more of these three component matrices (T×K, K×K, and K×D) can be used to represent the semantic context 108. For example, various vectors can be generated from the T×K matrix and used to represent the semantic context 108. For instance, the T×K matrix can be collapsed to a context vector that represents the semantic context by combining the values for the terms (rows) in each context (columns). A variety of techniques can be used to combine the term values, such as determining the centroid of the values for the terms (rows) in each context (columns), weighting different words more or less strongly based on their significance, and/or other factors. Such a context vector, with K dimensions, may be used to represent the semantic context 108." [0029]).
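The collapsing step quoted from Coccaro [0029], in which a T×K term-by-context matrix is reduced to a K-dimensional context vector by taking the centroid of the term values in each context column, optionally weighting terms by significance, can be sketched as follows.  This is a minimal Python illustration; the plain list-of-rows representation, the optional weight list, and the function name are assumptions, since Coccaro does not specify an implementation.

```python
from typing import List, Optional, Sequence


def context_vector(tk_matrix: Sequence[Sequence[float]],
                   weights: Optional[Sequence[float]] = None) -> List[float]:
    """Collapse a T x K term-by-context matrix into one K-dimensional vector.

    Each output element is the (optionally weighted) centroid of the term
    values (rows) within one context (column).
    """
    if not tk_matrix:
        raise ValueError("matrix must have at least one row")
    k = len(tk_matrix[0])
    w = list(weights) if weights is not None else [1.0] * len(tk_matrix)
    total = sum(w)
    return [sum(w_t * row[j] for w_t, row in zip(w, tk_matrix)) / total
            for j in range(k)]
```

The length of the result equals K, the number of retained contexts, which is the quantity the rejections of claims 3 and 11 treat as the selectable length, or number of elements, of an embedding.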

Regarding Claim 5,
The Coccaro/Mikolov/Shokouhi/Pilehvar combination teaches the system of claim 1.  Coccaro further teaches wherein the processor is configured to receive a plurality of instances of the new text item in user input and to compute a plurality of associated predicted embeddings ("Each of the different samples from the speech samples 110 and the writing samples 112 may be considered to be a different context. The semantic context 108 can be a combination of different samples (contexts) from the speech samples 110 and the writing samples 112. Such a combination can be generated and modeled in any of a variety of appropriate ways, such as through the use of LSA and/or LDA. For instance, in the example of LSA, the semantic context 108 can be based on one or more vectors that are derived from a matrix of word frequency for various words in a vocabulary across a plurality of samples (documents)." [0028]).

Regarding Claim 6,
The Coccaro/Mikolov/Shokouhi/Pilehvar combination teaches the system of claim 1.  Coccaro further teaches wherein the online training is configured to set a norm of the embedding of the new text item ("For instance, referring to the LSA example described above with regard to FIG. 1, vectors for each of the words V can be compared to a context vector for the T×K matrix to determine a distance for each of the words. Distances can be determined in any of a variety of appropriate ways, such as by the dot product of the word vector and the context vector, the Euclidian distance, and/or the normalized Euclidian distance." [0041]; the context vector teaches the norm of the embedding) using one or more statistics of the vocabulary ("For instance, in the example of LSA, the semantic context 108 can be based on one or more vectors that are derived from a matrix of word frequency for various words in a vocabulary across a plurality of samples (documents)." [0028]; frequency teaches statistics of the vocabulary).
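The three distance measures listed in Coccaro [0041] (dot product, Euclidean distance, and normalized Euclidean distance) between a word vector and the context vector can be written out as below.  This is a minimal Python sketch; reading "normalized Euclidean distance" as the Euclidean distance between the two vectors after each is scaled to unit L2 norm is an assumption, as are the function names.

```python
import math
from typing import Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def l2_norm(u: Sequence[float]) -> float:
    return math.sqrt(dot(u, u))


def normalized_euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance after scaling both vectors to unit L2 norm (assumed)."""
    nu, nv = l2_norm(u), l2_norm(v)
    if nu == 0 or nv == 0:
        raise ValueError("cannot normalize a zero vector")
    return euclidean([a / nu for a in u], [b / nv for b in v])
```

Under this reading, the normalized variant depends on the norms of the word vector and the context vector, which is the connection the rejection of claim 6 draws between Coccaro's distances and setting a norm of an embedding.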

Regarding Claim 7,
The Coccaro/Mikolov/Shokouhi/Pilehvar combination teaches the system of claim 1.  Coccaro further teaches wherein the online training is configured to compare the new text item and the predicted next text item and propagate results of the comparison to a final layer of the neural network (FIG. 2B; "In particular, the neural network 230 takes as input for a second portion 232 of nodes SC0-SCV in the input layer 202 distances 234 for words V from a semantic context. For instance, referring to the LSA example described above with regard to FIG. 1, vectors for each of the words V can be compared to a context vector for the T×K matrix to determine a distance for each of the words" [0041]; "As indicated by step C (126), the computer system 100 can generate probability values for candidate words by propagating, through the connections between nodes in the neural network, the values applied to the input layer of the neural network through the hidden layer(s) and to the output layer. As discussed above, the connections between nodes in the neural network can be weighted based on training for the neural network, which can cause the values that are generated at the output layer to be varied and based on the local and semantic contexts. The values of the nodes at the output layer can indicate probabilities that words corresponding to the nodes are likely to be a next word that will be uttered by the user A 102." [0032]).

Regarding Claim 11,
The Coccaro/Mikolov/Shokouhi/Pilehvar combination teaches the system of claim 1.  Coccaro further teaches wherein the online training is configured to change a number of elements of an embedding of the new text item ("For example, various vectors can be generated from the T×K matrix and used to represent the semantic context 108. For instance, the T×K matrix can be collapsed to a context vector that represents the semantic context by combining the values for the terms (rows) in each context (columns). A variety of techniques can be used to combine the term values, such as determining the centroid of the values for the terms (rows) in each context (columns), weighting different words more or less strongly based on their significance, and/or other factors. Such a context vector, with K dimensions, may be used to represent the semantic context 108." [0029]).

Regarding Claim 12,
The Coccaro/Mikolov/Shokouhi/Pilehvar combination teaches the system of claim 1.  Coccaro further teaches wherein the online training is configured to compare the new text item and the predicted next text item and propagate results of the comparison to a final layer of the neural network (FIG. 2B; "In particular, the neural network 230 takes as input for a second portion 232 of nodes SC0-SCV in the input layer 202 distances 234 for words V from a semantic context. For instance, referring to the LSA example described above with regard to FIG. 1, vectors for each of the words V can be compared to a context vector for the T×K matrix to determine a distance for each of the words" [0041]; "As indicated by step C (126), the computer system 100 can generate probability values for candidate words by propagating, through the connections between nodes in the neural network, the values applied to the input layer of the neural network through the hidden layer(s) and to the output layer. As discussed above, the connections between nodes in the neural network can be weighted based on training for the neural network, which can cause the values that are generated at the output layer to be varied and based on the local and semantic contexts. The values of the nodes at the output layer can indicate probabilities that words corresponding to the nodes are likely to be a next word that will be uttered by the user A 102." [0032]), and 
wherein the online training is configured to compute a bias of the new text item embedding by counting occurrences of the new text item and a total number of text items observed in user input at the electronic device ("For instance, in the example of LSA, the semantic context 108 can be based on one or more vectors that are derived from a matrix of word frequency for various words in a vocabulary across a plurality of samples (documents)." [0028]; the vector being derived based on word frequency teaches the bias of the new text item embedding based on occurrences).

Regarding Claim 13,
The Coccaro/Mikolov/Shokouhi/Pilehvar combination teaches the system of claim 1.  Coccaro further teaches wherein the online training is configured to compute a bias of the new text item embedding by counting occurrences of the new text item and a total number of text items observed in user input at the electronic device ("For instance, in the example of LSA, the semantic context 108 can be based on one or more vectors that are derived from a matrix of word frequency for various words in a vocabulary across a plurality of samples (documents)." [0028]; the vector being derived based on word frequency teaches the bias of the new text item embedding based on occurrences).
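As one hypothetical reading of the claim phrase "compute a bias ... by counting occurrences of the new text item and a total number of text items observed in user input," the Python sketch below keeps running counts and returns the log of the item's relative frequency.  Neither the claims nor Coccaro's quoted text specify a formula; the log-frequency form, the class name, and the error on unseen items are assumptions for illustration only.

```python
import math
from collections import Counter


class UnigramBias:
    """Hypothetical on-device counter for a log-frequency bias."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()  # occurrences of each text item
        self.total = 0  # total number of text items observed

    def observe(self, item: str) -> None:
        self.counts[item] += 1
        self.total += 1

    def bias(self, item: str) -> float:
        """Log relative frequency of `item`; raises KeyError if never observed."""
        if self.counts[item] == 0:
            raise KeyError(item)
        return math.log(self.counts[item] / self.total)
```

Under this reading, the bias rises as a newly entered word is typed more often relative to all observed text items, which parallels the word-frequency basis cited from Coccaro [0028].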

Regarding Claim 14,
The Coccaro/Mikolov/Shokouhi/Pilehvar combination teaches the system of claim 1.  Coccaro further teaches wherein the online training is configured to initialize an embedding of the new text item using an embedding computed by a character compositional embedding model ("The semantic context generator 325 can additionally generate vectors and values to provide as semantic context input to neural networks using the semantic models 330, such as by generating a context vector from an LSA model and/or identifying distances for word vectors from the context vector." [0048]).

Regarding Claims 16 and 18,
Claims 16 and 18 recite a method corresponding to the system recited in claims 1 and 2, respectively.  The Coccaro/Mikolov/Shokouhi/Pilehvar combination teaches the limitations of claims 16 and 18 as set forth above in connection with claims 1 and 2.  Therefore, claims 16 and 18 are rejected under the same rationale as respective claims 1 and 2.

Regarding Claim 17,
The Coccaro/Mikolov/Shokouhi/Pilehvar combination teaches the method of claim 16.  Coccaro further teaches wherein the vocabulary is updated online at the electronic device (FIG. 3, Semantic Context Generator 325, Network 304, Word Prediction Computer System 308, "The semantic context generator 325 may update and/or generate new semantic models periodically and/or on request." [0048]).

Regarding Claim 19,
The Coccaro/Mikolov/Shokouhi/Pilehvar combination teaches the method of claim 16.  Shokouhi further teaches changing the vocabulary by comparing the new text item and the predicted text item ("We propose a similar labelling strategy for personalized auto-completion; we start by sampling a set of impressions from search logs. For each sampled impression, we assume that the query that was eventually submitted by the user is the only right (or the most relevant) suggestion that should have been suggested right after the first key-stroke and all the way until submission. With this assumption in mind, we decompose each sampled query into all prefixes that lead to it and for each case we obtain all query candidates that match in the auto-completion trie.... For each pair of prefix and auto-completion list constructed this way, we assign positive label to the query submitted by the user at the end (if it appears in the list) and zero label to others... Once the training data is collected as described above, we can apply virtually any existing learning-to-rank algorithm for training a personalized auto-completion ranker." sec. 3, pp. 106-107).
Coccaro and Shokouhi are analogous art because both are directed towards machine learning to provide a prediction of a next word. It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the semantic context word prediction of the Coccaro/Mikolov combination with the auto-completion learning method of Shokouhi.  The modification would have been obvious because one of ordinary skill in the art would be motivated to personalize auto-completion and outperform popularity-based rankers, as suggested by Shokouhi (Shokouhi: Abstract).

Regarding Claim 20,
Coccaro teaches one or more device-readable media with device-executable instructions that, when executed by a computing system ("Computing device 500 includes a processor 502, memory 504, a storage device 506" [0061]), direct the computing system to perform operations comprising:
storing a vocabulary (FIG. 3, Semantic Model 330) of embeddings of text items ("vectors of semantic context models" [0020]), each embedding being a numerical encoding of a text item ("A variety of different types of semantic context information can be used, such as vectors of semantic context models (e.g., LSA models, latent dirichlet allocation (LDA) models) that represent semantic contexts (e.g., likelihood of words and/or topics appearing within a semantic context) and/or distances between words and semantic contexts." [0020]; each vector teaches the embedding being a numerical encoding of a text item, wherein the vector is defined by the values of a plurality of dimensions [0005]) that indicates weights for analysis of the text item in a neural network ("As indicated by the two connections 216 and 218 that are depicted as being darker/thicker, weights (w0 and w 1) can be associated with the connections 216 and 218 based, at least in part, on training of the neural network using appropriate training data. Weights can indicate a level of association between two nodes" [0037]);
receiving user input comprising one or more context text items (FIG. 1, Local Context: "Is the" 106; "Such a local context can include any of an appropriate number of preceding words uttered by the speaker (user A 102), such as 1 word, 2 words, 3 words, and/or 5 words." [0023]);
implementing a trained neural network to produce a prediction of a next text item in the sequence given the context text items and the vocabulary ("The computer system 100 can receive and used the local context 106 and the semantic context 108 for the dialog between user A 102 and user B 104 to determine probabilities that each of a vocabulary of words is likely to be a next word uttered by user A 102 in the dialog. As indicated by step A (122), the computer system 100 can access a neural network that includes an input layer, one or more hidden layers, and an output layer." [0030]);
implementing an online training to change the vocabulary (FIG. 3, Semantic Context Generator 325, Network 304; the semantic context generator is online due to being connected to a network; "The semantic context generator 325 may update and/or generate new semantic models periodically and/or on request. The semantic context generator 325 can additionally generate vectors and values to provide as semantic context input to neural networks using the semantic models 330, such as by generating a context vector from an LSA model and/or identifying distances for word vectors from the context vector." [0048]; this teaches updating the embeddings in the vocabulary) in a shallow backpropagation process ("As described in greater detail below with regard to FIGS. 2A-B, the connections between the nodes of the input layer, the hidden layer(s), and the output layer can be weighted through any of a variety of training processes during which training data with verified input and output data are repeatedly passed through the neural network so as to identify appropriate adjustments to weighting values connecting nodes within the neural network." [0030]; FIG. 2B; "In particular, the neural network 230 takes as input for a second portion 232 of nodes SC0-SCV in the input layer 202 distances 234 for words V from a semantic context. For instance, referring to the LSA example described above with regard to FIG. 1, vectors for each of the words V can be compared to a context vector for the T×K matrix to determine a distance for each of the words" [0041]; "As indicated by step C (126), the computer system 100 can generate probability values for candidate words by propagating, through the connections between nodes in the neural network, the values applied to the input layer of the neural network through the hidden layer(s) and to the output layer. 
As discussed above, the connections between nodes in the neural network can be weighted based on training for the neural network, which can cause the values that are generated at the output layer to be varied and based on the local and semantic contexts. The values of the nodes at the output layer can indicate probabilities that words corresponding to the nodes are likely to be a next word that will be uttered by the user A 102." [0032]).

Coccaro does not explicitly teach determining, based on the user-entered word, whether to carry out an online training, computing, when determined to carry out the online training, one of: an embedding added to the vocabulary as a sequence of the one or more context text items followed by the new text item, or an update to an embedding already in the vocabulary for the sequence of the context items followed by the new text item.
Coccaro does not explicitly teach receiving user input comprising one or more context text items followed by a new text item that is a user-entered word, the user-entered word being entered via individual letters, and comparing the new text item and the predicted next text item for learning.
Coccaro does not explicitly teach the user-entered word including one or more rare words, proper nouns, idiosyncrasies, or combination thereof, wherein the user-entered word does not exist in the database.

Mikolov teaches determining, based on the user-entered word, whether to carry out an online training ("the vector representations of frequent words do not change significantly after training on several million examples" sec. 2.3, p. 4; "we used a simple subsampling approach: each word $w_i$ in the training set is discarded with probability computed by the formula $P(w_i) = 1 - \sqrt{\frac{t}{f(w_i)}}$ (5) where $f(w_i)$ is the frequency of word $w_i$ and $t$ is a chosen threshold, typically around $10^{-5}$. We chose this subsampling formula because it aggressively subsamples words whose frequency is greater than $t$ while preserving the ranking of the frequencies." sec. 2.3, pp. 4-5; "We also found that the subsampling of the frequent words results in both faster training and significantly better representations of uncommon words." sec. 7, p. 8), 
computing, when determined to carry out the online training, one of: an embedding added to the vocabulary as a sequence of the one or more context text items followed by the new text item, or an update to an embedding already in the vocabulary for the sequence of the context items followed by the new text item ("many phrases have a meaning that is not a simple composition of the meanings of its individual words. To learn vector representation for phrases, we first find words that appear frequently together, and infrequently in other contexts... This way, we can form many reasonable phrases without greatly increasing the size of the vocabulary" sec. 4, pp. 5-6; the vector representation for phrases teaches adding embeddings for text items by including the preceding context items; it is noted that the claim language requires computing only one of the recited alternatives).
Coccaro and Mikolov are analogous art because both are directed towards representing language in vector space. It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the semantic context word predictor of Coccaro with the phrase vector representations and subsampling of Mikolov.  The modification would have been obvious because one of ordinary skill in the art would be motivated to improve the training time and quality of the vectors, as suggested by Mikolov (Mikolov: Abstract, p. 1; sec. 7, p. 8).
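For illustration only, the subsampling rule quoted from Mikolov (sec. 2.3) can be read as a per-occurrence decision whether a word is used for a training update. The Python sketch below is a minimal rendering under stated assumptions: the function names are hypothetical, and clipping negative values of $1 - \sqrt{t/f(w_i)}$ (which arise when $f(w_i) < t$) to zero is an assumption not stated in the quoted passage.

```python
import math
import random


def discard_probability(freq: float, t: float = 1e-5) -> float:
    """Mikolov eq. (5): P(w_i) = 1 - sqrt(t / f(w_i)).

    Assumption: values below zero (words rarer than the threshold t) are
    clipped to 0, i.e., such words are never discarded.
    """
    return max(0.0, 1.0 - math.sqrt(t / freq))


def keep_for_training(word: str, counts: dict, total: int,
                      t: float = 1e-5, rng=random) -> bool:
    """Hypothetical helper: True if this occurrence of `word` is kept
    (used for a training update), False if it is discarded."""
    freq = counts[word] / total
    return rng.random() >= discard_probability(freq, t)
```

With $t = 10^{-5}$, a word making up 0.1% of the corpus ($f = 10^{-3}$) is discarded with probability $1 - \sqrt{0.01} = 0.9$, while a word at or below the threshold frequency is always kept, consistent with Mikolov's statement that frequent words are aggressively subsampled.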

Shokouhi teaches receiving user input comprising one or more context text items followed by a new text item that is a user-entered word, the user-entered word being entered via individual letters ("Following each new character entered in the query box, search engines filter suggestions that match the updated prefix, and suggest the top-ranked candidates to the user." sec. 1, p. 103), and 
learning using the new text item and the predicted next text item ("We propose a similar labelling strategy for personalized auto-completion; we start by sampling a set of impressions from search logs. For each sampled impression, we assume that the query that was eventually submitted by the user is the only right (or the most relevant) suggestion that should have been suggested right after the first key-stroke and all the way until submission. With this assumption in mind, we decompose each sampled query into all prefixes that lead to it and for each case we obtain all query candidates that match in the auto-completion trie.... For each pair of prefix and auto-completion list constructed this way, we assign positive label to the query submitted by the user at the end (if it appears in the list) and zero label to others... Once the training data is collected as described above, we can apply virtually any existing learning-to-rank algorithm for training a personalized auto-completion ranker." sec. 3, pp. 106-107).
Coccaro and Shokouhi are analogous art because both are directed towards machine learning to provide a prediction of a next word. It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the semantic context word prediction of the Coccaro/Mikolov combination with the auto-completion learning method of Shokouhi.  The modification would have been obvious because one of ordinary skill in the art would be motivated to personalize auto-completion and outperform popularity-based rankers, as suggested by Shokouhi (Shokouhi: Abstract).
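For illustration only, the labelling strategy quoted from Shokouhi (sec. 3) can be sketched as follows. Assumptions: a plain list of candidate queries stands in for the auto-completion trie, and the function name `label_prefixes` is hypothetical. Each prefix of the submitted query is paired with the matching candidates, labelled 1 for the query the user eventually submitted and 0 for the others.

```python
from typing import Dict, List, Tuple


def label_prefixes(submitted: str,
                   candidates: List[str]) -> Dict[str, List[Tuple[str, int]]]:
    """Build (candidate, label) lists for every prefix of the submitted query.

    A list scan stands in for the auto-completion trie (assumption).
    """
    training: Dict[str, List[Tuple[str, int]]] = {}
    for end in range(1, len(submitted) + 1):
        prefix = submitted[:end]
        # Candidates that this prefix would surface as completions.
        matches = [c for c in candidates if c.startswith(prefix)]
        # Positive label for the eventually submitted query, zero for the rest.
        training[prefix] = [(c, 1 if c == submitted else 0) for c in matches]
    return training
```

For example, `label_prefixes("cat", ["car", "cat", "dog"])` maps the prefix "ca" to `[("car", 0), ("cat", 1)]`; per the quoted passage, such labelled lists can then be supplied to a learning-to-rank algorithm.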

Pilehvar teaches the user-entered word including one or more rare words, proper nouns, idiosyncrasies, or combination thereof, wherein the user-entered word does not exist in the database ("This can be particularly problematic when measuring semantic similarity of text pairs that contain many OOV words, such as infrequent named entities, acronyms or jargon. In order to alleviate this issue, we propose two novel techniques for handling OOV terms while measuring the semantic similarity of textual items." sec. 5.4, p. 110; infrequent teaches rare words, named entities teaches proper nouns, jargon teaches idiosyncrasies; OOV words teaches wherein the user-entered word does not exist in the store; "For the case of our example, we introduce, in each of the two semantic signatures, new dimensions corresponding to their missing terms, i.e., Steve_Ballmer_n and Microsoft_n for h1 and Microsoft_n for h2. Fig. 7 illustrates our direct OOV handling for the sentence h2. We set the associated weights of the newly-introduced dimensions to 0.5 so as to guarantee their placement among the top dimensions in their corresponding signatures. We utilize this approach for handling OOV entries in our text-level experiments (Section 6.4) and show that it can provide considerable performance improvement on datasets containing many OOV entries." sec. 5.4.1, p. 110).
Coccaro and Pilehvar are analogous art because both are directed towards representing language in vector space.  It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the semantic context word predictor of the Coccaro/Mikolov/Shokouhi combination with the OOV term handling of Pilehvar.  The modification would have been obvious because one of ordinary skill in the art would be motivated to improve the performance on datasets containing OOV entries, as discussed by Pilehvar ("We utilize this approach for handling OOV entries in our text-level experiments (Section 6.4) and show that it can provide considerable performance improvement on datasets containing many OOV entries." sec. 5.4.1, p. 110).
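For illustration only, the direct OOV handling quoted from Pilehvar (sec. 5.4.1) can be sketched as below. Assumptions: a semantic signature is represented as a dictionary from dimension to weight, the set of "known" terms stands in for the terms already covered by the signatures, and the function name is hypothetical. Each missing term becomes a new dimension with the fixed weight 0.5 recited in the quoted passage.

```python
from typing import Dict, Iterable, Set


def add_oov_dimensions(signature: Dict[str, float], terms: Iterable[str],
                       known: Set[str], weight: float = 0.5) -> Dict[str, float]:
    """Return a copy of `signature` with one new dimension per OOV term."""
    extended = dict(signature)  # leave the input signature unchanged
    for term in terms:
        if term not in known:
            # Missing (OOV) term: introduce a dimension for it with a fixed weight.
            extended.setdefault(term, weight)
    return extended
```

When the existing dimension weights are small, as in a normalized signature, the fixed weight of 0.5 places the new OOV dimension at or near the top of the signature.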

Claims 8-10 are rejected under 35 U.S.C. 103 as being unpatentable over Coccaro et al. (US 2014/0278379, hereinafter "Coccaro") in view of Mikolov et al. (Distributed Representations of Words and Phrases and their Compositionality, hereinafter "Mikolov"), Shokouhi (Learning to Personalize Query Auto-Completion), and Pilehvar et al. (From Senses to Texts: An All-In-One Graph-Based Approach For Measuring Semantic Similarity, hereinafter "Pilehvar"), and further in view of Mnih et al. (US 2015/0095017, hereinafter "Mnih").

Regarding Claim 8,
The Coccaro/Mikolov/Shokouhi/Pilehvar combination teaches the system of claim 1.  Coccaro and Shokouhi further teach wherein the online training is configured to compare the new text item and the predicted next text item (Shokouhi: "We propose a similar labelling strategy for personalized auto-completion; we start by sampling a set of impressions from search logs. For each sampled impression, we assume that the query that was eventually submitted by the user is the only right (or the most relevant) suggestion that should have been suggested right after the first key-stroke and all the way until submission. With this assumption in mind, we decompose each sampled query into all prefixes that lead to it and for each case we obtain all query candidates that match in the auto-completion trie.... For each pair of prefix and auto-completion list constructed this way, we assign positive label to the query submitted by the user at the end (if it appears in the list) and zero label to others... Once the training data is collected as described above, we can apply virtually any existing learning-to-rank algorithm for training a personalized auto-completion ranker." sec. 3, pp. 106-107) and propagate results of the comparison to a final layer of the neural network (Coccaro: FIG. 2B; "In particular, the neural network 230 takes as input for a second portion 232 of nodes SC0-SCV in the input layer 202 distances 234 for words V from a semantic context. For instance, referring to the LSA example described above with regard to FIG. 1, vectors for each of the words V can be compared to a context vector for the T×K matrix to determine a distance for each of the words" [0041]; "As indicated by step C (126), the computer system 100 can generate probability values for candidate words by propagating, through the connections between nodes in the neural network, the values applied to the input layer of the neural network through the hidden layer(s) and to the output layer. 
As discussed above, the connections between nodes in the neural network can be weighted based on training for the neural network, which can cause the values that are generated at the output layer to be varied and based on the local and semantic contexts. The values of the nodes at the output layer can indicate probabilities that words corresponding to the nodes are likely to be a next word that will be uttered by the user A 102." [0032]).
Coccaro and Shokouhi are analogous art because both are directed towards machine learning to provide a prediction of a next word. It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the semantic context word prediction of the Coccaro/Mikolov combination with the auto-completion learning method of Shokouhi.  The modification would have been obvious because one of ordinary skill in the art would be motivated to personalize auto-completion and outperform popularity-based rankers, as suggested by Shokouhi (Shokouhi: Abstract).

The Coccaro/Mikolov/Shokouhi/Pilehvar combination does not explicitly teach the processor configured to receive further user input comprising positive examples of the new text item and negative examples of the new text item, and wherein the online training is configured to update the vocabulary using both the positive and negative examples.
Mnih teaches the processor configured to receive further user input comprising positive examples of the new text item and negative examples of the new text item, and wherein the online training is configured to update the vocabulary using both the positive and negative examples ("The training samples are associated with a positive label, indicative of a positive example of association between a target word and the surrounding context words in the sample. On the contrary, the negative samples are associated with a negative label, indicative of a negative example of word association because of the pseudo-random fabrication of the sample. As mentioned above, the associations, embeddings and/or similarities between words are modeled by parameters (commonly referred to as weights) of the neural language model 11. The neural language model training module 23 is configured to learn the parameters defining the neural language model based on the training samples and the negative samples, by recursively adjusting the parameters based on the calculated error or discrepancy between the predicted probability of word association of the input sample output by the model compared to the actual label of the sample." [0035]).

Coccaro and Mnih are analogous art because they are directed towards language processing models using neural networks for predicting the next word in a sequence. It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the semantic context system of the Coccaro/Mikolov/Shokouhi/Pilehvar combination with the positive/negative example training method of Mnih.  Doing so would enable an efficient technique of training the model (Mnih [0050]).
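For illustration only, the positive/negative training quoted from Mnih ([0035]) can be sketched as below. This is a simplified logistic-loss update under assumptions not taken from Mnih, and it may differ from Mnih's exact training objective: embeddings are plain Python lists, the score is the dot product of a target-word vector and a context vector, negative samples are fabricated by replacing the target word with a word drawn uniformly from the vocabulary, and all names are hypothetical. Each update adjusts the parameters by the discrepancy between the sample's label and the predicted probability, as the quoted passage describes.

```python
import math
import random
from typing import Dict, List

Vector = List[float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def update(emb: Dict[str, Vector], ctx: Dict[str, Vector], target: str,
           context: str, label: int, lr: float = 0.1) -> float:
    """One gradient step on the logistic loss for a labelled (target, context) pair.

    Returns the predicted probability computed before the update.
    """
    t, c = emb[target], ctx[context]
    p = sigmoid(sum(a * b for a, b in zip(t, c)))
    err = label - p  # discrepancy between the actual label and the prediction
    for k in range(len(t)):
        t_k, c_k = t[k], c[k]
        t[k] += lr * err * c_k
        c[k] += lr * err * t_k
    return p


def train_example(emb, ctx, target, context, vocab, num_neg=2, rng=random) -> int:
    """Positive sample (label 1) plus up to num_neg negative samples (label 0).

    Returns the number of negative updates actually applied.
    """
    update(emb, ctx, target, context, 1)
    applied = 0
    for _ in range(num_neg):
        noise = rng.choice(vocab)  # pseudo-randomly fabricated negative example
        if noise != target:
            update(emb, ctx, noise, context, 0)
            applied += 1
    return applied
```

Repeated positive updates on the same pair raise its predicted probability, while negative updates lower it; the `num_neg` parameter corresponds loosely to the "predefined number of negative samples" recited in Mnih [0008].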

Regarding Claim 9,
The Coccaro/Mikolov/Shokouhi/Pilehvar/Mnih combination teaches the system of claim 8.  Mnih further teaches wherein the online training is configured to update the embeddings and/or biases of the embeddings using both the positive and negative examples ("The training samples are associated with a positive label, indicative of a positive example of association between a target word and the surrounding context words in the sample. On the contrary, the negative samples are associated with a negative label, indicative of a negative example of word association because of the pseudo-random fabrication of the sample. As mentioned above, the associations, embeddings and/or similarities between words are modeled by parameters (commonly referred to as weights) of the neural language model 11. The neural language model training module 23 is configured to learn the parameters defining the neural language model based on the training samples and the negative samples, by recursively adjusting the parameters based on the calculated error or discrepancy between the predicted probability of word association of the input sample output by the model compared to the actual label of the sample." [0035]).
Coccaro and Mnih are analogous art because they are directed towards language processing models using neural networks for predicting the next word in a sequence. It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the semantic context system of the Coccaro/Mikolov/Shokouhi/Pilehvar combination with the positive/negative example training method of Mnih.  Doing so would enable an efficient technique of training the model (Mnih [0050]).

Regarding Claim 10,
The Coccaro/Mikolov/Shokouhi/Pilehvar/Mnih combination teaches the system of claim 8.  Mnih further teaches wherein the online training is configured to sample and/or batch the negative examples of the new text item ("selecting a predefined number of data samples from the training data, the selected data samples defining positive examples of word associations, generating a predefined number of negative samples for each selected data sample, the negative samples defining negative examples of word associations, wherein the number of negative samples generated for each data sample is a statistically small proportion of the number of words in the word dictionary, and training a neural probabilistic language model using the data samples and the generated negative samples." [0008]).
Coccaro and Mnih are analogous art because they are directed towards language processing models using neural networks for predicting the next word in a sequence. It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the semantic context system of the Coccaro/Mikolov/Shokouhi/Pilehvar combination with the positive/negative example training method of Mnih.  Doing so would enable an efficient technique of training the model (Mnih [0050]).

Claim 15 is rejected under 35 U.S.C. 103 as being unpatentable over Coccaro et al. (US 2014/0278379, hereinafter "Coccaro") in view of Mikolov et al. (Distributed Representations of Words and Phrases and their Compositionality, hereinafter "Mikolov"), Shokouhi (Learning to Personalize Query Auto-Completion), and Pilehvar et al. (From Senses to Texts: An All-In-One Graph-Based Approach For Measuring Semantic Similarity, hereinafter "Pilehvar"), and further in view of Kamvar et al. (The Role of Context in Query Input: Using contextual signals to complete queries on mobile devices, hereinafter "Kamvar").

Regarding Claim 15,
The Coccaro/Mikolov/Shokouhi/Pilehvar combination teaches the system of claim 1.  Coccaro further teaches wherein the neural network is configured such that, when additional user input is received comprising the new text item, the neural network computes the predicted next item using the updated vocabulary ("The computer system 100 can receive and used the local context 106 and the semantic context 108 for the dialog between user A 102 and user B 104 to determine probabilities that each of a vocabulary of words is likely to be a next word uttered by user A 102 in the dialog. As indicated by step A (122), the computer system 100 can access a neural network that includes an input layer, one or more hidden layers, and an output layer." [0030]).

The Coccaro/Mikolov/Shokouhi/Pilehvar combination does not explicitly teach wherein the processor is configured to offer the predicted next item as data for input to the electronic device.
Kamvar teaches wherein the processor is configured to offer the predicted next item as data for input to the electronic device ("The system presented in this paper reduces the key presses needed to enter a query by offering word completions as the user is typing. If the suggested completion is correct, the user can accept it with a single key press; if it is not correct, the user can continue typing as normal." p. 405, sec. 1, par. 3).
Coccaro and Kamvar are analogous art because both are directed to predicting words in a sequence. It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the semantic context system of the Coccaro/Mikolov/Shokouhi/Pilehvar combination with the suggested word completion of Kamvar.  Doing so would reduce the key presses needed to enter a query (Kamvar, p. 405, sec. 1, par. 3).

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to CHARLES C KUO whose telephone number is (571)270-7477.  The examiner can normally be reached on M-F: 9:00 a.m. - 6:00 p.m.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ann Lo can be reached on (571) 272-9767.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


/CHARLES C KUO/Examiner, Art Unit 2126  
/ANN J LO/Supervisory Patent Examiner, Art Unit 2126