DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Specification

The lengthy specification has not been checked to the extent necessary to determine the presence of all possible minor errors. Applicant’s cooperation is requested in correcting any errors of which applicant may become aware in the specification.
Claim Rejections - 35 USC § 102


The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.


Claims 1-2, 4-5, and 7-20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Chowdhury et al. (US 2018/0032900).
Chowdhury teaches:
1. A computer-implemented method for generating weak-labeled data to train a machine learning model for query language prediction, the method comprising: generating a seed dictionary comprising a plurality of labeled dictionary terms ([0048] In various active learning approaches, the learner typically chooses the examples to be labeled. As a result, the number of examples needed to learn a concept may be lower than the number of examples needed for typical supervised learning approaches. For example, an active learner may attempt to select the most informative example from an unlabeled pool of example instances. In this example, the learner typically begins with a small number of instances, known as seeds, in the labeled training set L.); receiving a plurality of unlabeled sample query terms (Figure 4, 404 Input Category and Query Terms); comparing, at a first time, the plurality of unlabeled sample query terms to the plurality of labeled dictionary terms ([0078] In various embodiments, a semantic search-based seed selector (SSSS) 412 is implemented to select the candidate unannotated seed. In these embodiments, the SSSS 412 takes into consideration two different scores, a Latent Semantic Analysis (LSA) similarity score and a search engine score, which are used in combination to rank unannotated instances stored in the repository of unannotated instances and seeds 428. Once ranked, the SSSS uses the ranking in block 426 to select the next candidate seed. For example, the unannotated seed with the highest ranking may indicate it is most likely to be annotated with a positive label by the user 402); generating a first set of labeled sample query terms by labeling at least a subset of the plurality of unlabeled sample query terms based on the first comparison (Figure 4, 406); comparing, at a second time, remaining unlabeled sample query terms with the first set of labeled sample query terms (Figure 4, 426); generating a second set of labeled sample query terms by labeling the remaining unlabeled sample query terms based on the second comparison (Figure 4, 432 Seed annotated by system); and providing the first and second sets of labeled sample query terms to a machine learning model configured for query language prediction (Figure 4, 424 ML-Based Supervised Classifier).
2. The computer-implemented method of claim 1, wherein comparing, at a first time, the plurality of unlabeled sample query terms to the plurality of labeled dictionary terms comprises: for each of the plurality of labeled dictionary terms, determining a predefined number of unlabeled sample query terms that are nearest to the labeled dictionary term ([0079] In one embodiment, an LSA distributed semantic model 414 is used to generate the LSA similarity score. In certain embodiments, the LSA similarity score is a score indicating the degree of similarity between a given unannotated instance, the current input category and associated query terms 404 provided by the user 402, and any previously-annotated seeds 406 stored within the repository of annotated seeds 408).
4. The computer-implemented method of claim 2, wherein labeling at least a subset of the plurality of unlabeled sample query terms based on the first comparison comprises: for each of the plurality of labeled dictionary terms, annotating the predefined number of unlabeled sample query terms with a language associated with the labeled dictionary term (Figure 4, 426 Select Next Seed for annotation). 
5. The computer-implemented method of claim 2, wherein comparing, at a second time, remaining unlabeled sample query terms with the first set of labeled sample query terms comprises: for each of the remaining unlabeled sample query terms, determine a predefined number of labeled sample query terms that are nearest to the remaining unlabeled sample query term ([0083] As an example, if there is a preponderance of negatively-labeled seeds in the repository of annotated seeds 408, the SSSS 412 may select a candidate unannotated seed in block 426 that it believes has the highest certainty of being positive. I.e., an unlabeled instance having a highest confidence level of being a positive instance for an input category. In this example, the candidate seed selected by the SSSS 412 would have a high ranking).
7. The computer-implemented method of claim 5, wherein labeling the remaining unlabeled sample query terms based on the second comparison comprises: for each of the remaining unlabeled sample query terms, determining a language based on languages associated with the predefined number of labeled sample query terms; and annotating each of the remaining unlabeled sample query terms with the language determined based on the languages associated with the predefined number of labeled sample query terms ([0083] -  Conversely, if there is a preponderance of positively-labeled seeds in the repository of annotated seeds 408, the SSSS 412 may select a candidate unannotated seed in block 426 that it believes has the highest certainty of being negative. I.e., an unlabeled instance having a highest confidence level of being a negative instance for an input category. To continue the example, the candidate seed selected by the SSSS 412 would have the lowest ranking, indicating that the SSSS 412 believes there is a high certainty it would be assigned a negative label if it were annotated by a human annotator 402).
8. The computer-implemented method of claim 7, wherein the language is determined based on the languages associated with the predefined number of labeled sample query terms using a voting scheme (Figure 4, 430 Ask user to annotate). 
9. The computer-implemented method of claim 1, further comprising: receiving a query term from a user application (Figure 4, 404); determining, using the machine learning model, a language of the query term (Figure 4, 404); and modifying at least one data element of a user interface based on the determined language (Figure 4, 418).
Claim 10 is rejected using similar reasoning seen in the rejection of claim 1 due to reciting similar limitations but directed towards a system. 

11. A computer-implemented method of determining a language of a query, the method comprising: receiving at least a portion of a query (Figure 4, 404); and using a machine learning model trained on weak-labeled data to determine a language of the received portion of the query, wherein the weak-labeled data comprises a plurality of training queries labeled with an associated language based on a seed dictionary of terms having a known language (Figure 3, 318 Predict Scores/Categories).
12. The computer-implemented method of claim 11, wherein the machine learning model is trained on the weak-labeled data by: obtaining a plurality of labeled dictionary terms, each of the labeled dictionary terms comprising a term and an associated language of the term (Figure 3, 308 Repository of Annotated Training Input); obtaining a plurality of unlabeled sample query terms (Figure 3, 302 Unannotated Source Input); comparing each of the unlabeled sample query terms to the labeled dictionary terms ([0068] In various embodiments, the GAL system 250 is implemented to perform a semantic similarity search 310 of the repository of unannotated instances and seeds 304 to select candidate seeds 306 for annotation. In certain embodiments, LSA approaches familiar to skilled practitioners of the art are implemented to perform the semantic similarity search 310. In these embodiments, the input category and any associated query terms provided by the user 314 are used in the semantic similarity search 310 to identify semantically-similar instances in the repository of unannotated instances and seeds 304); and generating the weak-labeled data based on the comparison of each unlabeled sample query term to the labeled dictionary terms (Figure 5, 514).
13. The computer-implemented method of claim 12, wherein generating the weak-labeled data based on the comparison of each unlabeled sample query term with each dictionary term comprises: determining that an unlabeled sample query term is similar to a labeled dictionary term ([0068] In various embodiments, the GAL system 250 is implemented to perform a semantic similarity search 310 of the repository of unannotated instances and seeds 304 to select candidate seeds 306 for annotation. In certain embodiments, LSA approaches familiar to skilled practitioners of the art are implemented to perform the semantic similarity search 310. In these embodiments, the input category and any associated query terms provided by the user 314 are used in the semantic similarity search 310 to identify semantically-similar instances in the repository of unannotated instances and seeds 304); and as a result of determining that the unlabeled sample query term is similar to the labeled dictionary term, labeling the unlabeled sample query term to create a labeled sample query term by assigning a language associated with the labeled dictionary term to the unlabeled sample query term ([0068] In various embodiments, the GAL system 250 is implemented to perform a semantic similarity search 310 of the repository of unannotated instances and seeds 304 to select candidate seeds 306 for annotation. In certain embodiments, LSA approaches familiar to skilled practitioners of the art are implemented to perform the semantic similarity search 310. In these embodiments, the input category and any associated query terms provided by the user 314 are used in the semantic similarity search 310 to identify semantically-similar instances in the repository of unannotated instances and seeds 304).
14. The computer-implemented method of claim 13, wherein determining that the unlabeled sample query term is similar to the labeled dictionary term comprises: determining a first distance between the unlabeled sample query term and the labeled dictionary term ([0069] - In one embodiment, the similar instances are identified through the use of term frequency-inverse document frequency (tf-idf) scores, which are generated by the search engine. As used herein, tf-idf scores broadly refer to numerical statistics that reflect the importance of a word in a corpus. As such, it is often used as a weighting factor in information retrieval and text mining. Accordingly, tf-idf scores are useful in finding similar instances in the use of a particular word or phrase.); comparing the first distance to a second distance between another unlabeled sample query term and the labeled dictionary term ([0069] - In one embodiment, the similar instances are identified through the use of term frequency-inverse document frequency (tf-idf) scores, which are generated by the search engine. As used herein, tf-idf scores broadly refer to numerical statistics that reflect the importance of a word in a corpus. As such, it is often used as a weighting factor in information retrieval and text mining. Accordingly, tf-idf scores are useful in finding similar instances in the use of a particular word or phrase); and determining that the first distance is shorter than the second distance ([0073] In various embodiments, the LSA similarity scores, the search engine scores, the input category, and any associated query terms are processed by a semantic search-based seed selector (SSSS) implemented to perform ranking operations 316. In certain embodiments, a “bag of words” (BOW) model is implemented in combination with the semantic similarity search 310 and the keyword search 312 to perform re-ranking operations 316 to predict confidence 318 of the resulting LSA similarity search engine scores).

15. The computer-implemented method of claim 13, further comprising: identifying one or more remaining unlabeled sample query terms; determining an expanded training data set by propagating one or more labels to the remaining unlabeled sample query terms; and training the machine learning model using the expanded training data set.

16. The computer-implemented method of claim 15, wherein determining the expanded training data set by propagating one or more labels to the remaining unlabeled sample query terms further comprises: comparing each of the remaining unlabeled sample query terms with each of the labeled query terms included in the training data set ([0083] - Conversely, if there is a preponderance of positively-labeled seeds in the repository of annotated seeds 408, the SSSS 412 may select a candidate unannotated seed in block 426 that it believes has the highest certainty of being negative. I.e., an unlabeled instance having a highest confidence level of being a negative instance for an input category. To continue the example, the candidate seed selected by the SSSS 412 would have the lowest ranking, indicating that the SSSS 412 believes there is a high certainty it would be assigned a negative label if it were annotated by a human annotator 402); and determining the expanded training data set at least based on the comparison of each remaining unlabeled sample query term with each labeled query term included in the training data set ([0083] - Conversely, if there is a preponderance of positively-labeled seeds in the repository of annotated seeds 408, the SSSS 412 may select a candidate unannotated seed in block 426 that it believes has the highest certainty of being negative. I.e., an unlabeled instance having a highest confidence level of being a negative instance for an input category. To continue the example, the candidate seed selected by the SSSS 412 would have the lowest ranking, indicating that the SSSS 412 believes there is a high certainty it would be assigned a negative label if it were annotated by a human annotator 402).
17. The computer-implemented method of claim 16, wherein determining the expanded training data set at least based on the comparison of each remaining unlabeled sample query term with each labeled query term included in the training data set comprises: determining at least one labeled query term similar to a remaining unlabeled sample query term (Figure 4, 432); and labeling the remaining unlabeled sample query term to obtain another new labeled query term by assigning a language associated with the labeled query term to the remaining unlabeled sample query term (Figure 4, 432).
18. The computer-implemented method of claim 16, wherein determining the expanded training data set at least based on the comparison of each remaining unlabeled sample query term with each labeled query term included in the training data set comprises: determining a first labeled query term and a second labeled query term similar to a remaining unlabeled sample query term, wherein the first labeled query term is associated with a first language and the second labeled query term is associated with a second language ([0079] In one embodiment, an LSA distributed semantic model 414 is used to generate the LSA similarity score. In certain embodiments, the LSA similarity score is a score indicating the degree of similarity between a given unannotated instance, the current input category and associated query terms 404 provided by the user 402, and any previously-annotated seeds 406 stored within the repository of annotated seeds 408. In one embodiment, the search engine score is generated by a search engine 416, such as a Lucene-based search engine. In this embodiment, the search engine score is generated by creating an in-memory search index, in near-real-time, from the remaining unannotated instances, and concurrently, by also using the current input category and associated query terms 404 provided by the user 402); and labeling the remaining unlabeled query term to obtain another new labeled query term by assigning the first language or the second language to the remaining unlabeled sample query term ([0093] However, if it was determined in step 526 that there was not an imbalance of negatively-annotated seeds stored in the repository of unannotated instances and seeds, then the SSSS is used in step 528 to select the lowest-ranked unannotated seed stored in the repository of unannotated instances and seeds as the candidate seed. A determination is then made in step 530 whether the candidate seed should be automatically annotated with a negative label by the GAL system. If not, or if it was determined in step 526 to request a user to annotate the candidate seed, or if the candidate seed was selected by a supervised classifier in step 524, then the candidate seed is provided to a user for annotation in step 532).
19. The computer-implemented method of claim 11, further comprising: predicting a complete query at least based on the determined language of the received portion of the query (Figure 3, 318 Predict Scores/ Categories).
20. The computer-implemented method of claim 11, further comprising: determining a user interface to display at least based on the determined language of the received portion of the query (Figure 4, 404).
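For illustration only, the two-pass labeling recited in claims 1, 2, 5, 7, and 8 may be sketched as follows. The sketch is not taken from the application or from Chowdhury. The character-trigram Jaccard similarity, the function names (trigrams, similarity, first_pass, second_pass), and the sample terms are assumptions chosen for brevity, and the similarity measure stands in for whatever distance an actual implementation would use.

```python
from collections import Counter


def trigrams(term):
    """Character trigrams of a padded, lower-cased term (assumed feature)."""
    padded = f"  {term.lower()} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def similarity(a, b):
    """Jaccard similarity between the trigram sets of two terms."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)


def first_pass(seed_dictionary, unlabeled, k):
    """For each labeled dictionary term, label its k nearest unlabeled terms."""
    labeled = {}
    for seed, language in seed_dictionary.items():
        candidates = [t for t in unlabeled if t not in labeled]
        nearest = sorted(candidates, key=lambda t: similarity(seed, t),
                         reverse=True)[:k]
        for term in nearest:
            labeled[term] = language
    return labeled


def second_pass(labeled, remaining, k):
    """Label each remaining term by majority vote over its k nearest labeled terms."""
    result = {}
    for term in remaining:
        nearest = sorted(labeled, key=lambda t: similarity(term, t),
                         reverse=True)[:k]
        if nearest:  # nothing to vote on if no labeled terms exist
            votes = Counter(labeled[t] for t in nearest)
            result[term] = votes.most_common(1)[0][0]
    return result


# Hypothetical sample data: a two-entry seed dictionary and four queries.
seeds = {"shoes": "en", "zapatos": "es"}
queries = ["shoe", "zapato", "running shoes", "zapatos rojos"]

first = first_pass(seeds, queries, k=1)
rest = [q for q in queries if q not in first]
second = second_pass(first, rest, k=1)
```

In this sketch the union of first and second would form the weak-labeled training set provided to the model. With k greater than 1, the Counter in second_pass acts as a simple voting scheme among the neighbors' languages.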

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 3 and 6 are rejected under 35 U.S.C. 103 as being unpatentable over Chowdhury et al. (US 2018/0032900) in view of Bondugula et al. (US 2020/0034419).

Regarding claim 3, Chowdhury teaches the computer-implemented method of claim 2, but does not explicitly teach wherein the predefined number of unlabeled sample query terms are determined using a k Nearest Neighbor (kNN) model.
Bondugula teaches wherein the predefined number of unlabeled sample query terms are determined using a k Nearest Neighbor (kNN) model ([0015] - a scoring model using a trained classifier. In some aspects, a processing device can receive existing text samples corresponding to a selected class. The processing device can search a stored, pre-trained corpus defining embedding vectors for selections from the text samples to produce nearest neighbor vectors for each embedding vector. The nearest neighbor text selections are identified. The nearest neighbor text selections can be identified by systematically checking distance and retaining or selecting those that are closest. Alternatively, text selections corresponding to at least some of the nearest neighbor vectors can be ordered based on a distance between each nearest neighbor vector and the embedding vector for each selection to produce a text cloud. Text samples can be selected based on the text cloud to produce seed data that is used to train a text classifier. A scoring model can be produced using the text classifier).
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Chowdhury to include wherein the predefined number of unlabeled sample query terms are determined using a k Nearest Neighbor (kNN) model as taught by Bondugula. The modification would have been advantageous because the scoring model can receive a large number of new text samples and produce, for each of them, a score indicative of the likelihood that the text sample is a member of a selected class.
Regarding claim 6, Chowdhury teaches the computer-implemented method of claim 5, but does not explicitly teach wherein the predefined number of labeled sample query terms are determined using a k Nearest Neighbor (kNN) model.
Bondugula teaches wherein the predefined number of labeled sample query terms are determined using a k Nearest Neighbor (kNN) model ([0015] - a scoring model using a trained classifier. In some aspects, a processing device can receive existing text samples corresponding to a selected class. The processing device can search a stored, pre-trained corpus defining embedding vectors for selections from the text samples to produce nearest neighbor vectors for each embedding vector. The nearest neighbor text selections are identified. The nearest neighbor text selections can be identified by systematically checking distance and retaining or selecting those that are closest. Alternatively, text selections corresponding to at least some of the nearest neighbor vectors can be ordered based on a distance between each nearest neighbor vector and the embedding vector for each selection to produce a text cloud. Text samples can be selected based on the text cloud to produce seed data that is used to train a text classifier. A scoring model can be produced using the text classifier).
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Chowdhury to include wherein the predefined number of labeled sample query terms are determined using a k Nearest Neighbor (kNN) model as taught by Bondugula. The modification would have been advantageous because the scoring model can receive a large number of new text samples and produce, for each of them, a score indicative of the likelihood that the text sample is a member of a selected class.
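As a hedged illustration of the nearest-neighbor selection at issue in claims 3 and 6, the sketch below ranks candidate embedding vectors by distance to a query vector and keeps the k closest, in the manner of the cited passage's checking distance and retaining those that are closest. The two-dimensional vectors, the names k_nearest and embeddings, and the choice of Euclidean distance are assumptions made for the example and do not appear in either reference.

```python
import math


def k_nearest(query, candidates, k):
    """Return the names of the k candidates closest to the query vector."""
    ranked = sorted(candidates,
                    key=lambda name: math.dist(query, candidates[name]))
    return ranked[:k]


# Hypothetical embedding vectors keyed by term.
embeddings = {
    "shoes": (1.0, 0.0),
    "boots": (0.9, 0.1),
    "zapatos": (0.0, 1.0),
}

neighbors = k_nearest((1.0, 0.05), embeddings, k=2)
print(neighbors)  # ['shoes', 'boots']
```

A brute-force ranking of this kind is linear in the number of candidates per query; the sketch makes no claim about how either reference indexes its vectors.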



Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SAMUEL SHARPLESS whose telephone number is (571)272-1521. The examiner can normally be reached M-F 7:30 AM-3:30 PM (ET).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, MARK FEATHERSTONE, can be reached at (571)270-3750. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/S.C.S./Examiner, Art Unit 2166
/MARK D FEATHERSTONE/Supervisory Patent Examiner, Art Unit 2166