Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Objections
Claims 15, 17, and 19 are objected to because of the following informalities:
In claim 15, “figured configured automatically” should be “figured configured to automatically”
Claims 15, 17, and 19 are each missing a period at the end of the claim.
Appropriate correction is required.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

1.	Claims 1-2, 6-7, 9-10, 12-13, and 16-17 are rejected under 35 U.S.C. 103 as being unpatentable over Kathalagiri Somashekariah et al. (US 2020/0151647 A1) (“Kathalagiri Somashekariah”) in view of Feurer et al., “Initializing Bayesian Hyperparameter Optimization via Meta-Learning,” Association for the Advancement of Artificial Intelligence (2015) (“Feurer”) and Bordawekar et al., “Enabling Cognitive Intelligence Queries in Relational Databases using Low-dimensional Word Embeddings,” arXiv:1603.07185v1 [cs.CL] 23 Mar 2016 (“Bordawekar”) (cited by applicant in an IDS).
As to claim 1, Kathalagiri Somashekariah teaches a computer-implemented method comprising:
storing a plurality of meaningful test cases, each meaningful test case comprising […] one or more test model parameters used to create a word embedding model […]; [[0044]: “After word embedding model 208 is created and/or updated, model-creation apparatus 210 stores parameters of word embedding model 208 in a model repository 236. For example,…model-creation apparatus 208 may store the updated parameters separately from the old values (e.g., by storing each set of parameters with a different version number of the corresponding machine learning model).” That is, the data (including parameters) for each stored version of the model constitutes a “meaningful test case.” The limitation of “meaningful” is interpreted broadly to be having a meaning of any type, and is met because each stored model version has some utility due to the fact that it is stored. In particular, [0052] teaches that “multiple versions of word embedding model 208 may also be adapted to different subsets” (i.e., each version of the model may have its own use).]
receiving a production data set to be used in generating a new word embedding model, [[0042]: “As shown in FIG. 2, a model-creation apparatus 210 may create a word embedding model 208 from attributes in job histories 212.” Note that the “job histories” described here correspond to a “production data set to be used in generating a new word embedding model.”] wherein the production data set comprises data stored in a […] database […]; [The “job histories 212” are part of data 216/218 shown in FIG. 2 (as described in [0034]: “Profile data 216 and/or jobs data 218 may further include job histories 212 of members of the online network.”), which in turn is stored in data repository 134, as described in [0036]: “data repository 134 stores data that represents standardized, organized, and/or classified attributes in profile data 216 and/or jobs data 218.” See also [0042], which refers to “standardized attributes in job histories 212 from data repository 134.” Here, data repository 134 corresponds to a “database.” See also [0051], which states that “data repository 134…may be provided by…one or more databases.”];
[…] building a word embedding model based on the production data set. [[0042]: “As shown in FIG. 2, a model-creation apparatus 210 may create a word embedding model 208 from attributes in job histories 212.”]
Kathalagiri Somashekariah does not teach the following:
(1)	Each meaningful test case further comprises “a test data profile” and the limitation that the word embedding model of the test case “has been classified as yielding meaningful results”;
(2)	The operations of “generating a data profile associated with the production data set” and “generating, based on the data profile associated with the production data set and the plurality of meaningful test cases, a recommendation for one or more production model parameters for use in” building the word embedding model based on the production data set; and
(3)	The limitation that the database is “a relational database having a plurality of columns and a plurality of rows.”
Feurer, in an analogous art, teaches limitations (1) and (2) listed above. Feurer relates to “Model selection and hyperparameter optimization,” which it refers to as being “crucial in applying machine learning to a novel dataset” (abstract, first sentence). Therefore, Feurer is in the field of artificial intelligence, and also relates to the problem of obtaining parameters for models. Feurer generally teaches a method that uses previously evaluated model configurations and their corresponding datasets in order to obtain a configuration for a model when a new dataset arises: See “Introduction” section, paragraph 3: “The key concept behind meta-learning for hyperparameter search is to suggest good configurations for a novel dataset based on configurations that are known to perform well on similar, previously evaluated, datasets.” Feurer’s technique can be referred to as “meta-learning initialization” (see Feurer, Algorithm 2 title).
In particular, Feurer teaches a plurality of meaningful test cases, each comprising “a test data profile” [3rd page, Algorithm 2: “Input: …training datasets D1:N = (D1,…, DN); best configurations for training datasets, $\theta_{1:N} = \hat{\theta}_1, \ldots, \hat{\theta}_N$.” The training datasets are represented in the form of metafeatures, as described on the 3rd page, right column, 1st paragraph: “we precompute the metafeatures for all training datasets D1,…, DN.” The metafeatures characterize properties of the dataset and correspond to “test data profiles” of the test cases. Metafeatures are further described in the section titled “Implemented Metafeatures” on the 3rd page and Table 1 on the 4th page, which lists metafeatures such as the “number of features” and “total # of categorical values.”] and parameters used to create a model that “has been classified as yielding meaningful results” [The input of “best configurations for training datasets, $\theta_{1:N} = \hat{\theta}_1, \ldots, \hat{\theta}_N$” corresponds to parameters used to create a model. The limitation of “has been classified as yielding meaningful results” is met because these configurations (i.e., parameters for models) have been evaluated to be the “best configurations” for their respective datasets. See also 3rd page, first full paragraph: “Let $\hat{\theta}_1, \ldots, \hat{\theta}_N$ denote the best known hyperparameters for the previously encountered datasets D1,…, DN, respectively. These may originate from an arbitrary source, e.g., a manual search or the application of an SMBO method during an offline training phase.”]
Feurer further teaches “generating a data profile associated with the production data set” [3rd page, right column, 1st and 2nd paragraphs: “Given a new dataset DN+1, we then measure its distances to all previous datasets Di using a distance measure d… The first measure (denoted as dp) we used is the commonly-used p-norm of the difference between the datasets’ metafeatures…” See also Algorithm 2, line 1. Since the previous datasets are represented as metafeatures and are compared with the new dataset using a distance measure, it is understood that the new dataset DN+1 is also represented in the form of metafeatures. The metafeature representation of the new dataset corresponds to a “data profile associated with the production data set.”] and “generating, based on the data profile associated with the production data set and the plurality of meaningful test cases, a recommendation for one or more production model parameters for use in” building a model based on the production data set [Algorithm 2: “Result: Best hyperparameter configuration θ* found.” The “best hyperparameter configuration” constitutes a “recommendation for one or more production model parameters” because it is a suggestion of a good configuration for a novel dataset. See “Introduction” section, paragraph 3: “The key concept behind meta-learning for hyperparameter search is to suggest good configurations for a novel dataset based on configurations that are known to perform well on similar, previously evaluated, datasets.” Moreover, as shown in Algorithm 2, the best configuration is “based on the data profile associated with the production data set and the plurality of meaningful test cases” because it is determined based on the distance metrics, which measure the distance between metafeatures (i.e., data profiles).]
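For illustration only, the ranking step relied upon above (Feurer, Algorithm 2, line 1, using a p-norm distance between metafeature vectors) may be sketched as follows. This sketch is not taken from Feurer or from the application; the function names, the example metafeature values, and the default of p = 2 are assumptions chosen for the example.

```python
from typing import List, Sequence


def p_norm_distance(a: Sequence[float], b: Sequence[float], p: float = 2.0) -> float:
    """p-norm of the difference between two metafeature vectors."""
    if len(a) != len(b):
        raise ValueError("metafeature vectors must have equal length")
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def rank_by_distance(new_profile: Sequence[float],
                     known_profiles: List[Sequence[float]],
                     p: float = 2.0) -> List[int]:
    """Indices of known_profiles sorted by increasing distance to new_profile."""
    return sorted(
        range(len(known_profiles)),
        key=lambda i: p_norm_distance(new_profile, known_profiles[i], p),
    )


# Hypothetical metafeatures: (number of features, number of categorical values).
known = [[10.0, 3.0], [100.0, 50.0], [12.0, 4.0]]
ranking = rank_by_distance([11.0, 3.0], known)
```

Under these assumed values, the first index in `ranking` identifies the previously evaluated dataset whose metafeatures are closest to those of the new dataset.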
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Kathalagiri Somashekariah with the teachings of Feurer by modifying Kathalagiri Somashekariah to include Feurer’s technique of meta-learning initialization (as generally shown in Algorithm 2 of Feurer), so as to implement the features that each meaningful test case further comprises “a test data profile” and that the word embedding model of each test case “has been classified as yielding meaningful results,” and by adding the further operations of “generating a data profile associated with the production data set” and “generating, based on the data profile associated with the production data set and the plurality of meaningful test cases, a recommendation for one or more production model parameters” such that the recommendation is “for use in building [the] word embedding model based on the production data set.” The motivation would have been to implement a technique that obtains “good configurations for a novel dataset based on configurations that are known to perform well on similar, previously evaluated, datasets” (Feurer, “Introduction” section, paragraph 3).
Bordawekar, in an analogous art, teaches the remaining limitation of “a relational database having a plurality of columns and a plurality of rows.” Bordawekar teaches a method for “cognitive intelligence queries in relational databases using low-dimensional word embeddings” (see title). Therefore, Bordawekar is in the same field of endeavor as the claimed invention, namely language modeling.
In particular, Bordawekar teaches “a relational database having a plurality of columns and a plurality of rows.” [§ 2, paragraph 4: “Relational Databases.” § 3.1 paragraph 3: “A database is a collection of tables. Each table is a collection of rows having the same number of columns.”]
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Kathalagiri Somashekariah and Feurer with the teachings of Bordawekar by modifying the method of Kathalagiri Somashekariah as already modified by Feurer such that the database (data repository) is a relational database having a plurality of columns and a plurality of rows, in order to use a known, suitable form of database that can store various types of data, as suggested by Bordawekar (§ 1, paragraph 1: “relational databases have been used to analyze enterprise datasets that comprise mostly of well-qualified typed entities… relational databases have been increasingly used to store and process free-formed unstructured text data”).

As to claim 2, the combination of Kathalagiri Somashekariah, Feurer, and Bordawekar teaches the computer-implemented method of claim 1, wherein storing a plurality of meaningful test cases comprises, for each meaningful test case of the plurality of meaningful test cases:
receiving an indication that a word embedding model generated based on a test data set associated with the meaningful test case yields meaningful results; [As noted in the rejection of claim 1, Feurer, Algorithm 2 teaches the inputs of “training datasets D1,…, DN” (corresponding to test data sets) and the “best configurations for training datasets, $\theta_{1:N} = \hat{\theta}_1, \ldots, \hat{\theta}_N$” (corresponding to models). See also Feurer, 3rd page, first full paragraph: “Let $\hat{\theta}_1, \ldots, \hat{\theta}_N$ denote the best known hyperparameters for the previously encountered datasets D1,…, DN, respectively. These may originate from an arbitrary source, e.g., a manual search or the application of an SMBO method during an offline training phase.” Since the model configurations are known to be the best “configurations” or “hyperparameters” and are used based on this characteristic in Algorithm 2, Feurer implicitly discloses that there was some “indication” that their results are “meaningful.”]
profiling the test data set associated with the meaningful test case to create a test data profile associated with the test case; [As noted in the rejection of claim 1, in Feurer, the training datasets are represented in the form of metafeatures, as described on the 3rd page, right column, 1st paragraph: “we precompute the metafeatures for all training datasets D1,…, DN.” The computation of the metafeatures constitutes “profiling the test data set…” as recited in the instant claim.] and
mapping the test data profile associated with the meaningful test case to parameters used to produce the word embedding model that yields meaningful results. [Feurer, 3rd page, first full paragraph teaches: “Let $\hat{\theta}_1, \ldots, \hat{\theta}_N$ denote the best known hyperparameters for the previously encountered datasets D1,…, DN, respectively.” Moreover, since the “distance metric d” in Algorithm 2 is based on the metafeatures, Feurer teaches mapping (forming a correspondence) between the models $\hat{\theta}_1, \ldots, \hat{\theta}_N$ and the respectively generated metafeatures for their previously encountered datasets D1,…, DN.]
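For illustration only, the correspondence discussed above between a test data profile and the parameters of a model indicated as yielding meaningful results may be sketched as a simple stored record. The names `StoredCase` and `store_if_meaningful`, the profile values, and the `iterations` parameter are hypothetical and do not appear in the cited references or the claims.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class StoredCase:
    profile: Tuple[float, ...]      # metafeatures of the test data set
    parameters: Dict[str, object]   # parameters that produced the model


def store_if_meaningful(store: List[StoredCase],
                        profile: Tuple[float, ...],
                        parameters: Dict[str, object],
                        meaningful: bool) -> bool:
    """Append a profile-to-parameters mapping only when flagged as meaningful."""
    if not meaningful:
        return False
    store.append(StoredCase(profile=profile, parameters=parameters))
    return True


store: List[StoredCase] = []
store_if_meaningful(store, (1000.0, 12.0), {"iterations": 10}, meaningful=True)
store_if_meaningful(store, (50.0, 3.0), {"iterations": 2}, meaningful=False)
```

In this sketch only the first case is retained, because only it carries an indication that its model yielded meaningful results.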

As to claim 6, the combination of Kathalagiri Somashekariah, Feurer, and Bordawekar teaches the computer-implemented method of claim 1, wherein generating a data profile associated with the production data set comprises characterizing the production data set [Feurer, Table 1 (on 4th page) teaches various metafeatures that characterize datasets, and this teaching is applied to the “production data set” as discussed in the rejection of claim 1.] based on one or more of: a number of rows included in the plurality of rows of the relational database; a number of columns included in the plurality of columns of the relational database; [Feurer, Table 1, teaches the feature of “number of features.” Bordawekar, FIGS. 1 and 3 teach features are stored in columns of the database. Therefore, this alternative is taught by the combination of references. Alternatively, Bordawekar teaches that “each table is a collection of rows having the same number of columns.” Since the instant claim feature only requires a characterization that is “based on” the number of columns, this feature is also taught by Bordawekar.] a size of a vocabulary of the relational database, wherein the vocabulary comprises a number of unique words or values; [Feurer, Table 1, teaches the feature of “total # categorical values.” In the context of a word embedding model, as taught in Kathalagiri Somashekariah, the number of categorical values corresponds to a size of the vocabulary. Therefore, this alternative is taught by the combination of references.] for each column of the plurality of columns, a size of vocabulary of the column; for each column of the plurality of columns, a characterization of types of data included in the column, wherein types of data comprise at least word, string and numeric types; a distribution of unique words by column; numeric data clustering methods associated with the production data set; and cluster edge boundary detection associated with the production data set. 
[Since the instant claim recites an alternate expression denoted by the phrase “one or more of,” the alternate expression is satisfied if any of its items are satisfied. As noted above, the combination of references satisfies at least the alternatives of “a number of columns included in the plurality of columns of the relational database” and “a size of a vocabulary of the relational database, wherein the vocabulary comprises a number of unique words or values.”]
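For illustration only, a characterization of a relational table by its number of rows, number of columns, and vocabulary size (count of unique values) may be sketched as follows. The representation of a table as a list of equal-length tuples, the function name `profile_table`, and the sample rows are assumptions made for the example and are not drawn from the cited references.

```python
from typing import Dict, List, Tuple


def profile_table(rows: List[Tuple[object, ...]]) -> Dict[str, int]:
    """Summarize a table by row count, column count, and unique-value count."""
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0
    # Vocabulary: every distinct value appearing anywhere in the table.
    vocabulary = {value for row in rows for value in row}
    return {"n_rows": n_rows, "n_cols": n_cols, "vocab_size": len(vocabulary)}


# Hypothetical two-row, three-column table.
table = [("alice", "engineer", "NY"), ("bob", "engineer", "SF")]
profile = profile_table(table)
```

Here the value “engineer” appears twice but is counted once toward the vocabulary size.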

As to claim 7, the combination of Kathalagiri Somashekariah, Feurer, and Bordawekar teaches the computer-implemented method of claim 6, wherein model parameters [It is noted that the claim does not require these “model parameters” to be among the “one or more production model parameters” that are recommended] comprise one or more of: a weighting for each column of the plurality of columns, each weighting to be applied to a respective column of the plurality of columns during generation of the new word embedding model; a selection of one or more columns of the plurality of columns to include in the training of the word embedding model; a number of iterations used to in generating a word embedding model using a neural network; a selection of one or more algorithms configured to determine relationships between words for use in generating a word embedding model; [Kathalagiri Somashekariah, [0042]: “For example, word embedding model 208 may be a word2vec model that outputs embeddings 214 in a vector space based on groupings of standardized attributes in job histories 212 from data repository 134.”] debugging parameters; multi-threading parameters; and input and output file names. [Since the instant claim recites an alternate expression denoted by the phrase “one or more of,” the alternate expression is satisfied if any of its items are satisfied. As noted above, the combination of references satisfies at least the alternative of “a selection of one or more algorithms configured to determine relationships between words for use in generating a word embedding model.”]

As to claim 9, the combination of Kathalagiri Somashekariah, Feurer, and Bordawekar teaches the computer-implemented method of claim 1, wherein generating a recommendation for one or more production model parameters for use in building a word embedding model based on the test data set comprises:
identifying, based on a comparison of the data profile associated with the production data set to test data profiles associated with the plurality of meaningful test cases, a most similar test data profile of the meaningful test cases; [Feurer, 3rd page, Algorithm 2, line 1: “Sort dataset indices π(1),…, π(n) by increasing distance to DN+1.” Thus, the dataset with the shortest distance (i.e., an index of 1) is the “most similar test data profile of the meaningful test cases.” Here, distance is a measure of similarity.] and
selecting one or more model parameters based on the one or more test model parameters associated with the meaningful test case comprising the most similar test data profile. [Feurer, 3rd page, Algorithm 2, line 2: “$\theta_i \leftarrow \hat{\theta}_{\pi(i)}$”, which states that the model parameters of the most similar training data set, i.e., the parameters corresponding to index 1 (analogous to the “most similar test data profile”), are used to determine the final optimal parameter θ* (the output of line 4).]

As to claim 10, the combination of Kathalagiri Somashekariah, Feurer, and Bordawekar teaches the computer-implemented method of claim 1, wherein generating a recommendation for one or more production model parameters for use in building a word embedding model based on the test data set comprises:
identifying, based on a comparison of the data profile associated with the production data set to test data profiles associated with the plurality of meaningful test cases, a plurality of similar test data profiles of the meaningful test cases, [Feurer, 3rd page, Algorithm 2, line 1: “Sort dataset indices π(1),…, π(n) by increasing distance to DN+1.” Here, the “distance” is a measure of similarity. Thus, the datasets in this part of the algorithm have been identified as having a measured degree of similarity to DN+1 (analogous to the production data set).] wherein each similar test data profile exceeds a threshold level of similarity with the data profile associated with the production data; [An arbitrary subset of the datasets that are the closest (most similar) to DN+1 are considered to exceed a threshold level of similarity, because this threshold level of similarity may be regarded as the next most similar dataset. The instant claim does not require a more particular use of the “threshold level of similarity,” and does not require the test data profiles that do not exceed the threshold level of similarity not to be used.] and
selecting one or more model parameters based on the one or more test model parameters associated with each of the meaningful test cases corresponding to the plurality of similar test data profiles. [Feurer, 3rd page, Algorithm 2, line 2: “$\theta_i \leftarrow \hat{\theta}_{\pi(i)}$”, which states that the model parameters of the similar test data profiles, i.e., the parameters corresponding to a subset of the datasets of indices i through j (analogous to “the plurality of similar test data profiles”), are used to determine the final optimal parameter θ* (the output of line 4).]
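
For illustration only, the two quoted steps of Feurer's Algorithm 2 (sorting prior datasets by distance, then taking their best configurations) can be sketched as follows. This is a minimal sketch under stated assumptions, not Feurer's implementation: the function names, the example metafeature vectors, the example configurations, and the cutoff k are all hypothetical, and the p-norm distance is only one of the distance measures Feurer mentions.

```python
from math import fsum


def p_norm_distance(a, b, p=1):
    """p-norm of the difference between two metafeature vectors."""
    return fsum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def nearest_configurations(new_profile, test_profiles, best_configs, k=2, p=1):
    """Return the stored best configurations of the k closest prior datasets.

    new_profile   -- metafeature vector of the new dataset (hypothetical values)
    test_profiles -- metafeature vectors of previously evaluated datasets
    best_configs  -- best known configuration for each previous dataset
    k             -- illustrative cutoff; not specified by the quoted text
    """
    # Quoted line 1: sort dataset indices by increasing distance to the new dataset.
    order = sorted(
        range(len(test_profiles)),
        key=lambda i: p_norm_distance(new_profile, test_profiles[i], p),
    )
    # Quoted line 2: take the best configuration of each of the closest datasets.
    return [best_configs[i] for i in order[:k]]


# Hypothetical data: two metafeatures per dataset, one parameter per configuration.
test_profiles = [[10.0, 3.0], [200.0, 40.0], [12.0, 4.0]]
best_configs = [{"dim": 50}, {"dim": 300}, {"dim": 64}]
picked = nearest_configurations([11.0, 3.0], test_profiles, best_configs, k=2)
print(picked)
```

In this sketch, the k closest datasets play the role of the “plurality of similar test data profiles,” and the distance to the (k+1)-th closest dataset acts as an implicit similarity threshold.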

As to claims 12-13, these claims are directed to a system for performing operations that are the same or substantially the same as those recited in claims 1-2. Therefore, the rejections made to claims 1-2 are applied to claims 12-13, respectively.
Furthermore, Kathalagiri Somashekariah teaches “a system comprising: a processor communicatively coupled to a memory, the processor configured to…” [[0064]: “Computer system 500 includes a processor 502, memory 504, storage 506, and/or other components found in electronic computing devices.” [0069]: “The methods and processes described in the detailed description section can be embodied as code and/or data, which can be stored in a computer-readable storage medium as described above.”]

As to claims 16-17, these claims are directed to a computer program product for performing operations that are the same or substantially the same as those recited in claims 1-2. Therefore, the rejections made to claims 1-2 are applied to claims 16-17, respectively.
Furthermore, Kathalagiri Somashekariah teaches “a computer program product comprising a computer readable storage medium having program instructions embodied therewith the program instructions executable by a computer processor to cause the computer processor to perform a method comprising…” [[0064]: “Computer system 500 includes a processor 502, memory 504, storage 506, and/or other components found in electronic computing devices.” [0069]: “The methods and processes described in the detailed description section can be embodied as code and/or data, which can be stored in a computer-readable storage medium as described above.”]

2.	Claims 3, 14, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Kathalagiri Somashekariah in view of Feurer and Bordawekar, and further in view of Walters et al. (US 2020/0012584 A1) (“Walters”).
As to claim 3, the combination of Kathalagiri Somashekariah, Feurer, and Bordawekar teaches the computer-implemented method of claim 2, but does not teach the further limitation that the indication that the word embedding model yields meaningful results “represents a determination that a degree to which results of one or more queries of the word embedding model correspond to expected results of the query exceeds a predetermined threshold.”
Walters, in an analogous art, teaches the above limitation. Walters generally relates to a “fully automated machine learning system which generates and optimizes solutions given a dataset and a desired outcome” (see title). Therefore, Walters is in the same general field of endeavor as the claimed invention, namely artificial intelligence models. 
In particular, Walters teaches “represents a determination that a degree to which results of one or more queries of the word embedding model correspond to expected results of the query” [[0125]: “Model optimizer 1303 can be configured to evaluate performance criteria of a newly created synthetic data model.” [0126]: “In various embodiments, the performance criteria can include prediction metrics. The prediction metrics can enable a user to determine whether data models perform similarly for both synthetic and actual data. The prediction metrics can include a prediction accuracy check, a prediction accuracy cross check, a regression check, a regression cross check, and a principal component analysis check. In some aspects, a prediction accuracy check can determine the accuracy of predictions made by a model (e.g., recurrent neural network, kernel density estimator, or the like) given a dataset.” Note that the description of “given a dataset” indicates that the model is used with an input, i.e., “one or more queries of the word embedding model,” and that the notion of “accuracy” implies the existence of “expected results of the query.” It is further noted that the specific limitation of “word embedding model” is already taught by the other references, and that Walters has been relied upon for techniques that are applicable to machine learning models in general.] “exceeds a predetermined threshold.” [[0127]: “Model optimizer 1303 can be configured to store the newly created synthetic data model and metadata for the new synthetic data model in model storage 1305 based on the evaluated performance criteria, consistent with disclosed embodiments. For example, model optimizer 1303 can be configured to store the metadata and new data model in model storage when a value of a similarity metric or a prediction metric satisfies a predetermined threshold.” The description of “satisfies a predetermined threshold” implicitly includes the situation of “exceeding” that threshold.]
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Kathalagiri Somashekariah, Feurer, and Bordawekar with the teachings of Walters by modifying the thus-far combination of references to implement the feature that the indication that the word embedding model yields meaningful results “represents a determination that a degree to which results of one or more queries of the word embedding model correspond to expected results of the query exceeds a predetermined threshold.” The motivation would have been to maintain a library of reusable models created in a manner that accounts for accuracy, as suggested by Walters, [0005] (“There is also a need to create a model library to meet a variety of analysis needs. Models trained on the same or similar data can differ in predictive accuracy or the output that they generate. By training an original, template model with differing hyperparameters, trained models with differing degrees of accuracy or differing outputs can be generated for use in an application. The model with the desired degree of accuracy can be selected for use in the application. Furthermore, development of high-performance models can be enhanced through model re-use.”).
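
For illustration only, the claimed determination (a degree of correspondence between query results and expected results, compared against a predetermined threshold) can be sketched as follows. This is a minimal sketch under stated assumptions: neither the claim nor Walters specifies how the degree of correspondence is computed, so the exact-match scoring, the function names, the example queries, and the threshold value are all hypothetical.

```python
def correspondence_degree(query_results, expected_results):
    """Fraction of queries whose result equals the expected result.

    Exact-match scoring is an assumption made for this sketch only.
    """
    if not expected_results:
        raise ValueError("at least one expected result is required")
    matches = sum(
        1
        for query, expected in expected_results.items()
        if query_results.get(query) == expected
    )
    return matches / len(expected_results)


def yields_meaningful_results(query_results, expected_results, threshold):
    """True when the degree of correspondence exceeds the predetermined threshold."""
    return correspondence_degree(query_results, expected_results) > threshold


# Hypothetical queries against a word embedding model and their expected answers.
expected = {"q1": "queen", "q2": "rome", "q3": "smaller", "q4": "walked"}
results = {"q1": "queen", "q2": "rome", "q3": "smaller", "q4": "walking"}

degree = correspondence_degree(results, expected)
meaningful = yields_meaningful_results(results, expected, threshold=0.7)
print(degree, meaningful)
```

The strict `>` comparison mirrors the claim term “exceeds”; a “satisfies” test as quoted from Walters could instead use `>=`.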

As to claim 14, the further limitations recited in this claim are the same or substantially the same as those recited in claim 3. Therefore, the rejection made to claim 3 is applied to claim 14.

As to claim 18, the further limitations recited in this claim are the same or substantially the same as those recited in claim 3. Therefore, the rejection made to claim 3 is applied to claim 18. 

3.	Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Kathalagiri Somashekariah in view of Feurer and Bordawekar, and further in view of Koch et al. (US 2018/0240041 A1) (“Koch”).
As to claim 4, the combination of Kathalagiri Somashekariah, Feurer, and Bordawekar teaches the computer-implemented method of claim 1, but does not teach the method further comprising “outputting the recommendation for one or more production model parameters for use in building a word embedding model based on the production data set for display.”
Koch, in an analogous art, teaches the above limitations. Koch teaches a “hyperparameter tuning system for machine learning” (see title). Thus, Koch is in the same field of endeavor as the claimed invention, namely artificial intelligence, and is reasonably pertinent to the problem of parameter selection.
In particular, Koch teaches “outputting the recommendation for one or more production model parameters for use in building a word embedding model based on the production data set for display.” [[0147]: “one or more of the output tables may be presented on display 216 when the tuning process is complete. As another option, display 216 may present a statement indicating that the tuning process is complete.” [0148]: “For example, the user can select the hyperparameters included in the ‘Best Configuration’ output table.” [0236]: “Referring to FIG. 13, a best configuration table 1300 captures the hyperparameter configuration for the final model configuration defined by evaluation number 2551.” That is, the best hyperparameters are presented to the user on a display, and the user may select them. Note that the “best configuration” is analogous to the “recommendation for one or more production model parameters” in the instant claim.]
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Kathalagiri Somashekariah, Feurer, and Bordawekar with the teachings of Koch by implementing the further operation of “outputting the recommendation for one or more production model parameters for use in building a word embedding model based on the production data set for display.” The motivation would have been to present the recommended model parameters to a user, as suggested by Koch ([0067]: “A user can interact with one or more user interface windows presented to the user in a display under control of model tuning application 222” and parts quoted above).

4.	Claims 5, 15, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Kathalagiri Somashekariah in view of Feurer and Bordawekar, and further in view of Deo et al. (US 2019/0147371 A1) (“Deo”).
As to claim 5, the combination of Kathalagiri Somashekariah, Feurer, and Bordawekar teaches the computer-implemented method of claim 1, further comprising, […] initiating the training of a word embedding model based on the production data set [Kathalagiri Somashekariah, [0043]: “model-creation apparatus 210 may train word embedding model 208 so that standardized attributes that are shared by a relatively large proportion of job histories 212 are closer to one another in the vector space than standardized attributes that are shared by a smaller proportion of job histories 212.”] and the recommendation for one or more production parameters for use in building the word embedding model. [The “recommendation” is taught by Feurer for the reasons discussed in the rejection of claim 1, above.]
The combination of Kathalagiri Somashekariah, Feurer, and Bordawekar does not explicitly teach the limitation that the initiating is performed “automatically.”
Deo, in an analogous art, teaches the above limitation. Deo pertains to “training, validating, and monitoring artificial intelligence and machine learning models” (title) and is therefore in the same field of endeavor as the claimed invention.
In particular, Deo teaches initiating training “automatically” [[0055]: “several different stages of the process for training, validating, and monitoring artificial intelligence and machine learning models are automated… a technique that automatically trains, validates, and monitors artificial intelligence and machine learning models.” [0083]: “process 400 may include training the model with the unbiased training data to generate a plurality of trained models.” Note that [0080] teaches “process 400 may include receiving a model and data for the model,” where “the model” is analogous to the “recommendation for one or more production parameters.”]
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Kathalagiri Somashekariah, Feurer, and Bordawekar with the teachings of Deo by implementing the operation of initiating the training to be performed “automatically,” in order to remove human subjectivity from the process and to achieve speed and efficiency, as suggested by Deo ([0055]: “automated, which may remove human subjectivity and waste from the process, and which may improve speed and efficiency of the process and conserve computing resources”).

As to claim 15, the further limitations recited in this claim are the same or substantially the same as those recited in claim 5. Therefore, the rejection made to claim 5 is applied to claim 15. 

As to claim 19, the further limitations recited in this claim are the same or substantially the same as those recited in claim 5. Therefore, the rejection made to claim 5 is applied to claim 19.

5.	Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Kathalagiri Somashekariah in view of Feurer and Bordawekar, and further in view of Bobovich et al. (US 2018/0268584 A1) (“Bobovich”) and Venkataraman et al., “Techniques for effective vocabulary selection,” arXiv:cs/0306022v1 [cs.CL] 4 Jun 2003 (“Venkataraman”).
As to claim 8, the combination of Kathalagiri Somashekariah, Feurer, and Bordawekar teaches the computer-implemented method of claim 7, but does not teach the limitation wherein generating a recommendation for one or more production model parameters for use in building a word embedding model based on the production data set comprises the further operation of “recommending a weighting for each column of the plurality of columns, the recommendation being determined by: assigning higher relative weightings to columns comprising a large vocabulary size relative to other columns; and assigning lower relative weightings to columns having a small vocabulary size relative to other columns and columns having a high number of null values relative to other columns.”
Bobovich, in an analogous art, teaches the limitation of “recommending a weighting for each column of the plurality of columns.” Bobovich pertains to “weight initialization for machine learning models” (see title), and is therefore in the same field of endeavor as the claimed invention, namely artificial intelligence.
In particular, Bobovich teaches “recommending a weighting for each column of the plurality of columns” [Abstract: “determining, based at least on the respective effectiveness of the first feature and the second feature, a first initial weight for the first feature and a second initial weight for the second feature.” Note that the first feature and the second feature are analogous to different columns, such as the columns taught in Bordawekar. “Effectiveness” is generally defined as being effectiveness of the overall model. See abstract: “effectiveness of the first feature and the second feature in enabling the convolutional neural network to classify images in the image set.” Additionally, the initial weights depend on the perceived effectiveness, as described in [0005]: “The first initial weight associated with the first feature may be greater than the second initial weight associated with the second feature, when the first quantity is greater than the second quantity.” Bobovich generally teaches that the initialization of the weights is a result-effective variable in [0021]: “The ability of the…neural network to achieve convergence as well as the number of epochs required to achieve convergence may generally depend on the initial weights.”].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Kathalagiri Somashekariah, Feurer, and Bordawekar with the teachings of Bobovich by modifying the “generating a recommendation” step to include the further operation of “recommending a weighting for each column of the plurality of columns.” The motivation would have been to initialize weights so as to facilitate the training of a model, as suggested by Bobovich ([0021]: “The ability of the…neural network to achieve convergence as well as the number of epochs required to achieve convergence may generally depend on the initial weights.”).
Venkataraman suggests the remaining limitations. Venkataraman pertains to “techniques for effective vocabulary selection” (title) for the application of “a language model or speech recognition system” (§ 1). Therefore, Venkataraman is in the same field of endeavor as the claimed invention, namely language processing systems.
In particular, Venkataraman suggests “the recommendation being determined by: assigning higher relative weightings to columns comprising a large vocabulary size relative to other columns; and assigning lower relative weightings to columns having a small vocabulary size relative to other columns and columns having a high number of null values relative to other columns.” [§ 1: “The size and performance of a language model or speech recognition system are often strongly influenced by the size of its vocabulary….a large and comprehensive vocabulary may be desirable from the point of view of lexical coverage.” These teachings suggest the instant limitation for the reasons stated below.]    
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Kathalagiri Somashekariah, Feurer, Bordawekar, and Bobovich based on the above teachings of Venkataraman, so as to achieve the feature that the recommendation is determined by “assigning higher relative weightings to columns comprising a large vocabulary size relative to other columns; and assigning lower relative weightings to columns having a small vocabulary size relative to other columns and columns having a high number of null values relative to other columns.” The motivation for combining the teachings would have been to account for the factor of vocabulary size, as suggested by Venkataraman (see parts quoted above).
Furthermore, as noted above, Bobovich teaches that the initial weight is a result-effective variable. As stated in MPEP § 2144.05(II)(B), “the presence of a known result-effective variable would be one…motivation for a person of ordinary skill in the art to experiment to reach another workable product or process.” Furthermore, Venkataraman teaches that effectiveness correlates with the size of the vocabulary in at least some circumstances. Thus, under the combined teachings of the references, the instant limitation would have been achieved through routine optimization, in light of the principle that “where the general conditions of a claim are disclosed in the prior art, it is not inventive to discover the optimum or workable ranges by routine experimentation.” MPEP § 2144.05(II) (citing In re Aller, 220 F.2d 454, 456, 105 USPQ 233, 235 (CCPA 1955)).
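
For illustration only, the column-weighting limitation of claim 8 (higher relative weights for columns with a large vocabulary, lower relative weights for columns with a small vocabulary or many null values) can be sketched as follows. This is a minimal sketch under stated assumptions: the scoring formula, the function name, and the example table are hypothetical and are not taken from the claim, Bobovich, or Venkataraman.

```python
def recommend_column_weights(columns):
    """Return a relative weight per column, normalized to sum to 1.

    columns -- dict mapping a column name to its list of cell values,
               with None standing for a null value.
    The score (vocabulary size scaled by the non-null fraction) is an
    illustrative assumption, not a formula from any cited reference.
    """
    scores = {}
    for name, values in columns.items():
        non_null = [v for v in values if v is not None]
        vocabulary_size = len(set(non_null))
        non_null_fraction = len(non_null) / len(values) if values else 0.0
        # Larger vocabulary raises the score; more nulls lowers it.
        scores[name] = vocabulary_size * non_null_fraction
    total = sum(scores.values())
    if total == 0:
        return {name: 0.0 for name in scores}
    return {name: score / total for name, score in scores.items()}


# Hypothetical relational table with three columns and four rows.
table = {
    "product": ["apple", "pear", "plum", "fig"],   # large vocabulary, no nulls
    "in_stock": ["yes", "no", "yes", "yes"],       # small vocabulary
    "notes": ["ripe", None, None, None],           # mostly null
}
weights = recommend_column_weights(table)
print(weights)
```

Normalizing the scores keeps the weights relative to one another, which matches the claim's “relative to other columns” wording.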

6.	Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Kathalagiri Somashekariah in view of Feurer and Bordawekar, and further in view of Meteer et al. (US 2019/0180175 A1) (“Meteer”).
As to claim 11, the combination of Kathalagiri Somashekariah, Feurer, and Bordawekar teaches the computer-implemented method of claim 1, but does not teach the further limitations that “generating a recommendation for one or more production model parameters for use in building a word embedding model based on the production data set” comprises “generating a recommendation for preprocessing of the production data to include one or more of: a numeric data clustering method; and cluster edge boundary detection method to achieve data clustering.”
Meteer, in an analogous art, teaches the above limitation. Meteer generally pertains to a system to analyze “communications from customers,” wherein the system “can use the clusters to train a machine learning classifier” (see abstract). Therefore, Meteer is in the same field of endeavor as the claimed invention, namely natural language processing. In general, Meteer teaches the use of clustering (FIG. 4, step 408) prior to training of the model.
In particular, Meteer teaches “generating a recommendation for preprocessing of the production data to include one or more of: a numeric data clustering method; and cluster edge boundary detection method to achieve data clustering.” [[0079]: “the segments can undergo a clustering phase 408 in which the system automatically clusters the segments based on semantic similarity or relatedness (e.g., LCS, Path Distance Similarity, Lexical Chains, Overlapping Glosses, Vector Pairs, HAL, LSA, LDA, ESA, PMI-IR, Normalized Google Distance, DISCO, variations of one or more of these semantic similarity measures, or other similarity measures quantifying similarity by the meaning of segments).” This clustering is considered to be a numeric method because it involves “quantifying similarity by the meaning of segments.” Moreover, the clustering is considered to be “preprocessing” because it occurs prior to the training that is described in [0083] (“The system can proceed to a modeling phase 412 in which the system generates a machine learning classifier from the set of classifications received at phase 410, such as by using one of the machine learning algorithms”). In regards to the limitation of “generating a recommendation,” since the claim does not recite a specific context of the recommendation (e.g., a recommendation to a user), this limitation is deemed to be met by the act of determining that clustering is to be performed.]
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Kathalagiri Somashekariah, Feurer, and Bordawekar with the teachings of Meteer by modifying the operation of generating the recommendation to further comprise “generating a recommendation for preprocessing of the production data to include one or more of: a numeric data clustering method; and cluster edge boundary detection method to achieve data clustering,” as taught in Meteer. The motivation would have been to evaluate the similarity of data to identify groups that are similar, as suggested by Meteer (see [0016]: “evaluate the similarity of the segments to identify clusters or groupings of segments that are more similar” and parts quoted above).

7.	Claims 20-22 and 24 are rejected under 35 U.S.C. 103 as being unpatentable over Kathalagiri Somashekariah in view of Bordawekar, Feurer, and Walters.
As to claim 20, Kathalagiri Somashekariah teaches a computer-implemented method comprising:
receiving a data set for use in generation of a word embedding model, [[0042]: “As shown in FIG. 2, a model-creation apparatus 210 may create a word embedding model 208 from attributes in job histories 212.” Note that the “job histories” described here correspond to a “production data set to be used in generating a new word embedding model.”] the data set comprising data stored in a […] database […]; [The “job histories 212” is part of data 216/218 shown in FIG. 2 (as described in [0034]: “Profile data 216 and/or jobs data 218 may further include job histories 212 of members of the online network.”), which in turn is stored in data repository 134, as described in [0036]: “data repository 134 stores data that represents standardized, organized, and/or classified attributes in profile data 216 and/or jobs data 218.” See also [0042], which refers to “standardized attributes in job histories 212 from data repository 134.” Here, data repository 134 corresponds to a “database.” See also [0051], which states that “data repository 134…may be provided by…one or more databases.”] […]
[…] building a word embedding model; [[0042]: “As shown in FIG. 2, a model-creation apparatus 210 may create a word embedding model 208 from attributes in job histories 212.”]
generating, by training a neural network using unsupervised machine learning based on the first data set, a word embedding model […]; [[0042]: “As shown in FIG. 2, a model-creation apparatus 210 may create a word embedding model 208 from attributes in job histories 212.” [0043]: “As a result, model-creation apparatus 210 may train word embedding model 208 so that standardized attributes that are shared by a relatively large proportion of job histories 212 are closer to one another in the vector space than standardized attributes that are shared by a smaller proportion of job histories 212.” [0063]: “The word embedding model is then generated based on the groupings of attributes (operation 408) for the members. For example, a word2vec model may be trained using the groupings, so that embeddings produced by the model reflect relationships and/or trends in the members' education and/or job histories.” Note that “word2vec” is well known in the art as an unsupervised neural network model, and that the description of the training of the model (particularly the description “…closer to one another in the vector space…”) indicates an unsupervised machine learning model.] […]
[…] updating the set of meaningful test cases to include a new test case comprising […] model parameters used to create the word embedding model. [[0044]: “After word embedding model 208 is created and/or updated, model-creation apparatus 210 stores parameters of word embedding model 208 in a model repository 236.”]
Kathalagiri Somashekariah does not teach the following:
(1)	The limitation of the database being “a relational database having a plurality of columns and a plurality of rows”;
(2)	The operations of “generating a data profile associated with the data set; generating, based on the data profile and a set of meaningful test cases, a recommendation for one or more model parameters for use in” the building of the word embedding model, and related limitation of the generated embedding model being “based on the recommended one or more model parameters and the data set”;
(3)	The operation of “based on one or more queries of the word embedding model, receiving an indication of a determination of a degree of meaningfulness of query results”, and the related limitations that the updating of the set of meaningful test cases is “responsive to the degree of meaningfulness of query results exceeding a predetermined threshold” and “the data profile” being included in the new test case included in the set of meaningful test cases.  
Bordawekar, in an analogous art, teaches limitation (1) listed above. Bordawekar teaches a method for “cognitive intelligence queries in relational databases using low-dimensional word embeddings” (see title). Therefore, Bordawekar is in the same field of endeavor as the claimed invention, namely language modeling.
In particular, Bordawekar teaches “a relational database having a plurality of columns and a plurality of rows.” [§ 2, paragraph 4: “Relational Databases.” § 3.1 paragraph 3: “A database is a collection of tables. Each table is a collection of rows having the same number of columns.”]
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Kathalagiri Somashekariah with the teachings of Bordawekar by modifying the method of Kathalagiri Somashekariah such that the database (data repository) is a relational database having a plurality of columns and a plurality of rows, in order to use a known, suitable form of database that can store various types of data, as suggested by Bordawekar (§ 1, paragraph 1: “relational databases have been used to analyze enterprise datasets that comprise mostly of well-qualified typed entities… relational databases have been increasingly used to store and process free-formed unstructured text data”).
Feurer, in an analogous art, teaches limitations (2) listed above. Feurer relates to “Model selection and hyperparameter optimization,” which it refers to as being “crucial in applying machine learning to a novel dataset” (abstract, first sentence). Therefore, Feurer is in the field of artificial intelligence, and also relates to the problem of obtaining parameters for models. Feurer generally teaches a method that uses previously evaluated model configurations and their corresponding datasets in order to obtain a configuration for a model when a new dataset arises: See “Introduction” section, paragraph 3: “The key concept behind meta-learning for hyperparameter search is to suggest good configurations for a novel dataset based on configurations that are known to perform well on similar, previously evaluated, datasets.” Feurer’s technique can be referred to as “meta-learning initialization” (see Feurer, Algorithm 2 title).
Feurer teaches “generating a data profile associated with the data set” [3rd page, right column, 1st and 2nd paragraphs: “Given a new dataset DN+1, we then measure its distances to all previous datasets Di using a distance measure d… The first measure (denoted as dp) we used is the commonly-used p-norm of the difference between the datasets’ metafeatures…” See also Algorithm 2, line 1. Since the previous datasets are represented as metafeatures and are compared with the new dataset using a distance measure, it is understood that the new dataset DN+1 is also represented in the form of metafeatures. The metafeatures representation of the new dataset corresponds to a “data profile associated with the production data set.” Furthermore, Feurer, 3rd page, Algorithm 2 teaches: “Input: …training datasets D1:N = (D1,…, DN); best configurations for training datasets, $\theta_{1:N} = \hat{\theta}_1, \ldots, \hat{\theta}_N$.” The training datasets are represented in the form of metafeatures, as described on the 3rd page, right column, 1st paragraph: “we precompute the metafeatures for all training datasets D1,…, DN.” The metafeatures characterize properties of the dataset and correspond to “test data profiles” of the test cases. Metafeatures are further described in the section titled “Implemented Metafeatures” on the 3rd page and Table 1 on the 4th page, which lists metafeatures such as the “number of features” and “total # of categorical values.”] and “generating, based on the data profile and a set of meaningful test cases, a recommendation for one or more model parameters for use in” the building of a model [Algorithm 2: “Result: Best hyperparameter configuration θ* found.” The “best hyperparameter configuration” constitutes a “recommendation for one or more production model parameters” because it is a suggestion of a good configuration for a novel dataset. See “Introduction” section, paragraph 3: “The key concept behind meta-learning for hyperparameter search is to suggest good configurations for a novel dataset based on configurations that are known to perform well on similar, previously evaluated, datasets.” Moreover, as shown in Algorithm 2, the best configuration is “based on the data profile associated with the production data set and a set of meaningful test cases” because it is determined based on the distance metrics, which measure the distance between metafeatures (i.e., data profiles). Further in regards to the “set of meaningful test cases,” the input of “best configurations for training datasets, $\theta_{1:N} = \hat{\theta}_1, \ldots, \hat{\theta}_N$” corresponds to parameters used to create a model. The limitation of “meaningful” is met because these configurations (i.e., parameters for models) have been evaluated to be the “best configurations” for their respective datasets. See also 3rd page, first full paragraph: “Let 
                            
                                
                                    
                                        
                                            θ
                                        
                                        ^
                                    
                                
                                
                                    1
                                
                            
                            ,
                            …
                            ,
                            
                                
                                    
                                        
                                            θ
                                        
                                        ^
                                    
                                
                                
                                    N
                                
                            
                        
                     denote the best known hyperparameters for the previously encountered datasets D1,…, DN, respectively. These may originate from an arbitrary source, e.g., a manual search or the application of an SMBO method during an offline training phase.”]
Feurer further suggests a generated model being “based on the recommended one or more model parameters and the data set” [Abstract: “In this paper we mimic a strategy human domain experts use: speed up optimization by starting from promising configurations that performed well on similar datasets.” See also “Introduction” section, paragraph 3: “The key concept behind meta-learning for hyperparameter search is to suggest good configurations for a novel dataset based on configurations that are known to perform well on similar, previously evaluated, datasets.” That is, Feurer suggests that the hyperparameters obtained by its method are suitable for further use as part of a model.]
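For illustration only, the nearest-dataset selection relied upon above in Feurer's Algorithm 2 can be sketched as follows. This is a minimal sketch and not Feurer's implementation: the L1 distance over raw metafeature vectors, the function names, the dictionary layout of a test case, and the hyperparameter names `vector_size` and `window` are all assumptions introduced here.

```python
def metafeature_distance(a, b):
    """L1 distance between two equal-length metafeature vectors (assumed metric)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def suggest_initial_configs(new_metafeatures, known_cases, t):
    """Return the best-known configurations of the t previously seen
    datasets whose metafeatures are closest to those of the new dataset."""
    ranked = sorted(
        known_cases,
        key=lambda case: metafeature_distance(new_metafeatures, case["metafeatures"]),
    )
    return [case["best_config"] for case in ranked[:t]]


# Hypothetical test cases: metafeatures are
# [number of features, total number of categorical values].
known_cases = [
    {"metafeatures": [10.0, 4.0], "best_config": {"vector_size": 100, "window": 5}},
    {"metafeatures": [200.0, 50.0], "best_config": {"vector_size": 300, "window": 10}},
    {"metafeatures": [12.0, 5.0], "best_config": {"vector_size": 128, "window": 5}},
]

# Recommend configurations for a new dataset profiled as [11.0, 5.0].
suggested = suggest_initial_configs([11.0, 5.0], known_cases, t=2)
print(suggested)
```

Under these assumptions, the recommendation for the new data set is drawn entirely from stored test cases and their data profiles, which mirrors the mapping of “based on the data profile and a set of meaningful test cases” set forth above.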
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Kathalagiri Somashekariah and Bordawekar with the teachings of Feurer by modifying Kathalagiri Somashekariah (as already modified by Bordawekar) to include Feurer’s technique of meta-learning initialization (as generally shown in Algorithm 2 of Feurer), so as to implement the operations of “generating a data profile associated with the data set; generating, based on the data profile and a set of meaningful test cases, a recommendation for one or more model parameters for use in” the building of the word embedding model, and the related feature that the generated embedding model is “based on the recommended one or more model parameters and the data set.” The motivation would have been to implement a technique that obtains “good configurations for a novel dataset based on configurations that are known to perform well on similar, previously evaluated, datasets” (Feurer, “Introduction” section, paragraph 3).
Walters, in an analogous art, teaches the remaining limitations (3) listed above. Walters generally relates to a “fully automated machine learning system which generates and optimizes solutions given a dataset and a desired outcome” (see title). Therefore, Walters is in the same general field of endeavor as the claimed invention, namely artificial intelligence models.
	In particular, Walters teaches “based on one or more queries of the word embedding model, receiving an indication of a determination of a degree of meaningfulness of query results” [[0125]: “Model optimizer 1303 can be configured to evaluate performance criteria of a newly created synthetic data model.” [0126]: “In various embodiments, the performance criteria can include prediction metrics. The prediction metrics can enable a user to determine whether data models perform similarly for both synthetic and actual data. The prediction metrics can include a prediction accuracy check, a prediction accuracy cross check, a regression check, a regression cross check, and a principal component analysis check. In some aspects, a prediction accuracy check can determine the accuracy of predictions made by a model (e.g., recurrent neural network, kernel density estimator, or the like) given a dataset.” Note that the description of “given a dataset” indicates that the model is used with an input, i.e., “one or more queries of the word embedding model.” It is further noted that the specific limitation of “word embedding model” is already taught by the other references, and that Walters has been relied upon for techniques that are applicable to machine learning models in general.] and the updating of the set of meaningful test cases being “responsive to the degree of meaningfulness of query results exceeding a predetermined threshold” [[0127]: “Model optimizer 1303 can be configured to store the newly created synthetic data model and metadata for the new synthetic data model in model storage 1305 based on the evaluated performance criteria, consistent with disclosed embodiments. For example, model optimizer 1303 can be configured to store the metadata and new data model in model storage when a value of a similarity metric or a prediction metric satisfies a predetermined threshold.” The description of “satisfies a predetermined threshold” implicitly includes the situation of “exceeding” that threshold.], and the new test case included in the set further comprising “the data profile.” [[0127]: “the metadata can include an indication of the origin of the new synthetic data model, the data used to generate the new synthetic data model.”]
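For illustration only, the threshold-gated storage discussed above can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Walters: the function name, the score scale, and the record layout are hypothetical, and the strict comparison reflects the claim term “exceeding” rather than Walters’ broader “satisfies.”

```python
def update_test_cases(test_cases, data_profile, parameters, meaningfulness, threshold):
    """Append a new test case only when the reported degree of
    meaningfulness strictly exceeds the predetermined threshold.

    Returns True if the set of test cases was updated, False otherwise.
    """
    if meaningfulness > threshold:
        test_cases.append(
            {
                "data_profile": data_profile,   # profile of the data set used
                "parameters": parameters,       # model parameters used
                "meaningfulness": meaningfulness,
            }
        )
        return True
    return False


# Hypothetical usage: scores are assumed to lie in [0, 1].
library = []
stored = update_test_cases(
    library,
    data_profile={"num_rows": 3, "num_columns": 3},
    parameters={"vector_size": 128},
    meaningfulness=0.9,
    threshold=0.8,
)
print(stored, len(library))
```

A score equal to the threshold does not trigger an update in this sketch, which is the narrower of the two readings discussed above.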
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Kathalagiri Somashekariah, Bordawekar, and Feurer with the teachings of Walters by modifying the thus-far combination of references to include the further operation of “based on one or more queries of the word embedding model, receiving an indication of a determination of a degree of meaningfulness of query results”, and such that the updating of the set of meaningful test cases is “responsive to the degree of meaningfulness of query results exceeding a predetermined threshold” and “the data profile” being included in the new test case included in the set of meaningful test cases. The motivation would have been to obtain a library of reusable models created in a manner that accounts for accuracy, as suggested by Walters, [0005] (“There is also a need to create a model library to meet a variety of analysis needs. Models trained on the same or similar data can differ in predictive accuracy or the output that they generate. By training an original, template model with differing hyperparameters, trained models with differing degrees of accuracy or differing outputs can be generated for use in an application. The model with the desired degree of accuracy can be selected for use in the application. Furthermore, development of high-performance models can be enhanced through model re-use.”).

As to claim 21, the combination of Kathalagiri Somashekariah, Bordawekar, Feurer, and Walters teaches the computer-implemented method of claim 20, wherein generating a data profile associated with the data set comprises characterizing the data set [Feurer, Table 1 (on 4th page) teaches various metafeatures that characterize datasets, and this teaching is applied to the “production data set” as discussed in the rejection of claim 1.] based on one or more of: a number of rows included in the plurality of rows of the relational database; a number of columns included in the plurality of columns of the relational database; [Feurer, Table 1, teaches the feature of “number of features.” Bordawekar, FIGS. 1 and 3 teach features are stored in columns of the database. Therefore, this alternative is taught by the combination of references. Alternatively, Bordawekar teaches that “each table is a collection of rows having the same number of columns.” Since the instant claim feature only requires a characterization that is “based on” the number of columns, this feature is also taught by Bordawekar.] a size of a vocabulary of the relational database, wherein the vocabulary comprises a number of unique words or values; [Feurer, Table 1, teaches the feature of “total # categorical values.” In the context of a word embedding model, as taught in Kathalagiri Somashekariah, the number of categorical values corresponds to a size of the vocabulary. Therefore, this alternative is taught by the combination of references.] for each column of the plurality of columns, a size of vocabulary of the column; for each column of the plurality of columns, a characterization of types of data included in the column, wherein types of data comprise at least word, string and numeric types; a distribution of unique words by column; numeric data clustering methods associated with the data set; and cluster edge boundary detection associated with the data set. 
[Since the instant claim recites an alternative expression denoted by the phrase “one or more of,” the alternative expression is satisfied if any of its items is satisfied. As noted above, the combination of references satisfies at least the alternatives of “a number of columns included in the plurality of columns of the relational database” and “a size of a vocabulary of the relational database, wherein the vocabulary comprises a number of unique words or values.”]
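For illustration only, several of the recited characterizations (number of rows, number of columns, vocabulary size, per-column vocabulary size, and per-column data types) can be sketched as follows. This is a minimal sketch and is not drawn from any cited reference: the function names, the in-memory row representation, and the rule distinguishing “word” from “string” values (purely alphabetic text versus anything else) are assumptions introduced here.

```python
def value_type(value):
    """Classify a cell as 'numeric', 'word', or 'string' (assumed rule)."""
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return "numeric"
    return "word" if str(value).isalpha() else "string"


def build_data_profile(rows):
    """Characterize a table given as a list of equal-length tuples."""
    num_columns = len(rows[0]) if rows else 0
    # Transpose the rows so that each column can be profiled separately.
    columns = [[row[i] for row in rows] for i in range(num_columns)]
    return {
        "num_rows": len(rows),
        "num_columns": num_columns,
        # Vocabulary: number of unique words or values in the whole table.
        "vocabulary_size": len({value for row in rows for value in row}),
        "column_vocabulary_sizes": [len(set(col)) for col in columns],
        "column_types": [sorted({value_type(v) for v in col}) for col in columns],
    }


# Hypothetical three-row, three-column table.
rows = [
    ("alice", 30, "New York"),
    ("bob", 25, "Boston"),
    ("alice", 41, "Boston"),
]
profile = build_data_profile(rows)
print(profile)
```

A profile of this kind could serve as the metafeature vector compared across data sets, although how the profile is encoded for comparison is left open here.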

As to claim 22, the combination of Kathalagiri Somashekariah, Bordawekar, Feurer, and Walters teaches the computer-implemented method of claim 20, wherein the set of meaningful test cases comprise a plurality of known meaningful test cases, each known meaningful test case being associated with respective test data and respective selected parameters, each known meaningful test case being classified as yielding meaningful results [As noted above, Feurer teaches “best configurations for training datasets, $\theta_{1:N} = \hat{\theta}_1, \ldots, \hat{\theta}_N$,” which correspond to parameters used to create a model. The limitation of “…classified as yielding meaningful results” is met because these configurations (i.e., parameters for models) have been evaluated to be the “best configurations” for their respective datasets. See also 3rd page, first full paragraph: “Let $\hat{\theta}_1, \ldots, \hat{\theta}_N$ denote the best known hyperparameters for the previously encountered datasets $D_1, \ldots, D_N$, respectively. These may originate from an arbitrary source, e.g., a manual search or the application of an SMBO method during an offline training phase.”] in response to creating a word embedding model by training a neural network using unsupervised machine learning based on the respective test data and the respective selected parameters. [Kathalagiri Somashekariah teaches that its models are “word2vec”, as discussed in the rejection of claim 1 above. This model is an unsupervised neural network model, and each model is trained based on a dataset and set of parameters.]

As to claim 24, this claim is directed to a system for performing operations that are the same or substantially the same as those recited in claim 20. Therefore, the rejection made to claim 20 is applied to claim 24.
Furthermore, Kathalagiri Somashekariah teaches “a system comprising: a processor communicatively coupled to a memory, the processor configured to…” [[0064]: “Computer system 500 includes a processor 502, memory 504, storage 506, and/or other components found in electronic computing devices.” [0069]: “The methods and processes described in the detailed description section can be embodied as code and/or data, which can be stored in a computer-readable storage medium as described above.”]

8.	Claims 23 and 25 are rejected under 35 U.S.C. 103 as being unpatentable over Kathalagiri Somashekariah in view of Bordawekar, Feurer, and Walters, and further in view of Zhang et al. (US 2018/0211260 A1) (“Zhang”).
As to claim 23, the combination of Kathalagiri Somashekariah, Bordawekar, Feurer, and Walters teaches the computer-implemented method of claim 20, but does not teach the further limitation that the operation of receiving an indication of a determination of a degree of meaningfulness of query results comprises “receiving a user input representing the degree of meaningfulness of query results.”
	Zhang, in an analogous art, teaches the above limitation. Zhang generally pertains to application of machine learning models. See [0045], which states that the model “may be generated using an artificial neural network, Naïve Bayes classifier, Bayesian network, clustering technique, logistic regression technique, decision tree, and/or other type of machine learning model or technique.” Therefore, Zhang is in the same general field of endeavor as the claimed invention, namely machine learning models.
	In particular, Zhang teaches “receiving a user input representing the degree of meaningfulness of query results.” [[0046]: “analysis apparatus 202, validation apparatus 204, and/or other components of the classification system may obtain assigned categories in the training data and/or verify classified categories from statistical model 206 using feedback from multiple users or domain experts. In turn, the feedback may be used to generate a “vote” on the quality of the training data and/or statistical model output and allow the classification system to track the quality of the training data and/or statistical model output over time.”]
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Kathalagiri Somashekariah, Bordawekar, Feurer, and Walters with the teachings of Zhang by modifying the operation of receiving an indication of a determination of a degree of meaningfulness of query results to comprise “receiving a user input representing the degree of meaningfulness of query results.” The motivation would have been to utilize user input to assess the quality of the model’s output, as suggested by Zhang ([0046]: “the feedback may be used to generate a “vote” on the quality of the training data and/or statistical model output and allow the classification system to track the quality of the training data and/or statistical model output over time”).

As to claim 25, the further limitations recited in this claim are the same or substantially the same as those recited in claim 23. Therefore, the rejection made to claim 23 is applied to claim 25.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. The following documents depict the state of the art.
Kao et al. (US 2019/0286734 A1) teaches that “Suitable examples of unsupervised machine learning algorithms include neural language models such as Word2vec, although other unsupervised machine learning algorithms are also suitable” ([0041]), evidencing that Word2vec is an unsupervised neural network model.
Agea et al. (US 2020/0162484 A1) teaches: “The trained network is a combination of simple CNNs with one layer on top of word vectors obtained using Word2Vec, an unsupervised neural language model” ([0114]). This reference also shows that Word2vec is an unsupervised neural network model.
Dasgupta (US 10719301 B1) teaches the use of a model repository.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to YAO DAVID HUANG whose telephone number is (571)270-1764. The examiner can normally be reached Monday - Friday 9:00 am - 5:30 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda Huang, can be reached at (571) 270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/Y.D.H./Examiner, Art Unit 2124

/MIRANDA M HUANG/Supervisory Patent Examiner, Art Unit 2124