DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendments
The action is responsive to the Applicant’s Amendment filed on 8/12/2021. Claims 1-20 are pending in the application. Claims 1-4, 12, and 17-18 are amended.
Applicant’s amendments to the claims have overcome each and every objection previously set forth in the Non-Final Office Action mailed on 7/06/2021.

Response to Arguments
Applicant’s arguments with respect to the rejections previously made and the amended claims filed on 8/12/2021 have been fully considered but they are not persuasive. In view of the claim amendments, the rejections are being updated accordingly.  
With regard to independent claim 1, Applicant argued that cited reference Brueckner in view of Anderson does not teach the following: (1) determining ... the number of classes is greater than a threshold number of instances and converting ... responsive to the determining, the categorical data into numerical data using natural language processing; (2) generating ... data having a plurality of latent classes by clustering the vector representations of the numerical data into a number of clusters having semantically similar said vector representations, the number of clusters is smaller than the number of classes in the categorical data; (3) responsive to 
Examiner respectfully disagrees with the above arguments.
In response, it is submitted that the cited limitations are properly addressed by Brueckner in view of Anderson, based at least on the following disclosures:
First, in response to the argument that Brueckner in view of Anderson does not teach determining ... the number of classes is greater than a threshold number of instances and converting ... responsive to the determining, the categorical data into numerical data using natural language processing, Anderson teaches in Fig. 1 and in paragraph [0054], “FIG. 2 outlines the process by which content vectors for merchants are built and used to form Merchant Clusters. This process is engaged in the Category Reduction Algorithm 112.” Anderson also teaches in paragraph [0087] that, when the number of classes is greater than a threshold number, the number of classes is further reduced to a smaller subset. As indicated in Fig. 1, Anderson teaches that the categorical data is converted into numerical data in element 104.
Second, in response to the argument that Brueckner in view of Anderson does not teach generating ... data having a plurality of latent classes by clustering the vector representations of the numerical data into a number of clusters having semantically similar said vector representations, Anderson teaches generating data having a plurality of latent classes by clustering the vector representations in paragraph [0054], “FIG. 2 outlines the process by which 
Third, in response to the argument that Brueckner in view of Anderson does not teach, responsive to determining that the number of classes in the categorical data is less than a threshold number of instances, generating training data by converting the categorical data into numerical data using natural language processing to form the training data, Anderson teaches this limitation in Fig. 1 in the Category Reduction Algorithm 112, and in Fig. 4 and paragraph [0044], “The model development component 401 makes use of the Category Reduction Algorithm 112 and the enumeration algorithms 110 to prepare the past data 404 for use in training the Statistical Model 116”.
Fourth, in response to the argument that Brueckner in view of Anderson does not teach, responsive to determining that the number of classes in the categorical data is greater than the threshold number of instances, generating the training data, Anderson teaches this limitation in Fig. 1, Fig. 4, and in paragraph [0044] as noted above.
Thus, for at least the reasons as set forth above, it is submitted that Brueckner in view of Anderson discloses the limitations as recited in amended claims 1 and 12.
With regard to independent claims 12 and 18, the emphasized limitations that the Applicant argues in claims 12 and 18 are similar to the emphasized limitations of claim 1, which have been addressed above. See the response to claim 1 above for explanation.

Furthermore, it is also submitted that all limitations in pending claims, including those not specifically argued, are properly addressed. The reason is set forth in the rejections. See claim analysis below for detail.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-2, 5, 10-14, and 17-20 are rejected under 35 U.S.C. 103 as being unpatentable over Brueckner et al. (U.S. Patent No. 10,318,882 B2), hereinafter Brueckner, in view of Anderson et al. (Publication No. US 20090234683 A1), hereinafter Anderson.

With respect to claim 1, Brueckner discloses in a digital medium environment to improve operation of a computing device to perform machine learning using categorical data ([Col. 3, lines 33-36]: FIG. 26 illustrates an example of an iterative procedure that may be used to improve the quality of predictions made by a machine learning model), a method implemented by the computing device ([Col. 74, lines 22-23]: FIG. 46, illustrates such a general-purpose computing device 9000), the method comprising: 
receiving, by the computing device, categorical data that includes a categorical variable having a non-numerical data type having a number of classes ([Col. 10, lines 44-48]: The input data may comprise data records that include variables of any of a variety of data types, such as… a categorical data type; [Col. 54, lines 52-56]: Any given independent variable as well as the dependent variable may take on any number of different values [number of classes], and may be of any desired data type such as… categorical).
However, Brueckner does not explicitly teach “determining, by the computing device, the number of classes of the categorical data is greater than a threshold number of instances; converting, by the computing device, responsive to the determining, the categorical data into numerical data as vector representations using natural language processing; generating, by the computing device, data having a plurality of latent classes by clustering the vector representations of the numerical data into a number of clusters having semantically similar said vector representations, the number of clusters is smaller than the number of classes in the categorical data; processing, by the computing device, the generated data having the plurality of latent classes by a model using machine learning; and outputting, by the computing device, a result of the processing of the latent classes by the model using machine learning.”
On the other hand, in the same field of endeavor, Anderson teaches 
determining, by the computing device, the number of classes of the categorical data is greater than a threshold number of instances (Fig. 1, Category Reduction Algorithm 112; [0087]: The resulting list of unique merchant names may still yield several millions UMs… So threshold amount may be used to select a subset of the UMs to further reduce the categorical content of the merchant names data; [0111]: For example, merchant clusters may be eliminated where more than a threshold amount of the merchants therein (e.g., 90%) are within the same zip code prefix (3 digits));
converting, by the computing device, responsive to the determining, the categorical data into numerical data as vector representations using natural language processing (Fig. 1, Numerical Data 104; [0043]: The Low-Categorical data fields 106 are converted to Numerical Data 104 using one of the many well-known enumeration algorithms 110; [0089]: Further, to the extent that natural language processing (NLP) techniques may be used to perform the data dimensionality reduction of credit/transactional data to a form using for input to a detection model, these methods and systems come within the scope of the invention; see also Brueckner, Fig. 41, [Col. 67, line 8]: vector representation 5180);
generating, by the computing device, data having a plurality of latent classes (Fig. 2, step 218, Output: A cluster-ID for each UM) by clustering the vector representations of the numerical data into a number of clusters having semantically similar said vector representations ([0054]: FIG. 2 outlines the process by which content vectors for merchants are built and used to form Merchant Clusters; [0018]: The cardholder vectors are derived from the co-occurrence statistics of merchant names in the credit data files, such that cardholders who frequent a similar group of merchants form a cluster [semantic similarity]),
the number of clusters is smaller than the number of classes in the categorical data ([0061]: For each merchant name (UM), there can then be output a Cluster ID identifying the merchant cluster to which the merchant belongs. The merchant cluster Ids are Low Categorical data because they are relatively limited in number (e.g. typically 50 to 400) relative to the number of unique merchant names (typically in the millions) [the merchant names correspond to the number of classes in the categorical data]);
processing, by the computing device, the generated data having the plurality of latent classes by a model using machine learning (Fig. 1; [0043]: The post-conversion, all-numeric data is supplied to… the ultimate Statistical Model 116. The Statistical Model 116 scores the transaction(s), providing a transaction score indicative of the level of risk in the transaction (e.g., a score indicative of the likelihood of fraud)); and 
outputting, by the computing device, a result of the processing of the latent classes by the model using machine learning (Fig. 1, Output 120; [0127]: Taking the affinity measure, the Statistical Model 116, in conjunction with any other desired information about the cardholder (e.g. from the profile 406), scores the transaction, and outputs this score).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the method of Brueckner with the teachings of Anderson to include “determining, by the computing device, the number of classes of the categorical data is greater than a threshold number of instances; converting, by the computing device, responsive to the determining, the categorical data into numerical data as vector representations using natural language processing; generating, by the computing device, data having a plurality of latent classes by clustering the vector representations of the numerical data into a number of clusters having semantically similar said vector representations, the 
The motivation for doing so would be to convert categorical information into numerical information that can be used in mathematical equations, as recognized by Anderson ([Abstract] of Anderson: The post-conversion, all-numeric data is supplied to a Pre-Processing and Profiling Layer 114 that may maintain historical profiles of the individuals identified in the transaction (e.g. profiles of the account holders), and perform various other calculations before forwarding the pre-processed data to the ultimate Statistical Model 116).
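For illustration only, the sequence of steps mapped above (a threshold check on the number of classes, conversion of each class label into a vector, and clustering of similar vectors into a smaller set of latent classes) can be sketched as follows. This sketch is not taken from Brueckner, Anderson, or the application. The names `bigram_vector`, `cosine`, and `reduce_classes`, the character-bigram vectors (a crude stand-in for a natural language processing embedding), the greedy "leader" clustering, and the 0.5 similarity cutoff are all assumptions chosen to keep the example short.

```python
from collections import Counter
from math import sqrt


def bigram_vector(label):
    """Count character bigrams of a lower-cased label (assumed stand-in for an NLP embedding)."""
    text = label.lower()
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def reduce_classes(classes, max_classes, min_similarity=0.5):
    """Map each class label to a latent class ID.

    Clustering is applied only when the number of distinct classes
    exceeds max_classes (the hypothetical threshold).
    """
    unique = sorted(set(classes))
    if len(unique) <= max_classes:
        # Few enough classes: plain enumeration, no clustering.
        return {c: i for i, c in enumerate(unique)}
    leaders = []  # vector of the first member of each cluster
    mapping = {}
    for c in unique:
        vec = bigram_vector(c)
        for cluster_id, leader in enumerate(leaders):
            if cosine(vec, leader) >= min_similarity:
                mapping[c] = cluster_id
                break
        else:
            # No sufficiently similar cluster: start a new one.
            mapping[c] = len(leaders)
            leaders.append(vec)
    return mapping


merchants = ["Starbucks", "Starbucks #12", "Starbucks Coffee", "Shell Oil", "Shell Oil 44"]
latent = reduce_classes(merchants, max_classes=3)
```

With these five hypothetical merchant names and a threshold of three, the labels fall into two latent classes, so the number of clusters is smaller than the number of original classes.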

With respect to claim 2, the combined teachings of Brueckner and Anderson disclose all of the elements of the current invention as stated above concerning claim 1, and 
Brueckner further discloses wherein the converting includes converting the non-numerical data type into the numerical data (See Brueckner, [Col. 71, lines 44-50]: some model generators may require that categorical input variables be converted into numerical or Boolean variables) as n-gram vector representations (Fig. 41, [Col. 67, line 8]: vector representation 5180; [Col. 27, lines 4-7]: The recipe may use a number of different transformation functions or methods defined in one or more libraries 1152, such as functions to form Cartesian products of variables, n-grams (for text data)).

With respect to claim 5, the combined teachings of Brueckner and Anderson disclose all of the elements of the current invention as stated above concerning claim 4, and 
Brueckner further discloses the recited limitation (See Brueckner, Fig. 5: n-gram functions 5152 for text, [Col. 83, lines 36-43]: the set of library function definitions comprise one or more of: (a) a quantile bin function, (b) a Cartesian product function, (c) a bi-gram function, (d) an n-gram function [bi-grams and tri-grams are examples of n-grams]).
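For illustration only, the relationship between bi-grams, tri-grams, and n-gram vector representations noted above can be sketched as follows. The functions `word_ngrams` and `ngram_count_vector`, the sample text, and the vocabulary are hypothetical and are not drawn from Brueckner's library functions.

```python
from collections import Counter


def word_ngrams(text, n):
    """Return the word-level n-grams of a text, in order of appearance."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_count_vector(text, n, vocabulary):
    """Represent a text as counts of each n-gram in a fixed vocabulary."""
    counts = Counter(word_ngrams(text, n))
    return [counts[gram] for gram in vocabulary]


bigrams = word_ngrams("Main Street Coffee Shop", 2)
trigrams = word_ngrams("Main Street Coffee Shop", 3)
vector = ngram_count_vector("Main Street Coffee Shop", 2, ["main street", "coffee shop", "gas station"])
```

Setting `n` to 2 or 3 yields bi-grams or tri-grams respectively, and the count vector is one simple way such n-grams could become numerical data.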

With respect to claim 10, the combined teachings of Brueckner and Anderson disclose all of the elements of the current invention as stated above concerning claim 1, and 
Brueckner further discloses parsing the received categorical data and the converting is based on the parsed categorical data (See Brueckner, FIG. 13, [Col. 30, lines 51-55]: tools such as ANTLR may generate a parser that can build an abstract syntax tree from a text version of a recipe, and the abstract syntax tree may then be converted into a processing plan by the MLS control plane).


With respect to claim 11, the combined teachings of Brueckner and Anderson disclose all of the elements of the current invention as stated above concerning claim 10, and 
Brueckner further discloses wherein the parsing includes removing characters from the received categorical data that include punctuation and stop words (See Brueckner, Fig. 16, [Col. 29, lines 59-60]: punctuation is removed (via the “nopunct” function); [Col. 30, line 65]: Automated Parameter Tuning for Recipe Transformations; [Col. 32, lines 1-2]: removing sparse or infrequent words from documents being analyzed).
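For illustration only, removal of punctuation and stop words before conversion can be sketched as follows. The function `clean_label` and the short `STOP_WORDS` set are hypothetical; the sketch does not reproduce Brueckner's “nopunct” function.

```python
import string

# Hypothetical, deliberately tiny stop-word list for the example.
STOP_WORDS = {"the", "a", "an", "of", "and", "in"}


def clean_label(text):
    """Strip ASCII punctuation, lower-case, and drop stop words."""
    no_punct = text.translate(str.maketrans("", "", string.punctuation))
    return [token for token in no_punct.lower().split() if token not in STOP_WORDS]


tokens = clean_label("The House of Pizza, Inc.")
```

The remaining tokens are the ones that would be passed on to the conversion step.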

With respect to claim 12, Brueckner discloses in a digital medium environment ([Col. 1, lines 65-66]: FIG. 1 illustrates an example system environment) to improve operation of a computing device to enable the computing device to perform machine learning using categorical data ([Col. 3, lines 33-35]: FIG. 26 illustrates an example of an iterative procedure that may be used to improve the quality of predictions made by a machine learning model), the computing device comprising: 
a processing system ([Col. 74, line 34]: Processors 9010); and a computer-readable storage medium having instructions stored thereon ([Col. 74, line 45]: System memory 9020 may be configured to store instructions) that, responsive to execution by the processing system configures the processing system to execute instructions to perform operations (Processors 9010 may be any suitable processors capable of executing instructions) comprising:
receiving categorical data that includes a categorical variable having a non-numerical data type having a number of classes ([Col. 10, lines 44-48]: The input data may comprise… a categorical data type; [Col. 54, lines 52-56]: Any given independent variable as well as the dependent variable may take on any number of different values, and may be of any desired data type such as numerical, categorical, Boolean, character, and so on).
However, Brueckner does not explicitly teach “responsive to determining that the number of classes in the categorical data is less than a threshold number of instances, generating training data by converting the categorical data into numerical data using natural language processing to form the training data; responsive to determining that the number of classes in the categorical data is greater than the threshold number of instances, generating the training data by: converting the categorical data into numerical data using natural language processing, and generating a plurality of latent classes by clustering the numerical data into a number of clusters that is 
On the other hand, in the same field of endeavor, Anderson teaches
responsive to determining that the number of classes in the categorical data is less than a threshold number of instances (Fig. 1, Category Reduction Algorithm 112), generating training data by converting the categorical data into numerical data using natural language processing to form the training data (Fig. 1, Numerical Data 104; [0043]: The Low-Categorical data fields 106 are converted to Numerical Data 104 using one of the many well-known enumeration algorithms 110; [0089]: Further, to the extent that natural language processing (NLP) techniques may be used to perform the data dimensionality reduction of credit/transactional data to a form using for input to a detection model, these methods and systems come within the scope of the invention; see also Brueckner, Fig. 41, [Col. 67, line 8]: vector representation 5180); 
responsive to determining that the number of classes in the categorical data is greater than the threshold number of instances (Fig. 1, Category Reduction Algorithm 112, Fig. 2; [0087]: The resulting list of unique merchant names may still yield several millions UMs… So threshold amount may be used to select a subset of the UMs to further reduce the categorical content of the merchant names data; [0111]: For example, merchant clusters may be eliminated where more than a threshold amount of the merchants therein (e.g., 90%) are within the same zip code prefix (3 digits)), 
generating the training data by: converting the categorical data into numerical data using natural language processing (Fig. 1, Numerical Data 104; [0043]: The Low-Categorical data fields 106 are converted to Numerical Data 104 using one of the many well-known enumeration algorithms 110; [0089]: Further, to the extent that natural language processing (NLP) techniques may be used to perform the data dimensionality reduction of credit/transactional data to a form using for input to a detection model, these methods and systems come within the scope of the invention), and 
generating a plurality of latent classes by clustering the numerical data into a number of clusters (Fig. 2, step 202-Identify Unique Merchants (Ums) [the unique merchants correspond to the latent classes]; [0054]: FIG. 2 outlines the process by which content vectors for merchants are built and used to form Merchant Clusters) 
that is smaller than the number of classes to form the training data (Fig. 2; [0061]: For each merchant name (UM), there can then be output a Cluster ID identifying the merchant cluster to which the merchant belongs. The merchant cluster Ids are Low Categorical data because they are relatively limited in number (e.g. typically 50 to 400) relative to the number of unique merchant names (typically in the millions) [the merchant names correspond to the number of classes in the categorical data]); 
training a model using machine learning based on the training data (Fig. 4; [0044]: The model development component 401 makes use of the Category Reduction Algorithm 112 and the enumeration algorithms 110 to prepare the past data 404 for use in training the Statistical Model 116), and
processing subsequent categorical data using the trained model (Fig.1; [0043]: The post-conversion, all-numeric data is supplied to… the ultimate Statistical Model 116. The Statistical Model 116 scores the transaction(s), providing a transaction score indicative of the level of risk in the transaction (e.g., a score indicative of the likelihood of fraud)).

The motivation for doing so would be to convert categorical information into numerical information that can be used in mathematical equations, as recognized by Anderson ([Abstract] of Anderson: The post-conversion, all-numeric data is supplied to a Pre-Processing and Profiling Layer 114 that may maintain historical profiles of the individuals identified in the transaction (e.g. profiles of the account holders), and perform various other calculations before forwarding the pre-processed data to the ultimate Statistical Model 116).


With respect to claim 13, the combined teachings of Brueckner and Anderson disclose all of the elements of the current invention as stated above concerning claim 12, and
Brueckner further discloses the operations further comprising parsing the categorical data to remove characters that do not contribute to the clustering (See Brueckner, Fig. 12; [Col. 29, lines 51-59]: In the example output section 1210… A term-frequency-inverse document frequency (tfidf) statistic is obtained for the variables included in the LONGTEXT group, after punctuation is removed (via the “nopunct” function); [Col. 30, lines 44-47]: FIG. 13 illustrates an example grammar that may be used to define acceptable recipe syntax, according to at least some embodiments. The grammar shown may be formatted in accordance with the requirements of a parser generator).

With respect to claim 14, the combined teachings of Brueckner and Anderson disclose all of the elements of the current invention as stated above concerning claim 12, and 
Brueckner further discloses wherein the numerical data is configured as vector representations (See Brueckner, Fig. 41, vector representation 5180; [Col. 32, lines 34-38]: Automated parameter exploration may also be used for selection dimensionality values for a vector representation of a text document (e.g., in accordance with the Latent Dirichlet Allocation (LDA) technique) or other natural language processing techniques).

With respect to claim 17, the combined teachings of Brueckner and Anderson disclose all of the elements of the current invention as stated above concerning claim 12, and 
Brueckner further discloses wherein the threshold number of instances has been found to produce results having limited accuracy (See Brueckner, [Col. 23, line 13; element 957]: Thus, at least in some scenarios there may be some loss of statistical quality or predictive accuracy as a result of performing filtering at the chunk level rather than the data record level. However, in general the loss of quality/accuracy may be kept within reasonable bounds by choosing chunk sizes appropriately. FIG. 19 illustrates tradeoffs associated with varying the chunk size used for filtering operation sequences on machine learning data sets, according to at least some embodiments [the chunk size is interpreted as the threshold number of instances]).

With respect to claim 18, Brueckner discloses one or more computer readable storage media having instructions stored thereon ([Col. 74, line 45]: System memory 9020 may be configured to store instructions) that, responsive to execution by a processing system (Processors 9010 may be any suitable processors capable of executing instructions), causes the processing system to perform operations comprising: 
receiving categorical data that includes a categorical variable having a non-numerical data type having a number of original classes ([Col. 10, lines 44-48]: The input data may comprise… a categorical data type); 
However, Brueckner does not explicitly teach “responsive to determining that the number of classes in the categorical data is less than a threshold number of instances, generating training data by converting the categorical data into numerical data using natural language processing to form the training data; responsive to determining that the number of classes in the categorical data is greater than the threshold number of instances, generating the training data by: converting the categorical data into numerical data using natural language processing; and generating a plurality of latent classes by clustering the numerical data into a number of clusters that is smaller than the number of classes to form the training data; and training a model using machine learning by processing the training data using additional information in a form of features described in the original classes; training a model using machine learning by processing the training data; and processing subsequent categorical data using the trained model.”
On the other hand, in the same field of endeavor, Anderson teaches
responsive to determining that the number of classes in the categorical data is less than a threshold number of instances ([0087]: The resulting list of unique merchant names may still yield several millions UMs. In some embodiments it may be desirable to further limit this list according to frequency of merchant name, total merchant volume, average transaction size, or other metrics… So threshold amount may be used to select a subset of the UMs to further reduce the categorical content of the merchant names data), 
generating training data by converting the categorical data into numerical data (Fig. 1, Numerical Data 104; [0043]: The Low-Categorical data fields 106 are converted to Numerical Data 104 using one of the many well-known enumeration algorithms 110) using natural language processing to form the training data ([0089]: Further, to the extent that natural language processing (NLP) techniques may be used to perform the data dimensionality reduction of credit/transactional data to a form using for input to a detection model, these methods and systems come within the scope of the invention); 
responsive to determining that the number of original classes in the categorical data is greater than the threshold number of instances ([0111]: For example, merchant clusters may be eliminated where more than a threshold amount of the merchants therein (e.g., 90%) are within the same zip code prefix (3 digits)), generating the training data by: 
converting the categorical data into numerical data using natural language processing (Fig. 1, Numerical Data 104; [0043]: The Low-Categorical data fields 106 are converted to Numerical Data 104 using one of the many well-known enumeration algorithms 110; [0089]: Further, to the extent that natural language processing (NLP) techniques may be used to perform the data dimensionality reduction of credit/transactional data to a form using for input to a detection model, these methods and systems come within the scope of the invention); and 
generating a plurality of latent classes by clustering the numerical data into a number of clusters that is smaller than the number of classes to form the training data (Fig. 2, step 202-Identify Unique Merchants (UMs) [the unique merchants correspond to the latent classes]; [0054]: FIG. 2 outlines the process by which content vectors for merchants are built and used to form Merchant Clusters. This process is engaged in the Category Reduction Algorithm 112); 
training a model using machine learning by processing the training data (Fig. 4; [0044]: The model development component 401 makes use of the Category Reduction Algorithm 112 and the enumeration algorithms 110 to prepare the past data 404 for use in training the Statistical Model 116); and
processing subsequent categorical data using the trained model (Fig. 1; [0043]: The post-conversion, all-numeric data is supplied to… the ultimate Statistical Model 116. The Statistical Model 116 scores the transaction(s), providing a transaction score indicative of the level of risk in the transaction (e.g., a score indicative of the likelihood of fraud)).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the computer readable storage media of Brueckner with the teachings of Anderson to include “responsive to determining that the number of classes in the categorical data is less than a threshold number of instances, generating training data by converting the categorical data into numerical data using natural language processing to form the training data; responsive to determining that the number of classes in the categorical data is greater than the threshold number of instances, generating the training data by: converting the categorical data into numerical data using natural language processing; and generating a plurality of latent classes by clustering the numerical data into a 
The motivation for doing so would be to convert categorical information into numerical information that can be used in mathematical equations, as recognized by Anderson ([Abstract] of Anderson: The post-conversion, all-numeric data is supplied to a Pre-Processing and Profiling Layer 114 that may maintain historical profiles of the individuals identified in the transaction (e.g. profiles of the account holders), and perform various other calculations before forwarding the pre-processed data to the ultimate Statistical Model 116).

With respect to claim 19, the combined teachings of Brueckner and Anderson disclose all of the elements of the current invention as stated above concerning claim 18, and
Brueckner further discloses wherein the converting includes converting the categorical data into the numerical data (See Brueckner, [Col. 71, lines 44-50]: some model generators may require that categorical input variables be converted into numerical or Boolean variables) as vector representations of the number of classes (Fig. 41, [Col. 67, line 8]: vector representation 5180).

With respect to claim 20, the combined teachings of Brueckner and Anderson disclose all of the elements of the current invention as stated above concerning claim 18, and
Brueckner further discloses the recited limitation (See Brueckner, Fig. 12; [Col. 29, lines 51-59]: In the example output section 1210… A term-frequency-inverse document frequency (tfidf) statistic is obtained for the variables included in the LONGTEXT group, after punctuation is removed (via the “nopunct” function); [Col. 30, lines 44-47]: FIG. 13 illustrates an example grammar that may be used to define acceptable recipe syntax, according to at least some embodiments. The grammar shown may be formatted in accordance with the requirements of a parser generator).

Claims 3-4 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Brueckner et al. (U.S. Patent No. 10,318,882 B2), hereinafter Brueckner, in view of Anderson et al. (Publication No. US 20090234683 A1), hereinafter Anderson, and in further view of Scholtes (Publication No. US 20140156567 A1).

With respect to claim 3, the combined teachings of Brueckner and Anderson disclose all of the elements of the current invention as stated above concerning claim 1. 
However, the combined teachings of Brueckner and Anderson do not explicitly teach “wherein a single vector representation of the vector representations represents multiple words.”
On the other hand, in the same field of endeavor, Scholtes teaches wherein a single vector representation of the vector representations represents multiple words ([0092]: FIG. 16 is an illustrative overview 1600 of the bag-of-words approach and creation of feature vectors for machine learning. From this example, it can be seen that very different sentences [multiple words] obtain similar vector representation).

The motivation for doing so would be to allow for the use of a single vector representation, as recognized by Scholtes ([0062] of Scholtes: the various structural, syntactical and semantic information for the selected document is obtained from the meta data information store. This information is converted into a vector representation).
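For illustration only, the bag-of-words property relied upon above (a single vector representing multiple words, with different sentences obtaining similar vectors) can be sketched as follows. The function `bag_of_words`, the sample sentences, and the vocabulary are hypothetical and do not reproduce FIG. 16 of Scholtes.

```python
from collections import Counter


def bag_of_words(sentence, vocabulary):
    """Represent a whole sentence as one vector of word counts; word order is discarded."""
    counts = Counter(sentence.lower().split())
    return [counts[word] for word in vocabulary]


vocabulary = ["dog", "bites", "man"]
first = bag_of_words("Dog bites man", vocabulary)
second = bag_of_words("Man bites dog", vocabulary)
```

Because word order is discarded, the two sentences, which differ in meaning, are mapped to the same single vector.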

With respect to claim 4, the combined teachings of Brueckner and Anderson disclose all of the elements of the current invention as stated above concerning claim 1. 
However, the combined teachings of Brueckner and Anderson do not explicitly teach “wherein the converting includes converting a set of strings of alphabetical characters of non-numerical values into the vector representations based on features included in the set of strings of alphabetical text.”
On the other hand, in the same field of endeavor, Scholtes teaches a similar method that includes wherein the converting includes converting a set of strings of alphabetical characters of non-numerical values into the vector representations based on features included in the set of strings of alphabetical text ([0006]: Accordingly, in illustrative aspects of the present invention there is provided… a machine learning module configured to generate a model representation for automatic document classification based on feature vectors built from the normalized and extracted semantic information; [0009]: The extracted information is normalized by using… string-matching algorithms; Fig. 4; [0062]: This information is converted into a vector representation in step 402). 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Brueckner and Anderson with the teachings of Scholtes to include “wherein the converting includes converting a set of strings of alphabetical characters of non-numerical values into the vector representations based on features included in the set of strings of alphabetical text.” 
The motivation for doing so would be to allow for using any type of known technique to extract information, as recognized by Scholtes ([0050] of Scholtes: Extracted information can be normalized by using any suitable type of known technique).

With respect to claim 15, the combined teachings of Brueckner and Anderson teach all of the elements of the current invention as stated above concerning claim 2.
However, the combined computing device of Brueckner and Anderson does not explicitly teach “wherein alphanumeric characters in the categorical data are converted into the vector representations.”
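On the other hand, in the same field of endeavor, Scholtes teaches wherein alphanumeric characters in the categorical data are converted into the vector representations ([0062]: FIG. 4 illustrates an automatic classification process 400 of new documents with the machine learning model 400… For example, at step 217, the various structural, syntactical and semantic information for the selected document is obtained from the meta data information store. This information is converted into a vector representation in step 402).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Brueckner and Anderson with the teachings of Scholtes to include “wherein alphanumeric characters in the categorical data are converted into the vector representations.”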
The motivation for doing so would be to allow for using any type of known technique to extract information, as recognized by Scholtes ([0050] of Scholtes: Extracted information can be normalized by using any suitable type of known technique).

Claims 6-7 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Brueckner et al. (U.S. Patent No. 10,318,882 B2), hereinafter Brueckner, in view of Anderson et al. (Publication No. US 20090234683 A1), hereinafter Anderson, and in further view of Parandehgheibi et al. (U.S. Patent No. 10,728,119 B2), hereinafter Parandehgheibi.

With respect to claim 6, the combined teachings of Brueckner and Anderson disclose all of the elements of the current invention as stated above concerning claim 1. 
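However, the combined method of Brueckner and Anderson does not explicitly teach “wherein the clustering uses a K-means clustering technique.”
On the other hand, in the same field of endeavor, Parandehgheibi teaches wherein the clustering uses a K-means clustering technique ([Col. 23, lines 10-13]: the network utilizes machine learning (e.g., k-means clustering, EM, DBScan, decision trees, etc.) to analyze the similarities among nodes to determine the optimal clustering).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Brueckner and Anderson with the teachings of Parandehgheibi to include “wherein the clustering uses a K-means clustering technique.”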
The motivation for doing so would be to allow for partition clustering, as recognized by Parandehgheibi ([Col. 17, lines 47-48 of Parandehgheibi]: The k-means algorithm is an example of partition clustering).
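For illustration only, k-means as an example of partition clustering can be sketched as below. This is a minimal sketch of the standard Lloyd iteration under stated assumptions, not the implementation of Parandehgheibi or of the claimed invention; the function name `kmeans`, the choice of the first `k` points as initial centroids, and the sample points are assumptions made for the example.

```python
def kmeans(points, k, iterations=10):
    """Partition points into k clusters by alternating assign/update steps."""
    # Assumption for this sketch: seed centroids with the first k points.
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid
        # (squared Euclidean distance).
        labels = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical two-dimensional vectors forming two well-separated groups.
points = [(1.0, 1.0), (9.0, 9.0), (1.2, 0.8), (8.8, 9.2)]
labels, centroids = kmeans(points, k=2)
```

Every point receives exactly one label, which is the partitioning property referred to in the quoted passage.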

With respect to claim 7, the combined teachings of Brueckner and Anderson disclose all of the elements of the current invention as stated above concerning claim 1. 
However, the combined teachings of Brueckner and Anderson do not explicitly teach “wherein the clustering uses a Silhouette clustering technique based on a measure of: cohesion indicating how similar numerical values, of the numerical data, are to each other within a respective said cluster; and separation indicating how dissimilar numerical values, of the numerical data, are to at least one other said cluster.”
On the other hand, in the same field of endeavor, Parandehgheibi teaches wherein the clustering uses a Silhouette clustering technique ([Col. 19, lines 58-60]: Weights can be user-specified or automatically obtained from automated cluster evaluations, such as via silhouette scores) based on a measure of: cohesion indicating how similar numerical values, of the numerical data, are to each other within a respective said cluster ([Col. 20, lines 38-41]: The silhouette score can be calculated with any similarity or distance metric; [Col. 20, lines 36-38]: a high value indicates that the node is well matched to its own cluster and badly matched to neighboring cluster. If most nodes have a high silhouette score, then the clustering may be accurate); and separation indicating how dissimilar numerical values, of the numerical data, are to at least one other said cluster ([Col. 20, lines 36-38]: If many nodes have a low or negative silhouette score, then the clustering may have too many or too few clusters).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Brueckner and Anderson with the teachings of Parandehgheibi to include “wherein the clustering uses a Silhouette clustering technique based on a measure of: cohesion indicating how similar numerical values, of the numerical data, are to each other within a respective said cluster; and separation indicating how dissimilar numerical values, of the numerical data, are to at least one other said cluster.”
The motivation for doing so would be to validate consistency within data clusters, as recognized by Parandehgheibi ([Col. 20, lines 29-31 of Parandehgheibi]: Silhouette scoring is a method of interpretation and validation of consistency within clusters of data).
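For illustration only, the relationship between cohesion, separation, and the silhouette score can be sketched as below. This is a minimal sketch of the commonly used silhouette coefficient, s = (b - a) / max(a, b), and not the implementation of Parandehgheibi or of the claimed invention; the function name `silhouette_scores`, the use of Euclidean distance, the score of 0.0 for a single-member cluster, and the sample points are assumptions made for the example. The sketch assumes at least two clusters.

```python
import math


def silhouette_scores(points, labels):
    """Return one silhouette score per point; assumes at least two clusters."""
    scores = []
    for i, (p, own) in enumerate(zip(points, labels)):
        # Cohesion a: mean distance to the other members of the same cluster.
        same = [
            math.dist(p, q)
            for j, (q, label) in enumerate(zip(points, labels))
            if label == own and j != i
        ]
        if not same:
            scores.append(0.0)  # assumed convention for a single-member cluster
            continue
        a = sum(same) / len(same)
        # Separation b: smallest mean distance to the members of another cluster.
        means = []
        for c in set(labels) - {own}:
            d = [math.dist(p, q) for q, label in zip(points, labels) if label == c]
            means.append(sum(d) / len(d))
        b = min(means)
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


# Hypothetical points: two tight groups that are far apart.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_scores(points, [0, 0, 1, 1])  # grouping follows the geometry
bad = silhouette_scores(points, [0, 1, 0, 1])   # grouping cuts across it
```

With the first labeling every score is high, and with the second every score is negative, consistent with the quoted passages on high and on low or negative silhouette scores.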

With respect to claim 16, the combined teachings of Brueckner and Anderson teach all of the elements of the current invention as stated above concerning claim 12.
However, the combined teachings of Brueckner and Anderson do not explicitly teach “wherein the clustering is based on a measure of: cohesion indicating how similar numerical values, of the numerical data, are to each other within a respective said cluster; and separation indicating how dissimilar numerical values, of the numerical data, are to at least one other said cluster.”
On the other hand, in the same field of endeavor, Parandehgheibi teaches wherein the clustering is based on a measure of: cohesion indicating how similar numerical values, of the numerical data, are to each other within a respective said cluster ([Abstract]: The network can compare the similarity of the respective feature vectors and determine each node's cluster based on similarity measures between nodes; [Col. 20, lines 38-41]: The silhouette score can be calculated with any similarity or distance metric; [Col. 20, lines 36-38]: a high value indicates that the node is well matched to its own cluster and badly matched to neighboring cluster. If most nodes have a high silhouette score, then the clustering may be accurate); and separation indicating how dissimilar numerical values, of the numerical data, are to at least one other said cluster ([Col. 20, lines 36-38]: If many nodes have a low or negative silhouette score, then the clustering may have too many or too few clusters).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Brueckner and Anderson with the teachings of Parandehgheibi to include “wherein the clustering is based on a measure of: cohesion indicating how similar numerical values, of the numerical data, are to each other within a respective said cluster; and separation indicating how dissimilar numerical values, of the numerical data, are to at least one other said cluster.”
The motivation for doing so would be to validate consistency within data clusters, as recognized by Parandehgheibi ([Col. 20, lines 29-31 of Parandehgheibi]: Silhouette scoring is a method of interpretation and validation of consistency within clusters of data).

Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Brueckner et al. (U.S. Patent No. 10,318,882 B2), hereinafter Brueckner, in view of Anderson et al. (Publication No. US 20090234683 A1), hereinafter Anderson, and in further view of Yin et al. (CN106570167A), hereinafter Yin.

With respect to claim 8, the combined teachings of Brueckner and Anderson disclose all of the elements of the current invention as stated above concerning claim 1.
However, the combined teachings of Brueckner and Anderson do not explicitly teach “wherein the threshold number of instances is ten or more”.
On the other hand, in the same field of endeavor, Yin teaches wherein the threshold number of instances is ten or more (Step 3-3, cluster K-centers clustering results, the given threshold hierarchical clustering until all the distance between the class is greater than the threshold… In this embodiment, the threshold is 10-20).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Brueckner and Anderson with the teachings of Yin to include “wherein the threshold number of instances is ten or more”.
The motivation for doing so would be to attain more accurate clustering results, as recognized by Yin ([Abstract of Yin]: However, when using only traditional clustering methods for the topic discovery of microblogs, the high-dimensional and sparseness of the feature vectors will be caused, resulting in inaccurate clustering results).

Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Brueckner et al. (U.S. Patent No. 10,318,882 B2), hereinafter Brueckner, in view of Anderson et al. (Publication No. US 20090234683 A1), hereinafter Anderson, and in further view of Chiu et al. (CN102859516A), hereinafter Chiu.

With respect to claim 9, the combined teachings of Brueckner and Anderson disclose all of the elements of the current invention as stated above concerning claim 1.
However, the combined teachings of Brueckner and Anderson do not explicitly teach “wherein the categorical data includes uniform resource locators.”
On the other hand, in the same field of endeavor, Chiu teaches wherein the categorical data includes uniform resource locators ([0046]: each classification data record 222 contains an information item locator such as URL 224; [0058]: the process of propagating categorical data from a URL).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Brueckner and Anderson with the teachings of Chiu to include “wherein the categorical data includes uniform resource locators”.
The motivation for doing so would be to generate categorical data for a URL, as recognized by Chiu ([0073 of Chiu]:…generate categorical data for the URL (see description of operation 328-6, FIG. 3B)).




Examiner Note
Examiner has cited particular columns/paragraphs and line numbers in the references applied to the claims above for the convenience of the applicant. Although the specified citations are representative of the teachings of the art and are applied to specific limitations within the individual claims, other passages and figures may apply as well. Applicant is respectfully requested, in preparing responses, to fully consider the references in their entirety as potentially 
In the case of amending the claimed invention, Applicant is respectfully requested to indicate the portion(s) of the specification which dictate(s) the structure relied on for proper interpretation and also to verify and ascertain the metes and bounds of the claimed invention. This will assist in expediting compact prosecution. MPEP 714.02 recites: "Applicant should also specifically point out the support for any amendments made to the disclosure. See MPEP § 2163.06. An amendment which does not comply with the provisions of 37 CFR 1.121(b), (c), (d), and (h) may be held not fully responsive. See MPEP § 714." Amendments not pointing to specific support in the disclosure may be deemed as not complying with the provisions of 37 CFR 1.121(b), (c), (d), and (h) and therefore held not fully responsive. Generic statements such as "Applicants believe no new matter has been introduced" may be deemed insufficient.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHIRLEY D. HICKS whose telephone number is (571)272-3304.  The examiner can normally be reached on Mon - Fri 7:30 - 4:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Fred Ehichioya, can be reached on (571) 272-4034.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/IRETE F EHICHIOYA/Supervisory Patent Examiner, Art Unit 2168