DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Objections
Claims 4, 21, and 29 are objected to because of the following informalities:  
Claim 4 sets forth limitations of “other signifiers or features” and “as well as mathematical features, like sums, means, and distribution parameters”, which can be construed as indefinite.  The Specification, Page 5, Lines 12 to 18, describes these limitations, but the term “like” appears in this context to connote ‘similar to’, as there does not appear to be an art-recognized meaning for “like sums”.  MPEP §2173.05(d) states that exemplary claim language can be indefinite when it uses phrases “such as” or “like material”.  The limitations of “other” signifiers or features and “like” sums, means, and distribution parameters, then, are indefinite.  Similarly, “modifications of those” is unclear because it is not fully determined what these modifications encompass, and “as well as” is somewhat unclear after the limitation of “one or more of”, where “as well as” appears to be simply equivalent to “and”.
Claim 21 sets forth limitations of “other discussion fora” and “other text and forms of media”, which are somewhat indefinite.  Applicants can address this by deleting the terms “other”, which are somewhat indeterminate as to what they are referring to.  Additionally, “as well as their metadata” is somewhat ambiguous after the limitation of “one or more of”, but could be simplified as “and media metadata”.
Appropriate correction is required.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 4 to 6, 9, 13, 15 to 19, and 21 to 28 are rejected under 35 U.S.C. 103 as being unpatentable over Revesz et al. (U.S. Patent Publication 2015/0154289) in view of Tetreault et al. (U.S. Patent Publication 2017/0257329).
Concerning independent claims 1, 26, and 28, Revesz et al. discloses a method, system, and computer-program product for categorizing and moderating user-generated content in an online environment, comprising:
“a) providing a set of data contents as training data, the data contents being labeled as acceptable or unacceptable contents” – a reference corpus or reference database refer to a collection of textual examples that, for a particular category of content, are classified (“labeled”) as either positive or negative examples of that particular category; a reference database may contain a collection of positive or 
“b) the moderator tool receiving said training data” – a machine learning algorithm may be trained and tested using examples in a reference corpus (¶[0112]); different folds are identified in the reference corpus for use in training and testing the machine learning system; training examples present in the selected fold are used to train the machine learning system (¶[0118] - ¶[0119]: Figure 7: Steps 704 and 706); here, a machine learning system that is trained for moderating user-generated content is “the moderator tool”;
“c) the moderator tool executing a first algorithm that identifies features that exist in the training data and extracts them, and ending up with a feature space” – the term ‘feature’ refers to a sub-sequence of consecutive textual terms from a particular textual sequence (¶[0038]); the term ‘vector’ refers to a representation of a particular textual content as a vector in hyperspace (“a feature space”); the hyperspace may be a multi-dimensional space to which a text categorization program is mapped in order to facilitate a machine learning algorithm; the term ‘training vector’ refers to a vector that is used in training a machine learning system (¶[0040] - ¶[0041]); a features table is 
“d) the moderator tool executing a second algorithm in the feature space for choosing the features in a moderation model to be created and defining [a weighting of] data features, the choosing and defining based on the data contents labelled as the acceptable contents and the unacceptable contents” – text of each positive and negative example in the reference corpus is parsed to generate a sequence of n-grams and the frequency with which each n-gram appears in each example; a features table is generated or indexed to populate the features table with the n-grams and associated n-gram frequencies for the examples in the reference corpus; if the same n-gram appears in two or more examples, the same n-gram is not entered multiple times in the features table, but rather the frequency of the same n-gram entry is updated based on its recurrence; the n-gram entries in the features table are sorted by decreasing n-gram frequency; one or more stop words may be discarded from the sorted features table (“choosing the features”); stop words are commonly used terms and tend to appear at 
“training parameters of a machine learning model [based on the weighted data features] in order to create the moderation model” – training examples are used to train the machine learning system; the machine learning system accepts as training input a set of training vectors generated from the training examples; the machine learning system is trained on the training vectors (¶[0119] - ¶[0121]: Figure 7: Steps 708 to 710);
“e) the moderator tool receiving new data content to be moderated” – embodiments may receive textual content for categorizing web page content generated by a user using a trained machine learning system (¶[0175] - ¶[0176]: Figure 15: Step 1502);
“f) the moderator tool executing the first algorithm on the new data content for identifying the data features in the new data content to be moderated in accordance with the moderation model created” – embodiments may process the selected content to generate a vector that may be used by a trained machine learning system to determine whether the selected content is a positive example of a category by parsing the content to generate a sequence of one or more n-grams based on the selected 
“g) producing a moderated result for the new data content by indicating whether the new data content is acceptable or unacceptable in accordance with the moderation model created” – embodiments automatically determine a probability value indicating that the user-generated content is either a positive example or a negative example of one or more unsuitable categories; if the user-generated content is determined to be a positive example of any of the unsuitable categories (“where the new data content is acceptable or unacceptable”) to a predefined degree of certainty, the content may be automatically excluded from publication in the online environment (“producing a moderated result for the new data content”) (Abstract).
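For illustration only, the features table discussed for elements c) and d) can be sketched as follows. This is a minimal sketch and not the implementation of Revesz et al.: the stop-word list, the function names, and the whitespace tokenization are assumptions made for the example.

```python
from collections import Counter

# Hypothetical stop-word list; the list used by the reference is not given here.
STOP_WORDS = {"the", "a", "is", "and"}

def ngrams(text, n):
    """Return the consecutive n-term sub-sequences of a text."""
    terms = text.lower().split()
    return [" ".join(terms[i:i + n]) for i in range(len(terms) - n + 1)]

def build_features_table(examples, max_n=2):
    """Enter each unique n-gram once and accumulate its frequency."""
    counts = Counter()
    for text in examples:
        for n in range(1, max_n + 1):
            counts.update(ngrams(text, n))
    # Discard stop words (only unigram entries in this sketch).
    for word in STOP_WORDS:
        counts.pop(word, None)
    # Sort entries by decreasing frequency; ties are broken alphabetically.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

table = build_features_table(["you are the worst", "you are great"])
```

An n-gram that recurs across examples, such as "you are", appears once in the table with an updated frequency rather than as multiple entries.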

Concerning independent claim 26, Revesz et al. additionally discloses “one or more client devices with means for sending training data and data contents to the moderator tool to be moderated” and “an interface for interaction between the one or more client devices and the moderator tool” – a user may interact with computing device 1600 through a display that may use one or more user interfaces 1620 (“an interface for 

Concerning independent claims 1, 26, and 28, Revesz et al. omits only the limitations of “defining a weighting of data features” and training parameters of a machine learning model “based on the weighted data features”.  Broadly, Revesz et al. discloses “choosing the features to be used in a moderation model to be created . . . and choosing . . . based on the data contents labelled as the acceptable and the unacceptable contents” so as to provide “training parameters of a machine learning model based on the . . . data features in order to create the moderation model”.  Here, Revesz et al. discloses training a machine learning algorithm with a reference corpus that is compiled using positive and negative content examples for each category.  (¶[0090] - ¶[0092]: Figure 4: Step 404)  Positive and negative examples in a reference corpus are parsed during pre-processing to generate a sequence of n-grams in a features table, where n-grams are sorted by decreasing frequency, and stop words may be discarded.  Revesz et al., then, teaches “choosing the features to be used in a moderation model to be created . . . based on the data contents labelled as the acceptable contents and the unacceptable contents” because some features are retained from positive and negative training examples (“data contents labelled”).  Additionally, Revesz et al. discloses that training includes updating a distribution of weights to indicate an importance of certain examples during training.  (¶[0115])  However, Revesz et al. does not clearly disclose that a weighting is provided to data features during training, only that a weighting is applied to examples during training.  Still, this modification would be to some degree obvious because the features are obtained by parsing the training examples, so that any weighting applied to training examples would be inherited by the features remaining after stop word removal.
Concerning independent claims 1, 26, and 28, even if these limitations of “defining a weighting of data features” and training parameters of a machine learning model “based on the weighted data features” are omitted by Revesz et al., they are taught by Tetreault et al.  Generally, Tetreault et al. teaches natural language processing to provide feedback to a user regarding a user’s message before the user sends the message.  (Abstract)  A model may be trained using one or more features 116 to detect a condition using features 116 of a message.  Machine learning may be performed by a machine learning algorithm on a number of training examples to generate a model used to detect a condition.  A hate speech detection model may be used to make a prediction of whether or not a message contains hateful, abusive, etc., content, or content that may be perceived to be hateful, abusive, etc.  A hate speech  etc.  (¶[0035] - ¶[0036]: Figure 1)  Tetreault et al., then, is similar to “a moderator tool” using “a moderation model” because it predicts whether a message should be ‘moderated’ by providing feedback to the user.  A machine learner can assign weights to features in a feature set from analysis of training examples.  Features in the feature set that are assigned higher weights, relative to other features in the feature set are considered more predictive of a label than features with lower weights.  (¶[0069]: Figure 6)  A hate, or abusive language, linter may be used to identify abusive or hateful language, and to flag a user’s message as abusive, hateful, etc., before the user sends the message.  (¶[0076])  A machine learning algorithm can use n-gram features for training example 702 to train a model.  The model can be trained using a number of examples, each of which is represented using features and its respective label.  The machine learning algorithm generates weights for each of the features in the feature set using the training examples.  
Features with higher weights are considered more predictive of a label relative to features having lower weights.  (¶[0079]: Figure 8)  Tetreault et al., then, teaches “choosing the features to be used in the moderation model to be created and defining a weighting of data features, the choosing and defining based on the data contents labelled as the acceptable contents or the unacceptable contents” and “training parameters of a machine learning model based on the weighted data features in order to create the moderation model”.  An objective is to provide feedback to a user regarding an error or condition detected in a user’s message before the user sends the message.  (Abstract)  It would have been obvious to one having ordinary skill in the art to define a weighting of data features and train parameters of a machine learning model based on the weighted data features as taught by Tetreault et al. to moderate user-generated content in Revesz et al. for a purpose of providing feedback to a user before the user sends a message.
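As a rough illustration of how a machine learner can assign weights to features from labeled training examples, the following sketch uses a simple perceptron. The perceptron is swapped in here for brevity and is not asserted to be the algorithm of Tetreault et al.; the training sentences, the label encoding, and the function names are assumptions made for the example.

```python
from collections import defaultdict

def features(text):
    """Unigram features; a stand-in for the n-gram features discussed above."""
    return text.lower().split()

def train_weights(examples, epochs=10):
    """Perceptron-style training; labels are +1 (unacceptable) or -1 (acceptable)."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for text, label in examples:
            score = sum(weights[f] for f in features(text))
            predicted = 1 if score > 0 else -1
            if predicted != label:
                # Shift the weight of every feature present toward the correct label.
                for f in features(text):
                    weights[f] += label
    return dict(weights)

def classify(weights, text):
    """Label new content from the sum of its learned feature weights."""
    score = sum(weights.get(f, 0.0) for f in features(text))
    return "unacceptable" if score > 0 else "acceptable"

# Hypothetical labeled training data.
training = [
    ("you are an idiot", 1),
    ("you are kind", -1),
    ("what an idiot", 1),
    ("what a kind person", -1),
]
weights = train_weights(training)
```

In this sketch, features that end with higher weights (for example "idiot") are more predictive of the unacceptable label than features whose weights stay at zero (for example "you").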

Concerning claim 4, Revesz et al. discloses that features can be generated by n-grams, which are a sub-sequence of consecutive textual items from a particular textual sequence (“wherein the features consists of one or more of . . . text items, . . . , n-grams”).  (¶[0037] - ¶[0038])  Additionally, these n-grams can be unigrams, which are individual “words”, and implicitly, words at least comprise “characters” and “character strings”, where n-grams are “words combinations” and “phrases”.
Concerning claim 5, Revesz et al. discloses that user-generated content comprises textual items, which may include unigrams, bigrams, and trigrams (“different types of data including at least one of text or metadata”).  (¶[0037])  
Concerning claim 6, Revesz et al. discloses a data structure (“a data format”) that includes table entries including one or more columns, and suitability information.  (¶[0074] - ¶[0076]: Figure 2)  Here, Figure 2 includes “separate fields” defined by the columns of the data structure representing different categories (‘abusive’, ‘racist’, ‘sexist’) (“for different types of data content”), where each category is a ‘label’. 
Concerning claim 9, Revesz et al. discloses that features table 650 has a column 654 for unique IDs associated with n-grams, and a column 656 for n-gram frequencies.  (¶[0111]: Figure 6B); here, a feature frequency is a “feature distribution”; one or more stop words may be discarded from the sorted features table; stop words are commonly 
Concerning claim 13, Revesz et al. discloses that a reference corpus can be enriched with more examples to improve the accuracy of the trained system (“sending additional training data to be used for updating the moderation model”) (¶[0099]: Figure 4: Step 404).
Concerning claim 15, Revesz et al. discloses moderating user-generated content in an online environment (Abstract); moderation of user-generated content is performed before publication of the content on a web page (¶[0029]); a machine learning system may be used on new user-generated content for a blog (¶[0097]); computing device 1600 may include a network interface 1612 for the Internet (“an interface that communicates”) (¶[0191]: Figure 16); clients 1706 and 1708 may communicate with servers 1702 and 1704 over the Internet (“that communicates with client devices and using the service”) (¶[0193]: Figure 17); generally, content moderation in an online environment for web pages is “a web service”.
Concerning claims 16 to 19, Revesz et al. discloses that a network environment 1700 may include one or more servers 1702 and 1704 coupled to one or more clients 
Concerning claim 21, Revesz et al. discloses moderation of user-generated content that can include blogs and textual content of a web page (“wherein the data contents to be moderated are user-generated content including at least one of: blogs”).  (¶[0070] and ¶[0097])  
Concerning claim 22, Revesz et al. discloses that moderation of user-generated content is performed before publication of the content on a web page (“earlier published by the client”).  (¶[0029])  A reference corpus may be compiled for each category to contain verified positive and negative content examples, where the verification may be performed by a human reviewer (“wherein training data is based on human-generated moderated data”).  (¶[0092]: Figure 4: Step 404)  Moderation of user-generated content can include blogs and textual content of a web page.  (¶[0070] and ¶[0097])
Concerning claim 23, Revesz et al. discloses that different folds are identified in the reference data for use in training and testing the machine learning system (¶[0118]: Figure 7: Step 704); the trained machine learning system is tested on test vectors (¶[0123]: Figure 7: Step 714); for each parameter value, an accuracy of the machine learning system is determined; embodiments assess the change in accuracy over different values for each parameter, and parameter values are selected to maximize the accuracy of the machine learning system; this results in the generation of a set of parameter values, each parameter value corresponding to a different parameter, for which the accuracy of the machine learning system is maximized (¶[0128] - ¶[0129]: Figure 7: Step 728); here, determining an accuracy of the machine learning system with testing data is equivalent to “checks the performance of the updated moderation model for a test set separated from the additional training data”; when an accuracy is maximized for a machine learning system this can “assure that it works better than the foregoing moderation model”.
Concerning claim 24, Revesz et al. discloses automatically determining a probability value indicating that the user-generated content is either a positive example or a negative example of one or more unsuitable categories to a predefined degree of certainty (Abstract; ¶[0031]); a probability value is used as an indication that the content is an example of an unsuitable category, where a higher likelihood value is that the content is unsuitable for publication (¶[0050]); embodiments aggregate the unsuitability information for the comments of the user to generate an indication of how suitable or 
Concerning claim 25, Revesz et al. discloses that different threshold values can be set for different unsuitable categories; threshold values may be lower for a more inflammatory category of ‘racist’, and the threshold values may be higher for a more general category of ‘abusive’ (¶[0054]); here, if a threshold value is set higher or lower for a category, then this is equivalent to “applying a strictness value on the moderation request”; that is, a lower threshold value would have a greater ‘strictness’ as compared to a higher threshold value. 
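The per-category thresholds discussed for claim 25 can be pictured with the short sketch below. The numeric threshold values and the function name are hypothetical and chosen only for illustration; only the category names come from the discussion above.

```python
# Hypothetical threshold values: a lower threshold is the stricter setting.
THRESHOLDS = {"racist": 0.5, "abusive": 0.8}

def moderate(probabilities, thresholds=THRESHOLDS):
    """Flag each category whose probability meets that category's threshold."""
    flagged = [c for c, p in probabilities.items() if p >= thresholds[c]]
    return {"acceptable": not flagged, "flagged": sorted(flagged)}

result = moderate({"racist": 0.6, "abusive": 0.6})
```

With these assumed values, the same probability of 0.6 is flagged for the category with the lower threshold but not for the category with the higher threshold.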
Concerning claim 27, Revesz et al. discloses clients 1706 and 1708 may train and test a machine learning system, and submit the trained machine learning system to servers 1702 and 1704 for using the trained machine learning system.  (¶[0194]: Figure 17)  That is, Revesz et al. uses clients that provide their own training data so that a machine learning system (“a moderation model”) “is specific for a given type of training data to serve each client device individually by using a specific moderation model.”  

Claim 2 is rejected under 35 U.S.C. 103 as being unpatentable over Revesz et al. (U.S. Patent Publication 2015/0154289) in view of Tetreault et al. (U.S. Patent Publication 2017/0257329) as applied to claim 1 above, and further in view of Lim et al. (U.S. Patent Publication 2011/0078187).
Revesz et al. does not expressly disclose that a moderator tool differentiates the acceptable and unacceptable contents “by defining a boundary in the feature space” that separates the labeled acceptable contents and the unacceptable contents.  Still, this could be understood by one skilled in the art to be implicit for Revesz et al.  That is, Revesz et al. discloses that a trained machine learning system categorizes user content into suitable and unsuitable categories with a predefined degree of certainty in a feature space.  Implicitly, this machine learning system operates in a high dimensional feature space, where features in the feature space are clustered according to the categories, and the clusters define ‘boundaries’ between categories.
However, Lim et al. teaches training a classifier based on examples and extracted features.  (Abstract)  A user is presented with records within tuples, and a user can mark positive and/or negative examples to indicate tuples that satisfy and/or do not satisfy a desired query.  These examples are used to populate the training data.  (¶[0026] - ¶[0027])  Active learning logic can select records of tuples that are deemed important to characterizing the query as optimized examples, and the user may be prompted to label the optimized examples in the learning process as satisfying or not satisfying the query.  The learning process begins by developing the classifying feature vectors.  (¶[0034] - ¶[0035])  Training data should include both positive and negative examples.  (¶[0028])  Specifically, Lim et al. teaches that support vector machines (SVMs) may be used as a base learner in creating the classifier, where SVMs are a class of supervised learning algorithms that learn a linear decision boundary to discriminate between two classes of positive and negative examples in the training data.  The result is a linear classification rule that can be used to classify new test examples.  Lim et al., then, teaches “defining a boundary in a feature space that separates the labeled acceptable contents and unacceptable contents” as illustrated in Figure 5.  An objective is to train a classifier with a support vector machine using optimized examples.  (¶[0040])  It would have been obvious to one having ordinary skill in the art to define a boundary in a feature space that separates labeled acceptable and unacceptable contents as taught by Lim et al. to moderate user-generated content of Revesz et al. for a purpose of using a support vector machine to train a machine learning model with optimized examples.
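To make the notion of a linear decision boundary concrete, the sketch below fits a linear SVM by sub-gradient descent on the regularized hinge loss, which is one common way to train such a classifier. It is not taken from Lim et al.; the toy two-dimensional feature vectors, the label encoding, the hyperparameter values, and the function names are all assumptions made for the example.

```python
def train_linear_svm(points, labels, lr=0.1, lam=0.01, epochs=20):
    """Fit w and b by sub-gradient descent on the L2-regularized hinge loss."""
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # The regularization term shrinks w on every step.
            w = [wi * (1 - lr * lam) for wi in w]
            if margin < 1:
                # The point lies inside the margin: move the boundary away from it.
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

def side(w, b, x):
    """Report which side of the boundary w.x + b = 0 the point x falls on."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

# Toy feature vectors; +1 = unacceptable, -1 = acceptable (assumed encoding).
points = [(2, 1), (1, 2), (-2, -1), (-1, -2)]
labels = [1, 1, -1, -1]
w, b = train_linear_svm(points, labels)
```

The learned pair (w, b) defines a hyperplane in the feature space, and new content is classified by the side of that hyperplane on which its feature vector falls.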

Claims 7 to 8 are rejected under 35 U.S.C. 103 as being unpatentable over Revesz et al. (U.S. Patent Publication 2015/0154289) in view of Tetreault et al. (U.S. Patent Publication 2017/0257329) as applied to claim 1 above, and further in view of Davi et al. (U.S. Patent Publication 2011/0078242).
Concerning claim 7, Revesz et al. does not expressly disclose using metadata for moderation.  Still, Figure 2 illustrates a data table that includes various data that could be construed as ‘metadata’, e.g., user ID, warned, and flagged.  Anyway, Davi et al. a and an action field 64b.  Additionally, media content metadata 62 can include a unique identifier, specification of media type, a title assigned to the media content, and information about the registered user.  (¶[0037] - ¶[0040])  An objective is to provide automatic moderation that has an advantage of being scalable when a moderator can be overwhelmed by an amount of uploaded content.  (¶[0002])  It would have been obvious to one having ordinary skill in the art that a data table of Revesz et al. includes metadata as taught by Davi et al. for a purpose of providing automatic moderation that is scalable and does not produce data that overwhelms a moderator.
Concerning claim 8, Revesz et al. discloses that a features table is updated for new examples in a reference corpus (“processing content data for defining additional features”).  (¶[0102]: Figure 5: Step 504)

Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Revesz et al. (U.S. Patent Publication 2015/0154289) in view of Tetreault et al. (U.S. Patent Publication 2017/0257329) as applied to claim 1 above, and further in view of Mylonakis et al. (U.S. Patent Publication 2014/0200878).
Concerning claim 10, Revesz et al. discloses dividing training data into a training set and test set.  (¶[0118] - ¶[0119]: Figure 7: Steps 704 to 706)  However, Revesz et al. does not expressly disclose “a development set” for “defining some parameters for the model”.  Still, Mylonakis et al. discloses model adaptation using training examples, e.g., maximize, a scoring metric.  (¶[0054])  Feature weights are tuned on a development set for a mixture of usage and particular styles and genres that match a test domain.  (¶[0078])  The aim is to optimize the scores over all training samples in a development corpus until an optimal combination of weights is found.  (¶[0083])  It would have been obvious to one having ordinary skill in the art to use a development set for defining parameters for a model as taught by Mylonakis et al. to moderate user-generated content of Revesz et al. for a purpose of optimizing a model to match a mixture of styles and genres of a test set. 
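The role of a development set, as distinct from a training set and a test set, can be sketched as follows. This is an illustrative assumption and not the procedure of Mylonakis et al.: the scored examples are hypothetical, the tuned parameter is a single decision threshold, and fitting a model on the training portion is omitted.

```python
def split(data, n_train, n_dev):
    """Split a list, in order, into training, development, and test portions."""
    return data[:n_train], data[n_train:n_train + n_dev], data[n_train + n_dev:]

def accuracy(threshold, scored):
    """Fraction of (score, is_unacceptable) pairs classified correctly."""
    hits = sum((score >= threshold) == label for score, label in scored)
    return hits / len(scored)

def tune_threshold(dev, candidates):
    """Pick the candidate parameter value with the highest development-set accuracy."""
    return max(candidates, key=lambda t: accuracy(t, dev))

# Hypothetical (model score, is_unacceptable) pairs.
data = [(0.9, True), (0.2, False), (0.7, True), (0.4, False),
        (0.6, True), (0.3, False), (0.8, True), (0.1, False),
        (0.55, True), (0.45, False)]
train, dev, test = split(data, n_train=4, n_dev=4)
best = tune_threshold(dev, [0.2, 0.5, 0.9])
test_accuracy = accuracy(best, test)
```

The parameter is chosen using only the development portion, so the test portion remains unseen until the final accuracy is measured.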

Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over Revesz et al. (U.S. Patent Publication 2015/0154289) in view of Tetreault et al. (U.S. Patent Publication 2017/0257329) as applied to claim 1 above, and further in view of Miura et al. (U.S. Patent Publication 2015/0254228).
Revesz et al. does not expressly disclose a moderation model that performs “language-specific processing”.  However, Revesz et al. would implicitly produce a moderation model that is specific to a given language, e.g., English.  Anyway, Miura et al. teaches using machine learning to estimate a topic of a document, where an information processing unit handles English as a first language and Japanese as a second language.  (¶[0021])  A controller executes a multilingual document classifying program, which obtains text from first-language field A and second-language field A, and obtains first-language-and-second-language word-sense information.  (¶[0024] - ¶[0027])  An objective is to execute processing for classifying multilingual documents.  It would have been obvious to one having ordinary skill in the art to perform language-specific processing as taught by Miura et al. to moderate user-generated content of Revesz et al. for a purpose of classifying multilingual documents.

Claim 20 is rejected under 35 U.S.C. 103 as being unpatentable over Revesz et al. (U.S. Patent Publication 2015/0154289) in view of Tetreault et al. (U.S. Patent Publication 2017/0257329) as applied to claim 1 above, and further in view of Srinivasan et al. (U.S. Patent Publication 2015/0254228).
Revesz et al. does not expressly disclose that a client moderation tool has an application programming interface (API), but this is a common software component that could be construed as inherent.  That is, a web interface may be construed as an API, and a web interface appears to be disclosed by Revesz et al.  Anyway, Srinivasan et al. teaches automatic moderation of online content, where a web page graphical user interface 310 can generate new content.  (¶[0023]: Figure 3)  This web page graphical user interface can be construed as an API.  An objective is to reduce a difficulty for human moderators to process the sheer volume of information in a timely manner due to the popularity of social networking.  (¶[0002])  It would have been obvious to one having ordinary skill in the art to include an application programming interface as taught by Srinivasan et al. to moderate user-generated content of Revesz et al. for a purpose of helping human moderators to process large volumes of information.

Claim 29 is rejected under 35 U.S.C. 103 as being unpatentable over Revesz et al. (U.S. Patent Publication 2015/0154289) in view of Tetreault et al. (U.S. Patent Publication 2017/0257329) as applied to claims 1 and 9 above, and further in view of Jaiswal (U.S. Patent Publication 2012/0303559).
Tetreault et al. teaches that a machine learner assigns weights to features in a feature set from its analysis of training examples, so that features in the feature set that are assigned higher weights relative to other features are considered to be more predictive of a label than features with lower weights.  (¶[0069])  Tetreault et al., then, teaches “weighting text items”, but omits that this weighting is “based on at least one of tf.idf or entropy.”  Still, it is known in the prior art at least to use term frequency/inverse document frequency (“tf.idf”) to determine important words in a document.  Generally, Jaiswal teaches generating machine learning classifiers for detecting specific categories of sensitive information.  (Abstract)  Machine learning classifiers are generated by obtaining training data for each specific category of sensitive information, e.g., training data sets 122(1)-(n), each of which includes a plurality of positive and a plurality of negative examples of a specific category of sensitive information to be protected.  (¶[0034]: Figure 1)  Specifically, Jaiswal teaches (1) extracting a feature set from the training data set that includes statistically significant features of the positive examples within the training data set and statistically significant features of the negative examples within the training data set, and then (2) using the feature set to build a machine learning-based classifier model that is capable of indicating whether or not new items of data contain information that falls within the specific category of sensitive information associated with the training data set.  Examples of features include a word, e.g., ‘proprietary’, a pair of words, e.g., ‘stock market’, and a phrase, e.g., ‘please do not distribute’.  Features may be extracted from a training data set in a variety of ways.  
A e.g., words, within both the positive and negative examples within the training data set, (2) rank these positive features and negative features based on the frequency of occurrence, and (3) select the highest ranked features for inclusion within a feature set.  The weight associated with each feature may be the frequency of occurrence of the specific feature.  Training module 106 may filter out commonly used words during this process, including ‘the’, ‘it’, ‘and’, ‘or’, etc.  Training module 106 may use a term frequency-inverse document frequency (TF-IDF) algorithm to select and/or weight features within the feature set (“wherein weighting text items includes weighting text items based on at least one of tf.idf . . .”), or may use a feature-extraction and/or feature-weighting algorithm of segment-set term frequency – inverse document frequency (STF-IDF).  After training module 106 has generated a feature set for a particular training data set, training module 106 may generate a machine learning-based classification model based on the feature set.  (¶[0053] - ¶[0057]: Figure 3)  Jaiswal, then, teaches weighting text items based on term frequency – inverse document frequency to select features for training in machine learning.  An objective is to more accurately detect and protect sensitive data using machine-learning techniques to identify sensitive data that is similar to but not exactly the same as known examples of sensitive data.  (¶[0003])  It would have been obvious to one having ordinary skill in the art to determine weighting of text items based on term frequency – inverse document frequency as taught by Jaiswal to moderate user-generated content of Revesz et al. for a purpose of detecting data using machine-learning that is similar to but not exactly the same as known examples.
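For reference, term frequency-inverse document frequency weighting can be sketched as below. This uses one common textbook variant, tf multiplied by log(N / df); it is not asserted to be the exact formula used by Jaiswal, and the function name and the second example document are assumptions made for the example.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for terms in tokenized for term in set(terms))
    weights = []
    for terms in tokenized:
        tf = Counter(terms)
        weights.append({t: (c / len(terms)) * math.log(n_docs / df[t])
                        for t, c in tf.items()})
    return weights

docs = ["please do not distribute", "please call the stock market desk"]
weights = tf_idf(docs)
```

Under this variant, a term that occurs in every document (here "please") receives a weight of zero, while a term confined to one document (here "distribute") receives a positive weight.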

Response to Arguments
Applicants’ arguments filed 01 July 2021 have been fully considered but they are not persuasive.
Applicants’ amendments overcome the rejection of claim 9 for indefiniteness under 35 U.S.C. §112(b).
However, some additional informalities are noted for claims 4, 21, and 20.
Applicants do not amend the independent claims, but present arguments traversing the prior rejection of these independent claims as being obvious under 35 U.S.C. §103 over Revesz et al. (U.S. Patent Publication 2015/0154289) in view of Tetreault et al. (U.S. Patent Publication 2017/0257329).  Applicants amend dependent claim 9, and add new claim 29, where new claim 29 includes some limitations deleted from claim 9.  Generally, Applicants’ argument against the obviousness of the independent claims is that neither Revesz et al. nor Tetreault et al. discloses or teaches the limitations of “defining a weighting of data features” and “training parameters of a machine learning model based on the weighted data features”.  Applicants state that these limitations are not disclosed by Revesz et al., which only uses all n-grams minus any predefined stop words in training, but does not do so with any sort of weighting.  Then Applicants argue that these limitations are not taught by Tetreault et al., citing ¶[0079].  Applicants characterize Tetreault et al., ¶[0079], as teaching a machine 
Generally, Applicants’ arguments are not persuasive, but new grounds of rejection are necessitated as directed to new dependent claim 29.  Here, dependent claim 29 is directed to a new limitation of weighting text items “based on at least one of tf.idf or entropy”, which limitation was not encompassed by claim 9, where choosing features to be used only “includes at least one of” a plurality of alternative limitations, e.g., “not selecting text items which are too frequent”.  Dependent claim 29 is now rejected as being obvious over Jaiswal (U.S. Patent Publication 2012/0303559).
Applicants’ arguments are not persuasive given what is taught by Tetreault et al.  It is maintained that Applicants’ argument, when carefully considered, is not technically sound, and that these limitations of using weighted words to train a machine learning model are reasonably taught by Tetreault et al.  The examiner agrees with Applicants’ argument that the limitations directed to selecting and weighting features are not clearly disclosed by Revesz et al.  The rejection relies mainly on Tetreault et al. for these limitations.  Still, there is some similarity between the claim language and what is being done by Revesz et al.  Removing stop words can be broadly construed as “choosing the features” when the remaining words of the training examples are the features.  Weighting of training examples is similar to providing “weighted data features in order to create the moderation model”.  Revesz et al., at ¶[0115], ¶[0160], and ¶[0165], discloses this weighting of training examples.
Tetreault et al. clearly teaches “defining a weighting of data features” and “training parameters of a machine learning model based on the weighted data features”.  One problem with Applicants’ argument is that it only considers ¶[0079] outside of context in Tetreault et al.  Here, Tetreault et al., ¶[0054], states:
The model can be trained by a machine learning algorithm using training data, which comprises a number of training examples represented as features, such as NLP features 116. Once trained, a model can received input, e.g., feature set input, and provide output that can be used by the linter to determine whether a condition exists that warrants feedback to the user. The feature set input can include some of the NLP features 116. The same feature set used to represent each training example can be used to represent the input provided to a linter's model to generate the output used by the linter.  (emphasis added)

Then Tetreault et al., ¶[0056], states:
At step 502, a model for a given linter is trained using a training data set. The model can be validated using test data. By way of a non-limiting example, a model is trained using a machine learning algorithm to learn which features in the model's feature set are the most predictive, least predictive, etc. of a certain label.  (emphasis added)

Tetreault et al., ¶[0069] - ¶[0070], continues to describe how weights are assigned to features in machine learning so that words that are more predictive of a classification are given higher weights, e.g., ‘dear’ is given a higher weight than ‘deer’ as a word to determine sentiment and formality:
[0069] The machine learner can assign weights to the features in the feature set from its analysis of the training examples. In the case of supervised learning, the machine learning algorithm uses each example's respective label as well as the example's features. Features in the feature set that are assigned higher weights, relative to other features in the feature set, by the machine learning algorithm are considered to be more predictive of a label than features with lower weights. 
[0070] FIG. 6 provides an example of features and corresponding weights for the deer and dear confused word example in accordance with one or more embodiments of the present disclosure. In the example shown in FIG. 6, features with negative weights are more predictive of the deer label. In the example, the the ate feature, which has the highest negative weight, e.g., -0.5, is the most predictive of the deer label, relative to the other features shown in the example. Conversely, features with positive weights are more predictive of the dear label. In the example, the friend feature, which has the highest positive weight, e.g., 0.6, is the most predictive of the dear label, relative to the other features shown in the example. 
Applicants cite ¶[0079] of Tetreault et al., but this does not actually support their argument:
[0079] As discussed in connection with step 502 of FIG. 5, a machine learning algorithm can use the n-gram features for the training example 702, in addition to other training examples, to train a model. The model can be trained using a number of examples, each of which is represented using features and its respective label. The machine learning algorithm generates weights for each of the features in the feature set using the training examples. By way of a non-limiting example, features with higher weights are considered more predictive of a label relative to features having lower weights. 
	Specifically, it is maintained that Tetreault et al. does actually say that weighted features are used to train a model by machine learning.  It is clear from ¶[0052] and ¶[0054] that Tetreault et al. is training a model using training examples represented as features to learn which features are most predictive of a given classification.  Then ¶[0069] of Tetreault et al. describes assigning weights to these features by supervised learning.  Specifically, ¶[0079] of Tetreault et al. does not exclude using weighted features to train a machine learning algorithm, but actually teaches it.  That is, there can be a number of subsidiary steps in training the machine learning model, where a first step can be using machine learning to determine weights of features for significant words, and then a second step can be training a classifier model by machine learning using the weighted features, and this is what is taught by Tetreault et al.  Technically, it would not make sense to apply weighted features only to input text that is to be classified, and not to use the weighted features to train the machine learning model, because then the machine learning model would not know how to use the weighted features to classify the input text.  The whole point of machine learning is to train a model to determine how to classify the input text, so that a model is trained by machine learning to detect which words are going to be most significant.  Applicants’ argument that Tetreault et al. fails to teach “training parameters of a machine learning model based on the weighted data features”, then, is not persuasive, both from a technical standpoint and in view of what is actually described in that reference.  The examiner agrees that Revesz et al. does not clearly disclose these limitations, but this reference is doing something similar if one were to assume that weighted training examples encompass, are equivalent to, or are parsed to obtain, weighted features.  
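	The two-step reading set out above can be illustrated with a minimal sketch.  It is not code from Tetreault et al. or any other cited reference; it assumes a simple logistic-regression classifier trained by stochastic gradient descent, with pre-assigned feature weights scaling each feature's input value.  The names `train_logistic` and `predict`, the toy ‘deer’/‘dear’ examples, and all numeric values are hypothetical.

```python
import math


def train_logistic(examples, labels, feature_weights, epochs=200, lr=0.5):
    """Train model parameters on pre-weighted features.

    examples: list of sets of feature names (e.g., context words).
    labels: 1 for the 'dear' label, 0 for the 'deer' label (assumed encoding).
    feature_weights: pre-assigned weight per feature (step one), used here
        as the input value of that feature during training (step two).
    """
    params = {f: 0.0 for f in feature_weights}
    bias = 0.0
    for _ in range(epochs):
        for feats, y in zip(examples, labels):
            x = {f: feature_weights[f] for f in feats if f in feature_weights}
            z = bias + sum(params[f] * v for f, v in x.items())
            p = 1.0 / (1.0 + math.exp(-z))
            err = y - p  # gradient of the log-likelihood with respect to z
            bias += lr * err
            for f, v in x.items():
                params[f] += lr * err * v
    return params, bias


def predict(feats, params, bias, feature_weights):
    """Return the model's probability of the 'dear' label for a feature set."""
    z = bias + sum(params[f] * feature_weights[f] for f in feats if f in params)
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical toy data: context words around the confused word.
EXAMPLES = [{"my", "friend"}, {"friend", "hello"}, {"the", "ate"}, {"ate", "grass"}]
LABELS = [1, 1, 0, 0]
FEATURE_WEIGHTS = {"friend": 1.0, "ate": 1.0, "hello": 0.5,
                   "grass": 0.5, "my": 0.3, "the": 0.3}
```

	In this sketch, training on `EXAMPLES` yields a positive learned parameter for `friend` and a negative one for `ate`, which parallels the signed weights described in ¶[0070] of Tetreault et al.; the point is only that the weighted features participate in training the parameters, not merely in classifying new input.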
	Applicants might also consider in this context what is taught by Jaiswal.  Currently, Jaiswal is only cited against new dependent claim 29, but would be independently relevant if the limitations of this dependent claim were incorporated into the independent claims.  Specifically, Jaiswal, ¶[0053] - ¶[0057], teaches extracting features from positive and negative training data, where a weight may be associated with each of the features in order to indicate the relative importance of that feature relative to other features.  At ¶[0057], Jaiswal expressly teaches that these weighted features are used as a feature set to train a machine learning model.  Conceivably, then, Jaiswal could be substituted for Tetreault et al. in a rejection of the independent claims.  Although Jaiswal is directed to detecting sensitive information instead of hateful or abusive message content as taught by Tetreault et al., one skilled in the art could still understand that categories of sensitive information could include ‘inflammatory’ or ‘abusive’ categories of Revesz et al.
	Applicants’ arguments are not persuasive.  The rejection of the independent claims is maintained to be proper.  New grounds of rejection are necessitated for a new dependent claim.  Accordingly, this rejection is properly FINAL.


Conclusion
The prior art made of record and not relied upon is considered pertinent to Applicants’ disclosure.
Cardie et al. and Scholtes disclose related prior art.
Applicants’ amendment necessitated the new grounds of rejection presented in this Office Action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP §706.07(a).  Applicants are reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MARTIN LERNER whose telephone number is (571) 272-7608.  The examiner can normally be reached on Monday-Thursday 8:30 AM-6:00 PM.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel Washburn can be reached on (571) 272-5551.  The fax phone 
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair.  Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/MARTIN LERNER/Primary Examiner
Art Unit 2657     
July 13, 2021