DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 4 to 6, 9, 13, 15 to 19, 21 to 29, and 31 are rejected under 35 U.S.C. 103 as being unpatentable over Revesz et al. (U.S. Patent Publication 2015/0154289) in view of Jaiswal (U.S. Patent Publication 2012/0303558).
Concerning independent claims 1, 26, and 28, Revesz et al. discloses a method, system, and computer-program product for categorizing and moderating user-generated content in an online environment, comprising:
“a) providing a set of data contents as training data, the data contents being labeled as acceptable or unacceptable contents” – a reference corpus or reference database refers to a collection of textual examples that, for a particular category of content, are classified (“labeled”) as either positive or negative examples of that particular category; a reference database may contain a collection of positive or negative textual examples for a single category or for a plurality of textual categories (¶[0036]); a reference corpus may be compiled for each category to contain verified positive and negative content examples in each category; verification may be performed by a human reviewer (¶[0092]: Figure 4: Step 404); a machine learning algorithm may be trained and tested using examples in a reference corpus (¶[0112]); here, textual examples (“a set of data contents”) are used as “training data” for training a machine learning algorithm, where positive examples are “unacceptable content” and negative examples are “acceptable content” for each category as verified by a human reviewer;
“b) the moderator tool receiving said training data” – a machine learning algorithm may be trained and tested using examples in a reference corpus (¶[0112]); different folds are identified in the reference corpus for use in training and testing the machine learning system; training examples present in the selected fold are used to train the machine learning system (¶[0118] - ¶[0119]: Figure 7: Steps 704 and 706); here, a machine learning system that is trained for moderating user-generated content is “the moderator tool”;
“c) the moderator tool executing a first algorithm that identifies features that exist in the training data and extracts them, and ending up with a feature space” – the term ‘feature’ refers to a sub-sequence of consecutive textual terms from a particular textual sequence (¶[0038]); the term ‘vector’ refers to a representation of a particular textual content as a vector in hyperspace (“a feature space”); the hyperspace may be a multi-dimensional space to which a text categorization program is mapped in order to facilitate a machine learning algorithm; the term ‘training vector’ refers to a vector that is used in training a machine learning system (¶[0040] - ¶[0041]); a features table is generated to populate the features table with n-grams and n-gram frequencies for the reference corpus (¶[0102]: Figure 5); the machine learning system accepts as training input a set of training vectors generated from the training examples (¶[0120]: Figure 7: Step 708); embodiments may select a training example of the selected category from a reference corpus; the example may be a positive example or a negative example (¶[0135]: Figure 10: Step 1002); generation of training vectors includes parsing a training example to generate a sequence of n-grams and looking up a unique identifier for each n-gram in a features table; embodiments create a training vector associated with the selected example based on the unique identifiers of the n-grams (¶[0136] - ¶[0140]: Figure 10: Steps 1004 to 1010); 
“d) the moderator tool executing a second algorithm in the feature space for choosing the features in a moderation model to be created and defining [a weighting of] data features, the choosing and defining based on the data contents labelled as the acceptable contents and the unacceptable contents” – text of each positive and negative example in the reference corpus is parsed to generate a sequence of n-grams and the frequency with which each n-gram appears in each example; a features table is generated or indexed to populate the features table with the n-grams and associated n-gram frequencies for the examples in the reference corpus; the n-gram entries in the features table are sorted by decreasing n-gram frequency; one or more stop words may be discarded from the sorted features table (“choosing the features”); stop words are commonly used terms and tend to appear at the top of the sorted features table due to their relatively high frequency; certain non-word textual features that are predictive of categories may be retained, where examples of these textual features include capitalization, question marks, and exclamation points; the sorted features table is stored and used for generating training vectors (¶[0100] - ¶[0107]: Figure 5; Compare Claim 9, where “choosing features includes at least one of: not selecting text items which are too frequent”);
“training parameters of a machine learning model [based on the weighted data features] in order to create the moderation model” – training examples are used to train the machine learning system; the machine learning system accepts as training input a set of training vectors generated from the training examples; the machine learning system is trained on the training vectors (¶[0119] - ¶[0121]: Figure 7: Steps 708 to 710);
“e) the moderator tool receiving new data content to be moderated” – embodiments may receive textual content for categorizing web page content generated by a user using a trained machine learning system (¶[0175] - ¶[0176]: Figure 15: Step 1502);
“f) the moderator tool executing the first algorithm on the new data content for identifying the data features in the new data content to be moderated in accordance with the moderation model created” – embodiments may process the selected content to generate a vector that may be used by a trained machine learning system to determine whether the selected content is a positive example of a category by parsing the content to generate a sequence of one or more n-grams based on the selected content, looking up in a features table the unique identifier for each n-gram generated based on the selected content, and generating a combination of unique identifiers that can be used as a vector (¶[0177] - ¶[0180]: Figure 15: Steps 1502 to 1512); here, a vector is generated from selected content (“the new data content”) during use of the trained machine learning system, where this vector represents “the data features” of n-grams in the content of a features table;
“g) producing a moderated result for the new data content by indicating whether the new data content is acceptable or unacceptable in accordance with the moderation model created” – embodiments automatically determine a probability value indicating that the user-generated content is either a positive example or a negative example of one or more unsuitable categories; if the user-generated content is determined to be a positive example of any of the unsuitable categories (“whether the new data content is acceptable or unacceptable”) to a predefined degree of certainty, the content may be automatically excluded from publication in the online environment (“producing a moderated result for the new data content”) (Abstract).
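For illustration only, the features-table and vector-generation steps cited from Revesz et al. in limitations c), d), and f) above can be sketched as follows.  This is a minimal sketch under stated assumptions and is not code from the reference: the function names, the restriction to unigrams and bigrams, whitespace tokenization, and the stop-word list are all hypothetical.

```python
from collections import Counter

# Hypothetical stop-word list; the reference does not enumerate one.
STOP_WORDS = {"the", "it", "and", "or", "a", "an"}


def ngrams(text, max_n=2):
    """Return all n-grams (n = 1..max_n) of consecutive lower-cased tokens."""
    tokens = text.lower().split()
    grams = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def build_features_table(examples, max_n=2):
    """Map each retained n-gram to (unique_id, frequency)."""
    counts = Counter()
    for text in examples:
        counts.update(ngrams(text, max_n))
    # Sort by decreasing frequency; ties are broken alphabetically so the
    # assigned identifiers are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    # Discard entries whose n-gram is itself a stop word.
    kept = [(gram, freq) for gram, freq in ranked if gram not in STOP_WORDS]
    return {gram: (uid, freq) for uid, (gram, freq) in enumerate(kept)}


def to_vector(text, table, max_n=2):
    """Represent content as the identifiers of its n-grams found in the table."""
    return [table[g][0] for g in ngrams(text, max_n) if g in table]
```

Under these assumptions, the same `to_vector` routine serves both for building training vectors from corpus examples and for vectorizing new content at moderation time, which mirrors the reuse of “the first algorithm” in limitations c) and f).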
Concerning independent claim 26, Revesz et al. additionally discloses “one or more client devices with means for sending training data and data contents to the moderator tool to be moderated” and “an interface for interaction between the one or more client devices and the moderator tool” – a user may interact with computing device 1600 through a display that may use one or more user interfaces 1620 (“an interface for interaction between the one or more client devices and the moderator tool”) associated with embodiments (¶[0190]: Figure 16); a network environment 1700 may include one or more servers 1702 and 1704 coupled to one or more clients 1706 and 1708; clients 1706 and 1708 may train and test a machine learning system (“one or more client devices for sending training data and data contents to the moderator tool to be moderated”), and submit the trained machine learning system to servers 1702 and 1704 for using the trained machine learning system to moderate user-generated content; alternatively, servers 1702 and 1704 may train and test a machine learning system, and submit the trained machine learning system to clients 1706 and 1708 (¶[0196] - ¶[0197]: Figure 17).
Concerning independent claims 1, 26, and 28, Revesz et al. discloses training a machine learning algorithm with a reference corpus that is compiled using positive and negative content examples for each category.  (¶[0090] - ¶[0092]: Figure 4: Step 404) Additionally, positive and negative examples in a reference corpus are parsed during pre-processing to generate a sequence of n-grams in a features table, where n-grams are sorted by decreasing frequency, and stop words with high frequency are discarded, but certain textual features that are predictive of categories are retained for training examples.  (¶[0100] - ¶[0106]: Figure 5)  Here, retaining some features and discarding some features from n-grams of positive and negative examples according to their frequency of occurrence for training is similar to “choosing features to be used in a moderation model to be created . . . the choosing and defining based on the data contents labeled as the acceptable contents and the unacceptable contents” and “training the parameters of the machine learning model . . . to create the moderation model”.  However, Revesz et al. does not expressly disclose “weighting of data features” in the limitation of “choosing the features to be used in a moderation model to be created and defining a weighting of data features, the choosing and defining based on the data contents labeled as acceptable contents and unacceptable contents” so as to provide “training parameters of a machine learning model based on the weighted data features in order to create the moderation model”.  Moreover, Revesz et al. suggests that training includes updating a distribution of weights to indicate an importance of certain examples during training.  (¶[0115])  Here, Revesz et al. does not clearly disclose that a weighting is provided to data features during training, only that a weighting is applied to examples during training, but data features are, at least, derived from weighted training examples by parsing.
Concerning independent claims 1, 26, and 28, Jaiswal teaches any limitations that might be construed as omitted by Revesz et al., namely those directed to “choosing features to be used in a moderation model to be created and defining a weighting of data features” and “training parameters of a machine learning model based on the weighted data features in order to create the moderation model”.  Generally, Jaiswal teaches generating machine learning classifiers for detecting specific categories of sensitive information.  (Abstract)  Machine learning classifiers are generated by obtaining training data for each specific category of sensitive information, e.g., training data sets 122(1)-(n), each of which includes a plurality of positive and a plurality of negative examples of a specific category of sensitive information to be protected.  (¶[0034]: Figure 1)  Here, Jaiswal teaches (1) extracting a feature set from the training data set that includes statistically significant features of the positive examples within the training data set and statistically significant features of the negative examples within the training data set, and then (2) using the feature set to build a machine learning-based classifier model that is capable of indicating whether or not new items of data contain information that falls within the specific category of sensitive information associated with the training data set.  Examples of features include a word, e.g., ‘proprietary’, a pair of words, e.g., ‘stock market’, and a phrase, e.g., ‘please do not distribute’.  Specifically, a weight may be associated with each extracted feature in order to indicate the relative importance of that feature relative to other features.
Training module 106 may (1) determine the frequency of occurrence of various features, e.g., words, within both the positive and negative examples within the training data set, (2) rank these positive features and negative features based on the frequency of occurrence, and (3) select the highest ranked features for inclusion within a feature set.  The weight associated with each feature may be the frequency of occurrence of the specific feature.  Training module 106 may filter out commonly used words during this process, including ‘the’, ‘it’, ‘and’, ‘or’, etc.  Training module 106 may use a term frequency-inverse document frequency (TF-IDF) algorithm to select and/or weight features within the feature set, or may use a feature-extraction and/or feature-weighting algorithm of segment-set term frequency – inverse document frequency (STF-IDF).  (¶[0053] - ¶[0056]: Figure 3)  Compare Applicants’ Claim 29, which describes weighting as including tf.idf.  Jaiswal, then, clearly teaches “choosing and weighting” features and using these “weighted data features” to train a machine learning model.  An objective is to more accurately detect and protect sensitive data using machine-learning techniques to identify sensitive data that is similar to but not exactly the same as known examples of sensitive data.  (¶[0003])  It would have been obvious to one having ordinary skill in the art to train a machine learning model using weighted data features as taught by Jaiswal to moderate user-generated content between acceptable and unacceptable contents in Revesz et al. for a purpose of detecting data using machine-learning that is similar to but not exactly the same as known examples.
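For illustration only, TF-IDF weighting of features of the kind cited from Jaiswal can be sketched as follows.  Jaiswal is relied upon for naming the TF-IDF algorithm, not for any particular formula; the sketch below assumes the common textbook form, in which a term's weight in a document is its relative term frequency multiplied by log(N / df), and the function name `tf_idf` is hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document.

    Assumed textbook form: weight = (count / doc_length) * log(N / df),
    where N is the number of documents and df is the number of documents
    containing the term.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each term once per document
    weights = []
    for tokens in docs:
        term_counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in term_counts.items()
        })
    return weights
```

Under this form, a term that appears in every document receives a weight of zero, which has the same practical effect as filtering out commonly used words, while terms concentrated in few documents receive higher weights.  The resulting per-feature weights, rather than raw counts, would then be supplied as the input values for training.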

Concerning claim 4, Revesz et al. discloses that features can be generated by n-grams, which are a sub-sequence of consecutive textual items from a particular textual sequence (“wherein the features consists of one or more of . . . text items, . . . , n-grams”).  (¶[0037] - ¶[0038])  Additionally, these n-grams can be unigrams, which are individual “words”, and implicitly, words at least comprise “characters” and “character strings”, where n-grams are “words combinations” and “phrases”.
Concerning claim 5, Revesz et al. discloses that user-generated content comprises textual items, which may include unigrams, bigrams, and trigrams (“different types of data including at least one of text or metadata”).  (¶[0037])  
Concerning claim 6, Revesz et al. discloses a data structure (“a data format”) that includes table entries including one or more columns, and suitability information.  (¶[0074] - ¶[0076]: Figure 2)  Here, Figure 2 includes “separate fields” defined by the columns of the data structure representing different categories (‘abusive’, ‘racist’, ‘sexist’) (“for different types of data content”), where each category is a ‘label’. 
Concerning claim 9, Revesz et al. discloses that features table 650 has a column 654 for unique IDs associated with n-grams, and a column 656 for n-gram frequencies (¶[0111]: Figure 6B); here, a feature frequency is a “feature distribution”; one or more stop words may be discarded from the sorted features table; stop words are commonly used terms and tend to appear at the top of the sorted features table due to their relatively high frequencies; stop words may be discarded after n-grams are generated, or may be discarded before n-grams are generated (¶[0105]: Figure 5: Step 510); n-gram entries in features table 650 are sorted by decreasing n-gram frequency, and entries with stop words as n-grams are discarded from features table 650 (¶[0111]: Figure 6B); here, discarding stop words that have relatively high frequency is equivalent to “not selecting text items which are too frequent”; that is, a stop word is a text item that is too frequent.
Concerning claim 13, Revesz et al. discloses that a reference corpus can be enriched with more examples to improve the accuracy of the trained system (“sending additional training data to be used for updating the moderation model”) (¶[0099]: Figure 4: Step 404).
Concerning claim 15, Revesz et al. discloses moderating user-generated content in an online environment (Abstract); moderation of user-generated content is performed before publication of the content on a web page (¶[0029]); a machine learning system may be used on new user-generated content for a blog (¶[0097]); computing device 1600 may include a network interface 1612 for the Internet (“an interface that communicates”) (¶[0191]: Figure 16); clients 1706 and 1708 may communicate with servers 1702 and 1704 over the Internet (“that communicates with client devices and using the service”) (¶[0193]: Figure 17); generally, content moderation in an online environment for web pages is “a web service”.
Concerning claims 16 to 19, Revesz et al. discloses that a network environment 1700 may include one or more servers 1702 and 1704 coupled to one or more clients 1706 and 1708; clients 1706 and 1708 may train and test a machine learning system, and submit the trained machine learning system to servers 1702 and 1704 for using the trained machine learning system to moderate user-generated content; alternatively, servers 1702 and 1704 may train and test a machine learning system, and submit the trained machine learning system to clients 1706 and 1708 (¶[0196] - ¶[0197]: Figure 17).  Here, moderating user-generated content is disclosed with a capability of being performed at the client or at the server according to principles of distributed processing.  The steps a) and b) of receiving training data and implementing moderation can be performed “on the client device”, or “implemented as a standalone tool on a server” and “located on a separate moderation server accessed through a technical interface”, where “a technical interface” can be a network interface. 
Concerning claim 21, Revesz et al. discloses moderation of user-generated content that can include blogs and textual content of a web page (“wherein the data contents to be moderated are user-generated content including at least one of: blogs”).  (¶[0070] and ¶[0097])  
Concerning claim 22, Revesz et al. discloses that moderation of user-generated content is performed before publication of the content on a web page (“earlier published by the client”).  (¶[0029])  A reference corpus may be compiled for each category to contain verified positive and negative content examples, where the verification may be performed by a human reviewer (“wherein training data is based on human-generated moderated data”).  (¶[0092]: Figure 4: Step 404)  The moderated user-generated content can include blogs and textual content of a web page.  (¶[0070] and ¶[0097])  Implicitly, a blog is “published by a client” at a time that is, conventionally, “earlier”; logically, publication of a blog must be either “earlier” or “not earlier”.
Concerning claim 23, Revesz et al. discloses that different folds are identified in the reference data for use in training and testing the machine learning system (¶[0118]: Figure 7: Step 704); the trained machine learning system is tested on test vectors (¶[0123]: Figure 7: Step 714); for each parameter value, an accuracy is determined of the machine learning system; embodiments assess the change in accuracy over different values for each parameter, and parameter values are selected to maximize the accuracy of the machine learning system; this results in the generation of a set of parameter values, each parameter value corresponding to a different parameter, for which the accuracy of the machine learning system is maximized (¶[0128] - ¶[0129]: Figure 7: Step 728); here, determining an accuracy of the machine learning system with testing data is equivalent to “checks the performance of the updated moderation model for a test set separated from the additional training data”; when an accuracy is maximized for a machine learning system, this can “assure that it works better than the foregoing moderation model”.
Concerning claim 24, Revesz et al. discloses automatically determining a probability value indicating that the user-generated content is either a positive example or a negative example of one or more unsuitable categories to a predefined degree of certainty (Abstract; ¶[0031]); a probability value is used as an indication that the content is an example of an unsuitable category, where a higher value indicates a higher likelihood that the content is unsuitable for publication (¶[0050]); embodiments aggregate the unsuitability information for the comments of the user to generate an indication of how suitable or unsuitable the user’s comments are in each category (¶[0081]: Figure 3: Step 304); clients may moderate user-generated web page content (¶[0194] - ¶[0195]: Figure 17); here, a probability value or likelihood of certainty that content is suitable or unsuitable is equivalent to “a confidence value revealing how certain the moderator tool is about the moderation result”, where this moderation result is “sent to the client device”.
Concerning claim 25, Revesz et al. discloses that different threshold values can be set for different unsuitable categories; threshold values may be lower for a more inflammatory category of ‘racist’, and the threshold values may be higher for a more general category of ‘abusive’ (¶[0054]); here, if a threshold value is set higher or lower for a category, then this is equivalent to “applying a strictness value on the moderation request”; that is, a lower threshold value would have a greater ‘strictness’ as compared to a higher threshold value. 
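For illustration only, the per-category thresholding discussed for claims 24 and 25 can be sketched as follows.  This is a minimal sketch under stated assumptions and is not code from Revesz et al.: the threshold values, the category keys, and the function name `moderate` are hypothetical, and the classifier's probability values are taken as given.

```python
# Hypothetical thresholds: the lower value for 'racist' makes that category
# stricter, since less certainty is required before content is excluded.
THRESHOLDS = {"racist": 0.6, "abusive": 0.8}


def moderate(probabilities, thresholds=THRESHOLDS):
    """Flag each category whose probability value meets its threshold.

    probabilities: {category: probability that the content is a positive
    (unsuitable) example of that category}. Missing categories count as 0.0.
    """
    flagged = sorted(
        category
        for category, threshold in thresholds.items()
        if probabilities.get(category, 0.0) >= threshold
    )
    return {"acceptable": not flagged, "flagged": flagged}
```

Under these assumptions, the same probability value of 0.7 would exclude content in the ‘racist’ category but not in the ‘abusive’ category, which is the sense in which a lower threshold corresponds to a greater strictness.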
Concerning claim 27, Revesz et al. discloses that clients 1706 and 1708 may train and test a machine learning system, and submit the trained machine learning system to servers 1702 and 1704 for using the trained machine learning system.  (¶[0194]: Figure 17)  That is, Revesz et al. uses clients that provide their own training data so that a machine learning system (“a moderation model”) “is specific for a given type of training data to serve each client device individually by using a specific moderation model.”
Concerning claims 29 and 31, Jaiswal teaches that training module 106 may use a term frequency-inverse document frequency (TF-IDF) algorithm to select and/or weight features within the feature set (“wherein weighting text items includes weighting text items based on at least one of term frequency-inverse document frequency (tf.idf) . . .” and “wherein the moderation tool, in executing the first algorithm and the second algorithm, performs at least one of . . . weighting using tf.idf”), or feature extraction and/or feature weighting algorithms of segment-set term frequency-inverse segment-set frequency (STF-ISSF), or segment-set term frequency-inverse document frequency (STF-IDF).  (¶[0056])

Claims 7 to 8 are rejected under 35 U.S.C. 103 as being unpatentable over Revesz et al. (U.S. Patent Publication 2015/0154289) in view of Jaiswal (U.S. Patent Publication 2012/0303558) as applied to claim 1 above, and further in view of Davi et al. (U.S. Patent Publication 2011/0078242).
Concerning claim 7, Revesz et al. does not expressly disclose using metadata for moderation.  Still, Figure 2 illustrates a data table that includes various data that could be construed as ‘metadata’, e.g., user ID, warned, and flagged.  Anyway, Davi et al. teaches a similar system of automatic moderation of media content, where moderation metadata 64 describes the corresponding moderation action executed by the originating online provider, including an originating moderator field 64a and an action field 64b.  Additionally, media content metadata 62 can include a unique identifier, specification of media type, a title assigned to the media content, and information about the registered user.  (¶[0037] - ¶[0040])  An objective is to provide automatic moderation that has an advantage of being scalable when a moderator can be overwhelmed by an amount of uploaded content.  (¶[0002])  It would have been obvious to one having ordinary skill in the art that a data table of Revesz et al. includes metadata as taught by Davi et al. for a purpose of providing automatic moderation that is scalable and does not produce data that overwhelms a moderator.
Concerning claim 8, Revesz et al. discloses that a features table is updated for new examples in a reference corpus (“processing content data for defining additional features”).  (¶[0102]: Figure 5: Step 504)

Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Revesz et al. (U.S. Patent Publication 2015/0154289) in view of Jaiswal (U.S. Patent Publication 2012/0303558) as applied to claim 1 above, and further in view of Mylonakis et al. (U.S. Patent Publication 2014/0200878).
Concerning claim 10, Revesz et al. discloses dividing training data into a training set and test set.  (¶[0118] - ¶[0119]: Figure 7: Steps 704 to 706)  However, Revesz et al. does not expressly disclose “a development set” for “defining some parameters for the model”.  Still, Mylonakis et al. discloses model adaptation using training examples, where features of a model 112 are optimized on a development set 66 of text to optimize, e.g., maximize, a scoring metric.  (¶[0054])  Feature weights are tuned on a development set for a mixture of usage and particular styles and genres that match a test domain.  (¶[0078])  The aim is to optimize the scores over all training samples in a development corpus until an optimal combination of weights is found.  (¶[0083])  It would have been obvious to one having ordinary skill in the art to use a development set for defining parameters for a model as taught by Mylonakis et al. to moderate user-generated content of Revesz et al. for a purpose of optimizing a model to match a mixture of styles and genres of a test set. 

Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over Revesz et al. (U.S. Patent Publication 2015/0154289) in view of Jaiswal (U.S. Patent Publication 2012/0303558) as applied to claim 1 above, and further in view of Miura et al. (U.S. Patent Publication 2015/0254228).
Concerning claim 14, Revesz et al. does not expressly disclose a moderation model that performs “language-specific processing”.  However, Revesz et al. would implicitly produce a moderation model that is specific to a given language, e.g., English.  Anyway, Miura et al. teaches using machine learning to estimate a topic of a document, where an information processing unit handles English as a first language and Japanese as a second language.  (¶[0021])  A controller executes a multilingual document classifying program, which obtains text from first-language field A and second-language field A, and obtains first-language-and-second-language word-sense information.  (¶[0024] - ¶[0027])  An objective is to execute processing for classifying multilingual documents.  (¶[0006])  It would have been obvious to one having ordinary skill in the art to perform language-specific processing of text as taught by Miura et al. to moderate user-generated content of Revesz et al. for a purpose of classifying multilingual documents.

Claim 20 is rejected under 35 U.S.C. 103 as being unpatentable over Revesz et al. (U.S. Patent Publication 2015/0154289) in view of Jaiswal (U.S. Patent Publication 2012/0303558) as applied to claim 1 above, and further in view of Srinivasan et al. (U.S. Patent Publication 2015/0317562).
Concerning claim 20, Revesz et al. does not expressly disclose that a client moderation tool has an application programming interface (API), but this is a common software component that could be construed as inherent.  That is, a web interface may be construed as an API, and a web interface appears to be disclosed by Revesz et al.  Anyway, Srinivasan et al. teaches automatic moderation of online content, where a web page graphical user interface 310 can generate new content.  (¶[0023]: Figure 3)  This web page graphical user interface can be construed as an API.  An objective is to reduce a difficulty for human moderators to process the sheer volume of information in a timely manner due to the popularity of social networking.  (¶[0002])  It would have been obvious to one having ordinary skill in the art to include an application programming interface as taught by Srinivasan et al. to moderate user-generated content of Revesz et al. for a purpose of helping human moderators to process large volumes of information.

Claim 30 is rejected under 35 U.S.C. 103 as being unpatentable over Revesz et al. (U.S. Patent Publication 2015/0154289) in view of Jaiswal (U.S. Patent Publication 2012/0303558) as applied to claim 1 above, and further in view of Lim et al. (U.S. Patent Publication 2011/0078187).
Concerning claim 30, Revesz et al. does not expressly disclose that a moderator tool differentiates the acceptable and unacceptable contents “by defining a boundary in the feature space” that separates the labeled acceptable contents and the unacceptable contents.  Still, “defining a boundary in the feature space” is a known characteristic of support vector machines (SVMs).  Specifically, Jaiswal teaches that a machine learning-based classifier includes a map of support vectors that represent boundary features, and these boundary features may be selected from and/or represent the highest ranked features in a feature set.  (¶[0057])  Jaiswal, then, may be construed to teach “wherein the moderator tool differentiates the acceptable and unacceptable contents by defining a boundary in the feature space that separates the labeled acceptable contents and the unacceptable contents”.
Even if these limitations are omitted by Jaiswal, they are taught by Lim et al.  Generally, Lim et al. teaches training a classifier based on examples and extracted features.  (Abstract)  Active learning logic can select records of tuples that are deemed important to characterizing the query as optimized examples, and the user may be prompted to label the optimized examples in the learning process as satisfying or not satisfying the query.  The learning process begins by developing the classifying feature vectors.  (¶[0034] - ¶[0035])  Training data should include both positive and negative examples.  (¶[0028])  Specifically, Lim et al. teaches that support vector machines (SVMs) may be used as a base learner in creating the classifier, where SVMs are a class of supervised learning algorithms that learn a linear decision boundary to discriminate between two classes of positive and negative examples in the training data.  The result is a linear classification rule that can be used to classify new test examples as part of the classifier.  SVMs learn a decision boundary in a combined feature space of features.  Decision boundaries are established and refined using SVMs as a hyperplane comprising a central decision boundary with a margin positioned between the positive and negative examples in the training data.  An active learning logic iteratively selects the best example to incorporate into the training data to improve the quality of the classifier.  (¶[0040] - ¶[0041]: Figure 5)  Lim et al., then, teaches “defining a boundary in a feature space that separates the labeled acceptable contents and unacceptable contents” as illustrated in Figure 5.  An objective is to train a classifier with a support vector machine using optimized examples.  (¶[0040])  It would have been obvious to one having ordinary skill in the art to define a boundary in a feature space that separates labeled acceptable and unacceptable contents as taught by Lim et al. to moderate user-generated content of Revesz et al. for a purpose of using a support vector machine to train a machine learning model with optimized examples.
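For illustration only, the general idea of a linear decision boundary with a margin between positive and negative examples can be sketched as follows.  This is a minimal sketch and is not taken from Lim et al. or any other cited reference: the function names `train_linear_svm` and `classify`, the learning rate, the regularization constant, and the use of simple subgradient descent on a hinge loss are all assumptions chosen for brevity.

```python
def train_linear_svm(points, labels, epochs=200, lr=0.1, reg=0.01):
    """Fit (w, b) for the rule sign(w.x + b) by subgradient descent on an
    L2-regularized hinge loss. Labels must be +1 (positive) or -1 (negative).
    Hypothetical helper for illustration; not any reference's implementation."""
    dim = len(points[0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            # Signed distance-like score of this example relative to the boundary.
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # The regularization term shrinks w slightly on every step.
            w = [wi - lr * reg * wi for wi in w]
            if margin < 1:
                # Example is misclassified or inside the margin:
                # move the boundary away from it.
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def classify(w, b, x):
    """Return +1 or -1 depending on which side of the boundary x falls."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1
```

Under these assumptions, the learned pair `(w, b)` describes a hyperplane in the feature space, and a new example is classified according to the side of that hyperplane on which its feature vector falls.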



Response to Arguments
Applicants’ arguments filed 05 May 2022 have been fully considered but they are not persuasive.
Applicants amend the independent claims to delete limitations directed to “wherein the moderator tool differentiates the acceptable contents and the unacceptable contents by defining a boundary in the feature space that separates the labeled acceptable content and unacceptable contents” and “wherein the moderator tool, in executing the first algorithm and the second algorithm, performs at least one of language detection, determining sums, determining means, or determining distribution parameters, weighting using tf.idf, weighting using entropy, or normalizing document vectors”.  Applicants include these deleted limitations as new dependent claims 30 to 31.  Applicants state that these claim amendments revert to the previous state of the independent claims for purposes of appeal.  
Then Applicants present arguments directed against the rejection of these broadened independent claims as being obvious under 35 U.S.C. §103 over Revesz et al. (U.S. Patent Publication 2015/0154289) in view of Jaiswal (U.S. Patent Publication 2012/0303558).  Applicants state that they incorporate by reference the arguments of the Declaration under 37 CFR §1.132 of co-inventor Mari-Sanna Paukkeri, and the prior arguments of an amendment, filed on 18 October 2021.  Applicants note that the Office Action maintains that the Declaration under Rule 132 is directed to opinion evidence going to an ultimate issue of patentability as a legal conclusion, but allege that the Rule 132 Declaration contains no opinion on the legal conclusion at issue, i.e., obviousness.  Applicants focus on the limitations of “choosing the features to be used in a moderation model to be created and defining a weighting of data features, the choosing and defining based on the data contents labelled as the acceptable contents and the unacceptable contents, and training parameters of a machine learning model based on the weighted data features”.  Applicants argue that the terms ‘features’, ‘weighting’, and ‘parameters’ of the independent claims are separate elements as alleged in the Declaration under Rule 132.  Applicants contend that Jaiswal fails to teach weighting of features for training.  Applicants state that the Rule 132 Declaration explains that Jaiswal mentions weights only in connection with feature selection, but does not mention using weighted features for training.  Applicants argue, then, that “training parameters of a machine learning model based on the weighted data features in order to create the moderation model” is not taught by Jaiswal.  These arguments are not persuasive.
Firstly, the examiner would like to state at the outset that he believes it is unfortunate that his suggestions to amend the claims to provide allowable subject matter, or at least to advance prosecution, were not accepted by Applicants, and that they have evidently decided to go forward with an appeal.  The USPTO, however, does have certain standards for patentability, and not every claim is deserving of a patent.  A first patent is commonly the most difficult to obtain, as new applicants may not yet fully appreciate legal standards of claim interpretation and strategies for patent prosecution.  Additionally, Applicants might have been led to believe that their Declaration should be accorded more weight than is justified under provisions of the Manual of Patent Examining Procedure (MPEP), where even experienced attorneys sometimes fail to appreciate these provisions for evaluating Declarations.  Certainly, if Applicants do decide to go forward with an appeal, the Patent Trial and Appeal Board (PTAB) provides the final decision on patentability, subject to any additional appeal to the courts.  However, the examiner believes the issues here are fairly straightforward, and Applicants are unlikely to prevail, which is a shame because it sounds as though Applicants have a nice company.
Secondly, it should be noted that the Rule 132 Declaration filed on 18 October 2021 is mainly directed against a prior rejection of the independent claims as being obvious under 35 U.S.C. §103 over Revesz et al. (U.S. Patent Publication 2015/0154289) in view of Tetreault et al. (U.S. Patent Publication 2017/0257329), but the current grounds of rejection do not rely upon Tetreault et al., and are instead premised on Revesz et al. (U.S. Patent Publication 2015/0154289) and Jaiswal (U.S. Patent Publication 2012/0303558).  Mainly, Applicants’ Rule 132 Declaration is directed against Revesz et al. and Tetreault et al. in Paragraphs 7 to 22.  The prior rejection of claim 29 was based on Jaiswal, but this is only considered in Paragraphs 23 to 25 of the Rule 132 Declaration.  Applicants’ incorporation of arguments from the amendment and Rule 132 Declaration, then, mostly fails to address the new grounds of rejection.
Applicants should note, too, that their Declaration under Rule 132 appears to be untimely under MPEP §716.01, which states that, to be timely, a Declaration must be filed (1) prior to final rejection, (2) before appeal in an application not having a final rejection, (3) after final rejection but before appeal, but only upon a showing of good and sufficient reasons why it was necessary and not presented earlier, or (4) after prosecution is closed if filed with a request for continued examination (RCE).  Here, Applicants’ Declaration under Rule 132 was filed after final rejection, but did not include a showing of good and sufficient reasons why it was necessary and not presented earlier.
Thirdly, Applicants allege that the Declaration under Rule 132 contains no opinion on the ultimate legal conclusion, but the examiner maintains that this statement is false.  MPEP §716.01(c) III states that factual evidence is preferable to opinion testimony, but opinion testimony is entitled to some weight if it does not go to an ultimate legal conclusion.  Mainly, Applicants’ Declaration under Rule 132 sets forth opinion evidence on how the language of the claims should be interpreted by one skilled in the art.  Applicants’ Paragraph 7 of the Declaration states: “In my view, neither Revesz nor Tetreault discloses automatically choosing/defining weighted features, based on training data . . . .”  Similarly, Paragraph 11 of the Declaration states: “In my view, Revesz also does not disclose the claim 1 language of ‘defining a weighting of data features . . . based on the data contents labelled as acceptable contents and unacceptable contents.’”  Then Paragraph 12 of the Declaration states: “From my experience, I believe that persons in the field of machine learning for language processing would not characterize the weighting of training examples in Revesz to be similar to the weighting of features in independent claim 1.”  Paragraph 13 states, “Thus, based on my review of Revesz, I believe that Revesz fails to disclose ‘choosing the features to be used in a moderation model to be created and defining a weighting . . . .’” and Paragraph 22 states, “Thus, based on my review, I believe that Tetreault fails to disclose ‘choosing features to be used in a moderation model to be created and defining a weighting of data features . . . .’”  Here, the point is that these statements presented in the Rule 132 Declaration are replete with conclusory words of opinion, e.g., ‘I believe’, ‘in my view’, ‘based on my review’, but do not present factual evidence.  
Granted, Applicants’ arguments in the Declaration do not actually state that an obviousness rejection is improper under 35 U.S.C. §103, but where the affiant simply alleges, without factual support, that the specific limitations cannot be construed as disclosed or taught by the two references, this is equivalent to going to an ‘ultimate legal conclusion’.  MPEP §716.01(c) III does provide that opinion evidence is given some weight, but the policy against according significant weight to opinion evidence is to exclude biased statements by an inventor, which is what is going on here.
Fourthly, Applicants argue that ‘features’, ‘weighting’, and ‘parameters’ of the independent claims are ‘separate elements’, but they do not explain why this statement is relevant.  However, Revesz et al. clearly discloses that a machine learning model has parameters.  See, e.g., Steps 702 and 726 to 728 of Figure 7 of Revesz et al.  Moreover, Jaiswal clearly teaches ‘features’ and ‘weighting’.  At ¶[0055], Jaiswal states:
[0055] The systems described herein may extract a feature set from a training data set in a variety of ways. In some examples, a weight may be associated with each extracted feature in order to indicate the relative importance of that feature relative to other features. For example, training module 106 may (1) determine the frequency of occurrence of various features (e.g., words) within both the positive and negative examples within a training data set, (2) rank these positive features and negative features based on, for example, frequency of occurrence, and then (3) select the highest ranked features for inclusion within a feature set. In this example, the weight associated with each feature may be the frequency of occurrence of the specific feature. In some examples, training module 106 may also filter out commonly used words during this process, such as "the," "it," "and," "or," etc. 

[0056] In some examples, training module 106 may use a term frequency-inverse document frequency (TF-IDF) algorithm to select, and/or weight features within, the feature set. Training module 106 may also use other feature-extraction and/or feature-weighting algorithms, such as segment-set term frequency-inverse segment-set frequency (STF-ISSF), segment-set term frequency-inverse document frequency (STF-IDF), Kullback-Leibler divergence (i.e., information gain), etc. In addition, training module 106 may perform feature extraction multiple times, each time using a different feature-extraction algorithm. The feature sets generated using the different algorithms may each be used to generate different machine learning-based classification models. In this example, the feature set with the highest quality metrics may be saved and the others may be discarded. In one embodiment, an administrator of DLP system 200 may specify the feature-selection algorithm to be used by training module 106.
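For illustration only, the kind of term weighting named in the quoted paragraphs can be sketched as follows.  The quoted text names TF-IDF but does not give a formula, so the sketch assumes one common variant (term frequency multiplied by the logarithm of the inverse document frequency).  The function name `tf_idf_weights` and the whitespace tokenization are hypothetical and are not drawn from Jaiswal.

```python
import math
from collections import Counter


def tf_idf_weights(documents):
    """Return one {term: weight} dict per document.

    Assumed variant: weight = (count / document length) * log(N / df),
    where N is the number of documents and df is the number of documents
    containing the term. Hypothetical helper for illustration only.
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights
```

Under this assumed variant, a term that appears in every document receives a weight of zero, while a term concentrated in few documents receives a larger weight, which is one way a weight can indicate the relative importance of a feature.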

Fifthly, Applicants’ only remaining argument is that Jaiswal somehow does not use these weighted features to train parameters of a machine learning model.  The examiner maintains that this argument is not reasonable because training of a machine learning model using these features is repeatedly taught by Jaiswal.  Here, ¶[0006] of Jaiswal states:
[0006] In one embodiment, using machine learning to train the machine learning-based classifier may include, for each training data set, (1) extracting a feature set from the training data set that includes statistically significant features of the positive examples within the training data set and statistically significant features of the negative examples within the training data set and then (2) building a machine learning-based classification model from the feature set that is capable of indicating whether or not items of data contain the specific category of sensitive information associated with the training data set. In some embodiments, the negative examples within a particular training data set may represent the positive examples from all other training data sets. 

Similarly, ¶[0053] of Jaiswal states:

[0053] The systems described herein may perform step 306 in a variety of ways and contexts. In one example, the systems described herein may train the machine learning-based classifier by, for each training data set obtained in step 304, (1) extracting a feature set from the training data set that includes statistically significant features of the positive examples within the training data set and statistically significant features of the negative examples within the training data set and then (2) using the feature set to build a machine learning-based classification model that is capable of indicating whether or not new items of data contain information that falls within the specific category of sensitive information associated with the training data set.

Clearly, Jaiswal teaches using statistically significant features to train a machine learning model.  Even if Jaiswal only teaches weighting of features at ¶[0055] - ¶[0056], these are the same features comprising the feature set that are used to train the machine learning model.  That is, the features are extracted from the training data set as a feature set, and the feature set is used to train the machine learning model.  It is not reasonable to argue that the weighted features are not the same features in the feature set that are used to train the machine learning model merely because there are only two paragraphs that describe weighting the features in the feature set of Jaiswal.
The rejection of the independent claims as being obvious under 35 U.S.C. §103 over Revesz et al. (U.S. Patent Publication 2015/0154289) in view of Jaiswal (U.S. Patent Publication 2012/0303558) is maintained to be proper.  The rejection of some of the dependent claims continues to rely upon Davi et al. (U.S. Patent Publication 2011/0078242), Mylonakis et al. (U.S. Patent Publication 2014/0200878), Miura et al. (U.S. Patent Publication 2015/0254228), and Srinivasan et al. (U.S. Patent Publication 2015/0317562).  New grounds of rejection are set forth as directed to dependent claim 30 as being obvious under 35 U.S.C. §103 further in view of Lim et al. (U.S. Patent Publication 2011/0078187).  These new grounds of rejection are necessitated by amendment due to the reversion of limitations from the independent claims to the dependent claims by Applicants.  Here, Lim et al. was cited against a corresponding dependent claim in prior rejections.
The rejections are maintained as proper.  Applicants’ arguments are not persuasive.  Any new grounds of rejection are necessitated by amendment.  Accordingly, this action is properly made FINAL.

Conclusion
Applicants’ amendment necessitated the new grounds of rejection presented in this Office Action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP §706.07(a).  Applicants are reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MARTIN LERNER whose telephone number is (571) 272-7608.  The examiner can normally be reached on Monday-Thursday 8:30 AM-6:00 PM.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel Washburn, can be reached on (571) 272-5551.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair.  Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/MARTIN LERNER/Primary Examiner
Art Unit 2657     
May 17, 2022