DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This Office Action is in response to the amendment filed October 17, 2022.  Claims 1 and 10 have been amended.  Claims 8 and 17 have been cancelled.  Claims 1-7, 9-16, and 18 remain pending.

Claim Rejections - 35 USC § 101
In view of the amendment to claims 1 and 10, the rejections under 35 USC 101 have been withdrawn.

Claim Rejections - 35 USC § 103
The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.
Claims 1-3, 5-6, 10-12, and 14-15 are rejected under 35 U.S.C. 103 as being unpatentable over Li et al. (US 8,805,845; hereafter Li) in view of Kryscinski et al. (US 2021/0124876; hereafter Krys) in view of Xu et al. (US 2011/0231347; hereafter Xu) and further in view of Asadorian et al. (US 2021/0241162; hereafter Asadorian).
Regarding claims 1 and 10, 
Li teaches:
A system for automatically labeling data using conceptual descriptions, the system comprising an electronic processor configured to (see Li col 12:1-5: operations of example methods described herein may be performed by one or more processors configured to perform the relevant operations)
for each of a plurality of categories (see Li col 4:61-64: four types of label entities [plurality of categories] produced by the first layer of the CRF-Tagger are used as possible candidates for classification),
determine one or more concepts associated with a conceptual description of the category (see Li col 1:12-20, col 3:19-40, col 9:65 – col 10:26: each document may belong to several topics; classify or categorize documents that may potentially be associated with topics [concepts] from a vast topics dictionary of skills [an example of a document], entries in the dictionary of skills [a conceptual description] are used by the framework as labels or tags for categorizing [of the category] electronic documents [a conceptual description]; weak classifiers module configured to apply one or more weak classifiers to the contents of the document in order to identify one or more [determine one or more] seed labels, seed labels represent preliminary content topics [concepts] associated with the electronic document [associated with a conceptual description]); and 
generate a weak annotator for each of the one or more concepts (see Li col 9:65 – col 10:26: weak classifiers module configured to apply one or more weak classifiers [generate a weak annotator] to the contents of the document [for each of the] in order to identify one or more [one or more] seed labels, seed labels represent preliminary content topics [concepts] associated with the electronic document); and
apply each weak annotator to each training data example (see Li col 3:41-58: Given a set of documents to be classified or categorized [to each training data example] the first step is to employ different multi-class classifiers that are trained over the given training dataset to produce initial labels for each document [apply each weak annotator]).
Li does not teach:
generate unlabeled training data examples from one or more natural language documents; when a training data example satisfies a weak annotator, output a category associated with the weak annotator;
Krys discloses:
generate unlabeled training data examples from one or more natural language documents (see Krys ¶ 44, 46, 47, 55: FIG. 2 is a method for generating an artificial, weakly-supervised data set for training a factual consistency checking model; sample module 132 of training data generation module extracts text samples from the source documents, each sample is a single sentence [natural language]; transform module 134 of [training] data generation module performs text transformations on the text sampled from source documents in order to create a training dataset, i.e., generated data; transformations generate novel claim sentences that may be used as examples for [a] training model; [¶ 55 explains that the data in the training dataset is later labeled (as is the case in the instant application), thereby demonstrating that the training data examples detailed above in ¶ 46 and 47 are unlabeled]);
when a training data example satisfies a weak annotator, output a category associated with the weak annotator (see Krys ¶ 22, 35, 44: disclosure provides a weakly-supervised, model-based approach for verifying or checking factual consistency and identifying conflicts between source documents and a generated summary; training data generation module used to generate an artificial training dataset by applying one or more rule-based transformations to the one or more sentences sampled or extracted from one or more unannotated source documents of a dataset to generate respective novel claim sentences [training data example], each of the resulting claim sentences can be either semantically variant or invariant from the respective original sampled sentence, and training data generation module [weak annotator] labels them accordingly, for example, as “correct” if semantically invariant [satisfies] from the sampled sentence, or as “incorrect” if semantically variant [satisfies] from the sampled sentence [satisfies a weak annotator, output a category associated with the weak annotator]; method for generating an artificial, weakly-supervised data set for training a factual consistency checking model);
Li and Krys are considered to be analogous because they are from the field of natural language processing.  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention(s) to have modified Li to incorporate the disclosure of Krys in order to create large amounts of training data for training a model at a marginal cost (see Krys ¶ 44, 52: FIG. 2 is a method for generating an artificial, weakly-supervised data set for training a factual consistency checking model; using an artificially generated dataset allows for creation of large volumes of data at a marginal cost).
Furthermore, regarding claims 1 and 10,
Li in combination with Krys does not disclose:
for each training data example, determine, based on one or more categories output as a result of applying each weak annotator to the training data example, a probabilistic distribution of the plurality of categories, the probabilistic distribution representing, for each of the plurality of categories, a likelihood that the category is a correct label for the training data example; and for each training data example, label the training data example with a category having the highest value in the probabilistic distribution determined for the training data example.
Xu discloses:
for each training data example, determine, based on one or more categories output as a result of applying each weak annotator to the training data example, a probabilistic distribution of the plurality of categories (see Xu ¶ 7, 8, 24, 37-38, 48: generating training data configured to train the topic model comprising scanning a data source for search queries and collecting the search queries [training data] that have seed named entities, source scanned again for search queries having identified contexts [training data], search queries that have contexts are collected [training data], named entities are extracted [Named Entity Recognition in Query (NERQ)] from search queries that have contexts [for each training data example], at least one classification [category] is predicted for the detected named entity, entity and its classification(s) [plurality of categories] are outputted to the user; where each of the seed named entities 204 in the seed set 202 receives one or more classifications (also described as classes or labels) 122 through a classification assignment 208. 
There may be multiple classes 122 assigned to each seed named entity 204, based on the likelihood of the class 122, and considering the context of the seed named entity 204 in the queries it is extracted from [one or more categories output as a result of applying the weak annotator]; where the classification is done using automation; topic model is trained based on the new named entities and their classifications [categories] using a Weakly Supervised Latent Dirichlet Allocation (WS-LDA) learning method, processor executes an offline training component and an online prediction component; a query having one named entity may be represented as a triple (e,t,c), where e denotes a named entity, t is the context of the named entity, and c is the class [category] of the named entity; online prediction component is configured to perform the detection and prediction functions using a probabilistic approach generalized as a process of finding the most likely triples in a function G(q) for a query q, function G(q) may be generated by segmenting a query into a named entity and its context in all possible ways and labeling the segmented named entities with all possible classes [plurality of categories], for each triple (e, t, c) in G(q) [distribution] the joint probability Pr(e, t, c) is then calculated [determine a probabilistic distribution of the plurality of categories]),
the probabilistic distribution representing, for each of the plurality of categories, a likelihood that the category is a correct label for the training data example (see Xu ¶ 7, 21, 24, 48: generating training data configured to train the topic model comprising scanning a data source for search queries and collecting the search queries [for the training data example]; classes of named entities may be labels, topics, or categories [label, categories] for the named entities based on the context of the named entity as used in the query; a query having one named entity may be represented as a triple (e,t,c), where e denotes a named entity, t is the context of the named entity, and c is the class [category] of the named entity; function G(q) may be generated by segmenting a query into a named entity and its context in all possible ways and labeling the segmented named entities with all possible classes [for each of the plurality of categories], for each triple (e, t, c) in G(q) [distribution] the joint probability Pr(e, t, c) [the probabilistic distribution] is then calculated, in an example embodiment the triples with highest probabilities are the output results for NERQ [representing a likelihood that the category is a correct label]); and
for each training data example, label the training data example with a category having the highest value in the probabilistic distribution determined for the training data example (see Xu ¶ 7, 21, 24, 48: generating training data configured to train the topic model comprising scanning a data source for search queries and collecting the search queries [training data] that have seed named entities, source scanned again for search queries having identified contexts [training data], search queries that have contexts are collected [training data], named entities are extracted [Named Entity Recognition in Query (NERQ)] from search queries that have contexts [for each training data example/for the training data example], at least one classification [category] is predicted for the detected named entity, entity and its classification(s) are outputted to the user; classes of named entities may be labels, topics, or categories [label, categories] for the named entities based on the context of the named entity as used in the query; a query having one named entity may be represented as a triple (e,t,c), where e denotes a named entity, t is the context of the named entity, and c is the class [category] of the named entity; function G(q) may be generated by segmenting a query into a named entity and its context in all possible ways and labeling the segmented named entities with all possible classes [label the training data example with a category], for each triple (e, t, c) in G(q) [distribution] the joint probability Pr(e, t, c) [the probabilistic distribution] is then calculated, in an example embodiment the triples with highest probabilities are the output results [determined] for NERQ [having the highest value in the probabilistic distribution determined]).
Li, Krys, and Xu are considered to be analogous because they are from the field of natural language processing.  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention(s) to have modified Li and Krys to incorporate the disclosure of Xu in order to better understand the context of a training data input and therefore suggest more relevant classifications of the input (see Xu ¶ 23: identifying named entities can help to understand search intents [training data input] better, by identifying a named entity, more relevant query suggestions [classifications] may be generated by treating named entities and contexts separately).
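For orientation only, the two limitations addressed above (deriving a probabilistic distribution over categories from weak-annotator outputs, then labeling with the highest-valued category) might be sketched as follows. This is an illustrative assumption, not the method of Xu (which uses WS-LDA and joint probabilities over triples) and not the applicant's disclosed implementation: the distribution here is a plain normalized vote count, and the names `label_distribution`, `best_label`, and the sample categories are hypothetical.

```python
from collections import Counter


def label_distribution(votes, categories):
    """Turn the categories output by weak annotators into a probability per category.

    Assumption: each annotator that fires contributes one equally weighted vote.
    """
    counts = Counter(v for v in votes if v in categories)
    total = sum(counts.values())
    if total == 0:
        # No annotator fired: fall back to a uniform distribution (an assumption).
        return {c: 1.0 / len(categories) for c in categories}
    return {c: counts[c] / total for c in categories}


def best_label(distribution):
    """Return the category with the highest value in the distribution."""
    return max(distribution, key=distribution.get)


# Hypothetical categories and annotator outputs for one training data example.
categories = ["finance", "sports", "health"]
votes = ["sports", "sports", "health"]

dist = label_distribution(votes, categories)
label = best_label(dist)
```

In this toy case two of three annotators output "sports", so that category receives the largest share of the distribution and becomes the label.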
The combination of Li, Krys, and Xu does not expressly teach:
train the machine learning system using the labeled training examples; execute the trained machine learning system to categorize a natural language passage.
Asadorian discloses:
train the machine learning system using the labeled training examples (see Asadorian ¶ 107: the set of objection-parent [training examples] may be used as the training data for training the model [train the machine learning system], the training data may be labeled [using the labeled training examples] using a supervised or a barely or weakly supervised technique); and 
execute the trained machine learning system to categorize a natural language passage (see Asadorian ¶ 24, 29, 133: system may automatically identify and classify objection messages [to categorize a natural language passage]; a message may be an email [or] text messages [natural language passage]; message classification component may identify that the new message is classified as an objection message based on processing the new message and the parent message using the machine learning model [execute the trained machine learning system]).
Li, Krys, Xu, and Asadorian are considered to be analogous because they are from the field of natural language processing.  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention(s) to have modified the teachings of Li, Krys, and Xu to incorporate the disclosure of Asadorian in order to improve model processing efficiencies by reducing the amount of natural language input data that is analyzed using a machine learning model (see Asadorian ¶ 13: pre-processing techniques may be used to reduce the amount of messages that are analyzed using a machine learning model, which may improve processing efficiencies, among other benefits).
Regarding claim 10:
Method claim 10 and system claim 1 are related as a system and a method of using the same, with the function of each claimed system element corresponding to a claimed method step. Accordingly, claim 10 is rejected under the same rationale as applied above to the system claim.

Regarding claims 2 and 11, Li in view of Krys in view of Xu and further in view of Asadorian teaches all the limitations of claims 1 and 10 above.
The combination of Li, Krys, Xu, and Asadorian discloses:
select training data examples, based on the probabilistic distributions associated with the training data examples, to use to train a machine learning system (see Xu ¶ 7, 8, 21, 48: generating training data configured to train the topic model  [to use to train a machine learning system] comprising scanning a data source for search queries and collecting the search queries [select training data examples]; topic model is trained based on the new named entities and their classifications using a Weakly Supervised Latent Dirichlet Allocation (WS-LDA) learning method [machine learning system]; classes of named entities may be labels, topics, or categories for the named entities based on the context of the named entity as used in the query [associated with the training data examples]; online prediction component is configured to perform the detection and prediction functions using a probabilistic approach generalized as a process of finding the most likely triples in a function G(q) for a query q, for each triple (e, t, c) in G(q) [distribution] the joint probability Pr(e, t, c) is then calculated [based on the probabilistic distributions]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention(s) to have modified the teachings of Li and Krys to incorporate the disclosure of Xu in order to better understand the context of a training data input and therefore suggest more relevant classifications of the input (see Xu ¶ 23: identifying named entities can help to understand search intents [training data input] better, by identifying a named entity, more relevant query suggestions [classifications] may be generated by treating named entities and contexts separately).

Regarding claims 3 and 12, Li in view of Krys in view of Xu and further in view of Asadorian teaches all the limitations of claims 2 and 11 above.
The combination of Li, Krys, Xu, and Asadorian discloses:
select training data examples to use to train a machine learning system when a value included in a probabilistic distribution determined for a training data example is above a predetermined threshold (see Xu ¶ 7, 8, 21, 41, 42: generating training data configured to train the topic model [to use to train a machine learning system] comprising scanning a data source for search queries and collecting the search queries [select training data examples]; topic model is trained based on the new named entities and their classifications using a Weakly Supervised Latent Dirichlet Allocation (WS-LDA) learning method [machine learning system]; classes of named entities may be labels, topics, or categories for the named entities based on the context of the named entity as used in the query; data source scanned again, scanning the search queries in the data source for the contexts, search queries having the identified contexts may then be collected [for a training data example], new named entities extracted from those search queries containing the contexts, to ensure high quality extraction of new named entities [determined for a training data example] a heuristic threshold cut-off may be made in this process, if the new named entity appears with less than N unique contexts [when a value included in a probabilistic distribution] collected above the new named entity would be cut off/excluded from the collection [is above a predetermined threshold when more than N unique contexts], these new named entities may be used to train the topic model [train a machine learning system], topic model includes a probabilistic generative model based upon the idea that documents are mixtures of topics, where a topic is a probabilistic distribution over words, words in a document are independently sampled from document topics according to their word distributions; Pr(c|e) for the newly extracted named entities with the probability of having a context t [a value included in a probabilistic distribution], given a particular classification c).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention(s) to have modified the teachings of Li and Krys to incorporate the disclosure of Xu in order to better understand the context of a training data input and therefore suggest more relevant classifications of the input (see Xu ¶ 23: identifying named entities can help to understand search intents [training data input] better, by identifying a named entity, more relevant query suggestions [classifications] may be generated by treating named entities and contexts separately).
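For orientation only, the threshold-based selection recited in claims 3 and 12 might be sketched as follows. This is an illustrative assumption rather than Xu's context-count cut-off or the applicant's disclosed implementation; the function name `select_confident`, the 0.8 threshold, and the sample distributions are hypothetical.

```python
def select_confident(examples, threshold=0.8):
    """Keep only examples whose highest category probability exceeds the threshold.

    `examples` is a list of (text, distribution) pairs, where `distribution`
    maps each category to its probability for that example.
    """
    selected = []
    for text, distribution in examples:
        top_category = max(distribution, key=distribution.get)
        if distribution[top_category] > threshold:
            selected.append((text, top_category))
    return selected


# Hypothetical examples with already-computed distributions.
examples = [
    ("sentence A", {"finance": 0.9, "sports": 0.1}),
    ("sentence B", {"finance": 0.55, "sports": 0.45}),
]

selected = select_confident(examples, threshold=0.8)
```

Only "sentence A" has a value above the threshold, so only that example would be passed on for training under this sketch.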

Regarding claims 5 and 14, Li in view of Krys in view of Xu and further in view of Asadorian teaches all the limitations of claims 1 and 10 above.
The combination of Li, Krys, Xu, and Asadorian further discloses:
outputting a category associated with a concept for a training data example, when the training data example contains a term that is an instance of the concept (see Krys ¶ 35, 66, 67: training data generation module used to generate an artificial training dataset by applying one or more rule-based transformations to the one or more sentences sampled or extracted from one or more unannotated source documents [training data example] of a dataset to generate respective novel claim sentences [training data example, concept], each of the resulting claim sentences can be either semantically variant or invariant [a concept] from the respective original sampled sentence, and training data generation module [weak annotator] labels them accordingly, for example, as “correct” if semantically invariant [outputting a category associated with a concept for a training data example] from the sampled sentence, or as “incorrect” if semantically variant [outputting a category associated with a concept for a training data example] from the sampled sentence; factual consistency model determines or classifies whether the text summarization or claim sentence remains factually consistent with the source document, model may perform two-way classification to classify the claim sentence as either “CONSISTENT” (or correct) or “INCONSISTENT” (or incorrect) with the source document; factual consistency model configured to identify the portion [a term that is an instance of the concept] or span (e.g., words, phrases, sentences) of the source document that should support the claim sentence [when the training data example contains a term that is an instance of the concept]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention(s) to have modified the teachings of Li and Xu to incorporate the disclosure of Krys in order to create large amounts of training data for training a model at a marginal cost (see Krys ¶ 44, 52: FIG. 2 is a method for generating an artificial, weakly-supervised data set for training a factual consistency checking model; using an artificially generated dataset allows for creation of large volumes of data at a marginal cost).

Regarding claims 6 and 15, Li in view of Krys in view of Xu and further in view of Asadorian teaches all the limitations of claims 1 and 10 above.
Li further teaches:
computing, using word embeddings, a similarity between a concept associated with a weak annotator and a training data example; and based on the computed similarity, determining whether to output a category for a training data example (see Li col 3:19-40, col 3:59-60, col 4:36-63: in order to classify or categorize documents [training data examples] that may be associated with topics [concepts] from a dictionary of skills, a framework for large-scale multi-label classification [hereafter “framework”] may be utilized, entries in the dictionary of skills [concepts] are used by the framework as labels or tags for categorizing electronic documents, contents of an electronic document [training data example] may be related to computer programming, cloud computing, and streaming video, [a concept associated with a weak annotator] and may be labeled by the framework [a weak annotator] with "computer programming," "cloud computing," and "streaming video" labels, entries in the dictionary of skills--referred to as labels when utilized by the framework--may be assigned correlation values [computing a similarity], correlation value for a pair of labels [between a concept associated with a weak annotator and a training data example] may be calculated [computed] based on co-occurrence of the labels; the framework [a weak annotator] may be described as consisting of three layers: (1) Weak-Classification [a weak annotator]; one or more weak classifiers may be used to discover the initial set of seed labels [concepts], two different types of weak classifiers [weak annotators] are used by the framework [a weak annotator], first weak classifier is a conditional random field based tagger (termed CRF-Tagger) that uses named entity recognition (NER), NER task is a typical natural language processing task of extracting named entities [a concept]  from the contents of a document [a training example]. 
dataset supports four types of label entities [categories], namely "Organization," "Person," "Location," and "Misc," the CRF-Tagger is designed to incorporate entries from a dictionary of skills [concepts] into the dictionary set of the CRF-Tagger. The CRF-Tagger employs traditional SVM-based [based on the computed similarity] classification [output a category] approach to determine whether an entry from the skills dictionary should be associated with the given document or not [determining whether to output a category for a training data example], SVM stands for "support vector machines," [using word embeddings] which are supervised learning models for data analysis and pattern recognition [computing a similarity]).
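For orientation only, the limitation of claims 6 and 15 (computing an embedding-based similarity between a concept and a training data example, then deciding whether to output a category) might be sketched as follows. This is an illustrative assumption and is not drawn from Li's SVM-based classifier or from the applicant's disclosure: the two-dimensional toy vectors, the averaging of token vectors, the 0.7 threshold, and the names `cosine`, `mean_vector`, and `annotate` are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_vector(tokens, embeddings):
    """Average the embeddings of the tokens that have one; None if none do."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def annotate(tokens, concept, category, embeddings, threshold=0.7):
    """Output the category only when the example is similar enough to the concept."""
    example_vec = mean_vector(tokens, embeddings)
    if example_vec is None or concept not in embeddings:
        return None
    similarity = cosine(example_vec, embeddings[concept])
    return category if similarity >= threshold else None


# Toy embeddings (assumed values, not from any trained model).
EMBEDDINGS = {
    "money": [1.0, 0.0],
    "bank": [0.8, 0.2],
    "loan": [0.9, 0.1],
    "goal": [0.1, 0.9],
    "match": [0.0, 1.0],
}

result_a = annotate(["bank", "loan"], "money", "finance", EMBEDDINGS)
result_b = annotate(["goal", "match"], "money", "finance", EMBEDDINGS)
```

Under these toy vectors the first example lies close to the "money" concept and receives the category, while the second does not and yields no output.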

Claims 4, 7, 13, and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Li et al. (US 8,805,845; hereafter Li) in view of Kryscinski et al. (US 2021/0124876; hereafter Krys) in view of Xu et al. (US 2011/0231347; hereafter Xu) in view of Asadorian et al. (US 2021/0241162; hereafter Asadorian) and further in view of Björkqvist et al. (US 2022/0207240; hereafter Björk).
Regarding claims 4 and 13, Li in view of Krys in view of Xu and further in view of Asadorian teaches all the limitations of claims 1 and 10 above.
The combination of Li, Krys, and Xu discloses:
produce one or more natural language documents in a state in which training data examples may be extracted from the one or more natural language documents (see Krys ¶ 44, 46, 47: FIG. 2 is a method for generating [produce] an artificial, weakly-supervised data set [one or more natural language documents] for training a factual consistency checking model; sample module of training data generation module extracts text [produce] samples from the source documents, each sample is a single sentence [natural language]; transform module 134 of [training] data generation module performs text transformations on the text sampled from source documents in order to create a training dataset—i.e., generated data [from the one or more natural language documents], transformations generate novel claim sentences [training data examples] that may be used as examples for [a] training model [in a state in which training data examples may be extracted])
by extracting text from the natural language documents (see Krys ¶ 46: sample module of training data generation module extracts text [extracting text] samples from the source documents, each sample is a single sentence [from the natural language documents]); 
tokenizing the extracted text (see Krys ¶ 46, 52: sample module of training data generation module extracts text [extracted text] samples from the source documents; one or more training examples are injected with noise, for each token (e.g., word or grouping of characters) in a claim sentence, transform module add[s] noise);
splitting the text into sentences (see Krys ¶ 46: sample module of training data generation module extracts text samples [text] from the source documents, each sample is a single sentence [splitting the text into sentences]);
The motivation to combine is the same as previously presented.
Furthermore, regarding claims 4 and 13,
The combination of Li, Krys, Xu, and Asadorian does not teach:
annotating each token with a part of speech tag; and annotating the dependency relations for pairs of words.
Björk discloses:
annotating each token with a part of speech tag (see Björk ¶ 107: a second set of natural language tokens, tokenizing the text [each token], part-of-speech (POS) tagging the tokens [annotating each token with a part of speech tag]); and
annotating the dependency relations for pairs of words (see Björk ¶ 107: deriving [annotating] [the tokens] syntactic dependencies [the dependency relations] and meronym and holonym expressions, matched pairs of noun chunks [for pairs of words] are formed utilizing the meronym and holonym expressions [dependency relations], noun chunk pairs form or can be used to deduct meronym relation edges).
Li, Krys, Xu, Asadorian, and Björk are considered to be analogous because they are from the field of natural language processing.  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention(s) to have modified the teachings of Li, Krys, Asadorian, and Xu to incorporate the disclosure of Björk in order to more quickly identify similarities and differences between two blocks of natural language text in technical documents (see Björk ¶ 9, 10: comparing of technical content of two or more documents or explaining the relevance of a particular document, for example for patent novelty or validity evaluation purposes; provide a solution that helps to quicker identify similarities and differences between two blocks of natural language such as a claim of one patent document and the contents of another patent document).

Regarding claims 7 and 16, Li in view of Krys in view of Xu and further in view of Asadorian teaches all the limitations of claims 1 and 10 above.
The combination of Li, Krys, Xu, and Asadorian does not teach:
wherein a training data example is a tuple including parts of speech of a natural language sentence.
Björk discloses:
wherein a training data example is a tuple including parts of speech of a natural language sentence (see Björk ¶ 51, 64, 75, 102: “Block of natural language” refers to a data instance containing a linguistically meaningful combination of natural language units, for example one or more complete or incomplete sentences [of a natural language sentence] of a language; “Natural language token” refers to a word or word chunk in a block of natural language, a token may contain also metadata relating to the word or word chunk such as the part-of-speech (POS) label or syntactic dependency tag, a “set” of natural language tokens [a tuple] refers to tokens grouped based on their POS label [a tuple including parts of speech]; the trainer typically receives as training data combinations of graphs [training data example] or augmented graphs; graph conversion subsystem is adapted to convert the blocks to graphs by first identifying from the blocks a first set of natural language tokens (e.g. nouns and noun chunks) [a tuple including parts of speech], the first set of tokens [a tuple including parts of speech] is arranged as nodes of said graphs [a training data example is] utilizing matched pairs).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention(s) to have modified the teachings of Li, Krys, Asadorian, and Xu to incorporate the disclosure of Björk in order to more quickly identify similarities and differences between two blocks of natural language text in technical documents (see Björk ¶ 9, 10: comparing of technical content of two or more documents or explaining the relevance of a particular document, for example for patent novelty or validity evaluation purposes; provide a solution that helps to quicker identify similarities and differences between two blocks of natural language such as a claim of one patent document and the contents of another patent document).

Claims 9 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Li et al. (US 8,805,845; hereafter Li) in view of Kryscinski et al. (US 2021/0124876; hereafter Krys) in view of Xu et al. (US 2011/0231347, hereafter Xu) in view of Asadorian et al. (US 2021/0241162;  hereafter Asadorian) and further in view of Baker et al. (US 10,002,371: hereafter Baker).
Regarding claims 9 and 18, Li in view of Krys in view of Xu and further in view of Asadorian teaches all the limitations of claims 1 and 10 above.
The combination of Li, Krys, Xu, and Asadorian does not expressly teach:
remove noisy concepts from the one or more concepts.
Baker discloses:
remove noisy concepts from the one or more concepts (see Baker col 11:12-18, col 13:19-41: PMI used with the noisy-labeled data for each of the classes, assigned the class label of the category it is derived from; opinion mining model developed exploiting noisy-labeled data [noisy concepts], for noisy-labeled data the Pos and Neg noisy-labeled data were further filtered, all sentences with negation [concept] are filtered out [remove noisy concepts], sentences reversing a sentence's polarity [concept] are removed [remove noisy concepts], items that soften claims [concept] were also excluded [remove noisy concepts] from the collection [from the one or more concepts]).
Li, Krys, Xu, Asadorian, and Baker are considered to be analogous because they are from the field of natural language processing.  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention(s) to have used the known technique of removing or excluding, while generating a training dataset, sentences with negation, sentences reversing a sentence’s polarity, and items that soften claims, as taught by Baker, to improve a similar method of generating a training dataset in the same way, as taught by Li, Krys, Asadorian, and Xu, in order to generate training datasets optimized for training a broader range of machine learning model types (see KSR v. Teleflex), including a sentiment classifier model (see Baker col 13:19-41: sentences with strong, not softened, sentiment are better for training sentiment classifiers, [as such] sentences with items that soften claims were also excluded from the collection).

Response to Arguments
Applicant's arguments filed October 17, 2022, have been fully considered but they are not persuasive.
Applicant argues that Li, Xu, and Kryscinski fail to teach "for each of a plurality of categories, determine one or more concepts associated with a conceptual description of the category and generate a weak annotator for each of the one or more concepts." The Examiner notes that, as indicated in the rejection above, Li teaches determining one or more concepts associated with a conceptual description of the category (see Li col 1:12-20, col 3:19-40, col 9:65 – col 10:26: each document may belong to several topics; classify or categorize documents that may potentially be associated with topics [one or more concepts] from a vast topics dictionary of skills [an example of a document], entries in the dictionary of skills [a conceptual description] are used by the framework as labels or tags for categorizing [of the category] electronic documents [a conceptual description]; weak classifiers module configured to apply one or more weak classifiers to the contents of the document in order to identify one or more [determine one or more] seed labels, seed labels represent preliminary content topics [concepts] associated with the electronic document [associated with a conceptual description]); generating a weak annotator for each of the one or more concepts (see Li col 9:65 – col 10:26: weak classifiers module configured to apply one or more weak classifiers [generate a weak annotator] to the contents of the document [for each of the] in order to identify one or more [one or more] seed labels, seed labels represent preliminary content topics [concepts] associated with the electronic document); and applying each weak annotator to each training data example (see Li col 3:41-58: Given a set of documents to be classified or categorized [to each training data example] the first step is to employ different multi-class classifiers that are trained over the given training dataset to produce initial labels for each document [apply each weak annotator]).
Further, Xu teaches generating training data configured to train the topic model, comprising scanning a data source for search queries and collecting the search queries [training data] that have seed named entities. The source is scanned again for search queries having identified contexts [training data], and search queries that have contexts are collected [training data]. Named entities are extracted [Named Entity Recognition in Query (NERQ)] from search queries that have contexts [for each training data example/for the training data example], at least one classification [category] is predicted for the detected named entity, and the entity and its classification(s) are output to the user. Classes of named entities may be labels, topics, or categories [label, categories] for the named entities based on the context of the named entity as used in the query. The cited references provide support for the concepts of "for each of a plurality of categories, determine one or more concepts associated with a conceptual description of the category and generate a weak annotator for each of the one or more concepts," as recited in claim 1.
Applicant argues that Li does not disclose multiple categories (described as labels in paragraph [0003] of the specification as filed) where, for each one of them, there is a conceptual description (see, for example, FIG. 3 of the specification as filed) and a determination of concepts (described, in paragraph [0022] of the specification as filed, as unigram or bigram keywords included in the conceptual description) associated with the conceptual description of the category. In response to applicant's argument that the references fail to show certain features of applicant’s invention, it is noted that the features upon which applicant relies (i.e., details of the labels from paragraph [0003] of the specification; details of the conceptual description from FIG. 3; details of concepts from paragraph [0022]; or “unigram or bigram keywords included in the conceptual description”) are not recited in the rejected claim(s). Although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims. See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993).
Applicant argues that Xu does not teach how the seed named entities are labeled automatically. The Examiner notes that Xu specifically teaches the processing of seed named entities being labeled. Additionally, Xu specifically teaches that the classification can be performed using automation such that “any of the acts of any of the methods described herein may be implemented at least partially by a processor or other electronic device based on instructions stored on one or more computer-readable media” [paragraph 0104]. Since Xu specifically teaches how the labeling is conducted, and suggests that the methods can be implemented using automation (using processors and computer-readable media), one having ordinary skill in the art would have been able to apply the labeling process automatically.
Applicant argues, “There is no indication in Xu that the predefined taxonomy is determined automatically. Therefore, paragraph [0037] does not teach, "for each training data example, determine, based on one or more categories output as a result of applying each weak annotator to the training data example, a probabilistic distribution of the plurality of categories, the probabilistic distribution representing, for each of the plurality of categories, a likelihood that the category is a correct label for the training data example," as recited by amended claim 1.”  Applicant's arguments fail to comply with 37 CFR 1.111(b) because they amount to a general allegation that the claims define a patentable invention without specifically pointing out how the language of the claims patentably distinguishes them from the references.



Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ANGELA A ARMSTRONG whose telephone number is (571)272-7598. The examiner can normally be reached M,T,TH,F 11:30-8:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre Desir, can be reached at 571-272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

ANGELA A. ARMSTRONG
Primary Examiner
Art Unit 2659



/ANGELA A ARMSTRONG/Primary Examiner, Art Unit 2659