Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Specification

The title of the invention is not descriptive.  A new title is required that is clearly indicative of the invention to which the claims are directed. 

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 16-18 are rejected under 35 U.S.C. 101 because these claims are directed to a computer readable medium, which does not fall into one of the four enumerated categories of patent eligible subject matter recited in 35 U.S.C. 101 (process, machine, manufacture, or composition of matter).  Claims 16-18 are non-statutory under the most recent interpretation of the Guidelines regarding 35 U.S.C. 101 because the computer readable medium claimed is disclosed in the specification as encompassing both statutory and non-statutory (i.e., carrier wave) embodiments (see para 0030; although the paragraph defines a non-transitory machine readable storage medium, this is one definition of a generic memory; the claim language states ‘computer readable medium’, which is not limited to the non-transitory definition in para 0030).  When the broadest reasonable interpretation of a claim covers a signal per se, the claim must be rejected under 35 U.S.C. § 101 as covering non-statutory subject matter.  See In re Nuijten, 500 F.3d 1346, 1356-57 (Fed. Cir. 2007) (transitory embodiments are not directed to statutory subject matter) and Interim Examination Instructions for Evaluating Subject Matter Eligibility Under 35 U.S.C. § 101, Aug. 24, 2009; p. 2.
	 
Claim Rejections - 35 USC § 102

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
 
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1, 14, and 15 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Kurata et al (20180090128).

As per claim 1, Kurata et al (20180090128) teaches a computer system to classify content based on a machine-learning natural language processing (ML-NLP) classifier, the computer system comprising (as word embedding using unsupervised and supervised learning – para 0017): a processor programmed to:
receive content comprising natural language text (as receiving terms from a user through an interface – para 0021); 
access an identification of a plurality of topics of interest that are associated with a domain; provide an input based on the content to the ML-NLP classifier, the ML-NLP classifier pre-trained on a general corpus of data, further pre-trained on a domain-specific corpus of data specific to the domain, and then fine-tuned on a topic-specific corpus of data labeled according to the plurality of topics of interest (with the first word embedding from a general corpus – para 0022, with a second word embedder fine-tuned – see para 0023, and as an example, specific domains – see para 0018);
 generate, as an output of the ML-NLP classifier, one or more classifications of the content based on the input, each classification indicating a respective probability that the content relates to a corresponding topic from among the plurality of topics of interest (as using a term frequency-inverse document frequency count to extract listings from documents, to generate a list of terms – para 0028);
and generate a report based on the one or more classifications (Generating the classification reports and displaying – para 0042).


As per claim 14, Kurata et al (20180090128) teaches the system of claim 1, wherein the processor is further programmed to: receive, via a graphical user interface, an identification of at least a first topic; identify one or more content that was classified into the first topic; and provide data based on the identified one or more content (as displaying the result to the user – para 0040, which includes the identified content/topic, where the NLP is domain-specific – para 0024, as well as terminologies in the domain specific corpus – para 0026; as an example – para 0004, searching for two types of different cars).

As per claim 15, Kurata et al (20180090128) teaches the system of claim 1, wherein the processor is further programmed to: receive, from a manual curator, feedback comprising an indication of whether or not the content was correctly identified as being related to a particular topic of interest from among the plurality of topics of interest (as allowing the user to manually alter the threshold of the ratio between term frequency and inverse document frequency to establish if the word belongs to the topic); and add the content to the topic-specific corpus as labeled training data based on the indication (as adding the word to the corpus – para 0029-0031).
Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 2-9, 12, 13, and 16-20 are rejected under 35 U.S.C. 103 as being unpatentable over Kurata et al (20180090128) in view of Anerousis et al (20190188276).

Claims 2-6 recite differing thresholds and functions associated with those thresholds; namely, first and second thresholds, wherein: if the probability calculation exceeds the first threshold, the content is classified as relating to the topic; if the probability calculation does not exceed the first threshold but exceeds a lower second threshold, the content is marked to be re-checked/verified as corresponding to the topic; and if the probability calculation exceeds neither the first threshold nor the second threshold, the content is marked as not belonging to the topic and, further, is removed from a pool of possible content relating to the topic.
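For illustration only, the two-threshold scheme summarized above can be sketched as follows. This is a minimal sketch of the claimed logic as characterized in this action, not code taken from the application or from either reference; the function name `triage`, the `Disposition` labels, and the default threshold values 0.8 and 0.5 are hypothetical.

```python
from enum import Enum


class Disposition(Enum):
    """Hypothetical labels for the three outcomes described for claims 2-6."""
    ACCEPT = "classified as relating to the topic"
    REVIEW = "marked to be re-checked/verified"
    REJECT = "not related; removed from the candidate pool"


def triage(probability: float,
           first_threshold: float = 0.8,
           second_threshold: float = 0.5) -> Disposition:
    """Map a topic probability to one of three dispositions.

    The threshold defaults are assumptions chosen for the example only.
    """
    if not 0.0 <= second_threshold < first_threshold <= 1.0:
        raise ValueError("expected 0 <= second_threshold < first_threshold <= 1")
    if probability > first_threshold:
        # Exceeds the first (higher) threshold.
        return Disposition.ACCEPT
    if probability > second_threshold:
        # Does not exceed the first threshold but exceeds the lower second one.
        return Disposition.REVIEW
    # Exceeds neither threshold.
    return Disposition.REJECT
```

A probability exactly equal to a threshold does not "exceed" it in this sketch, so `triage(0.8)` falls into the re-check band rather than being accepted.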
As per claims 2-6, Kurata et al (20180090128) teaches the system of claim 1, and further teaches “tf-idf” (term frequency-inverse document frequency) ratios to determine if a word belongs to a topic and to populate the report based on the determinations (see the application of Kurata et al for claim 1), but does not explicitly teach a first or second threshold, or performing differing tasks (acceptance, re-checking/validating, or rejecting/removing) based on the score versus the two thresholds.  Anerousis et al (20190188276), however, teaches using a TF-IDF matrix that contains various values measuring similarity/matching scores of a query word and a word in a topic (end of para 0038), along with probabilistic calculations (among various types of calculations) to measure a distance/closeness for matching a word into a topic – para 0042, Figure 5.  The classifier also includes term filtering and term association with topics, and uses predetermined values in making these classifications (para 0043, see Fig. 6).  Relating to the claim scope pertaining to different thresholds, not only does Anerousis et al (20190188276) teach the standard matching score threshold to match to a topic – para 0029, Anerousis et al (20190188276) also teaches a lower threshold to determine a stop word list (para 0028), i.e., if the occurrence is high enough to meet a different threshold but not to match the topic, then the word is placed on a ‘stop’ list (i.e., removed from the pool for consideration).  Further, Anerousis et al (20190188276) teaches maximizing probabilities and clustering into groups – para 0066 – showing intermediate values and clustering, and updating/refreshing/rechecking these models – para 0065.
Therefore, it would have been obvious to one of ordinary skill in the art of relational topic tying to modify the simplified tf-idf of Kurata et al (20180090128) with the enhanced matrix calculations of Anerousis et al (20190188276) relating words with topics because it would advantageously improve upon the recommendations of the application program interface in providing topic suggestions on a first and second text data input (Anerousis et al (20190188276), para 0007, 0021, 0023).       
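For reference, the tf-idf weighting relied upon in the rejection above can be sketched in its common textbook form. This is an assumption-laden illustration: neither Kurata et al nor Anerousis et al is represented here as using this exact formula, and the function name `tf_idf` and the tokenized-list input format are hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents: list[list[str]]) -> list[dict[str, float]]:
    """Return one {term: score} mapping per tokenized document.

    Textbook variant (an assumption, not either reference's formula):
    tf = term count / document length, idf = ln(N / document frequency).
    """
    n_docs = len(documents)
    doc_freq: Counter = Counter()
    for doc in documents:
        # Count each term once per document it appears in.
        doc_freq.update(set(doc))

    scores = []
    for doc in documents:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores
```

Under this variant, a term that appears in every document (for example "car" in a corpus about two types of cars) receives a score of zero, while a term confined to one document scores higher; a threshold applied to such scores is what separates topic words from non-topic words.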

As per claim 7, the combination of Kurata et al (20180090128) in view of Anerousis et al (20190188276) teaches the system of claim 2, wherein the content relates to an entity, and wherein the processor is further programmed to: receive a request for contents relating to the entity and the plurality of topics of interest (Anerousis et al (20190188276), solving the issue of entities and topics of interest – para 0022, relating topics to entities – para 0023); determine that the content relates to the entity, wherein the report is generated responsive to the request; and transmit the report to the user (Kurata et al (20180090128), generating the classification reports and displaying – para 0042).

As per claims 8 and 9, the combination of Kurata et al (20180090128) in view of Anerousis et al (20190188276) teaches first, second, and multiple topics of interest, as well as reclassifying secondary topics of interest (Anerousis et al (20190188276), para 0030-0031 – topics can be inferred, or re-evaluated through clustering).

As per claim 12, the combination of Kurata et al (20180090128) in view of Anerousis et al (20190188276) teaches the system of claim 1 (see application of the Kurata et al (20180090128) reference to claim 1 above) wherein the processor is further programmed to: receive an identification of one or more new topics of interest; update the plurality of topics of interest based on the one or more new topics of interest; and retrain the ML-NLP classifier based on the updated plurality of topics (Anerousis et al (20190188276), para 0030-0031 – topics can be inferred, or re-evaluated through clustering; as well as new domains – para 0046).

As per claim 13, the combination of Kurata et al (20180090128) in view of Anerousis et al (20190188276) teaches the system of claim 1, wherein the processor is further programmed to: access a relevance score that indicates a level of relevance of the content to an entity of interest; and weight the respective probabilities based on the relevance score (Anerousis et al (20190188276), generating relevance scores with the content and entity, and weighting the respective probabilities, using a TF-IDF matrix that contains various values measuring similarity/matching scores of a query word and a word in a topic – end of para 0038; along with probabilistic calculations (among various types of calculations) to measure a distance/closeness for a matching word into a topic – para 0042, Figure 5; the classifier also includes term filtering and term association with topics, and uses predetermined values in making these classifications – para 0043, see Fig. 6).
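As a simple illustration of the claim 13 limitation discussed above, weighting per-topic probabilities by a relevance score can be sketched as a scalar multiplication. The multiplicative form, the [0, 1] range of the relevance score, and the name `weight_by_relevance` are assumptions made for this example; neither the claim nor the cited references is represented as requiring this particular weighting.

```python
def weight_by_relevance(topic_probabilities: dict[str, float],
                        relevance: float) -> dict[str, float]:
    """Scale each topic probability by the content's relevance to an entity.

    Assumes (hypothetically) that relevance lies in [0, 1] and that the
    weighting is a plain product.
    """
    if not 0.0 <= relevance <= 1.0:
        raise ValueError("relevance is assumed to lie in [0, 1]")
    return {topic: probability * relevance
            for topic, probability in topic_probabilities.items()}
```

With this form, content that is only marginally relevant to the entity of interest has all of its topic probabilities reduced proportionally, which lowers the chance that it clears any downstream classification threshold.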

Claims 16-18 are computer readable medium claims that perform steps common to claims 1-9, 12, and 13 above and as such, claims 16-18 are similar in scope and content to claims 1-9, 12, and 13 above; therefore, claims 16-18 are rejected under similar rationale as presented against claims 1-9, 12, and 13 above.  Furthermore, Kurata et al (20180090128) teaches a storage medium – figure 1, memory, and cpu.

Claims 19 and 20 are method claims whose steps are performed by the system of claims 1-9, 12, and 13 above and as such, claims 19 and 20 are similar in scope and content to claims 1-9, 12, and 13 above; therefore, claims 19 and 20 are rejected under similar rationale as presented against claims 1-9, 12, and 13.  Furthermore, Kurata et al (20180090128) teaches a processor/memory – figure 1, memory, and cpu.

Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Kurata et al (20180090128) in view of Coquard (20200327151).

As per claim 10, Kurata et al (20180090128) teaches the system of claim 1, but does not explicitly teach that the ML-NLP classifier comprises a bidirectional encoder representations from transformers model; Coquard (20200327151) teaches word/topic relationships modeling new articles, webpages, contract documents and the like, using BERT (bidirectional encoder representations from transformers) models – para 0096.  Therefore, it would have been obvious to one of ordinary skill in the art of machine learning modeling of natural language text to modify the Kurata et al (20180090128) natural language processing to use BERT models, as taught by Coquard (20200327151), because it would advantageously allow for various ways of modeling the text (Coquard (20200327151), para 0096).  The combination of Kurata et al (20180090128) in view of Coquard (20200327151) further teaches the corpus of data directed towards a financial domain, and relating to environmental, social or governance (Coquard (20200327151), using the modeling engine for financial databases/applications – see para 0111, used for risk analysis (governance) in financial transactions).

Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Kurata et al (20180090128) in view of Abir (20040122656).

As per claim 11, Kurata et al (20180090128) teaches the claim elements of claim 1, as rejected above; furthermore, Kurata et al (20180090128) teaches the concept of updating the domain corpus (para 0024, see retraining as well), but does not explicitly teach performing a language translation from a first language to a second language, reverse translating the translation back to the first language, and updating the corpus when the reverse translation is different from the original; Abir (20040122656) teaches bidirectional language translation (i.e., first language to a second language back to the first language) – para 0202.  Therefore, it would have been obvious to one of ordinary skill in the art of corpus/database building to modify the database training/updating of Kurata et al (20180090128) with bidirectional language translation and database updating, as taught by Abir (20040122656), because it would advantageously build out databases in both language translation directions, thereby offering more accurate databases (see Abir, end of para 0202, as well as para 0213, 0214).
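For illustration, the round-trip translation check described for claim 11 above can be sketched as follows. The translators are passed in as plain callables because no particular translation engine is identified in this action; the names `round_trip_differs` and `update_corpus_on_mismatch`, and the case-insensitive comparison, are hypothetical choices for this sketch and are not drawn from Abir or the application.

```python
from typing import Callable

Translator = Callable[[str], str]


def round_trip_differs(text: str,
                       forward: Translator,
                       backward: Translator) -> bool:
    """Translate first -> second language, then back, and compare.

    Comparison ignores case and surrounding whitespace (an assumption).
    """
    restored = backward(forward(text))
    return restored.strip().lower() != text.strip().lower()


def update_corpus_on_mismatch(corpus: list[str],
                              text: str,
                              forward: Translator,
                              backward: Translator) -> bool:
    """Append text to the corpus only when the round trip changes it."""
    if round_trip_differs(text, forward, backward):
        corpus.append(text)
        return True
    return False
```

In use, any pair of functions mapping strings to strings can stand in for the two translation directions, for example dictionary lookups in a test or calls to an external translation service in practice.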

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Please see related art listed on the PTO-892 form.
Bala et al (20190333078) teaches natural language processing models for financial domains (para 0046, 0047, 0069).
Wang et al (20180314689) teaches translation from a first language to a second language, reverting to a first language (para 0005 – the ‘third language’ is a reverse direction language translation to the original language).

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Michael Opsasnick, telephone number (571)272-7623, who is available Monday-Friday, 9am-5pm. 
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Mr. Richemond Dorvil, can be reached at (571)272-7602.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).

/Michael N Opsasnick/
Primary Examiner, Art Unit 2658
09/13/2022