DETAILED ACTION

Introduction

1.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. A response was filed in this application on 04/19/2022. Claims 1 and 5 have been amended in this submission; no claims have been added or cancelled. Thus, claims 1-8 are currently pending and are examined below.

Response to arguments

2.	Applicants' arguments with regard to the newly amended claim limitations have been fully considered but are moot in light of the new grounds of rejection necessitated by the amendments presented in this latest submission.

	With regard to the unamended claim limitations, Applicants' arguments have again been fully considered but are unpersuasive for at least the reasons outlined below.

With regard to claim 1, Applicants argue that the prior art reference cited by the Examiner, Michalewicz (U.S. Patent Application Publication # 2002/0065857 A1), fails to disclose or mention "natural language processing (NLP) module configured to: match the textual data with at least one natural language processing (NLP) framework from a plurality of frameworks obtained from the one or more sources using a mapping method based on a plurality of parameters" as recited in amended claim 1. Applicants contend that Michalewicz uses a data preparation module and a data acquisition module, whereas Applicants claim a natural language processing (NLP) module.

The Examiner respectfully disagrees. The data handled by both the data preparation module and the data acquisition module of Michalewicz is natural language data. Paragraphs 87-92 of Michalewicz make clear that the data acquisition module obtains its data using web crawlers or intelligent "spiders" capable of crawling through the contents of the Internet. Because the contents of the World Wide Web are in natural language, the acquired data is natural language data. This data is then processed by the data preparation module.

Applicants next argue that Michalewicz fails to disclose or mention applying feature engineering and transformation on the textual data as recited in amended claim 1 because Michalewicz uses linguistic analysis.

The Examiner once again respectfully disagrees and notes that Applicants have not explicitly defined "feature engineering and transformation." Michalewicz extracts semantic features in para 95, and the means by which these features are extracted are interpreted as the claimed feature engineering and transformation step. These means may be linguistic (as Applicants themselves acknowledge), statistical, or graphical, as also recited in para 95. In the absence of any further claimed detail regarding the feature engineering and transformation, Michalewicz meets the metes and bounds of the instant claim limitation.

	Applicants also argue that Michalewicz fails to teach analyzing matched textual data using at least one analysis method, wherein the analysis method comprises at least one of a part of speech (POS) tagging, a sentiment method, a topic modelling, a clustering method and a document classification method.

	The Examiner once again respectfully disagrees. The above limitation requires only one of the listed analysis methods, i.e., any one of a part of speech (POS) tagging, a sentiment method, a topic modelling, a clustering method, or a document classification method. In this regard, Michalewicz clearly teaches both clustering and document classification (by means of clustering) in para 95 and figure 6. Thus, the metes and bounds of the instant claim limitation are once again met. If Applicants intend this limitation to require all of the listed analysis methods, the “at least one of” language should be removed from the claim.

	Applicants have not presented any separate arguments with regard to the dependent claims. Accordingly, those claims are considered addressed by the discussion above.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

3.	Claims 1-8 are rejected under 35 U.S.C. 103 as being unpatentable over Michalewicz (U.S. Patent Application Publication # 2002/0065857 A1) in view of Millius (U.S. Patent Application Publication # 2019/0034483 A1).


With regard to claim 1, Michalewicz teaches a system to analyze and predict impact of textual data comprising a memory configured to store a plurality of data sets acquired from one or more sources (Para 66 teaches a Data Acquisition (DA) module that provides for data acquisition and includes a file system and a database. The DA module is designed to supply documents from the Web or the user file system (FS) and to update them with the required frequency. Para 166 teaches that documents are downloaded by the DSA module, filtered, and saved in the DSA database);

a processing subsystem operatively coupled to the memory, and configured to select textual data from the plurality of data sets (Para 166 teaches that documents are downloaded by the DSA module, filtered, and saved in the DSA database);

extract data from one or more external sources through web crawling (Para 64 teaches that the Data Acquisition module also uses web crawlers or spiders that find and retrieve documents from a data source, e.g., the Internet, an intranet, a file system, etc.);

identify at least one context of the textual data using one or more context identification methods, wherein the processing subsystem comprises a natural language processing (NLP) module configured to match the textual data with at least one natural language processing (NLP) framework from a plurality of frameworks obtained from the one or more sources using a mapping method based on a plurality of parameters (Para 92 teaches building a comprehensive dictionary based on the keywords identified by the algorithms from the entire text of the document, rather than on the keywords specified by the document creator. This eliminates the possibility of scamming, in which the creator may have wrongly meta-tagged keywords to attain a priority ranking. The text is parsed not merely for keywords or their number of occurrences, but also for the context in which each word appeared);

apply feature engineering and transformation on the textual data to extract a plurality of features from the plurality of data sets (Para 95 teaches the extraction of word semantics);

analyze matched textual data using at least one analysis method, wherein the at least one analysis method comprises at least one of a part of speech (POS) tagging, a sentiment method, a topic modelling, a clustering method and a document classification method (Para 95 and figure 6 teach the sequence of steps for the analysis-clustering process. The process creates a thematic catalog of documents on the basis of a pre-selected thematic structure of Web pages. The documents from the selected structure, and the words contained therein, are analyzed for statistical information such as document and word occurrences, identification of relationships between words, elimination of insignificant words, and extraction of word semantics. This step may also construct an inter-connection graph for the documents. The analyzed Web catalog documents are then grouped into larger blocks, e.g., clusters. The clusters are constructed into a hierarchical structure based on pre-calculated data);

store an analyzed result of the textual data in the memory (Para 143 and figure 15 teach that the best five documents for each cluster are preferably generated into the DOC2TOKEN table. Elementary clusters with too many documents are found by implementing key algorithm 6, and this information is stored in LEAVES. Documents are also assigned to elementary clusters, and this information is saved in the DOCS_FOR_LEAVES table);

a predictive module operatively coupled to the natural language processing (NLP) module, and configured to obtain the analyzed result of the textual data from the memory and predict one or more future values of the analyzed textual data using one or more predictive methods based on the analyzed result (Para 154 teaches the step of transforming unstructured textual data or web documents into structured data using the previously described steps. Para 159 further teaches summarizing the structured data. Statistical models for each language are applied to the summary, and the model that models the text best, i.e., the one that "predicts" the text best, is assumed to reflect the actual language of the summary and, hence, of the whole input text);

However, Michalewicz may not explicitly detail the limitation wherein the one or more context identification methods comprise one or more machine learning models. This is taught by Millius (Para 59 teaches improved text extraction and context determination outputs by predicting relevant portions of extracted text, assigned contexts for relevant text, and/or current user contexts with increased accuracy using a machine-learned context determination model. Accuracy levels for a context determination model are increased due to a machine-learned context determination model's ability to consider text data from multiple users across one or more text-based applications operating on a mobile computing device. Accuracy levels for a text extraction model are also increased due to a machine-learned text extraction model's ability to learn which portions of message text are relevant to a user and should be assigned to particular contexts or categories).

Millius also teaches predicting one or more future values of the analyzed textual data using one or more machine learning models (Para 59 teaches improved text extraction and context determination outputs by predicting relevant portions of extracted text, assigned contexts for relevant text, and/or current user contexts with increased accuracy using a machine-learned context determination model. Accuracy levels for a context determination model are increased due to a machine-learned context determination model's ability to consider text data from multiple users across one or more text-based applications operating on a mobile computing device. Accuracy levels for a text extraction model are also increased due to a machine-learned text extraction model's ability to learn which portions of message text are relevant to a user and should be assigned to particular contexts or categories).

Michalewicz and Millius are analogous art because they belong to a similar field of endeavor, namely text processing. It would thus have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Millius (use of machine learning models for text and context prediction) with those of Michalewicz (method for analyzing and clustering documents for a search engine) so as to provide improved text extraction and context determination outputs by predicting relevant portions of extracted text, assigned contexts for relevant text, and/or current user contexts with increased accuracy. Accuracy levels for a context determination model are increased due to a machine-learned context determination model's ability to consider text data from multiple users across one or more text-based applications operating on a mobile computing device. Accuracy levels for a text extraction model are also increased due to a machine-learned text extraction model's ability to learn which portions of message text are relevant to a user and should be assigned to particular contexts or categories (Millius, para 59).

With regard to claim 2, Michalewicz teaches the system as claimed in claim 1, wherein the plurality of data sets comprises at least one of a plurality of structured data sets, a plurality of unstructured data sets and a plurality of semi-structured data sets (Para 154 teaches the step of transforming unstructured textual data, including web documents, into structured data. Para 64 teaches obtaining these web documents using spiders or web crawlers).

With regard to claim 3, Michalewicz teaches the system as claimed in claim 1, wherein the plurality of parameters comprises at least one of a use case, a statistical influence and a previous predictive sample (Para 92 teaches that the text is parsed not merely for keywords or their number of occurrences, but also for the context in which each word appeared. The number of occurrences is a statistical influence).

With regard to claim 4, Michalewicz teaches the system as claimed in claim 1, further comprises a representation module operatively coupled to the processing subsystem, and configured to represent one or more predicted future values in one or more forms (Para 165 and figure 17 teach that the user is able to retrieve the output information at the user interface 1708).

With regard to claim 5, Michalewicz teaches a method for analyzing and predicting impact of textual data comprising acquiring a plurality of data sets from one or more sources (Para 64 teaches that the Data Acquisition module uses web crawlers or spiders that find and retrieve documents from a data source, e.g., the Internet, an intranet, a file system, etc.);

selecting textual data from the plurality of data sets (Para 166 teaches that documents are downloaded by the DSA module, filtered, and saved in the DSA database);

identifying at least one context of the textual data using one or more context identification methods (Para 92 teaches building a comprehensive dictionary based on the keywords identified by the algorithms from the entire text of the document, rather than on the keywords specified by the document creator. This eliminates the possibility of scamming, in which the creator may have wrongly meta-tagged keywords to attain a priority ranking. The text is parsed not merely for keywords or their number of occurrences, but also for the context in which each word appeared);

matching the textual data with at least one natural language processing (NLP) framework from a plurality of frameworks obtained from the one or more sources using a mapping method based on a plurality of parameters (Para 92 teaches building a comprehensive dictionary based on the keywords identified by the algorithms from the entire text of the document, rather than on the keywords specified by the document creator. This eliminates the possibility of scamming, in which the creator may have wrongly meta-tagged keywords to attain a priority ranking. The text is parsed not merely for keywords or their number of occurrences, but also for the context in which each word appeared);

applying feature engineering and transformation on the textual data to extract a plurality of features from the plurality of data sets (Para 95 teaches the extraction of word semantics);

analyzing matched textual data using at least one analysis method (Para 95 and figure 6 teach the sequence of steps for the analysis-clustering process. The process creates a thematic catalog of documents on the basis of a pre-selected thematic structure of Web pages. The documents from the selected structure, and the words contained therein, are analyzed for statistical information such as document and word occurrences, identification of relationships between words, elimination of insignificant words, and extraction of word semantics. This step may also construct an inter-connection graph for the documents. The analyzed Web catalog documents are then grouped into larger blocks, e.g., clusters. The clusters are constructed into a hierarchical structure based on pre-calculated data);

and predicting one or more future values of the analyzed textual data using the one or more predictive methods based on an analysis result (Para 154 teaches the step of transforming unstructured textual data or web documents into structured data using the previously described steps. Para 159 further teaches summarizing the structured data. Statistical models for each language are applied to the summary, and the model that models the text best, i.e., the one that "predicts" the text best, is assumed to reflect the actual language of the summary and, hence, of the whole input text);

However, Michalewicz may not explicitly detail predicting one or more future values of the analyzed textual data using one or more machine learning models. This is taught by Millius (Para 59 teaches improved text extraction and context determination outputs by predicting relevant portions of extracted text, assigned contexts for relevant text, and/or current user contexts with increased accuracy using a machine-learned context determination model. Accuracy levels for a context determination model are increased due to a machine-learned context determination model's ability to consider text data from multiple users across one or more text-based applications operating on a mobile computing device. Accuracy levels for a text extraction model are also increased due to a machine-learned text extraction model's ability to learn which portions of message text are relevant to a user and should be assigned to particular contexts or categories).

Michalewicz and Millius are analogous art because they belong to a similar field of endeavor, namely text processing. It would thus have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Millius (use of machine learning models for text and context prediction) with those of Michalewicz (method for analyzing and clustering documents for a search engine) so as to provide improved text extraction and context determination outputs by predicting relevant portions of extracted text, assigned contexts for relevant text, and/or current user contexts with increased accuracy. Accuracy levels for a context determination model are increased due to a machine-learned context determination model's ability to consider text data from multiple users across one or more text-based applications operating on a mobile computing device. Accuracy levels for a text extraction model are also increased due to a machine-learned text extraction model's ability to learn which portions of message text are relevant to a user and should be assigned to particular contexts or categories (Millius, para 59).

With regard to claim 6, Michalewicz teaches the method as claimed in claim 5, wherein acquiring the plurality of data sets from one or more sources comprises acquiring the plurality of data from at least one of a web, a manual entry of data, a local data set, an internal storage, an external storage and an experimental data set (Para 154 teaches the step of transforming unstructured textual data, including web documents, into structured data. Para 64 teaches obtaining these web documents using spiders or web crawlers).

With regard to claim 7, Michalewicz teaches the method as claimed in claim 5, wherein analyzing the matched textual data using the at least one analysis method comprises analyzing the matched textual data using at least one of a part of speech (POS) tagging, a sentiment method, a topic modelling, a clustering method and a document classification method (Para 95 and figure 6 teach the sequence of steps for the analysis-clustering process. The process creates a thematic catalog of documents on the basis of a pre-selected thematic structure of Web pages. The documents from the selected structure, and the words contained therein, are analyzed for statistical information such as document and word occurrences, identification of relationships between words, elimination of insignificant words, and extraction of word semantics. This step may also construct an inter-connection graph for the documents. The analyzed Web catalog documents are then grouped into larger blocks, e.g., clusters. The clusters are constructed into a hierarchical structure based on pre-calculated data).

With regard to claim 8, Michalewicz teaches the method as claimed in claim 5, further comprises representing one or more predicted future values of the textual data in one or more forms (Para 165 and figure 17 teach that the user is able to retrieve the output information at the user interface 1708).

Conclusion

4.	Applicant's amendment necessitated the new grounds of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). The following prior art, made of record but not relied upon, is considered pertinent to applicant's disclosure: Smith (U.S. Patent Application Publication # 2018/0285461 A1) and Takaai (U.S. Patent Application Publication # 2017/0046625 A1). These references are also listed on the PTO-892 form attached to this Office action.

A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. If you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). In case you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to NEERAJ SHARMA, whose contact information is given below. The examiner can normally be reached Monday to Friday, 8 am to 5 pm. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre Louis-Desir, can be reached at 571-272-7799 (Direct Phone). The fax number for the organization where this application or proceeding is assigned is 571-273-8300.


/NEERAJ SHARMA/
Primary Examiner, Art Unit 2659
571-270-5487 (Direct Phone)
571-270-6487 (Direct Fax)
neeraj.sharma@uspto.gov (Direct Email)