DETAILED ACTION

Introduction

1.	This office action is in response to Applicant's submission filed on 08/07/2019. The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. Claims 1-8 are currently pending and examined below.

Drawings

2.	The drawings filed on 08/07/2019 have been accepted and considered by the Examiner. 

Information Disclosure Statement

3.	The Information Disclosure Statement (IDS) filed on 08/07/2019 has been accepted and considered in this office action and is in compliance with the provisions of 37 CFR 1.97.




Priority

4.	The Applicant's claim of priority to Indian Patent Application # 201841029703, filed on August 07, 2018, has been accepted and considered in this office action.

Claim Rejections - 35 USC § 102

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) The claimed invention was patented, described in a printed publication, or in public use, on sale or otherwise available to the public before the effective filing date of the claimed invention.



5.	Claims 1-8 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Michalewicz (U.S. Patent Application Publication # 2002/0065857 A1).

With regards to claim 1, Michalewicz teaches a system to analyze and predict impact of textual data comprising a memory configured to store a plurality of data sets acquired from one or more sources (Para 66, teaches a Data Acquisition module that provides for data acquisition and includes a file system and a database. The DA is designed to supply documents from the Web or user FS and update them with the required frequency. Para 166, teaches that documents are downloaded by the DSA module, filtered, and saved in the DSA database);

a processing subsystem operatively coupled to the memory, and configured to select textual data from the plurality of data sets (Para 166, teaches that documents are downloaded by the DSA module, filtered, and saved in the DSA database);

extract data from one or more external sources through web crawling (Para 64, teaches that said Data Acquisition module also uses web crawlers or spiders that find and retrieve documents from a data source e.g., Internet, intranet, file system, etc.);

identify at least one context of the textual data using one or more context identification methods, wherein the processing subsystem comprises a natural language processing (NLP) module configured to match the textual data with at least one natural language processing (NLP) framework from a plurality of frameworks obtained from the one or more sources using a mapping method based on a plurality of parameters (Para 92, teaches building a comprehensive dictionary based on the keywords identified by the algorithms from the entire text of the document, and not on the keywords specified by the document creator. This eliminates the scope of scamming where the creator may have wrongly meta-tagged keywords to attain a priority ranking. The text is parsed not merely for keywords or the number of its occurrences, but the context in which the word appeared);

apply feature engineering and transformation on the textual data to extract a plurality of features from the plurality of data sets (Para 95, teaches the extraction of word semantics); 

analyze matched textual data using at least one analysis method, wherein the at least one analysis method comprises at least one of a part of speech (POS) tagging, a sentiment method, a topic modelling, a clustering method and a document classification method (Para 95 and figure 6, also teach the sequence of steps for the analysis-clustering process. The process creates a thematic catalog of documents on the basis of a pre-selected thematic structure of Web pages. The documents from the selected structure, and the words contained therein, are analyzed for statistical information such as, documents and word occurrences, identification of relationships between words, elimination of insignificant words, and extraction of word semantics. This step may also construct an inter-connection graph for the documents. The analyzed Web catalog documents are then grouped into larger blocks, e.g., clusters. The clusters are constructed into a hierarchical structure based on pre-calculated data);

store an analyzed result of the textual data in the memory (Para 143 and figure 15, teach that the best five documents for each cluster are preferably generated into the DOC2TOKEN table. Elementary clusters with too many documents are found by implementing key algorithm 6. In this step, elementary clusters containing too many documents are found and this information is stored into LEAVES. Also, documents are assigned to elementary clusters and this information is saved into the DOCS_FOR_LEAVES table);

a predictive module operatively coupled to the natural language processing (NLP) module, and configured to obtain the analyzed result of the textual data from the memory and predict one or more future values of the analyzed textual data using one or more predictive methods based on the analyzed result (Para 154, teaches the step of transforming unstructured textual data or web documents into structured data using the previously described steps. Para 159, further teaches summarizing the structured data. Statistical models for each language are applied to the summary and the model that models the text best i.e., the one that "predicts" the text best is assumed to reflect the actual language of the summary and, hence, of the whole input text).

With regards to claim 2, Michalewicz teaches the system as claimed in claim 1, wherein the plurality of data sets comprises at least one of a plurality of structured data sets, a plurality of unstructured data sets and a plurality of semi-structured data sets (Para 154, teaches the step of transforming unstructured textual data including web documents into structured data. Para 64, teaches obtaining these web documents using spiders or web crawlers).

With regards to claim 3, Michalewicz teaches the system as claimed in claim 1, wherein the plurality of parameters comprises at least one of a use case, a statistical influence and a previous predictive sample (Para 92, teaches that the text is parsed not merely for keywords or the number of its occurrences, but the context in which the word appeared. The number of occurrences is a statistical influence).

With regards to claim 4, Michalewicz teaches the system as claimed in claim 1, further comprises a representation module operatively coupled to the processing subsystem, and configured to represent one or more predicted future values in one or more forms (Para 165 and figure 17, teach that the user is able to retrieve the output information at the user interface 1708).

With regards to claim 5, Michalewicz teaches a method for analyzing and predicting impact of textual data comprising acquiring a plurality of data sets from one or more sources (Para 64, teaches that said Data Acquisition module also uses web crawlers or spiders that find and retrieve documents from a data source e.g., Internet, intranet, file system, etc.);

selecting textual data from the plurality of data sets (Para 166, teaches that documents are downloaded by the DSA module, filtered, and saved in the DSA database);

identifying at least one context of the textual data using one or more context identification methods (Para 92, teaches building a comprehensive dictionary based on the keywords identified by the algorithms from the entire text of the document, and not on the keywords specified by the document creator. This eliminates the scope of scamming where the creator may have wrongly meta-tagged keywords to attain a priority ranking. The text is parsed not merely for keywords or the number of its occurrences, but the context in which the word appeared);

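matching the textual data with at least one natural language processing (NLP) framework from a plurality of frameworks obtained from the one or more sources using a mapping method based on a plurality of parameters (Para 92, teaches building a comprehensive dictionary based on the keywords identified by the algorithms from the entire text of the document, and not on the keywords specified by the document creator. This eliminates the scope of scamming where the creator may have wrongly meta-tagged keywords to attain a priority ranking. The text is parsed not merely for keywords or the number of its occurrences, but the context in which the word appeared);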

applying feature engineering and transformation on the textual data to extract a plurality of features from the plurality of data sets (Para 95, teaches the extraction of word semantics); 

analyzing matched textual data using at least one analysis method (Para 95 and figure 6, also teach the sequence of steps for the analysis-clustering process. The process creates a thematic catalog of documents on the basis of a pre-selected thematic structure of Web pages. The documents from the selected structure, and the words contained therein, are analyzed for statistical information such as, documents and word occurrences, identification of relationships between words, elimination of insignificant words, and extraction of word semantics. This step may also construct an inter-connection graph for the documents. The analyzed Web catalog documents are then grouped into larger blocks, e.g., clusters. The clusters are constructed into a hierarchical structure based on pre-calculated data);

(Para 154, teaches the step of transforming unstructured textual data or web documents into structured data using the previously described steps. Para 159, further teaches summarizing the structured data. Statistical models for each language are applied to the summary and the model that models the text best i.e., the one that "predicts" the text best is assumed to reflect the actual language of the summary and, hence, of the whole input text).

With regards to claim 6, Michalewicz teaches the method as claimed in claim 5, wherein acquiring the plurality of data sets from one or more sources comprises acquiring the plurality of data from at least one of a web, a manual entry of data, a local data set, an internal storage, an external storage and an experimental data set (Para 154, teaches the step of transforming unstructured textual data including web documents into structured data. Para 64, teaches obtaining these web documents using spiders or web crawlers).

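With regards to claim 7, Michalewicz teaches the method as claimed in claim 5, wherein analysing the matched textual data using the at least one analysis method comprises analysing the matched textual data using at least one of a part of speech (POS) tagging, a sentiment method, a topic modelling, a clustering method and a document classification method (Para 95 and figure 6, also teach the sequence of steps for the analysis-clustering process. The process creates a thematic catalog of documents on the basis of a pre-selected thematic structure of Web pages. The documents from the selected structure, and the words contained therein, are analyzed for statistical information such as, documents and word occurrences, identification of relationships between words, elimination of insignificant words, and extraction of word semantics. This step may also construct an inter-connection graph for the documents. The analyzed Web catalog documents are then grouped into larger blocks, e.g., clusters. The clusters are constructed into a hierarchical structure based on pre-calculated data).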

With regards to claim 8, Michalewicz teaches the method as claimed in claim 5, further comprises representing one or more predicted future values of the textual data in one or more forms (Para 165 and figure 17, teach that the user is able to retrieve the output information at the user interface 1708).

Conclusion

6.	The following prior art, made of record but not relied upon, is considered pertinent to applicant's disclosure: Ah-Pine (U.S. Patent Application Publication # 2010/0004925 A1) and Madan (U.S. Patent # 8433559 B2). These references are also included in the PTO-892 form attached with this office action.



Any inquiry concerning this communication or earlier communications from the examiner should be directed to NEERAJ SHARMA, whose contact information is given below. The examiner can normally be reached Monday to Friday, 8 am to 5 pm. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Pierre Louis-Desir, can be reached at 571-272-7799 (Direct Phone). The fax number for the organization where this application or proceeding is assigned is 571-273-8300.

/NEERAJ SHARMA/
Primary Examiner, Art Unit 2659
571-270-5487 (Direct Phone)
571-270-6487 (Direct Fax)