Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claims 1-20 are pending in this application.


Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Lai, US 2020/0302122 (hereinafter Lai), in view of Haldar et al., US 2021/0365306 (hereinafter Haldar).

For claims 1, 11, and 12, Lai teaches a computer implemented method, comprising:
receiving, from a user, a natural language query for data contained within at least one data repository (see Fig. 5, [0071], “The analysis system receives 510 a natural language question for accessing data that may be spread across a plurality of data sources”);
identifying at least one concept from the natural language query, wherein the at least one concept comprises an entity and an intent (see [0071], “The analysis system parses 520 the natural language question to determine various components of the question including the data elements that are referred to in the question and the intent of the question”);
identifying a plurality of datasets satisfying the natural language query by querying the at least one data repository utilizing the at least one concept (see [0071] – [0072], “the analysis system identifies multiple data sources and ranks them based on their relevance to the data model to select a particular data model”);
generating an extract-transform-load script that extracts, transforms, and loads a dataset selected by the user from the plurality of datasets (see [0029] – [0030], “Following steps are typically performed for answering a question based on data stored across a plurality of heterogeneous data sources: (1) Ask user to identify each data source. (2) Perform ETL (extract transform load) job to move all data from the data source. (3) Receive from users, a filter and determine a subset of the data obtained from each data source. (4) Receive from users, a query to join the subset of data from each data source” and “analysis system automatically performs several steps of the above process, for example, steps 1, 2, and 4. The analysis system automatically determines (1) what data needs to be processed, (2) where that data is present, and (3) how the data should be extracted and combined,” [0072], “The analysis system generates 540 a set of instructions for accessing and processing the necessary data to answer the question. These set of instructions are also referred to herein as an execution plan for the question,” [0103], “instructions may include: (1) instructions to extract data from a data source, (2) instructions to transform data to match formats across data sources, or (3) instructions to combine (join) data from different data sources to generate new tables”); and
retrieving data included in the dataset utilizing the extract-transform-load script, wherein the retrieving comprises returning the data to the user (see [0071] – [0072], [0091], “The analysis system 100 executes 635 the generated execution plan to determine a result set as an answer to the question. The analysis system 100 sends 640 the answer to the question, for example, to a client device that requested the answer,” [0103]).

Haldar teaches wherein the generating comprises ranking the plurality of datasets by determining, for each of the plurality of datasets, a relevance probability utilizing at least an unsupervised model built over each dataset (see [0025] – [0027], “query results can rank the relevant documents/excerpts...based on degree of correspondence (e.g., best match to lowest match)” with an “unsupervised event extraction technique” representing ranking by relevance probability utilizing an unsupervised model, [0075]), wherein each of the unsupervised models comprises a query independent portion and a query dependent portion (see [0006], “comprise a query component that receives a query request regarding an event and employs the structured event information to identify one or more parts of the unstructured text comprising information relevant to the query request. With these embodiments, the query component can employ unsupervised event extraction to generate structured query event schema for the query request... For instance, an example IT related user query may state that their device ‘battery does not charge past 50% under the new operating system on my computer’” representing the query dependent portion, [0022] – [0024], “unsupervised event extraction...applied to facilitate answering user queries,” [0025] – [0027], “To semantically understand and answer such user queries, the system needs domain knowledge on components (entities), their state and their semantic relations. Such knowledge is typically embedded in the domain content such as troubleshooting documents. In accordance with this example, the disclosed unsupervised event extraction techniques can be used to automatically extract such domain knowledge from the troubleshooting documents for reasoning and disambiguation in order to narrow down a user's problem to a targeted component and related state” and “to extract event schema” representing the query independent portion, [0046]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Lai with the teachings of Haldar to extract relevant data from unstructured data sets utilizing unsupervised machine learning (see Haldar, [0001], [0022] – [0024]).

For claims 2, 13, Lai teaches wherein the identifying at least one concept comprises extracting, using an entity extraction technique, at least one entity from the natural language query and appending at least one intent to the natural language query by enriching the at least one entity utilizing an ontology to identify the at least one intent (see [0035], [0060], “builds a virtual data model representing a question. The virtual data model may comprise one or more entities and relations between the entities,” [0071], “The analysis system parses 520 the natural language question to determine various components of the question including the data elements that are referred to in the question and the intent of the question. The analysis system generates 530 a virtual data model specific to the data fields identified as matching the natural language question. The virtual data model comprises entities that contain the data fields and relations between the data entities”).

For claims 3, 14, Lai teaches wherein the identifying a plurality of datasets comprises performing the querying in view of user provided constraints (see [0071] – [0072], where “parses 520 the natural language question to determine various components of the question including the data elements that are referred to in the question and the intent of the question” provides constraints for identifying datasets).

For claims 4, 15, Lai teaches wherein the identifying a plurality of datasets comprises accessing the data contained within at least one data repository, extracting metadata from the data (see Lai, [0040] – [0042], [0050], [0067], “analysis system 100 receives questions, for example, natural language questions from users. The analysis system 100 processes the questions using the metadata obtained from the data sources,” [0078], “The system accesses and stores metadata describing data from various data sources”), and annotating the data with business concepts (see [0054], “The metadata collected by the analysis system 100 also includes names, tags, and synonyms defined by the data source,” [0089], where metadata comprising names and tags represents annotating with business concepts).

For claims 5, 16, Lai teaches wherein the identifying a plurality of datasets comprises identifying a plurality of datasets satisfying the natural language query and ranking the plurality of datasets in view of a relevance to the natural language query (see Fig. 7, [0072], “the analysis system identifies multiple data sources and ranks them based on their relevance to the data model to select a particular data model,” [0080], “The analysis may match multiple data sources with each object from the question and may rank them in order of relevance to select the most relevant or most usable data source. For example, two data sources may store values of a particular attribute. However, one data source may store data in a manner that satisfies data compliance regulations whereas another data source may not be compliant. In this situation, the analysis system ranks the data from the compliance source higher,” [0092] – [0093], “ranking data assets for a question”).

For claims 6, 17, Lai teaches wherein the ranking comprises ranking both tables and columns within each of the plurality of datasets (see [0080], [0083], [0092] – [0093], “process of ranking data assets for a question,” where the analysis system “may identify data assets corresponding to the keyword, for example, by matching the keyword based on matching of the name of the data asset, for example, name of the table or file name” and “For example, the format of data may be used to determine that a column stores addresses or social security numbers, and so on”).

For claims 7, 18, Lai teaches wherein the generating comprises generating an extract-transform-load script that extracts, transforms, and loads data included in the dataset selected by the user that is distributed across more than one data repository (see [0029] – [0030], “(2) Perform ETL (extract transform load) job to move all data from the data source” and “Accordingly, the analysis system automatically performs several steps of the above process, for example, steps 1, 2, and 4. The analysis system automatically determines (1) what data needs to be processed, (2) where that data is present, and (3) how the data should be extracted and combined,” [0060], “The analysis system 100 further generates and displays instructions 328 representing directions to the various systems for accessing the required data assets from their corresponding data source to be able to answer the question 320,” [0068], [0071] – [0072]).

For claims 8, 19, Lai teaches wherein the generating comprises generating a recommended extract-transform-load script for approval by the user (see [0060], “The analysis system 100 further generates and displays instructions 328 representing directions to the various systems for accessing the required data assets from their corresponding data source to be able to answer the question 320,” [0103], where the display of instructions represents generating a recommended script for approval by the user).

For claims 9, 20, Lai teaches wherein the generating is performed in view of maintaining constraints on the data (see [0029] – [0030], [0072], “The analysis system generates 540 a set of instructions for accessing and processing the necessary data to answer the question. These set of instructions are also referred to herein as an execution plan for the question,” [0103], where the set of instructions targets specific data sets, representing constraints).

For claim 10, Lai teaches the computer implemented method of claim 1, comprising receiving feedback from the user regarding the plurality of datasets and improving subsequent identifications of datasets utilizing the feedback (see [0050] – [0051], “allows the user to select specific fields for further analysis” and “user interface for allowing a user to select specific data assets to be processed in a data source” and “The analysis system 100 may store information describing the user selections and use the selected subset of data assets for further analysis”).


Response to Arguments

Applicant’s arguments with respect to claim(s) rejected under 35 U.S.C. 102(a)(1) have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.


Conclusion

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action.

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Talvola et al., US 11,238,469. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to JENSEN HU, whose telephone number is (571) 270-3803. The examiner can normally be reached Monday - Friday, 9-5 PT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Usmaan Saeed, can be reached at 571-272-4046. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/JENSEN HU/Primary Examiner, Art Unit 2169