Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
	This action is responsive to the amendment filed on January 20, 2021.  Claims 12-27 are presented for examination.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 1/20/21 has been entered.
 

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 12-27 are rejected under 35 U.S.C. 103 as being unpatentable over Dawson et al. (USPN. 2010/0228693) in view of Cerino et al. (USPN. 2019/0114370).

Regarding claims 12 and 20, Dawson discloses:
a cognitive data processing system for intelligent annotation and data extraction across a plurality of complex documents comprising: at least one hardware processor configured to (figs. 16 and 17):
extract information from the plurality of complex documents (figs. 16 and 17, item 524 document, pars. 251-252, unstructured documents and sentences extracted), wherein a text structure, trained with a first set of labeled data, extracts information from text of the plurality of complex documents, and wherein a table structure model, trained with a second set of labeled data, extracts information from tables of the plurality of complex documents, wherein the first set of labeled data includes text from one or more Summary Plan Description documents, and wherein the second set of labeled data includes tables from one or more Summary Plan Description documents (pars. 31, 75 and 260, “non-text documents” comprises images and tables, “structured” comprises textual documents, decompose documents into form such as concept maps, summaries, facts or keywords comprises labeling/classifying.  Note that different types of documents are used based on application.  Applicant does not claim any specific document format or advantages of Summary Plan Description documents to overcome the existing prior art);
map the extracted information to a semantic representation (fig. 17, item 528, par. 263, mapping to a semantic representation of items 523 comprising a defined data structure); 
utilize a natural language process and a machine learning process to extract one or more entities from the semantic representation (fig. 17, pars. 262 and 264, items 528, 532 concept mapping and 534 facts listing, RDF statements, mapping and associations), 

wherein the machine learning process is trained with a third set of labeled data and is trained using supervised ML (pars. 260 and 273, “output representations 530 comprise a toolset…ability of the system to represent knowledge and to have this system knowledge modified and augmented by personal knowledge extracted from human actions”, this implies the learning model is trained by building knowledge representations 530, wherein integrating the knowledge with user actions creates larger islands of semantic networks comprising more relationships/connections, that is equated to the supervised learning process with a third or additional layer/label), and wherein the one or more entities include one or more of:
a plan entity or conditions/exclusions entity (par. 148, “recognize named entities, analyze coreferences” and par. 149, “create simple… clauses… using set of rules and heuristics”.  Note that entities and relations between entities are analyzed which require conditions/rules for entities)
It is noted that Dawson teaches extracting and handling structured and unstructured data comprising images and natural language, but Dawson does not explicitly teach two distinct models for text extraction and image/table extraction.  However, Cerino teaches document and spreadsheet extraction using image processing component 254 and natural language processing component 255 (fig. 2, items 210, 254 and 255, par. 33, spreadsheet, document, distinct components, Cerino).  Hence, it would have been obvious to one of ordinary skill in the art at the effective filing date to extract, process and summarize different types of data present and processed using Dawson's semantic networks complemented by Cerino's distinct components (par. 33, Cerino).  One would have been motivated because processing of data using a single component/processor is interchangeable with the functional use of multiple components/processes that perform and generate the same result using conventional methods at the time of the effective filing date.
Dawson/Cerino teach,

provide a subset of the structured data corresponding to the query, wherein the subset of structured data may be arranged to correlate entities with corresponding attributes for one or more complex documents (fig. 17, par. 266, “subset of output representations”, Dawson).

13.     Dawson/Cerino teach, the system of claim 12, wherein the extracted entities are compared to determine differences with regard to context and meaning as presented in the respective complex document (fig. 17, items 532-534, pars. 264 and 269, weighted facts regarding extracted entities 523, Dawson).

14.     Dawson/Cerino teach the system of claim 12, wherein the processor is further configured to extract a type of information presented as a text element in at least one complex document and as a non-text element in at least one other complex document (par. 260, any format document is processed and transformed into collection or set of semantic data, Dawson).

15.   Dawson/Cerino teach, the system of claim 12, wherein the semantic representation includes the extracted information and a context indicating where in the complex document the extracted information is located (par. 189, directory location information, and par. 270, semantic relations about other semantic relations such as metadata, this data is determined and analyzed, Dawson).

16.  Dawson/Cerino teach, the system of claim 12, wherein the extracted information includes information extracted from both unformatted portions and formatted portions of the plurality 

17.   Dawson/Cerino teach, the system of claim 12, wherein an annotated data set is provided to the machine learning process and the machine learning process generates a machine learning model to extract entities based on the provided annotated data set (pars. 266-276, user input for type of representation helps machine learning in determining user intention, and fig. 17, annotations 542, Dawson).

18.   Dawson/Cerino teach, the system of claim 12, wherein an annotated data set is provided to the machine learning process to generate and train a machine learning model, and wherein the trained machine learning model is utilized to automatically annotate a received unannotated complex document (par. 264, annotation of original text, Dawson).

19.    Dawson/Cerino teach, the system of claim 12, wherein the processor is further configured to:
receive a query requesting a type of information common to each of the plurality of complex documents (figs. 17 and 19, user input and par. 280, Dawson); and
return the requested information in a readable format allowing side-by-side comparison of the extracted information for each of the plurality of complex documents (fig. 17, par. 280, output representations of plurality of documents, Dawson).


Response to Arguments
Applicant's arguments filed 1/20/21 have been fully considered but they are not persuasive. See comments below.
Applicant alleges that machine learning (ML) using entity extraction, such as a plan entity or conditions/exclusions entity, is not taught by the prior art.
Examiner disagrees.  Specific table and textual extracting wherein a text structure, trained with a first set of labeled data, extracts information from text of the plurality of complex documents, and wherein a table structure model, trained with a second set of labeled data, extracts information from tables of the plurality of complex documents is taught (see pars. 31, 75 and 260, “non-text documents” comprises images and tables, “structured” comprises textual documents, decompose documents into form such as concept maps, summaries, facts or keywords comprises labeling/classifying, Dawson).
	In addition, a plan entity or conditions/exclusions entity is clearly taught (see par. 148, “recognize named entities, analyze coreferences” and par. 149, “create simple… clauses… using set of rules and heuristics”.  Note that entities and relations between entities are analyzed which require conditions/rules for entities).  Dawson, by analyzing and recognizing named entities and analyzing coreferences from extracted sentences, derives relationships from existing content.  If applicant believes his claimed tables differ from Dawson's extracted tables (pars. 31, 75 and 260, “non-text documents” comprises images and tables, “structured” comprises textual documents, decompose documents into form such as concept maps, summaries, facts or keywords comprises labeling/classifying), he is welcome to claim schema/format details.  As such, the allegations are believed moot.

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure in the field of document/data processing and labeling:
USPN. 2012/02544143, par. 61 and fig. 5 → label columns.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to MARCIN R FILIPCZYK whose telephone number is (571)272-4019.  The examiner can normally be reached on M-F 7-4 EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, BORIS GORNEY can be reached on 571-270-5626.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a 






January 28, 2021
/MARCIN R FILIPCZYK/Primary Examiner, Art Unit 2158