DETAILED ACTION
This communication is in response to the RCE (IDS) filed on 11/05/2021.
Application No: 16/784,726. 
Please see the claim set in the NOA (Examiner's Amendment) mailed on 11/01/2021.
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
 
Reasons for allowance
Claims 1-20 are allowed.
The following is an examiner’s statement of reasons for allowance: 
The claims are allowed because the prior art of record fails to teach the recited limitations, together with the preamble, when each claim is considered as a whole. The limitations recited in the independent claims comprise a particular combination of elements, functions and preamble that is neither taught nor suggested by the prior art when each claim is considered as a whole.
 

The distinguishing features of representative claim 1 are underlined and summarized below:
 	A computer-implemented method comprising:
	receiving, using a processor, an unstructured document and a corresponding structured document, the structured document comprising a plurality of labeled portions, wherein each labeled portion is associated with a label that classifies the respective labeled portion as being one of a plurality of element types;
	generating, using the processor and by applying a parsing tool to the unstructured document, a parsed document comprising one or more extracted objects, wherein each extracted object has a bounding box that corresponds to a region of the unstructured document within which the extracted object is positioned and each extracted object comprises one of a textbox, an image location and a geometric shape;
	identifying, using the processor and by applying a matching algorithm to the structured document and the parsed document, one or more matching extracted objects, wherein each matching extracted object comprises an extracted object of the parsed document that corresponds to a labeled portion of the structured document; 
	for each of the one or more matching extracted objects, annotating, using the processor, a region of the unstructured document that corresponds to the bounding box of the respective matching extracted object with a respective label of the corresponding labeled portion of the unstructured document; and 
	storing, as part of a set of training data, the corresponding bounding box, label and polygonal segmentation associated with the annotated region.
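
For illustration only, the matching, annotating and storing steps recited above can be sketched as follows. This is a minimal sketch under stated assumptions and is not the applicant's implementation: the claim does not specify the matching algorithm, so a text-similarity comparison is assumed here, and the names ExtractedObject, LabeledPortion, similarity, bbox_to_polygon and build_training_records, the 0.9 threshold, and the derivation of the polygonal segmentation from the corners of the bounding box are all hypothetical.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class ExtractedObject:
    """One object of the parsed document (hypothetical shape)."""
    kind: str    # "textbox", "image" or "shape"
    text: str    # extracted text, empty for non-text objects
    bbox: tuple  # (x0, y0, x1, y1) region of the unstructured document


@dataclass
class LabeledPortion:
    """One labeled portion of the structured document (hypothetical shape)."""
    label: str   # element type, e.g. "title" or "table_caption"
    text: str


def similarity(a, b):
    """Case- and whitespace-insensitive similarity ratio in [0, 1]."""
    def norm(s):
        return " ".join(s.lower().split())
    return SequenceMatcher(None, norm(a), norm(b)).ratio()


def bbox_to_polygon(bbox):
    """Assumed segmentation: the four corners of the bounding box."""
    x0, y0, x1, y1 = bbox
    return [x0, y0, x1, y0, x1, y1, x0, y1]


def build_training_records(objects, portions, threshold=0.9):
    """Match extracted objects to labeled portions and emit training records."""
    records = []
    for obj in objects:
        if not obj.text or not portions:
            continue  # this sketch only matches objects that carry text
        # Assumed matching algorithm: best text similarity above a threshold.
        best = max(portions, key=lambda p: similarity(obj.text, p.text))
        if similarity(obj.text, best.text) >= threshold:
            # Annotate the object's region with the label of the matched portion.
            records.append({
                "bbox": list(obj.bbox),
                "label": best.label,
                "segmentation": bbox_to_polygon(obj.bbox),
            })
    return records
```

Each returned record holds the bounding box, label and polygonal segmentation of one annotated region. Extracted objects without a sufficiently similar labeled portion are left unannotated.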

 
The distinguishing features of representative claim 11 are underlined and summarized below:
A system comprising:
	a memory having computer readable instructions; and
	one or more processors for executing the computer readable instructions, the computer readable instructions controlling the one or more processors to perform operations comprising:
		receiving an unstructured document and a corresponding structured document, the structured document comprising a plurality of labeled portions, wherein each labeled portion is associated with a label that classifies the respective labeled portion as being one of a plurality of element types;
		generating, by applying a parsing tool to the unstructured document, a parsed document comprising one or more extracted objects, wherein each extracted object has a bounding box that corresponds to a region of the unstructured document within which the extracted object is positioned and each extracted object comprises one of a textbox, an image location and a geometric shape;
		identifying, by applying a matching algorithm to the structured document and the parsed document, one or more matching extracted objects, wherein each matching extracted object comprises an extracted object of the parsed document that corresponds to a labeled portion of the structured document; 
		for each of the one or more matching extracted objects, annotating a region of the unstructured document that corresponds to the bounding box of the respective matching extracted object with a respective label of the corresponding labeled portion of the unstructured document; and
		storing, as part of a set of training data, the corresponding bounding box, label and polygonal segmentation associated with the annotated region.
 

The distinguishing features of representative claim 17 are underlined and summarized below:
 	A computer program product comprising a computer readable non-transitory storage medium having program instructions embodied therewith, the program instructions executable by a computer processor to cause the computer processor to perform a method comprising:
	receiving an unstructured document and a corresponding structured document, the structured document comprising a plurality of labeled portions, wherein each labeled portion is associated with a label that classifies the respective labeled portion as being one of a plurality of element types;
	generating, by applying a parsing tool to the unstructured document, a parsed document comprising one or more extracted objects, wherein each extracted object has a bounding box that corresponds to a region of the unstructured document within which the extracted object is positioned and each extracted object comprises one of a textbox, an image location and a geometric shape;
	identifying, by applying a matching algorithm to the structured document and the parsed document, one or more matching extracted objects, wherein each matching extracted object comprises an extracted object of the parsed document that corresponds to a labeled portion of the structured document; 
	for each of the one or more matching extracted objects, annotating a region of the unstructured document that corresponds to the bounding box of the respective matching extracted object with a respective label of the corresponding labeled portion of the unstructured document; and
	storing, as part of a set of training data, the corresponding bounding box, label and polygonal segmentation associated with the annotated region.


Applicant's independent claim 1 comprises a particular combination of the underlined features with the other recited limitations, which is neither taught nor suggested by the prior art when the claim is considered as a whole.
Similarly, independent claims 11 and 17 each comprise a particular combination of the underlined features with the other recited limitations, in analogous wording, which is neither taught nor suggested by the prior art when each claim is considered as a whole.
The dependent claims are deemed allowable for the same reasons as their corresponding independent claims.
 

Prior Art References   
The closest combined references, Phillips, KODURU and BALA, teach the following:
Phillips (US 20160314104 A1) teaches a method for extracting text from unstructured documents. The method includes creating a spatial index for storing information about words on a page of a document to be analyzed; using the spatial index to detect white space that indicates boundaries of columns within the page, aggregate words into lines, identify lines that are part of a header or footer of the page, and identify lines that are part of a table or a figure within the page; and joining lines together to generate continuous text flows. In one embodiment, the continuous text is divided into sections. In one embodiment, references within the document are identified. In one embodiment, inline citations within the document body are replaced with the corresponding reference information, or portions thereof.

KODURU (US 20160055376 A1) teaches a method and system for identifying and extracting data from electronic documents. The method comprises extracting text from scanned documents with location on page data using OCR technology, identifying one or more tables present in a page using patterns in text placement in rows and columns, identifying the table boundaries using a pattern recognition method, identifying table borders using the location on page data, identifying the rows and columns of the table based on the identified table borders, defining a table structure for data extraction, and automatically extracting data from cells of the table formed by the identified rows and columns.

BALA (US 20190122043 A1) teaches techniques for extracting data from electronic documents, including determining vertical positions for text elements encoded in an electronic document based on an intended visual appearance of the text elements; generating text rows for subsets of the text elements based on the vertical positions of the text elements; obtaining a first set of rules selecting a row group type as a function of an indicated text row; obtaining a second set of rules selecting a row subgroup type as a function of an indicated text row; and creating a record in an electronic database, the record including a field value based on characters included in a text cell associated with a text row selected based on the first and second sets of rules.
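
For illustration only, the general idea of generating text rows from the vertical positions of text elements can be sketched as follows. This is a minimal sketch and is not taken from BALA: the function name group_into_rows, the (text, x, y) element format and the tolerance value are hypothetical, and the rule-based row group selection that BALA describes is not modeled.

```python
def group_into_rows(elements, tolerance=2.0):
    """Group (text, x, y) elements into rows of text.

    An element joins the current row when its vertical position is within
    `tolerance` of the first element of that row; otherwise it starts a new
    row. Rows come out top to bottom, cells left to right.
    """
    rows = []
    # Visit elements by vertical position, then by horizontal position.
    for text, x, y in sorted(elements, key=lambda e: (e[2], e[1])):
        if rows and abs(y - rows[-1]["y"]) <= tolerance:
            rows[-1]["cells"].append((x, text))
        else:
            rows.append({"y": y, "cells": [(x, text)]})
    # Order the cells of each row by horizontal position and keep the text.
    return [[text for _, text in sorted(row["cells"])] for row in rows]
```

A fixed tolerance is the simplest choice. A real extractor would more likely derive it from font size or line spacing.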

However, the cited references, alone or in any combination, neither disclose nor fairly suggest the combination of features specifically listed above and/or underlined, in particular:
generating, using the processor and by applying a parsing tool to the unstructured document, a parsed document comprising one or more extracted objects, wherein each extracted object has a bounding box that corresponds to a region of the unstructured document within which the extracted object is positioned and each extracted object comprises one of a textbox, an image location and a geometric shape;
	identifying, using the processor and by applying a matching algorithm to the structured document and the parsed document, one or more matching extracted objects, wherein each matching extracted object comprises an extracted object of the parsed document that corresponds to a labeled portion of the structured document; 
	for each of the one or more matching extracted objects, annotating, using the processor, a region of the unstructured document that corresponds to the bounding box of the respective matching extracted object with a respective label of the corresponding labeled portion of the unstructured document; and 
	storing, as part of a set of training data, the corresponding bounding box, label and polygonal segmentation associated with the annotated region.

Phillips teaches a method for extracting text from unstructured documents, but fails to teach one or more limitations, including:
generating, using the processor and by applying a parsing tool to the unstructured document, a parsed document comprising one or more extracted objects, wherein each extracted object has a bounding box that corresponds to a region of the unstructured document within which the extracted object is positioned and each extracted object comprises one of a textbox, an image location and a geometric shape;
	identifying, using the processor and by applying a matching algorithm to the structured document and the parsed document, one or more matching extracted objects, wherein each matching extracted object comprises an extracted object of the parsed document that corresponds to a labeled portion of the structured document; 
	for each of the one or more matching extracted objects, annotating, using the processor, a region of the unstructured document that corresponds to the bounding box of the respective matching extracted object with a respective label of the corresponding labeled portion of the unstructured document; and 
	storing, as part of a set of training data, the corresponding bounding box, label and polygonal segmentation associated with the annotated region.
 
KODURU and BALA, alone or in combination, fail to cure the deficiencies of Phillips.

	 Thus, the cited references, alone or in combination, fail to disclose or suggest each of the elements recited by the independent claims.

	  
The present invention provides an improved method for automatically generating structured training data based on an unstructured document.
In published literature, vital information is often contained in tables included in the document. This is particularly true of medical literature, such as clinical studies, in which much of the information about results across different groups is contained only in these tables and is not present in the remaining text. Further, although PDF documents are convenient for human consumption, automatic processing of these documents is difficult because understanding the document layout and extracting information from this format is complicated.
	The present invention addresses the above-described shortcomings by providing novel computer-implemented techniques for the automated generation of structured training data from unstructured documents. These techniques provide the technical advantage of automated classification of objects in unstructured documents on a very large scale, which enables the creation of machine learning models that can then automatically classify objects in newly presented unstructured documents that do not have corresponding structured documents from which to infer a structure. This can be particularly useful in enabling the automatic identification of tables within a large variety of documents, which has traditionally been a technical challenge to automate with rule-based systems.


Therefore, when the application is taken as a whole, incorporating all of the respective limitations, none of the prior art discloses the features as claimed.

Conclusion
Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee. Such submission should be clearly labeled "Comments on Statement of Reasons for Allowance."

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Mahendra Patel whose telephone number is (571)270-7499. The examiner can normally be reached on 9:30 AM to 5:30 PM (EST).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, Applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. 
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Anthony Addy, can be reached on (571) 272-779. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would 

/MAHENDRA R PATEL/ Primary Examiner, Art Unit 2645