Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION
 
Continued Examination under 37 CFR 1.114
1.	A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 2/16/21 has been entered.
2.	This action is responsive to the communication filed on 2/16/21.  Claims 1, 7 and 12 have been amended. Claims 1-12 are pending.

Claim Rejections - 35 USC § 102
3.	The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale or otherwise available to the public before the effective filing date of the claimed invention.

4.	Claims 1, 4, 7, 9 and 12 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Xiao et al. (U.S. 20130205202 A1, hereinafter “Xiao”).
5.	With respect to claim 1,
	Xiao discloses a method of generating a parsed document from a digital document, wherein the method comprises the following steps in the order named:
(a)    segmenting the digital document into at least one section;
(b)    classifying the at least one section of the digital document into at least one of a class: text class, table class, figure class, noise class;
(c)    identifying a reading order of the digital document; and
(d)    processing each of the at least one section of the digital document, wherein the processing comprises:
extracting content from each of the at least one section based on the at least one of a class: text class, table class, figure class, noise class;
eliminating content classified in the noise class from the extracted content; and
structuring the extracted content, after eliminating the content classified in noise class, based on the identified reading order; and
generating the parsed document from the digital document based on the structured content (Xiao [0045]-[0048], [0050]-[0051], [0062], [0065], [0078], [0092] e.g. [0045] In an example, an engine is provided that includes machine readable instructions to generate a dynamic composition of extracted text blocks and visual blocks of a document, based on semantic features of the visual blocks and attribute data and document functions of the text blocks, to provide the interactive media content. [0048] In an example implementation, the extractor performs the operations in block 205 to decompose a document and segment the document into text blocks and visual blocks based on visual properties.  In an example, the extractor traverses the document structure to de-layer the text and images of the document. [0050] The operations of block 205 can be implemented for analysis of PDF documents, including technical documents and other documents in PDF format.  The technical documents may have simple layout and may be homogenous in text fonts.  In an example, other documents in PDF format, such as but not limited to consumer magazines, may have more complex layouts and include differing text fonts.  The text blocks and visual bock (including image objects) can be designated as the basic unit for user interaction.  These units are also the starting point for reading order determination.  These structures may not be readily accessible in a document in PDF format.  [0051] A non-limiting example of a text grouping operation to provide text blocks is as follows.  In a document, text can be represented as words with attributes of font name, font size, color and orientation.  [0062] Image elements in a document can include text, for example but not limited to, advertisement insertion in an article in a magazine.  For such type of document, in addition to the SIFT feature, operations of block 210 can also index these images based on embedded text extracted by, for example, optical character recognition (OCR), to recognize logos and brands. [0065] Non-limiting examples of semantic features of text blocks and visual blocks include title, heading, main body, advertisement, position in the document, size, reading order of the text blocks, links between images of the visual blocks for multi-page images), and links between articles of the document. [0078] Another example implementation of block 215 for page layout reorganization facilitates removing unrelated content, including advertisement, or adding additional content, to provide the interactive media content.  This implementation may applicable for a document that includes a large number and area of unrelated content, including advertisements [as wherein the method comprises the following steps in the order named:
(a)    segmenting the digital document into at least one section (e.g. blocks);
(b)    classifying the at least one section of the digital document into at least one of a class: text class (e.g. text blocks), table class, figure class (e.g. visual blocks), noise class (e.g. unrelated content - advertisements);
(c)    identifying a reading order (e.g. reading order) of the digital document; and
(d)    processing each of the at least one section of the digital document, wherein the processing comprises:
extracting (e.g. extract) content from each of the at least one section based on the at least one of a class: text class (e.g. text blocks), table class, figure class (e.g. visual blocks), noise class (e.g. unrelated content - advertisements);
eliminating (e.g. filter/remove) content classified in the noise class from the extracted content (e.g. extracted); and
structuring (e.g. composition) the extracted content, after eliminating the content classified in noise class, based on the identified reading order (e.g. reading order); and
generating the parsed document from the digital document based on the structured content]).
6.	With respect to claim 4,
	Xiao further discloses wherein extracting content from each of at least one section having a text class comprises:
identifying one or more text blocks and text block features from the at least one section having text class; and
extracting text and text features from the one or more text blocks using optical character recognition techniques (Xiao [0045]-[0048], [0050]-[0051], [0062], [0065], [0078], [0092] e.g. optical character recognition (OCR)).
7.	Claims 7 and 9 recite substantially the same limitations as claims 1 and 4, respectively, and are rejected for the same reasons as applied hereinabove.
8.	Claim 12 recites substantially the same limitations as claim 1 and is rejected for the same reasons as applied hereinabove.

Claim Rejections - 35 USC § 103
9.	In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
10.	This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
11.	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

12.	The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
13.	Claims 2-3, 5-6, 8 and 10-11 are rejected under 35 U.S.C. 103 as being unpatentable over Xiao in view of Chakraborty et al. (U.S. 20020118379 A1, hereinafter “Chakraborty”).
14.	With respect to claim 2,
Although Xiao substantially teaches the claimed invention, Xiao does not explicitly indicate wherein the method further comprises determining an importance factor for each of the at least one section of the digital document.
Chakraborty teaches the limitations by stating wherein the method further comprises determining an importance factor for each of the at least one section of the digital document (Chakraborty [0046] e.g. relevant information).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Xiao and Chakraborty, because doing so would provide a method of analyzing and extracting text from PDF documents created using various means (Chakraborty [0010]).
15.	With respect to claim 3,
	Chakraborty further discloses wherein the identifying the reading order of the digital document comprises:
-    identifying layout of at least one section of the digital document; and
-    determining a sequential order of the at least one section based on the layout (Chakraborty [0009], [0043] e.g. logical layout analysis perform region identification or classification in a derived geometric layout; reading order).
16.	With respect to claim 4,
	Chakraborty further discloses wherein extracting content from each of at least one section having a text class comprises:
-    identifying one or more text blocks and text block features from the at least one section having text class; and
-    extracting text and text features from the one or more text blocks using optical character recognition techniques (Chakraborty [0052] e.g. [0052] The present invention provides a method for extracting images.  What makes this problem challenging is that text may not be distinguished from polylines, which constitute the underlying line drawings.  While developing a general method that would work for all kinds of line-drawing images is difficult, the present invention makes use of underlying structures of the concerned documents.  The present invention localizes images according to the geometry and length of the text strings.  These localized regions are analyzed using OCR software to extract the textual content).
17.	With respect to claim 5,
	Chakraborty further discloses wherein extracting content from each of at least one section having a figure class comprises:
-    converting the figure in the at least one section to a grayscale format;
-    calculating a histogram (Chakraborty [0061] – [0062] e.g. histogram) of grayscale-formatted figure;
-    applying a thresholding operation on the figure based on the calculated histogram (Chakraborty [0055] e.g. [0055] The method simplifies the images to extract text strings 306.  The grayscale images are converted to black and white images by thresholding 307.  The method looks for text strings in either grayscale or black/white images. Thus, if the image is non-colored, it is reduced to black and white); and
-    detecting boundaries and/or dimensions of the figure (Chakraborty [0057] e.g. a bounding box).
18.	With respect to claim 6,
	Chakraborty further discloses wherein structuring the extracted content based on the reading order comprises:
-    labelling one or more text blocks based on the text block features and/or text features (Chakraborty [0043] e.g. a Logical page layout analysis includes determining a page type, assigning functional labels such as title, note, footnote, caption etc., to each block of the page, determining the relationships of these blocks and ordering the text blocks according to a reading order);
-    identifying associations (Chakraborty Abstract, [0043] e.g. relationship) between one or more text blocks and extracted figure and/or extracted table from the at least one section having figure class and/or table class respectively; and
-    arranging (Chakraborty Abstract, [0043] e.g. determining the relationships of these blocks and ordering the text blocks according to a reading order) one or more labelled text blocks, extracted figure and/or extracted table based on the labels, associations and the reading order.
19.	Claims 7-11 recite substantially the same limitations as claims 1-2 and 4-6, respectively, and are rejected for the same reasons as applied hereinabove.

Response to Arguments
20.	Applicant’s remarks and arguments presented on 2/16/21 have been fully considered but are moot in view of the new grounds of rejection presented in this Office action.

Conclusion
The prior art made of record, listed on form PTO-892, and not relied upon, if any, is considered pertinent to applicant's disclosure.
21.	The examiner requests, in response to this Office action, that support be shown for language added to any original claims on amendment and for any new claims. That is, indicate support for newly added claim language by specifically pointing to page(s) and line no(s) in the specification and/or drawing figure(s). This will assist the examiner in prosecuting the application.
22.	When responding to this Office action, Applicant is advised to clearly point out the patentable novelty which he or she thinks the claims present, in view of the state of the art disclosed by the references cited or the objections made. He or she must also show how the amendments avoid such references or objections. See 37 CFR 1.111(c).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SyLing Yen whose telephone number is 571-270-1306.  The examiner can normally be reached on Mon-Fri 8:30am - 5:00pm.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Mark Featherstone can be reached at 571-270-3750.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


SyLing Yen
Examiner
Art Unit 2166



/SYLING YEN/Primary Examiner, Art Unit 2166
April 29, 2021