Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant's amendments filed on 6/9/2022 are acknowledged. The examiner initiated an interview to expedite prosecution; see the interview summary for details.
Examiner's Amendment/Statement
An examiner’s amendment to the record appears below. Should the changes and/or additions be unacceptable to applicant, an amendment may be filed as provided by 37 CFR 1.312. To ensure consideration of such an amendment, it MUST be submitted no later than the payment of the issue fee.
Authorization for this examiner’s amendment was given in an email by Tyler Espy on 6/16/2022 at 12:45pm PT.
The claims have been amended as follows:
Cancel claim 3.
In claim 1, line 13, delete “and”.
In claim 1, line 16, replace “the content pixel density ratio for the page segment;” with the following: 
the content pixel density ratio for the page segment; and
wherein the segmented page for the particular input page is configured to describe the content pixel density ratio for the page segment as a grayscale value, and wherein generating the segmented page as the data object comprises: updating the segmented page by replacing the page segment with a segment color, wherein the magnitude of the segment color indicates a relative value of the content pixel density ratio for the page segment;
In claim 10, line 15, delete “and”.
In claim 10, line 18, replace “the content pixel density ratio for the page segment;” with the following: 
the content pixel density ratio for the page segment; and
wherein the segmented page for the particular input page is configured to describe the content pixel density ratio for the page segment as a grayscale value, and wherein generating the segmented page as the data object comprises: updating the segmented page by replacing the page segment with a segment color, wherein the magnitude of the segment color indicates a relative value of the content pixel density ratio for the page segment;
In claim 17, line 15, delete “and”.
In claim 17, line 18, replace “the content pixel density ratio for the page segment;” with the following: 
the content pixel density ratio for the page segment; and
wherein the segmented page for the particular input page is configured to describe the content pixel density ratio for the page segment as a grayscale value, and wherein generating the segmented page as the data object comprises: updating the segmented page by replacing the page segment with a segment color, wherein the magnitude of the segment color indicates a relative value of the content pixel density ratio for the page segment;
Allowable Subject Matter
Claims 1-2 and 4-20 are allowed subject to the above examiner’s amendment.
The following is an examiner’s statement of reasons for allowance:
The prior art, alone or in reasonable combination, fails to teach claims 1-2 and 4-20, which specifically comprise the following limitations (in consideration of each claim as a whole):
wherein the segmented page for the particular input page is configured to describe the content pixel density ratio for the page segment as a grayscale value, and wherein generating the segmented page as the data object comprises: updating the segmented page by replacing the page segment with a segment color, wherein the magnitude of the segment color indicates a relative value of the content pixel density ratio for the page segment. 
The closest prior art references, Proux (US 20190005050 A1) and Wong et al. (“Document analysis system.” IBM J. Res. Dev., 26: 647–656, 1982), disclose a similar technique and system, as discussed in the previous Office action, but fail to anticipate or render obvious, either singly or in combination with the other cited references, the above limitations (as combined with the other claimed limitations).
Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”
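For illustration only, the limitation quoted in the statement of reasons for allowance can be sketched in code. The sketch below is not taken from the application or from any cited reference. It assumes a binarized page (1 = content pixel, 0 = background), a fixed rectangular grid of page segments, and a linear mapping of the content pixel density ratio onto gray levels 0-255, so that a larger value indicates a relatively denser segment. The names `density_ratio` and `segment_page` are hypothetical.

```python
from typing import List

Page = List[List[int]]  # assumed binarized page: 1 = content pixel, 0 = background


def density_ratio(page: Page, top: int, left: int, height: int, width: int) -> float:
    """Content pixel density ratio: content pixels divided by all pixels in the segment."""
    content = sum(
        page[r][c]
        for r in range(top, top + height)
        for c in range(left, left + width)
    )
    return content / (height * width)


def segment_page(page: Page, seg_h: int, seg_w: int) -> Page:
    """Build a segmented page in which every page segment is replaced by one gray value.

    The gray value (0-255) scales linearly with the segment's content pixel
    density ratio. The grid layout and the linear scaling are assumptions
    made for this sketch.
    """
    rows, cols = len(page), len(page[0])
    segmented = [[0] * cols for _ in range(rows)]
    for top in range(0, rows, seg_h):
        for left in range(0, cols, seg_w):
            # Segments on the bottom and right edges may be smaller than seg_h x seg_w.
            h = min(seg_h, rows - top)
            w = min(seg_w, cols - left)
            gray = round(255 * density_ratio(page, top, left, h, w))
            # Replace the whole segment with its segment color.
            for r in range(top, top + h):
                for c in range(left, left + w):
                    segmented[r][c] = gray
    return segmented
```

Calling `segment_page(page, 2, 2)` on a 4x4 binarized page yields a 4x4 grayscale grid made of four uniform 2x2 blocks, one per page segment.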
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Héroux et al., "Classification method study for automatic form class identification." In Proceedings. Fourteenth International Conference on Pattern Recognition (Cat. No. 98EX170), vol. 1, pp. 926-928. IEEE, 1998.

Baldi et al., "Using tree-grammars for training set expansion in page classification", Document Analysis and Recognition 2003. Proceedings. Seventh International Conference on, pp. 829-833, 2003.

Usilin et al., "Visual appearance based document image classification", Image Processing (ICIP) 2010 17th IEEE International Conference on, pp. 2133-2136, 2010.

Mao et al., "Unsupervised style classification of document page images", Image Processing 2005. ICIP 2005. IEEE International Conference on, vol. 2, pp. II-510, 2005.

Coquard et al. (US 20200327151 A1): Provided is a system and method for processing contract documents. The method includes parsing a first contract document to identify a plurality of clauses in the first contract document, each clause of the plurality of clauses including a sequence of words, generating a plurality of representation vectors based on the first contract document and at least one embedding model, wherein each representation vector of the plurality of representation vectors is generated based on a separate clause of at least a subset of clauses of the plurality of clauses, comparing each representation vector of the plurality of representation vectors with a second plurality of representation vectors stored in a vector database, and generating output data based on the representation vectors and the first contract document. (Abstract)
Le Chevalier et al. (US 20130174010 A1): [0051] The PCA algorithm first normalizes the respective sizes of different regions by lining up their top and bottom or left and right coordinates, and generates a matrix that contains information on the location and size of all the corresponding regions in the source page and rendered page. The PCA algorithm then reduces the dimension of the matrix to reveal the most effective lower dimensional representation of the page. The resulting matrix with a reduced dimension removes information less useful and decomposes precisely the sourced and rendered content pages into uncorrelated components specific to a document page layout. The LCA algorithm works as a pre-filter process for the PCA algorithm. This filtering process aims to maximize between class variance (across regions) and minimize within-class variance (within regions). The outcome of the process is a page-specific list of regions that can be stored in a vector and compared between the source and rendered pages.

Sharma et al. (US 20160098645 A1): Automatic relationship extraction is provided. A machine learning approach using statistical entity-type prediction and relationship predication models built from large unlabeled datasets is interactively combined with minimal human intervention and a light pattern-based approach to extract relationships from unstructured, semi-structured, and structured documents. Training data is collected from a collection of unlabeled documents by matching ground truths for a known entity from existing fact databases with text in the documents describing the known entity and corresponding models are built for one or more relationship types. For a modeled relationship-type, text chunks of interest are found in a document. A machine learning classifier predicts the probability that one of the text chunks is the entity being sought. The combined machine learning and light pattern-based approach provides both improved recall and high precision through filtering and allows constraining and normalization of the extracted relationships. (Abstract)
Bisson-Krol et al. (US 20210110298 A1): [0102] A search was made to decide which model or algorithm can allow for both clustering existing items (given system constraints) and retraining of the model on user input without changing the coordinates of exiting items on a graph. This required some experimentation on different types of techniques for dimensionality reduction such as t-Distributed Stochastic Neighbor Embedding (t-SNE) and Principal Component Analysis (PCA), clustering algorithms like K-Means, Fuzzy K-Means and Denclue, correlation algorithms like Spearman's rank-order, Pearson correlation coefficient and Cramer's V statistic. Another uncertainty was whether moving one or more items would produce enough information to meaningfully retrain the model to produce desirable results.
Yang et al. (US 20190026550 A1): Disclosed systems and methods categorize text regions of an electronic document into document object types based on a combination of semantic information and appearance information from the electronic document. A page segmentation application executing on a computing device accesses textual feature representations that represent text portions in a vector space, where a set of pixels from the page is mapped to a textual feature representation. The page segmentation application generates a visual feature representation, which corresponds to an appearance of a document portion including the set of pixels, by applying a neural network to the page of the electronic document. The page segmentation application generates an output page segmentation of the electronic document by applying the neural network to the textual feature representation and the visual feature representation. (Abstract)

Shmueli et al. (US 6442555 B1): Page decomposition algorithms are known in the art and are typically included in commercially available document scanning software. Traditionally the output of a page decomposition algorithm is a collection of geometric shapes marking discrete blocks on the page. The page decomposition algorithms can either provide binary block data or weighted block data dependent on, for example, the font size in a text block, or some other pixel density measure in general. (col. 6, lines 38-46)

Bernzott et al. (US 5131053 A): A system for recognition of characters on a medium. The system includes a scanner for scanning a medium such as a page of printed text and graphics and producing a bit-mapped representation of the page. The bit-mapped representation of the page is then stored in a memory means such as the memory of a computer system. A processor processes the bit-mapped image to produce an output comprising coded character representation of the text on the page. The present invention discloses parsing a page to allow for production of the output characters in a logical sequence, a combination of feature detection methods and template matching methods for recognition of characters and a number of methods for feature detection such as use of statistical data and polygon fitting. (Abstract)
Sharma et al. (US 9727911 B2): Exemplary systems, methods and computer-accessible mediums can receive information comprising a first speckle pattern(s) associated with a portion(s) of the paper. The information can be generated by an optical arrangement, and the first speckle pattern(s) can be compared with a second speckle pattern(s) to determine if a similarity measure based on local or global descriptors is of equal to a predetermined amount or within a predetermined range. (Abstract)
Any inquiry concerning this communication or earlier communications from the examiner should be directed to FENG NIU whose telephone number is (571)272-9592.  The examiner can normally be reached on Monday - Friday, 8am-5pm PT.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Chan Park, can be reached on (571) 272-7409. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/FENG NIU/
Primary Examiner, Art Unit 2669