DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Status of Claims
	Applicant’s response, filed 09/13/2022, amended claims 1, 4-10, 12, and 15-21.  Claims 1-22 are pending.
	Applicant’s amendments have overcome the § 112(b) rejection.

Response to Arguments
Applicant's arguments filed 09/13/2022 have been fully considered but they are not persuasive. 

Applicant argues that Privault fails to teach the limitations of amended claim 1.  The examiner respectfully disagrees and directs attention to the updated mapping provided below.  Applicant’s explicit definition of “segment” in paragraph [0020] of the Specification controls the examiner’s claim interpretation, and that definition remains quite broad in scope: “As used herein, a segment is a group of characters (e.g., words or sequence of alpha-numeric characters) in a line with no more than a regular space (e.g., the space produced by one or two strokes of the spacebar on a keyboard) between the words or character sequences.”
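For illustration only, the breadth of the quoted definition can be seen in a minimal sketch that splits a line of text into segments. The function name `split_segments` and the choice to treat a run of three or more spaces (or any tab) as the gap that ends a segment are assumptions made for this sketch; they are not taken from the Specification or from Privault.

```python
import re

# Assumption for this sketch: one or two spaces are a "regular space" and stay
# inside a segment; three or more spaces, or any tab, separate two segments.
_GAP = re.compile(r"(?: {3,}|\t+)")


def split_segments(line: str) -> list:
    """Split one line of text into segments under the assumption above."""
    return [part.strip() for part in _GAP.split(line) if part.strip()]


if __name__ == "__main__":
    line = "Invoice No:  12345      Date: 01/02/2022"
    # The two spaces after "No:" stay inside the first segment;
    # the run of six spaces ends it.
    print(split_segments(line))  # ['Invoice No:  12345', 'Date: 01/02/2022']
```

Under this reading, a single word, a multiword phrase, or a label followed by its value can each qualify as one segment, depending only on the spacing around it.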

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1-22 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Privault et al. (US 2010/0312725 A1, hereinafter “Privault”).

	Regarding claim 1, Privault teaches 
A method for classifying semi-structured documents, the method comprising: 
accessing a plurality of documents including the semi-structured documents [Privault, ¶ 0043, set of documents; ¶ 0044, HTML documents]; 
identifying segment pairs in each of the plurality of documents, wherein each segment pair includes two segments that are on a same horizontal line or on a same vertical line, and the two segments are separated by one or more non-space characters [Privault, ¶ 0064, multiword expressions evaluated as vectors; the words of a multiword expression would be written on a same horizontal line and, in a phrase of three or more words, would have another word between them.  Applicant’s specification, paragraph [0020], defines “As used herein, a segment is a group of characters (e.g., words or sequence of alpha-numeric characters) in a line with no more than a regular space (e.g., the space produced by one or two strokes of the spacebar on a keyboard) between the words or character sequences.”  Thus, a segment as currently recited in the claims may be a single word in a compound phrase];
generating, based on the identified segment pairs, a map of unique segment pairs across all documents in the plurality of documents [Privault, ¶ 0064, multiword expressions evaluated as vectors]; 
generating for each of the plurality of documents a respective feature vector based on one or more unique segment pairs in the map that are also identified in the document [Privault, ¶¶ 0063 & 0064, representation based on word or multiword expressions]; and 
clustering the plurality of documents into a plurality of clusters, using the feature vectors [Privault, ¶ 0061].
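
For illustration only, the four steps recited in claim 1 can be sketched as follows. This is a simplified reading and not a statement of applicant's or Privault's implementation: all function names are hypothetical, only segments on the same horizontal line are paired, the "separated by one or more non-space characters" condition is not modeled, and the Jaccard similarity with a greedy assignment rule is a stand-in chosen for the sketch because the claim does not name a clustering technique.

```python
from itertools import combinations


def segment_pairs(doc_lines):
    """Pairs of segments that share a line. doc_lines: list of lists of segments."""
    pairs = set()
    for segments in doc_lines:
        pairs.update(combinations(segments, 2))
    return pairs


def build_pair_map(per_doc_pairs):
    """Map each unique pair across all documents to a vector index."""
    unique = sorted(set().union(*per_doc_pairs))
    return {pair: index for index, pair in enumerate(unique)}


def feature_vector(pairs, pair_map):
    """Binary vector: 1 where the document contains the mapped pair."""
    vector = [0] * len(pair_map)
    for pair in pairs:
        if pair in pair_map:
            vector[pair_map[pair]] = 1
    return vector


def jaccard(a, b):
    """Jaccard similarity of two binary vectors (assumed metric)."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 1.0


def cluster(vectors, threshold=0.5):
    """Greedy clustering: join the first cluster whose first member is similar enough."""
    clusters = []
    for index, vector in enumerate(vectors):
        for members in clusters:
            if jaccard(vectors[members[0]], vector) >= threshold:
                members.append(index)
                break
        else:
            clusters.append([index])
    return clusters


if __name__ == "__main__":
    docs = [
        [["Invoice No:", "Date:"], ["Bill To:", "Ship To:"]],
        [["Invoice No:", "Date:"], ["Bill To:", "Ship To:"]],
        [["Employee:", "Pay Period:"]],
    ]
    per_doc = [segment_pairs(d) for d in docs]
    pair_map = build_pair_map(per_doc)
    vectors = [feature_vector(p, pair_map) for p in per_doc]
    print(cluster(vectors))  # [[0, 1], [2]]
```

In the usage example, the two documents that share the same label pairs fall into one cluster and the third document forms its own.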

Regarding claim 2, Privault teaches the method of claim 1, further comprising, prior to the generating step: 
computing for each unique segment pair in the map a normalized document frequency [Privault, ¶ 0063, word frequency vector representation]; and 
removing from the map segment pairs having a frequency less than a specified threshold [Privault, ¶ 0062, responsive vs. non-responsive classification based on a similarity probability value having a difference less than a threshold value].
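
For illustration only, the two steps recited in claim 2 can be sketched as follows. The sketch assumes that "normalized document frequency" means the number of documents containing a pair divided by the total number of documents; the claim itself does not state a formula, and the function names and the example threshold are hypothetical.

```python
from collections import Counter


def normalized_document_frequency(doc_pair_sets):
    """Fraction of documents in which each segment pair appears (assumed definition)."""
    total = len(doc_pair_sets)
    counts = Counter(pair for pairs in doc_pair_sets for pair in set(pairs))
    return {pair: count / total for pair, count in counts.items()}


def prune_pairs(doc_pair_sets, min_frequency):
    """Keep only pairs whose normalized document frequency meets the threshold."""
    frequencies = normalized_document_frequency(doc_pair_sets)
    return {pair for pair, freq in frequencies.items() if freq >= min_frequency}


if __name__ == "__main__":
    docs = [
        {("A", "B"), ("C", "D")},
        {("A", "B")},
        {("A", "B"), ("E", "F")},
        {("A", "B"), ("C", "D")},
    ]
    # ("A", "B") appears in 4 of 4 documents, ("C", "D") in 2, ("E", "F") in 1.
    print(sorted(prune_pairs(docs, min_frequency=0.5)))  # [('A', 'B'), ('C', 'D')]
```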

Regarding claim 3, Privault teaches the method of claim 1, further comprising: 
associating a respective layout template for each cluster in the plurality of clusters, the layout template for a particular cluster being based on [Privault, ¶ 0089, layout similarity used to organize document clusters]: 
one or more segment pairs corresponding to feature vectors associated with that particular cluster [Privault, ¶ 0089, layout similarity used to organize document clusters]; and 
respective layout information of each of the one or more segment pairs [Privault, ¶ 0089, layout similarity used to organize document clusters].

Regarding claim 4, Privault teaches the method of claim 3, further comprising, for each cluster in the plurality of clusters: 
training a respective machine-learning (ML) information extractor to extract document information using the respective layout template associated with each cluster [Privault, ¶ 0050, labels and ¶ 0051, classifier model using support vector machines]; and 
generating and storing a respective ML model based on training the ML information extractor [Privault, ¶ 0051, memory stores classifier model].

Regarding claim 5, Privault teaches the method of claim 3, further comprising: 
selecting a particular document from the plurality of documents [Privault, ¶ 0051, classifier model using support vector machines]; 
determining a cluster in the plurality of clusters to which the particular document belongs [Privault, ¶ 0051, classifier model using support vector machines]; and 
using a machine-learning (ML) information extractor, trained using the respective layout template associated with the determined cluster, to extract document information [Privault, ¶ 0051, classifier model using support vector machines].

Regarding claim 6, Privault teaches the method of claim 1, wherein for each document in the plurality of documents, the step of identifying the segment pairs in the document is performed for each page in the document [Privault, ¶ 0063, all documents evaluated].

Regarding claim 7, Privault teaches the method of claim 1, further comprising storing each segment pair in a data structure along with layout information [Privault, ¶ 0117 and Figure 6, the reviewer highlights two phrases by running across them with a finger and selects the value “hot” for “highly responsive document” in the attached sticky note.  By identifying/highlighting two phrases in a line, the phrases would be stored].

Regarding claim 8, Privault teaches the method of claim 7, wherein the layout information includes a location and a size of the segment pair, or co-ordinates of a bounding box around the segment pair [Privault, ¶ 0117 and Figure 6, the reviewer highlights two phrases by running across them with a finger and selects the value “hot” for “highly responsive document” in the attached sticky note.  By identifying/highlighting two phrases in a line, the phrases would be stored].
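
For illustration only, the data structure recited in claims 7 and 8 can be sketched as a record that stores a segment pair together with its location and size, from which bounding-box coordinates follow. The class name `SegmentPairRecord`, its field names, and the top-left origin convention are assumptions made for this sketch and are not drawn from the application or from Privault.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SegmentPairRecord:
    """One segment pair plus layout information (hypothetical structure)."""
    left: str      # first segment of the pair
    right: str     # second segment of the pair
    x: float       # horizontal position of the pair's top-left corner
    y: float       # vertical position of the pair's top-left corner
    width: float   # horizontal extent covering both segments
    height: float  # vertical extent covering both segments

    def bounding_box(self):
        """Return (x0, y0, x1, y1) computed from location and size."""
        return (self.x, self.y, self.x + self.width, self.y + self.height)


if __name__ == "__main__":
    record = SegmentPairRecord("Invoice No:", "12345", x=10, y=20, width=200, height=12)
    print(record.bounding_box())  # (10, 20, 210, 32)
```

Location plus size and the bounding-box coordinates carry the same information here, which is consistent with claim 8 reciting them as alternatives.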

Regarding claim 9, Privault teaches the method of claim 1, wherein each document in the plurality of documents is obtained via file transfer, email, web access, or scanning of a physical document [Privault, ¶ 0044, email, HTML files, OCR document processing].

Regarding claim 10, Privault teaches the method of claim 1, 
wherein the plurality of documents comprises a hyper-text markup language (HTML) document [Privault, ¶ 0044, HTML documents], the identification of segment pairs in the HTML document comprising: 
identifying HTML tags representing textual information by parsing the HTML document using a script executable in a headless mode [Privault, ¶ 0050, automatic class determination from HTML and metadata information]; and 
accessing location and size information of the HTML tags [Privault, ¶ 0050, automatic class determination from HTML and metadata information].

Regarding claim 11, Privault teaches the method of claim 1, wherein the plurality of documents comprises a plurality of invoices [Privault, ¶ 0044, records and accounts].

Claims 12-22 recite limitations corresponding to claims 1-11, respectively, and are rejected for the same reasons discussed above.

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Scott A. Waldron, whose telephone number is (571)272-5898. The examiner can normally be reached Monday through Friday, 9:00 am to 5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Neveen Abel-Jalil, can be reached at (571)270-0474. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/Scott A. Waldron/
Primary Examiner, Art Unit 2152
09/22/2022