DETAILED ACTION
Notice of Pre-AIA  or AIA  Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA  35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claims 1-7, 9-15 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA ), first paragraph, as failing to comply with the written description requirement. The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA  35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.
Independent claim 1 recites “wherein the first feature is a similarity of relative positions of the plurality of the first labels in the table”, which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention. Applicant’s specification at paragraph [0025] discloses “In one example embodiment, a similar of relative positions of two labels can be used as a feature of the training set. For example, if two labels both appears on row 33 of two different documents, it is likely that they are associated”, but does not disclose the relative positions of two labels in the same table, as recited in claim 1.
Claims 1-7, 9-15 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA ), first paragraph, as failing to comply with the enablement requirement.  The claim(s) contains subject matter which was not described in the specification in such a way as to enable one skilled in the art to which it pertains, or with which it is most nearly connected, to make and/or use the invention.
Independent claim 1 recites “wherein the first feature is a similarity of relative positions of the plurality of the first labels in the table”, which was not described in the specification in such a way as to enable one skilled in the art to which it pertains, or with which it is most nearly connected, to make and/or use the invention. Applicant’s specification at paragraph [0025] discloses: “In one example embodiment, a similar of relative positions of two labels can be used as a feature of the training set. For example, if two labels both appears on row 33 of two different documents, it is likely that they are associated”. The specification does not disclose how the relative positions of two labels in the same table can be used as a feature of the training set, as recited in claim 1.
  

The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 1-7, 9-15 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.
Independent claim 1 recites “the first feature is a similarity of relative positions of the plurality of first labels in the table”, which renders the claim indefinite because it is unclear how the similarity of relative positions of the plurality of the first labels in the table can be determined.



Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1-2, 5-7, 9-15 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Pai et al. (US 2018/0285982 A1), hereinafter “Pai”.

As per claim 1, Pai teaches a method comprising:
“receiving a training set of documents, each document including a plurality of first labels associated with first data points” at [0046]-[0047], [0053];
(Pai teaches receiving account mapping data 202, which includes a plurality of labels associated with accounting data)
“wherein each document displays the plurality of first labels and first data points in a table” at [0075]-[0080] and Fig. 4A;
(Pai teaches at Fig. 4A that the document displays the plurality of first labels such as “Accumulated Depreciations”, “Office Equipment” and data points such as “846.12”, “2,500.00” in a table 400)
“receiving a map demonstrating associations between respective first labels of the documents” at [0048];
(Pai teaches the account mapping data 202 includes series and code association 212. The series and code associations include data that indicates how a value associated with account name, type, and/or sub-type has been used in (i.e., mapped to) a field, user interface element, line, or box within a destination)
“extracting a first feature of each document in the training set, wherein the first feature is a similarity of relative positions of the plurality of first labels in the table” at [0029]-[0035], [0047]-[0053];
(Pai teaches the account mapping data 202 includes series and code association 212. The series and code associations include any data that indicates how a value associated with an account name, type, and/or sub-type has been used in (i.e., mapped to) a field, user interface element, line, or box (i.e., “relative position”) within a destination, such as a tax form. The feature extraction component 206 generates numerical feature vectors from the contents of the account mapping data 202)
“training a classification model using the training set of documents, the map and the first feature” at [0051]-[0052];
(Pai teaches the text classifier 208 includes any classification algorithm that may train the account mapping model 116 based on output of the feature extraction component 206. The text classifier 208 classifies labels from the labeled data 210 based on the output of the feature extraction component 206. The account mapping model 116 is built using any suitable machine learning library)
“receiving a second set of documents, each document including a plurality of second labels associated with second data points” at [0055]-[0056] and Fig. 3A;
(Pai teaches receiving an account name from a client; the account name includes a text label for an account. The account name may be from a chart of accounts or a trial balance of a business entity)
“extracting a second feature of each document in the second set” at [0050]-[0051];
(Pai teaches the feature extraction component 206 generates the feature using a bag of words model or term frequency-inverse document frequency. The feature extraction component 206 outputs, for each word found within the account names of the account mapping data 202, an index value that corresponds to the frequency of the word in the account mapping data 202)
“providing the second set of documents and the second feature to the classification model” at [0055]-[0058];
(Pai teaches the account mapping is generated by applying the account mapping model to the account name. Application of the account mapping model may include any operation that results in the account mapping model outputting one or more of an account type, an account sub-type, a code, a series and a confidence value. In particular, the account type may be a top or first level classification of the account name received at step 304, the series may identify a destination for a value associated with the account name, such as a form that will receive the value associated with the account, and the code may identify a specific location within the destination, such as a field or a box)
“receiving a prediction score from the classification model, the prediction score indicating a likelihood of the two second labels being associated with each other” at [0034], [0058].
(Pai teaches a confidence value that quantifies the expected accuracy or truth of the mapping. The confidence value may be any value representative of the accuracy of the mapping)
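For illustration only, and not as a characterization of Pai's actual implementation, the bag of words and term frequency-inverse document frequency features discussed above can be sketched as follows. The function names (tokenize, build_vocabulary, bag_of_words, tf_idf) and the sample label “Office Supplies” are hypothetical and do not appear in Pai; the sketch assumes labels are split on whitespace and uses an unsmoothed logarithmic weighting.

```python
import math
from collections import Counter


def tokenize(label):
    """Lowercase a label and split it on whitespace (assumed tokenization)."""
    return label.lower().split()


def build_vocabulary(labels):
    """Assign each distinct word a column index, sorted for determinism."""
    words = sorted({w for label in labels for w in tokenize(label)})
    return {w: i for i, w in enumerate(words)}


def bag_of_words(label, vocab):
    """Count how often each vocabulary word occurs in a single label."""
    vec = [0] * len(vocab)
    for word, count in Counter(tokenize(label)).items():
        if word in vocab:
            vec[vocab[word]] = count
    return vec


def tf_idf(labels):
    """Weight raw counts by log(N / document frequency) for each word."""
    vocab = build_vocabulary(labels)
    counts = [bag_of_words(label, vocab) for label in labels]
    n_docs = len(labels)
    # Every vocabulary word occurs in at least one label, so df is never zero.
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n_docs / d) for d in df]
    vectors = [[c * w for c, w in zip(row, idf)] for row in counts]
    return vocab, vectors


# Hypothetical account names; only the first two appear in Pai's Fig. 4A.
labels = ["Accumulated Depreciation", "Office Equipment", "Office Supplies"]
vocab, vectors = tf_idf(labels)
```

In this sketch a word shared by several labels (here “office”) receives a lower weight than a word unique to one label (here “equipment”), which is the usual motivation for the inverse document frequency term.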

As per claim 2, Pai teaches the method of claim 1, further comprising “generating a map based on the prediction score” at [0058]-[0061].

As per claim 15, Pai teaches the method of claim 1, further comprising “extracting a second feature of each document in the training set” at [0047]-[0053].

As per claim 5, Pai teaches the method of claim 15, wherein “the second feature is cosine similarity, context similarity, bi-gram similarity and/or tri-gram similarity” at [0047]-[0051].

As per claim 6, Pai teaches the method of claim 15, wherein “the second feature is a hierarchy of a label” at [0029]-[0035], [0047]-[0051].

As per claim 7, Pai teaches the method of claim 15, wherein “the second feature is recap data” at [0029]-[0035], [0047]-[0051].

As per claim 9, Pai teaches the method of claim 15, wherein “the second feature is a similarity of labels appearing before or after a label” at [0047]-[0051].

As per claim 10, Pai teaches the method of claim 15, wherein “the second feature is a number of words in a label” at [0048]-[0050].

As per claim 11, Pai teaches the method of claim 1, wherein “training the classification model includes a stacking technique” at [0040]-[0043], [0051]-[0061].

As per claim 12, Pai teaches the method of claim 11, wherein “the stacking technique combines multiple classification or regression models via a meta-classifier or a meta-regressor” at [0040]-[0043], [0051]-[0061].

As per claim 13, Pai teaches the method of claim 11, wherein “base level models are trained based on a complete training set” at [0051]-[0061].

As per claim 14, Pai teaches the method of claim 12, wherein “the meta-classifier or the meta-regressor is trained on output of base level models as features” at [0040]-[0043], [0051]-[0061].
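For illustration only, the stacking arrangement recited in claims 11-14 (base level models trained on a complete training set, with a meta-classifier trained on the output of the base level models as features) can be sketched as follows. This is a minimal sketch under stated assumptions, not the method of Pai or of Applicant: the classes Stump and TableMeta, the functions fit_stack and predict_stack, and the sample data are all hypothetical, and the base models are deliberately simple single-feature threshold classifiers.

```python
from collections import Counter, defaultdict


class Stump:
    """Hypothetical base-level model: thresholds one feature at the
    midpoint between the two class means."""

    def __init__(self, feature_index):
        self.j = feature_index
        self.threshold = 0.0
        self.high_label = 1

    def fit(self, X, y):
        pos = [x[self.j] for x, t in zip(X, y) if t == 1]
        neg = [x[self.j] for x, t in zip(X, y) if t == 0]
        mean_pos = sum(pos) / len(pos)
        mean_neg = sum(neg) / len(neg)
        self.threshold = (mean_pos + mean_neg) / 2
        # Label assigned to values above the threshold.
        self.high_label = 1 if mean_pos >= mean_neg else 0
        return self

    def predict(self, x):
        above = x[self.j] > self.threshold
        return self.high_label if above else 1 - self.high_label


class TableMeta:
    """Hypothetical meta-classifier: stores the majority label observed
    for each tuple of base-model outputs."""

    def fit(self, Z, y):
        votes = defaultdict(Counter)
        for z, t in zip(Z, y):
            votes[tuple(z)][t] += 1
        self.table = {z: c.most_common(1)[0][0] for z, c in votes.items()}
        self.default = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, z):
        return self.table.get(tuple(z), self.default)


def fit_stack(X, y):
    # Base-level models are each trained on the complete training set.
    bases = [Stump(j).fit(X, y) for j in range(len(X[0]))]
    # Their outputs become the feature vectors for the meta-classifier.
    Z = [[b.predict(x) for b in bases] for x in X]
    meta = TableMeta().fit(Z, y)
    return bases, meta


def predict_stack(bases, meta, x):
    return meta.predict([b.predict(x) for b in bases])


# Hypothetical two-feature training data with binary labels.
X = [[0.1, 0.9], [0.2, 0.8], [0.9, 0.2], [0.8, 0.1]]
y = [0, 0, 1, 1]
bases, meta = fit_stack(X, y)
```

Many stacking implementations instead train the meta-classifier on out-of-fold predictions to limit overfitting; the sketch follows the claim language, in which the base level models are trained on the complete training set.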




Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 3-4 are rejected under 35 U.S.C. 103 as being unpatentable over Pai as applied to claims 1-2, 5-7, 9-15 above, and in view of Martins (US 2012/0310868 A1), hereinafter “Martins”.

As per claim 3, Pai teaches the method of claim 1 discussed above. Pai does not explicitly teach: “converting the first set of documents to another format, the other format being an XML format” as claimed. However, Martins teaches a method for extracting and managing information contained in electronic documents including the steps of converting a set of training documents from PDF to XML format before inputting them into a machine learning process at [0056]-[0057]. Thus, it would have been obvious to one of ordinary skill in the art to combine Martins with Pai’s teaching because “the advantage of utilizing XML format for labeling samples is that any text editor allows the user to perform this task” and “generating an XML document from the contents of the original document preserving… features related to the appearance and formatting of the text (e.g., font size, typeface, and style, alignment, etc.). These feature may be easily represented by special XML processing instructions or XML entities incorporated into the text”, as suggested by Martins at [0056]-[0057].

As per claim 4, Pai teaches the method of claim 1 discussed above. Pai does not explicitly teach: “converting the second set of documents to another format, the other format being an XML format” as claimed. However, Martins teaches a method for extracting and managing information contained in electronic documents including the steps of converting a set of documents from PDF to XML format before inputting them into a machine learning process at [0056]-[0057]. Thus, it would have been obvious to one of ordinary skill in the art to combine Martins with Pai’s teaching because “the advantage of utilizing XML format for labeling samples is that any text editor allows the user to perform this task” and “generating an XML document from the contents of the original document preserving… features related to the appearance and formatting of the text (e.g., font size, typeface, and style, alignment, etc.). These feature may be easily represented by special XML processing instructions or XML entities incorporated into the text”, as suggested by Martins at [0056]-[0057].
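For illustration only, generating an XML document that preserves appearance and formatting features of extracted text can be sketched as follows. The sketch assumes the text has already been extracted from the PDF into spans; the function name spans_to_xml, the element names, and the attribute names are hypothetical. It records formatting as XML attributes for brevity, rather than as the processing instructions or entities mentioned in the passage of Martins quoted above.

```python
import xml.etree.ElementTree as ET


def spans_to_xml(spans):
    """Serialize already-extracted text spans to an XML string, keeping
    formatting features as attributes (an assumed representation)."""
    root = ET.Element("document")
    for span in spans:
        node = ET.SubElement(
            root,
            "text",
            {
                "font-size": str(span["font_size"]),
                "style": span["style"],
                "align": span["align"],
            },
        )
        node.text = span["text"]
    return ET.tostring(root, encoding="unicode")


# Hypothetical span; PDF parsing itself is outside the scope of this sketch.
xml_text = spans_to_xml(
    [{"text": "Office Equipment", "font_size": 10, "style": "bold", "align": "left"}]
)
```

The resulting string can be opened in any text editor, consistent with the labeling advantage that Martins attributes to the XML format.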




Response to Arguments
Applicant's arguments filed 8/03/2022 have been fully considered but they are not persuasive. The examiner respectfully traverses Applicant’s arguments.
Regarding claim 1, Applicant argued that Pai does not teach “each document displays the plurality of first labels and the first data points in a table”. On the contrary, Pai teaches at [0021] that a document of the training set of documents includes a trial balance report, which displays a plurality of labels (e.g., “Accumulated Depreciation”, “Office Equipment”) and data points (e.g., “846.12”, “2,500.00”), as shown in Fig. 4A reproduced below.

[Image: Fig. 4A of Pai (media_image1.png, greyscale)]

Applicant further argued that Pai does not teach “the first feature is a similarity of relative positions of the plurality of first labels in the table”. On the contrary, Pai teaches at [0029]-[0035], [0047]-[0053] that the account mapping data 202 includes series and code association 212. The series and code associations include any data that indicates how a value associated with an account name, type, and/or sub-type has been used in (i.e., mapped to) a field, user interface element, line, or box (i.e., “relative position”) within a destination, such as a tax form. The feature extraction component 206 then generates numerical feature vectors from the contents of the account mapping data 202.

In light of the foregoing arguments, the 35 U.S.C. 102 and 103 rejections are hereby sustained.

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 

In the case of amending the claimed invention, Applicant is respectfully requested to indicate the portion(s) of the specification which dictate(s) the structure relied on for proper interpretation and also to verify and ascertain the metes and bounds of the claimed invention.
	
Any inquiry concerning this communication or earlier communications from the examiner should be directed to KHANH B PHAM whose telephone number is (571)272-4116. The examiner can normally be reached Monday - Friday, 8am to 4pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Mark Featherstone, can be reached at (571)270-3750. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/KHANH B PHAM/Primary Examiner, Art Unit 2166

September 1, 2022