Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant’s arguments with respect to claims 1, 10, 11, 15, 16, and 20 have been considered but are moot because the new ground of rejection does not rely on the Vijayaraghavan reference applied in the prior rejection of record for any teaching or matter specifically challenged in the arguments.
The nonstatutory double patenting rejections of claims 1-20 are maintained because applicant presented no arguments regarding them.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1, 10, 11, 15, 16, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Penn (US 10,417,488) in view of Kumar (US 10,489,682).
	As to claim 1, Penn discloses a method of extracting data from an image comprising:
	receiving, from an optical character recognition (OCR) system, OCR text in response to sending an image to the OCR system, wherein the OCR text comprises a plurality of lines of text (col. 5, lines 5-37);
 	classifying each line of text as either a line item or not a line item, wherein a line item is an individual expense item in a list of expense items and not a line item is not an individual expense item in the list of expense items (col. 5, lines 47-57, col. 6, lines 30-60, col. 12, lines 45-49); and
	extracting a plurality of data fields from each line of text classified as a line item (col. 6, lines 9-67).
	Although Penn teaches using a machine learning environment (col. 14, lines 3, 22), Penn is silent on using a machine learning algorithm to classify each line of text as either a line item or not a line item.
	Kumar teaches, for each line of text in the plurality of lines of text, providing a machine learning model input based on the line of text, the machine learning model configured to generate output based on the line of text, the output classifying the line of text as either a line item or not a line item, wherein a line item is an individual expense item in a list of expense items and not a line item is not an individual expense item in the list of expense items (col. 4, lines 9-12, 21-24, 28-29, 46-58).
It would have been obvious to one of ordinary skill in the art to use a machine learning algorithm, as taught by Kumar, in the method of Penn in order to quickly classify a line item on a receipt or an invoice.
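For illustration only, the three steps recited in claim 1 (receive OCR lines, classify each line as a line item or not, extract fields from the line items) can be sketched as below. This is a minimal sketch and not the method of Penn, Kumar, or the applicant: `stand_in_model` is a hypothetical rule-based placeholder where a trained machine learning model would sit, and the regular expressions, field names, and sample receipt text are assumptions.

```python
import re
from typing import Callable, Dict, List

# Assumed field patterns; real receipts vary widely.
AMOUNT_RE = re.compile(r"\$?(\d+\.\d{2})\b")
DATE_RE = re.compile(r"\b(\d{1,2}/\d{1,2}/\d{2,4})\b")


def stand_in_model(line: str) -> bool:
    """Hypothetical placeholder for a trained classifier.

    Treats a line as a line item when it has both a date and an amount.
    """
    return bool(DATE_RE.search(line)) and bool(AMOUNT_RE.search(line))


def extract_fields(line: str) -> Dict[str, str]:
    """Pull a date, an amount, and a description out of one line."""
    date = DATE_RE.search(line)
    amount = AMOUNT_RE.search(line)
    # Whatever remains after removing the date and amount is the description.
    description = AMOUNT_RE.sub("", DATE_RE.sub("", line)).strip()
    return {
        "date": date.group(1) if date else "",
        "amount": amount.group(1) if amount else "",
        "description": description,
    }


def extract_line_items(
    ocr_text: str, model: Callable[[str], bool] = stand_in_model
) -> List[Dict[str, str]]:
    """Classify every OCR line, then extract fields only from line items."""
    items = []
    for line in ocr_text.splitlines():
        if line.strip() and model(line):
            items.append(extract_fields(line))
    return items


# Assumed sample OCR output for a hotel folio.
SAMPLE = """GRAND HOTEL
03/14/2019 Room Charge $189.00
03/14/2019 Valet Parking $25.00
Thank you for staying with us"""

items = extract_line_items(SAMPLE)
```

Passing a different callable as `model` swaps the classifier without touching the extraction step, which mirrors the separation between the classifying and extracting limitations.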
As to claim 10, the combination of Penn and Kumar discloses the method of claim 1 wherein the line item comprises one or more of a room charge, a valet parking charge, a room service charge, an entertainment charge, a tax fee, or a resort fee, and wherein the extracted plurality of data fields comprises one or more of a date, an amount, a description, or an expense type (Penn, col. 3, lines 19-55; Kumar, Fig. 2, items 209, 210, col. 4, lines 57-59).
As to claims 11, 15, 16, and 20, these claims recite features similar to those recited in claims 1 and 10. Therefore, they are rejected for reasons similar to those discussed above.
Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.
Claims 1-20 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-3, 5-11, 13-16, and 18-20 of U.S. Patent No. 10,824,854. Although the claims at issue are not identical, they are not patentably distinct from each other because the present claims are either anticipated by or obvious variants of the patent claims. In particular, the following table shows the corresponding limitations between the present claims and the patent claims.

Present Application Claims (listed first), followed by the corresponding Patent Claims of U.S. Patent No. 10,824,854 (listed second)
1. A method of extracting data from an image comprising: receiving, from an optical character recognition (OCR) system, OCR text in response to sending an image to the OCR system, wherein the OCR text comprises a plurality of lines of text; for each line of text in the plurality of lines of text, providing a machine learning model input based on the line of text, the machine learning model configured to generate output based on the line of text, the output classifying the line of text as either a line item or not a line item, wherein a line item is an individual expense item in a list of expense items and not a line item is not an individual expense item in the list of expense items; and extracting a plurality of data fields from each line of text classified as a line item. 

2. The method of claim 1 wherein classifying comprises: processing a term frequency-inverse document frequency (tf-idf) of a plurality of tokens from the plurality of lines of text using the machine learning algorithm to produce an output indicating whether each of the plurality of lines of text is a line item or not a line item. 



3. The method of claim 2 wherein classifying further comprises: normalizing numbers in each line of text to a same value; tokenizing each line of text to produce the plurality of tokens; and determining the tf-idf for the plurality of tokens from each line of text. 







4. The method of claim 3 wherein a classification software component performs said classifying step, including said normalizing numbers step, said normalizing number step occurring as the lines of text as the lines of text are processed, and wherein numbers in the lines of text are not normalized when input to said extracting step. 

5. The method of claim 2 wherein the tf-idf of each token in each line of text comprises a plurality of parameters comprising: a total number of lines of text, n, from a corpus of lines of text used to train the classification model; a term frequency specifying a number of times the term, t, shows up in a document, d; and a document frequency specifying a number of documents, d, that contain the term t, wherein documents are individual lines of text from the OCR text and terms are tokens. 

6. The method of claim 5 further comprising training the classification model using a corpus of lines of text, wherein each line of text in the corpus of lines of text is associated with an indicator specifying that a line of text is a line item or is not a line item, said training comprising: normalizing numbers in each line of text in the corpus to a same value; tokenizing each line of text in the corpus to produce a plurality of training tokens; determining a term frequency-inverse document frequency (tf-idf) of the plurality of tokens from each line of text in the corpus; and processing the tf-idf of the plurality of training tokens from each line of text in the corpus using a classification model to produce said trained classification model. 


7. The method of claim 1 further comprising: determining if adjacent lines of text comprise hanging text, wherein hanging text is a plurality of adjacent lines of text that form a single line item, including: determining a center line of text; and sequentially processing each line of text, starting at a line of text adjacent to the center line of text, until a header line of text and a footer line of text is found, wherein said determining if the adjacent lines of text comprise hanging text comprises sequentially processing the lines of text between the header line of text and footer line of text; appending the adjacent lines of text; and classifying the appended adjacent lines of text as either a line item or not a line item using the machine learning algorithm. 



8. The method of claim 7 wherein sequentially processing each line of text, starting at a line of text adjacent to the center line of text, until a header line of text is found comprises: moving up one line of text; classifying the current line of text using a second classification model to determine if the current line of text is a header line of text or not a header line of text; repeating said moving up one line of text and said classifying the current line of text until said classifying determines that the current line of text is a header line of text. 

9. The method of claim 7 wherein sequentially processing each line of text, starting at a line of text adjacent to the center line of text, until a header line of text and a footer line of text is found comprises: moving down one line of text; classifying the current line of text using a third classification model to determine if the current line of text is a footer line of text or not a footer line of text; repeating said moving down one line of text and said classifying the current line of text until said classifying determines that the current line of text is a footer line of text. 

10. The method of claim 1 wherein the line item comprises one or more of a room charge, a valet parking charge, a room service charge, an entertainment charge, a tax fee, or a resort fee, and wherein the extracted plurality of data fields comprises one or more of a date, an amount, a description, or an expense type. 

11. A computer system comprising: one or more processors; and non-transitory machine-readable medium coupled to the one or more processors, the non-transitory machine-readable medium storing a program executable by at least one of the processors, the program comprising sets of instructions for: receiving, from an optical character recognition (OCR) system, OCR text in response to sending an image to the OCR system, wherein the OCR text comprises a plurality of lines of text; for each line of text in the plurality of lines of text, providing a machine learning model input based on the line of text, the machine learning model configured to generate output based on the line of text, the output classifying the line of text as either a line item or not a line item, wherein a line item is an individual expense item in a list of expense items and not a line item is not an individual expense item in the list of expense items; and extracting a plurality of data fields from each line of text classified as a line item. 

12. The computer system of claim 11 wherein classifying comprises: processing a term frequency-inverse document frequency (tf-idf) of a plurality of tokens from the plurality of lines of text using the machine learning algorithm to produce an output indicating whether each of the plurality of lines of text is a line item or not a line item. 



13. The computer system of claim 12 wherein the tf-idf of each token in each line of text comprises a plurality of parameters comprising: a total number of lines of text, n, from a corpus of lines of text used to train the classification model; a term frequency specifying a number of times the term, t, shows up in a document, d; and a document frequency specifying a number of documents, d, that contain the term t, wherein documents are individual lines of text from the OCR text and terms are tokens. 


14. The computer system of claim 11 the program further comprising sets of instructions for: determining if adjacent lines of text comprise hanging text, wherein hanging text is a plurality of adjacent lines of text that form a single line item, including: determining a center line of text; and sequentially processing each line of text, starting at a line of text adjacent to the center line of text, until a header line of text and a footer line of text is found, wherein said determining if the adjacent lines of text comprise hanging text comprises sequentially processing the lines of text between the header line of text and footer line of text; appending the adjacent lines of text; and classifying the appended adjacent lines of text as either a line item or not a line item using the machine learning algorithm. 




15. The computer system of claim 11 wherein the line item comprises one or more of a room charge, a valet parking charge, a room service charge, an entertainment charge, a tax fee, or a resort fee, and wherein the extracted plurality of data fields comprises one or more of a date, an amount, a description, or an expense type. 

16. A non-transitory machine-readable medium storing a program executable by at least one processing unit of a computer, the program comprising sets of instructions for: receiving, from an optical character recognition (OCR) system, OCR text in response to sending an image to the OCR system, wherein the OCR text comprises a plurality of lines of text; for each line of text in the plurality of lines of text, providing a machine learning model input based on the line of text, the machine learning model configured to generate output based on the line of text, the output classifying the line of text as either a line item or not a line item, wherein a line item is an individual expense item in a list of expense items and not a line item is not an individual expense item in the list of expense items; and extracting a plurality of data fields from each line of text classified as a line item. 

17. The non-transitory machine-readable medium of claim 16 wherein classifying comprises: processing a term frequency-inverse document frequency (tf-idf) of a plurality of tokens from the plurality of lines of text using the machine learning algorithm to produce an output indicating whether each of the plurality of lines of text is a line item or not a line item. 


18. The non-transitory machine-readable medium of claim 17 wherein the tf-idf of each token in each line of text comprises a plurality of parameters comprising: a total number of lines of text, n, from a corpus of lines of text used to train the classification model; a term frequency specifying a number of times the term, t, shows up in a document, d; and a document frequency specifying a number of documents, d, that contain the term t, wherein documents are individual lines of text from the OCR text and terms are tokens. 


19. The non-transitory machine-readable medium of claim 16, the program further comprising sets of instructions for: determining if adjacent lines of text comprise hanging text, wherein hanging text is a plurality of adjacent lines of text that form a single line item: including: determining a center line of text; and sequentially processing each line of text, starting at a line of text adjacent to the center line of text, until a header line of text and a footer line of text is found, wherein said determining if the adjacent lines of text comprise hanging text comprises sequentially processing the lines of text between the header line of text and footer line of text; appending the adjacent lines of text; and classifying the appended adjacent lines of text as either a line item or not a line item using the machine learning algorithm. 



20. The non-transitory machine-readable medium of claim 16 wherein the line item comprises one or more of a room charge, a valet parking charge, a room service charge, an entertainment charge, a tax fee, or a resort fee, and wherein the extracted plurality of data fields comprises one or more of a date, an amount, a description, or an expense type.
1. A method of extracting data from an image comprising: receiving, from an optical character recognition (OCR) system, OCR text in response to sending an image to the OCR system, wherein the OCR text comprises a plurality of lines of text; classifying each line of text as either a line item or not a line item using a machine learning algorithm, comprising: processing a term frequency-inverse document frequency (tf-idf) of a plurality of tokens from the plurality of lines of text using a trained classification model to produce an output indicating whether each of the plurality of lines of text is a line item or not a line item, wherein a line item is an individual expense item in a list of expense items; and extracting a plurality of data fields from each line of text classified as a line item. 

 See claim 1;  processing a term frequency-inverse document frequency (tf-idf) of a plurality of tokens from the plurality of lines of text using a trained classification model to produce an output indicating whether each of the plurality of lines of text is a line item or not a line item, wherein a line item is an individual expense item in a list of expense items; and extracting a plurality of data fields from each line of text classified as a line item

2. The method of claim 1 wherein classifying further comprises: normalizing numbers in each line of text to a same value; tokenizing each line of text to produce the plurality of tokens; determining the tf-idf of the plurality of tokens from each line of text, wherein the output indicating whether each of the plurality of lines of text is a line item or not a line item includes a first value corresponding to the line of text being a line item or a second value corresponding to the line of text being not a line item. 

    3. The method of claim 2 wherein a classification software component performs said classifying step, including said normalizing numbers step, said normalizing number step occurring as the lines of text as the lines of text are processed, and wherein numbers in the lines of text are not normalized when input to said extracting step. 

5. The method of claim 1 wherein the tf-idf of each token in each line of text comprises a plurality of parameters comprising: a total number of lines of text, n, from a corpus of lines of text used to train the classification model; a term frequency specifying a number of times the term, t, shows up in a document, d; and a document frequency specifying a number of documents, d, that contain the term t, wherein documents are individual lines of text from the OCR text and terms are tokens. 

    6. The method of claim 5 further comprising training the classification model using a corpus of lines of text, wherein each line of text in the corpus of lines of text is associated with an indicator specifying that a line of text is a line item or is not a line item, said training comprising: normalizing numbers in each line of text in the corpus to a same value; tokenizing each line of text in the corpus to produce a plurality of training tokens; determining a term frequency-inverse document frequency (tf-idf) of the plurality of tokens from each line of text in the corpus; and processing the tf-idf of the plurality of training tokens from each line of text in the corpus using a classification model to produce said trained classification model. 

(see claim 8 including claim 7)   
 7. The method of claim 1 further comprising: determining if adjacent lines of text comprise hanging text, wherein hanging text is a plurality of adjacent lines of text that form a single line item; appending the adjacent lines of text; and classifying the appended adjacent lines of text as either a line item or not a line item using the machine learning algorithm. 

    8. The method of claim 7 wherein determining if adjacent lines of text comprise hanging text comprises: determining a center line of text; and sequentially processing each line of text, starting at a line of text adjacent to the center line of text, until a header line of text and a footer line of text is found, wherein said determining if the adjacent lines of text comprise hanging text comprises sequentially processing the lines of text between the header line of text and footer line of text. 

    9. The method of claim 8 wherein sequentially processing each line of text, starting at a line of text adjacent to the center line of text, until a header line of text is found comprises: moving up one line of text; classifying the current line of text using a second classification model to determine if the current line of text is a header line of text or not a header line of text; repeating said moving up one line of text and said classifying the current line of text until said classifying determines that the current line of text is a header line of text. 

    10. The method of claim 8 wherein sequentially processing each line of text, starting at a line of text adjacent to the center line of text, until a header line of text and a footer line of text is found comprises: moving down one line of text; classifying the current line of text using a third classification model to determine if the current line of text is a footer line of text or not a footer line of text; repeating said moving down one line of text and said classifying the current line of text until said classifying determines that the current line of text is a footer line of text. 

 See claim 1; wherein a line item is an individual expense item in a list of expense items; and extracting a plurality of data fields from each line of text classified as a line item.




   11. A computer system comprising: one or more processors; and non-transitory machine-readable medium coupled to the one or more processors, the non-transitory machine-readable medium storing a program executable by at least one of the processors, the program comprising sets of instructions for: receiving, from an optical character recognition (OCR) system, OCR text in response to sending an image to the OCR system, wherein the OCR text comprises a plurality of lines of text; classifying each line of text as either a line item or not a line item using a machine learning algorithm, comprising: processing a term frequency-inverse document frequency (tf-idf) of a plurality of tokens from the plurality of lines of text using a trained classification model to produce an output indicating whether each of the plurality of lines of text is a line item or not a line item, wherein a line item is an individual expense item in a list of expense items; and extracting a plurality of data fields from each line of text classified as a line item. 

See claim 11; processing a term frequency-inverse document frequency (tf-idf) of a plurality of tokens from the plurality of lines of text using a trained classification model to produce an output indicating whether each of the plurality of lines of text is a line item or not a line item, wherein a line item is an individual expense item in a list of expense items; and extracting a plurality of data fields from each line of text classified as a line item.

    13. The computer system of claim 11 wherein the tf-idf of each token in each line of text comprises a plurality of parameters comprising: a total number of lines of text, n, from a corpus of lines of text used to train the classification model; a term frequency specifying a number of times the term, t, shows up in a document, d; and a document frequency specifying a number of documents, d, that contain the term t, wherein documents are individual lines of text from the OCR text and terms are tokens. 

  (See claim 15 including claim 14)
  14. The computer system of claim 11 the program further comprising sets of instructions for: determining if adjacent lines of text comprise hanging text, wherein hanging text is a plurality of adjacent lines of text that form a single line item; appending the adjacent lines of text; and classifying the appended adjacent lines of text as either a line item or not a line item using the machine learning algorithm. 
    15. The computer system of claim 14 wherein determining if adjacent lines of text comprise hanging text comprises: determining a center line of text; and sequentially processing each line of text, starting at a line of text adjacent to the center line of text, until a header line of text and a footer line of text is found, wherein said determining if the adjacent lines of text comprise hanging text comprises sequentially processing the lines of text between the header line of text and footer line of text. 

   See claim 11; wherein a line item is an individual expense item in a list of expense items; and extracting a plurality of data fields from each line of text classified as a line item.





 16. A non-transitory machine-readable medium storing a program executable by at least one processing unit of a computer, the program comprising sets of instructions for: receiving, from an optical character recognition (OCR) system, OCR text in response to sending an image to the OCR system, wherein the OCR text comprises a plurality of lines of text; classifying each line of text as either a line item or not a line item using a machine learning algorithm, comprising: processing a term frequency-inverse document frequency (tf-idf) of a plurality of tokens from the plurality of lines of text using a trained classification model to produce an output indicating whether each of the plurality of lines of text is a line item or not a line item, wherein a line item is an individual expense item in a list of expense items; and extracting a plurality of data fields from each line of text classified as a line item. 

    See claim 16; processing a term frequency-inverse document frequency (tf-idf) of a plurality of tokens from the plurality of lines of text using a trained classification model to produce an output indicating whether each of the plurality of lines of text is a line item or not a line item, wherein a line item is an individual expense item in a list of expense items; and extracting a plurality of data fields from each line of text classified as a line item.

  18. The non-transitory machine-readable medium of claim 16 wherein the tf-idf of each token in each line of text comprises a plurality of parameters comprising: a total number of lines of text, n, from a corpus of lines of text used to train the classification model; a term frequency specifying a number of times the term, t, shows up in a document, d; and a document frequency specifying a number of documents, d, that contain the term t, wherein documents are individual lines of text from the OCR text and terms are tokens. 

 (See claim 20 including claim 19)
  19. The non-transitory machine-readable medium of claim 16, the program further comprising sets of instructions for: determining if adjacent lines of text comprise hanging text, wherein hanging text is a plurality of adjacent lines of text that form a single line item; appending the adjacent lines of text; and classifying the appended adjacent lines of text as either a line item or not a line item using the machine learning algorithm. 

    20. The non-transitory machine-readable medium of claim 19 wherein determining if adjacent lines of text comprise hanging text comprises: determining a center line of text; and sequentially processing each line of text, starting at a line of text adjacent to the center line of text, until a header line of text and a footer line of text is found, wherein said determining if the adjacent lines of text comprise hanging text comprises sequentially processing the lines of text between the header line of text and footer line of text.
See claim 16; wherein a line item is an individual expense item in a list of expense items; and extracting a plurality of data fields from each line of text classified as a line item.
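
For illustration only, the number normalization, tokenization, and tf-idf limitations compared in the table above (present claims 3 and 5; patent claims 2 and 5) can be sketched as below. The claims name the quantities involved (the total number of lines n, a term frequency, and a document frequency, with each line of text treated as a document) but give no formula, so the smoothed idf used here is an assumption, as are the function names, the tokenizer pattern, and the sample corpus. The sketch stops at the tf-idf weights and does not include the trained classification model that would consume them.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def normalize_numbers(line: str) -> str:
    """Map every run of digits to the same value ("0")."""
    return re.sub(r"\d+", "0", line)


def tokenize(line: str) -> List[str]:
    """Lowercase, normalize numbers, then split into word and number tokens."""
    text = normalize_numbers(line).lower()
    return re.findall(r"[a-z]+|\d(?:[./]?\d)*", text)


def document_frequency(corpus: List[str]) -> Counter:
    """Count, for each token, how many lines (documents) contain it."""
    df: Counter = Counter()
    for line in corpus:
        df.update(set(tokenize(line)))
    return df


def tfidf(line: str, df: Counter, n: int) -> Dict[str, float]:
    """tf-idf weight per token, using an assumed smoothed idf:
    idf(t) = ln((1 + n) / (1 + df(t))) + 1
    """
    tf = Counter(tokenize(line))
    return {
        term: count * (math.log((1 + n) / (1 + df.get(term, 0))) + 1)
        for term, count in tf.items()
    }


# Assumed training corpus: each line of text is one document.
corpus = [
    "03/14/2019 Room Charge $189.00",
    "03/15/2019 Valet Parking $25.00",
    "Thank you",
]
df = document_frequency(corpus)
weights = tfidf(corpus[0], df, n=len(corpus))
```

Because every amount collapses to the same token, that token appears in more lines and receives a lower weight than a rarer word such as "room".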


Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to PHUOC TRAN whose telephone number is (571)272-7399. The examiner can normally be reached 9am-5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Vu Le, can be reached at 571-272-7332. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/PHUOC TRAN/
Primary Examiner, Art Unit 2668