DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 3/16/2022 was filed after the mailing date of 3/16/2022.  The submission is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1-3 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Pandian et al. (US20050289182).
Regarding claim 1, Pandian discloses a method of classifying electronic documents (para. [0014], The Image Collaborator may identify and index such a statement by, for example, identifying the name of the associated bank, the range of dates that the bank statement covers, the account number and other key indexing information. The remainder of the document may be processed through an optical character recognition module to create a digital package which is available for a line of business application) comprising: 
providing one or more electronic documents to be sorted (para. [0187], bank statements from, for example, Bank of America may be efficiently processed together by use of a sorting mechanism) and classified (para. [0081], The image identification module 34, provides the ability to, for example, distinguish a Bank of America bank statement from a Citibank bank statement or from a utility bill), the one or more electronic documents each including data to be extracted (para. [0012], [0014], The system captures data from electronic documents as diverse as facsimile images, scanned images and images from document management systems; The Image Collaborator may identify and index such a statement by, for example, identifying the name of the associated bank, the range of dates that the bank statement covers, the account number and other key indexing information); 
comparing an electronic document from the one or more electronic documents to a template (para. [0141], In accordance with an exemplary implementation, a template library 31 for structured forms processing is utilized by the enhanced image module 35 and OCR module 37 for enabling the modules 35 and 37 to identify structured forms which are input via input image module 33), wherein the template includes one or more objects to be compared to the electronic document and further wherein the template includes parameters that define data to be extracted from the document (para. [0141], In this fashion, a form template may be accessed and compared with an input image to identify that the input image has a structure which is known, for example, to be a Bank of America account statement. Such identification may occur by identifying, for example, a particular logo in an area of an input image by comparison with a template); 
determining a match between the electronic document and the template based on the presence of one or more template objects in the electronic document (para. [0141], Such identification may occur by identifying, for example, a particular logo in an area of an input image by comparison with a template); 
extracting data from the electronic document based on template parameters (56 in fig. 3, para. [0145]-[0146], FIG. 3 is an exemplary flow diagram showing the data extraction process in an exemplary Image Collaborator implementation; In this fashion, the bank statement may be scanned for an account number); and 
associating the extracted data with the electronic document (fig. 31, para. [0605], The left-hand pane is known as the Data pane. It displays the data extracted from the document image as specified by your dictionaries. The document image from which the data was extracted is displayed in the Image pane on the right. The Image pane displays the document image that results from the OCR processing).


Regarding claim 2, Pandian discloses a method wherein the template object includes a graphic image (para. [0141], Such identification may occur by identifying, for example, a particular logo in an area of an input image by comparison with a template).


Regarding claim 3, Pandian discloses a method wherein the template includes a second object and wherein the second object comprises a text (para. [0083], The data extraction module 37, upon recognizing, for example, that the electronic document is a Bank of America bank statement, operates to extract such key information as the account number, statement date).


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 4-6 are rejected under 35 U.S.C. 103 as being unpatentable over Pandian et al. (US20050289182) in view of Ghessassi (US20140281910).
Regarding claim 4, Pandian fails to teach a method wherein the template parameters include an anchor object and a predefined location of a data to be extracted from the electronic document based on the location of the anchor object on the electronic document.
However, Ghessassi teaches a method wherein the template parameters include an anchor object (700 in fig. 7, para. [0023], [0045], The document processor 112 determines that at least a number of the objects (aaa-ddd) in the anchor area 300 are common to the objects aaa-iii and zzz in the reference document 102; The anchor area 300 in FIG. 7 includes the graphical object 700 along with various text objects and number objects) and a predefined location of a data to be extracted from an electronic document (312 in fig. 3) based on the location of the anchor object on the electronic document (para. [0046], If the graphical object 700 in the template document 101 anchor area 300 is substantially the same as the graphical object 700 in the reference document 102 (along with the common text objects and their distances) the process will then be able to retrieve the corresponding field zone 312 in the reference document 102).
Therefore, taking the combined teachings of Pandian and Ghessassi as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the steps of Ghessassi into the method of Pandian. The motivation to combine Ghessassi and Pandian would be to improve overall accuracy (para. [0003], [0030] of Ghessassi).


Regarding claim 5, the modified invention of Pandian teaches a method further comprising determining the location of the anchor object on the electronic document (para. [0046] of Ghessassi, If the graphical object 700 in the template document 101 anchor area 300 is substantially the same as the graphical object 700 in the reference document 102 (along with the common text objects and their distances) the process will then be able to retrieve the corresponding field zone 312 in the reference document 102).


Regarding claim 6, the modified invention of Pandian teaches a method further comprising the step of locating the data to be extracted from the electronic document (312 in fig. 3, 302 in fig. 7 and para. [0020] of Ghessassi, For example, if the template document 101 and the reference document 102 are invoice documents from a company, the anchor area 300 may be an area associated with items included in a total amount on the invoice. The field zone 302 may be an area that contains the total amount for the invoice (e.g., see FIG. 7 as an illustrative example). The total amount of the invoice is identified and retrieved in order to process the invoice) based on the location of the anchor in the electronic document (para. [0025] of Ghessassi, In response to determining that the distances 301A-301C are substantially the same as the distances 311A-311C, the object retriever 113 identifies and retrieves a corresponding target object 312 from a corresponding target zone in the reference document 102 (zzz) based on the distance 301D between the anchor area 300 and the corresponding field zone 312 in the reference document 102).

Claims 7 and 8 are rejected under 35 U.S.C. 103 as being unpatentable over Guilak et al. (US20030225763) in view of Pandian et al. (US20050289182).
Regarding claim 7, Guilak teaches a method of classifying electronic documents (abstract, A self-improving system and method for classifying a plurality of digital documents) comprising: 
training a neural network to determine common features within a document classification (para. [0044], For illustration purposes, the committee machine 109 is described herein as a neural network 302 based classifier. There are essentially two phases in an automatic classification process: a training phase, and a classification phase. During the training phase, training data 304 stored in the training data store 111 is used to develop a list of input features and parameter weights useful in classifying documents relative to specified topics or categories. Typically, the training data 304 consist of a large collection of documents, which have been previously classified), wherein training comprises the steps of: 
analyzing a set of electronic documents within a common classification (para. [0044], During the training phase, training data 304 stored in the training data store 111 is used to develop a list of input features and parameter weights useful in classifying documents relative to specified topics or categories. Typically, the training data 304 consist of a large collection of documents, which have been previously classified); 
determining common features between the set of electronic documents within the common classification (para. [0044], During the training phase, training data 304 stored in the training data store 111 is used to develop a list of input features and parameter weights useful in classifying documents relative to specified topics or categories); 
providing one or more electronic documents to be sorted and classified (para. [0046], During standard operation (i.e., the classification phase), the committee machine 109 evaluates extracted features from documents 102), the one or more electronic documents each including data to be extracted (fig. 2, para. [0025], A feature extraction tool 108 extracts text features and context features from each of the documents retrieved by the retrieval tool 107); 
comparing an electronic document from the one or more electronic documents to the common features within a given classification (para. [0046], During standard operation (i.e., the classification phase), the committee machine 109 evaluates extracted features from documents 102 with the function defined by the parameter weights stored in the parameter weight database 314); and
determining a match between the electronic document and the classification based on similarities between the electronic document and the common features (para. [0046], After the document 102 receives a rating, it can be classified into a category by comparing the document rating to a predetermined or user specified threshold value).
 
Guilak fails to teach extracting data from the electronic document based on parameters associated with the classification and associating the extracted data with the electronic document.
However, Pandian teaches extracting data from an electronic document based on parameters associated with classification (para. [0135], [0145], The application extracts data corresponding to key words from a document image. It allows the user to find key words, verify their consistency, perform word analysis, group related documents and publish the results as index files which are easy to read and understand; FIG. 3 is an exemplary flow diagram showing the data extraction process in an exemplary Image Collaborator implementation) and associating the extracted data with the electronic document (fig. 31, para. [0605], The left-hand pane is known as the Data pane. It displays the data extracted from the document image as specified by your dictionaries. The document image from which the data was extracted is displayed in the Image pane on the right. The Image pane displays the document image that results from the OCR processing).
Therefore, taking the combined teachings of Guilak and Pandian as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the steps of Pandian into the method of Guilak. The motivation to combine Guilak and Pandian would be to efficiently store, retrieve, search, and route electronic documents (para. [0008] of Pandian).


Regarding claim 8, the modified invention of Guilak teaches a method further comprising, in the event of no classification match determined by the neural network, comparing the electronic document from the one or more electronic documents to a template (para. [0141] of Pandian, In accordance with an exemplary implementation, a template library 31 for structured forms processing is utilized by the enhanced image module 35 and OCR module 37 for enabling the modules 35 and 37 to identify structured forms which are input via input image module 33), wherein the template includes one or more objects to be compared to the electronic document and further wherein the template includes parameters that define data to be extracted from the document (para. [0141] of Pandian, a form template may be accessed and compared with an input image to identify that the input image has a structure which is known, for example, to be a Bank of America account statement. Such identification may occur by identifying, for example, a particular logo in an area of an input image by comparison with a template).

Claims 9-11 are rejected under 35 U.S.C. 103 as being unpatentable over Guilak et al. (US20030225763) and Pandian et al. (US20050289182) in view of Ghessassi (US20140281910).
Regarding claim 9, the modified invention of Guilak fails to teach a method wherein the template parameters include an anchor object and a predefined location of a data to be extracted from the electronic document based on the location of the anchor object on the electronic document.
However, Ghessassi teaches a method wherein the template parameters include an anchor object (700 in fig. 7, para. [0023], [0045], The document processor 112 determines that at least a number of the objects (aaa-ddd) in the anchor area 300 are common to the objects aaa-iii and zzz in the reference document 102; The anchor area 300 in FIG. 7 includes the graphical object 700 along with various text objects and number objects) and a predefined location of a data to be extracted from an electronic document (312 in fig. 3) based on the location of the anchor object on the electronic document (para. [0046], If the graphical object 700 in the template document 101 anchor area 300 is substantially the same as the graphical object 700 in the reference document 102 (along with the common text objects and their distances) the process will then be able to retrieve the corresponding field zone 312 in the reference document 102).
Therefore, taking the combined teachings of Guilak and Pandian with Ghessassi as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the steps of Ghessassi into the method of Guilak and Pandian. The motivation to combine Ghessassi, Guilak, and Pandian would be to improve overall accuracy (para. [0003], [0030] of Ghessassi).


Regarding claim 10, the modified invention of Guilak teaches a method further comprising determining the location of the anchor object on the electronic document (para. [0046] of Ghessassi, If the graphical object 700 in the template document 101 anchor area 300 is substantially the same as the graphical object 700 in the reference document 102 (along with the common text objects and their distances) the process will then be able to retrieve the corresponding field zone 312 in the reference document 102).


Regarding claim 11, the modified invention of Guilak teaches a method further comprising the step of locating the data to be extracted from the electronic document (312 in fig. 3, 302 in fig. 7 and para. [0020] of Ghessassi, For example, if the template document 101 and the reference document 102 are invoice documents from a company, the anchor area 300 may be an area associated with items included in a total amount on the invoice. The field zone 302 may be an area that contains the total amount for the invoice (e.g., see FIG. 7 as an illustrative example). The total amount of the invoice is identified and retrieved in order to process the invoice) based on the location of the anchor in the electronic document (para. [0025] of Ghessassi, In response to determining that the distances 301A-301C are substantially the same as the distances 311A-311C, the object retriever 113 identifies and retrieves a corresponding target object 312 from a corresponding target zone in the reference document 102 (zzz) based on the distance 301D between the anchor area 300 and the corresponding field zone 312 in the reference document 102).


Claims 12-14 are rejected under 35 U.S.C. 103 as being unpatentable over Prakash et al. (US7519565) in view of Pandian et al. (US20050289182).
Regarding claim 12, Prakash teaches a method of classifying electronic documents (abstract, Embodiments of the invention provide methods and apparatuses for classifying electronic documents) comprising: 
providing a set of electronic documents in a common classification (col. 4 lines 44-50, For one such embodiment, features are identified by evaluating a set of electronic documents (training set), each of which has been classified (e.g., classified as either spam e-mails or legitimate e-mails)), wherein the classification includes parameters for determining data within each document to be extracted (col. 2 lines 45-47, col. 3 lines 14-18, each of a plurality of electronic communications is reduced to a multidimensional vector (MDV) based on an MDV space; The MDV space is defined by a plurality of features. Features may be of various types including words and or phrases contained within the body or header of the electronic documents); 
determining a unique vector for each document in the set of commonly classified electronic documents (col. 5 lines 32-34, The MDV space containing a number of MDVs, each corresponding to a previously classified electronic communication), wherein the vector for each classified document is determined by assigning numeric values to attributes of the document (col. 3 lines 53-54, The resulting MDV 215 is {01, 12, 13, 24, 05, 16, 07, 08, ...0N}); 
determining a vector value for an unclassified document (col. 5 lines 17-20, A to-be-classified electronic document (subject electronic communication (SEC)) is reduced to an MDV based upon the defined MDV space and compared with the MDVs populating the MDV space), wherein the vector value is determined by assigning numeric values to attributes of the unclassified document (col. 5 lines 17-20, A to-be-classified electronic document (subject electronic communication (SEC)) is reduced to an MDV based upon the defined MDV space and compared with the MDVs populating the MDV space);
comparing the vector value of the unclassified document to vector values of the classified documents (col. 5 lines 17-20, A to-be-classified electronic document (subject electronic communication (SEC)) is reduced to an MDV based upon the defined MDV space and compared with the MDVs populating the MDV space); and
determining the presence of a match between the unclassified document and the classified documents based on a predetermined threshold level (col. 5 lines 21-23, col. 6 lines 15-18, Proximity of the MDV resulting from the SEC (subject MDV) to one or more MDVs of the MDV space leads to a determination that the SEC is of the same category; thresholding may be used to consider only neighbors that are within a threshold distance to the subject MDV).

Prakash fails to teach extracting data from the electronic document based on parameters associated with the classification and associating the extracted data with the electronic document.
However, Pandian teaches extracting data from an electronic document based on parameters associated with classification (para. [0135], [0145], The application extracts data corresponding to key words from a document image. It allows the user to find key words, verify their consistency, perform word analysis, group related documents and publish the results as index files which are easy to read and understand; FIG. 3 is an exemplary flow diagram showing the data extraction process in an exemplary Image Collaborator implementation) and associating the extracted data with the electronic document (fig. 31, para. [0605], The left-hand pane is known as the Data pane. It displays the data extracted from the document image as specified by your dictionaries. The document image from which the data was extracted is displayed in the Image pane on the right. The Image pane displays the document image that results from the OCR processing).
Therefore, taking the combined teachings of Prakash and Pandian as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the steps of Pandian into the method of Prakash. The motivation to combine Prakash and Pandian would be to efficiently store, retrieve, search, and route electronic documents (para. [0008] of Pandian).


Regarding claim 13, the modified invention of Prakash teaches a method wherein the comparison of the unclassified document vector to classified document vector values includes determining a cosine of the unclassified document vector and the classified document vectors (col. 6 lines 7-12 of Prakash, For one embodiment of the invention, a cosine similarity distance metric is used. A cosine similarity distance metric computes the similarity between two MDVs based upon the angle (through the origin) between the two MDVs. That is, the smaller the angle between two MDVs, the more similar the two MDVs are).


Regarding claim 14, the modified invention of Prakash teaches a method further comprising, in the event of no classification match determined by the vector comparison, comparing the electronic document from the one or more electronic documents to a template (para. [0141] of Pandian, In accordance with an exemplary implementation, a template library 31 for structured forms processing is utilized by the enhanced image module 35 and OCR module 37 for enabling the modules 35 and 37 to identify structured forms which are input via input image module 33), wherein the template includes one or more objects to be compared to the electronic document and further wherein the template includes parameters that define data to be extracted from the document (para. [0141] of Pandian, a form template may be accessed and compared with an input image to identify that the input image has a structure which is known, for example, to be a Bank of America account statement. Such identification may occur by identifying, for example, a particular logo in an area of an input image by comparison with a template).


Claims 15-17 are rejected under 35 U.S.C. 103 as being unpatentable over Prakash et al. (US7519565) and Pandian et al. (US20050289182) in view of Ghessassi (US20140281910).
Regarding claim 15, the modified invention of Prakash fails to teach a method wherein the template parameters include an anchor object and a predefined location of a data to be extracted from the electronic document based on the location of the anchor object on the electronic document.
However, Ghessassi teaches a method wherein the template parameters include an anchor object (700 in fig. 7, para. [0023], [0045], The document processor 112 determines that at least a number of the objects (aaa-ddd) in the anchor area 300 are common to the objects aaa-iii and zzz in the reference document 102; The anchor area 300 in FIG. 7 includes the graphical object 700 along with various text objects and number objects) and a predefined location of a data to be extracted from an electronic document (312 in fig. 3) based on the location of the anchor object on the electronic document (para. [0046], If the graphical object 700 in the template document 101 anchor area 300 is substantially the same as the graphical object 700 in the reference document 102 (along with the common text objects and their distances) the process will then be able to retrieve the corresponding field zone 312 in the reference document 102).
Therefore, taking the combined teachings of Prakash and Pandian with Ghessassi as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the steps of Ghessassi into the method of Prakash and Pandian. The motivation to combine Ghessassi, Prakash, and Pandian would be to improve overall accuracy (para. [0003], [0030] of Ghessassi).


Regarding claim 16, the modified invention of Prakash teaches a method further comprising determining the location of the anchor object on the electronic document (para. [0046] of Ghessassi, If the graphical object 700 in the template document 101 anchor area 300 is substantially the same as the graphical object 700 in the reference document 102 (along with the common text objects and their distances) the process will then be able to retrieve the corresponding field zone 312 in the reference document 102).


Regarding claim 17, the modified invention of Prakash teaches a method further comprising the step of locating the data to be extracted from the electronic document (312 in fig. 3, 302 in fig. 7 and para. [0020] of Ghessassi, For example, if the template document 101 and the reference document 102 are invoice documents from a company, the anchor area 300 may be an area associated with items included in a total amount on the invoice. The field zone 302 may be an area that contains the total amount for the invoice (e.g., see FIG. 7 as an illustrative example). The total amount of the invoice is identified and retrieved in order to process the invoice) based on the location of the anchor in the electronic document (para. [0025] of Ghessassi, In response to determining that the distances 301A-301C are substantially the same as the distances 311A-311C, the object retriever 113 identifies and retrieves a corresponding target object 312 from a corresponding target zone in the reference document 102 (zzz) based on the distance 301D between the anchor area 300 and the corresponding field zone 312 in the reference document 102).


Related Art
Cali et al. (US20190220660) teaches classifying a document and extracting data from the classified document (abstract) using a neural network (para. [0018], [0023]).


Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to LEON VIET Q NGUYEN whose telephone number is (571)270-1185. The examiner can normally be reached Mon-Fri 11AM-7PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Claire Wang, can be reached at 571-270-1051. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/LEON VIET Q NGUYEN/Primary Examiner, Art Unit 2663