DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.


Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1-3, 5, 6, 9-12, 14-17 and 20-23 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Yanamandra U.S. PAP 2020/0394396 A1.

Regarding claim 1, Yanamandra teaches a computer-implemented method comprising:
classifying portions of an electronic file using a trained machine learning based model, the classifications representing relative positions of the portions within sub-documents of the electronic file (the classification system receives a set of input pages and separates the input pages into a first set of classified documents and a set of unclassified pages, see par. [0008]);
splitting the electronic file into the sub-documents based on the relative positions of the portions (The classification system applies the page-level recognition model to the unclassified pages to recognize the logical boundaries between documents and separates the unclassified pages into separated documents based on the logical boundaries, see par. [0008]); 
and outputting the sub-documents (identify a sequence of unstructured document pages from a first unstructured document page classified as the document start page to a second unstructured document page classified as the document end page. The sequence can be identified based on the page-level classifications for the unstructured document pages in the set of unstructured document pages, see par. [0010]).
Regarding claim 2, Yanamandra teaches the computer-implemented method of claim 1, wherein the classifications representing the relative positions of the portions within the sub-documents of the electronic file comprise a classification representing a first portion of a sub-document, a classification representing a last portion of a sub-document, and a classification representing a portion of a sub-document between the first portion and the last portion (recognition model trained to recognize a document start page and a document end page and a document-level recognition model trained to recognize a document type, see par. [0010]).
Regarding claim 3, Yanamandra teaches the computer-implemented method of claim 1, wherein classifying portions of an electronic file using a trained machine learning based model comprises: mapping features of interest extracted from each of the portions of the electronic file to the classifications, the features of interest comprising one or more of a word cloud, a page number, or text related features (content classification system 200 indexes documents; user interface 220 displays a set of document metadata 240 for unstructured document 228, including the classification “Doc_type: Mortgage”, the confidence of the classification “Weightage:86.519”, indexing keywords extracted from unstructured document 228, and other metadata, see par. [0048]).
Regarding claim 5, Yanamandra teaches the computer-implemented method of claim 1, wherein splitting the electronic file into the sub-documents based on the relative positions of the portions comprises: splitting the electronic file immediately prior to each portion classified as being a first portion of a sub-document (the classification system includes any prior orphan pages that are in a continuous sequence with the unexpected end page in the new page group, see par. [0093]).
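For illustration only, the splitting limitation recited in claims 2 and 5 can be sketched as follows. This sketch is not taken from the application or from Yanamandra; the label names FIRST, MIDDLE and LAST and the function name split_before_first are hypothetical, and the per-page labels are assumed to have already been produced by some classifier.

```python
from typing import List, Sequence

# Hypothetical label names for the three relative positions recited in claim 2.
FIRST, MIDDLE, LAST = "first", "middle", "last"


def split_before_first(pages: Sequence[str], labels: Sequence[str]) -> List[List[str]]:
    """Group pages into sub-documents, cutting immediately before each FIRST page."""
    if len(pages) != len(labels):
        raise ValueError("pages and labels must have the same length")
    sub_documents: List[List[str]] = []
    for page, label in zip(pages, labels):
        # Open a new sub-document at every FIRST page, and also at the very
        # first page of the file so that leading pages are never dropped.
        if label == FIRST or not sub_documents:
            sub_documents.append([])
        sub_documents[-1].append(page)
    return sub_documents
```

Under these assumptions, a five-page file labeled first, middle, last, first, last would be returned as two sub-documents of three and two pages.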
Regarding claim 6, Yanamandra teaches the computer-implemented method of claim 1, wherein the portions of the electronic file correspond to pages of the electronic file (system that separates unclassified pages into unclassified, separated documents, see par. [0008]).
Regarding claim 9, Yanamandra teaches the computer-implemented method of claim 1, further comprising: classifying the sub-documents using a classifier (page-level recognition model 416 is a machine learning classifier, see par. [0058]).
Regarding claim 10, Yanamandra teaches an apparatus comprising: a memory storing computer instructions (a computer processor 1404 and associated memory, see par. [0108]); and at least one processor configured to execute the computer instructions (processor, see par. [0127]), the computer instructions configured to cause the at least one processor to perform operations of:
classifying portions of an electronic file using a trained machine learning based model, the classifications representing relative positions of the portions within sub-documents of the electronic file (the classification system receives a set of input pages and separates the input pages into a first set of classified documents and a set of unclassified pages, see par. [0008]);
splitting the electronic file into the sub-documents based on the relative positions of the portions (The classification system applies the page-level recognition model to the unclassified pages to recognize the logical boundaries between documents and separates the unclassified pages into separated documents based on the logical boundaries, see par. [0008]); 
and outputting the sub-documents (identify a sequence of unstructured document pages from a first unstructured document page classified as the document start page to a second unstructured document page classified as the document end page. The sequence can be identified based on the page-level classifications for the unstructured document pages in the set of unstructured document pages, see par. [0010]).
Regarding claim 11, Yanamandra teaches the apparatus of claim 10, wherein the classifications representing the relative positions of the portions within the sub-documents of the electronic file comprise a classification representing a first portion of a sub-document, a classification representing a last portion of a sub-document, and a classification representing a portion of a sub-document between the first portion and the last portion (recognition model trained to recognize a document start page and a document end page and a document-level recognition model trained to recognize a document type, see par. [0010]).
Regarding claim 12, Yanamandra teaches the apparatus of claim 10, wherein classifying portions of an electronic file using a trained machine learning based model comprises: mapping features of interest extracted from each of the portions of the electronic file to the classifications, the features of interest comprising one or more of a word cloud, a page number, or text related features (content classification system 200 indexes documents; user interface 220 displays a set of document metadata 240 for unstructured document 228, including the classification “Doc_type: Mortgage”, the confidence of the classification “Weightage:86.519”, indexing keywords extracted from unstructured document 228, and other metadata, see par. [0048]).
Regarding claim 14, Yanamandra teaches the apparatus of claim 10, wherein splitting the electronic file into the sub-documents based on the relative positions of the portions comprises: splitting the electronic file immediately prior to each portion classified as being a first portion of a sub-document (the classification system includes any prior orphan pages that are in a continuous sequence with the unexpected end page in the new page group, see par. [0093]).
Regarding claim 15, Yanamandra teaches a computer program embodied on a non-transitory computer-readable medium, the computer program configured to cause at least one processor to perform operations comprising (computer-readable program code may be stored, in whole or in part, temporarily or permanently, on a non-transitory computer readable medium, see par. [0126]):
classifying portions of an electronic file using a trained machine learning based model, the classifications representing relative positions of the portions within sub-documents of the electronic file (the classification system receives a set of input pages and separates the input pages into a first set of classified documents and a set of unclassified pages, see par. [0008]);
splitting the electronic file into the sub-documents based on the relative positions of the portions (The classification system applies the page-level recognition model to the unclassified pages to recognize the logical boundaries between documents and separates the unclassified pages into separated documents based on the logical boundaries, see par. [0008]); 
and outputting the sub-documents (identify a sequence of unstructured document pages from a first unstructured document page classified as the document start page to a second unstructured document page classified as the document end page. The sequence can be identified based on the page-level classifications for the unstructured document pages in the set of unstructured document pages, see par. [0010]).
Regarding claim 16, Yanamandra teaches the computer program of claim 15, wherein the classifications representing the relative positions of the portions within the sub-documents of the electronic file comprise a classification representing a first portion of a sub-document, a classification representing a last portion of a sub-document, and a classification representing a portion of a sub-document between the first portion and the last portion (recognition model trained to recognize a document start page and a document end page and a document-level recognition model trained to recognize a document type, see par. [0010]).
Regarding claim 17, Yanamandra teaches the computer program of claim 15, wherein the portions of the electronic file correspond to pages of the electronic file (system that separates unclassified pages into unclassified, separated documents, see par. [0008]).
Regarding claim 20, Yanamandra teaches the computer program of claim 15, the operations further comprising: classifying the sub-documents using a classifier (page-level recognition model 416 is a machine learning classifier, see par. [0058]).
Regarding claim 21, Yanamandra teaches the computer-implemented method of claim 1, wherein the classifying, the splitting, and the outputting are performed by one or more computing devices implemented in a cloud computing system (embodiments can be employed in distributed computing environments, where tasks or modules are performed by remote processing devices, which are linked through a communications network such as a LAN, WAN, and/or the Internet, see par. [0124]).
Regarding claim 22, Yanamandra teaches the apparatus of claim 10, wherein the apparatus is implemented in a cloud computing system (embodiments can be employed in distributed computing environments, where tasks or modules are performed by remote processing devices, which are linked through a communications network such as a LAN, WAN, and/or the Internet, see par. [0124]).
Regarding claim 23, Yanamandra teaches the computer program of claim 15, wherein the at least one processor is implemented in one or more computing devices and the one or more computing devices are implemented in a cloud computing system (embodiments can be employed in distributed computing environments, where tasks or modules are performed by remote processing devices, which are linked through a communications network such as a LAN, WAN, and/or the Internet, see par. [0124]).

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 7-8 and 18-19 are rejected under 35 U.S.C. 103 as being unpatentable over Yanamandra U.S. PAP 2020/0394396 A1, in view of Ghatage U.S. PAP 2020/0234183 A1.

Regarding claim 7, Yanamandra teaches the computer-implemented method of claim 1 but does not teach that the trained machine learning based model comprises a trained deep learning model.
In the same field of endeavor, Ghatage teaches that automation and artificial intelligence (AI) are transforming various aspects of human society by increasing productivity, see par. [0001]. Ghatage describes an AI-based data transformation system that receives an input package containing a plurality of documents, which are used to generate mappings that enable an RPA system to execute one or more automated processes. In addition to the plurality of documents, the input package can include metadata that enables the generation of the mappings, see par. [0016]. The ML-based relationship model 140 is trained on the training data 120 for generating the mappings 156. In an example, the ML relationship model 140 can be implemented via Long Short-term Memory (LSTM) methodology, see par. [0033].
It would have been obvious to one of ordinary skill in the art to combine the Yanamandra invention with the teachings of Ghatage for the benefit of implementing an AI system to increase user productivity, see Ghatage par. [0001].

Regarding claim 8, Yanamandra teaches the computer-implemented method of claim 1 but does not teach that the trained machine learning based model is based on one of a LSTM (long short-term memory) architecture, a Bi-LSTM (bi-directional LSTM) architecture, or a seq2seq (sequence-to-sequence) architecture.
In the same field of endeavor, Ghatage teaches that automation and artificial intelligence (AI) are transforming various aspects of human society by increasing productivity, see par. [0001]. Ghatage describes an AI-based data transformation system that receives an input package containing a plurality of documents, which are used to generate mappings that enable an RPA system to execute one or more automated processes. In addition to the plurality of documents, the input package can include metadata that enables the generation of the mappings, see par. [0016]. The ML-based relationship model 140 is trained on the training data 120 for generating the mappings 156. In an example, the ML relationship model 140 can be implemented via Long Short-term Memory (LSTM) methodology, see par. [0033].
It would have been obvious to one of ordinary skill in the art to combine the Yanamandra invention with the teachings of Ghatage for the benefit of implementing an AI system to increase user productivity, see Ghatage par. [0001].


Regarding claim 18, Yanamandra teaches the computer program of claim 15 but does not teach that the trained machine learning based model comprises a trained deep learning model.
In the same field of endeavor, Ghatage teaches that automation and artificial intelligence (AI) are transforming various aspects of human society by increasing productivity, see par. [0001]. Ghatage describes an AI-based data transformation system that receives an input package containing a plurality of documents, which are used to generate mappings that enable an RPA system to execute one or more automated processes. In addition to the plurality of documents, the input package can include metadata that enables the generation of the mappings, see par. [0016]. The ML-based relationship model 140 is trained on the training data 120 for generating the mappings 156. In an example, the ML relationship model 140 can be implemented via Long Short-term Memory (LSTM) methodology, see par. [0033].
It would have been obvious to one of ordinary skill in the art to combine the Yanamandra invention with the teachings of Ghatage for the benefit of implementing an AI system to increase user productivity, see Ghatage par. [0001].


Regarding claim 19, Yanamandra teaches the computer program of claim 15 but does not teach that the trained machine learning based model is based on one of a LSTM (long short-term memory) architecture, a Bi-LSTM (bi-directional LSTM) architecture, or a seq2seq (sequence-to-sequence) architecture.
In the same field of endeavor, Ghatage teaches that automation and artificial intelligence (AI) are transforming various aspects of human society by increasing productivity, see par. [0001]. Ghatage describes an AI-based data transformation system that receives an input package containing a plurality of documents, which are used to generate mappings that enable an RPA system to execute one or more automated processes. In addition to the plurality of documents, the input package can include metadata that enables the generation of the mappings, see par. [0016]. The ML-based relationship model 140 is trained on the training data 120 for generating the mappings 156. In an example, the ML relationship model 140 can be implemented via Long Short-term Memory (LSTM) methodology, see par. [0033].
It would have been obvious to one of ordinary skill in the art to combine the Yanamandra invention with the teachings of Ghatage for the benefit of implementing an AI system to increase user productivity, see Ghatage par. [0001].
Claims 4 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Yanamandra U.S. PAP 2020/0394396 A1, in view of Desmond U.S. PAP 2021/0174196 A1.

Regarding claim 4, Yanamandra teaches the computer-implemented method of claim 1 but does not teach that classifying portions of an electronic file using a trained machine learning based model further comprises: detecting misclassified portions from the classified portions using a statistical checker; and presenting the misclassified portions to a user for manual classification.
In a similar field of endeavor, Desmond teaches an invention which relates to computer processing systems, and more specifically, to computer processing systems configured to improve ground truth quality for generating accurate machine learning models, see par. [0001]. According to some embodiments, the ground truth analysis engine 112 can identify one or more data inputs that are outliers by identifying vector representations within a vector space that were not clustered. It can be desirable to examine outliers such as this to determine if they should be discarded for the purposes of training the model. In some embodiments, a notification provided to the user 121 by the processing system 100 can invite the user to examine the outlier(s) and make a determination of whether to remove them from the training data, relabel the data input, create a new classification to account for the outlier, or take some other suitable action. According to some embodiments, the processing system 100 can generate recommended actions (e.g., remove an outlier from training data, modify the label of a mislabeled data input, etc.) and present the recommendations to the user 121 through a notification in a manner that can allow the user 121 to quickly review the relevant data inputs and accept, reject or modify the recommended change, see par. [0056].
It would have been obvious to one of ordinary skill in the art to combine the Yanamandra invention with the teachings of Desmond for the benefit of improving the quality of a machine learning model, see Desmond par. [0001].
Regarding claim 13, Yanamandra teaches the apparatus of claim 10 but does not teach that classifying portions of an electronic file using a trained machine learning based model further comprises: detecting misclassified portions from the classified portions using a statistical checker; and presenting the misclassified portions to a user for manual classification.
In a similar field of endeavor, Desmond teaches an invention which relates to computer processing systems, and more specifically, to computer processing systems configured to improve ground truth quality for generating accurate machine learning models, see par. [0001]. According to some embodiments, the ground truth analysis engine 112 can identify one or more data inputs that are outliers by identifying vector representations within a vector space that were not clustered. It can be desirable to examine outliers such as this to determine if they should be discarded for the purposes of training the model. In some embodiments, a notification provided to the user 121 by the processing system 100 can invite the user to examine the outlier(s) and make a determination of whether to remove them from the training data, relabel the data input, create a new classification to account for the outlier, or take some other suitable action. According to some embodiments, the processing system 100 can generate recommended actions (e.g., remove an outlier from training data, modify the label of a mislabeled data input, etc.) and present the recommendations to the user 121 through a notification in a manner that can allow the user 121 to quickly review the relevant data inputs and accept, reject or modify the recommended change, see par. [0056].
It would have been obvious to one of ordinary skill in the art to combine the Yanamandra invention with the teachings of Desmond for the benefit of improving the quality of a machine learning model, see Desmond par. [0001].
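For illustration only, one possible reading of the "statistical checker" limitation of claims 4 and 13 can be sketched as follows. This sketch is an assumption and is not taken from the application, Yanamandra, or Desmond (Desmond identifies outliers as unclustered vector representations, which is a different technique). Here a portion is flagged for manual review when its classifier confidence lies more than a chosen number of standard deviations below the mean; the function name flag_for_review and the parameter z_threshold are hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def flag_for_review(confidences: Sequence[float], z_threshold: float = 2.0) -> List[int]:
    """Return indices of portions whose confidence is unusually low.

    Assumption: one confidence score per classified portion is available.
    """
    if not confidences:
        return []
    mu = mean(confidences)
    sigma = pstdev(confidences)  # population standard deviation
    if sigma == 0:
        return []  # all scores identical, so nothing stands out
    # Flag only low-side outliers: (mu - c) / sigma is positive when c is below the mean.
    return [i for i, c in enumerate(confidences) if (mu - c) / sigma > z_threshold]
```

The returned indices identify the portions that would be presented to a user for manual classification under this assumed reading.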
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. The pertinent prior art is listed on form PTO-892.
Bhowan '716 teaches a document classification system for text-based documents that classifies them into predetermined categories, see abstract.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Michael Ortiz-Sanchez whose telephone number is (571) 270-3711. The examiner can normally be reached Monday-Friday, 9 AM-6 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta, can be reached at 571-272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/MICHAEL ORTIZ-SANCHEZ/
Primary Examiner, Art Unit 2656