DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

	This non-final action is responsive to the application filed on 2/11/21.
	Claims 1-26 are pending. 
Claim Objection
Claim 1 (and similarly reciting claims) recites “outputting the extracted documents.” However, only a single “extracted document” is previously recited within the claim. It appears that the “repeating” limitation may imply the presence of a second or further “extracted document,” but this is not clear. The applicant is requested either to clarify on the record in the response or to amend the claim appropriately to identify the second or further “extracted document” to which the recited “extracted documents” refers, beyond the only presently recited “an extracted document associated with the classification.” In other words, does the “repeating the classifying” limitation necessitate the rendering of at least a second “extracted document”?

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


The claimed invention is directed to non-statutory subject matter. The claims do not fall within at least one of the four categories of patent-eligible subject matter because claim 16 and its corresponding dependent claims merely encompass a “computer program.” The applicant may instead consider claiming the recited “non-transitory computer readable medium” on which the “computer program” is stored for execution. Further, claim 17, for instance, is dependent on the “computer program” rather than, e.g., the non-transitory computer readable medium.

Allowable Subject Matter
Claims 3, 6, 12, 15, and 20 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1, 2, 4, 7-11, 13, 16-18, and 21-23 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Albrecht et al. (US 20140101542, herein “Albrecht”).
Regarding claim 1, Albrecht teaches A computer-implemented method (figs. 1 to 6) comprising:
classifying an initial portion of an electronic file and extracting the initial portion from the electronic file as an extracted document associated with the classification (in association with a given encyclopedia file containing content such as associated with the word “Columbia” performing extraction of the word “Columbia” [0056]);
iteratively determining whether each respective next portion of the electronic file should be added to the extracted document until it is determined that the respective next portion should not be added to the extracted document (remaining next portions of the given analyzed segment such as content indicative of “NASA” for the respective extracted Columbia associated content [0087]) and, in response to determining that the respective next portion should be added to the extracted document, extracting the respective next portion from the electronic file and adding the respective next portion to the extracted document (for performing disambiguation, adding the extracted content of the next portion such as in the aggregated named entity of the “Space Shuttle Columbia” [0087]);
in response to determining that the respective next portion should not be added to the extracted document, repeating the classifying and the iteratively determining using the respective next portion as the initial portion (as the initial portion, beginning analyzing a next portion (e.g., sentence), such as by identifying a sentence with “Columbia” appearing and further extracting additional content for recording such as in association with master’s degree or the Columbia University [0087]); and
outputting the extracted documents (output of extracted content (figs. 7 to 13), such as based on the successive extraction processes associated with each version of the referenced, e.g., “Columbia” [0087]).
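
For illustration only, the control flow recited in claim 1 can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from the application or from Albrecht: the names split_file, classify, and belongs are hypothetical, and the toy classifier and continuation test merely stand in for whatever classification and determination the claim covers.

```python
from typing import Callable, List, Tuple

# An extracted document: (classification, list of portions). Hypothetical shape.
Document = Tuple[str, List[str]]


def split_file(
    portions: List[str],
    classify: Callable[[str], str],
    belongs: Callable[[List[str], str, str], bool],
) -> List[Document]:
    """Classify an initial portion, keep adding next portions while
    `belongs` says so, then restart with the rejected portion as the
    new initial portion. Returns every extracted document."""
    documents: List[Document] = []
    i = 0
    while i < len(portions):
        # Classify the initial portion and start a new extracted document.
        label = classify(portions[i])
        current = [portions[i]]
        i += 1
        # Iteratively decide whether each next portion should be added.
        while i < len(portions) and belongs(current, label, portions[i]):
            current.append(portions[i])
            i += 1
        # The portion at index i (if any) was rejected; the outer loop
        # reuses it as the next initial portion.
        documents.append((label, current))
    return documents


# Toy stand-ins (assumptions): an all-caps first word marks a new document.
def toy_classify(portion: str) -> str:
    return portion.split()[0].lower()


def toy_belongs(current: List[str], label: str, nxt: str) -> bool:
    return not nxt.split()[0].isupper()


example = ["INVOICE 1", "line a", "line b", "RECEIPT 7", "line c"]
extracted = split_file(example, toy_classify, toy_belongs)
```

Under these toy assumptions, a second extracted document arises only when a next portion is rejected and the classifying is repeated, which is the point raised in the claim objection above.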

Regarding claim 2, Albrecht teaches the limitations of claim 1, as above.
Furthermore, Albrecht teaches The computer-implemented method of claim 1, wherein iteratively determining whether each respective next portion of the electronic file should be added to the extracted document until it is determined that the respective next portion should not be added to the extracted document comprises:
iteratively determining whether each respective next portion of the electronic file should be added to the extracted document based on 1) a confidence that the respective next portion is part of the extracted document (e.g., confidence that the next portion is within a same portion, such as within the immediate, same sentence [0087]) and 2) a confidence that the next respective portion is a start of a new extracted document (e.g., a next, different sentence containing the same keyword, such as also containing the same word “Columbia” [0087]).

Regarding claim 4, Albrecht teaches the limitations of claim 1, as above.
Furthermore, Albrecht teaches The computer-implemented method of claim 1, wherein classifying an initial portion of the electronic file comprises:
classifying the initial portion of the electronic file using a trained classifier, the trained classifier trained to learn sets of word vectors associated with document classifications (record of words such as “Columbia” and various meanings, such as in association with a university or other types of named entities [0044]).

Regarding claim 7, Albrecht teaches the limitations of claim 1, as above.
Furthermore, Albrecht teaches The computer-implemented method of claim 1, wherein classifying an initial portion of the electronic file comprises:
determining the initial portion of the electronic file as a first portion of the electronic file that is able to be classified (e.g., sentences, such as for classification according to a word entity (e.g., “Columbia”) [0087]).

Regarding claim 8, Albrecht teaches the limitations of claims 1 and 7, as above.
Furthermore, Albrecht teaches The computer-implemented method of claim 7, wherein determining the initial portion of the electronic file as a first portion of the electronic file that is able to be classified comprises:
determining the initial portion of the electronic file as the first portion of the electronic file that is able to be classified based on a certainty of the classification (a certainty such as in comparison to different portions, such as knowledge of a sentence versus a paragraph level; such as by restriction to a sentence level as opposed to a paragraph level based on an immediate proximity [0087]).

Regarding claim 9, Albrecht teaches the limitations of claim 1, as above.
Furthermore, Albrecht teaches The computer-implemented method of claim 1, further comprising:
combining extracted documents that are associated with the same classification (combining various differently referenced versions of a same category, such as various text content related to different meanings of “Columbia” [0044]).

Regarding claim 10, Albrecht teaches An apparatus comprising:
a memory storing computer instructions; and
at least one processor configured to execute the computer instructions (processor and computer [0027]; fig. 1), the computer instructions configured to cause the at least one processor to perform operations of:
	The claim recites similar limitations as claim 1 – see above.

Regarding claim 11, the claim recites similar limitations as claim 2 – see above.

Regarding claim 13, the claim recites similar limitations as claim 4 – see above.

Regarding claim 16, the claim recites similar limitations as claim 1 – see above.

Regarding claim 17, the claim recites similar limitations as claim 2 – see above.

Regarding claim 18, the claim recites similar limitations as claim 4 – see above.

Regarding claim 21, the claim recites similar limitations as claim 7 – see above.

Regarding claim 22, the claim recites similar limitations as claim 8 – see above.

Regarding claim 23, the claim recites similar limitations as claim 9 – see above.


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 5, 14, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Albrecht in view of Bedworth et al. (US 20060004721, herein “Bedworth”).
Regarding claim 5, Albrecht teaches the limitations of claims 1 and 4, as above.
However, Albrecht fails to specifically teach The computer-implemented method of claim 4, wherein the trained classifier is trained by:
parsing text of training data into tokens;
assigning a score to each token; and
calculating an average of the scores for the tokens.
Yet, in a related art, Bedworth discloses calculating an average score for tokens with respect to training data ([0223] to [0226]).
It would have been obvious to one of ordinary skill in the art at the time of the invention’s effective filing date to combine the token-score averaging of Bedworth with the tokenization for understanding of Albrecht to have parsing text of training data into tokens; assigning a score to each token; and calculating an average of the scores for the tokens. The combination would allow for, according to the motivation of Bedworth, overcoming problems of data analysis, such as with respect to records or databases, by automatically performing data analysis for processing or searching (i.e., understanding), thereby allowing for better textual understanding of individual terms ([0003] and [0004]), and especially using average analyses to take into account relative textual meanings based on the tokenization, such as accounting for nearby token meanings ([0223] to [0227]).
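
For illustration only, the three training steps recited in claim 5 (parsing into tokens, assigning a score to each token, and averaging the scores) can be sketched as follows. This is a minimal sketch under stated assumptions: the tokenizer, the score table, and the names tokenize and average_token_score are hypothetical, and neither the claim nor the cited passages of Bedworth are represented as using this particular scoring scheme.

```python
import re
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    """Parse text into lowercase alphanumeric tokens (an assumed scheme)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def average_token_score(
    text: str, scores: Dict[str, float], default: float = 0.0
) -> float:
    """Assign each token a score from `scores` (unknown tokens get
    `default`) and return the arithmetic mean of those scores."""
    tokens = tokenize(text)
    if not tokens:
        return default
    return sum(scores.get(tok, default) for tok in tokens) / len(tokens)


# Hypothetical score table for illustration only.
toy_scores = {"invoice": 1.0, "total": 0.5}
avg = average_token_score("Invoice total due", toy_scores)
```

Here the three tokens score 1.0, 0.5, and 0.0, so the average is 0.5.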

Regarding claim 14, the claim recites similar limitations as claim 5 – see above.

Regarding claim 19, the claim recites similar limitations as claim 5 – see above.


Claims 24-26 are rejected under 35 U.S.C. 103 as being unpatentable over Albrecht in view of Holly et al. (US 20190384794, herein “Holly”).
Regarding claim 24, Albrecht teaches the limitations of claim 1, as above.
However, Albrecht fails to specifically teach The computer-implemented method of claim 1, wherein the classifying, the iteratively determining, the repeating, and the outputting are performed by one or more computing devices implemented in a cloud computing system.
Yet, in a related art, Holly discloses a cloud system such as for accessing a cloud database ([0102] and [0103]).
It would have been obvious to one of ordinary skill in the art at the time of the invention’s effective filing date to combine the cloud computing system of Holly with the RPA of Albrecht to have the classifying, the iteratively determining, the repeating, and the outputting performed by one or more computing devices implemented in a cloud computing system. The combination would allow for, according to the motivation of Holly, storing and providing access to textual data utilizing cloud-based databases which extend the efficiency and effectiveness based on platforms that are commercially available and can be implemented as an extended, enhanced processing environment [0103], further allowing for text processing of cloud-based text such as in the case of remote transactions (ecommerce) utilizing cloud databases ([0003] to [0012]).

Regarding claim 25, the claim recites similar limitations as claim 24 – see above. 

Regarding claim 26, the claim recites similar limitations as claim 24 – see above. 

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JASON EDWARDS whose telephone number is (571) 272-5334. The examiner can normally be reached on Mon-Fri; 8am-5pm EST.
	If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Scott Baderman, can be reached on 571-272-3644. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
	Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA or CANADA) or 571-272-1000.

	/JASON T EDWARDS/              Examiner, Art Unit 2144