DETAILED ACTION
Application No. 17/022,594, filed on 09/16/2020, has been examined. The response to the Election Requirement filed on 05/31/2022 has been entered. Group II (claims 9-20) has been elected. Group I (claims 1-8) has been withdrawn.

Notice of Pre-AIA or AIA Status
	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
 Claims 9-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
At Step 1: Claims 9 and 15 recite determining, via one or more processors, whether a dataset comprises unstructured text; determining, via the one or more processors, that at least a portion of the unstructured text corresponds to a regex pattern of a regex list; replacing, via the one or more processors, the portion of the unstructured text with an encoding associated with the regex pattern to generate a modified dataset; and providing at least the modified dataset to at least one entity recognition system. Therefore, the claims are directed to a process, which is a statutory category of invention.
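For illustration only, the sequence of steps recited in claims 9 and 15 can be sketched as follows. This is a minimal sketch under stated assumptions, not the applicant's implementation: the regex list, the encodings, the `is_unstructured_text` heuristic, and the stand-in `recognize_entities` function are all hypothetical and do not appear in the application.

```python
import re

# Hypothetical regex list: each pattern is paired with the encoding that replaces a match.
REGEX_LIST = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "SSN_TOKEN"),  # assumed pattern and encoding
    (re.compile(r"&"), " and "),                          # assumed special-character encoding
]


def is_unstructured_text(dataset):
    """Assumed heuristic: treat any non-empty string as unstructured text."""
    return isinstance(dataset, str) and bool(dataset.strip())


def encode_dataset(dataset):
    """Replace each portion matching a pattern of REGEX_LIST with its encoding."""
    modified = dataset
    for pattern, encoding in REGEX_LIST:
        modified = pattern.sub(encoding, modified)
    return modified


def recognize_entities(text):
    """Stand-in for an entity recognition system: returns whitespace-separated tokens."""
    return text.split()


def process(dataset):
    """Determine, replace, and provide, in the order the claims recite the steps."""
    if not is_unstructured_text(dataset):
        return None
    return recognize_entities(encode_dataset(dataset))
```

Under these assumptions, calling `process` on a string containing an ampersand and a nine-digit hyphenated number yields tokens in which both portions have been replaced by their encodings before the stand-in recognizer sees them.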
At Step 2A, prong 1, independent claims 9 and 15 recite the limitations of “determining, via one or more processors, whether a dataset comprises unstructured text; determining, via the one or more processors, that at least a portion of the unstructured text corresponds to a regex pattern of a regex list; replacing, via the one or more processors, the portion of the unstructured text with an encoding associated with the regex pattern to generate a modified dataset; and providing at least the modified dataset to at least one entity recognition system”. This process recites determining..., replacing..., and providing... steps. The process can be a mental process, as a person can perform the determining..., replacing..., and providing... steps in the mind. Such steps amount to a data processing task that is nothing more than a mental process. The "mental processes" abstract idea grouping is defined as concepts performed in the human mind, and examples of mental processes include observations, evaluations, judgments, and opinions; see MPEP 2106.04(a)(2). If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the “Mental Processes” grouping of abstract ideas. Therefore, claims 9 and 15 recite an abstract idea.
At Step 2A, prong 2, the judicial exception is not integrated into a practical application. In particular, claims 9 and 15 recite additional elements, e.g., “computer readable storage medium...”, that are recited at a high level of generality and perform generic computer functions, such that they amount to no more than mere instructions to apply the exception using a generic computer component. The limitations “determining, via one or more processors, whether a dataset comprises unstructured text..., providing at least the modified dataset to at least one entity recognition system...” are insignificant extra-solution activity, where extra-solution activity includes both pre-solution and post-solution activity. The steps of determining..., replacing..., and providing... for use in the claimed process are considered to be insignificant extra-solution activity. An example of pre-solution activity is a step of obtaining information about credit card transactions, which is recited as part of a claimed process of analyzing and manipulating the gathered information by a series of steps in order to detect whether the transactions were fraudulent. An example of post-solution activity is an element that is not integrated into the claim as a whole, e.g., a printer that is used to output a report of fraudulent transactions, which is recited in a claim to a computer programmed to analyze and manipulate information about credit card transactions in order to detect whether the transactions were fraudulent. These limitations do no more than add insignificant extra-solution activity to the judicial exception. The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception.
See MPEP 2106.05(g), discussing limitations that the Federal Circuit has considered to be insignificant extra solution activity, for instance "collecting information, analyzing it, and displaying certain results of the collection and analysis," where the data analysis steps are recited at a high level of generality such that they could practically be performed in the human mind, Electric Power Group v. Alstom, S.A., 830 F.3d 1350, 1353-54, 119 USPQ2d 1739, 1741-42 (Fed. Cir. 2016).
The combination of these additional elements also fails to integrate the recited judicial exception into a practical application of the exception. In particular, the claims recite the additional elements of one or more processors (in claim 9) and a non-transitory readable storage medium (in claim 15). The one or more processors and the machine readable non-transitory storage medium are recited in all steps at a high level of generality (i.e., as a generic processor performing generic computer functions of determining, replacing, and providing) such that they amount to no more than mere instructions to apply the exception using a generic computer component. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claims are directed to an abstract idea.
At Step 2B, if a claim limitation, under its broadest reasonable interpretation, covers an abstract idea that includes a series of steps that recite mental steps, but for the recitation of generic computer components, then it falls within the "Mental Processes" grouping of abstract ideas. Accordingly, the claims recite abstract ideas. Claims 9 and 15 do not include any additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements only amount to mere instructions to apply the exception using generic computer components, extra-solution activities, and a generic link of the use of the exception to a particular technological environment or field of use. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. See MPEP 2106.05(f).
Thus, there are no additional elements that amount to significantly more than the above judicial exception. Looking at the limitations as an ordered combination and as a whole adds nothing that is not already present when looking at the elements taken individually. There is no indication that any combination of elements improves the functioning of a computer or improves any other technology. The claims are not patent eligible.
Claims 10-14 depend on claim 9 and include all the limitations of claim 9. Therefore, claims 10-14 recite the same abstract idea of a “Mental Process”, specifically performing a mental process in a computer environment. The claims recite limitations of “wherein the portion of the unstructured text is a first character…”, “wherein the encoding is a second character….”, “wherein the portion of the unstructured text is a string of two or more consecutive characters…”, “wherein the encoding is a word of at least one character…” and “determining, via the one or more processors, any false matches between the regex pattern of the regex list….”, which amount to the abstract idea of determining and manipulating data using a generic computer component and, therefore, do not amount to significantly more than the abstract idea.
Dependent claims 16-20 are rejected under 35 U.S.C. 101 as directed to patent-ineligible subject matter for the same reasons addressed for claims 10-14 above.
 



Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.


The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale or otherwise available to the public before the effective filing date of the claimed invention.


Claims 9-13 and 15-19 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Yellapragada et al (US 2018/0025222 A1).
	As per claim 9, Yellapragada teaches a method comprising: determining, via one or more processors, whether a dataset comprises unstructured text ([0026], [0031]-[0032], e.g., discloses determining that an OCR engine extracted text from each of the fields in a semi-structured document (i.e., unstructured document or raw document) with a confidence level for each of the fields);
	determining, via the one or more processors, that at least a portion of the unstructured text corresponds to a regex pattern of a regex list, replacing, via the one or more processors, the portion of the unstructured text with an encoding associated with the regex pattern to generate a modified dataset, and providing at least the modified dataset to at least one entity recognition system ([0026], [0029], [0041]-[0042], e.g., for a semi-structured document such as a tax form or invoice, an OCR engine may extract textual content on a field-by-field basis…., a data post-processor is configured to receive raw OCR data from the OCR engine and apply one or more post-processing rules… post-processing rules may be formatted as a regular expression indicating a pattern of characters that certain data fields are required to comply with, for example, with a social security number field in a W-2, the data post-processor can replace characters in the raw OCR data with characters that would satisfy the regular expression).
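The kind of regular-expression post-processing rule described in the cited passages can be sketched roughly as below. This is an illustrative sketch only and is not taken from Yellapragada: the social security number pattern, the character substitution table, and the function name `post_process_ssn` are assumptions chosen to show how characters in raw OCR data might be replaced so that a field satisfies a regular expression.

```python
import re

# Assumed format rule for a social security number field (three, two, four digits).
SSN_PATTERN = re.compile(r"\d{3}-\d{2}-\d{4}")

# Assumed table of common OCR letter/digit confusions (hypothetical, for illustration).
OCR_SUBSTITUTIONS = str.maketrans(
    {"O": "0", "o": "0", "l": "1", "I": "1", "S": "5", "B": "8"}
)


def post_process_ssn(raw):
    """Return raw OCR text corrected to satisfy SSN_PATTERN when a correction exists."""
    if SSN_PATTERN.fullmatch(raw):
        return raw  # already complies with the rule
    candidate = raw.translate(OCR_SUBSTITUTIONS)
    # Keep the correction only if it now satisfies the regular expression.
    return candidate if SSN_PATTERN.fullmatch(candidate) else raw
```

In this sketch, a field that cannot be made to satisfy the pattern is returned unchanged rather than guessed at, which is a design choice of the example and not a statement about the reference.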
 
 	As per claim 10, wherein the portion of the unstructured text is a first character, the first character being an alphanumeric character or special character ([0029], [0040]-[0042], e.g., character being an alphanumeric character).  

 	As per claim 11, wherein the encoding is a second character, the second character being an alphanumeric character or a special character ([0029], [0040]-[0042], e.g., character being an alphanumeric character).

 	As per claim 12, wherein the portion of the unstructured text is a string of two or more consecutive characters, the two or more consecutive characters of the strings being alphanumeric characters or special characters ([0041]-[0042], e.g., identify commonly experienced OCR errors and generate one or more OCR rules to associate a given pattern of pixels representing a character with the correct character).

 	As per claim 13, wherein the encoding is a word of at least one character, the at least one character of the word being an alphanumeric character or a special character ([0029], [0040]-[0042], e.g., character being an alphanumeric character).  

 	Regarding claim 15, claim 15 is rejected for substantially the same reason as claim 9 above. 

 	Regarding claims 16-19, claims 16-19 are rejected for substantially the same reason as claims 10-13 above.

Claim Rejections - 35 USC § 103
	In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 14 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Yellapragada et al (US 2018/0025222 A1) in view of Pavlov et al (US 2008/0243905 A1).

 	As per claim 14, Yellapragada teaches determining, via the one or more processors, any false matches between the regex pattern of the regex list and the dataset ([0041],[0042], [0047], e.g., receive raw OCR data from OCR engine and apply one or more post-processing rules to correct errors in the raw OCR data); 
Yellapragada does not explicitly teach refining, based on any determined false matches, the regex list via a machine learning model or a classification model.  
However, Pavlov teaches refining, based on any determined false matches, the regex list via a machine learning model or a classification model ([0020], e.g., a machine learning mechanism to identify "false positives" produced when the regular expression technique is used to extract attribute values from input text).
Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to apply the teachings of Yellapragada with the teachings of Pavlov in order to efficiently enable a system to use a machine learning mechanism to predict the accuracy of attribute determinations represented by skeleton tokens (Yellapragada).

Regarding claim 20, claim 20 is rejected for substantially the same reason as claim 14 above. 

It is noted that any citations to specific pages, columns, lines, or figures in the prior art references and any interpretation of the references should not be considered to be limiting in any way. A reference is relevant for all it contains and may be relied upon for all that it would have reasonably suggested to one having ordinary skill in the art. See MPEP 2123.


Pertinent Prior Art

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Inmon (US 2007/0100823 A1) discloses Techniques for Manipulating Unstructured Data Using Synonyms and Alternate Spellings Prior to Recasting as Structured Data.
Matthews et al (US 9535892 B1) discloses Method and System for Generating Unique Content Based on Business Entity Information Received from a User.
Gilbert (US 2015/0012464 A1) discloses Systems and Methods for Creating and Implementing an Artificially Intelligent Agent or System.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Mohammad A Sana whose telephone number is (571)270-1753. The examiner can normally be reached Monday-Friday 9-5.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Mark D Featherstone, can be reached at (571)270-3750. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/Mohammad A Sana/
Primary Examiner, Art Unit 2166