DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Status of the Claims
Claims 1 and 19 are rejected under 35 U.S.C. 102(a)(1).
Claims 2-18 are rejected under 35 U.S.C. 103.
	
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1 and 19 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by YUAN (US 9,305,226 B1).

Regarding Claim 1, YUAN teaches an information processing device comprising: a processor configured to: (Figure 9 processor 902)
output an extracted character string entry rule for each item of a form (“Once the optimization engine has arrived at one or more decision trees, ordered sets of rules, or other such grouping, that set of rules can be provided to a component such as a refinement engine 504. The refinement engine 504, as well as the optimization engine 514, can each take the form of software and/or hardware, which can be located on and/or remote with respect to a computing device in accordance with various embodiments. The refinement engine 504 can receive recognized text data 502, as may be provided by an OCR engine after analyzing an obtain image containing one or more text regions. The refinement engine can include, for example, refinement logic 508 that enables the refinement engine to utilize the rule set(s) provided by the optimization engine and apply the rules to the text to attempt to improve the accuracy of the recognized text.” Col. 6:9-23. An optimization engine outputs an extracted character string entry rule, referred to as a “boosting rule” throughout the reference. The rule is applied to text regions (items) of an image on which OCR was performed, which is equivalent to a form.)
in a case where a regularity related to an entry of a character string of a confirmation result is extracted, (“there can be various types of semantic boosting rules that are considered and/or tested by the optimization engine, or another such component, service, or process. One example is a rule for pattern-based validation. There can be several customary rules that are applicable to actionable text, among other types of text. For example, the phone numbers in North America have the general form xxx-xxx-xxxx, although variations exist. One or multiple regular expressions can be utilized to represent these phone numbers. Similar rules can be applied to email addresses and web domains, among others. As discussed with respect to FIG. 3(b), a recognition that a text string corresponds to a URL or email address can indicate that the string more likely ends in “.com” than “.corn” or “.con”, for example.” Col. 8:10-23. The “boosting rule(s)” include pattern recognition of the expected format of text entry for different types of data, such as phone numbers or email addresses. This is equivalent to “a regularity”.)
the confirmation result being a result of confirming a result of character recognition performed on the form. (“as part of an offline training process, the ground-truth text data and at least an initial set of rules can be fed into an optimization engine 514 or other such component. The optimization engine 514 can take each of the rules in the rule set and test those rules against the ground-truth text data to determine the relative performance of the rule, to provide an overall performance score, or other such metric, as well as potentially the relative performance scores for certain types of data” Col. 5:55-63. The rule or rules that are applied are based on analysis of ground-truth data. Also see Col. 7:1-10, 15-30: “the ground-truth text data can include known strings of input text, as well as the results (including confidence values or other such information) produced by a text recognition process.” The ground-truth data is therefore text with known, or confirmed, strings after a text recognition process, such as OCR. The rule is tested against the confirmed text.)
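
For illustration only, the pattern-based validation quoted above from YUAN (Col. 8:10-23) can be sketched as follows. This is a minimal sketch and is not taken from YUAN: the names PHONE_RE, is_phone_number, and boost_candidates, the candidate-to-confidence mapping, and the boost amount of 0.2 are all assumptions chosen for the example. It shows a regular expression for the general form xxx-xxx-xxxx being used to raise the confidence of recognition candidates that fit the expected format.

```python
import re

# Assumed pattern for the general North American form xxx-xxx-xxxx.
# YUAN notes that variations exist; this sketch covers only the basic form.
PHONE_RE = re.compile(r"\d{3}-\d{3}-\d{4}")


def is_phone_number(text):
    """Return True when the whole string has the form xxx-xxx-xxxx."""
    return PHONE_RE.fullmatch(text) is not None


def boost_candidates(candidates, pattern, boost=0.2):
    """Raise the confidence of candidates that fully match the pattern.

    candidates maps each candidate string to a confidence in [0, 1].
    The boost amount is an arbitrary illustrative value (hypothetical),
    and the result is capped at 1.0.
    """
    return {
        text: min(1.0, conf + boost) if pattern.fullmatch(text) else conf
        for text, conf in candidates.items()
    }
```

Under these assumptions, a candidate such as "555-123-4567" would have its confidence raised, while a misrecognized candidate such as "555-l23-4567" (letter "l" in place of the digit "1") would keep its original confidence.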


Regarding Claim 19, YUAN teaches a non-transitory computer readable medium storing a program causing a computer to execute a process for processing information, the process comprising: (Figure 9 processor 902 memory 904.)
outputting an extracted character string entry rule for each item of a form (“Once the optimization engine has arrived at one or more decision trees, ordered sets of rules, or other such grouping, that set of rules can be provided to a component such as a refinement engine 504. The refinement engine 504, as well as the optimization engine 514, can each take the form of software and/or hardware, which can be located on and/or remote with respect to a computing device in accordance with various embodiments. The refinement engine 504 can receive recognized text data 502, as may be provided by an OCR engine after analyzing an obtain image containing one or more text regions. The refinement engine can include, for example, refinement logic 508 that enables the refinement engine to utilize the rule set(s) provided by the optimization engine and apply the rules to the text to attempt to improve the accuracy of the recognized text.” Col. 6:9-23. An optimization engine outputs an extracted character string entry rule, referred to as a “boosting rule” throughout the reference. The rule is applied to text regions (items) of an image on which OCR was performed, which is equivalent to a form.)
in a case where a regularity related to an entry of a character string of a confirmation result is extracted, (“there can be various types of semantic boosting rules that are considered and/or tested by the optimization engine, or another such component, service, or process. One example is a rule for pattern-based validation. There can be several customary rules that are applicable to actionable text, among other types of text. For example, the phone numbers in North America have the general form xxx-xxx-xxxx, although variations exist. One or multiple regular expressions can be utilized to represent these phone numbers. Similar rules can be applied to email addresses and web domains, among others. As discussed with respect to FIG. 3(b), a recognition that a text string corresponds to a URL or email address can indicate that the string more likely ends in “.com” than “.corn” or “.con”, for example.” Col. 8:10-23. The “boosting rule(s)” include pattern recognition of the expected format of text entry for different types of data, such as phone numbers or email addresses. This is equivalent to “a regularity”.)
the confirmation result being a result of confirming a result of character recognition performed on the form. (“as part of an offline training process, the ground-truth text data and at least an initial set of rules can be fed into an optimization engine 514 or other such component. The optimization engine 514 can take each of the rules in the rule set and test those rules against the ground-truth text data to determine the relative performance of the rule, to provide an overall performance score, or other such metric, as well as potentially the relative performance scores for certain types of data” Col. 5:55-63. The rule or rules that are applied are based on analysis of ground-truth data. Also see Col. 7:1-10, 15-30: “the ground-truth text data can include known strings of input text, as well as the results (including confidence values or other such information) produced by a text recognition process.” The ground-truth data is therefore text with known, or confirmed, strings after a text recognition process, such as OCR. The rule is tested against the confirmed text.)


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 2-18 are rejected under 35 U.S.C. 103 as being unpatentable over YUAN (US 9,305,226 B1) in view of GURUPRASAD (US 2020/0005089 A1).

Regarding Claim 2, YUAN teaches all the limitations of claim 1, on which claim 2 depends.
YUAN further teaches wherein the processor is configured to: output the character string entry rule together with a degree of change… in association with incorrect character recognition that changes depending on whether or not the character string entry rule is set. (“Any of this information can be used with one or more semantic boosting rules to refine the relative confidences, which might result in an updated bubble graph 350 as illustrated in the example of FIG. 3(b). In the updated bubble graph, the option “m” has had its confidence updated to 0.92 as a result of being processed with a semantic boosting rule that might have recognized the text as being a URL. Since the updated confidence score exceeds the minimum confidence threshold, that option is able to be selected with confidence to complete the text string. As illustrated, the option “rn” has a much lower confidence as a result of the refinement.” Col.4:60 – Col. 5:7. The confidence score is equivalent to a degree of change that changes depending on whether the rule is set. For example, if a rule that recognizes URLs in the recognized data is set, the incorrect character recognition of the character “m” as the characters “r” and “n” is reduced.)
While YUAN teaches a score for each rule, YUAN does not explicitly teach that the degree of change output together with the rule is a degree of change in a number of corrected character strings that have been corrected.
However, GURUPRASAD, which is directed to improving recognition of OCR-extracted data, teaches a degree of change in a number of corrected character strings that have been corrected. (“In the scenario where it is required to extract invoice number from 100 invoices (training data), the OCR confidence for each invoice for the field invoice number can be used. Also, upon looking at the actual document, it is known whether invoice number from the document is extracted correctly or not. That means, now there are two values associated with invoice number: (i) OCR confidence obtained from OCR engine, and (ii) match/mismatch information from ground truth. Using these values, a decision matrix is framed as below:” Paragraph 0035. “for a new document, when invoice number field is extracted by OCR and OCR gives confidence above 72, it is marked as green” Paragraph 0039. A threshold confidence for the recognition of a character string is determined from a number (such as 100) of ground truth samples.)
Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to modify the selection of a rule for improving the accuracy of recognized text taught by YUAN by selecting the rule based on a number of correctly and incorrectly recognized strings using the recognition rule as suggested by GURUPRASAD. Since both references are directed to improving the results of optical character recognition, the combination would have yielded predictable results. Furthermore, since YUAN (Col. 7:15-40) teaches that the scores used for selecting the rules include factors such as how accurately a condition can be detected and the cost of making incorrect changes, it would have been obvious to include the number of matches and mismatches of ground truth data as a factor in selecting the rule.

Regarding Claim 3, YUAN in view of GURUPRASAD further teaches wherein the processor is configured to: output a degree of change in the number of corrected character strings that falls if the output character string entry rule is set to the item of the form. (YUAN, “In the updated bubble graph, the option “m” has had its confidence updated to 0.92 as a result of being processed with a semantic boosting rule that might have recognized the text as being a URL.” Col.4:60 – Col. 5:7. A higher confidence due to a rule being set means a lower number of character strings that require correction.)

Regarding Claim 4, YUAN in view of GURUPRASAD further teaches wherein the degree of change is a degree of the number of corrected character strings that are corrected due to the output character string entry rule not being set to the item of the form as the degree of change. (GURUPRASAD, “in order to arrive an optimal threshold to determine the extraction correctness, the threshold indicates Green, which indicates developed system has trust on the extracted values so that the user (data entry person) need to check for its accuracy; and below the threshold means Red, that mean user has to look at the document and verify whether it is extracted correctly or not.” Paragraph 0035. When there is low confidence, the user is required to manually correct the text. In view of YUAN, a rule that is set with higher confidence would therefore result in fewer corrections being made than when the rule is not set.)

Regarding Claim 5, YUAN teaches all the limitations of claim 1, on which claim 5 depends.
While YUAN teaches that the rules depend on the type of data (Col. 5:60-67), YUAN does not explicitly teach wherein the processor is configured to: output the character string entry rule with respect to a classification attribute by which a regularity related to the entry of a character string is extracted.
However, GURUPRASAD, which is directed to improving recognition of OCR-extracted data, teaches wherein the processor is configured to: output the character string entry rule with respect to a classification attribute by which a regularity related to the entry of a character string is extracted. (“In this case, an example of extraction is key-value pairs, for instance, Invoice Number −1234, here key is invoice number and value is 1234. Key is what needs to be extracted, and value is the corresponding value in the document that represents the key.” Paragraph 0035. The key is a classification attribute.)
Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to modify the selection of a rule for improving the accuracy of recognized text taught by YUAN by outputting the rule with respect to a classification attribute, such as a specific type of data, as taught by GURUPRASAD. Since both references are directed to improving the results of optical character recognition, the combination would have yielded predictable results. As suggested by YUAN (Col. 5:60-67), such a combination would have been obvious since a pattern recognition rule might work well on certain types of data and may do poorly on other types of data.

Claims 6-8 depend from claims 2-4, respectively, but recite the same limitations as claim 5. Claims 6-8 are therefore rejected using the same reasoning described above.

Regarding Claim 9, YUAN in view of GURUPRASAD further teaches wherein the processor is configured to: output the character string entry rule for the classification attribute recognized as having a significant difference among a plurality of character string entry rules extracted from the character string of the confirmation result. (YUAN, “One embodiment of rule optimization involves sorting the selected rules in descending order by rule score, with the most important rules having the highest rules scores being applied first. In some embodiments the sorting or ordering can involve a decision tree, which can have one or more branches for potentially conflicting rules, where information such as the pre-conditions are used to determine whether to apply the rule at each node, and the ordering of the nodes is determined by the rule scores of the selected rules.” Col. 7:42-55. The rules with the highest confidence scores are output by the optimization engine first. The highest score rule would have a significant difference from a lower scored rule.)

Regarding Claim 10, YUAN teaches all the limitations of claim 1, on which claim 10 depends.
YUAN does not teach wherein the processor is configured to: specify whether or not a regularity related to the entry of a character string is extracted from the character string of the confirmation result, according to the number of character strings of the confirmation result collected for the item of the form.
However, GURUPRASAD, which is directed to improving recognition of OCR-extracted data, teaches wherein the processor is configured to: specify whether or not a regularity related to the entry of a character string is extracted from the character string of the confirmation result, according to the number of character strings of the confirmation result collected for the item of the form. (“Match is denoted as X and Mismatch is denoted as Y. Out of a sample number of 100 invoices, some are matched correctly and others are not. From the matched samples, the values of maximum and minimum confidences provided by the OCR engine are extracted. Similarly, for the Mismatch samples the values of maximum and minimum confidences provided by the OCR engine are extracted.” Paragraph 0036. The accuracy of an OCR recognition algorithm is based on a number of correctly matched results from ground truth character recognition.)
Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to modify the selection of a rule for improving the accuracy of recognized text taught by YUAN by determining whether there is a regularity based on a number of confirmed results, as taught by GURUPRASAD. Since both references are directed to improving the results of optical character recognition, the combination would have yielded predictable results. Such an implementation would improve the user experience by improving the accuracy of the recognition algorithm and reducing user frustration, as suggested by YUAN (background). As suggested by YUAN (Col. 5:60-67), such a combination would have been obvious since a pattern recognition rule might work well on certain types of data and may do poorly on other types of data.

Claims 11-13 depend from claims 2-4, respectively, but recite the same limitations as claim 10. Claims 11-13 are therefore rejected using the same reasoning described above.

Regarding Claim 14, YUAN in view of GURUPRASAD further teaches wherein the processor is configured to: in a case where the number of character strings of the confirmation result collected for the item of the form is equal to or greater than a number predetermined as a number from which the regularity is extracted, output the character string entry rule for the item whose number of character strings of the confirmation result is the predetermined number or greater. (GURUPRASAD, “in order to arrive an optimal threshold to determine the extraction correctness, the threshold indicates Green, which indicates developed system has trust on the extracted values so that the user (data entry person) need to check for its accuracy; and below the threshold means Red, that mean user has to look at the document and verify whether it is extracted correctly or not.” Paragraph 0035. Also see Paragraph 0039. The accuracy of a recognition algorithm, which in view of YUAN would include an extracted recognition rule, is based on a comparison to a threshold of confirmed results. If above the threshold, the algorithm is rated as Green and can be reliably used; otherwise, it is rated as Red and it is recommended to the user to verify the result.)

Regarding Claim 15, YUAN in view of GURUPRASAD further teaches wherein the processor is configured to: in a case where the number of character strings of the confirmation result collected for the item of the form is less than a number predetermined as a number from which the regularity is extracted, not output the character string entry rule for the item whose number of character strings of the confirmation result is less than the predetermined number. (GURUPRASAD, “in order to arrive an optimal threshold to determine the extraction correctness, the threshold indicates Green, which indicates developed system has trust on the extracted values so that the user (data entry person) need to check for its accuracy; and below the threshold means Red, that mean user has to look at the document and verify whether it is extracted correctly or not.” Paragraph 0035. Also see Paragraph 0039. The accuracy of a recognition algorithm, which in view of YUAN would include an extracted recognition rule, is based on a comparison to a threshold of confirmed results. If above the threshold, the algorithm is rated as Green and can be reliably used; otherwise, it is rated as Red and it is recommended to the user to verify the result.)

Regarding Claim 16, YUAN teaches all the limitations of claim 1, on which claim 16 depends.
YUAN does not teach wherein the processor is configured to: output a change notification encouraging a user to change the character string entry rule set to the item of the form according to a degree of correction with respect to the character string entered in the item of the form. 
However, GURUPRASAD teaches wherein the processor is configured to: output a change notification encouraging a user to change the character string entry rule set to the item of the form according to a degree of correction with respect to the character string entered in the item of the form. (“the threshold indicates Green, which indicates developed system has trust on the extracted values so that the user (data entry person) need to check for its accuracy; and below the threshold means Red, that mean user has to look at the document and verify whether it is extracted correctly or not” Paragraph 0035. The confidence value is equivalent to a degree of correction: lower confidence corresponds to a higher need for correction, while higher confidence corresponds to lower need for correction. The user is notified of the need for correction based on a color of the recognized value based on a threshold confidence level.)
Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to modify the selection of a rule for improving the accuracy of recognized text taught by YUAN by outputting a notification to a user that a result needs to be verified, as taught by GURUPRASAD. Since both references are directed to improving the results of optical character recognition, the combination would have yielded predictable results. Such an implementation would improve the user experience by ensuring that the correct data is recognized, reducing user frustration, as suggested by YUAN (background).

Regarding Claim 17, YUAN in view of GURUPRASAD further teaches wherein the processor is configured to: output the change notification in a case where the degree of correction in the item of the form has become equal to or greater than a degree predetermined from a standard degree. (GURUPRASAD, “In this scenario, for a new document, when invoice number field is extracted by OCR and OCR gives confidence above 72, it is marked as green, else it is marked red.” Paragraph 0039. The confidence in the recognition result being below a threshold is equivalent to the degree to which an item of a form needs to be corrected being above a threshold.)

Regarding Claim 18, YUAN in view of GURUPRASAD further teaches wherein the processor is configured to: output the change notification in a case where the degree of correction in an item of the form after setting a character string entry rule is included within a range predetermined from the degree of correction for the same item of the form before setting the character string entry rule. (GURUPRASAD, “Further, if we want to have three states namely Green, Yellow and Red, we can include tolerance limit t (say 3%), and consider Th+τ to Th−τ as Yellow. The selection of the approaches is based on the size and nature of the training data for learning purpose” Paragraphs 0049-50. The change notification is output based on a range of the confidence value of the recognition result.)
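
For illustration only, the Green/Red marking quoted from GURUPRASAD (Paragraph 0039) and the three-state variant (Paragraphs 0049-50) can be sketched as follows. This is a minimal sketch and is not taken from GURUPRASAD: the function name classify_confidence is hypothetical, and treating the tolerance as an absolute number of points on the same scale as the confidence is an assumption, since the quoted passage does not specify how the tolerance (say 3%) is applied.

```python
def classify_confidence(confidence, threshold=72.0, tolerance=0.0):
    """Map an OCR confidence value to a review state.

    With tolerance == 0, a confidence above the threshold is "Green" and
    anything else is "Red", following the quoted two-state example.
    With tolerance > 0, values from threshold - tolerance to
    threshold + tolerance (inclusive) are "Yellow". Treating the band
    limits as inclusive is an assumption made for this sketch.
    """
    if confidence > threshold + tolerance:
        return "Green"
    if confidence < threshold - tolerance:
        return "Red"
    # Inside the tolerance band, or exactly at the threshold when the
    # tolerance is zero (not "above" the threshold, so marked Red).
    return "Yellow" if tolerance > 0 else "Red"
```

Under these assumptions, a confidence of 80 is marked Green and a confidence of 72 is marked Red in the two-state case, while a confidence of 74 with a tolerance of 3 falls in the Yellow band.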


Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Jakubik (US 2018/0143957 A1) teaches detecting a format of a string of characters based on user input into a form field. (Abstract, Figs. 3-4)
Lehoux (US 9,910,566 B2) teaches providing other options for OCR for selection by a user when filling out a form, including listing the options by confidence score. (Figs. 6-8)
Agrawal (US 2016/0292505 A1) teaches using field rules to automatically select recognized text when there is a high confidence score, but otherwise suggesting to the user to verify the recognized text. (¶ 28)
Baudin (US 7,174,507 B2) teaches extraction rules based on input text that has a high degree of contextual regularity. (Column 10)

Any inquiry concerning this communication or earlier communications from the examiner should be directed to RAMI RAFAT OKASHA whose telephone number is (571) 272-0675. The examiner can normally be reached M-F 9-5 EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kieu Vu, can be reached at (571) 272-4057. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/RAMI R OKASHA/Examiner, Art Unit 2173