DETAILED ACTION
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claim 1 is rejected under 35 U.S.C. 102(a)(1) as being anticipated by Sarshogh et al. (A Multi-Task Network for Localization and Recognition of Text in Images, see IDS).
Regarding claim 1, Sarshogh discloses an optical character recognition (OCR) processing method (Section I Optical Character Recognition), the method comprising: 
generating, by a processor, a plurality of intermediate feature layers of an image using convolutional neural network (CNN) processing (Fig. 2 & Section III Mask R-CNN; model has three heads: a localizer (bounding box regression), a text classifier (text or background), and a text recognition network (TRN) head; We replace ResNet [9] in the backbone, with a customized shallow Densely Connected Network (DenseNet); Image features are extracted through a shared convolutional backbone consisting of a shallow DenseNet and a Feature Pyramid Network); 
for each intermediate feature layer, generating, by the processor, a plurality of text proposals using a region proposal network (RPN), each text proposal comprising a portion of the intermediate feature layer that is predicted to contain text (Fig. 2 Region Proposal Layer; Section III B. the region proposal network will predict not just spatial coordinates for a box’s center, width, and height (cx; cy;w; h), but additionally angle of rotation); 
performing, by the processor, OCR processing on image data within a plurality of regions of the image, each region comprising at least one of the text proposals, to generate a text result for each region (Section III B. & C. uses generated features from the convolutional backbone as TRN’s input. This provides a richer feature set for TRN and, as a result, it is possible to accomplish the recognition task with a shallower network. More importantly, the OCR’s accuracy will be propagated to the backbone and adjust the features accordingly); and 
assembling, by the processor, the text results into a text string comprising the text results ordered according to a spatial order in which the plurality of regions appear within the image (Section III C. “This representation for each RoI is then input into the recognition module, the output of which is a predicted sequence over the model’s alphabet. The model’s alphabet contains A distinct tokens comprising letters, numbers, and a variety of punctuation and special characters”; see further Fig. 3 wherein the predicted text shown in red is ordered according to the spatial order of the regions).

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 7 and 14-16 are rejected under 35 U.S.C. 103 as being unpatentable over Sarshogh et al. (A Multi-Task Network for Localization and Recognition of Text in Images, see IDS) in view of Dabet (US 20080267505).
Regarding claim 7, Sarshogh discloses a method comprising:
generating, by a processor, a plurality of intermediate feature layers of an image using convolutional neural network (CNN) processing (Fig. 2 & Section III Mask R-CNN; model has three heads: a localizer (bounding box regression), a text classifier (text or background), and a text recognition network (TRN) head; We replace ResNet [9] in the backbone, with a customized shallow Densely Connected Network (DenseNet); Image features are extracted through a shared convolutional backbone consisting of a shallow DenseNet and a Feature Pyramid Network); 
for each intermediate feature layer, generating, by the processor, a plurality of text proposals using a region proposal network (RPN), each text proposal comprising a portion of the intermediate feature layer that is predicted to contain text (Fig. 2 Region Proposal Layer; Section III B. the region proposal network will predict not just spatial coordinates for a box’s center, width, and height (cx; cy;w; h), but additionally angle of rotation); 
performing, by the processor, OCR processing on image data within a plurality of regions of the image, each region comprising at least one of the text proposals, to generate a text result for each region (Section III B. & C. uses generated features from the convolutional backbone as TRN’s input. This provides a richer feature set for TRN and, as a result, it is possible to accomplish the recognition task with a shallower network. More importantly, the OCR’s accuracy will be propagated to the backbone and adjust the features accordingly);  
assembling, by the processor, the text results into a text string comprising the text results ordered according to a spatial order in which the plurality of regions appear within the image (Section III C. “This representation for each RoI is then input into the recognition module, the output of which is a predicted sequence over the model’s alphabet. The model’s alphabet contains A distinct tokens comprising letters, numbers, and a variety of punctuation and special characters”; see further Fig. 3 wherein the predicted text shown in red is ordered according to the spatial order of the regions).
Sarshogh fails to specifically teach an automatic form filling method, the method comprising: determining, by the processor, a meaning of at least a portion of the text string; and inserting, by the processor, the at least the portion of the text string into a portion of a form corresponding to the meaning.
Dabet teaches an automatic form filling method (abstract), the method comprising: 
determining, by the processor, a meaning of at least a portion of the text string (¶75 candidate segments of text 82 are tagged--e.g., labeled with a reference to the corresponding field of the form 14); and 
inserting, by the processor, the at least the portion of the text string into a portion of a form corresponding to the meaning (¶76 At S122, the form 14 may be populated with the OCR extracted information according to the field designations).
Therefore, it would have been obvious to one with ordinary skill in the art before the effective filing date of the invention to have implemented the teaching of an automatic form filling method, the method comprising: determining, by the processor, a meaning of at least a portion of the text string; and inserting, by the processor, the at least the portion of the text string into a portion of a form corresponding to the meaning from Dabet into the method as disclosed by Sarshogh. The motivation for doing this is to improve methods for populating a form with textual information extracted from a physical document.

Regarding claim 14, the combination of Sarshogh and Dabet teaches the method of claim 7, wherein determining the meaning comprises tagging the at least the portion of the text string with a tag indicating the meaning (Dabet ¶74-75 candidate segments of text 82 are tagged--e.g., labeled with a reference to the corresponding field of the form 14). 
Therefore, it would have been obvious to one with ordinary skill in the art before the effective filing date of the invention to have implemented the teaching of wherein determining the meaning comprises tagging the at least the portion of the text string with a tag indicating the meaning from Dabet into the method as disclosed by Sarshogh. The motivation for doing this is to improve methods for populating a form with textual information extracted from a physical document.

Regarding claim 15, the combination of Sarshogh and Dabet teaches the method of claim 14, wherein the inserting comprises inserting the at least the portion of the text string into an entry field in the form associated with the tag (Dabet ¶75-76 At S122, the form 14 may be populated with the OCR extracted information according to the field designations). 
Therefore, it would have been obvious to one with ordinary skill in the art before the effective filing date of the invention to have implemented the teaching of wherein the inserting comprises inserting the at least the portion of the text string into an entry field in the form associated with the tag from Dabet into the method as disclosed by Sarshogh. The motivation for doing this is to improve methods for populating a form with textual information extracted from a physical document.
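For illustration only, the tagging and insertion steps mapped above for claims 7, 14 and 15 can be sketched as follows. This sketch is not taken from Dabet or from the application. The tag names, the regular-expression rules, and the functions tag_segments and populate_form are hypothetical and are assumed solely to show one way a tagged text segment could be placed in the form field associated with its tag.

```python
import re

# Hypothetical tagging rules (assumed for this sketch only): each tag names a
# form field and is paired with a pattern taken to indicate that meaning.
TAG_RULES = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "phone": re.compile(r"\(?\d{3}\)?[ -]?\d{3}-\d{4}"),
}


def tag_segments(text_string):
    """Return (tag, segment) pairs for portions of the text string matching a rule."""
    tagged = []
    for tag, pattern in TAG_RULES.items():
        for match in pattern.finditer(text_string):
            tagged.append((tag, match.group(0)))
    return tagged


def populate_form(form, tagged_segments):
    """Insert each tagged segment into the empty form field associated with its tag."""
    filled = dict(form)  # leave the caller's form untouched
    for tag, segment in tagged_segments:
        if tag in filled and filled[tag] is None:
            filled[tag] = segment
    return filled


if __name__ == "__main__":
    ocr_text = "Jane Doe jane.doe@example.com (571) 555-0100"
    empty_form = {"email": None, "phone": None}
    print(populate_form(empty_form, tag_segments(ocr_text)))
```

In this sketch the tag doubles as the form field key, which mirrors the field designations relied upon in Dabet ¶75-76 only in a simplified way.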

Regarding claim 16 (drawn to a system):
The proposed combination of Sarshogh and Dabet, explained in the rejection of method claim 7, renders obvious the steps of the system of claim 16 because these steps occur in the operation of the proposed combination as discussed above. Thus, arguments similar to those presented above for claim 7 are equally applicable to claim 16.

Claims 2-4 are rejected under 35 U.S.C. 103 as being unpatentable over Sarshogh as applied to claim 1 above, and further in view of Al-Omari et al. (US 20120219220).
Claims 8-10 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Sarshogh and Dabet as applied to claims 7 and 16 above, and further in view of Al-Omari et al. (US 20120219220).
Regarding claim 2, Sarshogh discloses the method of claim 1, but fails to teach forming, by the processor, at least one of the plurality of regions by merging at least two text proposals with one another. 
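Al-Omari teaches forming, by the processor, at least one of the plurality of regions by merging at least two text proposals with one another (¶29 plurality of components associated with each column are then merged based on the first set of characteristic parameters to form one or more of one or more sub-words and one or more words).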
Therefore, it would have been obvious to one with ordinary skill in the art before the effective filing date of the invention to have implemented the teaching of forming, by the processor, at least one of the plurality of regions by merging at least two text proposals with one another from Al-Omari into the method as disclosed by Sarshogh. The motivation for doing this is to improve optical character recognition processing.
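For illustration only, the merging of text proposals recited in claim 2 can be sketched as follows. This sketch is not taken from Al-Omari or from the application. The box representation, the max_gap threshold, and the functions horizontally_aligned, union, and merge_proposals are hypothetical and are assumed solely to show how two neighboring proposals could be combined into a single region.

```python
from typing import List, Tuple

# A box is (x_min, y_min, x_max, y_max) in image pixel coordinates (assumed layout).
Box = Tuple[float, float, float, float]


def horizontally_aligned(a: Box, b: Box, max_gap: float) -> bool:
    """True when two boxes overlap vertically and lie within max_gap horizontally."""
    vertical_overlap = min(a[3], b[3]) - max(a[1], b[1])
    horizontal_gap = max(a[0], b[0]) - min(a[2], b[2])
    return vertical_overlap > 0 and horizontal_gap <= max_gap


def union(a: Box, b: Box) -> Box:
    """Smallest box that encloses both inputs."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def merge_proposals(boxes: List[Box], max_gap: float = 10.0) -> List[Box]:
    """Repeatedly merge aligned boxes until no remaining pair qualifies."""
    regions = list(boxes)
    merged = True
    while merged:
        merged = False
        for i in range(len(regions)):
            for j in range(i + 1, len(regions)):
                if horizontally_aligned(regions[i], regions[j], max_gap):
                    regions[i] = union(regions[i], regions[j])
                    del regions[j]
                    merged = True
                    break
            if merged:
                break  # restart the scan because the list changed
    return regions
```

A vertical counterpart could be written by swapping the roles of the x and y coordinates in horizontally_aligned; that variant is likewise an assumption of this sketch.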

Regarding claim 3, the combination of Sarshogh and Al-Omari teaches the method of claim 2, wherein the merging comprises: identifying at least two horizontally-aligned text proposals or merged regions and merging them into a first region (Al-Omari ¶81 the rectangles defining each column overlap in the horizontal and/or vertical direction); identifying at least two vertically-aligned text proposals or merged regions and merging them into a second region (Al-Omari ¶81 the rectangles defining each column overlap in the horizontal and/or vertical direction); and merging the first region and the second region (Al-Omari Fig. 14-15 & ¶82 FIG. 15 shows an exemplary technique of analyzing one or more components, such as regions "text3a"-"text3e," which are then merged to be processed as region "text3," as shown in FIG. 14.). 
Therefore, it would have been obvious to one with ordinary skill in the art before the effective filing date of the invention to have implemented the teaching of wherein the merging comprises: identifying at least two horizontally-aligned text proposals or merged regions and merging them into a first region; identifying at least two vertically-aligned text proposals or merged regions and merging them into a second region; and merging the first region and the second region from Al-Omari into the method as disclosed by Sarshogh. The motivation for doing this is to improve optical character recognition processing.

Regarding claim 4, Sarshogh discloses the method of claim 1, but fails to teach determining, by the processor, the spatial order by determining coordinates for at least a portion of each region. 
Al-Omari teaches determining, by the processor, the spatial order by determining coordinates for at least a portion of each region (¶82 The columns or regions are sorted based on their location on the page (e.g., their coordinates) to help stream the recognized resulting text in the right order). 
Therefore, it would have been obvious to one with ordinary skill in the art before the effective filing date of the invention to have implemented the teaching of determining, by the processor, the spatial order by determining coordinates for at least a portion of each region from Al-Omari into the method as disclosed by Sarshogh. The motivation for doing this is to improve optical character recognition processing.
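For illustration only, determining a spatial order from region coordinates, as mapped above for claim 4, can be sketched as follows. This sketch is not taken from Al-Omari or from the application. The region tuple layout, the line_tolerance parameter, and the function assemble_text are hypothetical and are assumed solely to show one coordinate-based ordering (top to bottom, then left to right).

```python
from typing import List, Tuple

# Each region is (x_min, y_min, x_max, y_max, text); this layout is assumed.
Region = Tuple[float, float, float, float, str]


def assemble_text(regions: List[Region], line_tolerance: float = 10.0) -> str:
    """Order regions top to bottom, then left to right, and join their text."""
    ordered = sorted(regions, key=lambda r: r[1])  # sort by top edge
    lines: List[List[Region]] = []
    for region in ordered:
        # A region joins the current line when its top edge is close to the
        # top edge of the first region placed on that line.
        if lines and abs(region[1] - lines[-1][0][1]) <= line_tolerance:
            lines[-1].append(region)
        else:
            lines.append([region])
    words = []
    for line in lines:
        for region in sorted(line, key=lambda r: r[0]):  # sort by left edge
            words.append(region[4])
    return " ".join(words)
```

Grouping by a tolerance rather than sorting on raw (y, x) pairs keeps regions on the same visual line together even when their top edges differ by a few pixels.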

Regarding claims 8-10 (drawn to a method):
The proposed combination of Sarshogh, Dabet and Al-Omari, explained in the rejection of method claims 2-4, renders obvious the steps of the method of claims 8-10 because these steps occur in the operation of the proposed combination as discussed above. Thus, arguments similar to those presented above for claims 2-4 are equally applicable to claims 8-10.

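Regarding claim 17 (drawn to a system):
The proposed combination of Sarshogh, Dabet and Al-Omari, explained in the rejection of method claim 2, renders obvious the steps of the system of claim 17 because these steps occur in the operation of the proposed combination as discussed above. Thus, arguments similar to those presented above for claim 2 are equally applicable to claim 17.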
	
Claims 5-6 are rejected under 35 U.S.C. 103 as being unpatentable over Sarshogh as applied to claim 1 above, and further in view of Metallinou et al. (US Patent 10515625 B1).
Claims 11-13 and 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Sarshogh and Dabet as applied to claims 7 and 16 above, and further in view of Metallinou et al. (US Patent 10515625 B1).
Regarding claim 5, Sarshogh discloses the method of claim 1 but fails to teach performing, by the processor, named entity recognition (NER) processing on the text string to determine a meaning of at least a portion of the text string. 
Metallinou teaches performing, by the processor, named entity recognition (NER) processing on the text string to determine a meaning of at least a portion of the text string (col 19 lines 1-16 NLU system 204 may include a named entity recognition ("NER") system 272, which may be used to identify portions of text that correspond to a named entity recognizable by NLU system 204). 
Therefore, it would have been obvious to one with ordinary skill in the art before the effective filing date of the invention to have implemented the teaching of performing, by the processor, named entity recognition (NER) processing on the text string to determine a meaning of at least a portion of the text string from Metallinou into the method as disclosed by Sarshogh. The motivation for doing this is to improve optical character recognition processing.

Regarding claim 6, the combination of Sarshogh and Metallinou teaches the method of claim 5, wherein the NER processing includes utilizing a bidirectional long short-term memory (Bi-LSTM) network to process the text string (Metallinou col 40 lines 8-15 In some embodiments, the model 250b may be a particular form of LSTM neural network, known as a Bidirectional LSTM ("Bi-LSTM")). 


Regarding claims 11-12 (drawn to a method):
The proposed combination of Sarshogh, Dabet and Metallinou, explained in the rejection of method claims 5-6, renders obvious the steps of the method of claims 11-12 because these steps occur in the operation of the proposed combination as discussed above. Thus, arguments similar to those presented above for claims 5-6 are equally applicable to claims 11-12.

Regarding claim 13, the combination of Sarshogh, Dabet and Metallinou teaches the method of claim 12, wherein the Bi-LSTM network processes the text string at a character level, a word level, or both (Metallinou col 38 lines 6-20 The NLU system 204 can process the text data and other model input data using a multi-modal model, such as one of the models 250a or 250b; col 40 lines 8-15 In some embodiments, the model 250b may be a particular form of LSTM neural network, known as a Bidirectional LSTM ("Bi-LSTM")).
Therefore, it would have been obvious to one with ordinary skill in the art before the effective filing date of the invention to have implemented the teaching of wherein the Bi-LSTM network processes the text string at a character level, a word level, or both from Metallinou into the method as disclosed by the combination of Sarshogh and Dabet. The motivation for doing this is to improve optical character recognition processing.


Regarding claims 18-20 (drawn to a system):
The proposed combination of Sarshogh, Dabet and Metallinou, explained in the rejection of method claims 11-13, renders obvious the steps of the system of claims 18-20 because these steps occur in the operation of the proposed combination as discussed above. Thus, arguments similar to those presented above for claims 11-13 are equally applicable to claims 18-20.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to KEVIN KY whose telephone number is (571)272-7648. The examiner can normally be reached Monday-Friday 9-5PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Chan Park, can be reached at 571-272-7409. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like 





/KEVIN KY/Primary Examiner, Art Unit 2669