Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 03/31/2020 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.

Claim Objections
Claims 1, 3, 5, 7, 10-11, and 15 are objected to because of the following informalities:
In claim 1, line 3, “CNN” should read “convolutional neural network (CNN)”.
In claim 1, line 10, “YOLO CNN” should read “you only look once (YOLO) convolutional neural network (CNN)”.
In claim 1, line 11, “YOLO CNN” should read “you only look once (YOLO) convolutional neural network (CNN)”.
In claim 1, line 13, “FOIs” should read “fields of interest”.
In claim 1, line 15, “FOIs” should read “fields of interest”.
In claim 1, line 17, “FOIs” should read “fields of interest”.
In claim 1, line 18, “FOIs” should read “fields of interest”.
Claims 3, 5, 7, 11, and 15 contain similar informalities.
Appropriate correction is required.

In claim 10, line 3, “thereo” should read “thereof”.
Appropriate correction is required.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-17 are rejected under 35 U.S.C. 103 as being unpatentable over Baheti et al. (U.S. 20140168478 A1; Baheti) in view of Huang et al. (U.S. 20170147905 A1; Huang).


Regarding claims 1 and 11, Baheti discloses the claimed invention, an automated documentation intake (Paragraph 6: “an electronic device and method use multiple images of identical text that have one or more sizes, to improve text recognition.”) and processing system comprising:
one or more non-transitory media (Fig. 2, Memory 501) storing at least one of processor-executable (Fig. 2, processor(s) 404) instructions or data or one or more CNN models (Fig. 2 and Paragraph 40: “processor(s) 404 of such embodiments may be programmed to execute first instructions included in software 610 (see FIG. 9) to verify presence of text in a region of an image of a scene of real world captured by a camera (which implements means for determining).”); and
at least one processor (Fig. 2, processor(s) 404) communicatively coupled to the one or more non-transitory media (Fig. 2, Memory 501) (Fig. 2 shows the processor 404 connected to memory 501) and configured to:
receive one or more source images comprising one or more documents related to one or more individuals; (Fig.2, element 201: Capture multiple images at a plurality of zoom levels, of a scene containing text of one or more sizes)  
apply one or more pre-processing methods to render the source image(s) to input image(s) (Fig.2, act 211 and Paragraph 39: “Such regions are stored in memory on completion of act 211. Depending on the embodiment, act 211 may include skew correction of a plurality of regions (including one or more text regions).”); 
apply a model to render the input image(s) to one or more target image(s) (Fig. 2, operation 210; Fig. 3A, operation 210 and Paragraph 39: “In performing the operation 210, in an act 211 the processor(s) 404 apply a predetermined method (e.g. MSER) to identify regions of pixels in the image that are connected to one another and differ from surrounding pixels in one or more properties, such as intensity and/or color. Regions of the type described above may be similar or identical to regions known in the prior art as connected components, and/or maximally stable extremal regions or MSERs.”), wherein the model:
applies one or more filters to the input image(s), (Paragraph 64: “A value of an indicator of skew that is computed in operation 420 for each region is stored either individually (for each region) or in aggregate (across multiple regions), at a specific location in memory 501 … Certain embodiments of operation 420 may use statistical methods to compute mean or median of the multiple values, followed by filtering outliers among the multiple values, followed by re-computation of mean or median of the filtered values and comparison to a threshold (e.g. greater than ±5 degrees) to determine whether or not skew in the image as a whole is acceptable.”)
generates a probability map suggesting one or more FOIs present in the input image(s), (Fig. 3A, act 215 and Paragraph 52: “in act 215 (FIG. 3A), two or more regions may be clustered by processor(s) 404, based on a test of geometry (e.g. when projections of two regions on an axis (such as the x-axis) overlap and the two regions are adjacent to one another with no other region intervening there-between). The just-described clustering enables various modifiers (e.g. upper maatras and/or lower maatras) to be included with the specific characters that are modified by the modifiers (e.g. by being appending thereto)”; Paragraph 88: “an electronic device and method of the type described herein may provide the region as input to the OCR module 330, followed by storing in a memory 501 a result of the optical character recognition (e.g. one or more words recognized as present in the region, optionally with a probability indicative of confidence in the recognition”; “one or more words” is interpreted as target data.) and
classifies the one or more FOIs as comprising one or more of non-target data and target data; (Fig.3A, act 216 and Paragraph 52: “hereafter, processor(s) 404 perform an act 216 (FIG. 3A) to classify the regions (whether or not merged) as text or non-text, e.g. by use of a neural network and/or based on stroke width, which may be implemented in a text verification block 250 of the type illustrated in FIG. 4A.”).
However, Baheti does not disclose a YOLO CNN model, nor does Baheti disclose the steps to:
upsample the one or more FOIs; and
digitize the upsampled FOIs and extract any target data therein.
Huang discloses the claimed invention, an automated documentation intake and processing system (Paragraph 71: “Systems and methods of the present disclosure frame object detection as a regression problem and provide an end-to-end detection framework.”) comprising:
a YOLO CNN model (Paragraph 29: “YOLO also predicts bounding boxes and class probabilities directly from full images in one evaluation. All these methods use shared computation of convolutions, which has been attracting increased attention due to its relatively efficient and accurate visual recognition.”), configured to:
upsample the one or more FOIs; (Fig. 1, step 106 and Paragraph 32: “After several layers of convolution and pooling, feature map 106 is upsampled and convolution layers are applied to obtain final output 108.”) and
digitize the upsampled FOIs and extract any target data therein. (Fig. 1, step 108 and Paragraph 32: “output feature 108 map is converted to bounding boxes 120, and non-maximum suppression is applied to bounding boxes 120 that exceed a threshold.”)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to incorporate the systems and methods for end-to-end object detection of Huang into the electronic device and method of Baheti, which use multiple images of identical text that have one or more sizes to improve text recognition, in order to achieve efficient and accurate visual recognition in images.
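For illustration only, the post-processing quoted from Huang, Paragraph 32 (output converted to bounding boxes, with non-maximum suppression applied to boxes that exceed a threshold) can be sketched as a generic greedy non-maximum suppression. This is not code from either reference. The box layout `(x1, y1, x2, y2, score)`, the function names, and both threshold values are assumptions chosen for the sketch.

```python
from typing import List, Tuple

# Hypothetical box layout: (x1, y1, x2, y2, score), with x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(
    boxes: List[Box],
    score_threshold: float = 0.5,  # assumed value, not taken from Huang
    iou_threshold: float = 0.5,    # assumed value, not taken from Huang
) -> List[Box]:
    """Keep boxes above the score threshold, then greedily drop overlaps."""
    # Only boxes whose score exceeds the threshold are considered.
    candidates = sorted(
        (b for b in boxes if b[4] > score_threshold),
        key=lambda b: b[4],
        reverse=True,
    )
    kept: List[Box] = []
    for box in candidates:
        # Keep a box only if it does not overlap too much with a stronger one.
        if all(iou(box, k) <= iou_threshold for k in kept):
            kept.append(box)
    return kept
```

In such a sketch, two heavily overlapping detections of the same field collapse into the higher-scoring one, while a detection elsewhere on the page is retained.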

Regarding claims 2 and 12, Baheti, as modified by Huang, discloses the claimed invention. Baheti further discloses that the one or more pre-processing methods are selected from the group consisting of: segmentation, denoising, grayscaling, deskewing, scaling, sharpen, threshold, smoothen, rotation, and affine transformation. (Paragraph 36: “implemented by one or more processors 404 included in a mobile device 401 (FIG. 9) that is capable of identifying rectangular portions (or blocks) of an image of a real world scene, followed by segmentation of each rectangular portion (or block) to form a sequence of sub-blocks and identify a character in each sub-block”; Paragraph 54: “Then, in act 335 mobile device 401 checks if the variance is less than a predetermined threshold, and if so, the region's classification as text has been verified.”; and Paragraph 58: “The amount of zoom may be automatically calculated by such an artifact classifier 320 based on scaling up a current height of the text region to reach a predetermined number of pixels.”)

Regarding claim 3, Baheti, as modified by Huang, discloses the claimed invention. Huang further discloses that the YOLO CNN model is trained utilizing a set of predetermined training inputs having one or more known field descriptors. (Fig. 1 and Paragraph 32: “In embodiments, pipeline 100 receives input image 112 or image pyramid that is fed to network 104. After several layers of convolution and pooling, feature map 106 is upsampled and convolution layers are applied to obtain final output 108. In embodiments, output feature 108 map is converted to bounding boxes 120, and non-maximum suppression is applied to bounding boxes 120 that exceed a threshold.”)

Regarding claims 4 and 13, Baheti, as modified by Huang, discloses the claimed invention. Baheti further discloses that the target data is associated with a confidence score derived from corresponding probability map data. (Paragraph 88: “an electronic device and method of the type described herein may provide the region as input to the OCR module 330, followed by storing in a memory 501 a result of the optical character recognition (e.g. one or more words recognized as present in the region, optionally with a probability indicative of confidence in the recognition”; “one or more words” is interpreted as target data.)

Regarding claim 5, Baheti, as modified by Huang, discloses the claimed invention. Baheti further discloses that each FOI and extracted target data is associated with a unique key. (Paragraph 96: “a mobile device 401 may detect characters of text in images, in implementations that operate the OCR module 330 to identify, e.g. characters of Devanagari alphabet in an image. Any one or more character decoders, word dictionary and feedback module may be implemented in software (executed by one or more processors or processor cores) or in hardware or in firmware, or in any combination thereof.”)

Regarding claims 6 and 14, Baheti, as modified by Huang, discloses the claimed invention. Huang further discloses that the processor is additionally configured to receive corrected target data and adjust a corresponding confidence score. (Fig. 1, Paragraph 35, and Paragraph 36, as shown in the images below)
[Image: media_image1.png]

[Image: media_image2.png]


Regarding claims 7 and 15, Baheti, as modified by Huang, discloses the claimed invention. Huang further discloses that the processor is additionally configured to update the YOLO CNN model based on received corrected target data according to one or more learning optimization. (Paragraph 30: “Object detection often involves multi-task learning, such as landmark localization, pose estimation, and semantic segmentation. Some propose a tree structure model for joint face detection, pose estimation, and landmark localization. Some deep net based object detection designs integrate multi-task learning,” and Paragraph 57: “Hard Negative Mining. Analogous to the hard-negative mining procedure in SVM learning methods, in embodiments, learning is made more efficient by searching badly predicted samples rather than searching random samples. After negative mining, the badly predicted samples are relatively more likely to be selected, such that gradient descent learning on those samples reduces noise and, thus, leads to more robust prediction.”)

Regarding claims 8 and 16, Baheti, as modified by Huang, discloses the claimed invention. Baheti further discloses that the one or more filters correspond to one or more known form fields. (Paragraph 64: “A value of an indicator of skew that is computed in operation 420 for each region is stored either individually (for each region) or in aggregate (across multiple regions), at a specific location in memory 501 a skew count for the entire image each time a region is marked as skew-present … Certain embodiments of operation 420 may use statistical methods to compute mean or median of the multiple values, followed by filtering outliers among the multiple values, followed by re-computation of mean or median of the filtered values and comparison to a threshold (e.g. greater than ±5 degrees) to determine whether or not skew in the image as a whole is acceptable.”; “a value” is interpreted as known form fields.)

Regarding claims 9 and 17, Baheti, as modified by Huang, discloses the claimed invention. Baheti further discloses that the one or more filters correspond to one or more form field types. (Paragraph 64: “A value of an indicator of skew that is computed in operation 420 for each region is stored either individually (for each region) or in aggregate (across multiple regions), at a specific location in memory 501 a skew count for the entire image each time a region is marked as skew-present … Certain embodiments of operation 420 may use statistical methods to compute mean or median of the multiple values, followed by filtering outliers among the multiple values, followed by re-computation of mean or median of the filtered values and comparison to a threshold (e.g. greater than ±5 degrees) to determine whether or not skew in the image as a whole is acceptable.”; “a value” is interpreted as known form field types.)

Regarding claim 10, Baheti, as modified by Huang, discloses the claimed invention. Baheti further discloses that the one or more form field types are industry-specific and comprise known fields related thereof. (Paragraph 64: “A value of an indicator of skew that is computed in operation 420 for each region is stored either individually (for each region) or in aggregate (across multiple regions), at a specific location in memory 501 a skew count for the entire image each time a region is marked as skew-present … Certain embodiments of operation 420 may use statistical methods to compute mean or median of the multiple values, followed by filtering outliers among the multiple values, followed by re-computation of mean or median of the filtered values and comparison to a threshold (e.g. greater than ±5 degrees) to determine whether or not skew in the image as a whole is acceptable.”; “the filtered values and comparison to a threshold to determine whether or not skew in the image as a whole is acceptable” is interpreted as form field types that are industry-specific and comprise known fields.)
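For illustration only, the statistical procedure quoted from Baheti, Paragraph 64 (compute a mean or median of per-region skew values, filter outliers, recompute, and compare to a threshold such as ±5 degrees) can be sketched as follows. This is not code from the reference. The function name, the use of the median rather than the mean, and the outlier rule (a fixed margin around the first median) are assumptions, since the quoted passage does not specify how outliers are identified.

```python
from statistics import median
from typing import Sequence


def skew_is_acceptable(
    skews_deg: Sequence[float],
    outlier_margin_deg: float = 10.0,  # assumed outlier rule, not from Baheti
    threshold_deg: float = 5.0,        # the "±5 degrees" example in Paragraph 64
) -> bool:
    """Return True when the aggregate skew of the image is within the threshold."""
    if not skews_deg:
        raise ValueError("at least one per-region skew value is required")
    # First pass: median of all per-region skew values.
    first_median = median(skews_deg)
    # Filter outliers: drop values far from the first-pass median.
    filtered = [s for s in skews_deg if abs(s - first_median) <= outlier_margin_deg]
    if not filtered:
        # Fall back to the unfiltered values if the margin removed everything.
        filtered = list(skews_deg)
    # Second pass: recompute the median on the filtered values and compare.
    return abs(median(filtered)) <= threshold_deg
```

Under these assumptions, a single badly skewed region does not by itself cause the whole image to be treated as skewed, which is the effect the two-pass computation in the quoted passage appears to aim for.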


Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Tang et al. (U.S. 10,157,331 B1), “Systems And Methods For Image Preprocessing To Improve Accuracy Of Object Recognition”, teaches systems and methods for preprocessing an image for object recognition that result in improved accuracy when images having a plurality of objects are fed into an object recognition algorithm.
Ragnet et al. (U.S. 2011/0078191 A1), “Handwritten Document Categorizer And Method Of Training”, teaches a method for training a categorizer for assigning handwritten documents to one of a set of predetermined classes based on identification of certain keywords in the documents. The exemplary categorizer training method relies on a collection of both handwritten and typed training documents, each manually labeled with a label corresponding to a respective one of a finite set of categories.
Jacobs et al. (U.S. 2005/0259866 A1), “Low Resolution OCR For Camera Acquired Documents”, teaches a global optimization framework for optical character recognition (OCR) of low-resolution photographed documents that combines a binarization-type process, segmentation, and recognition into a single process.
Bach et al. (U.S. 2018/0018553 A1), “Relevance Score Assignment For Artificial Neural Networks”, teaches the task of relevance score assignment to

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Duy A Tran, whose telephone number is (571) 272-4887. The examiner can normally be reached Monday-Friday, 8:00 am - 5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Edward F Urban, can be reached at (571) 272-7899. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/DUY TRAN/ Examiner, Art Unit 2665
/BOBBAK SAFAIPOUR/ Primary Examiner, Art Unit 2665