DETAILED ACTIONS
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 5 and 10 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim 5 recites the limitation “the name validator” in line 5. There is insufficient antecedent basis for this limitation in the claim. Claim 1, upon which claim 5 depends, does not mention a name validator, but claim 4 does in lines 1-2. For the purpose of furthering prosecution, examiner has interpreted claim 5 to be dependent upon claim 4.
Claim 10 recites the limitation “the name validator” in line 4. There is insufficient antecedent basis for this limitation in the claim. The claim upon which claim 10 depends does not mention a name validator, but claim 9 does in line 2. For the purpose of furthering prosecution, examiner has interpreted claim 10 to be dependent upon claim 9.


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1-11 are rejected under 35 U.S.C. 103 as being unpatentable over Flament et al. (US 20190019020 A1), hereinafter referred to as Flament, in view of Guo et al. (US 20040042659 A1), hereinafter referred to as Guo, in further view of Thrasher et al. (US 20160028921 A1), hereinafter referred to as Thrasher.

Regarding claim 1, Flament teaches a method for document classification and text information extraction (Title), the method comprising: 
receiving (para. 0009, “receive an input image from an input/output device to which it is communicatively connected by a network”), via one or more hardware processors (Fig. 2, CPU 260), a scanned image of a document (para. 0025, “I/O device 110 provides images (e.g., photographs, scanned documents, faxes, etc.) to a server 120”); 
detecting (para. 0033, “uses a convolutional neural network 322 to identify regions within input images where the desired types of information (e.g., text, images, signatures, etc.) are found”), via one or more hardware processors (Fig. 2, CPU 260), a Region of Interest (ROI) in the scanned image (para. 0025, “I/O device 110 provides images (e.g., photographs, scanned documents, faxes, etc.) to a server 120”) by marking a ROI bounding box on the scanned image using a text detection engine based-ROI technique that locates corner coordinates of a ROI bounding box defining the ROI (para. 0009, “a set of convolutional operations on the input image to produce one or more heat maps or bounding boxes”); 
classifying, via the one or more hardware processors (Fig. 2, CPU 260), the ROI into a document type among a plurality of document types using a trainable Deep Learning based multi-layered Neural Network (NN) classification model (para. 0010, “fully convolutional neural network is then trained to recognize the feature types using the provided pieces of information as expected outputs of processing the associated simulated document images”, para. 0040, “if the image is classified as a driver's license, a set of numbers may be interpreted as a license number associated with a particular driver, rather than as an invoice number or some other type of number.”, the image is being classified as to what type of document it is either a driver’s license or an invoice); 
applying, via the one or more hardware processors (Fig. 2, CPU 260), multistage pre-processing (para. 0009, “processor may be configured to perform preprocessing (e.g., conversion of colors to grayscale) on each input image prior to providing the preprocessed image to the fully convolutional neural network”, Fig. 4, step 404) on the classified ROI to remove background noise (para. 0042, “it may be helpful to identify a security feature or other type of background in order to be able to remove, minimize or otherwise account for this background so that text or other features in the document are more easily recognizable”), 
applying, via the one or more hardware processors (Fig. 2, CPU 260), a text detection technique on the second level pre-processed image (Fig. 4, step 404, the input image has been preprocessed) to mark a plurality of bounding boxes around text information in the second level pre-processed image (para. 0038, Fig. 5, “A second one of the heat maps (530) shows the areas or “bounding boxes” (532) of the license in which text is found.”), wherein each of the plurality of bounding boxes are identified by spatial positions defined by corner coordinates and corresponding height and width (Fig. 5, 530, the size of the bounding boxes corresponds to the size of the text), and wherein one or more bounding boxes are clubbed based on a spatial proximity criteria (para. 0039, “the heat maps may indicate areas that are bounded by distinct lines (bounding boxes), where the portion of the image within a bounding box has an above-threshold likelihood of having the particular feature type and the portion of the image outside the bounding box has a below-threshold likelihood of having the particular feature type”, threshold is the criteria, para. 0039, “the image regions as clear lines (e.g., bounding boxes 532) dividing the relevant areas (photos and text) from other areas”); 
extracting (para. 0025, “trained convolutional neural network can then identify areas within the input images received from I/O device 110 which contain text, images, signatures and/or other information, and extract the corresponding information (e.g., using optical character recognition to recognize text within the text areas)”), via the one or more hardware processors (Fig. 2, CPU 260), text information from each of the plurality of bounding boxes by applying OCR (Fig. 4, step 408, recognize content in identified areas by performing optical character recognition, para. 0009, “the processor then extracts information of the associated feature type from an area of the input image indicated by the corresponding one of the heat maps or bounding boxes”); and 
determining, via the one or more hardware processors (Fig. 2, CPU 260), contextual relationship among the extracted text information (para. 0040, “recognized characters in the text areas, or text bounding boxes, can then be processed to identify, derive, interpret, extract, and/or infer meaningful information within the text (410). For example, if the characters include “DOB” followed by a set of numbers, the numbers may be interpreted as a birthdate, or if the characters include “HAIR: BRN”, the characters may be interpreted to indicate that the associated person's hair is brown”) and refining the extracted text information based on configuration rules for the document type (para. 0040, “if the image is classified as a driver's license, a set of numbers may be interpreted as a license number associated with a particular driver, rather than as an invoice number or some other type of number”, if the image is classified as a driver’s license then the set of numbers is configured to be interpreted as a license number rather than another type of number).

Flament does not explicitly disclose the multistage pre-processing comprising:
reading the ROI into a Red Green Blue (RGB) color space and flattening the ROI; 
performing unsupervised clustering by applying K-means clustering on a plurality of pixels of the ROI in the RGB color space to generate a plurality of color clusters; 
obtaining a plurality of centroids, of each of the plurality of color clusters, wherein each centroid represents a unique color associated with each of the plurality of clusters; 
converting the centroids from RGB color space to Hue Saturation Value (HSV) space; 
generating a plurality of color masks corresponding to the plurality of clusters, wherein each color mask is generated based on a) the unique color associated with a centroid among the plurality of centroids and b) range of HSV color space defined around the centroid; 
applying each of the plurality of color masks to the ROI to obtain a plurality of binary ROI images, wherein each of the plurality of binary ROI image comprises one or more contours indicating spatial locations of one or more pixels among the plurality of pixels in the ROI that belong to the unique color of the centroid; 
identifying in each of the plurality of binary ROI images, one or more contours of interest from among the plurality of contours, wherein the one or more contours of interest are a) closed contours and b) have size above a predefined contour size; 
performing a subtraction, of the one or more of contours of interest identified for each of the plurality of binary ROI images, from the ROI in accordance to spatial positions of pixels of the one or more contours of interest identified for each of the plurality of binary ROI images, wherein the subtraction eliminates the background noise while retaining information of interest to generate a first level pre-processed image; and 
applying thresholding on the first level pre-processed image to obtain a second level pre-processed image using a threshold value derived dynamically from a histogram of the first level pre-processed image.
	However, Guo teaches the multistage pre-processing (Fig. 4) comprising of:
reading the ROI into a Red Green Blue (RGB) color space and flattening the ROI (para. 0026, “The RGB (red, green, and blue) color space is first preferably transformed into an HSV (hue, saturation, and intensity value) color space to best distinguish color features (step 100)”, the image is originally in RGB color space); 
performing unsupervised clustering by applying K-means clustering on a plurality of pixels of the ROI in the RGB color space to generate a plurality of color clusters (para. 0041, “fuzzy K-mean clustering is performed to the transformed image to group pixels having feature vectors close to one another in the feature space into blocks”, para. 0026, “texture features are obtained for each color channel (step 110)”); 
obtaining a plurality of centroids, of each of the plurality of color clusters, wherein each centroid represents a unique color associated with each of the plurality of clusters (para. 0057, “Each cluster can be represented by the mean feature vector of all its fuzzy members as a cluster center. To obtain good segmentation, the members of each cluster should be as close to the cluster center as possible, and the cluster centers should be well separated.”, para. 0026, “texture features are obtained for each color channel (step 110)”); 
converting the centroids from RGB color space to Hue Saturation Value (HSV) space (para. 0026, “The RGB (red, green, and blue) color space is first preferably transformed into an HSV (hue, saturation, and intensity value) color space to best distinguish color features (step 100)”); 
generating a plurality of color masks corresponding to the plurality of clusters, wherein each color mask is generated based on a) the unique color associated with a centroid among the plurality of centroids and b) range of HSV color space defined around the centroid (para. 0057, “the fuzzy k-mean clustering technique is used to group pixels with similar feature vectors that are close to one another in the feature space. As a result of this processing, k clusters of feature data points, which depicts perceptually different regions in the image, are generated. Each cluster can be represented by the mean feature vector of all its fuzzy members as a cluster center. To obtain good segmentation, the members of each cluster should be as close to the cluster center as possible, and the cluster centers should be well separated.”, para. 0062, “In FIG. 6A, a sample document is shown prior to the clustering step, and in FIG. 6B, the same sample document is shown after the clustering step was applied”, the centers of the clusters are used to do the masking shown in Fig. 6B); 
applying each of the plurality of color masks to the ROI to obtain a plurality of binary ROI images, wherein each of the plurality of binary ROI image comprises one or more contours indicating spatial locations of one or more pixels among the plurality of pixels in the ROI that belong to the unique color of the centroid (para. 0057, “the fuzzy k-mean clustering technique is used to group pixels with similar feature vectors that are close to one another in the feature space. As a result of this processing, k clusters of feature data points, which depicts perceptually different regions in the image, are generated. Each cluster can be represented by the mean feature vector of all its fuzzy members as a cluster center. To obtain good segmentation, the members of each cluster should be as close to the cluster center as possible, and the cluster centers should be well separated.”, para. 0062, “In FIG. 6A, a sample document is shown prior to the clustering step, and in FIG. 6B, the same sample document is shown after the clustering step was applied”, Fig. 6B is a binary image resulting from the clustering as explained in para. 0057, the clusters are separated which indicates spatial locations); 
identifying in each of the plurality of binary ROI images, one or more contours of interest from among the plurality of contours, wherein the one or more contours of interest are a) closed contours and b) have size above a predefined contour size (para. 0064, “the steps performed in identifying regions as background, text or halftone are shown. First, at step 900, the mean and variance of each region calculated. If the variance is found to be less than a predetermined threshold T at step 910, it is classified as background at step 990 as discussed above. Otherwise, the region is considered text or halftone and processing continues at step 920, where the data preprocessing step is performed”, the regions are compared to a threshold to identify the ROI as being background, text or halftone); 
performing a subtraction, of the one or more of contours of interest identified for each of the plurality of binary ROI images, from the ROI in accordance to spatial positions of pixels of the one or more contours of interest identified for each of the plurality of binary ROI images, wherein the subtraction eliminates the background noise while retaining information of interest to generate a first level pre-processed image (para. 0064, “the region is considered text or halftone and processing continues at step 920, where the data preprocessing step is performed. In certain instances, a homogenous background can cause problems in identifying the periodic patterns in the histograms, as shown in FIG. 10A. As a result, the data preprocessing step makes the process more robust by inverting the pixel values in each region when there is a light (and thus homogenous) background, which results in the histogram shown in FIG. 10B.”, by inverting the pixel values of the background, the text pattern can be easily identified, so the information of interest, which is the text, is retained). 
Flament and Guo are both considered to be analogous to the claimed invention because they are in the same field of text extraction. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method as taught by Flament to incorporate the teachings of Guo of the multi-stage preprocessing. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. The motivation for the proposed modification would have been because it makes the process more robust (Guo, para. 0064).
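
For illustration only, the claimed steps of converting a cluster centroid from RGB to HSV, defining a range of HSV color space around it, and applying the resulting color mask to obtain a binary ROI image may be sketched as follows. This is a minimal sketch and is not taken from the application or from any cited reference: the function name hsv_mask, the tolerance values, and the nested-list image layout are assumptions, and only the Python standard library is used.

```python
import colorsys


def hsv_mask(pixels, centroid_rgb, tol=(0.05, 0.3, 0.3)):
    """Return a binary mask (1 = pixel matches the centroid color).

    pixels       -- rows of (r, g, b) tuples with components in 0..255
    centroid_rgb -- (r, g, b) cluster centroid, components in 0..255
    tol          -- assumed half-widths of the HSV range (hue, sat, value),
                    each expressed on the 0..1 scale used by colorsys
    """
    # Convert the centroid from RGB to HSV.
    ch, cs, cv = colorsys.rgb_to_hsv(*(c / 255.0 for c in centroid_rgb))
    mask = []
    for row in pixels:
        out = []
        for r, g, b in row:
            h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
            # Hue is circular, so measure the shorter way around.
            dh = abs(h - ch)
            dh = min(dh, 1.0 - dh)
            inside = (dh <= tol[0]
                      and abs(s - cs) <= tol[1]
                      and abs(v - cv) <= tol[2])
            out.append(1 if inside else 0)
        mask.append(out)
    return mask


# A 2x2 example image: two reddish pixels, one blue, one white.
image = [[(250, 10, 10), (10, 10, 250)],
         [(255, 0, 0), (255, 255, 255)]]
red_mask = hsv_mask(image, (255, 0, 0))
```

In this sketch the mask marks only the two reddish pixels; the contour identification and subtraction steps recited in the claim are not shown.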

The combination of Flament in view of Guo does not explicitly disclose applying thresholding on the first level pre-processed image to obtain a second level pre-processed image using a threshold value derived dynamically from a histogram of the first level pre-processed image.
	However, Thrasher teaches applying thresholding on the first level pre-processed image (Guo teaches the first level pre-processed image where the background has been subtracted from the image to retain the text information, Thrasher also teaches background smoothing using clustering in para. 0196) to obtain a second level pre-processed image using a threshold value derived dynamically from a histogram of the first level pre-processed image (para. 0241, iterative recognition-guided thresholding, para. 0248, “the thresholding process may be performed in a manner that renders a legible result on a per-character basis, and upon achieving a legible result, extraction is performed on the legible result, and the process proceeds to obtain a legible result for other characters in the string. Upon accurately extracting all individual characters, the string may be reconstructed from the aggregate extraction results, including the extracted portion(s) of the image, as well as the result of extracting the region of interest (e.g. OCR result). As described herein, this basic procedure is referred to as recognition-guided thresholding”, para. 0254, “it should be understood that the iterative thresholding and extraction process described above is equally applicable to extraction of non-textual information, such as lines or other document structures, graphical elements, etc., as long as there is a quality criterion (as akin to OCR confidence for characters, e.g. a classification-based or other feature-matching confidence measure) evaluating the result. For example, consider a graphical element depicting a gradient of color, which progresses from contrasting with the background to substantially representing the background color the graphical element overlays. In such circumstances, it is similarly possible to progress along the gradient (or other pattern or progression) using an iterative thresholding process to extract a legible or clear version of the graphic”, according to Wikipedia, thresholding in image processing uses “Histogram shape-based methods, where, for example, the peaks, valleys and curvatures of the smoothed histogram are analyzed.[2] Note that these methods, more than others, make certain assumptions about the image intensity probability distribution (i.e., the shape of the histogram)”).
Flament and Thrasher are both considered to be analogous to the claimed invention because they are in the same field of text extraction. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method as taught by Flament to incorporate the teachings of Thrasher of applying thresholding on the first level pre-processed image to obtain a second level pre-processed image using a threshold value derived dynamically from a histogram of the first level pre-processed image. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. The motivation for the proposed modification would have been “to accomplish accurate and reliable extraction of both significantly similar and significantly contrasting foreground/background elements within a single image or region of interest of an image” (Thrasher, para. 0248).

Regarding claim 2, the combination of Flament in view of Guo in further view of Thrasher teaches the method of claim 1 (Flament, Title), the method comprises deriving the threshold value dynamically (Thrasher, para. 0241, iterative recognition-guided thresholding) by: 
calculating the histogram of the first level pre-processed image (Thrasher, para. 0092, “creating a grayscale intensity histogram”, according to Wikipedia, thresholding image processing uses histogram method); 
determining average pixel intensity of the first level pre-processed image and a left most and a right most peak values in the histogram (Thrasher, para. 0092, “creating a grayscale intensity histogram”, according to Wikipedia, when histogram shape-based methods are used for thresholding, the peaks are analyzed); and 
calculating the threshold value by averaging the left most and the right most peak values (Thrasher, para. 0092, “creating a grayscale intensity histogram”, according to Wikipedia, when histogram shape-based methods are used for thresholding, the peaks are analyzed).  
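
For illustration only, the threshold derivation recited in claim 2 may be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from the application or from Thrasher: a “peak” is assumed to be a histogram bin whose count exceeds its left neighbor and is at least its right neighbor, the function names are hypothetical, and the average pixel intensity step is omitted because the claim language quoted above does not say how that value is used.

```python
def histogram(pixels, bins=256):
    """Count how many pixels fall on each gray level 0..bins-1."""
    hist = [0] * bins
    for p in pixels:
        hist[p] += 1
    return hist


def peak_positions(hist):
    """Return gray levels that are local maxima (assumed peak definition)."""
    peaks = []
    for i, count in enumerate(hist):
        left = hist[i - 1] if i > 0 else 0
        right = hist[i + 1] if i + 1 < len(hist) else 0
        if count > left and count >= right:
            peaks.append(i)
    return peaks


def dynamic_threshold(pixels):
    """Average the left most and right most histogram peak positions."""
    peaks = peak_positions(histogram(pixels))
    if not peaks:
        raise ValueError("image has no pixels, so no histogram peak exists")
    return (peaks[0] + peaks[-1]) / 2.0


def binarize(pixels, threshold):
    """Map each pixel to 255 (above threshold) or 0 (at or below it)."""
    return [255 if p > threshold else 0 for p in pixels]


# Flattened grayscale sample: dark text, mid-gray noise, light background.
sample = [20] * 5 + [21] * 2 + [120] * 3 + [230] * 6 + [231]
t = dynamic_threshold(sample)
second_level = binarize(sample, t)
```

With this sample the peaks fall at gray levels 20, 120 and 230, so the threshold is the average of 20 and 230.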

Regarding claim 3, the combination of Flament in view of Guo in further view of Thrasher teaches the method of claim 1 (Flament, Title), wherein the configuration rules comprise precompiled knowledge base that is referred to identify undesired text from the extracted text information, wherein the undesired text is discarded from the extracted text information, and wherein the precompiled knowledge base is composition of KEYWORDS and VALUECATEGORY (Flament, para. 0042, “identified or otherwise verified by the system. In other embodiments, it may be helpful to identify a security feature or other type of background in order to be able to remove, minimize or otherwise account for this background so that text or other features in the document are more easily recognizable. In the example of the “VOID” security background of a check, the background image commonly overlaps with other features on the check, such as text indicating a payor, a payee, an amount, a routing number, etc”, the keyword “VOID” is removed because it is not important and it overlaps other features on the check).  

Regarding claim 4, the combination of Flament in view of Guo in further view of Thrasher teaches the method of claim 1 (Flament, Title), wherein the method (Flament, Title) comprises correcting (Thrasher, para. 0291, “it is possible to mitigate the need for a user to review and/or to correct extraction results by performing automatic validation of extraction results”), via a name validator (Thrasher, para. 0291, “if name and address are extracted, in some instances it is possible to validate that the individual in question in fact resides at the given address”, the name and address are validated) implemented by the one or more hardware processors (Flament, Fig. 2, CPU 260, Thrasher, para. 0294, “a system within the scope of the present descriptions may include a processor and logic in and/or executable by the processor to cause the processor to perform steps of a method”), Name values from the extracted data using an encoder-decoder model that utilizes LSTM based RNN architecture (Thrasher, para. 0291, “it is possible to mitigate the need for a user to review and/or to correct extraction results by performing automatic validation of extraction results”).  
Flament and Thrasher are both considered to be analogous to the claimed invention because they are in the same field of text extraction. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method as taught by Flament to incorporate the teachings of Thrasher of correcting, via a name validator implemented by the one or more hardware processors, Name values from the extracted data using an encoder-decoder model that utilizes LSTM based RNN architecture. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. The motivation for the proposed modification would have been so the validation may be performed without requiring user input (Thrasher, para. 0291).

Regarding claim 5, the combination of Flament in view of Guo in further view of Thrasher teaches the method of claim 1 (Flament, Title), wherein the method (Flament, Title) further comprises: generating confidence scores for each of the classified document type, the detected text boxes, the extracted text information, and output of the name validator (Thrasher, para. 0241, “for example by matching an expected region of interest identity with an expected region of interest location, it is possible to acquire confidence in the extraction result”, Flament teaches classified document type, para. 0251, “any other equivalent means of determining confidence as to whether a particular image feature matches an expected image feature may be employed without departing from the scope of the present disclosures”); assigning predefined weights to each of the confidence scores (Thrasher, para. 0262, “where the confidence measure is OCR confidence and the primary but nonexclusive objective is to threshold textual information, each particular region is matched to a corresponding region of interest known from the training set”); and aggregating the weighted confidence score to compute a cumulative confidence score for the extracted text information (Thrasher, para. 0241, “for example by matching an expected region of interest identity with an expected region of interest location, it is possible to acquire confidence in the extraction result. For instance, and as will be described in further detail below, by matching a region of interest location with an expected region of interest identity, the result of extraction from various image “frames” subjected to different threshold levels may be evaluated to determine whether the extraction at one particular threshold is “correct.””).  
Flament and Thrasher are both considered to be analogous to the claimed invention because they are in the same field of text extraction. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method as taught by Flament to incorporate the teachings of Thrasher of generating confidence scores for each of the classified document type, the detected text boxes, the extracted text information, and output of the name validator; assigning predefined weights to each of the confidence scores; and aggregating the weighted confidence score to compute a cumulative confidence score for the extracted text information. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. The motivation for the proposed modification would have been to maximize the likelihood of achieving a candidate result with sufficient confidence for extraction (Thrasher, para. 0266).
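
For illustration only, the weighting and aggregation recited in claim 5 may be sketched as follows. This is a minimal sketch and is not taken from the application or from any cited reference: the claim does not state how the weighted scores are aggregated, so a weighted average is assumed, and the stage names, score values and weights are hypothetical.

```python
def cumulative_confidence(scores, weights):
    """Aggregate per-stage confidence scores into one cumulative score.

    scores  -- mapping of stage name to confidence in 0..1
    weights -- mapping of stage name to its predefined weight
    The aggregation is an assumed weighted average (weighted sum
    divided by the sum of the weights of the stages that were scored).
    """
    missing = set(scores) - set(weights)
    if missing:
        raise KeyError(f"no predefined weight for: {sorted(missing)}")
    total_weight = sum(weights[name] for name in scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(scores[name] * weights[name] for name in scores)
    return weighted_sum / total_weight


# Hypothetical scores for the four stages named in the claim.
stage_scores = {
    "document_type": 0.9,
    "text_boxes": 0.8,
    "extracted_text": 0.7,
    "name_validator": 0.6,
}
# Hypothetical predefined weights.
stage_weights = {
    "document_type": 0.1,
    "text_boxes": 0.2,
    "extracted_text": 0.4,
    "name_validator": 0.3,
}
overall = cumulative_confidence(stage_scores, stage_weights)
```

Dividing by the total weight keeps the cumulative score on the same 0 to 1 scale as the individual scores even when the predefined weights do not sum to one.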

Regarding claim 6, Flament teaches a system for document classification and text information extraction (Title), the system comprising: 
a memory storing instructions (para. 0025, “receives program instructions from a memory 126”); 
one or more Input/Output (I/O) interfaces (para. 0025, “I/O device 110 may, for example, be a mobile phone, a camera, a document scanner, a digital document storage, or any other device suitable for inputting images to the system. I/O device 110 provides images (e.g., photographs, scanned documents, faxes, etc.) to a server 120.”); and 
one or more hardware processors (Fig. 2, CPU 260) coupled to the memory (para. 0025, “receives program instructions from a memory 126”) via the one or more I/O interfaces (para. 0025, “I/O device 110 may, for example, be a mobile phone, a camera, a document scanner, a digital document storage, or any other device suitable for inputting images to the system. I/O device 110 provides images (e.g., photographs, scanned documents, faxes, etc.) to a server 120.”), wherein the one or more hardware processors (Fig. 2, CPU 260) are configured by the instructions to: 
receive (para. 0009, “receive an input image from an input/output device to which it is communicatively connected by a network”) a scanned image of a document (para. 0025, “I/O device 110 provides images (e.g., photographs, scanned documents, faxes, etc.) to a server 120”); 
detect (para. 0033, “uses a convolutional neural network 322 to identify regions within input images where the desired types of information (e.g., text, images, signatures, etc.) are found”) a Region of Interest (ROI) in the scanned image (para. 0025, “I/O device 110 provides images (e.g., photographs, scanned documents, faxes, etc.) to a server 120”) by marking a ROI bounding box on the scanned image using a text detection engine based-ROI technique that locates corner coordinates of a ROI bounding box defining the ROI (para. 0009, “a set of convolutional operations on the input image to produce one or more heat maps or bounding boxes”); 
classify the ROI into a document type among a plurality of document types using a trainable Deep Learning based multi-layered Neural Network (NN) classification model (para. 0010, “fully convolutional neural network is then trained to recognize the feature types using the provided pieces of information as expected outputs of processing the associated simulated document images”, para. 0040, “if the image is classified as a driver's license, a set of numbers may be interpreted as a license number associated with a particular driver, rather than as an invoice number or some other type of number.”, the image is being classified as to what type of document it is either a driver’s license or an invoice); 
15apply multistage pre-processing (para. 0009, “processor may be configured to perform preprocessing (e.g., conversion of colors to grayscale) on each input image prior to providing the preprocessed image to the fully convolutional neural network, Fig. 4, step 404) on the classified ROI to remove background noise (para. 0042, “it may be helpful to identify a security feature or other type of background in order to be able to remove, minimize or otherwise account for this background so that text or other features in the document are more easily recognizable”), 
apply a text detection technique on the second level pre-processed image (Fig. 4, step 404, the input image has been preprocessed) to mark a 20plurality of bounding boxes around text information in the second level pre-processed image (para. 0038, Fig. 5, “A second one of the heat maps (530) shows the areas or “bounding boxes” (532) of the license in which text is found.”), wherein each of the plurality of bounding boxes 46are identified by spatial positions defined by corner coordinates and corresponding height and width (Fig. 530, the size of the bounding boxes corresponds to the size of the text), and wherein one or more bounding boxes are clubbed based on a spatial proximity criteria (para. 0039, “the heat maps may indicate areas that are bounded by distinct lines (bounding boxes), where the portion of the image within a bounding box has an above-threshold likelihood of having the particular feature type and the portion of the image outside the bounding box has a below-threshold likelihood of having the particular feature type”, threshold is the criteria, para. 0039, “the image regions as clear lines (e.g., bounding boxes 532) dividing the relevant areas (photos and text) from other areas”); 
extract text information from each of the plurality of bounding boxes by applying OCR (Fig. 4, step 408, recognize content in identified areas by performing optical character recognition, para. 0009, “the processor then extracts information of the associated feature type from an area of the input image indicated by the corresponding one of the heat maps or bounding boxes”); and 
determine contextual relationship among the extracted text information (para. 0040, “recognized characters in the text areas, or text bounding boxes, can then be processed to identify, derive, interpret, extract, and/or infer meaningful information within the text (410). For example, if the characters include “DOB” followed by a set of numbers, the numbers may be interpreted as a birthdate, or if the characters include “HAIR: BRN”, the characters may be interpreted to indicate that the associated person's hair is brown”) and refining the extracted text information based on configuration rules for the document type (para. 0040, “if the image is classified as a driver's license, a set of numbers may be interpreted as a license number associated with a particular driver, rather than as an invoice number or some other type of number”, if the image is classified as a driver’s license then the set of numbers is configured to be interpreted as a license number rather than other type of number).
Flament does not explicitly disclose the multistage pre-processing comprising of
reading the ROI into a Red Green Blue (RGB) color space and flattening the ROI; 
performing unsupervised clustering by applying K-means clustering on a plurality of pixels of the ROI in the RGB color space to generate a plurality of color clusters; 
obtaining a plurality of centroids, of each of the plurality of color clusters, wherein each centroid represents a unique color associated with each of the plurality of clusters; 
converting the centroids from RGB color space to Hue Saturation Value (HSV) space; 
generating a plurality of color masks corresponding to the plurality of clusters, wherein each color mask is generated based on a) the unique color associated with a centroid among the plurality of centroids and b) range of HSV color space defined around the centroid; 
applying each of the plurality of color masks to the ROI to obtain a plurality of binary ROI images, wherein each of the plurality of binary ROI image comprises one or more contours indicating spatial locations of one or more pixels among the plurality of pixels in the ROI that belong to the unique color of the centroid; 
identifying in each of the plurality of binary ROI images, one or more contours of interest from among the plurality of contours, wherein the one or more contours of interest are a) closed contours and b) have size above a predefined contour size; 
performing a subtraction, of the one or more of contours of interest identified for each of the plurality of binary ROI images, from the ROI in accordance to spatial positions of pixels of the one or more contours of interest identified for each of the plurality of binary ROI images, wherein the subtraction eliminates the background noise while retaining information of interest to generate a first level pre-processed image; and 
applying thresholding on the first level pre-processed image to obtain a second level pre-processed image using a threshold value derived dynamically from a histogram of the first level pre-processed image.
	However, Guo teaches the multistage pre-processing (Fig. 4) comprising of:
reading the ROI into a Red Green Blue (RGB) color space and flattening the ROI (para. 0026, “The RGB (red, green, and blue) color space is first preferably transformed into an HSV (hue, saturation, and intensity value) color space to best distinguish color features (step 100)”, the image is originally in RGB color space); 
performing unsupervised clustering by applying K-means clustering on a plurality of pixels of the ROI in the RGB color space to generate a plurality of color clusters (para. 0041, “fuzzy K-mean clustering is performed to the transformed image to group pixels having feature vectors close to one another in the feature space into blocks”, para. 0026, “texture features are obtained for each color channel (step 110)”); 
obtaining a plurality of centroids, of each of the plurality of color clusters, wherein each centroid represents a unique color associated with each of the plurality of clusters (para. 0057, “Each cluster can be represented by the mean feature vector of all its fuzzy members as a cluster center. To obtain good segmentation, the members of each cluster should be as close to the cluster center as possible, and the cluster centers should be well separated.”, para. 0026, “texture features are obtained for each color channel (step 110)”); 
converting the centroids from RGB color space to Hue Saturation Value (HSV) space (para. 0026, “The RGB (red, green, and blue) color space is first preferably transformed into an HSV (hue, saturation, and intensity value) color space to best distinguish color features (step 100)”); 
generating a plurality of color masks corresponding to the plurality of clusters, wherein each color mask is generated based on a) the unique color associated with a centroid among the plurality of centroids and b) range of HSV color space defined around the centroid (para. 0057, “the fuzzy k-mean clustering technique is used to group pixels with similar feature vectors that are close to one another in the feature space. As a result of this processing, k clusters of feature data points, which depicts perceptually different regions in the image, are generated. Each cluster can be represented by the mean feature vector of all its fuzzy members as a cluster center. To obtain good segmentation, the members of each cluster should be as close to the cluster center as possible, and the cluster centers should be well separated.”, para. 0062, “In FIG. 6A, a sample document is shown prior to the clustering step, and in FIG. 6B, the same sample document is shown after the clustering step was applied”, the centers of the clusters are used to do the masking in Fig. 6B); 
applying each of the plurality of color masks to the ROI to obtain a plurality of binary ROI images, wherein each of the plurality of binary ROI image comprises one or more contours indicating spatial locations of one or more pixels among the plurality of pixels in the ROI that belong to the unique color of the centroid (para. 0057, “the fuzzy k-mean clustering technique is used to group pixels with similar feature vectors that are close to one another in the feature space. As a result of this processing, k clusters of feature data points, which depicts perceptually different regions in the image, are generated. Each cluster can be represented by the mean feature vector of all its fuzzy members as a cluster center. To obtain good segmentation, the members of each cluster should be as close to the cluster center as possible, and the cluster centers should be well separated.”, para. 0062, “In FIG. 6A, a sample document is shown prior to the clustering step, and in FIG. 6B, the same sample document is shown after the clustering step was applied”, Fig. 6B is a binary image result from the clustering as explained in para. 0057, the clusters are separated which indicates spatial locations); 
identifying in each of the plurality of binary ROI images, one or more contours of interest from among the plurality of contours, wherein the one or more contours of interest are a) closed contours and b) have size above a predefined contour size (para. 0064, “the steps performed in identifying regions as background, text or halftone are shown. First, at step 900, the mean and variance of each region calculated. If the variance is found to be less than a predetermined threshold T at step 910, it is classified as background at step 990 as discussed above. Otherwise, the region is considered text or halftone and processing continues at step 920, where the data preprocessing step is performed”, the regions are compared to a threshold to identify the ROI being background or text or halftone); 
performing a subtraction, of the one or more of contours of interest identified for each of the plurality of binary ROI images, from the ROI in accordance to spatial positions of pixels of the one or more contours of interest identified for each of the plurality of binary ROI images, wherein the subtraction eliminates the background noise while retaining information of interest to generate a first level pre-processed image (para. 0064, “the region is considered text or halftone and processing continues at step 920, where the data preprocessing step is performed. In certain instances, a homogenous background can cause problems in identifying the periodic patterns in the histograms, as shown in FIG. 10A. As a result, the data preprocessing step makes the process more robust by inverting the pixel values in each region when there is a light (and thus homogenous) background, which results in the histogram shown in FIG. 10B.”, so by inverting the pixel value of the background the text pattern can be easily identified so the information of interest which is the text is retained). 
Flament and Guo are both considered to be analogous to the claimed invention because they are in the same field of text extraction. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system as taught by Flament to incorporate the teachings of Guo of the multi-stage preprocessing. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. The motivation for the proposed modification would have been to make the process more robust (Guo, para. 0064).
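
For orientation only, the clustering and masking limitations recited above (flattening the ROI, K-means in RGB, conversion of centroids to HSV, and a color mask defined by an HSV range around each centroid) can be sketched as follows. This is a hypothetical illustration and is not code from the application, Flament, or Guo. The function names `flatten`, `kmeans_rgb`, `to_hsv`, and `color_mask`, the tolerance values, the centroid seeding, and the use of plain rather than fuzzy k-means are all assumptions.

```python
import colorsys
import random


def flatten(image):
    """Flatten a 2-D grid of (r, g, b) tuples into a single list of pixels."""
    return [pixel for row in image for pixel in row]


def kmeans_rgb(pixels, k, iterations=10, seed=0):
    """Plain (non-fuzzy) k-means in RGB space; returns k centroid colors."""
    rng = random.Random(seed)
    # Assumption: seed the centroids from distinct colors present in the ROI.
    centroids = rng.sample(sorted(set(pixels)), k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in pixels:
            nearest = min(
                range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            clusters[nearest].append(p)
        # New centroid is the per-channel mean; an empty cluster keeps its old one.
        centroids = [
            tuple(sum(channel) / len(members) for channel in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    return centroids


def to_hsv(rgb):
    """Convert a 0-255 RGB triple to HSV with each component in [0, 1]."""
    r, g, b = rgb
    return colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)


def color_mask(pixels, centroid_rgb, tol=(0.05, 0.20, 0.20)):
    """Binary mask: 1 where a pixel's HSV lies within `tol` of the centroid's HSV."""
    ch, cs, cv = to_hsv(centroid_rgb)
    mask = []
    for p in pixels:
        h, s, v = to_hsv(p)
        dh = min(abs(h - ch), 1.0 - abs(h - ch))  # hue is circular
        inside = dh <= tol[0] and abs(s - cs) <= tol[1] and abs(v - cv) <= tol[2]
        mask.append(1 if inside else 0)
    return mask
```

Each mask produced this way corresponds to one binary ROI image in the claim language; contour identification and subtraction are not shown.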

The combination of Flament in view of Guo does not explicitly disclose applying thresholding on the first level pre-processed image to obtain a second level pre-processed image using a threshold value derived dynamically from a histogram of the first level pre-processed image.
	However, Thrasher teaches applying thresholding on the first level pre-processed image (Guo teaches the first level pre-processed image where the background has been subtracted from the image to retain the text information, Thrasher also teaches background smoothing using clustering in para. 0196) to obtain a second level pre-processed image using a threshold value derived dynamically from a histogram of the first level pre-processed image (para. 0241, iterative recognition-guided thresholding, para. 0248, “the thresholding process may be performed in a manner that renders a legible result on a per-character basis, and upon achieving a legible result, extraction is performed on the legible result, and the process proceeds to obtain a legible result for other characters in the string. Upon accurately extracting all individual characters, the string may be reconstructed from the aggregate extraction results, including the extracted portion(s) of the image, as well as the result of extracting the region of interest (e.g. OCR result). As described herein, this basic procedure is referred to as recognition-guided thresholding”, para. 0254, “it should be understood that the iterative thresholding and extraction process described above is equally applicable to extraction of non-textual information, such as lines or other document structures, graphical elements, etc., as long as there is a quality criterion (as akin to OCR confidence for characters, e.g. a classification-based or other feature-matching confidence measure) evaluating the result. For example, consider a graphical element depicting a gradient of color, which progresses from contrasting with the background to substantially representing the background color the graphical element overlays. In such circumstances, it is similarly possible to progress along the gradient (or other pattern or progression) using an iterative thresholding process to extract a legible or clear version of the graphic”, according to Wikipedia, thresholding in image processing uses “Histogram shape-based methods, where, for example, the peaks, valleys and curvatures of the smoothed histogram are analyzed. Note that these methods, more than others, make certain assumptions about the image intensity probability distribution (i.e., the shape of the histogram)”).
Flament and Thrasher are both considered to be analogous to the claimed invention because they are in the same field of text extraction. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method as taught by Flament to incorporate the teachings of Thrasher of applying thresholding on the first level pre-processed image to obtain a second level pre-processed image using a threshold value derived dynamically from a histogram of the first level pre-processed image. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. The motivation for the proposed modification would have been “to accomplish accurate and reliable extraction of both significantly similar and significantly contrasting foreground/background elements within a single image or region of interest of an image” (Thrasher, para. 0248).

Regarding claim 7, the combination of Flament in view of Guo in further view of Thrasher teaches the system of claim 6 (Flament, Title), wherein the one or more hardware processors (Flament, Fig. 2, CPU 260) are configured to derive the threshold value dynamically (Thrasher, para. 0241, iterative recognition-guided thresholding) by: 
calculating the histogram of the first level pre-processed image (Thrasher, para. 0092, “creating a grayscale intensity histogram”, according to Wikipedia, thresholding in image processing uses histogram-based methods); 
determining average pixel intensity of the first level pre-processed image and a left most and a right most peak values in the histogram (Thrasher, para. 0092, “creating a grayscale intensity histogram”, according to Wikipedia, when histogram shape-based methods are used for thresholding, the peaks are analyzed); and 
calculating the threshold value by averaging the left most and the right most peak values (Thrasher, para. 0092, “creating a grayscale intensity histogram”, according to Wikipedia, when histogram shape-based methods are used for thresholding, the peaks are analyzed).
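
The three steps recited in claim 7 can be illustrated with a minimal sketch. It is hypothetical and is not drawn from the application or from Thrasher. It assumes 8-bit grayscale input, reads "peak values" as the intensity levels at which histogram peaks occur, and uses a simple local-maximum test; the names `histogram`, `peak_levels`, `dynamic_threshold`, and `binarize` are illustrative only.

```python
def histogram(gray_pixels, bins=256):
    """Count how many pixels fall on each intensity level 0..bins-1."""
    hist = [0] * bins
    for value in gray_pixels:
        hist[value] += 1
    return hist


def peak_levels(hist):
    """Intensity levels that are local maxima of the histogram.

    Assumption: a level is a peak if its count exceeds its left neighbour
    and is not below its right neighbour (neighbours off the edge count as 0).
    """
    peaks = []
    for i, count in enumerate(hist):
        left = hist[i - 1] if i > 0 else 0
        right = hist[i + 1] if i + 1 < len(hist) else 0
        if count > left and count >= right:
            peaks.append(i)
    return peaks


def dynamic_threshold(gray_pixels):
    """Return (mean intensity, threshold), the threshold being the average
    of the left-most and right-most histogram peak levels."""
    if not gray_pixels:
        raise ValueError("image has no pixels")
    peaks = peak_levels(histogram(gray_pixels))
    mean_intensity = sum(gray_pixels) / len(gray_pixels)
    threshold = (peaks[0] + peaks[-1]) / 2
    return mean_intensity, threshold


def binarize(gray_pixels, threshold):
    """Second-level image: 1 for pixels brighter than the threshold, else 0."""
    return [1 if value > threshold else 0 for value in gray_pixels]
```

The claim does not state how the average pixel intensity is used, so the sketch only returns it alongside the threshold.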

Regarding claim 8, the combination of Flament in view of Guo in further view of Thrasher teaches the system of claim 6 (Flament, Title), wherein the configuration rules comprise precompiled knowledge base that is referred to identify undesired text from the extracted text information, wherein the undesired text is discarded from the extracted text information, and wherein the precompiled knowledge base is composition of KEYWORDS and VALUECATEGORY (Flament, para. 0042, “identified or otherwise verified by the system. In other embodiments, it may be helpful to identify a security feature or other type of background in order to be able to remove, minimize or otherwise account for this background so that text or other features in the document are more easily recognizable. In the example of the “VOID” security background of a check, the background image commonly overlaps with other features on the check, such as text indicating a payor, a payee, an amount, a routing number, etc”, the keyword “VOID” is removed because it is not important and it overlaps other features on the check).

Regarding claim 9, the combination of Flament in view of Guo in further view of Thrasher teaches the system of claim 6 (Flament, Title), wherein the one or more hardware processors (Flament, Fig. 2, CPU 260) are configured to correct (Thrasher, para. 0291, “it is possible to mitigate the need for a user to review and/or to correct extraction results by performing automatic validation of extraction results”), via a name validator (Thrasher, para. 0291, “if name and address are extracted, in some instances it is possible to validate that the individual in question in fact resides at the given address”, the name and address are validated) implemented by the one or more hardware processors (Flament, Fig. 2, CPU 260, Thrasher, para. 0294, “a system within the scope of the present descriptions may include a processor and logic in and/or executable by the processor to cause the processor to perform steps of a method”), Name values from the extracted data using an encoder-decoder model that utilizes LSTM based RNN architecture (Thrasher, para. 0291, “it is possible to mitigate the need for a user to review and/or to correct extraction results by performing automatic validation of extraction results”).
Flament and Thrasher are both considered to be analogous to the claimed invention because they are in the same field of text extraction. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system as taught by Flament to incorporate the teachings of Thrasher of correcting, via a name validator implemented by the one or more hardware processors, Name values from the extracted data using an encoder-decoder model that utilizes LSTM based RNN architecture. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. The motivation for the proposed modification would have been to allow the validation to be performed without requiring user input (Thrasher, para. 0291).

Regarding claim 10, the combination of Flament in view of Guo in further view of Thrasher teaches the system of claim 6 (Flament, Title), wherein the one or more hardware processors (Flament, Fig. 2, CPU 260) are configured to: generate confidence scores for each of the classified document type, the detected text boxes, the extracted text information, and output of the name validator (Thrasher, para. 0241, “for example by matching an expected region of interest identity with an expected region of interest location, it is possible to acquire confidence in the extraction result”, Flament teaches classified document type, para. 0251, “any other equivalent means of determining confidence as to whether a particular image feature matches an expected image feature may be employed without departing from the scope of the present disclosures”); assign predefined weights to each of the confidence scores (Thrasher, para. 0262, “where the confidence measure is OCR confidence and the primary but nonexclusive objective is to threshold textual information, each particular region is matched to a corresponding region of interest known from the training set”); and aggregate the weighted confidence score to compute a cumulative confidence score for the extracted text information (Thrasher, para. 0241, “for example by matching an expected region of interest identity with an expected region of interest location, it is possible to acquire confidence in the extraction result. For instance, and as will be described in further detail below, by matching a region of interest location with an expected region of interest identity, the result of extraction from various image “frames” subjected to different threshold levels may be evaluated to determine whether the extraction at one particular threshold is “correct.””).
Flament and Thrasher are both considered to be analogous to the claimed invention because they are in the same field of text extraction. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system as taught by Flament to incorporate the teachings of Thrasher of generating confidence scores for each of the classified document type, the detected text boxes, the extracted text information, and output of the name validator; assigning predefined weights to each of the confidence scores; and aggregating the weighted confidence score to compute a cumulative confidence score for the extracted text information. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. The motivation for the proposed modification would have been to maximize the likelihood of achieving a candidate result with sufficient confidence for extraction (Thrasher, para. 0266).
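
As a reading aid, the weighting and aggregation recited in claim 10 can be sketched as a weighted average over the four stage confidences. This is a hypothetical illustration and is not taken from the application or from Thrasher. The stage keys, the example weights, and the normalization by the total weight are assumptions; the claim itself only recites predefined weights and an aggregate.

```python
def cumulative_confidence(scores, weights):
    """Aggregate per-stage confidence scores into one cumulative score.

    `scores` and `weights` are dicts keyed by stage name. Assumption: the
    aggregate is the weighted sum divided by the total weight, so the result
    stays in [0, 1] when every score is in [0, 1].
    """
    if scores.keys() != weights.keys():
        raise ValueError("scores and weights must cover the same stages")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(scores[stage] * weights[stage] for stage in scores)
    return weighted_sum / total_weight


# Hypothetical stage names and values, for illustration only.
example_scores = {
    "document_type": 0.9,
    "text_boxes": 0.8,
    "extracted_text": 0.7,
    "name_validator": 0.6,
}
example_weights = {
    "document_type": 0.1,
    "text_boxes": 0.2,
    "extracted_text": 0.4,
    "name_validator": 0.3,
}
example_result = cumulative_confidence(example_scores, example_weights)
```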

Regarding claim 11, Flament teaches one or more non-transitory machine-readable information storage mediums comprising one or more instructions (para. 0007, “a system comprising a processor and a non-transitory computer-readable storage medium that stores computer instructions translatable by the processor to perform a method”), which when executed by one or more hardware processors (Fig. 2, CPU 260) causes: 
receiving (para. 0009, “receive an input image from an input/output device to which it is communicatively connected by a network”), via one or more hardware processors (Fig. 2, CPU 260), a scanned image of a document (para. 0025, “I/O device 110 provides images (e.g., photographs, scanned documents, faxes, etc.) to a server 120”); 
detecting (para. 0033, “uses a convolutional neural network 322 to identify regions within input images where the desired types of information (e.g., text, images, signatures, etc.) are found”), via one or more hardware processors (Fig. 2, CPU 260), a Region of Interest (ROI) in the scanned image (para. 0025, “I/O device 110 provides images (e.g., photographs, scanned documents, faxes, etc.) to a server 120”) by marking a ROI bounding box on the scanned image using a text detection engine based-ROI technique that locates corner coordinates of a ROI bounding box defining the ROI (para. 0009, “a set of convolutional operations on the input image to produce one or more heat maps or bounding boxes”); 
classifying, via the one or more hardware processors (Fig. 2, CPU 260), the ROI into a document type among a plurality of document types using a trainable Deep Learning based multi-layered Neural Network (NN) classification model (para. 0010, “fully convolutional neural network is then trained to recognize the feature types using the provided pieces of information as expected outputs of processing the associated simulated document images”, para. 0040, “if the image is classified as a driver's license, a set of numbers may be interpreted as a license number associated with a particular driver, rather than as an invoice number or some other type of number.”, the image is being classified as to what type of document it is either a driver’s license or an invoice); 
applying, via the one or more hardware processors (Fig. 2, CPU 260), multistage pre-processing (para. 0009, “processor may be configured to perform preprocessing (e.g., conversion of colors to grayscale) on each input image prior to providing the preprocessed image to the fully convolutional neural network”, Fig. 4, step 404) on the classified ROI to remove background noise (para. 0042, “it may be helpful to identify a security feature or other type of background in order to be able to remove, minimize or otherwise account for this background so that text or other features in the document are more easily recognizable”), 
applying, via the one or more hardware processors (Fig. 2, CPU 260), a text detection technique on the second level pre-processed image (Fig. 4, step 404, the input image has been preprocessed) to mark a plurality of bounding boxes around text information in the second level pre-processed image (para. 0038, Fig. 5, “A second one of the heat maps (530) shows the areas or “bounding boxes” (532) of the license in which text is found.”), wherein each of the plurality of bounding boxes are identified by spatial positions defined by corner coordinates and corresponding height and width (Fig. 5, heat map 530, the size of the bounding boxes corresponds to the size of the text), and wherein one or more bounding boxes are clubbed based on a spatial proximity criteria (para. 0039, “the heat maps may indicate areas that are bounded by distinct lines (bounding boxes), where the portion of the image within a bounding box has an above-threshold likelihood of having the particular feature type and the portion of the image outside the bounding box has a below-threshold likelihood of having the particular feature type”, the threshold is the criteria, para. 0039, “the image regions as clear lines (e.g., bounding boxes 532) dividing the relevant areas (photos and text) from other areas”); 
extracting (para. 0025, “trained convolutional neural network can then identify areas within the input images received from I/O device 110 which contain text, images, signatures and/or other information, and extract the corresponding information (e.g., using optical character recognition to recognize text within the text areas)”), via the one or more hardware processors (Fig. 2, CPU 260), text information from each of the plurality of bounding boxes by applying OCR (Fig. 4, step 408, recognize content in identified areas by performing optical character recognition, para. 0009, “the processor then extracts information of the associated feature type from an area of the input image indicated by the corresponding one of the heat maps or bounding boxes”); and 
determining, via the one or more hardware processors (Fig. 2, CPU 260), contextual relationship among the extracted text information (para. 0040, “recognized characters in the text areas, or text bounding boxes, can then be processed to identify, derive, interpret, extract, and/or infer meaningful information within the text (410). For example, if the characters include “DOB” followed by a set of numbers, the numbers may be interpreted as a birthdate, or if the characters include “HAIR: BRN”, the characters may be interpreted to indicate that the associated person's hair is brown”) and refining the extracted text information based on configuration rules for the document type (para. 0040, “if the image is classified as a driver's license, a set of numbers may be interpreted as a license number associated with a particular driver, rather than as an invoice number or some other type of number”, if the image is classified as a driver’s license then the set of numbers is configured to be interpreted as a license number rather than other type of number).

Flament does not explicitly disclose the multistage pre-processing comprising of
reading the ROI into a Red Green Blue (RGB) color space and flattening the ROI; 
performing unsupervised clustering by applying K-means clustering on a plurality of pixels of the ROI in the RGB color space to generate a plurality of color clusters; 
obtaining a plurality of centroids, of each of the plurality of color clusters, wherein each centroid represents a unique color associated with each of the plurality of clusters; 
converting the centroids from RGB color space to Hue Saturation Value (HSV) space; 
generating a plurality of color masks corresponding to the plurality of clusters, wherein each color mask is generated based on a) the unique color associated with a centroid among the plurality of centroids and b) range of HSV color space defined around the centroid; 
applying each of the plurality of color masks to the ROI to obtain a plurality of binary ROI images, wherein each of the plurality of binary ROI image comprises one or more contours indicating spatial locations of one or more pixels among the plurality of pixels in the ROI that belong to the unique color of the centroid; 
identifying in each of the plurality of binary ROI images, one or more contours of interest from among the plurality of contours, wherein the one or more contours of interest are a) closed contours and b) have size above a predefined contour size; 
performing a subtraction, of the one or more of contours of interest identified for each of the plurality of binary ROI images, from the ROI in accordance to spatial positions of pixels of the one or more contours of interest identified for each of the plurality of binary ROI images, wherein the subtraction eliminates the background noise while retaining information of interest to generate a first level pre-processed image; and 
applying thresholding on the first level pre-processed image to obtain a second level pre-processed image using a threshold value derived dynamically from a histogram of the first level pre-processed image.
	However, Guo teaches the multistage pre-processing (Fig. 4) comprising of:
reading the ROI into a Red Green Blue (RGB) color space and flattening the ROI (para. 0026, “The RGB (red, green, and blue) color space is first preferably transformed into an HSV (hue, saturation, and intensity value) color space to best distinguish color features (step 100)”, the image is originally in RGB color space); 
performing unsupervised clustering by applying K-means clustering on a plurality of pixels of the ROI in the RGB color space to generate a plurality of color clusters (para. 0041, “fuzzy K-mean clustering is performed to the transformed image to group pixels having feature vectors close to one another in the feature space into blocks”, para. 0026, “texture features are obtained for each color channel (step 110)”); 
obtaining a plurality of centroids, of each of the plurality of color clusters, wherein each centroid represents a unique color associated with each of the plurality of clusters (para. 0057, “Each cluster can be represented by the mean feature vector of all its fuzzy members as a cluster center. To obtain good segmentation, the members of each cluster should be as close to the cluster center as possible, and the cluster centers should be well separated.”, para. 0026, “texture features are obtained for each color channel (step 110)”); 
converting the centroids from RGB color space to Hue Saturation Value (HSV) space (para. 0026, “The RGB (red, green, and blue) color space is first preferably transformed into an HSV (hue, saturation, and intensity value) color space to best distinguish color features (step 100)”); 
10generating a plurality of color masks corresponding to the plurality of clusters, wherein each color mask is generated based on a) the unique color associated with a centroid among the plurality of centroids and b) range of HSV color space defined around the centroid (para. 0057, “the fuzzy k-mean clustering technique is used to group pixels with similar feature vectors that are close to one another in the feature space. As a result of this processing, k clusters of feature data points, which depicts perceptually different regions in the image, are generated. Each cluster can be represented by the mean feature vector of all its fuzzy members as a cluster center. To obtain good segmentation, the members of each cluster should be as close to the cluster center as possible, and the cluster centers should be well separated.”, para. 0062, “In FIG. 6A, a sample document is shown prior to the clustering step, and in FIG. 6B, the same sample document is shown after the clustering step was applied”, in Fig. 6B”, the center of the clusters are used to do the masking in Fig. 6B); 
15applying each of the plurality of color masks to the ROI to obtain a plurality of binary ROI images, wherein each of the plurality of binary ROI image comprises one or more contours indicating spatial locations of one or more pixels among the plurality of pixels in the ROI that belong 20to the unique color of the centroid (para. 0057, “the fuzzy k-mean clustering technique is used to group pixels with similar feature vectors that are close to one another in the feature space. As a result of this processing, k clusters of feature data points, which depicts perceptually different regions in the image, are generated. Each cluster can be represented by the mean feature vector of all its fuzzy members as a cluster center. To obtain good segmentation, the members of each cluster should be as close to the cluster center as possible, and the cluster centers should be well separated.”, para. 0062, “In FIG. 6A, a sample document is shown prior to the clustering step, and in FIG. 6B, the same sample document is shown after the clustering step was applied”, Fig. 6B is a binary image result from the clustering as explained in para. 0057, the clusters are separated which indicates spatial locations); 
identifying in each of the plurality of binary ROI images, one or more contours of interest from among the plurality of contours, wherein the one or more contours of interest are a) closed contours and b) have size above a predefined contour size (para. 0064, “the steps performed in identifying regions as background, text or halftone are shown. First, at step 900, the mean and variance of each region calculated. If the variance is found to be less than a predetermined threshold T at step 910, it is classified as background at step 990 as discussed above. Otherwise, the region is considered text or halftone and processing continues at step 920, where the data preprocessing step is performed”; the regions are compared to a threshold to identify the ROI as background, text, or halftone);
performing a subtraction, of the one or more of contours of interest identified for each of the plurality of binary ROI images, from the ROI in accordance to spatial positions of pixels of the one or more contours of interest identified for each of the plurality of binary ROI images, wherein the subtraction eliminates the background noise while retaining information of interest to generate a first level pre-processed image (para. 0064, “the region is considered text or halftone and processing continues at step 920, where the data preprocessing step is performed. In certain instances, a homogenous background can cause problems in identifying the periodic patterns in the histograms, as shown in FIG. 10A. As a result, the data preprocessing step makes the process more robust by inverting the pixel values in each region when there is a light (and thus homogenous) background, which results in the histogram shown in FIG. 10B.”; by inverting the pixel values of the background, the text pattern can be easily identified, so the information of interest, which is the text, is retained).
Flament and Guo are both considered to be analogous to the claimed invention because they are in the same field of text extraction. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the non-transitory machine-readable information storage mediums as taught by Flament to incorporate the teachings of Guo of the multi-stage preprocessing. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. The motivation for the proposed modification would have been that it makes the process more robust (Guo, para. 0064).
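
For illustration only, the color-mask and subtraction limitations recited above can be sketched as follows. This sketch is not taken from Flament, Guo, or the application; the function names `hsv_mask` and `subtract_masked`, the tolerance values, and the white fill color are hypothetical assumptions. It builds a binary mask of pixels falling within an HSV range around a given centroid color and then blanks those pixels in the ROI. Contour detection and the closed-contour and size tests are omitted.

```python
import colorsys

def hsv_mask(pixels_rgb, centroid_hsv, tol=(0.05, 0.3, 0.3)):
    """Binary mask: 1 where a pixel lies within `tol` of the centroid in HSV.

    pixels_rgb   -- rows of (r, g, b) tuples, each channel 0..255
    centroid_hsv -- (h, s, v) of the cluster centroid, each 0..1
    tol          -- hypothetical half-widths of the HSV range (assumption)
    """
    ch, cs, cv = centroid_hsv
    mask = []
    for row in pixels_rgb:
        mask_row = []
        for r, g, b in row:
            h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
            dh = abs(h - ch)
            dh = min(dh, 1 - dh)  # hue is circular, so measure the shorter arc
            inside = dh <= tol[0] and abs(s - cs) <= tol[1] and abs(v - cv) <= tol[2]
            mask_row.append(1 if inside else 0)
        mask.append(mask_row)
    return mask

def subtract_masked(pixels_rgb, mask, fill=(255, 255, 255)):
    """Replace every masked pixel with `fill`, leaving other pixels untouched."""
    return [
        [fill if m else px for px, m in zip(pixel_row, mask_row)]
        for pixel_row, mask_row in zip(pixels_rgb, mask)
    ]

# Red background with black "text" pixels on the diagonal.
RED, BLACK, WHITE = (255, 0, 0), (0, 0, 0), (255, 255, 255)
roi = [[RED, BLACK], [BLACK, RED]]
red_mask = hsv_mask(roi, centroid_hsv=(0.0, 1.0, 1.0))
cleaned = subtract_masked(roi, red_mask)
```

In this toy ROI the red pixels are treated as background and removed, while the black pixels are retained as the information of interest.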

The combination of Flament in view of Guo does not explicitly disclose applying thresholding on the first level pre-processed image to obtain a second level pre-processed image using a threshold value derived dynamically from a histogram of the first level pre-processed image.
However, Thrasher teaches applying thresholding on the first level pre-processed image (Guo teaches the first level pre-processed image where the background has been subtracted from the image to retain the text information; Thrasher also teaches background smoothing using clustering in para. 0196) to obtain a second level pre-processed image using a threshold value derived dynamically from a histogram of the first level pre-processed image (para. 0241, iterative recognition-guided thresholding; para. 0248, “the thresholding process may be performed in a manner that renders a legible result on a per-character basis, and upon achieving a legible result, extraction is performed on the legible result, and the process proceeds to obtain a legible result for other characters in the string. Upon accurately extracting all individual characters, the string may be reconstructed from the aggregate extraction results, including the extracted portion(s) of the image, as well as the result of extracting the region of interest (e.g. OCR result). As described herein, this basic procedure is referred to as recognition-guided thresholding”; para. 0254, “it should be understood that the iterative thresholding and extraction process described above is equally applicable to extraction of non-textual information, such as lines or other document structures, graphical elements, etc., as long as there is a quality criterion (as akin to OCR confidence for characters, e.g. a classification-based or other feature-matching confidence measure) evaluating the result. For example, consider a graphical element depicting a gradient of color, which progresses from contrasting with the background to substantially representing the background color the graphical element overlays. In such circumstances, it is similarly possible to progress along the gradient (or other pattern or progression) using an iterative thresholding process to extract a legible or clear version of the graphic”; according to Wikipedia, thresholding in image processing uses “Histogram shape-based methods, where, for example, the peaks, valleys and curvatures of the smoothed histogram are analyzed.[2] Note that these methods, more than others, make certain assumptions about the image intensity probability distribution (i.e., the shape of the histogram)”).
Flament and Thrasher are both considered to be analogous to the claimed invention because they are in the same field of text extraction. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the non-transitory machine-readable information storage mediums as taught by Flament to incorporate the teachings of Thrasher of applying thresholding on the first level pre-processed image to obtain a second level pre-processed image using a threshold value derived dynamically from a histogram of the first level pre-processed image. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. The motivation for the proposed modification would have been “to accomplish accurate and reliable extraction of both significantly similar and significantly contrasting foreground/background elements within a single image or region of interest of an image” (Thrasher, para. 0248).
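
For illustration only, one well-known way of deriving a threshold value dynamically from an image histogram is Otsu's method, sketched below. This sketch is not taken from Thrasher or from the application, and it is not asserted that either uses Otsu's method; the function names `otsu_threshold` and `binarize` and the sample pixel values are hypothetical. The threshold is chosen as the gray level that maximizes the between-class variance of the two pixel populations it separates.

```python
def otsu_threshold(gray):
    """Return the gray level t (0..255) maximizing between-class variance.

    gray -- rows of integer intensities in 0..255.
    Pixels <= t form one class, pixels > t the other.
    """
    hist = [0] * 256
    for row in gray:
        for value in row:
            hist[value] += 1

    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels at or below the candidate threshold
    sum0 = 0    # sum of intensities of those pixels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue  # no pixels in the lower class yet
        if w1 == 0:
            break     # all pixels are in the lower class; nothing left to split
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between_var = w0 * w1 * (mean0 - mean1) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t

def binarize(gray, t):
    """Map pixels above t to 255 and all others to 0."""
    return [[255 if value > t else 0 for value in row] for row in gray]

# Toy first level pre-processed image: dark pixels near 10, bright near 200.
image = [[10, 10, 200], [12, 198, 202]]
threshold = otsu_threshold(image)
second_level = binarize(image, threshold)
```

Because the threshold is recomputed from each image's own histogram, it adapts to the intensity distribution of that image rather than relying on a fixed value.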

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DENISE G ALFONSO whose telephone number is (571)272-1360. The examiner can normally be reached Monday - Friday 7:30 - 5:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Claire Wang, can be reached at 571-270-1051. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/DENISE G ALFONSO/Examiner, Art Unit 2663                              

/CLAIRE X WANG/Supervisory Patent Examiner, Art Unit 2663