DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

A first-action interview with proposed amendment was requested on May 3, 2021 and conducted on May 10, 2021.

Regarding the proposed amended limitation “scanning at least one page of the at least one text document by shifting a text window to a plurality of locations on the at least one page of the at least one text document to capture a plurality of lines of text,” the examiner indicated during the interview that, while the applied Parameswaran reference discloses (in Fig. 5, as an example) an image block capturing one line of text, it does not expressly disclose capturing more than one line.  However, the other applied reference, Prasad, teaches identifying a region with one or more text lines.  See, for example, paragraph 45.  Therefore, while the amendment appears to have overcome the applied Parameswaran reference, the new feature appears to read on paragraph 45 of the Prasad reference.  (Note that US 2016/0092754 by Kuznetsov also discloses capturing regions with multiple text lines in Fig. 200 and paragraph 19.)

Regarding the other proposed amended limitation “providing the respective text snippet and the respective image snippet to a classifier as a part of training the classifier to distinguishes [sic] between text documents and image documents,” note that the Osindero reference discloses training a neural network (a classifier) using training data, while the Kim reference teaches classifying text and non-text (e.g., pictures or photos) in Figs. 13-15, with detailed discussion provided in paragraphs 90-95 and 101.  As a classifier is typically trained on data having the characteristics it is intended to differentiate, it would have been obvious to train a classifier for differentiating text data and image data with training text data and training image data.  Therefore, the proposed amendment does not appear to have overcome the applied references.

The rejections made in the pre-interview communication mailed 3/4/2021 are reproduced below.

Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):

(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

Claim 11 recites the limitation "Modified National Institute of Standards and Technology (MNIST) style Neural Network" in lines 1-2.  MNIST is a well-known database of handwriting samples.  However, the term “MNIST-style neural network” is not known in the art and is not defined in the specification, other than that it is “provided 

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1-4, 11-15 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Osindero (US 2017/0004374), Parameswaran et al. (US 2013/0129222), Prasad et al. (US 2010/0246961) and Kim et al. (US 2010/0220927).

Regarding claim 1 (and similarly claims 12 and 20), Osindero discloses a method of training a neural network to distinguish between text documents and image documents, comprising:
obtaining a corpus of text documents;
[Fig. 1 and paragraphs 10 (“…trains the neural network on the training data…comprises images with text”), 41 (“…A large number of images can be collected…for generating the training data 130”)]

for at least one text document of the corpus of text documents:
determining whether text (in the window) at a respective location of the plurality  of locations meets text (line) criteria;
in accordance with a determination that the text (in the window) at the respective location of the plurality of locations meets text (line) criteria, storing the text (in the window) as a respective text snippet;
[Figs. 1, 2 and paragraphs 41 (“…A large number of images can be collected and filtered for those containing text-bearing regions for generating the training data 130”), 51 (“…text area identification module 204…analyzes…and identifies the size and placement of the text-bearing portion”).  Note that text-bearing regions are considered text snippets and, being training data, are stored in 130.  The use 

	Osindero does not expressly disclose the following, which is taught by Parameswaran and Prasad:
scanning at least one page of the at least one text document by shifting a text window to a plurality of locations on the at least one page of the at least one text document;
[Parameswaran: Fig. 1 and paragraph 76 (“…applying a sliding window over the image and identifying whether the image region within the window contains a hypothesized text fragment or not”)]
(that the text criteria is a) text line criteria
[Prasad: Fig. 2 and paragraph 45 (“applies a feature extractor 204 to…training images…this feature extraction identifies the location of…one [or] more lines of text present in the image…divides each line of text into a series uniform horizontal windows”)]

	Prior to the effective filing date of the claimed invention it would have been obvious to one of ordinary skill in the art to modify Osindero with the teachings of Parameswaran and Prasad as set forth above.  The reasons for doing so at least would have been to identify an image region hypothesized to contain a text fragment, as Parameswaran indicated in paragraph 76, and to be able to compute features representing text, as Prasad indicated in paragraph 45.
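For illustration only, the window-shifting operation recited in the limitation above can be sketched as follows.  This is a minimal sketch under stated assumptions and is not a characterization of how Parameswaran, Prasad or the present application implements the scan: the page is assumed to be a binarized grid (1 = ink, 0 = background), and the names slide_window and contains_ink, as well as the window size and stride parameters, are hypothetical.

```python
from typing import Iterator, List, Tuple

# Assumption: a page is a binarized grid, 1 = ink (foreground), 0 = background.
Page = List[List[int]]


def slide_window(page: Page, win_h: int, win_w: int,
                 stride: int) -> Iterator[Tuple[int, int, Page]]:
    """Yield (top, left, crop) for each window location on the page."""
    rows = len(page)
    cols = len(page[0]) if rows else 0
    for top in range(0, rows - win_h + 1, stride):
        for left in range(0, cols - win_w + 1, stride):
            # Copy out the part of the page covered by the window.
            crop = [row[left:left + win_w] for row in page[top:top + win_h]]
            yield top, left, crop


def contains_ink(crop: Page) -> bool:
    """Hypothetical stand-in for a text criterion: any ink pixel in the window."""
    return any(any(row) for row in crop)


page = [
    [0, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
locations = [(top, left) for top, left, _ in slide_window(page, 2, 2, 2)]
snippets = [crop for _, _, crop in slide_window(page, 2, 2, 2) if contains_ink(crop)]
```

In this sketch only the window at the top-left location contains ink, so it is the only crop retained as a candidate snippet.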

	The combined invention of Osindero, Parameswaran and Prasad further discloses obtaining training data and the following:
obtaining a corpus of image documents;
in accordance with a determination that the content of the respective image window meets the image criteria, storing the content of the respective image window as a respective image snippet; and
providing the respective text snippet and the respective image snippet to a classifier
[Osindero: Fig. 1 and paragraphs 10 (“…trains the neural network on the training data”), 41 (“…A large number of images can be collected…for generating the training data 130”).  Note that the training data being image snippets is taught by Kim.  See the analysis below]

 	The combined invention of Osindero, Parameswaran and Prasad does not expressly disclose the following, which are taught by Kim:
for at least one image document of the corpus of image documents:
superimposing a plurality of image windows over at least one page of the at least one image document;
[Figs. 13A-13C and paragraph 93 (“…FIGS. 13A through 13C, an original image, gray image blocks, and edge map blocks are illustrated”).  Note that each block corresponds to a window]
determining whether the content of a respective image window meets image criteria;
[Figs. 9, 16 and paragraphs 90 (“…unit 240 analyzes the characteristics of the histogram for each block...to determine whether the blocks are…pictures or photographs”), 101 (“…1612…classified as being a text image or a non text image according to…the number of the peaks, the peak distances, and…the peak widths”)]

	Prior to the effective filing date of the claimed invention it would have been obvious to one of ordinary skill in the art to modify the combined invention with the teachings of Kim as set forth above.  The reasons for doing so at least would have been its ability to determine whether a block is a picture/photograph or not (as indicated in paragraph 90) without complex calculation (as indicated in paragraph 108).

Regarding claim 2 (and similarly claim 13), the combined invention of Osindero, Parameswaran, Prasad and Kim further discloses:
wherein the text line criteria include first criteria that are met in accordance with a determination that a number of lines of text in the text in the window at the respective location of the plurality of locations is greater than a first predetermined number of lines of text
[Prasad: Fig. 2 and paragraph 45 (“applies a feature extractor 204 to…training images…this feature extraction identifies the location of…one [or] more lines of text present in the image…divides each line of text into a series uniform 

Regarding claim 3 (and similarly claim 14), the combined invention of Osindero, Parameswaran, Prasad and Kim further discloses:
wherein the text line criteria include second criteria that are met in accordance with a determination that a number of lines of text in the text in the window at the respective location of the plurality of locations is fewer than a second predetermined number of lines of text
[Prasad: Fig. 2 and paragraph 45 (“applies a feature extractor 204 to…training images…this feature extraction identifies the location of…one [or] more lines of text present in the image…divides each line of text into a series uniform horizontal windows”).  Note that since the number of lines in a window is 1, the second predetermined number is 2]

Regarding claim 4 (and similarly claim 15), the combined invention of Osindero, Parameswaran, Prasad and Kim discloses substantially the claimed invention as set forth in the discussion above for claims 2 and 3 (respectively claims 13 and 14), including a first predetermined number of 0 and a second predetermined number of 2.

The combined invention does not expressly disclose the recited limitation of the first predetermined number being 2 and the second predetermined number being 4.



Therefore, it would have been obvious to one of ordinary skill in this art to modify the combined invention by using the recited first and second predetermined numbers (2 and 4, respectively) to obtain the invention as specified in claims 4 and 15.
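For illustration only, the first and second criteria discussed for claims 2-4 amount to a bounded line count, which can be sketched as follows.  This is a minimal sketch under stated assumptions: a text line is assumed to be a maximal run of consecutive window rows containing at least one ink pixel, which is not taken from any applied reference, and the names count_text_lines and meets_text_line_criteria are hypothetical.

```python
from typing import List

Window = List[List[int]]  # Assumption: 1 = ink pixel, 0 = background


def count_text_lines(crop: Window) -> int:
    """Count maximal runs of consecutive rows that contain any ink."""
    lines = 0
    in_line = False
    for row in crop:
        has_ink = any(row)
        if has_ink and not in_line:
            lines += 1  # a new run of inked rows starts here
        in_line = has_ink
    return lines


def meets_text_line_criteria(crop: Window, first: int = 2, second: int = 4) -> bool:
    """True when the line count is greater than `first` and fewer than `second`."""
    return first < count_text_lines(crop) < second


three_lines = [[1, 1], [0, 0], [1, 0], [0, 0], [0, 1]]
one_line = [[0, 0], [1, 1], [1, 0], [0, 0]]
four_lines = [[1], [0], [1], [0], [1], [0], [1]]
```

With the recited values of 2 and 4, only a window holding exactly three lines satisfies both criteria in this sketch; with the values of 0 and 2 noted for claims 2 and 3, only a single-line window would.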

Regarding claim 11 (as interpreted), the combined invention further discloses:
wherein the classifier is a Modified National Institute of Standards and Technology (MNIST) style Neural Network
[Osindero: Fig. 1 and paragraphs 10 (“…trains the neural network on the training data…comprises images with text”), 41 (“…one or more of the image area module 102, a character extraction module 104 and the language module 106 can comprise neural networks that are trained with vast amounts of training data 130”)]

>>><<<
Claims 5 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Osindero (US 2017/0004374), Parameswaran et al. (US 2013/0129222), Prasad et al. (US 2010/0246961) and Kim et al. (US 2010/0220927) as applied to claims 1-4, 11-15 and 20 above, and further in view of Matsunawa et al. (US 4,741,046).

Regarding claim 5 (and similarly claim 16), the combined invention of Osindero, Parameswaran, Prasad and Kim discloses all limitations of its parent claim 1 (respectively, claim 12) but not expressly the following, which are taught by Matsunawa:
wherein the image criteria include criteria that are met in accordance with a determination that a portion of the content of the respective image window occupied by non-background content exceeds a predetermined threshold amount of non-background content
[Col. 8, lines 4-7 (“…determining from a ratio of background pixels to foreground pixels whether or not said discrimination unit area is of a line picture”)]

	Prior to the effective filing date of the claimed invention it would have been obvious to modify the combined invention with the teaching of Matsunawa by using the ratio as set forth above.  The reasons for doing so at least would have been for its ability to identify a specific type of image, namely a line picture, as Matsunawa indicated in column 8, lines 4-7.
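For illustration only, the claimed image criterion (the portion of a window occupied by non-background content exceeding a threshold) can be sketched as follows.  This is a minimal sketch under stated assumptions: the claim is phrased as a fraction of the window, whereas Matsunawa is quoted as using a ratio of background pixels to foreground pixels, and the sketch follows the claim wording.  The background value of 255, the threshold of 0.5 and the function names are hypothetical.

```python
from typing import List

Window = List[List[int]]  # Assumption: grayscale pixel values, 255 = background


def non_background_fraction(window: Window, background: int = 255) -> float:
    """Fraction of pixels in the window that differ from the background value."""
    total = sum(len(row) for row in window)
    if total == 0:
        return 0.0
    non_bg = sum(1 for row in window for px in row if px != background)
    return non_bg / total


def meets_image_criteria(window: Window, threshold: float = 0.5,
                         background: int = 255) -> bool:
    """True when non-background content exceeds the threshold fraction."""
    return non_background_fraction(window, background) > threshold


mostly_content = [[255, 0], [10, 20]]
mostly_blank = [[255, 255], [255, 0]]
```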

>>><<<
Claims 6 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Osindero (US 2017/0004374), Parameswaran et al. (US 2013/0129222), Prasad et al. (US 2010/0246961) and Kim et al. (US 2010/0220927) as applied to claims 1-4, 11-15 above, and further in view of Kiapour et al. (US 2016/0210533).

Regarding claim 6 (and similarly claim 17), the combined invention of Osindero, Parameswaran, Prasad and Kim discloses all limitations of its parent claim 1 (respectively, claim 12) but not expressly the following, which are taught by Kiapour:
wherein a first image window of the plurality of image windows has a first size that is different from a second size of a second image window of the plurality of image windows
[Paragraph 48 (“…A training set for the fine-grained identification module can be generated from an available training set by…using…windows of various sizes…and tested to see if the windows contain the identified foreground object (e.g., using a CNN). The smallest window having the strongest positive recognition for the object is determined to identify the location of the foreground object”)]

	Prior to the effective filing date of the claimed invention it would have been obvious to one of ordinary skill in the art to modify the combined invention with the teachings of Kiapour as set forth above.  The reasons for doing so at least would have been to facilitate training of the fine-grained identification module, as Kiapour indicated in paragraph 48.

>>><<<
Claims 7-10, 18 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Osindero (US 2017/0004374), Parameswaran et al. (US 2013/0129222), Prasad et al. (US 2010/0246961) and Kim et al. (US 2010/0220927) as applied to claims 1-4, 11-15 above, and further in view of Williams, Jr. et al. (US 2015/0254555).

Regarding claims 7 (and similarly claim 18) and 9, the combined invention of Osindero, Parameswaran, Prasad and Kim discloses all limitations of its parent claim 1 (respectively, claims 12 and 7) but not expressly the following, which is taught by Williams:
(Claim 7) normalizing the size of a plurality of text snippets that include the respective text snippet
(Claim 9) normalizing the size of a plurality of image snippets that include the respective image snippet
[Fig. 5 and paragraphs 84 (“…Data may be…hardcopy documents that have been scanned, photographs”), 242 (“Training Corpus 508…populated with image data extracted from exemplar documents and image files…Images and diagrams may also be normalized for size”)]

	Prior to the effective filing date of the claimed invention it would have been obvious to one of ordinary skill in the art to modify the combined invention with the teachings of Williams as set forth above.  The reasons for doing so at least would have 

Regarding claim 8 (and similarly claim 19), the combined invention of Osindero, Parameswaran, Prasad, Kim and Williams discloses substantially the claimed invention as set forth in the discussion above for claim 7 (and similarly claim 18), including normalizing the size of the text snippets.

The combined invention does not expressly disclose the recited limitation of normalizing the size to 32 x 32.

However, prior to the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to use the recited limitation regarding size of 32 x 32.  Applicant has not disclosed that the recited limitation provides an advantage, is used for a particular purpose, or solves a stated problem.  One of ordinary skill in the art, furthermore, would have expected Applicant’s invention to perform equally well with the unspecified size taught by the combined invention (specifically, by Williams) because it also normalizes the size of the training data for the training purpose.

Therefore, it would have been obvious to one of ordinary skill in this art to modify the combined invention by using the recited size of 32 x 32 to obtain the invention as specified in claims 8 and 19.
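For illustration only, normalizing snippets of differing sizes to 32 x 32 can be sketched as follows.  This is a minimal sketch under stated assumptions: nearest-neighbor sampling is chosen here purely for brevity and is not taken from Williams or from the present application, and the name resize_nearest is hypothetical.

```python
from typing import List

Snippet = List[List[int]]  # Assumption: a non-empty rectangular grid of pixel values


def resize_nearest(snippet: Snippet, out_h: int = 32, out_w: int = 32) -> Snippet:
    """Resample a snippet to out_h x out_w by nearest-neighbor lookup."""
    in_h = len(snippet)
    in_w = len(snippet[0])
    return [
        # Map each output coordinate back to a source pixel by integer scaling.
        [snippet[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


small = resize_nearest([[1, 2], [3, 4]])
tall = resize_nearest([[7] * 16 for _ in range(64)])
```

Snippets smaller or larger than the target are both mapped to the same 32 x 32 shape, so they can be placed in a single collection of training material.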

Regarding claim 10, the combined invention further discloses:
the plurality of text snippets and the plurality of image snippets are added to a collection of training material; and
the classifier is trained based on the collection of training material
[Per the analyses of claims 1 and 9 above.  See especially Osindero: paragraphs 5 (“…providing, by the processor to a neural network, training data and training, by the processor, the neural network on the training data”) and 41 (“…A large number of images can be collected and filtered for those containing text-bearing regions for generating the training data 130”)]

Conclusion and Contact Information

Any inquiry concerning this communication or earlier communications from the examiner should be directed to YUBIN HUNG whose telephone number is (571)272-7451.  The examiner can normally be reached on M-F 7:30-16:00.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Sumati Lefkowitz, can be reached on 571-272-3638.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/YUBIN HUNG/
Primary Examiner, Art Unit 2666
May 11, 2021