DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 112

The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

Claim 10 is rejected under 35 U.S.C. 112(a) as failing to comply with the written description requirement. The claim contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, at the time the application was filed, had possession of the claimed invention.

Claim 10 recites “wherein partitioning the second set of cells to form the partitioned set of cells includes partitioning respective cells of the second set of cells into six cells.”  The instant specification merely discloses dividing the image into six cells and is silent on the number of cells into which each cell of the second set is further divided.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 2, 8, 13 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Kim et al. (US 2010/0220927) and Kansal et al. (US 2008/0273796).

Regarding claim 2 (and similarly claims 13 and 19), Kim discloses:
partitioning the document into a plurality of cells;
scaling each of the cells to a standardized number of pixels to provide a corresponding snippet for each of the cells;
[Fig. 17 (ref. 1702) and paragraphs 87 (“…divides a region of the gray image into a predetermined number of blocks…having a regular size”), 103 (“…a region is divided into a predetermined number of blocks, that is, n blocks”).  Note that the scaling factor is 1 and all blocks have the same size.  Note further that processing document images was well known in the art prior to the effective filing date of the claimed invention.  See, for example, the Wong reference and the Kimura reference applied in the rejections of claims 9 and 10, respectively, below]
classifying the snippets, (using a neural network), to determine (i) a first set of cells classified as text and (ii) a second set of cells classified as non-text;
[Fig. 17 and paragraph 104 (“…when the block is determined to be a text block in operation 1712…”).  Note that block 1712 determines whether a block is text or non-text.  Note further that the use of a neural network is taught by Kansal, see below]
determining a volume of text for the document based on a total amount of text in the document corresponding to a sum of an amount of text in each cell of the first set of cells;
[Fig. 17 (ref. 1714) and paragraph 104 (“…when the block is determined to be a text block in operation 1712, the first block is counted as a text block in operation 1714”).  Note that the number of text blocks is counted at 1714.  This number is used to calculate a measure of text volume in the image that is the ratio of text blocks to all blocks (see paragraph 105), with 1/n, n being the number of all blocks, considered as the text amount of a text block]
in response to a determination that (i) the total amount of text in the document exceeds a predetermined threshold (and (ii) the first set of cells has a satisfactory geometry), determining that the document is a text page
[Fig. 17 (ref. 1720) and paragraph 105 (“…determine whether the input image is a text image based on…a ratio of the text blocks to all blocks, and dispersion of the text blocks”).  Note that the ratio of text blocks in an input image is a measure of the total amount of text in the image.  Note further that making the text page determination also in response to (ii) the first set of cells having a satisfactory geometry is taught by Kansal, see the analysis below]

	Kim does not expressly disclose the following, which are taught by Kansal:
(that the classification is by) using a neural network;
(that the text page determination is also in response to) (ii) the first set of cells has a satisfactory geometry
[Fig. 1 (ref. 108) and paragraph 22 (“…a neural network trained classifier may recognize an image text region based on the included text shapes”)]

Prior to the effective filing date of the claimed invention it would have been obvious to one of ordinary skill in the art to modify Kim with the teaching of Kansal as set forth above. The reasons for doing so at least would have been that a neural network trained classifier is among a set of suitable methodologies for detecting a text region, as Kansal indicates in paragraph 22, and one of ordinary skill would have been motivated to try it to obtain the predictable result of text region detection.
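
For illustration only, the block-counting measure discussed in the claim 2 analysis above (partition into cells, scale each cell to a snippet, classify each snippet, compare the ratio of text cells to a threshold) may be sketched as follows. This is a minimal sketch under stated assumptions and is not the implementation of Kim, Kansal or the instant application: the function names, the nearest-neighbour scaling and the `is_text` callable (a stand-in for a trained neural network classifier) are hypothetical, the image is assumed to be at least as large as the grid, and the satisfactory-geometry condition is omitted.

```python
from typing import Callable, List

Image = List[List[int]]  # grayscale image stored as rows of pixel values


def partition(image: Image, rows: int, cols: int) -> List[Image]:
    """Split an image into rows x cols rectangular cells, in row-major order."""
    height, width = len(image), len(image[0])
    cells = []
    for r in range(rows):
        top, bottom = r * height // rows, (r + 1) * height // rows
        for c in range(cols):
            left, right = c * width // cols, (c + 1) * width // cols
            cells.append([row[left:right] for row in image[top:bottom]])
    return cells


def scale(cell: Image, size: int) -> Image:
    """Resample a cell to size x size pixels by nearest-neighbour lookup."""
    height, width = len(cell), len(cell[0])
    return [[cell[y * height // size][x * width // size] for x in range(size)]
            for y in range(size)]


def text_ratio(image: Image, rows: int, cols: int, size: int,
               is_text: Callable[[Image], bool]) -> float:
    """Ratio of cells classified as text to all cells (each text cell counts 1/n)."""
    snippets = [scale(cell, size) for cell in partition(image, rows, cols)]
    text_cells = sum(1 for snippet in snippets if is_text(snippet))
    return text_cells / len(snippets)


def exceeds_text_threshold(image: Image, rows: int, cols: int, size: int,
                           is_text: Callable[[Image], bool],
                           threshold: float) -> bool:
    """Condition (i) only: the text amount exceeds the predetermined threshold."""
    return text_ratio(image, rows, cols, size, is_text) > threshold
```

In this sketch every text cell contributes 1/n to the total, which mirrors the ratio of text blocks to all blocks noted above; any other per-cell text amount could be substituted without changing the structure.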

Regarding claim 8, the combined invention further discloses:
wherein one or more cells of the first set of cells are aligned to form at least one text line and wherein the at least one text line is one of: horizontal or vertical
[Kim: Figs. 10B, 10C]
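
As a purely illustrative aid, the claim 8 limitation (text cells aligned to form a horizontal or vertical text line) can be pictured with the sketch below. It assumes, hypothetically, that each text cell is identified by its (row, column) position in a regular grid and that a run of at least two adjacent text cells counts as a line; neither assumption is taken from Kim or from the instant specification.

```python
from typing import List, Set, Tuple

Cell = Tuple[int, int]  # (row, column) position of a cell in the grid


def longest_run(text_cells: Set[Cell], horizontal: bool) -> int:
    """Length of the longest unbroken run of text cells along one direction."""
    step = (0, 1) if horizontal else (1, 0)
    best = 0
    for row, col in text_cells:
        # Only start counting at the first cell of a run.
        if (row - step[0], col - step[1]) in text_cells:
            continue
        length = 0
        current = (row, col)
        while current in text_cells:
            length += 1
            current = (current[0] + step[0], current[1] + step[1])
        best = max(best, length)
    return best


def line_orientations(text_cells: Set[Cell], min_len: int = 2) -> List[str]:
    """Report which orientations contain a run of at least min_len text cells."""
    found = []
    if longest_run(text_cells, horizontal=True) >= min_len:
        found.append("horizontal")
    if longest_run(text_cells, horizontal=False) >= min_len:
        found.append("vertical")
    return found
```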

>>><<<
Claims 3, 6, 7, 14, 17, 18 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Kim et al. (US 2010/0220927) and Kansal et al. (US 2008/0273796) as applied to claims 2, 8, 13 and 19 above, and further in view of Moriya et al. (US 5,754,709).

Regarding claim 3 (and similarly claims 14 and 20), the combined invention of Kim and Kansal discloses all limitations of its parent claim 2 and additionally the following:
scaling each of the partitioned cells of the partitioned set of cells to a standardized number of pixels to provide a respective snippet for each of the partitioned cells of the partitioned set of cells;
classifying the respective snippets, using a neural network, to determine (i) a first set of partitioned cells classified as text and (ii) a second set of partitioned cells classified as non-text;
[Per current claim 2]
determining an updated volume of text for the document based on an updated total amount of text in the document corresponding to a sum of an amount of text in each cell of the first set of cells and each cell of the first set of partitioned cells;
[Per current claim 2, but updating the volume]
in response to a determination that the updated total amount of text in the document exceeds (i) the predetermined threshold, determining that the document is a text page.
[Per condition (i) of current claim 2]

	The combined invention does not expressly disclose the following, which is taught by Moriya:
in response to a determination that (i) the total amount of text in the document does not exceed the predetermined threshold and (ii) that partitioning criteria are met for the second set of cells, partitioning the second set of cells to form a partitioned set of cells;
[Figs. 12(a)-(d), 18, 20(a); Col. 13, line 66-Col. 14, line 2 (“…The image dividing means 2 is formed by an image division information extracting part 21 for extracting one type or a plurality types of division information”), Col. 14, line 57-Col. 15, line 3 (“…(a) a monotonous image, (b) an image which is generally divided into two luminance groups of the same luminance or (c) a complex image…if the target image is in the condition (c), it is necessary to repeat image division until the condition (a) or (b) is realized”), Col. 15, lines 25-28 (“…extraction, judgement and division of the division information are repeated as shown in FIG. 12(d) to divide the image smaller only in a region where such is necessary”).  Note that the applied teaching is to recursively divide and examine blocks until each block is of one of the specified types (e.g., the types of Figs. 12(a) and (b)).  Note further that Kim discloses that one of the conditions for determining a block to be a non-text block is that the text amount does not exceed a predetermined threshold, per the analysis of claim 2 above]

	Prior to the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to modify Kim by examining cells whose type is not one of the specified types, as taught by Moriya.  The reasons for doing so at least would have been to be able to segment out image regions of a desired type at a finer level, as Moriya indicates in Col. 14, line 57-Col. 15, line 3.
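
The repeated division described in the Moriya passages cited above (divide further only in a region where such is necessary) may be pictured, for illustration only, with the following minimal sketch. The four-way split, the minimum cell size and the `needs_division` predicate are hypothetical assumptions introduced for this example; Moriya's actual division information and judgement criteria are not reproduced here.

```python
from typing import Callable, List, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height)


def split4(rect: Rect) -> List[Rect]:
    """Divide a rectangle into four quadrants that tile it exactly."""
    x, y, w, h = rect
    half_w, half_h = w // 2, h // 2
    return [
        (x, y, half_w, half_h),
        (x + half_w, y, w - half_w, half_h),
        (x, y + half_h, half_w, h - half_h),
        (x + half_w, y + half_h, w - half_w, h - half_h),
    ]


def refine(rect: Rect, needs_division: Callable[[Rect], bool],
           min_size: int = 2) -> List[Rect]:
    """Recursively divide only those regions for which division is still needed."""
    _, _, w, h = rect
    too_small = w < 2 * min_size or h < 2 * min_size
    if too_small or not needs_division(rect):
        return [rect]  # region is settled (or cannot be divided further)
    leaves: List[Rect] = []
    for child in split4(rect):
        leaves.extend(refine(child, needs_division, min_size))
    return leaves
```

The `min_size` guard is a design choice of the sketch that guarantees termination even if the predicate never reports a region as settled.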

Regarding claims 6 and 7 (and similarly claims 17 and 18), the combined invention of Kim, Kansal and Moriya further discloses:
(Claims 6 and 17) wherein the respective snippets are examined in a random order.
(Claims 7 and 18) wherein the respective snippets are examined in an order that prioritizes subdivisions adjacent to snippets previously classified as text.
[Per the analysis of claims 2-3 above.  Note that for a block being subdivided into k subblocks, there is a finite number, namely k!, of orders in which the subdivisions can be examined, and one of ordinary skill in the art may be motivated to try each of them and choose one particular order, since any examination order will achieve the expected result of having all subdivisions examined]
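
For illustration only, the finite set of examination orders noted above can be made concrete with the sketch below, which counts the k! possible orders and produces two of them: a random order and an order that places subdivisions adjacent to previously classified text first. The grid-position representation, the four-neighbour notion of adjacency and the function names are hypothetical assumptions, not features drawn from the applied references.

```python
import math
import random
from typing import List, Optional, Set, Tuple

Cell = Tuple[int, int]  # (row, column) position of a subdivision


def count_orders(k: int) -> int:
    """Number of distinct orders in which k subdivisions can be examined."""
    return math.factorial(k)


def random_order(cells: List[Cell], seed: Optional[int] = None) -> List[Cell]:
    """Return the subdivisions in a random examination order."""
    order = list(cells)
    random.Random(seed).shuffle(order)
    return order


def adjacency_first(cells: List[Cell], text_cells: Set[Cell]) -> List[Cell]:
    """Examine subdivisions that touch a previously classified text cell first."""
    def touches_text(cell: Cell) -> bool:
        row, col = cell
        neighbours = ((row + 1, col), (row - 1, col), (row, col + 1), (row, col - 1))
        return any(n in text_cells for n in neighbours)

    # sorted() is stable, so the original order is kept within each group.
    return sorted(cells, key=lambda cell: not touches_text(cell))
```

Either function returns a permutation of the same subdivisions, so every subdivision is still examined regardless of the order chosen.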

>>><<<
Claims 4, 5, 15, 16 and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Kim et al. (US 2010/0220927), Kansal et al. (US 2008/0273796) and Moriya et al. (US 5,754,709) as applied to claims 3, 6, 7, 14, 17, 18 and 20 above, and further in view of Chakraborty et al. (US 5,995,659).

Regarding claim 4 (and similarly claims 15 and 21), the combined invention of Kim, Kansal and Moriya discloses all limitations of its parent claim 3.  In addition, the combined invention also suggests the following:
in response to a determination that the updated total amount of text in the document does not exceed the predetermined threshold and (ii) that partitioning criteria are not met for the second set of partitioned cells, determining whether the first set of cells and the first set of partitioned cells (have a satisfactory geometry);
[Moriya: Figs. 12(a)-(d), 18, 20(a); Col. 13, line 66-Col. 14, line 2 (“…The image dividing means 2 is…for extracting one type or a plurality types of division information”), Col. 14, line 57-Col. 15, line 3 (“…(a) a monotonous image, (b) an image which is generally divided into two luminance groups of the same luminance or (c) a complex image…if the target image is in the condition (c), it is necessary to repeat image division until the condition (a) or (b) is realized”).  See also the analysis of claim 3 above which recites conditions for further partitioning.  Since Kim discloses making a text/non-text determination in Fig. 17, when the partitioning conditions are not met (as is the case of claim 4) for a block, one of ordinary skill in the art would be motivated to determine whether the block is text or not so that a text/non-text determination can be made for the input image; see, again, Fig. 17, especially ref. 1720.  Determining whether a satisfactory geometry exists is taught by Chakraborty; see below]
in response to a determination that the first set of cells and the first set of partitioned cells have a satisfactory geometry, determining that the document is a text page
[Kansal: Fig. 1 (ref. 108) and paragraph 22 (“…a neural network trained classifier may recognize an image text region based on the included text shapes”)]

The combined invention does not expressly disclose the following, which is taught by Chakraborty:
(that the determination is whether the first set of cells and the first set of partitioned cells) have a satisfactory geometry
[Figs. 1, 2 and col. 3, lines 44-48 (“…Based on the appearance and underlying geometry and structure, first step 12 identifies these text areas”)]

	Prior to the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to modify the combined invention with the teaching of Chakraborty as set forth above.  The reasons for doing so at least would have been that the underlying geometry of text regions can be used to identify and locate them, as Chakraborty indicates in col. 2, lines 8-13.

Regarding claim 5 (and similarly claim 16) the combined invention further discloses:
in response to a determination that the first set of cells and the first set of partitioned cells do not have a satisfactory geometry, determining that the document is not a text page
[Per the analysis of claim 4 above.  Note that Kim discloses determining whether a block is text or non-text and Chakraborty discloses using geometry to define a determination criterion]


>>><<<
Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Kim et al. (US 2010/0220927) and Kansal et al. (US 2008/0273796) as applied to claims 2, 8, 13 and 19 above, and further in view of Wong et al. (“Document Analysis System,” IBM Journal of Research and Development, Vol. 26, No. 6, November 1982, pp. 647-656) and Nishida (US 2007/0165950).

Regarding claim 9, the combined invention of Kim and Kansal discloses all limitations of its parent claim 2 but not expressly the following, which is taught by Wong and Nishida:
wherein the one or more cells of the second set of cells are classified as an image or unknown
[Wong: Abstract, lines 3-4 (“…The first is the segmentation and classification of digitized documents into regions of text and images…”).
Nishida: Figs. 2-5 and paragraphs 56-60, 67 (‘…the block is classified into any one of "picture", "text", and "other"’).  Note that “picture” and “other” are considered as “image” and “unknown,” respectively]

	Prior to the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to modify the combined invention by classifying the snippets in the manner taught by Wong and Nishida as set forth above.  The reasons for doing so at least would have been to further process the regions according to their classification, as Wong indicates in the third paragraph of the Introduction section; as well as to apply appropriate algorithms to the blocks according to their types, as Nishida indicates in lines 3-5 of paragraph 26.

>>><<<
Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Kim et al. (US 2010/0220927), Kansal et al. (US 2008/0273796) and Moriya et al. (US 5,754,709) as applied to claims 3, 6, 7, 14, 17, 18 and 20 above, and further in view of Kimura (US 2015/0172508).

Regarding claim 10, the combined invention of Kim, Kansal and Moriya discloses all limitations of its parent claim 3 but not expressly the following, which is taught by Kimura:
wherein partitioning the second set of cells to form the partitioned set of cells includes partitioning respective cells of the second set of cells into six cells
[Fig. 2 and paragraph 17 (“…FIG. 2 illustrates the case where the document image is partitioned into six rectangular regions”)]

	Prior to the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to modify the combined invention by partitioning a document into six cells as taught by Kimura.  The choice of the number is clearly a design choice, as none of the references of the combined invention, the Kimura reference and the instant specification discloses an advantage of any specific number of partitioned cells, and any number of partitioned cells would have the expected result of allowing desired processing (such as classification) to be applied to each cell.

>>><<<
Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Kim et al. (US 2010/0220927) and Kansal et al. (US 2008/0273796) as applied to claims 2, 8, 13 and 19 above, and further in view of Stokes (US 2013/0177246).

Regarding claim 11, the combined invention of Kim and Kansal discloses all limitations of its parent claim 2 but not expressly the following, which is taught by Stokes:
wherein the document is captured using a smartphone
[Fig. 1 (ref. 150) and paragraph 17 (“…the image acquisition device 150 may be…a smartphone”)]

	Prior to the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to modify the combined invention by capturing images using a smartphone as taught by Stokes.  The reasons for doing so at least would have been the ubiquity of smartphones prior to the effective filing date, and that one of ordinary skill in the art would have been motivated to try a smartphone to obtain the expected result of capturing images.



>>><<<
Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Kim et al. (US 2010/0220927) and Kansal et al. (US 2008/0273796) as applied to claims 2, 8, 13 and 19 above, and further in view of Williams, Jr. et al. (US 2015/0254555).

Regarding claim 12, the combined invention of Kim and Kansal does not expressly disclose the following, which is taught by Williams:
wherein the neural network is trained using a plurality of image documents and a plurality of text pages having various formats, layouts, text sizes, ranges of word, line and paragraph spacing
[Fig. 5 (ref. 508), paragraphs 104 (“…the classifier model may be generated based on a deep learning neural network”), 242 (“Training Corpus 508…populated with image data extracted from exemplar documents and image files”), 243 (“…Model(s) 518 are configured using convolutional network layers to extract visual features from the Training Corpus 508 image files”).  Note that while not expressly disclosed, Official notice is taken that training data are typically selected to reflect the distribution of the target population so as to more effectively train the classifier, and that documents and text pages so selected will have different formats, layouts, fonts, spacing, vocabulary and other characteristics related to documents.  For example, see Figs. 3, 4 and paragraph 3 of Isaev et al. (US 2010/0215272), cited merely to show the general knowledge prior to the effective filing date of the claimed invention]

	Prior to the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to modify the combined invention by using the training data as taught by Williams.  The reasons for doing so at least would have been to train a deep learning network that can achieve higher classification accuracy, as Williams indicates in paragraphs 22 and 25.

Conclusion and Contact Information

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
Kim et al. (US 2016/0104052)—[Figs. 2, 4, 9 and paragraphs 46 (“… the text categories may also be determined based on a shape, a layout, an arrangement, a pattern, a size, a width, a height, an aspect ratio, a color, an object, a context or the like of the text regions”), 57 and 80]
Shiiyama (US 2006/0120627)—[Paragraph 99 (“The partial regions of black pixels obtained in this way are categorized into regions having different attributes based on their sizes and shapes. For example…neighboring characters regularly line up and can be grouped is determined as a text region…a partial region with an arbitrary shape other than those described above is determined as a picture region”)]

Any inquiry concerning this communication or earlier communications from the examiner should be directed to YUBIN HUNG whose telephone number is (571)272-7451.  The examiner can normally be reached on M-F 7:30-16:00.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Sumati Lefkowitz, can be reached on 571-272-3638.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/YUBIN HUNG/
Primary Examiner, Art Unit 2666
September 4, 2022