Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on January 29, 2019 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.
Claim Objections
Claim 13 is objected to over the following minor informalities:  “a system for clustering document objects based on information content, the method comprising” should read “a system for clustering document objects based on information content, the system comprising.”  Appropriate correction is required.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1, 2, 13, and 20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Levy (PG Pub. No. 2015/0324338 A1).
Regarding Claim 1, Levy discloses a method comprising:
identifying, by a document clustering device, a plurality of object chunks from at least one document based on semantic context of each of the plurality of object chunks, wherein each of the plurality of object chunks comprise at least one object selected from the at least one document (see Levy, paragraph [0129], where some embodiments also identify when two or more primitive graphic elements or graphic objects (e.g., shapes, images, photographs, bitmaps, etc.) in the document should be grouped as one structural graphic element; for instance, two objects that mostly overlap may be one element that is defined as two shapes or images in the unstructured document);
determining, by the document clustering device, at least one document portion from the at least one document as a base document, based on a plurality of parameters applied to the plurality of object chunks (see Levy, paragraph [0396], where the process receives (at 5705) layout information for a primary zone);
determining, by the document clustering device, a plurality of hierarchies within the base document (see Levy, paragraph [0154], where a DOM, in some embodiments, is a hierarchical representation of a document that includes all the structural elements of the documents); and
categorizing, by the document clustering device, the plurality of object chunks based on the plurality of hierarchies and information in each of the plurality of object chunks (see Levy, paragraph [0163], where star object 1025, on the other hand is not an upright rectilinear shape and as such its edges would not qualify as a zone border; as such, these objects would simply be classified as content (specifically, graphic objects) that are within one zone or another).
Regarding Claim 2, Levy discloses the method of Claim 1, wherein each of the at least one object comprises at least one of text, an image, a figure, a table, or a graph (see Levy, paragraph [0012], where some embodiments provide a method for analyzing an unstructured document that includes numerous primitive graphic elements, each of which is defined as a single object).
Regarding Claim 13, Levy discloses a system for clustering document objects based on information content, the method comprising:
a document clustering device comprising at least one processor and a memory (see Levy, paragraph [0659], where instructions are executed by one or more computational elements (such as processors)) storing instructions that, when executed by the at least one processor, cause the at least one processor to perform operations comprising:
identifying, by a document clustering device, a plurality of object chunks from at least one document based on semantic context of each of the plurality of object chunks, wherein each of the plurality of object chunks comprise at least one object selected from the at least one document (see Levy, paragraph [0129], where some embodiments also identify when two or more primitive graphic elements or graphic objects (e.g., shapes, images, photographs, bitmaps, etc.) in the document should be grouped as one structural graphic element; for instance, two objects that mostly overlap may be one element that is defined as two shapes or images in the unstructured document);
determining, by the document clustering device, at least one document portion from the at least one document as a base document, based on a plurality of parameters applied to the plurality of object chunks (see Levy, paragraph [0396], where the process receives (at 5705) layout information for a primary zone);
determining, by the document clustering device, a plurality of hierarchies within the base document (see Levy, paragraph [0154], where a DOM, in some embodiments, is a hierarchical representation of a document that includes all the structural elements of the documents); and
categorizing, by the document clustering device, the plurality of object chunks based on the plurality of hierarchies and information in each of the plurality of object chunks (see Levy, paragraph [0163], where star object 1025, on the other hand is not an upright rectilinear shape and as such its edges would not qualify as a zone border; as such, these objects would simply be classified as content (specifically, graphic objects) that are within one zone or another).
Regarding Claim 20, Levy discloses a non-transitory computer-readable medium storing computer-executable instructions for:
identifying, by a document clustering device, a plurality of object chunks from at least one document based on semantic context of each of the plurality of object chunks, wherein each of the plurality of object chunks comprise at least one object selected from the at least one document (see Levy, paragraph [0129], where some embodiments also identify when two or more primitive graphic elements or graphic objects (e.g., shapes, images, photographs, bitmaps, etc.) in the document should be grouped as one structural graphic element; for instance, two objects that mostly overlap may be one element that is defined as two shapes or images in the unstructured document);
determining, by the document clustering device, at least one document portion from the at least one document as a base document, based on a plurality of parameters applied to the plurality of object chunks (see Levy, paragraph [0396], where the process receives (at 5705) layout information for a primary zone);
determining, by the document clustering device, a plurality of hierarchies within the base document (see Levy, paragraph [0154], where a DOM, in some embodiments, is a hierarchical representation of a document that includes all the structural elements of the documents); and
categorizing, by the document clustering device, the plurality of object chunks based on the plurality of hierarchies and information in each of the plurality of object chunks (see Levy, paragraph [0163], where star object 1025, on the other hand is not an upright rectilinear shape and as such its edges would not qualify as a zone border; as such, these objects would simply be classified as content (specifically, graphic objects) that are within one zone or another).
Claim Rejections - 35 USC § 103

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 7-9, 17, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Levy as applied to Claims 1, 2, 13, and 20 above, and further in view of Kojima (US Patent No. 5,604,910 A) and further in view of Liu (PG Pub. No. 2014/0149401 A1).
Regarding Claim 7, Levy does not disclose creating an index for the object chunk based on iterative summarization of the object chunk and extracting information context from the object chunk based on frequency of occurrence of each term in the object chunk and total number of terms in the object chunk.  Kojima discloses creating an index for the object chunk based on iterative summarization of the object chunk (see Kojima, column 31, lines 16-19, where a text is reduced into keywords and character strings analogous thereto, followed by a second state in which each character string thus reduced is checked for any keyword).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine Levy with Kojima for the benefit of searching a text for designated keywords (see Kojima, Abstract).
Levy in view of Kojima does not disclose extracting information context from the object chunk based on frequency of occurrence of each term in the object chunk and total number of terms in the object chunk.  Liu discloses extracting information context from the object chunk based on frequency of occurrence of each term in the object chunk and total number of terms in the object chunk (see Liu, paragraph [0015], where a search query is received which comprises keyword terms and surrounding non-keyword terms having a contextual meaning).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine Levy and Kojima with Liu for the benefit of performing a semantic search using per-document indexes (see Liu, Abstract).
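For illustration only, the Claim 7 limitation reciting "frequency of occurrence of each term in the object chunk and total number of terms in the object chunk" can be read as a relative term frequency. The following minimal Python sketch assumes whitespace tokenization and lowercasing; the function name relative_term_frequencies is hypothetical and is not taken from Liu or from the application.

```python
from collections import Counter


def relative_term_frequencies(chunk_text: str) -> dict:
    """Map each term to (occurrences of the term) / (total terms in the chunk)."""
    # Assumption: terms are whitespace-separated and case-insensitive.
    terms = chunk_text.lower().split()
    total_terms = len(terms)
    if total_terms == 0:
        return {}
    counts = Counter(terms)
    return {term: count / total_terms for term, count in counts.items()}


# Example chunk with four terms, one of which occurs twice.
freqs = relative_term_frequencies("zone border zone content")
```

In this reading, a term that accounts for a larger share of the chunk contributes more to the extracted information context.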
Regarding Claim 8, Levy in view of Kojima and Liu discloses the method of Claim 7, wherein:
Levy does not disclose iterative summarization is performed to reduce a summary of the object chunk to a predefined number of words.  Kojima discloses iterative summarization is performed to reduce a summary of the object chunk to a predefined number of words (see Kojima, column 31, lines 16-19, where a text is reduced into keywords and character strings analogous thereto, followed by a second state in which each character string thus reduced is checked for any keyword).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine Levy with Kojima for the benefit of searching a text for designated keywords (see Kojima, Abstract).
Regarding Claim 9, Levy in view of Kojima and Liu discloses the method of Claim 7, wherein:
Levy does not disclose the object chunk is categorized in a hierarchy from the plurality of hierarchies based on similarity of the index and the information context with the hierarchy.  Liu discloses the object chunk is categorized in a hierarchy from the plurality of hierarchies based on similarity of the index and the information context with the hierarchy (see Liu, paragraph [0029], where the data store 212 is configured to store information used by the per-document index service 210; for instance the data store 212 may store section-specific … dictionaries for use in translating terms in a document section to term identifiers; the data store 212 may store attributes and/or known contexts that are identified for each document, such as for example, category classifications).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine Levy with Liu for the benefit of performing a semantic search using per-document indexes (see Liu, Abstract).
Regarding Claim 17, Levy discloses the system of Claim 13, wherein categorizing an object chunk from the plurality of object chunks comprises:
Levy does not disclose creating an index for the object chunk based on iterative summarization of the object chunk and extracting information context from the object chunk based on frequency of occurrence of each term in the object chunk and total number of terms in the object chunk.  Kojima discloses creating an index for the object chunk based on iterative summarization of the object chunk (see Kojima, column 31, lines 16-19, where a text is reduced into keywords and character strings analogous thereto, followed by a second state in which each character string thus reduced is checked for any keyword).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine Levy with Kojima for the benefit of searching a text for designated keywords (see Kojima, Abstract).
Levy in view of Kojima does not disclose extracting information context from the object chunk based on frequency of occurrence of each term in the object chunk and total number of terms in the object chunk.  Liu discloses extracting information context from the object chunk based on frequency of occurrence of each term in the object chunk and total number of terms in the object chunk (see Liu, paragraph [0015], where a search query is received which comprises keyword terms and surrounding non-keyword terms having a contextual meaning).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine Levy and Kojima with Liu for the benefit of performing a semantic search using per-document indexes (see Liu, Abstract).
Regarding Claim 18, Levy in view of Kojima and Liu discloses the system of Claim 17, wherein:
Levy does not disclose iterative summarization is performed to reduce a summary of the object chunk to a predefined number of words.  Kojima discloses iterative summarization is performed to reduce a summary of the object chunk to a predefined number of words (see Kojima, column 31, lines 16-19, where a text is reduced into keywords and character strings analogous thereto, followed by a second state in which each character string thus reduced is checked for any keyword).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine Levy with Kojima for the benefit of searching a text for designated keywords (see Kojima, Abstract).
Levy in view of Kojima does not disclose the object chunk is categorized in a hierarchy from the plurality of hierarchies based on similarity of the index and the information context with the hierarchy.  Liu discloses the object chunk is categorized in a hierarchy from the plurality of hierarchies based on similarity of the index and the information context with the hierarchy (see Liu, paragraph [0029], where the data store 212 is configured to store information used by the per-document index service 210; for instance the data store 212 may store section-specific … dictionaries for use in translating terms in a document section to term identifiers; the data store 212 may store attributes and/or known contexts that are identified for each document, such as for example, category classifications).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine Levy and Kojima with Liu for the benefit of performing a semantic search using per-document indexes (see Liu, Abstract).
Claims 10, 11, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Levy as applied to Claims 1, 2, 13, and 20 above, and further in view of Liu.
Regarding Claim 10, Levy discloses the method of Claim 1, further comprising:
Levy does not disclose receiving a user query, wherein the user query comprises at least one of textual query and a vocal query.  Liu discloses receiving a user query, wherein the user query comprises at least one of textual query and a vocal query (see Liu, paragraph [0015], where a search query is received which comprises keyword terms and surrounding non-keyword terms having a contextual meaning).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine Levy with Liu for the benefit of performing a semantic search using per-document indexes (see Liu, Abstract).
Regarding Claim 11, Levy in view of Liu discloses the method of Claim 10, further comprising:
Levy does not disclose:
extracting keywords from the user query to determine a context of the user query;
comparing the extracted keywords with each hierarchy in the plurality of hierarchies to identify a hierarchy matching the extracted keywords;
retrieving at least one object chunk from a set of chunks categorized within the matching hierarchy; and
presenting the at least one object chunk to a user generating the user query.
Liu discloses:
extracting keywords from the user query to determine a context of the user query (see Liu, paragraph [0015], where a search query is received which comprises keyword terms and surrounding non-keyword terms having a contextual meaning);
comparing the extracted keywords with each hierarchy in the plurality of hierarchies to identify a hierarchy matching the extracted keywords (see Liu, paragraph [0015], where a search query is received which comprises keyword terms and surrounding non-keyword terms having a contextual meaning);
retrieving at least one object chunk from a set of chunks categorized within the matching hierarchy (see Liu, paragraph [0015], where a search query is received which comprises keyword terms and surrounding non-keyword terms having a contextual meaning); and
presenting the at least one object chunk to a user generating the user query (see Liu, paragraph [0015], where a search query is received which comprises keyword terms and surrounding non-keyword terms having a contextual meaning).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine Levy with Liu for the benefit of performing a semantic search using per-document indexes (see Liu, Abstract).
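For illustration only, the four steps recited in Claim 11 (extracting keywords, comparing them with each hierarchy, retrieving chunks categorized within the matching hierarchy, and presenting them) can be sketched as follows. This is a minimal sketch under stated assumptions: the stopword list, the representation of each hierarchy as a set of label terms, the overlap-count matching rule, and all function and variable names are hypothetical and are not taken from Liu, Levy, or the application.

```python
import re

# Hypothetical stopword list used to separate keywords from non-keyword terms.
STOPWORDS = {"a", "an", "the", "of", "for", "in", "on", "to", "what", "is", "are"}


def extract_keywords(query: str) -> set:
    """Lowercase the query, split it into word tokens, and drop stopwords."""
    tokens = re.findall(r"[a-z0-9]+", query.lower())
    return {token for token in tokens if token not in STOPWORDS}


def match_hierarchy(keywords: set, hierarchies: dict):
    """Return the name of the hierarchy sharing the most terms with the keywords, or None."""
    best_name, best_overlap = None, 0
    for name, label_terms in hierarchies.items():
        overlap = len(keywords & label_terms)
        if overlap > best_overlap:
            best_name, best_overlap = name, overlap
    return best_name


def retrieve_chunks(query: str, hierarchies: dict, chunks_by_hierarchy: dict) -> list:
    """Return the chunks categorized under the hierarchy that best matches the query."""
    name = match_hierarchy(extract_keywords(query), hierarchies)
    if name is None:
        return []
    return chunks_by_hierarchy.get(name, [])


# Hypothetical hierarchies and categorized chunks.
hierarchies = {
    "installation": {"install", "setup", "requirements"},
    "troubleshooting": {"error", "failure", "reset"},
}
chunks_by_hierarchy = {
    "installation": ["chunk-1", "chunk-4"],
    "troubleshooting": ["chunk-2"],
}
result = retrieve_chunks("What is the setup for install?", hierarchies, chunks_by_hierarchy)
```

The returned list stands in for the chunks that would be presented to the user who generated the query.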
Regarding Claim 19, Levy in view of Liu discloses the system of Claim 13, wherein the operations further comprise:
Levy does not disclose:
extracting keywords from the user query to determine a context of the user query;
comparing the extracted keywords with each hierarchy in the plurality of hierarchies to identify a hierarchy matching the extracted keywords;
retrieving at least one object chunk from a set of chunks categorized within the matching hierarchy; and
presenting the at least one object chunk to a user generating the user query.
Liu discloses:
extracting keywords from the user query to determine a context of the user query (see Liu, paragraph [0015], where a search query is received which comprises keyword terms and surrounding non-keyword terms having a contextual meaning);
comparing the extracted keywords with each hierarchy in the plurality of hierarchies to identify a hierarchy matching the extracted keywords (see Liu, paragraph [0015], where a search query is received which comprises keyword terms and surrounding non-keyword terms having a contextual meaning);
retrieving at least one object chunk from a set of chunks categorized within the matching hierarchy (see Liu, paragraph [0015], where a search query is received which comprises keyword terms and surrounding non-keyword terms having a contextual meaning); and
presenting the at least one object chunk to a user generating the user query (see Liu, paragraph [0015], where a search query is received which comprises keyword terms and surrounding non-keyword terms having a contextual meaning).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine Levy with Liu for the benefit of performing a semantic search using per-document indexes (see Liu, Abstract).
Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Levy and Liu as applied to Claim 11 above, and further in view of Donneau-Golencer (PG Pub. No. 2015/0046435 A1).
Regarding Claim 12, Levy in view of Liu discloses the method of Claim 11, wherein:
Levy does not disclose the at least one object chunk is retrieved based on history associated with the user.  Donneau-Golencer discloses the at least one object chunk is retrieved based on history associated with the user (see Donneau-Golencer, paragraph [0009], where the method may include conducting at least one of … query history analysis of the at least one search term using the user-specific profile).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine Levy with Donneau-Golencer for the benefit of utilizing a personalized user model to develop search requests (see Donneau-Golencer, Abstract).
Allowable Subject Matter
Claims 3-6 and 14-16 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
The following is a statement of reasons for the indication of allowable subject matter:
Regarding Claims 3, 4, 14, and 15, the prior art made of record does not teach, disclose, or fairly suggest:
summarizing a paragraph within a document from the at least one document;
iteratively adding at least one sentence to the paragraph;
iteratively computing a summary quotient based on length of sentences within the paragraph and length of the at least one first sentence added in a current iteration;
iteratively comparing the summary quotient with a predefined threshold; and
demarcating the object chunk in a current iteration, when the summary quotient in the current iteration exceeds the predefined threshold, wherein the demarcated object chunk excludes the at least one sentence added in the current iteration.
The prior art regarding determination of block boundaries does not contemplate determining block boundaries for document sections based on iterative summarization.
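For illustration only, the control flow of the limitations quoted above (add a sentence, compute a quotient, compare it with a predefined threshold, and demarcate the chunk so that it excludes the sentence added in the current iteration) can be sketched as follows. The quoted limitations do not give a formula for the summary quotient, so the formula used here (word count of the added sentence divided by the mean word count of the sentences already in the chunk) is purely an assumption, and the summarization step itself is omitted. The function name demarcate_chunk is hypothetical.

```python
def demarcate_chunk(sentences: list, threshold: float) -> list:
    """Grow a chunk sentence by sentence until the assumed quotient exceeds the threshold."""
    # Ignore empty sentences so the mean length below is never zero.
    sentences = [s for s in sentences if s.split()]
    if not sentences:
        return []
    chunk = [sentences[0]]
    for candidate in sentences[1:]:
        # Assumed quotient: length of the added sentence relative to the
        # mean length of the sentences already within the chunk.
        mean_length = sum(len(s.split()) for s in chunk) / len(chunk)
        quotient = len(candidate.split()) / mean_length
        if quotient > threshold:
            # Demarcate here; the sentence added in this iteration is excluded.
            break
        chunk.append(candidate)
    return chunk


# Hypothetical input: two four-word sentences followed by a twelve-word sentence.
sentences = [
    "one two three four",
    "five six seven eight",
    "a b c d e f g h i j k l",
]
chunk = demarcate_chunk(sentences, threshold=2.0)
```

With these inputs the third sentence yields an assumed quotient of 3.0, which exceeds the threshold of 2.0, so the chunk is demarcated after the second sentence.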
Regarding Claims 5, 6, and 16, the prior art made of record does not teach, disclose, or fairly suggest:
determining the plurality of parameters for each document portion in a plurality of document portions within the at least one document, wherein the plurality of document portions comprise the at least one document portion;
computing, for each document portion, a weighted sum of the plurality of parameters in response to determining the plurality of parameters for each document portion; and
selecting the at least one document portion as the base document in response to computing the weighted sum for each document portion, wherein the at least one document portion comprises the highest weighted sum;
wherein the plurality of parameters comprises at least one of: number of object chunks in each document portion, number of object chunks in each document portion that are common with remaining document portions in the plurality of document portions, number of object chunks in each document portion that overlap with one or more of the remaining document portions, or number of documents from the at least one document that each document portion overlaps.
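For illustration only, the selection of a base document by highest weighted sum, as recited in the limitations above, can be sketched as follows. The four fields mirror the four parameters listed in the claim language; the weight values, the class and function names, and the sample data are hypothetical and are not taken from the application.

```python
from dataclasses import dataclass


@dataclass
class PortionParameters:
    name: str
    chunk_count: int                # number of object chunks in the portion
    common_chunk_count: int         # chunks in common with the remaining portions
    overlapping_chunk_count: int    # chunks overlapping one or more remaining portions
    overlapped_document_count: int  # number of documents the portion overlaps


# Hypothetical integer weights, one per parameter.
WEIGHTS = (4, 3, 2, 1)


def weighted_sum(portion: PortionParameters, weights=WEIGHTS) -> int:
    """Compute the weighted sum of the four parameters for one document portion."""
    values = (
        portion.chunk_count,
        portion.common_chunk_count,
        portion.overlapping_chunk_count,
        portion.overlapped_document_count,
    )
    return sum(w * v for w, v in zip(weights, values))


def select_base_portion(portions: list, weights=WEIGHTS) -> PortionParameters:
    """Select the document portion with the highest weighted sum as the base document."""
    return max(portions, key=lambda p: weighted_sum(p, weights))


# Hypothetical document portions.
portions = [
    PortionParameters("section A", 10, 2, 1, 1),
    PortionParameters("section B", 6, 5, 4, 3),
]
base = select_base_portion(portions)
```

Under these assumed weights, section A scores 49 and section B scores 50, so section B would be selected as the base document even though section A contains more object chunks.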
Boguraev (PG Pub. No. 2009/0276378 A1) is directed toward identifying document structure and associated meta-information.  However, Boguraev does not disclose designating a base document in response to computing the weighted sum for each document portion, wherein the at least one document portion comprises the highest weighted sum, wherein the weighted sum is a plurality of parameters comprising at least one of: number of object chunks in each document portion, number of object chunks in each document portion that are common with remaining document portions in the plurality of document portions, number of object chunks in each document portion that overlap with one or more of the remaining document portions, or number of documents from the at least one document that each document portion overlaps.
Koutrika (PG Pub. No. 2016/0299891 A1) is directed to, inter alia, determining a set of base document segments.  However, Koutrika does not disclose designating a base document in response to computing the weighted sum for each document portion, wherein the at least one document portion comprises the highest weighted sum.
Conclusion
The prior art made of record but not relied upon is considered pertinent to the Applicant’s disclosure:
Boguraev (PG Pub. No. 2009/0276378 A1), which concerns identifying a document structure and associated meta-information.
Koutrika (PG Pub. No. 2016/0299891 A1), which concerns matching of an input document to documents in a document collection.
Boone (PG Pub. No. 2013/0191737 A1), which concerns creating a new electronic document from existing electronic documents.
 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to FARHAD AGHARAHIMI whose telephone number is (571)272-9864.  The examiner can normally be reached on M-F 9am - 5pm ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Apu Mofiz can be reached on 571-272-4080.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from 

/FARHAD AGHARAHIMI/Examiner, Art Unit 2161         

/APU M MOFIZ/Supervisory Patent Examiner, Art Unit 2161