DETAILED ACTION
This communication is in response to the amendment filed on 6/11/21 in which claims 1, 3, 7, 11, 19, and 20 were amended. Claims 1-20 are pending.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments
With respect to claim 1, applicant argues:
First, Leal does not "automatically identify chunks and their semantic roles." Semantic roles are also recited in dependent claim 11, which was also rejected under §102 as anticipated by Leal. In the rejections of claims 1 and 11, the Office Action points to Leal paragraphs 72-84 and 121. However, in these paragraphs, Leal's "chunks" are small pieces of text: individual words or phrases. Leal ranks these words based on the statistical frequency of the occurrence of the word within a document (e.g., co-occurrence). This is just a counting exercise. It is statistical in nature without any notion of identifying the semantic role of the word, as recited in claim 1. In fact, the same word or phrase may play different semantic roles in different documents or even within the same document, but that would not affect Leal's statistical frequency of the occurrence of the word. 
In addition, although Leal may use "semantic" in the phrases "semantic word models", "semantic topic models" and "semantic document models", these are just the statistical models described above, at the word level (occurrence of words), at the topic level (occurrence of topics) and at the document level (occurrence of sentences). There still is no notion of identifying the "semantic roles" of text. Leal paragraph 125. Thus, Leal does not meet this limitation of claim 1.
Applicant’s arguments have been considered. However, Leal is now remapped in combination with new references to teach the argued features. Accordingly, the arguments are moot in view of the additional references used in the current rejection.

Second, Leal also does not annotate documents with the "locations of identified chunks and the semantic roles of the identified chunks." Leal does his statistical analysis of words in a document in order to come up with a set of "tags" for characterizing the document. He may annotate the document with this set of tags, but he does not annotate the document with the locations of the words he has counted (because he only cares about the count) or their semantic roles. The occurrences of different words in the document are counted and this is used to generate tags, but the locations of the words are not needed or recorded as part of the tag. Leal also does not record the semantic roles of the words, since he does not determine the semantic roles to begin with.
Third, Leal also does not identify chunks and their semantic roles within individual documents in the document set "(b) based on patterns of content, layout and contexts across the documents in the document set". Leal counts words by analyzing one document at a time, and not based on patterns across multiple documents. 
Applicant’s arguments have been considered. However, Leal is now remapped in combination with new references to teach the argued features. Accordingly, the arguments are moot in view of the additional references used in the current rejection.
Applicant further argues:
Finally, claim 3 also recites semantic roles and that claim was rejected in further view of Musgrove. However, Musgrove also does not identify the semantic roles of chunks in documents. In Musgrove, there are two taxonomies and documents are classified under the two taxonomies. Musgrove tries to link the two taxonomies to each other, for example figuring out whether node X in taxonomy 1 corresponds to node Y in taxonomy 2. This is not related to identifying semantic roles of chunks in a document. Musgrove interlinks the two taxonomies by comparing the names of the nodes and by correlating the text in the documents classified under the nodes. If the comparison reveals high similarity, then the nodes have higher chance of being counterparts. But there is no notion of chunks and their semantic roles, or annotating documents with that information in Musgrove.
During the interview, Applicant agreed to re-review Musgrove for any disclosure of identifying semantic roles, for example as might be used to determine the equivalence of the two taxonomies. Applicant has done so and did not find any such disclosures. The most relevant components in Musgrove appear to be the "semantic resemblance module 40" and the "word usage pattern module 50", as described in Musgrove:
The semantic resemblance module 40 analyzes the names of the nodes, and the words of the electronic documents classified under these nodes, to provide information as to the strength, or weakness, of the correlation between the nodes and/or documents so that nodes having strong correlation can be identified and interlinked together. The semantic analysis information as determined by the semantic resemblance module 40 is preferably quantified, for example, as a semantic resemblance score. 
This appears to be the same type of word-counting statistical analysis that Leal performs, followed by correlating the occurrences of words in different documents. Musgrove's "semantic resemblance score" may be similar to Leal's "semantic word model." More importantly, neither of these is a semantic role, as recited in claim 1. 
Musgrove paragraph 36 goes on to describe that his semantic resemblance module 40 and his word usage pattern module 50 may extract certain semantic features. These features are more sophisticated methods for distinguishing different occurrences of the same word, so that a more sophisticated counting may be performed. For example, the base model in Leal and Musgrove may count the number of occurrences of the word "white". Leal may generate tags and Musgrove may correlate different documents based on the number or frequency of occurrence of the word "white". Musgrove may make a more sophisticated analysis to determine that "white" "red" "yellow" are a class of related words (colors) and may count occurrences of that class, or Musgrove may determine that "red, white and blue" and "US flag" are related concepts and count occurrences of that concept, or Musgrove may distinguish "white sand beach" from "Mrs. White" because the latter is a proper name and make separate counts for those two versions of "white", or Musgrove may determine that "white" is an adjective in "white paint" but a noun in "egg white" and make separate counts for those versions. However, none of these determines that "Mrs. White" plays the semantic role of tenant in a real estate lease (or any other semantic role), or that "egg white" plays the semantic role of ingredient in a recipe.
Thus, after reviewing Musgrove in more detail, Applicant believes that Musgrove does not "automatically identify chunks and their semantic roles," as recited in amended claim 1. Musgrove also does not remedy any of the other deficiencies in Leal. 
Therefore, Applicant respectfully submits that amended claim 1 is patentable over the cited references.
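Applicant’s arguments have been considered but are not persuasive. Musgrove, paragraph 36, teaches extracting and comparing a vector of semantic features that includes relations of nouns to verbs as variously an actor, object, instrument, or other semantic role. Accordingly, Musgrove discloses identifying counterpart chunks in different documents, wherein the chunks play a same semantic role, as claimed.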

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or non-obviousness.
Claims 1, 6, 7, 10-12 and 15-17 are rejected under 35 U.S.C. 103 as being unpatentable over Leal (US 2018/0300315 A1; published Oct. 18, 2018), Kambhatla (US 2008/0319735 A1; published Dec. 25, 2008), and Cardie (US 2013/0024183 A1; published Jan. 24, 2013).

Regarding claim 1, Leal discloses [a] method implemented on a computer system executing instructions for analyzing and annotating documents, the method comprising: 
accessing a document set that contains a plurality of documents; (Leal, paragraph 74)
automatically identifying chunks [and their semantic roles] within individual documents in the document set (a) based on the content, layout and contexts in the individual document; (Leal, paragraphs 72-84, teaches performing tokenization, chunking, and contextual model generation on the documents, and analyzing lexical patterns in the documents that are statistically relevant to the document text (a lexical pattern comprises a linguistic expression including tokens as well as formatting and morphological variations)) and (b) based on patterns of content, layout and contexts across the documents in the document set; and (Leal, paragraphs 72-84, teaches utilizing latent semantic indexing to identify a statistically relevant pattern, and receiving a set of natural language texts and generating one or more relationship patterns between word forms within the set of texts)
annotating documents in the document set based on analysis of the identified chunks from documents within the document set, [wherein the annotations include locations of identified chunks and the semantic roles of the identified chunks] (Leal, paragraph 121, teaches preprocessing a corpus of documents into a structure having various fields representing aspects of the document; the fields include one or more tags associated with the document). 
Although Leal teaches identifying lexical patterns in a document, Leal does not specifically disclose automatically identifying the semantic roles of the chunks based on patterns of content, layout and contexts, and annotating chunks with their locations and identified semantic roles. However, Kambhatla, paragraphs 55, 56, 62, 68, 70, and 76, teaches training a semantic role labeling (SRL) model using lexical, syntactic, and semantic analysis of words, phrases, sentences, and texts on a corpus of training data, and using the SRL model to classify and label constituents of an input sentence. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Leal to incorporate the teachings of Kambhatla to identify and annotate the semantic roles of words, phrases, or sentences. Doing so would enable understanding the grammatical relations between the arguments of identified natural language predicates and resolving syntactic ambiguity (Kambhatla, paragraph 6).
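Although Leal, paragraph 121, teaches tagging the received document, Leal does not specifically disclose annotating chunks with their locations and identified semantic roles. However, Cardie, paragraph 23, teaches identifying an opinion frame in a document, the opinion frame storing data indicating the position of that particular component of the opinion expression within the document or text being analyzed. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Leal to incorporate the teachings of Cardie to include the semantic role and position information in the tags. Doing so would enable pinpointing the relative position within a document where a particular component of an opinion expression is located (Cardie, paragraph 23).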
Claims 19 and 20 are apparatus and CRM claims corresponding to claim 1 and are similarly rejected.

Regarding claim 6, Leal, in view of Kambhatla and Cardie, discloses the invention of claim 1 as discussed above. Leal further discloses annotating some of the identified chunks with metadata describing the chunk, wherein identifying counterpart chunks in different documents is based on similarity of the metadata (Leal, paragraph 19, teaches comparing tags between documents; Leal, paragraph 22, teaches finding correlated tags; Leal, paragraph 116, teaches utilizing tags to match expressions between documents).

Regarding claim 7, Leal, in view of Kambhatla and Cardie, discloses the invention of claim 1 as discussed above. Leal further discloses wherein identifying chunks based on patterns across the documents in the document set comprises: 
identifying, in an individual document, a chunk that is commonly occurring in the documents of the document set but does not occur in the individual document (Leal, paragraph 77, teaches performing latent semantic indexing to identify statistically relevant lexical patterns across the document set (latent semantic indexing involves analyzing relationships between a set of documents and the terms they contain)).

Regarding claim 10, Leal, in view of Kambhatla and Cardie, discloses the invention of claim 1 as discussed above. Leal further discloses wherein some of the identified chunks contain content that is descriptive of semantic roles played by other chunks (Leal, paragraph 77, teaches that the identified lexical patterns in the documents comprise linguistic expressions, including combinations of verbs, adjectives, nouns and morphological variations (interpreted as the claimed semantic roles)).

Regarding claim 11, Leal, in view of Kambhatla and Cardie, discloses the invention of claim 1 as discussed above. Although Leal, paragraph 121, teaches tagging the received document, Leal does not specifically disclose wherein the annotations further comprise datatypes of the identified chunks. However, Cardie, paragraph 23, teaches identifying an opinion frame in a document, the opinion frame storing data indicating the position of that particular component of the opinion expression within the document or text being analyzed. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Leal to incorporate the teachings of Cardie to include the semantic role and position information in the tags. Doing so would enable pinpointing the relative position within a document where a particular component of an opinion expression is located (Cardie, paragraph 23). 

Regarding claim 12, Leal, in view of Kambhatla and Cardie, discloses the invention of claim 1 as discussed above. Leal further discloses wherein identifying chunks based on layout comprises:
grouping line-oriented text into structural chunks, wherein the grouping is based on word shapes, first and last tokens, formatting characteristics, and/or punctuation (Leal, paragraph 77, teaches that the identified lexical patterns are based on identifying linguistic expressions including tokens such as verbs, adjectives, nouns, adverbs, and combinations of these, as well as formatting (bold, caps, etc.) and morphological (verb tenses, plural and singulars, etc.) variations).

Regarding claim 15, Leal, in view of Kambhatla and Cardie, discloses the invention of claim 1 as discussed above. Leal further discloses wherein identifying chunks based on layout comprises: 
identifying structural chunks based on layout of non-text structural features, wherein the non-text structural features comprise at least one of a figure, a table, a sidebar, a footnote, and a page header or footer (Leal, paragraphs 37, 38, teaches removing tables or images appearing within a document).

Regarding claim 16, Leal, in view of Kambhatla and Cardie, discloses the invention of claim 1 as discussed above. Leal further discloses wherein identifying chunks based on content comprises: identifying chunks using AI techniques for topic estimation (Leal, paragraph 17, teaches a topic modeling approach to detect lexico-statistic patterns of abstract topics; Leal, paragraph 47, further teaches a semantic topic model created using machine learning algorithms).

Regarding claim 17, Leal, in view of Kambhatla and Cardie, discloses the invention of claim 1 as discussed above. Leal further discloses wherein identifying chunks based on content comprises: using few-shot Named Entity recognition techniques to identify chunks within the set of documents (Leal, FIG. 4, paragraphs 81, 87, 90, teaches a dictionary approach to identify new tags for a document).

Claims 2-5 are rejected under 35 U.S.C. 103 as being unpatentable over Leal, in view of Kambhatla and Cardie, as applied to claim 1 above, and further in view of Musgrove (U.S. Pub. No. 2006/0235870) (“Musgrove”).

Regarding claim 2, Leal, in view of Kambhatla and Cardie, discloses the invention of claim 1 as discussed above. Leal, paragraph 74, teaches receiving a set of documents. Yet, Leal does not disclose assembling the document set by clustering documents into the document set based on similarity of content and/or layout. However, Musgrove, paragraph 39, teaches a taxonomy interlinking system that includes a clustering module used to group, i.e., classify, a plurality of documents into clusters based on how they relate to one another, for example, using semantic resemblance. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Leal to incorporate the teachings of Musgrove to cluster the received set of documents. Doing so would enable the … (Musgrove, paragraph 39).

Regarding claim 3, Leal, in view of Kambhatla and Cardie, discloses the invention of claim 1 as discussed above. Leal, paragraphs 72-84, teaches performing tokenization, chunking, and contextual model generation on the documents. Yet, Leal does not disclose wherein automatically identifying chunks within individual documents in the document set is further (c) based on identifying counterpart chunks in different documents in the document set, wherein counterpart chunks play a same semantic role in different documents. However, Musgrove, paragraph 36, teaches a taxonomy interlinking system that allows the node names and the texts of the electronic documents to be analyzed based on how the words are used in context by extracting and comparing a vector of semantic features, including relations of nouns to verbs as variously an actor, object, instrument, or other semantic role, and differentiating a particular word’s pattern of occurrences in the electronic documents. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Leal to incorporate the teachings of Musgrove to analyze the semantic role of the lexical strings in the received set of documents. Doing so would enable analyzing the documents based on the context, rather than merely analyzing the text based on definitions of the words (Musgrove, paragraph 36).

Regarding claim 4, Leal, in view of Kambhatla, Cardie, and Musgrove, discloses the invention of claim 3 as discussed above. Leal, paragraphs 72-84, teaches performing tokenization, chunking, and contextual model generation on the documents. Yet, Leal does not particularly disclose wherein identifying counterpart chunks in different documents comprises: identifying content that is different in different documents but occurs within substantially similar contexts within the different documents. However, Musgrove, paragraph 36, teaches a taxonomy interlinking system that allows the node names and the texts of the electronic documents to be analyzed based on how the words are used in context by extracting and comparing a vector of semantic features, including relations of nouns to verbs as variously an actor, object, instrument, or other semantic role, and differentiating a particular word’s pattern of occurrences in the electronic documents. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Leal to incorporate the teachings of Musgrove to analyze the semantic role of the lexical strings in the received set of documents. Doing so would enable analyzing the documents based on the context, rather than merely analyzing the text based on definitions of the words (Musgrove, paragraph 36).

Regarding claim 5, Leal, in view of Kambhatla, Cardie, and Musgrove, discloses the invention of claim 3 as discussed above. Leal further discloses wherein identifying counterpart chunks in different documents comprises: identifying content that is substantially the same in different documents (Leal, paragraphs 72-84, teaches performing tokenization, chunking, and contextual model generation on the documents, and analyzing lexical patterns in the documents that are statistically relevant to the document text).

Claims 8 and 9 are rejected under 35 U.S.C. 103 as being unpatentable over Leal, Kambhatla, and Cardie, as applied to claim 1 above, and further in view of Carrier et al. (U.S. Pub. No. 2016/0070693) (“Carrier”).

Regarding claim 8, Leal, in view of Kambhatla and Cardie, discloses the invention of claim 1 as discussed above. Leal further discloses wherein the identified chunks comprise: structural chunks that contain content comprising structures within the layout of the documents (Leal, paragraphs 37, 38, teaches identifying tables or images by parsing the formatting content of the document). 
Leal does not disclose wherein the identified chunks comprise: field chunks that contain content within the documents suitable for use as fields in document templates. However, Carrier, paragraphs 8, 9, teaches applying natural language processing to unstructured data within a target form to identify elements of a form structure. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Leal to incorporate the teachings of Carrier to apply natural language processing to the documents to identify form elements. Doing so would enable detecting form criteria without relying upon headers (Carrier, paragraph 7).

Regarding claim 9, Leal, in view of Kambhatla, Cardie, and Carrier, discloses the invention of claim 8 as discussed above. Leal does not disclose wherein some of the field chunks are hierarchical and contain other chunks as sub-chunks. However, Carrier, paragraphs 8, 9, teaches applying natural language processing to unstructured data within a target form to identify elements of a hierarchical form structure. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Leal to incorporate the teachings of Carrier to apply natural language processing to the documents to identify form elements. Doing so would enable detecting form criteria without relying upon headers (Carrier, paragraph 7).

Claims 13 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Leal, Kambhatla, and Cardie, as applied to claim 1 above, and further in view of Dejean (U.S. Pub. No. 2011/0276874) (“Dejean”).

Regarding claim 13, Leal, in view of Kambhatla and Cardie, discloses the invention of claim 1 as discussed above. Leal, paragraphs 37, 38, teaches removing tables or images appearing within a document. Yet, Leal does not disclose wherein identifying chunks based on layout comprises: identifying spatial boundaries of structural chunks using machine learning inference trained on tiles of page images. However, Dejean, paragraph 3, teaches using geometric page analysis to recognize the different elements of a page as they are laid out on a document image (i.e., layout objects) based on exploiting the geometric or layout features. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Leal to incorporate the teachings of Dejean to infer the boundaries of the tables or images in the document. Doing so would enable reducing the amount of noise data present within a received document (Leal, paragraph 37).

Regarding claim 14, Leal, in view of Kambhatla and Cardie, discloses the invention of claim 1 as discussed above. Leal, paragraphs 37, 38, teaches removing tables or images appearing within a document. Yet, Leal does not disclose wherein identifying chunks based on layout comprises: identifying spatial boundaries of structural chunks using artificial intelligence-based visual recognition of geometric patterns of the layout. However, Dejean, paragraph 3, teaches using geometric page analysis to recognize the different elements of a page as they are laid out on a document image (i.e., layout objects) based on exploiting the geometric or layout features. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Leal to incorporate the teachings of Dejean to infer the boundaries of the tables or images in the document. Doing so would enable reducing the amount of noise data present within a received document (Leal, paragraph 37).

Claim 18 is rejected under 35 U.S.C. 103 as being unpatentable over Leal, Kambhatla, and Cardie, as applied to claim 1 above, and further in view of Brugger, R. et al., A DTD … (“Brugger”).

Regarding claim 18, Leal, in view of Kambhatla and Cardie, discloses the invention of claim 1 as discussed above. Leal, paragraphs 72-84, teaches performing tokenization, chunking, and contextual model generation on the documents. Yet, Leal does not disclose receiving user corrections for incorrectly identified chunks; and improving the step of automatically identifying chunks in response to the user corrections. However, Brugger, section 3 (page 4), teaches a machine learning approach to building a document model where the model is generated interactively by the user such that if the recognition on a new document fails, corrections are manually performed and the corrected tree is passed to the learning algorithm. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Leal to incorporate the teachings of Brugger to update the model generation based on user corrections. Doing so would enable the user to update the model interactively (Brugger). 

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHAHID K KHAN whose telephone number is (571)270-0419.  The examiner can normally be reached on M-F, 9-5 est.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Stephen Hong can be reached on (571)272-4124.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/SHAHID K KHAN/Examiner, Art Unit 2178