DETAILED ACTION

This action is responsive to communications filed on August 5, 2020. This action is made Non-Final.
Claims 1-20 are pending in the case. 
Claims 1, 19, and 20 are independent claims.
Claims 1-20 are rejected.

	
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statements (IDSs) submitted on 12/01/2020 and 01/21/2021 are in compliance with the provisions of 37 C.F.R. 1.97. Accordingly, the IDSs are being considered by the examiner.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-4, 10, 13, 14, 17, 19, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Rajan et al., US Patent Application Publication no. 2012/0005686 (“Rajan”), and further in view of Straus, US Patent no. US 8,209,278 (“Straus”).
Claim 1:
	Rajan teaches or suggests a method implemented on a computer system executing instructions for analyzing and improving documents, the method comprising:
	accessing a document set that contains a plurality of documents, wherein the document also identifies chunks within the individual documents of the document set (see Fig. 1, 3; para. 0007 – classifying portions of documents; para. 0008 - machine-learning model uses machine-learning principles to learn the characteristics (features) of a set of documents that are correlated with a set of annotations in the training data; para. 0015 - pages can be decomposed into a set of functional segments. Once a web page is segmented into functional segments; para. 0025 – which parts of a set of web pages.);
	automatically assigning semantic role labels to a plurality of the chunks, wherein the semantic role labels are descriptive of the semantic roles played by the chunks (see Fig. 1, 3; para. 0008 - automatically assigning a category based on web-based features is a machine-learning model; para. 0015 - automatically takes in each such page segment as input, and assigns a functional label; para. 0028 - functional category as used herein is applied to individual segments of a web page, with different segments within the same web page having potentially different functional categories. Functional labels indicate the role or purpose that various parts of the web page serve;); and 
	automatically assigning semantic role labels to the chunks
(a) comprises using machine learning and/or natural language processing methods to determine semantic roles for chunks (see para. 0008 - automatically assigning a category based on web-based features is a machine-learning model. A machine-learning model uses machine-learning principles to learn the characteristics (features) of a set of documents that are correlated with a set of annotations in the training data; para. 0015 – a machine-learned solution may be employed to assign functional classification labels to each segment. Such a solution automatically takes in each such page segment as input, and assigns a functional label; para. 0026 - a machine-learning mechanism may be used to create classifier modules that recognize the features associated with a particular category label; para. 0032 - machine-learning module for each segment type is generated by analyzing all of the training data segments that have been classified with the same segment type. features that are relevant to the presentation of the segments on the screen are extracted into a feature vector and correlated with the functional category. As a result of running the machine-learning mechanism, a model is built in Step 140 for each functional category type. A classifier takes a segment and a set of segment features as input and outputs a probability that the segment with these features should be classified as the particular type.); and
(b) is also based on chunks in different documents that are identified as playing the same semantic role within their respective documents (see para. 0030 - steps for training a model that will functionally classify segments when machine-learning approaches are used. label web page segments to create training data.; para. 0032 - machine-learning module for each segment type is generated by analyzing all of the 
	using the chunks and their semantic role labels in further processing documents (see Fig. 1, 3; para. 0030 - steps for training a model that will functionally classify segments when machine-learning approaches are used; para. 0032 - a machine-learning module for each segment type is generated by analyzing all of the training data segments that have been classified with the same segment type; para. 0033 - Once the analysis framework has been created, the system is ready to analyze web pages that are requested by an application; para. 0035 - using the trained machine-learning model to determine the best functional label to assign to a particular segment based on the probabilities generated by each of the classifiers when analyzing the particular segment.).
	Rajan does not appear to explicitly disclose that the further processing is of documents in the document set.
	Straus teaches or suggests further processing of documents in the document set (see Fig. 4; col. 2, lines 4-6 - standard practice is for a law firm to choose the most similar versions of the same kinds of documents; col. 3, lines 7-25 - to survey sets of legal documents and determine common patterns in such documents, particularly common textual patterns. System includes recognition functionality so that provisions in the unsurveyed documents can be matched to the most similar common textual patterns determined by the survey process. Additionally, the attorney user can input further 
Accordingly, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the system and method taught in Rajan to include further processing of documents in the document set for the purpose of efficiently expanding knowledge of documents of a kind for processing further documents of the kind, as taught by Straus (col. 3, lines 5-37).
Claims 19 and 20:
Claims 19 and 20 correspond to claim 1, and thus Rajan and Straus teach or suggest the limitations of claims 19 and 20 as well.

Claim 2:
	Rajan does not appear to explicitly disclose wherein the plurality of documents in the document set are all a same document type.
	Straus further teaches or suggests wherein the plurality of documents in the document set are all a same document type (see col. 2, lines 4-6 - standard practice is for a law firm to choose the most similar versions of the same kinds of documents; col. 3, lines 7-25 - to survey sets of legal documents and determine common patterns in such documents, particularly common textual patterns; col. 6, lines 33-36 - computer database for each kind of project containing information on the documents typically contained in such projects; col. 9, lines 6-7 - to further enhance the "knowledge" of its copy of the System in this area.).
Accordingly, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the system and method, taught in Rajan, to include wherein the plurality of documents in the document set are all a same document type for the purpose of efficiently expanding knowledge of documents of a kind for processing further documents of the kind, as taught by Straus (col. 3, lines 5-37).

Claim 3:
	Rajan further teaches or suggests wherein the chunks in the document set comprise: field chunks that contain content within documents suitable for use as fields in document templates, wherein some of the field chunks are hierarchical and contain other sub-chunks; and structural chunks that contain content comprising structures within the layout of the documents (see Fig. 4; para. 0029 - elements and attributes define both content type and presentation layout. initially partitioned into segments, the presentation features in the HTML elements and attributes associated with each segment are extracted and stored as metadata associated with each segment; Table 1 - Location of segment as displayed on the screen, Height and length of text box, User input forms,  Boiler Plate and Advertisement Phrases, element type(s) (site navigation); para. 0033 - segmentation constructs a DOM tree from the web page and works to group each of the nodes in the DOM tree into a segment. Location-based segmentation defines regions of a web page such as top, middle, bottom, left, and right and uses presentation information in HTML tag attributes to group together portions of the web page.).

Claim 4:
	Rajan fails to explicitly disclose wherein the document set contains legal documents; and the semantic roles comprise (a) roles played by parties to the legal documents, and (b) roles played by dates, time periods or other expressions of time. 
	Straus teaches or suggests wherein the document set contains legal documents; and the semantic roles comprise (a) roles played by parties to the legal documents, and (b) roles played by dates, time periods or other expressions of time (see Fig. 3 and 4; col. 3, lines 7-9 - survey sets of legal documents and determine common patterns in such documents, particularly common textual patterns; col. 5, lines 32-38 - work product created or analyzed by the law firm for this matter, as stored on the law firm's computer systems, would constitute a single "Project" for its client PartsCo. If later in the year PartsCo came back to the law firm for help with leasing a manufacturing plant, that lease of the plant 
Purchase from Supplier"; col. 35, lines 58-60 - store their word processing documents and include information as to the date a document is created, the client and project corresponding to the document, search functionality, etc.).
Accordingly, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the system and method, taught in Rajan, to include wherein the document set contains legal documents; and the semantic roles comprise (a) roles played by parties to the legal documents, and (b) roles played by dates, time periods or other expressions of time for the purpose of efficiently expanding knowledge of documents of a kind for processing further documents of the kind, as taught by Straus (col. 3, lines 5-37).

Claim 10:
Rajan teaches or suggests assigning candidate semantic role labels to chunks; standardizing the candidate semantic role labels among chunks in the clusters; and assigning the standardized semantic role labels to chunks (see para. 0017 - annotating the function of each segment of a web page, each individual application can select the segments having the labels that correspond to content that is meaningful; para. 0026 - a machine-learning mechanism may be used to create classifier modules that recognize the features associated with a particular category label. Users annotate web pages with corresponding functional category labels, and these annotations are used as training data to the application; para. 0028 - a functional category as used herein is applied to individual segments of a web page, with different segments within the same web page having potentially different functional  classification function is learned by correlating the feature vectors with their functional category label; para. 0032 - a machine-learning module for each segment type is generated by analyzing all of the training data segments that have been classified with the same segment type.).
	Rajan appears to fail to teach or suggest grouping chunks into clusters based on similarity of roles.
Straus teaches or suggests grouping chunks into clusters based on similarity of roles (see col. 11, lines 27-30 – the purpose of these provision comparisons is to find clusters or groups of similar provisions; This clustering together into sufficiently similar forms then allows the identification of the "Core Provisions."; col. 13, lines 28-31 - refined similarity calculations on a smaller subset identified in the first pass) it is possible to identify groups or clusters of provisions that are identical or substantially identical.).
Accordingly, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the system and method, taught in Rajan, to include grouping chunks into clusters based on similarity of roles for the purpose of efficiently expanding knowledge of documents of a kind for processing further documents of the kind, as taught by Straus (col. 3, lines 5-37).


Claim 13:
	Rajan further teaches or suggests wherein the semantic role labels are chosen from a predetermined set of semantic role labels (see Fig. 2; para. 0008 - machine-learning mechanism analyzes the training data to learn which features in the web page best correlate with the human's choice of annotation; para. 0026 - machine-learning mechanism may be used to create classifier modules that recognize the features associated with a particular category label. Users annotate web pages with corresponding functional category labels, and these annotations are used as training data; para. 0030 - category label such as "main content", "user-generated content", "advertisements", "boiler-plate", "content pointers", etc.).

Claim 14:
	Rajan further teaches or suggests wherein the semantic role labels comprise labels recognized by a software application used for further processing of documents (see para. 0017 - By annotating the function of each segment of a web page, each individual application can select the segments having the labels that correspond to content that is meaningful to the application; para. 0035 - an application determines to process a web page segment if and only if the functional category label assigned to the segment indicates content that is of interest to the application; Claim 1 - application processing only content contained in a first set of segments of said web page, wherein the first application selects the first set of segments based on the functional category assigned to said each segment of the first set of segments; a second application processing only content contained in a second set of segments of said web page, wherein the second application selects the second 
	Straus further teaches or suggests further processing of documents in the document set (see Fig. 4; col. 2, lines 4-6 - standard practice is for a law firm to choose the most similar versions of the same kinds of documents; col. 3, lines 7-25 - to survey sets of legal documents and determine common patterns in such documents, particularly common textual patterns. System includes recognition functionality so that provisions in the unsurveyed documents can be matched to the most similar common textual patterns determined by the survey process. Additionally, the attorney user can input further information into the survey databases which the system, or the attorney user, "learns" during an analysis of an existing document; col. 6, lines 33-36 - computer database for each kind of project containing information on the documents typically contained in such projects; col. 8, lines 60-63 – analyzing the provisions contained in these twelve documents, the System would essentially take a "survey" to populate the provision database corresponding to this kind of document; col. 9, lines 6-7 - to further enhance the "knowledge" of its copy of the System in this area; col. 39, lines 6-34 - If survey functionality is initiated, the user needs to identify the sets of documents to be surveyed (likely imported directly from a law firm's document management system, even if initially acquired from another source) and then the user can provide input as applicable as the survey is running. On completion the information "learned" would be added to the applicable System databases or dictionary files analyze the document to identify Core Provisions most closely matching the document's existing provisions. Add specified information "learned" during the analysis or editing of this document to the applicable survey databases or files.).
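Accordingly, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the system and method, taught in Rajan, to include further processing of documents in the document set for the purpose of efficiently expanding knowledge of documents of a kind for processing further documents of the kind, as taught by Straus (col. 3, lines 5-37).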

Claim 17:
	Rajan further teaches or suggests wherein some of the chunks are multi-paragraph structures in the documents, and such chunks are labeled with semantic role labels for the semantic roles played by those chunks in the documents (see Fig. 2; para. 0015 - a solution automatically takes in each such page segment as input, and assigns a functional label; para. 0019 - main content usually contains the most text that is relevant to the topic of the web page. When users perform a web search, the keywords are intended to match the text in the main content; para. 0028 - a functional category as used herein is applied to individual segments of a web page, with different segments within the same web page having potentially different functional categories. Functional labels indicate the role or purpose that various parts of the web page serve. A web page may have a main content area, but also include a header, navigation links, search textbox, advertisements, user input forms, etc. Each of these components may be identified as to their function; para. 0035 - for using the trained machine-learning model to determine the best functional label to assign to a particular segment based on the probabilities generated by each of the classifiers when analyzing the particular segment).
	
Claims 5, 6, 8, and 9 are rejected under 35 U.S.C. 103 as being unpatentable over Rajan, in view of Straus, and further in view of Sweeney et al., US Patent Application Publication no. US 2007/0136221 (“Sweeney”).
Claim 5:
	As indicated above, Rajan teaches or suggests automatically assigning the semantic role labels to chunks.
	Rajan appears to fail to explicitly disclose automatically extracting some of the semantic role labels from chunks; and assigning the extracted semantic role labels to chunks.
	Sweeney teaches or suggests automatically extracting some of the semantic role labels from chunks; and assigning the extracted semantic role labels to chunks (see para. 0048 - directed to the classification of content residing within Web pages. Alternate embodiments of domain 200 may include document repositories; para. 0067 - labels are abstracted from the respective entities they describe in the knowledge representation model; para. 0068 - label is derived from the unique vocabulary of the source domain. In other words, the labels assigned to each data element are drawn from the language and terms presented in the domain; para. 0292 - label is derived from the unique vocabulary of the source domain. In other words, the labels assigned to each data element are drawn from the language and terms presented in the domain; para. 0446 - labels may be segmented by entity type.).
Accordingly, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the system and method, taught in Rajan, to include automatically extracting some of the semantic role labels from chunks; and assigning the extracted semantic role labels to chunks for the purpose of efficiently 

Claim 6:
	Rajan further teaches or suggests wherein automatically assigning semantic role labels to chunks comprises: 
using machine learning to automatically use semantic role labels from chunks (a) based on the content, layout and contexts of chunks in individual documents (see para. 0017 - annotating the function of each segment of a web page, each individual application can select the segments having the labels that correspond to content that is meaningful; para. 0026 - a machine-learning mechanism may be used to create classifier modules that recognize the features associated with a particular category label. Users annotate web pages with corresponding functional category labels, and these annotations are used as training data to the application; para. 0028 - a functional category as used herein is applied to individual segments of a web page, with different segments within the same web page having potentially different functional categories. Functional labels indicate the role or purpose that various parts of the web page serve; para. 0030 - steps for training a model that will functionally classify segments when machine-learning approaches are used. In Step 110, human editors explicitly label web page segments to create training data; para. 0032 - a machine-learning module for each segment type is generated by analyzing all of the training data segments that have been classified with the same segment type.);
(b) based on patterns of content, layout and contexts of chunks across documents in the document set (see Fig. 2, Table 1; para. 0029 - examples of features that can be ; and 
	(c) based on datatypes of chunks (see Fig. 2, Table 1; para. 0015 - a solution automatically takes in each such page segment as input, and assigns a functional label of the type; para. 0029 - examples of features that can be extracted and later correlated with functional categories. Also included in the table are some heuristics for establishing a correlation between feature metadata and a functional category; para. 0031 - main content 230 is in the middle of the page and describes what Riley's Place is. The Links to Donor's Web Sites, classified as content pointers 240, are in a smaller font on the right margin of the page, and a Boiler-plate 250 on the bottom left; para. 0033 - Location-based segmentation defines regions. Vision-based segmentation breaks the web page down into segments based on organizing HTML tags.); and 
assigning the extracted semantic role labels to chunks (see Fig. 1, 3; para. 0008 - automatically assigning a category based on web-based features is a machine-learning model; para. 0015 - automatically takes in each such page segment as input, and assigns a functional label; para. 0028 - functional category as used herein is applied to individual segments of a web page, with different segments within the same web page having 
Rajan does not appear to explicitly disclose automatically extracting semantic role labels from chunks.
	Sweeney teaches or suggests automatically extracting semantic role labels from chunks (see para. 0048 - directed to the classification of content residing within Web pages. Alternate embodiments of domain 200 may include document repositories; para. 0067 - labels are abstracted from the respective entities they describe in the knowledge representation model; para. 0068 - label is derived from the unique vocabulary of the source domain. In other words, the labels assigned to each data element are drawn from the language and terms presented in the domain; para. 0292 - label is derived from the unique vocabulary of the source domain. In other words, the labels assigned to each data element are drawn from the language and terms presented in the domain; para. 0446 - labels may be segmented by entity type.).
Accordingly, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the system and method, taught in Rajan, to include automatically extracting semantic role labels from chunks for the purpose of efficiently identifying data elements in a knowledge representation model, as taught by Sweeney (para. 0068).

Claim 8:
Rajan further teaches or suggests automatically using candidate semantic role labels from the chunks; using machine learning to refine the candidate semantic role labels; and assigning the used semantic role labels to chunks (see para. 0017 - annotating the function of each segment of a web page, each individual application can select the segments having the labels that correspond to content that is meaningful; para. 0026 - a machine-learning mechanism may be used to create classifier modules that recognize the features associated with a particular category label. Users annotate web pages with corresponding functional category labels, and these annotations are used as training data to the application; para. 0028 - a functional category as used herein is applied to individual segments of a web page, with different segments within the same web page having potentially different functional categories. Functional labels indicate the role or purpose that various parts of the web page serve; para. 0030 - steps for training a model that will functionally classify segments when machine-learning approaches are used. In Step 110, human editors explicitly label web page segments to create training data. classification function is learned by correlating the feature vectors with their functional category label; para. 0032 - a machine-learning module for each segment type is generated by analyzing all of the training data segments that have been classified with the same segment type.).
Sweeney teaches or suggests automatically extract candidate semantic role labels from the chunks; extracted labels (see para. 0048 - directed to the classification of content residing within Web pages. Alternate embodiments of domain 200 may include document repositories; para. 0067 - labels are abstracted from the respective entities they describe in the knowledge representation model; para. 0068 - label is derived from the unique vocabulary of the source domain. In other words, the labels assigned to each data element are drawn from the language and terms presented in the domain; para. 0292 - label is derived from the unique vocabulary of the source domain. In other words, the labels 
Accordingly, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the system and method, taught in Rajan, to include automatically extract candidate semantic role labels from the chunks; extracted labels for the purpose of efficiently identifying data elements in a knowledge representation model, as taught by Sweeney (para. 0068).

Claim 9:
Rajan further teaches or suggests automatically using some of the semantic role labels from chunks based on similarity of content, layout, and/or context of chunks from different documents in the document set; and assigning the used semantic role labels to chunks (see Fig. 1-3, Table 1; para. 0008 - automatically assigning a category based on web-based features is a machine-learning model; para. 0015 - a solution automatically takes in each such page segment as input, and assigns a functional label of the type; para. 0017 - annotating the function of each segment of a web page, each individual application can select the segments having the labels that correspond to content that is meaningful; para. 0026 - a machine-learning mechanism may be used to create classifier modules that recognize the features associated with a particular category label. Users annotate web pages with corresponding functional category labels, and these annotations are used as training data to the application; para. 0028 - a functional category as used herein is applied to individual segments of a web page, with different segments within the same web page having potentially different functional categories. Functional labels indicate the role or 
Sweeney teaches or suggests automatically extract some of the semantic role labels from chunks; extracted labels (see para. 0048 - directed to the classification of content residing within Web pages. Alternate embodiments of domain 200 may include document repositories; para. 0067 - labels are abstracted from the respective entities they describe in the knowledge representation model; para. 0068 - label is derived from the unique vocabulary of the source domain. In other words, the labels assigned to each data element are drawn from the language and terms presented in the domain; para. 0292 - label is derived from the unique vocabulary of the source domain. In other words, the labels assigned to each data element are drawn from the language and terms presented in the domain; para. 0446 - labels may be segmented by entity type.).
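Accordingly, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the system and method, taught in Rajan, to include automatically extract some of the semantic role labels from chunks; extracted labels for the purpose of efficiently identifying data elements in a knowledge representation model, as taught by Sweeney (para. 0068).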

Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Rajan, in view of Straus, and further in view of Mao et al., US Patent Application Publication no. US 2017/0147910 (“Mao”).
Claim 7:
	As indicated above, Rajan teaches or suggests assigning the semantic role labels to chunks and Sweeney teaches or suggests extracting labels and extracted labels.
	Rajan fails to explicitly disclose using auto-encoder machine learning techniques to automatically learn some of the semantic role labels.
	Mao teaches or suggests using auto-encoder machine learning techniques to automatically learn some of the semantic role labels (see para. 0040 - adopted auto-encoders with attribute representations to learn new class labels; para. 0047 - It encodes both the syntactic and semantic meaning of the words; para. 0066 - motivated by the tied weights strategy in auto-encoders for unsupervised learning tasks; Claim 1 - word embedding component that encodes meaning of a word from a caption.).
Accordingly, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the system and method, taught in Rajan, to include using auto-encoder machine learning techniques to automatically learn some of the semantic role labels for the purpose of efficiently learning useful encodings of data, as taught by Mao (para. 0040).

Claims 11, 12, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Rajan, in view of Straus, and further in view of Syed et al., US Patent Application Publication no. US 2015/0139610 (“Syed”).
Claim 11:
Rajan teaches or suggests assigning semantic role labels to chunks; grouping based on similarity of size; standardizing the semantic role labels assigning the semantic role labels to chunks (see Fig. 1-3, Table 1; para. 0008 - automatically assigning a category based on web-based features is a machine-learning model; para. 0015 - a solution automatically takes in each such page segment as input, and assigns a functional label of the type; para. 0017 - annotating the function of each segment of a web page, each individual application can select the segments having the labels that correspond to content that is meaningful; para. 0026 - a machine-learning mechanism may be used to create classifier modules that recognize the features associated with a particular category label. Users annotate web pages with corresponding functional category labels, and these annotations are used as training data to the application; para. 0028 - a functional category as used herein is applied to individual segments of a web page, with different segments within the same web page having potentially different functional categories. Functional labels indicate the role or purpose that various parts of the web page serve; para. 0029 - examples of features that can be extracted and later correlated with functional categories. Also included in the table are some heuristics for establishing a correlation between feature metadata and a 
Straus further teaches or suggests grouping chunks into chunk clusters based on text embedding of the chunks; based on the chunk clusters (see col. 11, lines 27-30 – the purpose of these provision comparisons is to find clusters or groups of similar provisions; This clustering together into sufficiently similar forms then allows the identification of the "Core Provisions."; col. 13, lines 28-31 - refined similarity calculations on a smaller subset identified in the first pass) it is possible to identify groups or clusters of provisions that are identical or substantially identical.).
Accordingly, it would have been obvious to one having ordinary skill before the effective filing date of the claimed invention to modify the system and method, taught in Rajan, to include grouping chunks into chunk clusters based on text embedding of the chunks; based on the chunk clusters for the purpose of efficiently expanding knowledge of documents of a kind for processing further documents of the kind, as taught by Straus (col. 3, lines 5-37).
Syed further teaches or suggests grouping candidate semantic role labels into label clusters based on similarity of text embedding of the candidate semantic role labels; based on the label clusters; standardizing semantic role labels (see Fig. 6-9; para. 0065 - there is a high concentration of labels. may also cluster these labels into one parent label. Clustering may be based on various factors … and similarity of content of the labels at issue and properties; para. 0087 - selected for display to the user based on one or more of filtering, clustering and/or prioritization; para. 0093 - automated label integration/filtering processing may involve, among other things, a weighted integration of labels via clustering and spam filtering; para. 0108 - similar labels are merged into a single label using clustering; para. 0116 - labels may be merged into one or more parent labels via clustering. Clustering may be based on the time offset of the labels, similarity of content, and properties of the underlying scenes; para. 0129 - clustering of labels is performed. In one embodiment, the goal of clustering is to group similar labels; para. 0133 - labels in a cluster are merged to obtain a single representative label from each of the selected clusters.).
Accordingly, it would have been obvious to one having ordinary skill before the effective filing date of the claimed invention to modify the system and method, taught in Rajan, to include grouping candidate semantic role labels into label clusters based on similarity of text embedding of the candidate semantic role labels; based on the label clusters; standardizing semantic role labels for the purpose of efficiently merging labels to obtain a single representative label, as taught by Syed (0133).

Claim 12:
Rajan teaches or suggests assigning candidate semantic role labels to chunks that comprise sections of documents, wherein the semantic role labels are based on headings of the sections; standardizing the candidate semantic role labels; assigning the standardized semantic role labels to the chunks (see Fig. 1-3, Table 1; para. 0008 - automatically assigning a category based on web-based features is a machine-learning model; para. 0015 - a solution automatically takes in each such page segment as input, and assigns a functional label of the type; para. 0017 - annotating the function of each segment of a web page, each individual application can select the segments having the labels that correspond to content that is meaningful; para. 0026 - a machine-learning mechanism may be used to create classifier modules that recognize the features associated with a particular category label. Users annotate web pages with corresponding functional category labels, and these annotations are used as training data to the application; para. 0028 - a functional category as used herein is applied to individual segments of a web page, with different segments within the same web page having potentially different functional categories. Functional labels indicate the role or purpose that various parts of the web page serve. may have a main content area, but also include a header; para. 0029 - examples of features that can be extracted and later correlated with functional categories. Also included in the table are some heuristics for establishing a correlation between feature metadata and a functional category; para. 0030 - steps for training a model that will functionally classify segments when machine-learning approaches are used. In Step 110, human editors explicitly label web page segments to create training data; para. 0031 - main content 230 is in the middle of the page and describes what Riley's Place is. The Links to Donor's Web Sites, classified as 
Straus further teaches or suggests grouping chunks into clusters based on similarity of the content in the sections (see col. 11, lines 27-30 – the purpose of these provision comparisons is to find clusters or groups of similar provisions; This clustering together into sufficiently similar forms then allows the identification of the "Core Provisions."; col. 13, lines 28-31 - refined similarity calculations on a smaller subset identified in the first pass) it is possible to identify groups or clusters of provisions that are identical or substantially identical.).
Accordingly, it would have been obvious to one having ordinary skill before the effective filing date of the claimed invention to modify the system and method, taught in Rajan, to include grouping chunks into clusters based on similarity of the content in the sections for the purpose of efficiently expanding knowledge of documents of a kind for processing further documents of the kind, as taught by Straus (col. 3, lines 5-37).
	Syed teaches or suggests standardizing the candidate semantic role labels by selecting the most common candidate semantic role label as the semantic role label for all chunks in a cluster; assigning the standardized semantic role labels (see Fig. 6-9; para. 0065 - there is a high concentration of labels. may also cluster these labels into one parent label. Clustering may be based on various factors … and similarity of content of the labels at issue 
Accordingly, it would have been obvious to one having ordinary skill before the effective filing date of the claimed invention to modify the system and method, taught in Rajan, to include standardizing the candidate semantic role labels by selecting the most common candidate semantic role label as the semantic role label for all chunks in a cluster; assigning the standardized semantic role labels for the purpose of efficiently merging labels to obtain a single representative label, as taught by Syed (0133).

Claim 18:
	As indicated above, Rajan teaches or suggests the automatically assigned semantic role labels.
Syed further teaches estimating a confidence level for the automatically assigned semantic role labels; based on the estimated confidence level, presenting some assignments to a user for confirmation; receiving user feedback for the automatically assigned semantic role labels; and improving the machine learning and/or natural language processing methods in response to the user feedback (see para. 0074 - allows the user to add labels to the video or provide an approval/disapproval vote with respect to an existing label; para. 0080 - One or more users are given privileges as editors or super-users and are enabled to allow or disallow labels to become part of the ToC; para. 0081 - label integration uses cues, including, but not limited to, semantic distance between labels, time alignment between labels, number of approval votes for labels, reputations of users providing labels, similarity of label with underlying video content, to cluster together similar labels; para. 0082 - manual and automatic label integration: This integration process uses automatic label integration to generate the ToC but also allows users with privileged status to add, remove or modify the labels in the ToC; para. 0117 - weights are calculated based on the current vote counts (e.g., the net of up votes, down votes, likes, dislikes). In one embodiment, other criteria might be used for label weight calculation including, weighting of labels based on user's reputation, similarity of the label and the underlying video content, sentiment of the label.).
Accordingly, it would have been obvious to one having ordinary skill before the effective filing date of the claimed invention to modify the system and method, taught in Rajan, to include estimating a confidence level for the automatically assigned semantic role labels; based on the estimated confidence level, presenting some assignments to a user for confirmation; receiving user feedback for the automatically assigned semantic role labels; and improving the machine learning and/or natural language processing methods in response to the user feedback for the purpose of efficiently merging labels to obtain a single representative label, as taught by Syed (0133).

Claims 15 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Rajan, in view of Straus, and further in view of Joshi et al., US Patent no. US 10,489,441 (“Joshi”).
Claim 15:
Rajan teaches or suggests (a) using machine learning to determine semantic roles for chunks based on other chunks (see Fig. 1-3, Table 1; para. 0008 - automatically assigning a category based on web-based features is a machine-learning model; para. 0015 - a solution automatically takes in each such page segment as input, and assigns a functional label of the type; para. 0017 - annotating the function of each segment of a web page, each individual application can select the segments having the labels that correspond to content that is meaningful; para. 0026 - a machine-learning mechanism may be used to create classifier modules that recognize the features associated with a particular category label. Users annotate web pages with corresponding functional category labels, and these annotations are used as training data to the application; para. 0028 - a functional category as used herein is applied to individual segments of a web page, with different segments within the same web page having potentially different functional categories. Functional labels indicate the role or purpose that various parts of the web page serve; para. 0029 - examples of features that can be extracted and later correlated with functional categories. Also included in the table are some heuristics for establishing a correlation between feature metadata and a functional category; para. 0030 - steps for training a model that will functionally classify segments when machine-learning approaches are used. In Step 110, human editors explicitly label web page segments to create training data; para. 0031 - main 
	Joshi further teaches or suggests that are nearby or based on containing chunks that contain said chunks, or (b) using natural language processing methods based on grammatical structures of nearby chunks to determine semantic roles for chunks (see col. 1, lines 40-47 - determine whether a content segment (e.g., a document or portion of a document) is relevant to the particular category. The content relevance model of some embodiments is defined in terms of (i) a set of key word sets more likely to appear in content segments relevant to the particular category and (ii) other word sets within the context of the key word sets; col. 2, lines 10-24 - for the content segment to be evaluated for relevance to the category define context by proximity, such that all word sets within a particular number of words of a first word set are within the context of the first word set. Some embodiments define all word sets within the sentence or paragraph of a first word set as within the context of that first word set. In addition, some embodiments allow different definitions of context for different words or different types of content segments.).
Accordingly, it would have been obvious to one having ordinary skill before the effective filing date of the claimed invention to modify the system and method, taught in Rajan, to include that are nearby or based on containing chunks that contain said chunks, or (b) using natural language processing methods based on grammatical structures of nearby chunks to determine semantic roles for chunks for the purpose of efficiently classifying content segments, as taught by Joshi (col. 1).

Claim 16:
Rajan further teaches or suggests such chunks are labeled with semantic role labels for the semantic roles played by those chunks in the documents, and such chunks are also labeled with a datatype of the chunk (see Fig. 1-3, Table 1; para. 0008 - automatically assigning a category based on web-based features is a machine-learning model; para. 0015 - a solution automatically takes in each such page segment as input, and assigns a functional label of the type; para. 0017 - annotating the function of each segment of a web page, each individual application can select the segments having the labels that correspond to content that is meaningful; para. 0026 - a machine-learning mechanism may be used to create classifier modules that recognize the features associated with a particular category label. Users annotate web pages with corresponding functional category labels, and these annotations are used as training data to the application; para. 0028 - a functional category as used herein is applied to individual segments of a web page, with different segments within the same web page having potentially different functional categories. Functional labels indicate the role or purpose that various parts of the web page serve; para. 0029 - examples of features that can be extracted and later correlated with functional categories. Also included in the table are some heuristics for establishing a correlation between feature metadata and a functional category; para. 0030 - steps for training a model that will functionally classify segments when machine-learning approaches are used. In Step 110, 
Joshi further teaches or suggests wherein some of the chunks are Named Entity References (see col. 1, lines 39-47 - defining a content relevance model for a particular category (e.g., a company, product, person, topic, etc.) that is used to determine whether a content segment (e.g., a document or portion of a document) is relevant to the particular category. The content relevance model of some embodiments is defined in terms of (i) a set of key word sets more likely to appear in content segments relevant to the particular category and (ii) other word sets within the context of the key word sets; col. 2, lines 10-24 -  for the content segment to be evaluated for relevance to the category define context by proximity, such that all word sets within a particular number of words of a first word set are within the context of the first word set. Some embodiments define all 20 word sets within the sentence or paragraph of a first word set as within the context of that first word set. In addition, some embodiments allow different definitions of context for different words or different types of content segments; col. 4, lines 35-45 - categories may be companies (e.g., Microsoft, Intel, General Motors, etc.), products (e.g., Bing, Xbox, Windows 7, etc.), people (e.g., Bill Gates, Steve Ballmer, etc.); col. 12, lines 40-45 - user could define 
Accordingly, it would have been obvious to one having ordinary skill before the effective filing date of the claimed invention to modify the system and method, taught in Rajan, to include wherein some of the chunks are Named Entity References for the purpose of efficiently classifying content segments, as taught by Joshi (col. 1).

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Andrew T McIntosh whose telephone number is (571)270-7790.  The examiner can normally be reached on M-Th 8:00am-5:30pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kavita Stanley, can be reached at 571-272-8352. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information 






/ANDREW T MCINTOSH/Primary Examiner, Art Unit 2176