DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 1-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Independent claims 1, 11 and 18 recite “a number of markup-language network-accessible documents in the set of sample documents being less that a total number of markup-language network-accessible documents in the repository that match the filter pattern”, which is ambiguous because it is not clear whether the number of sample documents is less than the number of documents actually stored in the repository. For the purpose of examination, the limitation is interpreted as the number of markup-language network-accessible documents in the set of sample documents being less than a total number of markup-language network-accessible documents in the repository that match the filter pattern.


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1, 2, 3, 6, 7, 9, 11, 12, 16, 18 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Sheng et al. (US 2018/0144042) in view of Watts (US 2004/0186833).

With respect to claim 1, Sheng teaches one or more computing devices for processing network-accessible documents obtained from a wide-area network (fig. 7; examiner’s note: the hardware system includes network in [0023]), comprising: 
hardware logic circuitry, the hardware logic circuitry including (fig. 7; examiner’s note: the hardware system): 
(a) one or more hardware processors that perform operations by executing machine-readable instructions stored in a memory (fig. 7, [0023]; examiner’s note: the hardware system), and/or (b) one or more other hardware logic units that perform the operations using a task-specific collection of logic gates (fig. 7, [0023]; examiner’s note: the hardware system), the operations including: 
providing a set of sample documents from a repository of network-accessible markup-language documents that match a filter pattern ([0024, one or more preliminary filtering mechanisms to discard documents that are not suitable for template generation. For example, if a corpus of structured documents 100 under analysis includes personal emails and B2C emails, personal emails may be discarded]; [0024, a cluster engine 122 may be configured to group the corpus of structured documents 100 into a plurality of clusters 132.sub.1-m based on one or more patterns (e.g., fixed content)], [0026]; [0030, A corpus of labeled structured documents 200 may be provided as training data]; examiner’s note: the documents are matched against a pattern to filter out those that do not match; for example, as described in para. [0026], an email's pattern is matched against other emails that have the same number of XPaths, and the filtered set of documents is provided as training data, which corresponds to the sample documents), the set of sample documents being associated with a class of network-accessible markup-language documents ([0030, FIG. 2 depicts an example of how one or more machine learning models may be trained to classify structured documents into various categories (or "verticals")], fig. 2; examiner’s note: the structured training documents are classified into different categories),
storing the set of sample documents in a sample-document data store (fig. 2, [0030]; examiner’s note: the training data sets are stored to label the data);
using a machine-trained labeling model to apply labels to the set of sample documents ([0030, each labeled structured document 200 may be labeled (e.g., annotated) with various classifications]; [0032]; [0034, if a structured document is labeled or classified as an "event," machine learning application engine 252 may apply only those extraction machine learning models 256 that are applicable to events…. machine learning application engine 252 may select and apply an "event title"]; examiner’s note: the machine learning model applies labels such as “event title” to the documents), to provide a set of labeled documents ([0034]; examiner’s note: the machine learning model applies labels to documents to provide labeled documents), a label added to a given sample document identifying a type of data item that is present in the given sample document ([0034, extraction machine learning model 256--which may be configured to identify a location (e.g., XPath) within the structured document as containing transient data corresponding to an event title--to the particular structured document]; examiner’s note: the event title specifies the type of data that the document contains) and a location of the data item in the given sample document ([0033, each transient field location (e.g., XPath) in a structured document that contains transient data of interest may be labeled, annotated, or otherwise indicated], [0034, extraction machine learning model 256--which may be configured to identify a location (e.g., XPath) within the structured document as containing transient data corresponding to an event title--to the particular structured document]; examiner’s note: the XPath in the document identifies the location of the event title in the document);
storing the set of labeled documents in a labeled-document data store ([0034, 0054, 0055]; examiner’s note: the labeled documents are stored in a data store to process new matching documents); 
generating a data-extraction model based on the set of labeled documents ([0034, if a structured document is labeled or classified as an "event," machine learning application engine 252 may apply only those extraction machine learning models 256 that are applicable to events]; examiner’s note: the data extraction module extracts data from the labeled documents that match the event), the data-extraction model including data-extracting logic for extracting at least one specified data item from new documents that match the class of documents (fig. 6, [0055]; [0056, At block 606, the system may extract one or more data points from the subsequent structured communication based on one or more associations between the matched data extraction template and categories/transient field locations, e.g., which may be annotations of the template]; [0058, using new training data that includes labeled structured documents (or in some cases, simply data extraction templates generated from clusters of structured documents)]; examiner’s note: each subsequent document is processed by the data extraction model); and
storing the data-extraction model in a model data store ([0054, the data extraction template may be stored in memory….or more semantic classifications may be stored in association with the one or more transient field locations]; examiner’s note: the data extraction templates are stored in a memory).
Sheng does not explicitly teach a number of markup-language network-accessible documents in the set of sample documents being less that a total number of markup-language network-accessible documents in the repository that match the filter pattern.
However, Watts teaches a number of markup-language network-accessible documents in the set of sample documents being less that a total number of markup-language network-accessible documents in the repository that match the filter pattern ([0004, A second search of the same data base using the subject area model predictive control combined with the logical operator "and" combined with the sub-discipline constraint handling would be expected to return a much smaller set of records say on the order of less than 50 records]; [0015, A portion of the data set is selected using a refined, detailed search on terms that are directly on point. This is designed to generate a small subject specific sample, say less than 5% of the total documents that have index terms defining the very kernel of the data set to be analyzed]; examiner’s note: the matching results are less than the actual stored documents and markup language documents are taught by Sheng in para. [0023, structured documents may be structured using various markup languages such as the eXtensible Markup Language ("XML") or the Hypertext Markup Language ("HTML")]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify Sheng's invention, which teaches clustering and labeling documents, to include the teaching of Watts that the matching documents are fewer than the total documents. Sheng and Watts are in the same field of invention because both teach clustering documents. One would have been motivated to make this modification because it provides predictable results, such as using fewer documents as sample documents to save time and to make the system more efficient.
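
For illustration only, the limitation as interpreted above (a sample set that is smaller than the set of repository documents matching the filter pattern) can be sketched as follows. This is a minimal sketch and is not taken from Sheng, Watts, or the application; the function name `provide_sample`, the regular-expression filter applied to document identifiers, and the in-memory repository are all hypothetical assumptions.

```python
import random
import re


def provide_sample(repository, filter_pattern, sample_size, seed=0):
    """Pick sample_size documents from those whose identifier matches filter_pattern.

    The sample is required to be strictly smaller than the set of matching documents.
    """
    matcher = re.compile(filter_pattern)
    # All repository documents that match the (hypothetical) filter pattern.
    matching = [doc_id for doc_id in sorted(repository) if matcher.search(doc_id)]
    if not 0 < sample_size < len(matching):
        raise ValueError("sample size must be positive and less than the match count")
    # A seeded generator keeps the illustration repeatable.
    sample = random.Random(seed).sample(matching, sample_size)
    return sample, matching


# Hypothetical repository: identifier -> markup text.
repository = {
    "receipt-001": "<html><body><p>Total: 10</p></body></html>",
    "receipt-002": "<html><body><p>Total: 20</p></body></html>",
    "receipt-003": "<html><body><p>Total: 30</p></body></html>",
    "receipt-004": "<html><body><p>Total: 40</p></body></html>",
    "invite-001": "<html><body><h1>Party</h1></body></html>",
}
sample, matching = provide_sample(repository, r"^receipt-", 2)
print(len(sample), "of", len(matching), "matching documents sampled")
```

In this sketch the non-matching document is excluded by the filter first, and the size check enforces that the sample never equals the full set of matches.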

With respect to claim 2, Sheng teaches the one or more computing devices of claim 1. Sheng further teaches wherein the hardware logic circuitry performs said providing, using ([0062, the storage subsystem 724 may include the logic to perform selected aspects of methods 500 and/or 600 and/or to implement one or more of cluster engine 122]; fig. 7; examiner’s note: the hardware logic), and generating for at least one other class of markup-language network-accessible documents ([0023, structured documents may be structured using various markup languages such as the eXtensible Markup Language ("XML") or the Hypertext Markup Language ("HTML")]; examiner’s note: the documents are in markup language), to overall provide plural data-extraction models associated with respective classes of markup-language network-accessible documents ([0024, one or more patterns (e.g., fixed content) shared among one or more structured documents 100 within the corpus]; [0006, Based on output of the one or more category machine learning models]; [0007, one or more extraction machine learning models may then be selected from a plurality of extraction machine learning models based on the determined document category]; examiner’s note: a plurality of extraction models is used to classify documents into a plurality of categories).

With respect to claim 3, Sheng and Watts in combination teach the one or more computing devices of claim 1. Sheng further teaches wherein the hardware logic circuitry further performs operations of: generating plural filter patterns associated with different respective classes of markup-language network-accessible documents ([0024, a cluster engine 122 may be configured to group the corpus of structured documents 100 into a plurality of clusters 132.sub.1-m based on one or more patterns (e.g., fixed content) shared among one or more structured documents 100 within the corpus]; examiner’s note: the one or more patterns are associated with one or more filters to classify documents);
and storing the plural filter patterns in a filter data store, said providing using the plural filter patterns to produce plural sets of sample documents associated with the respective classes of markup-language network-accessible documents ([0024-0028]; examiner’s note: the plurality of patterns are associated with a plurality of filters; for example, different types of emails are clustered based on the number of XPaths, and each type of email has its own pattern. The patterns are matched with documents to cluster them; therefore, the patterns are stored in a filter database).

With respect to claim 6, Sheng teaches the one or more computing devices of claim 1, wherein the given sample document is an HTML document ([0025, formatting information (e.g., HTML nodes, XPaths, etc.)]; examiner’s note: the document is an HTML document).

With respect to claim 7, Sheng teaches the one or more computing devices of claim 1, wherein said generating identifies at least one pattern in the set of labeled documents that satisfies a prescribed statistical condition ([0037, a threshold number (e.g., 90%, or some other threshold) of structured documents 300 of cluster 132.sub.x are classified into a particular category]; examiner’s note: the threshold is a statistical condition, and the documents need to meet a certain threshold to be labeled and stored).

With respect to claim 9, Sheng and Watts in combination teach the one or more computing devices of claim 1. Sheng further teaches wherein the hardware logic circuitry is further configured to perform a data-extracting operation, the data-extracting operation including (fig. 2, [0058]; [0062, template generation engine 140, and/or feature extraction engine]; examiner’s note: the data extraction module):
receiving a new document from the repository of markup-language network-accessible documents ([0058, new structured documents generated from a heretofore unknown template may be distributed, e.g., as B2C emails to consumers]; examiner’s note: the new documents are classified according to their pattern), the new document not being a member of the set of sample documents ([0058]; examiner’s note: the new documents are not in the sample documents);
determining that the data-extraction model applies to the new document ([0058]; examiner’s note: features of the new documents are extracted to classify them into categories; the matching features determine that data extraction models can be applied to the new documents); and using the data-extracting logic of the data-extraction model to extract one or more data items from the new document ([0058]; examiner’s note: features of the new documents are extracted to classify them into categories; the matching features determine that data extraction models can be applied to the new documents).
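
For illustration only, the relationship between a label (a data-item type plus a location such as an XPath) and the later extraction of that item from a new document can be sketched as follows. This is a minimal sketch under stated assumptions and is not code from Sheng or the application: the model `EVENT_MODEL`, the function `extract_items`, and the sample markup are hypothetical, and the markup is assumed to be well-formed so that Python's standard `xml.etree.ElementTree` parser can read it.

```python
import xml.etree.ElementTree as ET

# Hypothetical extraction model: data-item type -> location (ElementTree path).
EVENT_MODEL = {
    "event_title": "./body/h1",
    "event_date": "./body/p[@class='date']",
}


def extract_items(markup, model):
    """Return {type: text} for each modeled location found in the document."""
    root = ET.fromstring(markup)
    items = {}
    for item_type, location in model.items():
        node = root.find(location)
        # Skip locations that are absent from this document or hold no text.
        if node is not None and node.text is not None:
            items[item_type] = node.text.strip()
    return items


# A new document that was not part of the sample set.
new_document = (
    "<html><body><h1>Jazz Night</h1>"
    "<p class='date'>Friday</p><p>Doors open at 7</p></body></html>"
)
extracted = extract_items(new_document, EVENT_MODEL)
print(extracted)
```

The sketch keeps the type and the location together in one mapping, so the same mapping that records what was labeled also drives what is extracted.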

With respect to claim 11, Sheng teaches a computer-implemented method for processing network-accessible documents obtained from a wide-area network (fig. 4; [0023, documents may include other types of documents, such as letters (e.g., in portable document format ("PDF") and/or word processing format), invoices, bills, receipts, invitations (e.g., invites received via social network applications)]; [0042]; examiner’s note: the documents are received via network), comprising: 
receiving a new document from a repository of markup-language network-accessible documents ([0058, new structured documents generated from a heretofore unknown template may be distributed, e.g., as B2C emails to consumers]; examiner’s note: the new documents are classified according to their pattern);
identifying a data-extraction model that applies to the new document ([0049, At block 502, the system may identify, e.g., from data extraction template database 142, a data extraction template generated from a cluster of structured documents that share at least some fixed content (e.g., boilerplate)]; [0058]; examiner’s note: features of the new documents are extracted to classify them into categories; the matching features determine that data extraction models can be applied to the new documents); and
using the data-extraction model to extract one or more data items from the new document, the data-extraction model being produced, in advance of said receiving, in a model-generating process that includes ([0058, new structured documents generated from a heretofore unknown template may be distributed, e.g., as B2C emails to consumers]; examiner’s note: the new documents are classified according to their pattern);  
providing a set of sample documents from the repository of markup-language network-accessible documents that match a filter pattern ([0024, one or more preliminary filtering mechanisms to discard documents that are not suitable for template generation. For example, if a corpus of structured documents 100 under analysis includes personal emails and B2C emails, personal emails may be discarded]; [0024, a cluster engine 122 may be configured to group the corpus of structured documents 100 into a plurality of clusters 132.sub.1-m based on one or more patterns (e.g., fixed content)], [0026]; [0030, A corpus of labeled structured documents 200 may be provided as training data]; examiner’s note: the documents are matched against a pattern to filter out those that do not match; for example, as described in para. [0026], an email's pattern is matched against other emails that have the same number of XPaths, and the filtered set of documents is provided as training data, which corresponds to the sample documents), the set of sample documents being associated with a class of markup-language network-accessible documents ([0030, FIG. 2 depicts an example of how one or more machine learning models may be trained to classify structured documents into various categories (or "verticals")], fig. 2; examiner’s note: the structured training documents are classified into different categories),
storing the set of sample documents in a sample-document data store (fig. 2, [0030]; examiner’s note: the training data sets are stored to label the data);
using a machine-trained labeling model to apply labels to the set of sample documents, to provide a set of labeled documents ([0030, each labeled structured document 200 may be labeled (e.g., annotated) with various classifications]; [0032]; [0034, if a structured document is labeled or classified as an "event," machine learning application engine 252 may apply only those extraction machine learning models 256 that are applicable to events…. machine learning application engine 252 may select and apply an "event title"]; examiner’s note: the machine learning model applies labels such as “event title” to the documents), a label added to a given sample document identifying a type of data item that is present in the given sample document ([0034, extraction machine learning model 256--which may be configured to identify a location (e.g., XPath) within the structured document as containing transient data corresponding to an event title--to the particular structured document]; examiner’s note: the event title specifies the type of data that the document contains) and a location of the data item in the given sample document ([0033, each transient field location (e.g., XPath) in a structured document that contains transient data of interest may be labeled, annotated, or otherwise indicated], [0034, extraction machine learning model 256--which may be configured to identify a location (e.g., XPath) within the structured document as containing transient data corresponding to an event title--to the particular structured document]; examiner’s note: the XPath in the document identifies the location of the event title in the document);
storing the set of labeled documents in a labeled-document data store ([0034, 0054, 0055]; examiner’s note: the labeled documents are stored in a data store to process new matching documents);  
generating the data-extraction model based on the set of labeled documents ([0034, if a structured document is labeled or classified as an "event," machine learning application engine 252 may apply only those extraction machine learning models 256 that are applicable to events]; examiner’s note: the data extraction module extracts data from the labeled documents that match the event), the data-extraction model including data-extracting logic for extracting at least one specified data item from new documents that match the class of documents (fig. 6, [0055]; [0056, At block 606, the system may extract one or more data points from the subsequent structured communication based on one or more associations between the matched data extraction template and categories/transient field locations, e.g., which may be annotations of the template]; [0058, using new training data that includes labeled structured documents (or in some cases, simply data extraction templates generated from clusters of structured documents)]; examiner’s note: each subsequent document is processed by the data extraction model); and
storing the data-extraction model in a model data store ([0054, the data extraction template may be stored in memory….or more semantic classifications may be stored in association with the one or more transient field locations]; examiner’s note: the data extraction templates are stored in a memory).
Sheng does not explicitly teach a number of markup-language network-accessible documents in the set of sample documents being less that a total number of markup-language network-accessible documents in the repository that match the filter pattern.
However, Watts teaches a number of markup-language network-accessible documents in the set of sample documents being less that a total number of markup-language network-accessible documents in the repository that match the filter pattern ([0004, A second search of the same data base using the subject area model predictive control combined with the logical operator "and" combined with the sub-discipline constraint handling would be expected to return a much smaller set of records say on the order of less than 50 records]; [0015, A portion of the data set is selected using a refined, detailed search on terms that are directly on point. This is designed to generate a small subject specific sample, say less than 5% of the total documents that have index terms defining the very kernel of the data set to be analyzed]; examiner’s note: the matching results are less than the actual stored documents and markup language documents are taught by Sheng in para. [0023, structured documents may be structured using various markup languages such as the eXtensible Markup Language ("XML") or the Hypertext Markup Language ("HTML")]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify Sheng's invention, which teaches clustering and labeling documents, to include the teaching of Watts that the matching documents are fewer than the total documents. Sheng and Watts are in the same field of invention because both teach clustering documents. One would have been motivated to make this modification because it provides predictable results, such as using fewer documents as sample documents to save time and to make the system more efficient.

	Claim 12 is rejected on the same basis as the rejection of claim 2.
	Claim 16 is rejected on the same basis as the rejection of claim 7.

	With respect to claim 18, Sheng teaches a computer-readable storage medium for storing computer-readable instructions, the computer-readable instructions, when executed by one or more hardware processors, performing a method that comprises ([0003]; examiner’s note: the computer readable storage media): in a model-generating process (fig. 2. [0034]; examiner’s note: the training model is generated): 
providing a set of sample documents from a repository of markup-language network-accessible documents that match a filter pattern ([0024, one or more preliminary filtering mechanisms to discard documents that are not suitable for template generation. For example, if a corpus of structured documents 100 under analysis includes personal emails and B2C emails, personal emails may be discarded]; [0024, a cluster engine 122 may be configured to group the corpus of structured documents 100 into a plurality of clusters 132.sub.1-m based on one or more patterns (e.g., fixed content)], [0026]; [0030, A corpus of labeled structured documents 200 may be provided as training data]; examiner’s note: the documents are matched against a pattern to filter out those that do not match; for example, as described in para. [0026], an email's pattern is matched against other emails that have the same number of XPaths, and the filtered set of documents is provided as training data, which corresponds to the sample documents), the set of sample documents being associated with a class of markup-language network-accessible documents ([0030, FIG. 2 depicts an example of how one or more machine learning models may be trained to classify structured documents into various categories (or "verticals")], fig. 2; examiner’s note: the structured training documents are classified into different categories),
storing the set of sample documents in a sample-document data store (fig. 2, [0030]; examiner’s note: the training data sets are stored to label the data);
using a machine-trained labeling model to apply labels to the set of sample documents ([0030, each labeled structured document 200 may be labeled (e.g., annotated) with various classifications]; [0032]; [0034, if a structured document is labeled or classified as an "event," machine learning application engine 252 may apply only those extraction machine learning models 256 that are applicable to events…. machine learning application engine 252 may select and apply an "event title"]; examiner’s note: the machine learning model applies labels such as “event title” to the documents), to provide a set of labeled documents, a label added to a given sample document identifying a type of data item that is present in the given sample document ([0034, extraction machine learning model 256--which may be configured to identify a location (e.g., XPath) within the structured document as containing transient data corresponding to an event title--to the particular structured document]; examiner’s note: the event title specifies the type of data that the document contains) and a location of the data item in the given sample document ([0033, each transient field location (e.g., XPath) in a structured document that contains transient data of interest may be labeled, annotated, or otherwise indicated], [0034, extraction machine learning model 256--which may be configured to identify a location (e.g., XPath) within the structured document as containing transient data corresponding to an event title--to the particular structured document]; examiner’s note: the XPath in the document identifies the location of the event title in the document);
storing the set of labeled documents in a labeled-document data store ([0034, 0054, 0055]; examiner’s note: the labeled documents are stored in a data store to process new matching documents);   
generating a data-extraction model based on the set of labeled documents by identifying at least one pattern in the set of labeled documents that satisfies a prescribed statistical condition ([0037, a threshold number (e.g., 90%, or some other threshold) of structured documents 300 of cluster 132.sub.x are classified into a particular category]; examiner’s note: the threshold is a statistical condition, and the documents need to meet a certain threshold to be labeled and stored), the data-extraction model including data-extracting logic for extracting at least one specified data item from new documents that match the class of documents ([0058, new structured documents generated from a heretofore unknown template may be distributed, e.g., as B2C emails to consumers]; examiner’s note: the new documents are classified according to their pattern); and
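storing the data-extraction model in a model data store ([0054, the data extraction template may be stored in memory….or more semantic classifications may be stored in association with the one or more transient field locations]; examiner’s note: the data extraction templates are stored in a memory); and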
in a data-extracting process: 
receiving a new document from the repository of markup-language network-accessible documents, the new document not being a member of the set of sample documents ([0058, new structured documents generated from a heretofore unknown template may be distributed, e.g., as B2C emails to consumers]; examiner’s note: the new documents are classified according to their pattern; the new documents are not in the sample documents);
determining that the data-extraction model applies to the new document ([0058]; examiner’s note: features of the new documents are extracted to classify them into categories; the matching features determine that data extraction models can be applied to the new documents);
and using the data-extraction model to extract one or more data items from the new document ([0058]; examiner’s note: features of the new documents are extracted to classify them into categories; the matching features determine that data extraction models can be applied to the new documents).
Sheng does not explicitly teach a number of markup-language network-accessible documents in the set of sample documents being less that a total number of markup-language network-accessible documents in the repository that match the filter pattern.
However, Watts teaches a number of markup-language network-accessible documents in the set of sample documents being less that a total number of markup-language network-accessible documents in the repository that match the filter pattern ([0004, A second search of the same data base using the subject area model predictive control combined with the logical operator "and" combined with the sub-discipline constraint handling would be expected to return a much smaller set of records say on the order of less than 50 records]; [0015, A portion of the data set is selected using a refined, detailed search on terms that are directly on point. This is designed to generate a small subject specific sample, say less than 5% of the total documents that have index terms defining the very kernel of the data set to be analyzed]; examiner’s note: the matching results are less than the actual stored documents and markup language documents are taught by Sheng in para. [0023, structured documents may be structured using various markup languages such as the eXtensible Markup Language ("XML") or the Hypertext Markup Language ("HTML")]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify Sheng's invention, which teaches clustering and labeling documents, to include the teaching of Watts that the matching documents are fewer than the total documents. Sheng and Watts are in the same field of invention because both teach clustering documents. One would have been motivated to make this modification because it provides predictable results, such as using fewer documents as sample documents to save time and to make the system more efficient.

Claim 19 is rejected on the same basis as the rejection of claim 2.

Claims 4, 5, 10, 13, 14, 15 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Sheng et al. (US 2018/0144042) in view of Watts (US 2004/0186833) and further in view of Baumgartner et al. (US 2005/022115).

With respect to claim 4, Sheng and Watts in combination teach the one or more computing devices of claim 1, but do not explicitly teach wherein said providing extracts markup-language network-accessible documents having URLs that match the filter pattern.
However, Baumgartner teaches wherein said providing extracts markup-language network-accessible documents having URLs that match the filter pattern ([0555, Document filters have as parent instance a string which can be identified as URL]; [0677, the parent string pattern has as instances strings which are URLs]; examiner’s note: the documents are matched according to the matching URLs).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify Sheng's invention, which teaches clustering and labeling documents, as modified by Watts, which teaches that the matching documents are fewer than the total documents, to include Baumgartner, which teaches matching URLs to documents. Sheng, Watts and Baumgartner are in the same field of invention because all of them teach clustering documents. One would have been motivated to make this modification because it provides predictable results, such as matching on URLs to find documents of multiple types and classifying them so that they can be found faster.

With respect to claim 5, Sheng and Watts in combination teach the one or more computing devices of claim 1, but do not explicitly teach wherein the given sample document expresses content as a collection of nodes arranged in a tree data structure.
However, Baumgartner teaches wherein the given sample document expresses content as a collection of nodes arranged in a tree data structure ([0555, to define a document pattern first a tree pattern is needed to identify the element containing the link, then a string pattern as child of it with an attribute filter extracting the link itself, and then a document pattern as child using the URL and extracting the linked Web page]; [0551]; examiner’s note: the document is expressed as a tree data structure with nodes).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify Sheng's invention, which teaches clustering and labeling documents, as modified by Watts, which teaches that the matching documents are fewer than the total documents, to include Baumgartner, which teaches that nodes are arranged in a tree structure. Sheng, Watts and Baumgartner are in the same field of invention because all of them teach clustering documents. One would have been motivated to make this modification because it provides predictable results, such as a tree structure that allows the document to be visualized and accessed more efficiently.

With respect to claim 10, Sheng and Watts in combination teach the one or more computing devices of claim 9, but do not explicitly teach wherein said determining tests whether a URL associated with the new document matches the filter pattern.
However, Baumgartner teaches wherein said determining tests whether a URL associated with the new document matches the filter pattern ([0555, define a document pattern first a tree pattern is needed to identify the element containing the link, then a string pattern as child of it with an attribute filter extracting the link itself, and then a document pattern as child using the URL and extracting the linked Web page]; [0556]; [0559, only one document pattern (containing one automatically constructed filter based on the entered URL) is present]; examiner’s note: the URL is identified in the document that matches the pattern).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify Sheng's invention, which teaches clustering and labeling documents, as modified by Watts, which teaches that the matching documents are fewer than the total documents, to include Baumgartner, which teaches matching URLs to documents. Sheng, Watts and Baumgartner are in the same field of invention because all of them teach clustering documents. One would have been motivated to make this modification because it provides predictable results, such as matching on URLs to find documents of multiple types and classifying them so that they can be found faster.
	
	Claim 13 is rejected on the basis of rejection of claim 10.
	Claim 14 is rejected on the basis of rejection of claim 4.
	Claim 15 is rejected on the basis of rejection of claim 5.
	Claim 20 is rejected on the basis of rejection of claim 4.

Claims 8 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Sheng et al. (US 2018/0144042) in view of Watts (US 2004/0186833) and further in view of Biesterfeld et al. (US 2019/0361980).

With respect to claim 8, Sheng and Watts in combination teach the one or more computing devices of claim 1. Sheng teaches wherein the data-extraction model incorporates knowledge imparted by the machine-trained labeling model via the set of labeled documents ([0034]; examiner’s note: the data-extraction model uses the knowledge that is stored by the machine-trained labeling model), but Sheng and Watts do not explicitly teach that the data-extraction model consumes fewer computing resources than the machine-trained labeling model.
However, Biesterfeld teaches that the data-extraction model consumes fewer computing resources than the machine-trained labeling model ([0041, processing of data via Workflow 623A generally consumes fewer resources than Workflow 623B, which consumes fewer computing resources than Workflow 623D, which similarly consumes fewer resources than Workflow 623E]; examiner’s note: of the two systems, one uses fewer resources).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify Sheng's invention, which teaches clustering and labeling documents, as modified by Watts, which teaches that the matching documents are fewer than the total documents, to include Biesterfeld, which teaches that, among multiple systems, one of the systems uses fewer resources. Sheng, Watts and Biesterfeld are in the same field of invention because all of them teach data extraction. One would have been motivated to make this modification because it provides predictable results, namely a system that uses fewer resources and is therefore more efficient.

	Claim 17 is rejected on the basis of rejection of claim 8.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to FATIMA P MINA whose telephone number is (571)270-3556. The examiner can normally be reached Monday - Friday 9:00 am - 5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Mariela Reyes, can be reached at 571-270-1006. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/FATIMA P MINA/
Examiner, Art Unit 2159

/Mariela Reyes/
Supervisory Patent Examiner, Art Unit 2159