DETAILED ACTION
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 02/22/2021 has been entered.
	Claims 1-20 are pending.
	Claims 6, 10, and 18 have been cancelled.
	Claims 21-23 have been added.
	Claims 1, 11 and 16 are independent claims.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.




Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 11-15 are rejected under 35 U.S.C. 103 as being unpatentable over Monsarrat (U.S. Patent Application Publication No. 2008/0071829 A1, filed 09/14/2006, published 03/20/2008) in view of Chang et al. (hereinafter Chang, U.S. Patent Application Publication No. 2007/0266342 A1, filed 05/10/2007, published 11/15/2007).
In regard to independent claim 11, Monsarrat teaches:
A computer storage media having stored thereon computer-executable instructions that when executed by a processor causes the processor to perform a method, the method comprising:
executing an annotation template application for a web browser (at least p. 2, [0045]; p. 3, [0050], [0055]; p. 4, [0064], [0072]; Figures 1, 3b, 4, 5a → Monsarrat teaches a Set Up System 101 operable via a web browser by one or more Set Up Experts 100 on a copy of a web page from which data is desired to be extracted);
receiving a web document (at least p. 2, [0045]; p. 3, [0050], [0055]; p. 4, [0064], [0072]; Figures 1, 3b, 4, 5a → Monsarrat teaches a Set Up System 101 operable via a web browser by one or more Set Up Experts 100 on a Web Page With Data 300 from which data is desired to be extracted);
annotating an element in the web document with the annotation template application to create an annotated web document (at least pp. 3-4, [0052]-[0064]; Figures 3a-b → Monsarrat teaches a Set Up System 101 that inserts special HTML tags into the Copy of Web Page 300 to annotate selected data and define a data type for that data. In addition, a Set Up Expert 100 highlights other data, such as “January 6 – January 8” and “Championship Auto Shows”, until Copy of Web Page 300 is All Marked Up 307. The Set Up System 101 then stores the Copy of Web Page 300 With All Data Marked Up 307 as a “template” for future use).
extracting metadata from the web document based on the annotated element in the web document (at least pp. 3-4, [0048]-[0064]; Figures 3A-C and 4 → Monsarrat teaches, in reference to Figure 3, that a user creates a template (to be used for later web scraping) where content of interest is annotated by a user (“annotation” by selecting/highlighting desired content and/or “annotation” by the addition of content (see pp. 3-4, [0051]-[0063]; Figure 3)). The content that is added is largely metadata about the selected/highlighted desired content. The template generated comprises (1) the original Web page’s HTML in full; and (2) annotations showing: (A) the location of the element on the web page that contains the desired information; (B) the data type of the information; and (C) the relation between the information and other data on the Web page or elsewhere (see p. 3, [0055]-[0060]). The annotations made by the Set Up Expert 100 are then made into a template for that page and stored in a database. In the process that converts the annotated copy of the web page to a template, the metadata is extracted. The created template(s) are then “processed” (e.g. by a web extracting service) to gather or collect or extract the previously annotated data (annotation by highlighting content and/or by the addition of content (see pp. 3-4, [0051]-[0063]; Figure 3)). Once gathered or collected or extracted, the data is post-processed to connect data together, resolve conflicts, and report possible errors. The Set Up Expert 100 corrects any remaining errors and resolves any remaining conflicts).
sending the annotated web document and the extracted metadata from the web document to an extraction service (at least pp. 3-4, [0052]-[0064]; Figures 3a-b → Monsarrat teaches a Set Up System 101 that inserts special HTML tags into the Copy of Web Page 300 to annotate selected data and define a data type for that data. In addition, a Set Up Expert 100 highlights other data, such as “January 6 – January 8” and “Championship Auto Shows”, until Copy of Web Page 300 is All Marked Up 307. The Set Up System 101 then stores the Copy of Web Page 300 With All Data Marked Up 307 as a “template” for future use).
Monsarrat fails to explicitly teach:
sending … extracted metadata from the web document to an extraction service.
However, Chang teaches:
sending … extracted metadata from the web document to an extraction service (at least pp. 7-9, [0077]-[0099]; Figure 3 → Chang teaches web notebook tools that allow a user to extract annotated content and associated metadata from web documents for placement into a web notebook (see at least pp. 7-8, [0086]-[0088])).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Chang with those of Monsarrat as both inventions relate to the annotation and subsequent extraction or capture of information from web pages. Adding the teaching of Chang to Monsarrat provides Monsarrat with additional functionality enabling it to capture, in addition to annotated content, underlying and preexisting metadata.

In regard to dependent claim 12, Monsarrat teaches:
receiving the annotation template application from the extraction service (at least [0052]; Figure 3 → Monsarrat teaches display of The Copy of Web Page With Data 300 in a web browser on which is running a Java applet (presumably as part of the annotation process)).

In regard to dependent claim 13, Monsarrat teaches:
the annotation of the element is a visual indicia placed in the web document (at least pp. 3-4, [0050]-[0063]; Figures 3a-c, 4 → Monsarrat teaches that The Copy of Web Page With Data 300 is displayed in a Web browser on which is running a Java applet. As shown in Fig. 3a, Set Up Expert 100 uses the mouse to highlight items on the page. As shown in Fig. 3a, for example, the entire Activity 314 is highlighted with a dashed rectangle (i.e. visual indicia)).

In regard to dependent claim 14, Monsarrat teaches:
the annotation of the element indicates a location of the element within the web document (at least p. 2, [0045]; pp. 3-4, [0057]-[0063]; pp. 4-5, [0073]-[0078]; Figures 4-9 → Monsarrat teaches that the annotations indicate the location of the elements in the web document).

In regard to dependent claim 15, Monsarrat teaches:
the annotation of the element indicates a type of content associated with the element within the web document (at least p. 2, [0045]; pp. 3-4, [0057]-[0063]; pp. 4-5, [0073]-[0078]; Figures 4-9 → Monsarrat teaches that the annotations indicate the different types of content in the web document; assignable via dynamic pull-down menus invoked by a right-click of a mouse).

Claims 1-5, 7, and 16-19 are rejected under 35 U.S.C. 103 as being unpatentable over Monsarrat in view of Orelind et al. (hereinafter Orelind, U.S. Patent Application Publication No. 2010/0107055 A1, filed 01/06/2010, published 04/29/2010).
In regard to independent claim 1, Monsarrat teaches:
A method comprising:
receiving, at a template service, a first annotated web document associated with a web document from a first client (at least pp. 2-4, [0044]-[0064]; Figures 1, 2, 3a-b → Monsarrat teaches a Set Up Expert 100 that uses a Set Up System 101 that inserts special HTML tags into the Copy of Web Page 300 to annotate selected data and define a data type for that data. In addition, the Set Up Expert 100 highlights other data, such as “January 6 – January 8” and “Championship Auto Shows”, until the Copy of Web Page 300 is All Marked Up 307. The Set Up System 101 then stores the Copy of Web Page 300 With All Data Marked Up 307 as a “template” for future use);
receiving, at the template service, a second annotated web document associated with the web document from a second client (at least pp. 2-4, [0044]-[0064]; Figures 1, 2, 3a-b → Monsarrat teaches a Set Up Expert 100 that uses a Set Up System 101 that inserts special HTML tags into the Copy of Web Page 300 to annotate selected data and define a data type for that data. In addition, the Set Up Expert 100 highlights other data, such as “January 6 – January 8” and “Championship Auto Shows”, until the Copy of Web Page 300 is All Marked Up 307. The Set Up System 101 then stores the Copy of Web Page 300 With All Data Marked Up 307 as a “template” in a Database 104 for future use. The system, as described by Monsarrat, may be used by multiple individuals, each submitting annotated copies of selected web pages to be made into “templates” and stored for future use in the Database 104),
wherein the first annotated web document and the second annotated web document are associated with similar content from a same domain (at least pp. 2-4, [0044]-[0064]; Figures 1, 2, 3a-b → Monsarrat teaches that first, the Set Up Expert 100 characterizes the data domain of the data to be gathered from the Web, using a Data Schema 113. For example, if the data domain is cars, the Data Schema 113 would specify that cars have a make, model, and year of manufacture. The Set Up Expert 100 then uses the Set Up System 101 to browse to a web page and mark the location of information, creating a template. This may be repeated across thousands of Web sites, but one template will usually suffice for a single page, and for an entire group of Web pages that have a similar look and feel (see p. 2, [0045]; Fig. 1)).
Monsarrat fails to explicitly teach:
generating a combined template indicating a structure of the web document based on the first annotated web document and the second annotated web document;
However, Orelind teaches:
Note: the Specification, at pages 9-10, [0052] states that “The conflation component 222 can combine the templates or reduce the templates into a single or into a smaller set of template information. Conflation can include determining what types of information may be within the templates, which templates provide the best information based on template ranking or other information, or other types of analysis. The output of the conflation component 222 can be a single or a reduced set of information about a domain and the web content within that domain.”

generating a combined template indicating a structure of the web document based on the first annotated web document and the second annotated web document (at least Abstract; pp. 3-4, [0030], [0032], [0037]-[0038]; pp. 4-5, [0047]-[0051]; Figure 5 → Orelind teaches a method of merging (combining) pairs of sufficiently similar extraction rules using clustering techniques and replacing the pairs of extraction rules with a merged (combined) extraction rule).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Orelind with those of Monsarrat as both inventions are related to the generation of data or information extraction templates/rules for use with electronic documents. Adding the teaching of Orelind provides Monsarrat with a way to improve the efficiency and effectiveness of data extraction of templates/rules created or edited by multiple contributors by merging or combining them based on how well they work at extracting the desired data.
Monsarrat further teaches:
storing the combined template in a template data store (at least pp. 2-4, [0044]-[0064]; Figures 1, 2, 3a-b → Monsarrat teaches a Set Up System 101 that inserts special HTML tags into the Copy of Web Page 300 to annotate selected data and define a data type for that data. In addition, a Set Up Expert 100 highlights other data, such as “January 6 – January 8” and “Championship Auto Shows”, until Copy of Web Page 300 is All Marked Up 307. The Set Up System 101 then stores the Copy of Web Page 300 With All Data Marked Up 307 as a “template” for future use).

In regard to dependent claim 2, Monsarrat teaches:
the first annotated web document annotates a structure for the web document (at least pp. 2-4, [0044]-[0064]; Figures 1, 2, 3a-b → Monsarrat teaches that The Copy of Web Page With Data 300 is displayed in a Web browser on which is running a Java applet. As shown in Fig. 3a, Set Up Expert 100 uses the mouse to highlight items on the page. As shown in Fig. 3a, for example, the entire Activity 314 is highlighted with a dashed rectangle (i.e. visual indicia)).

In regard to dependent claim 3, Monsarrat fails to explicitly teach:
the similar content is a first type of web document associated with the same domain.
However, Orelind teaches:
the similar content is a first type of web document associated with the same domain (at least Abstract; pp. 3-4, [0030], [0032], [0037]-[0038]; pp. 4-5, [0047]-[0051]; Figure 5 → Orelind teaches a method of merging (combining) pairs of sufficiently similar extraction rules using clustering techniques and replacing the pairs of extraction rules with a merged (combined) extraction rule).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Orelind with those of Monsarrat as both inventions are related to the generation of data or information extraction templates/rules for use with electronic documents. Adding the teaching of Orelind provides Monsarrat with a way to improve the efficiency and effectiveness of data extraction of templates/rules created or edited by multiple contributors by merging or combining them based on how well they work at extracting the desired data.

In regard to dependent claim 4, Monsarrat teaches:
based on the template, generate structural information associated with a structural element of a first type of web content (at least p. 3, [0045]-[0064]; Figures 1, 2, 3a-c, 4 → Monsarrat teaches the generation of one or more templates intended to gather information from a Web Site. Target Web Sites are first identified that are relevant to the desired data domain. Web pages are annotated to identify the locations and data types of the desired data or information to be extracted. The annotated Web pages (Data Schemas 113; see Figs. 1-2, p. 2, [0045]-[0047]) are then stored with all their “structural information”; extracted data is stored in a Database 104 (see Figs. 1-2, p. 2, [0046]; p. 4, [0067]; p. 5, [0094]; p. 6, [0104])).

In regard to dependent claim 5, Monsarrat teaches:
Note: it is unclear what constitutes a knowledge graph in this instance. Is it associated with the components of the template, or does it describe the data extracted by the template? Please clarify.

building a knowledge graph for the same domain using the structural information (at least p. 3, [0045]-[0064]; Figures 1, 2, 3a-c, 4 → Monsarrat teaches the generation of one or more templates intended to gather information from a Web Site. Target Web Sites are first identified that are relevant to the desired data domain. Web pages are annotated to identify the locations and data types of the desired data or information to be extracted. The annotated Web pages (Data Schemas 113; see Figs. 1-2, p. 2, [0045]-[0047]) are then stored with all their “structural information”; extracted data is stored in a Database 104 (see Figs. 1-2, p. 2, [0046]; p. 4, [0067]; p. 5, [0094]; p. 6, [0104])).

In regard to dependent claim 7, Monsarrat fails to teach:
Note: as claimed, first/second templates are interpreted to be from the same or different annotated web document. Further, the annotators may be the same or different annotators.

generating the combined template comprises generating a first template from the first annotated web document and a second template from the second annotated web document.
However, Orelind teaches:
generating the combined template comprises generating a first template from the first annotated web document and a second template from the second annotated web document (at least Abstract; pp. 3-4, [0030], [0032], [0037]-[0038]; pp. 4-5, [0047]-[0051]; Figure 5 → Orelind teaches a method of merging (combining) pairs of sufficiently similar extraction rules using clustering techniques and replacing the pairs of extraction rules with a merged (combined) extraction rule).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Orelind with those of Monsarrat as both inventions are related to the generation of data or information extraction templates/rules for use with electronic documents. Adding the teaching of Orelind provides Monsarrat with a way to improve the efficiency and effectiveness of data extraction of templates/rules created or edited by multiple contributors by merging or combining them based on how well they work at extracting the desired data.

In regard to independent claim 16, Monsarrat teaches:
An extraction service server comprising:
a memory having stored thereon computer-executable instructions; and a processor, in communication with the memory (Monsarrat, at p. 3, [0054], mentions a computer; the computer is assumed to comprise at least a processor and memory), to execute the computer-executable instructions to perform a method comprising:
receiving, at a template service executed with the processor, a first annotated web document from a first client (at least pp. 2-4, [0044]-[0064]; Figures 1, 2, 3a-b → Monsarrat teaches a Set Up Expert 100 that uses a Set Up System 101 that inserts special HTML tags into the Copy of Web Page 300 to annotate selected data and define a data type for that data. In addition, the Set Up Expert 100 highlights other data, such as “January 6 – January 8” and “Championship Auto Shows”, until the Copy of Web Page 300 is All Marked Up 307. The Set Up System 101 then stores the Copy of Web Page 300 With All Data Marked Up 307 as a “template” for future use);
receiving, at the template service, a second annotated web document from a second client, wherein the first annotated web document and the second annotated web document are associated with similar content from a same domain (at least pp. 2-4, [0044]-[0064]; Figures 1, 2, 3a-b → Monsarrat teaches a Set Up Expert 100 that uses a Set Up System 101 that inserts special HTML tags into the Copy of Web Page 300 to annotate selected data and define a data type for that data. In addition, the Set Up Expert 100 highlights other data, such as “January 6 – January 8” and “Championship Auto Shows”, until the Copy of Web Page 300 is All Marked Up 307. The Set Up System 101 then stores the Copy of Web Page 300 With All Data Marked Up 307 as a “template” in a Database 104 for future use. The system, as described by Monsarrat, may be used by multiple individuals, each submitting annotated copies of selected web pages to be made into “templates” and stored for future use in the Database 104);
Monsarrat fails to explicitly teach:
based on the first annotated web document and the second annotated web document, generating a combined template indicating a structure for content of the same domain based on the first annotated web document and the second annotated web document;
However, Orelind teaches:
Note: the Specification, at pages 9-10, [0052] states that “The conflation component 222 can combine the templates or reduce the templates into a single or into a smaller set of template information. Conflation can include determining what types of information may be within the templates, which templates provide the best information based on template ranking or other information, or other types of analysis. The output of the conflation component 222 can be a single or a reduced set of information about a domain and the web content within that domain.”

based on the first annotated web document and the second annotated web document, generating a combined template indicating a structure for content of the same domain based on the first annotated web document and the second annotated web document (at least Abstract; pp. 3-4, [0030], [0032], [0037]-[0038]; pp. 4-5, [0047]-[0051]; Figure 5 → Orelind teaches a method of merging (combining) pairs of sufficiently similar extraction rules using clustering techniques and replacing the pairs of extraction rules with a merged (combined) extraction rule).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Orelind with those of Monsarrat as both inventions are related to the generation of data or information extraction templates/rules for use with electronic documents. Adding the teaching of Orelind provides Monsarrat with a way to improve the efficiency and effectiveness of data extraction of templates/rules created or edited by multiple contributors by merging or combining them based on how well they work at extracting the desired data.
Monsarrat further teaches:
storing the template in a structural data repository (at least pp. 2-4, [0044]-[0064]; Figures 1, 2, 3a-b → Monsarrat teaches a Set Up System 101 that inserts special HTML tags into the Copy of Web Page 300 to annotate selected data and define a data type for that data. In addition, a Set Up Expert 100 highlights other data, such as “January 6 – January 8” and “Championship Auto Shows”, until Copy of Web Page 300 is All Marked Up 307. The Set Up System 101 then stores the Copy of Web Page 300 With All Data Marked Up 307 as a “template” for future use);
based on the template, generating structural information associated with a structural element of the similar content (at least pp. 2-4, [0044]-[0064]; Figures 1, 2, 3a-b → Monsarrat teaches the generation of one or more templates intended to gather information from a Web Site. Target Web Sites are first identified that are relevant to the desired data domain. Web pages are annotated to identify the locations and data types of the desired data or information to be extracted. The annotated Web pages (Data Schemas 113; see Figs. 1-2, p. 2, [0045]-[0047]) are then stored with all their “structural information”; extracted data is stored in a Database 104 (see Figs. 1-2, p. 2, [0046]; p. 4, [0067]; p. 5, [0094]; p. 6, [0104])); and
Note: it is unclear what constitutes a knowledge graph in this instance. Is it associated with the components of the template, or does it describe the data extracted by the template? Please clarify.
		
building a knowledge graph for the same domain using the structural information (at least pp. 2-4, [0044]-[0064]; Figures 1, 2, 3a-b → Monsarrat teaches the generation of one or more templates intended to gather information from a Web Site. Target Web Sites are first identified that are relevant to the desired data domain. Web pages are annotated to identify the locations and data types of the desired data or information to be extracted. The annotated Web pages (Data Schemas 113; see Figs. 1-2, p. 2, [0045]-[0047]) are then stored with all their “structural information”; extracted data is stored in a Database 104 (see Figs. 1-2, p. 2, [0046]; p. 4, [0067]; p. 5, [0094]; p. 6, [0104])).

In regard to dependent claim 17, Monsarrat teaches:
the first annotated web document annotates a structure for a web document (at least pp. 2-4, [0044]-[0064]; Figures 1, 2, 3a-b → Monsarrat teaches that The Copy of Web Page With Data 300 is displayed in a Web browser on which is running a Java applet. As shown in Fig. 3a, Set Up Expert 100 uses the mouse to highlight items on the page. As shown in Fig. 3a, for example, the entire Activity 314 is highlighted with a dashed rectangle (i.e. visual indicia)).

In regard to dependent claim 19, Monsarrat fails to explicitly teach:
Note: as claimed, first/second templates are interpreted to be from the same or different annotated web document. Further, the annotators may be the same or different annotators.
Note: the Specification, at pages 9-10, [0052] states that “The conflation component 222 can combine the templates or reduce the templates into a single or into a smaller set of template information. Conflation can include determining what types of information may be within the templates, which templates provide the best information based on template ranking or other information, or other types of analysis. The output of the conflation component 222 can be a single or a reduced set of information about a domain and the web content within that domain.”

generating the combined template comprises generating a first template from the first annotated web document and a second template from the second annotated web document.
However, Orelind teaches:
generating the combined template comprises generating a first template from the first annotated web document and a second template from the second annotated web document (at least Abstract; pp. 3-4, [0030], [0032], [0037]-[0038]; pp. 4-5, [0047]-[0051]; Figure 5 → Orelind teaches a method of merging (combining) pairs of sufficiently similar extraction rules using clustering techniques and replacing the pairs of extraction rules with a merged (combined) extraction rule).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Orelind with those of Monsarrat as both inventions are related to the generation of data or information extraction templates/rules for use with electronic documents. Adding the teaching of Orelind provides Monsarrat with a way to improve the efficiency and effectiveness of data extraction of templates/rules created or edited by multiple contributors by merging or combining them based on how well they work at extracting the desired data.
Claims 8-9, 20 and 21-23 are rejected under 35 U.S.C. 103 as being unpatentable over Monsarrat in view of Orelind, and in further view of Madhani et al. (hereinafter Madhani, U.S. Patent Application Publication No. 2015/0127659 A1, filed 11/01/2013, published 05/07/2015).
In regard to dependent claim 8, Monsarrat and Orelind fail to explicitly teach:
generating the template further comprises ranking the first template over the second template.
However, Madhani teaches:
generating the template further comprises ranking the first template over the second template (at least Abstract; p. 1, [0008]-[0012]; p. 4, [0042]-[0049], [0051]; p. 5, [0054], [0068]; pp. 6-7, [0082]-[0089]; Figures 1-3 → Madhani teaches ranking of data extraction templates by assigning a ranking score and processes by which a higher ranked template is selected over one of lower ranking).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Madhani with those of Monsarrat and Orelind as all three inventions are related to the generation of data or information extraction templates/rules for use with electronic documents. Adding the teaching of Madhani provides Monsarrat and Orelind with a way to improve the efficiency and effectiveness of data extraction by templates/rules created or edited by multiple contributors by selection of the best or highest ranked template(s).






In regard to dependent claim 9, Monsarrat fails to explicitly teach:
Note: the Specification, at pages 9-10, [0052] states that “The conflation component 222 can combine the templates or reduce the templates into a single or into a smaller set of template information. Conflation can include determining what types of information may be within the templates, which templates provide the best information based on template ranking or other information, or other types of analysis. The output of the conflation component 222 can be a single or a reduced set of information about a domain and the web content within that domain.”

generating the combined template further comprises conflating the first template with the second template into a conflated template.
However, Orelind teaches:
generating the combined template further comprises conflating the first template with the second template into a conflated template (at least Abstract; pp. 3-4, [0030], [0032], [0037]-[0038]; pp. 4-5, [0047]-[0051]; Figure 5 → Orelind teaches a method of merging (combining) pairs of sufficiently similar extraction rules using clustering techniques and replacing the pairs of extraction rules with a merged (combined) extraction rule).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Orelind with those of Monsarrat as both inventions are related to the generation of data or information extraction templates/rules for use with electronic documents. Adding the teaching of Orelind provides Monsarrat with a way to improve the efficiency and effectiveness of data extraction of templates/rules created or edited by multiple contributors by merging or combining them based on how well they work at extracting the desired data.




In regard to dependent claim 20, Monsarrat and Orelind fail to explicitly teach:
Note: the Specification, at pages 9-10, [0052] states that “The conflation component 222 can combine the templates or reduce the templates into a single or into a smaller set of template information. Conflation can include determining what types of information may be within the templates, which templates provide the best information based on template ranking or other information, or other types of analysis. The output of the conflation component 222 can be a single or a reduced set of information about a domain and the web content within that domain.”

generating the combined template further comprises: ranking the first template over the second template; and conflating the first template with the second template.
However, Madhani teaches:
generating the combined template further comprises: ranking the first template over the second template (at least Abstract; p. 1, [0008]-[0012]; p. 4, [0042]-[0049], [0051]; p. 5, [0054], [0068]; pp. 6-7, [0082]-[0089]; Figures 1-3 → Madhani teaches ranking of data extraction templates by assigning a ranking score and processes by which a higher ranked template is selected over one of lower ranking); and
conflating the first template with the second template (at least Abstract; p. 1, [0008]-[0012]; pp. 13-15, [0150]-[0171]; Figures 1-3 → Madhani teaches generating multiple data extraction templates submitted by multiple contributing individuals (see p. 3, [0031]) and assigning a score to each based on their ability to properly extract data from a source document. Specifically, Madhani speaks of aggregating (combining) ranked data extraction template data associated with two or more data extraction templates associated with a specific source document type).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Madhani with those of Monsarrat and Orelind as all three inventions are related to the generation of data or information extraction templates/rules for use with electronic documents. Adding the teaching of Madhani provides Monsarrat and Orelind with a way to improve the efficiency and effectiveness of data extraction by templates/rules created or edited by multiple contributors by selection of the best or highest ranked template(s).

In regard to dependent claim 21, Monsarrat and Orelind fail to explicitly teach:
generating the combined template comprises: generating a first set of rules for the first annotated web document; generating a second set of rules associated with the second annotated web document; and processing the first set of rules and the second set of rules to generate a combined set of rules for the combined template.
However, Madhani teaches:
generating the combined template comprises: generating a first set of rules for the first annotated web document; generating a second set of rules associated with the second annotated web document; and processing the first set of rules and the second set of rules to generate a combined set of rules for the combined template (at least Abstract; p. 1, [0008]-[0012]; pp. 13-15, [0150]-[0171]; Figures 1-3 → Madhani teaches generating multiple data extraction templates submitted by multiple contributing individuals (see p. 3, [0031]) and assigning a score to each based on their ability to properly extract data from a source document. Specifically, Madhani speaks of aggregating (combining) ranked data extraction template data associated with two or more data extraction templates associated with a specific source document type).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Madhani with those of Monsarrat and Orelind as all three inventions are related to the generation of data or information extraction templates/rules for use with electronic documents. Adding the teaching of Madhani provides Monsarrat and Orelind with a way to improve the efficiency and effectiveness of data extraction by templates/rules created or edited by multiple contributors by selection of the best or highest ranked template(s).

In regard to dependent claim 22, Monsarrat fails to explicitly teach:
Note: it is unclear where these limitations are described in the Specification. The phrase “common rule” is not found in the Specification nor is the term “common”. Are these limitations related to the validator component 208 (see Specification at p. 8, [0047])? It would appear that the result is to keep the rules that are found in each of the sets of rules?

processing the first set of rules and the second set of rules comprises: determining a common rule is in the first set of rules and the second set of rules; and including the common rule in the combined set of rules.
However, Orelind teaches:
processing the first set of rules and the second set of rules comprises: determining a common rule is in the first set of rules and the second set of rules; and including the common rule in the combined set of rules (at least Abstract; pp. 3-4, [0030], [0032], [0037]-[0038]; pp. 4-5, [0047]-[0051]; Figure 5 → Orelind teaches a method of merging (combining) pairs of sufficiently similar extraction rules using clustering techniques and replacing the pairs of extraction rules with a merged (combined) extraction rule).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Orelind with those of Monsarrat as both inventions are related to the generation of data or information extraction templates/rules for use with electronic documents. Adding the teaching of Orelind provides Monsarrat with a way to improve the efficiency and effectiveness of data extraction of templates/rules created or edited by multiple contributors by merging or combining them based on how well they work at extracting the desired data.

In regard to dependent claim 23, Monsarrat fails to explicitly teach:
Note: it is unclear where these limitations are described in the Specification. The phrase “common rule” is not found in the Specification nor is the term “common”. Are these limitations related to the validator component 208 (see Specification at p. 8, [0047])?

processing the first set of rules and the second set of rules comprises: identifying a conflict between a conflicting rule of the first set of rules that conflicts with a conflicting rule of the second set of rules; and determining between the conflicting rule of the first set of rules and the conflicting rule of the second set of rules to resolve the conflict; and including the determined rule in the combined set of rules.
However, Orelind teaches:
processing the first set of rules and the second set of rules comprises: identifying a conflict between a conflicting rule of the first set of rules that conflicts with a conflicting rule of the second set of rules; and determining between the conflicting rule of the first set of rules and the conflicting rule of the second set of rules to resolve the conflict; and including the determined rule in the combined set of rules (at least Abstract; pp. 3-4, [0030], [0032], [0037]-[0038]; pp. 4-5, [0047]-[0051]; Figure 5 → Orelind teaches a method of merging (combining) pairs of sufficiently similar extraction rules using clustering techniques and replacing the pairs of extraction rules with a merged (combined) extraction rule).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Orelind with those of Monsarrat as both inventions are related to the generation of data or information extraction templates/rules for use with electronic documents. Adding the teaching of Orelind provides Monsarrat with a way to improve the efficiency and effectiveness of data extraction of templates/rules created or edited by multiple contributors by merging or combining them based on how well they work at extracting the desired data.

Response to Arguments
Regarding the previous rejection of independent claim 1, Applicant has amended claim 1 as indicated below:

1.	A method comprising:
receiving, at a template service, a first annotated web document associated with a web document from a first client;
receiving, at the template service, a second annotated web document associated with the web document from a second client,
wherein the first annotated web document and the second annotated web document are associated with similar content from a same domain;
generating a combined template indicating a structure of the web document based on the first annotated web document and the second annotated web document; and
storing the combined template in a template data store.

Applicant states that: Non-limiting examples of the claims of this application relate to the crowdsourcing of the extraction of structural data from web pages. In some cases, multiple users can annotate web pages with a web application (e.g., a plug-in) to both identify the different types of, and indicate the location of, [structural] data on the web page.

The Examiner notes that the crowdsourcing aspect of this invention appears to be limited to the annotation of web pages with a web application (e.g., a plug-in) to both identify the different types of, and indicate the location of, data on the web page.
Applicant further states that: Templates and/or rules can be generated for the web page based on these annotations, where an extraction service can automatically extract structural data, and in some cases, the underlying metadata, from the webpage locations corresponding with these annotations. In the case where multiple users annotate similar web pages, differences may exist between the users’ annotations (e.g., different templates are generated for multiple, similar, web pages).

Applicant further states that: In this case, the multiple templates can be conflated into a “best” template (e.g., a comprehensive template) based upon a ranking of the generated templates.

The Examiner notes that the decision as to what templates are conflated appears to be based on how the system “ranks” them. Thus, for example, does a conflated template comprise only the top ranked individual templates?

Further, the Specification does not appear to describe how, or by what method, templates and/or rules are conflated or combined or merged or aggregated, etc.?


Applicant further states: This comprehensive template is shared with the users and is used as a base template for future annotations of webpages. Additionally, a knowledge graph can be built for the web page based on the many iterations of annotations and ranking of the many different templates, and such information is used to develop an intelligent model that can automatically extract structural data from new webpages.









Regarding the previous rejection of independent claim 11, Applicant argues that both the prior art of Monsarrat and the prior art of Chang fail to teach or suggest at least:

extracting metadata from the web document based on the annotated element in the web document; and
sending the annotated web document and the extracted metadata from the web document to an extraction service,

	The Examiner respectfully disagrees that Monsarrat fails to teach these limitations.
In reference to Figure 3 of Monsarrat, Monsarrat teaches that a user creates a template (to be used for later web scraping) where content of interest is annotated by a user (“annotation” by selecting/highlighting desired content and/or “annotation” by the addition of content) (see pp. 3-4, [0051]-[0063]; Figure 3). The content that is added is largely metadata about the selected/highlighted desired content.
The template generated comprises (1) the original Web page’s HTML in full; (2) Annotations showing: (A) the location of the element on the web page that contains the desired information; (B) the data type of the information; and (C) the relation between the information and other data on the Web page or elsewhere (see p. 3, [0055]-[0060]). An example of the template generated is illustrated in Fig. 4. The “annotations” made by the Set Up Expert 100 are then made into a template from which metadata is extracted.
	The created template(s) are then “processed” (e.g. by a web extracting service) to gather or collect or extract the previously annotated (annotation by highlighting content and/or by the addition of content (see pp. 3-4, [0051]-[0063]; Figure 3)) data. Once gathered or collected or extracted, the data is post-processed to connect data together, resolve conflicts, and report possible errors. The Set Up Expert 100 corrects any remaining errors and resolves any remaining conflicts.

Regarding the previous rejection of independent claim 1 (and similarly independent claim 16), it was previously argued that the combination of Monsarrat and Madhani fails to disclose, inter alia,

based on the first annotated web document and the second annotated web document, generating a template indicating a structure of the web document.

The Examiner notes that this limitation has been amended:

generating a combined template indicating a structure of the web document based on the first annotated web document and the second annotated web document; and

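As noted above, the Specification does not appear to describe how, or by what method, templates and/or rules are conflated or combined or merged or aggregated, etc.?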
For example, does a second template get added to a first template to create a new template containing both the first and second templates? Does a second template get added to a first template to refine or narrow the first template? The Examiner has found in the prior art that the creation of extraction templates/rules/wrappers often follows a top-down or bottom-up approach. The top-down approach starts with a very general extraction template/rule/wrapper and ends with a very specific template/rule/wrapper, whereas a bottom-up approach starts with a very specific template/rule/wrapper and ends with a very general extraction template/rule/wrapper (general in the sense of being comprehensive such that it handles variations).

Applicant states that: However, Madhani relates to the ranking of data extraction templates so that the highest ranked template is used in the extraction of data from a webpage. See Madhani, Abstract.

At the portions of Madhani cited in the Office Action, Madhani describes the aggregation of the template data where “ranked data extraction template data associated with two or more data extraction templates…are then aggregated and stored for use with new source documents of the specific source document type.” Madhani, para. [0034].
Madhani describes that the aggregation of the data is meant to serve as the basis for selecting the “highest ranking” data template for future use, and not for the generation of a new, combined, template. See Madhani, para. [0162].

Said another way, Madhani describes a situation where template data from two or more templates are ranked, aggregated and stored, then the aggregated template data is used to select which of the two templates is to be used on the new document.

In fact, the aggregation and storage of template data from two (or more) templates is not the same as the generation of a single comprehensive template based on two different annotated web documents.

First, the Examiner would agree that Monsarrat does not appear to teach any sort of “combination” of templates, but rather uses separate templates for each web page from which content is to be extracted.
Second, the Specification at [0029] states “The templates/rules conflation module can conflate template information or even combine template information into more comprehensive templates,” suggesting that there is a difference between conflating and combining? The Specification at [0052] states “The conflation component 222 can combine the templates or reduce the templates into a single or into a smaller set of template information. Conflation can include determining what types of information may be within the templates, which templates provide the best information based on template ranking or other information, or other types of analysis. The output of the conflation component 222 can be a single or a reduced set of information about a domain and the web content within that domain….”
The Examiner has conducted a further search and has identified the Orelind et al. (US 2010/0107055 A1) reference.
Orelind teaches an extraction-rule generation and training system that uses information obtained from multiple markup language documents of similar structure to generate an extraction rule for extracting datapoints (see Orelind at p. 3, [0030] for definition) from markup language documents (Abstract). In some embodiments (see pp. 4-5, [0047]-[0051]; Fig. 5), pairs of rules (for extracting the same datapoint) that are sufficiently similar are merged together to form a merged rule replacing that pair of rules in the set of rules.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to James H Blackwell whose telephone number is (571)272-4089.  The examiner can normally be reached on M-F 04:30AM - 12:30PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an 
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Cesar Paula, can be reached on 571-272-4128.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/James H. Blackwell/
03/25/2021

/CESAR B PAULA/Supervisory Patent Examiner, Art Unit 2177