DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application is being examined under the pre-AIA first to invent provisions. 

The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.

Response to Amendment
This communication is responsive to the applicant’s amendment dated 06/17/2021.  The applicant(s) amended claims 1, 9, and 17.

Response to Arguments
Applicant's arguments with respect to claims 1, 9, and 17 have been considered but are moot in view of the new ground(s) of rejection because the arguments pertain to the newly amended limitations.

Claim Rejections - 35 USC § 103
Claims 1-2, 9-10, and 17-18 are rejected under pre-AIA 35 U.S.C. 103(a) as being unpatentable over Bailey et al. (US 20110167054 A1) in view of Rahurkar et al. (“A Conference Classification and Meta Extraction System”, 2006).


Regarding claims 1, 9, and 17, Bailey teaches:
“a processor; and a computer-readable storage medium having instructions stored which, when executed by the processor, cause the processor to perform instructions comprising” and “A non-transitory computer-readable storage device having instructions stored which, when executed by a computing device, cause the computing device to perform operations comprising” (par. 0011; ‘Accordingly, one embodiment of the present invention is directed to one or more computer storage media storing computer-useable instructions that, when used by one or more computing devices, cause the one or more computing devices to perform a method.’);
“establishing a web-page analysis module (URL watch list generation component) configured to identify a likelihood of web pages having information relevant for a language model” (par. 0030; ‘The URL watch list generation component 306 identifies URLs that are relevant to the given subject area. In some embodiments, URLs relevant to the given subject area are determined based on search queries.’; par. 0031; ‘As noted above, the search engine session data may include search engine log files, query-click graphs, query histograms, search engine toolbar data, and web browser data. Similar techniques for determining relatedness as discussed above for identifying related search queries by the query expansion component 304 may be employed by the URL watch list generation component 306 to identify relevant URLs.’; par. 0037; ‘In some embodiments, the classifier for the classifier component 310 is created by crawling URLs in the URL watch list to obtain content and using existing technologies to create a language model for the particular subject area of interest based on the content.’);
“determining whether or not to crawl a respective web page based at least in part on a perplexity value associated with a configuration of the respective web page relative to a vocabulary (subject area) on the respective web page to yield a crawling schedule” (par. 0030; ‘The URL watch list generation component 306 identifies URLs that are relevant to the given subject area. In some embodiments, URLs relevant to the given subject area are determined based on search queries.’; par. 0033; ‘In some cases, however, individual web pages may be determined by the URL watch list component 306 to be relevant, and the URLs corresponding to those individual web pages are added to the URL watch list.’);
“crawling, via a processor, the web pages based on the crawling schedule, to yield new vocabulary words (new content)” (par. 0035; ‘The crawler 308 is operable to crawl the web pages and websites identified by the URL watch list generation component 304 to identify new content.’); and
“generating a new language model based at least in part on the new vocabulary words” (par. 0037; ‘In some embodiments, the classifier for the classifier component 310 is created by crawling URLs in the URL watch list to obtain content and using existing technologies to create a language model for the particular subject area of interest based on the content.’).
Bailey teaches identifying relevant URLs (par. 0030). However, Bailey does not expressly teach a perplexity value, as recited in “determining whether or not to crawl a respective web page based at least in part on a perplexity value associated with a configuration of the respective web page relative to a vocabulary on the respective web page to yield a crawling schedule.”
Perplexity is well known in the art as a measure of predictability, i.e., of how well a probability model such as a language model predicts a sample of text.
Rahurkar teaches using perplexity to compare web-page content for relevancy (pg. 4, 4.2; ‘Perplexity was the distance measure used to compare the web-page content to determine if a given page is conference related page or not. Perplexity is information theoretic measure which roughly speaking gives the distance between two distributions.’).
Therefore, it would have been obvious to one of ordinary skill in the art at the time the invention was made to modify Bailey’s method of identifying relevant URLs by incorporating the well-known perplexity measure, as taught by Rahurkar, in order to determine whether or not to crawl a respective web page based at least in part on a perplexity value associated with a configuration of the respective web page relative to a vocabulary on the respective web page to yield a crawling schedule. The combination would provide a method of identifying conference-related pages and extracting entities (Rahurkar: pg. 6).
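For illustration only, and without characterizing the teachings of Bailey or Rahurkar or the scope of the claims, the perplexity comparison discussed above may be sketched as follows. The function names (train_unigram, perplexity, crawl_schedule), the add-one smoothed unigram model, the threshold, and the lowest-perplexity-first ordering are all hypothetical assumptions; neither reference is represented as using this code.

```python
import math
from collections import Counter


def train_unigram(tokens, vocab):
    """Add-one (Laplace) smoothed unigram model over a fixed vocabulary.

    One extra slot is reserved for out-of-vocabulary tokens ("<unk>").
    """
    counts = Counter(t if t in vocab else "<unk>" for t in tokens)
    total = sum(counts.values())
    size = len(vocab) + 1  # +1 for the <unk> slot
    return lambda t: (counts[t if t in vocab else "<unk>"] + 1) / (total + size)


def perplexity(model, tokens):
    """PP = exp(-(1/N) * sum(log p(w_i))) over a non-empty token list."""
    log_sum = sum(math.log(model(t)) for t in tokens)
    return math.exp(-log_sum / len(tokens))


def crawl_schedule(model, pages, threshold):
    """Hypothetical policy: crawl only pages whose perplexity against the
    model is <= threshold, ordered from lowest (closest) to highest."""
    scored = [(perplexity(model, toks), url) for url, toks in pages.items() if toks]
    return [url for pp, url in sorted(scored) if pp <= threshold]
```

With the vocabulary {book, flight, hotel, call} and training text "book flight book hotel call", a page containing ["book", "book"] scores about 3.33, while a page of unseen words such as ["zebra", "quasar"] scores 10 and falls outside a threshold of 5.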

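Regarding claims 2 (dep. on claim 1), 10 (dep. on claim 9), and 18 (dep. on claim 17), the combination of Bailey in view of Rahurkar further teaches:
“wherein the new language model is not based on a previous language model” (Bailey: par. 0037; ‘In some embodiments, the classifier for the classifier component 310 is created by crawling URLs in the URL watch list to obtain content and using existing technologies to create a language model for the particular subject area of interest based on the content.’).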

Claim Rejections - 35 USC § 103
Claims 3-6, 11-14, and 19-20 are rejected under pre-AIA  35 U.S.C. 103(a) as being unpatentable over Bailey in view of Rahurkar, further in view of Wang et al. (US 20110172988 A1).

Regarding claims 3 (dep. on claim 1), 11 (dep. on claim 9), and 19 (dep. on claim 17), Bailey teaches a new language model (par. 0037).
However, Bailey and Rahurkar do not expressly teach:
“wherein the new language model comprises an update of a previous language model.”
Wang teaches:
“wherein the new language model comprises an update of a previous language model” (Wang: par. 0019; ‘Each time the web crawler returns new information about the web, that information may be blended with the existing SLM to produce a new SLM--i.e., if the current SLM is the i.sup.th model, then the new data is blended with the i.sup.th model to produce the (i+1).sup.th model’).
Therefore, it would have been obvious to one of ordinary skill in the art at the time the invention was made to modify Bailey’s (in view of Rahurkar) classifier component and language model generation by incorporating Wang’s adaptive 

Regarding claims 4 (dep. on claim 1), 12 (dep. on claim 9), and 20 (dep. on claim 17), the combination of Bailey in view of Rahurkar and Wang further teaches:
“wherein the information relevant for the language model relates to a vocabulary gap in the language model” (Wang: par. 0019; ‘Each time the web crawler returns new information about the web, that information may be blended with the existing SLM to produce a new SLM--i.e., if the current SLM is the i.sup.th model, then the new data is blended with the i.sup.th model to produce the (i+1).sup.th model.’ Such new information may include new vocabulary words absent from the existing model, i.e., words in a vocabulary gap. In addition, this feature is well known in the art, as evidenced by Bargeron et al. (US 20090083257 A1), par. 0060; ‘The date-stamped or date/time-stamped collections of normalized text documents are then stored in the categorized information storage component (442 in FIG. 4) for use by the language-model builder and the ontology builder. A vocabulary may be computed for, an associated with, each normalized-text-document package produced by the IAC component. Alternatively, vocabularies can be separately prepared and stored for each category.’).

Regarding claims 5 (dep. on claim 1) and 13 (dep. on claim 9), the combination of Bailey in view of Rahurkar and Wang further teaches:


Regarding claims 6 (dep. on claim 1) and 14 (dep. on claim 9), the combination of Bailey in view of Rahurkar and Wang further teaches:
“wherein the new language model is generated by modifying the language model” (Wang: par. 0019; ‘Each time the web crawler returns new information about the web, that information may be blended with the existing SLM to produce a new SLM--i.e., if the current SLM is the i.sup.th model, then the new data is blended with the i.sup.th model to produce the (i+1).sup.th model’).
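For illustration only, the blending of an existing model with newly crawled data described in Wang par. 0019 may be sketched as below. The quoted passage does not state how the blending is performed; linear interpolation of word probabilities with a fixed weight is an assumption, and the function name blend and its parameters are hypothetical.

```python
def blend(current, new, weight):
    """Return the (i+1)-th model as weight * p_i(w) + (1 - weight) * p_new(w).

    `current` (the i-th model) and `new` (estimated from newly crawled data)
    map words to probabilities; a word missing from one model is treated as
    having probability 0 in that model.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    vocab = set(current) | set(new)  # union of both vocabularies
    return {
        w: weight * current.get(w, 0.0) + (1.0 - weight) * new.get(w, 0.0)
        for w in vocab
    }
```

Under this assumption, a word that appears only in the newly crawled data (e.g., "webinar") enters the (i+1)-th model with nonzero probability, which is one way new vocabulary could fill a gap in the i-th model.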

Claims 7 and 15 are rejected under pre-AIA 35 U.S.C. 103(a) as being unpatentable over Bailey in view of Rahurkar, further in view of Fox et al. (US 8533226 B1).

Regarding claims 7 (dep. on claim 1) and 15 (dep. on claim 9), Bailey and Rahurkar do not expressly teach:
“updating a website visitation policy for the crawling once a specified number of pages is crawled.”
Fox teaches:
“updating a website visitation policy for the crawling once a specified number of pages is crawled” (Fox: col. 20, lines 40-47; ‘According to certain embodiments, the 
Therefore, it would have been obvious to one of ordinary skill in the art at the time the invention was made to modify Bailey’s (in view of Rahurkar) URL watch list generation and crawling methods by incorporating Fox’s crawl rate control such that a site owner for a specified website may control the rate at which crawlers or crawl robots crawl the specified website. (Fox: col. 19, lines 63-65)

Claims 8 and 16 are rejected under pre-AIA  35 U.S.C. 103(a) as being unpatentable over Bailey in view of Rahurkar, further in view of Wang et al. (US 20110231394 A1) (“Wang2”).

Regarding claims 8 (dep. on claim 1) and 16 (dep. on claim 9), Bailey and Rahurkar do not expressly teach:
“wherein the new language model is generated from a merging of a set of language models.”
Wang2 teaches:
“wherein the new language model is generated from a merging of a set of language models” (par. 0065; ‘At 504, mixture weights are estimated for combining the individual language models into a mixed language model.’).
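For illustration only, one common way of estimating mixture weights for combining individual language models, as referenced in Wang2 par. 0065, is expectation-maximization (EM) over held-out text. The quoted passage does not specify the estimation method; EM is swapped in here as an assumption, and the function names estimate_mixture_weights and mix are hypothetical.

```python
def estimate_mixture_weights(models, heldout, iterations=50):
    """EM estimate of weights lam_k for p(w) = sum_k lam_k * p_k(w).

    `models` is a list of callables returning p_k(w) > 0 for every held-out
    token (e.g., smoothed models); `heldout` is a non-empty list of tokens.
    """
    if not heldout:
        raise ValueError("held-out data must be non-empty")
    k = len(models)
    weights = [1.0 / k] * k  # start from uniform weights
    for _ in range(iterations):
        expected = [0.0] * k
        for w in heldout:
            parts = [lam * m(w) for lam, m in zip(weights, models)]
            total = sum(parts)
            # E-step: posterior responsibility of each model for token w
            for j in range(k):
                expected[j] += parts[j] / total
        # M-step: each new weight is that model's average responsibility
        weights = [e / len(heldout) for e in expected]
    return weights


def mix(models, weights):
    """The merged (mixed) language model as a single callable."""
    return lambda w: sum(lam * m(w) for lam, m in zip(weights, models))
```

For two models assigning "a" probabilities 0.9 and 0.1, and held-out text "a a a b", the estimate converges to weights of about 0.8125 and 0.1875, at which the merged model assigns "a" probability 0.75, its relative frequency in the held-out text.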


Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MARK VILLENA whose telephone number is (571)270-3191.  The examiner can normally be reached on 10 am - 6 pm EST Monday through Friday.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Richemond Dorvil can be reached on (571) 272-7602.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


MARK . VILLENA
Examiner
Art Unit 2658



/MARK VILLENA/           Examiner, Art Unit 2658