DETAILED ACTION

This Office Action is in response to the Preliminary Amendments filed July 2, 2020. Claims 4 and 7 have been amended. Therefore, Claims 1-7 are pending and have been considered as follows.

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.


Information Disclosure Statement

The information disclosure statement (IDS) was submitted on 6/26/2020. The submission is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.

Claim Objections

Claim 1 is objected to because of the following informalities:
Claim 1 recites “Uniform Resource Locater”; the Examiner suggests “Uniform Resource Locator”.
Appropriate correction is required.

Claim Rejections - 35 USC § 102

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1-2, 4, and 6-7 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Invernizzi et al. (non-patent literature, “EvilSeed: A Guided Approach to Finding Malicious Web Pages”; hereinafter Invernizzi).

As to Claim 1, Invernizzi discloses a collection apparatus that collects a Uniform Resource Locater of a Web page, the collection apparatus comprising: 
a memory (Invernizzi; Fig. 1); and 
a processor coupled to the memory and programmed to execute a process comprising (Invernizzi; Fig. 1):
first generating a search query for a search engine by combining a digital content name that is a name of a digital content and an associated keyword of the digital content ((Invernizzi; Fig. 1; [Pg. 3, Sect. II, B]), where Invernizzi discloses the use of gadgets to generate queries to search engines. The gadgets enable the system to find candidate pages (URLs) that are likely malicious based on the pages contained in the seed. Each gadget extracts certain information from the pages in the seed, such as content that is shared among these pages and links that are related to them. The extracted information is distilled into queries that are sent to search engines, which could utilize words, terms, links, etc.); 
first predicting a degree to which a Web page that leads to user operation is output as a search result when a search is performed by using the generated search query, on the basis of feature information on the search query generated by the first generating ((Invernizzi; [Pg. 4, Sect. II, B, “Gadgets”]), where Invernizzi discloses the gadgets finding candidate pages (URLs) that are likely malicious based on the pages contained in the seed and on the feature information. At [Pg. 2, 1st para.], Invernizzi discloses a prefilter to quickly discard pages that are very likely to be legitimate. Such prefilters examine static properties of URLs, the HTML code, and JavaScript functions to compute a score that indicates the likelihood that a page is malicious. Based on these scores, pages can be ranked (sorted).);
first determining, searching for a Web page by using a search query in a search order that is based on the degree predicted by the first predicting, and determining analysis priority that is priority for analyzing whether a URL of a retrieved Web page is the Web page that leads to user operation on the basis of the degree of the search query and search result information ((Invernizzi; [Pg. 1, right column, 3rd para.]), where Invernizzi discloses that searching for malicious web pages is a three-step process, in which URLs are first collected, then quickly inspected with fast filters, and finally examined in depth using specialized analyzers. At [Pg. 2, 1st para.], Invernizzi discloses a prefilter to quickly discard pages that are very likely to be legitimate. Such prefilters examine static properties of URLs, the HTML code, and JavaScript functions to compute a score that indicates the likelihood that a page is malicious. Based on these scores, pages can be ranked (sorted).); and 
outputting the URL of the Web page retrieved by the first determining and the analysis priority of the URL in an associated manner to an analysis apparatus ((Invernizzi; [Pg. 1, right column, 4th para.]), where Invernizzi discloses that, given the set of web pages discovered by the crawler, the purpose of the second step is to prioritize these URLs for subsequent, detailed analysis.).
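
For illustration only, the four steps recited in Claim 1 (generating queries, predicting a degree, searching in degree order while assigning an analysis priority, and outputting URL and priority pairs) can be sketched as below. This sketch is not taken from the application or from Invernizzi. The names generate_queries, predict_degree, collect, and stub_search are hypothetical, the search engine is replaced by an in-memory stub, the degree predictor is a toy keyword-weight heuristic, and dividing the degree by the result rank is only one assumed way of combining the degree with search result information.

```python
from typing import Callable, Dict, List, Tuple


def generate_queries(content_names: List[str], keywords: List[str]) -> List[str]:
    """Combine each digital content name with each associated keyword."""
    return [f"{name} {kw}" for name in content_names for kw in keywords]


def predict_degree(query: str, risky_terms: Dict[str, float]) -> float:
    """Toy predictor (assumption): summed weights of risky terms in the query, capped at 1.0."""
    tokens = query.split()
    score = sum((w for term, w in risky_terms.items() if term in tokens), 0.0)
    return min(score, 1.0)


def collect(
    content_names: List[str],
    keywords: List[str],
    risky_terms: Dict[str, float],
    search: Callable[[str], List[str]],
) -> List[Tuple[str, float]]:
    """Return (URL, analysis priority) pairs, highest priority first."""
    queries = generate_queries(content_names, keywords)
    degrees = {q: predict_degree(q, risky_terms) for q in queries}
    # Issue searches in descending order of predicted degree.
    ordered = sorted(queries, key=lambda q: degrees[q], reverse=True)
    priorities: Dict[str, float] = {}
    for q in ordered:
        for rank, url in enumerate(search(q), start=1):
            # Assumed combination: query degree discounted by result rank.
            priority = degrees[q] / rank
            priorities[url] = max(priority, priorities.get(url, 0.0))
    return sorted(priorities.items(), key=lambda item: item[1], reverse=True)


# Hypothetical stand-in for a search engine.
FAKE_INDEX = {
    "movie-x free": ["http://a.example/watch", "http://b.example/stream"],
    "movie-x review": ["http://c.example/review"],
}


def stub_search(query: str) -> List[str]:
    return FAKE_INDEX.get(query, [])


results = collect(["movie-x"], ["free", "review"], {"free": 0.8}, stub_search)
for url, priority in results:
    print(url, priority)
```

The returned list is what would be handed to the analysis apparatus: each URL travels together with its priority, so the downstream analyzer can process the highest-priority URLs first.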

As to Claim 2, Invernizzi discloses the collection apparatus according to claim 1, wherein the first generating includes 
receiving input of a category of digital data; first collecting, as a first keyword, an arbitrary digital content name that belongs to the category; second collecting, as a second keyword, an associated keyword that is associated with the first keyword when the first keyword is included in a search query for a search engine; and second generating the search query by combining the first keyword and the second keyword ((Invernizzi; Fig. 1; [Pg. 3, Sect. II, B]), where Invernizzi discloses the use of gadgets to generate queries to search engines. The gadgets enable the system to find candidate pages (URLs) that are likely malicious based on the pages contained in the seed. At [Pg. 4, Sect. II, B, “Gadgets”], each gadget extracts certain information from the pages in the seed, such as content that is shared among these pages and links that are related to them. The extracted information is distilled into queries that are sent to search engines, which could utilize words, terms, links, etc. [Sect. 3; Gadgets]).
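
As a rough illustration of the query generation recited in Claim 2, the sketch below takes a category, collects content names in that category as first keywords, collects associated keywords for each name from query suggestions as second keywords, and combines the two. It is an assumption-laden sketch, not the applicant's or Invernizzi's implementation: build_queries, list_names, and suggest are hypothetical names, and both the catalog and the suggestion source are in-memory stubs.

```python
from typing import Callable, Dict, List


def build_queries(
    category: str,
    list_names: Callable[[str], List[str]],
    suggest: Callable[[str], List[str]],
) -> List[str]:
    """Combine each content name in the category with its associated keywords."""
    queries: List[str] = []
    for name in list_names(category):      # first keyword: a content name
        for suggestion in suggest(name):   # suggested queries containing the name
            # Second keyword: whatever accompanies the name in the suggestion.
            associated = " ".join(suggestion.replace(name, " ").split())
            if associated:
                queries.append(f"{name} {associated}")
    return queries


# Hypothetical stand-ins for a content catalog and a query-suggestion service.
CATALOG: Dict[str, List[str]] = {"movies": ["movie-x"]}
SUGGESTIONS: Dict[str, List[str]] = {
    "movie-x": ["movie-x free stream", "watch movie-x online", "movie-x"],
}

queries = build_queries(
    "movies",
    lambda category: CATALOG.get(category, []),
    lambda name: SUGGESTIONS.get(name, []),
)
print(queries)
```

A suggestion that consists of the content name alone yields no associated keyword and is skipped.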

As to Claim 4, Invernizzi discloses the collection apparatus according to claim 1, wherein the first predicting includes 
constructing, cause a prediction model to learn the feature information and the degree of a known search query by which the Web page that leads to user operation is included in a search result and a known search query by which the Web page that leads to user operation is not included in a search result; and second predicting the degree of the search query by using the prediction model, on the basis of the feature information on the search query generated by the first generating ((Invernizzi; [Pg. 4, Sect. II, B, “Gadgets”]), where Invernizzi discloses the gadgets finding candidate pages (URLs) that are likely malicious based on the pages contained in the seed and on the feature information. At [Pg. 2, 1st para.], Invernizzi discloses a prefilter to quickly discard pages that are very likely to be legitimate. Such prefilters examine static properties of URLs, the HTML code, and JavaScript functions to compute a score that indicates the likelihood that a page is malicious. Based on these scores, pages can be ranked (sorted).).
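
To make the limitation of Claim 4 concrete, the sketch below trains a small model on known search queries labeled by whether a target Web page appeared in their results, then predicts a degree for a new query from its tokens. Neither the claim nor the cited passages of Invernizzi specify a model type; the naive Bayes classifier with add-one smoothing used here, the class name QueryDegreeModel, and the training data are all assumptions chosen for brevity.

```python
import math
from collections import Counter
from typing import Iterable, Tuple


class QueryDegreeModel:
    """Toy naive Bayes model over query tokens (an assumed stand-in for the prediction model)."""

    def fit(self, labeled_queries: Iterable[Tuple[str, bool]]) -> "QueryDegreeModel":
        self.pos_tokens: Counter = Counter()
        self.neg_tokens: Counter = Counter()
        self.n_pos = 0
        self.n_neg = 0
        for query, hit in labeled_queries:
            tokens = query.lower().split()
            if hit:  # target page was included in the search result
                self.pos_tokens.update(tokens)
                self.n_pos += 1
            else:    # target page was not included in the search result
                self.neg_tokens.update(tokens)
                self.n_neg += 1
        self.vocab = set(self.pos_tokens) | set(self.neg_tokens)
        return self

    def predict_degree(self, query: str) -> float:
        """Estimate the probability that the query surfaces a target page."""
        v = len(self.vocab)
        pos_total = sum(self.pos_tokens.values())
        neg_total = sum(self.neg_tokens.values())
        # Smoothed log prior odds of a positive query.
        log_odds = math.log((self.n_pos + 1) / (self.n_neg + 1))
        for tok in query.lower().split():
            if tok not in self.vocab:
                continue  # unseen tokens carry no evidence either way
            p_pos = (self.pos_tokens[tok] + 1) / (pos_total + v)
            p_neg = (self.neg_tokens[tok] + 1) / (neg_total + v)
            log_odds += math.log(p_pos / p_neg)
        return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical known queries: True means a target page appeared in the results.
known = [
    ("movie-x free download", True),
    ("movie-x free stream", True),
    ("movie-x review", False),
    ("movie-x cast", False),
]
model = QueryDegreeModel().fit(known)
high = model.predict_degree("song-y free download")
low = model.predict_degree("song-y review")
print(high > low)
```

The feature information here is simply the set of query tokens; a fuller treatment would presumably use richer features, which the sketch does not attempt.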

As to Claim 6, Invernizzi discloses a collection method implemented by a collection apparatus that collects a Uniform Resource Locator (URL) of a Web page, the collection method comprising: 
generating a search query for a search engine by combining a digital content name that is a name of a digital content and an associated keyword of the digital content ((Invernizzi; Fig. 1; [Pg. 3, Sect. II, B]), where Invernizzi discloses the use of gadgets to generate queries to search engines. The gadgets enable the system to find candidate pages (URLs) that are likely malicious based on the pages contained in the seed. Each gadget extracts certain information from the pages in the seed, such as content that is shared among these pages and links that are related to them. The extracted information is distilled into queries that are sent to search engines, which could utilize words, terms, links, etc.);
predicting a degree to which a Web page that leads to user operation is output as a search result when a search is performed by using the generated search query, on the basis of feature information on the generated search query ((Invernizzi; [Pg. 4, Sect. II, B, “Gadgets”]), where Invernizzi discloses the gadgets finding candidate pages (URLs) that are likely malicious based on the pages contained in the seed and on the feature information. At [Pg. 2, 1st para.], Invernizzi discloses a prefilter to quickly discard pages that are very likely to be legitimate. Such prefilters examine static properties of URLs, the HTML code, and JavaScript functions to compute a score that indicates the likelihood that a page is malicious. Based on these scores, pages can be ranked (sorted).); 
searching for a Web page by using a search query in a search order that is based on the predicted degree, and determining analysis priority that is priority for analyzing whether a URL of a retrieved Web page is the Web page that leads to user operation on the basis of the degree of the search query and search result information ((Invernizzi; [Pg. 1, right column, 3rd para.]), where Invernizzi discloses that searching for malicious web pages is a three-step process, in which URLs are first collected, then quickly inspected with fast filters, and finally examined in depth using specialized analyzers. At [Pg. 2, 1st para.], Invernizzi discloses a prefilter to quickly discard pages that are very likely to be legitimate. Such prefilters examine static properties of URLs, the HTML code, and JavaScript functions to compute a score that indicates the likelihood that a page is malicious. Based on these scores, pages can be ranked (sorted).); and 
outputting the URL of the Web page retrieved at the determining the analysis priority of the URL, in an associated manner to an analysis apparatus ((Invernizzi; [Pg. 1, right column, 4th para.]), where Invernizzi discloses that, given the set of web pages discovered by the crawler, the purpose of the second step is to prioritize these URLs for subsequent, detailed analysis.).

As to Claim 7, Invernizzi discloses a non-transitory computer-readable recording medium having stored therein a collection program for causing a computer to execute a process comprising: 
generating a search query for a search engine by combining a digital content name that is a name of a digital content and an associated keyword of the digital content ((Invernizzi; Fig. 1; [Pg. 3, Sect. II, B]), where Invernizzi discloses the use of gadgets to generate queries to search engines. The gadgets enable the system to find candidate pages (URLs) that are likely malicious based on the pages contained in the seed. Each gadget extracts certain information from the pages in the seed, such as content that is shared among these pages and links that are related to them. The extracted information is distilled into queries that are sent to search engines, which could utilize words, terms, links, etc.); 
predicting a degree to which a Web page that leads to user operation is output as a search result when a search is performed by using the generated search query, on the basis of feature information on the generated search query ((Invernizzi; [Pg. 4, Sect. II, B, “Gadgets”]), where Invernizzi discloses the gadgets finding candidate pages (URLs) that are likely malicious based on the pages contained in the seed and on the feature information. At [Pg. 2, 1st para.], Invernizzi discloses a prefilter to quickly discard pages that are very likely to be legitimate. Such prefilters examine static properties of URLs, the HTML code, and JavaScript functions to compute a score that indicates the likelihood that a page is malicious. Based on these scores, pages can be ranked (sorted).); 
searching for a Web page by using a search query in a search order that is based on the predicted degree, and determining analysis priority that is priority for analyzing whether a Uniform Resource Locator (URL) of a retrieved Web page is the Web page that leads to user operation on the basis of the degree of the search query and search result information ((Invernizzi; [Pg. 1, right column, 3rd para.]), where Invernizzi discloses that searching for malicious web pages is a three-step process, in which URLs are first collected, then quickly inspected with fast filters, and finally examined in depth using specialized analyzers. At [Pg. 2, 1st para.], Invernizzi discloses a prefilter to quickly discard pages that are very likely to be legitimate. Such prefilters examine static properties of URLs, the HTML code, and JavaScript functions to compute a score that indicates the likelihood that a page is malicious. Based on these scores, pages can be ranked (sorted).); and 
outputting the URL of the Web page retrieved at the determining the analysis priority of the URL, in an associated manner to an analysis apparatus ((Invernizzi; [Pg. 1, right column, 4th para.]), where Invernizzi discloses that, given the set of web pages discovered by the crawler, the purpose of the second step is to prioritize these URLs for subsequent, detailed analysis.).


Allowable Subject Matter

Claims 3 and 5 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. See PTO-892.
Konig et al. (US 2012/0158705 A1) discloses improving local search ranking.
Gibbs et al. (US 2009/0132529 A1) discloses URL autocompletion using ranked results. 
Musuvathi et al. (US 2018/0365580 A1) discloses determining likelihood of a user interaction with a content element.

The examiner also requests, in response to this Office action, support be shown for language added to any original claims on amendment and any new claims. That is, indicate support for newly added claim language by specifically pointing to page(s) and line no(s) in the specification and/or drawing figure(s). This will assist the examiner in prosecuting the application.

When responding to this Office action, Applicant is advised to clearly point out the patentable novelty which he or she thinks the claims present, in view of the state of the art disclosed by the references cited or the objections made. He or she must also show how the amendments avoid such references or objections. See 37 CFR 1.111(c).


Any inquiry concerning this communication or earlier communications from the examiner should be directed to BENJAMIN M THIEU whose telephone number is (571) 270-7475 and fax number is (571) 270-8475. The examiner can normally be reached Monday - Friday: 8:00 AM - 5:00 PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Wing Chan, can be reached at 571-272-7493. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/BENJAMIN M THIEU/
Primary Examiner, Art Unit 2441
5.20.2022