DETAILED ACTION

This is the initial Office action based on the application filed on May 22, 2019. Claims 1-20 are currently pending and have been considered below.


Claim Objections
Claims listed after Claim 12 are objected to because of the following informalities:
The claims listed after Claim 12 are misnumbered. There are multiple instances of Claims 8-12, and the other claims are not presented in any particular order.
Appropriate correction is required.


Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter.  The claims do not fall within at least one of the four categories of patent eligible subject matter because the claimed invention is directed to an abstract idea without significantly more.

As to Claim 1, the claim recites gathering various data (such as rates) and then using the gathered data to calculate a particular outcome, followed by outputting such data. However, calculating the data is considered a Mathematical Concept, which is an abstract idea. 
This judicial exception is not integrated into a practical application. In particular, the claim further recites receiving and outputting limitations. Those limitations are considered insignificant extra-solution activity that does not integrate the judicial exception into a practical application.
As such, the claim is directed to an abstract idea that does not include additional elements that are sufficient to amount to significantly more than the judicial exception. 
Dependent Claims 2-7 do not provide additional elements that are sufficient to amount to significantly more than the judicial exception.

As to Claim 8, the claim recites gathering data and then using that data to calculate a probability of an outcome, followed by outputting such an outcome. However, calculating the data is considered a Mathematical Concept, which is an abstract idea. 
This judicial exception is not integrated into a practical application. In particular, the claim further recites an outputting limitation. That limitation is considered insignificant extra-solution activity that does not integrate the judicial exception into a practical application.
As such, the claim is directed to an abstract idea that does not include additional elements that are sufficient to amount to significantly more than the judicial exception. 
Dependent Claims 9-12 (and misnumbered claims 8-10 after Claim 12) do not provide additional elements that are sufficient to amount to significantly more than the judicial exception.

As to Claim 11, the claim recites gathering various data, and then using the gathered data to calculate a particular outcome (such as a bandwidth), followed by outputting such data. However, calculating the data is considered a Mathematical Concept, which is an abstract idea. 
This judicial exception is not integrated into a practical application. In particular, the claim further recites receiving and outputting limitations. Those limitations are considered insignificant extra-solution activity that does not integrate the judicial exception into a practical application.
As such, the claim is directed to an abstract idea that does not include additional elements that are sufficient to amount to significantly more than the judicial exception. 
Dependent Claims 12, 18 and 13-14 (some of these claims are misnumbered or bear duplicate numbers, as described in the Claim Objections above) do not provide additional elements that are sufficient to amount to significantly more than the judicial exception.





Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-2, 6-11 (and duplicate numbers of other claims listed below) are rejected under 35 U.S.C. 103 as being unpatentable over Brawer et al (US Patent 7,769,742).

Claim 1: Brawer discloses a method for preparing a schedule for a crawler to follow when seeking updated copies of content items in a content set, the method comprising: 
receiving a bandwidth constraint that limits a total amount of crawls the crawler is allowed to make across the content set per unit time (Col 13 ln 54-58; Col 14 ln 39-67). [See at least “limits on the number of documents that the crawler is allowed to download during the time period of the crawl”.]
receiving a set of importance scores that comprises an individual importance score for each content item in the content set (Col 11 ln 7-12). [See at least “importance score” of URLs.]
receiving a set of change rates that comprises an individual change rate for each content item in the content set (Col 6 ln 1-24). [See at least update rate or change frequency of a URL.]
calculating, using a cost function, a crawl rate schedule for the content set that comprises an individual crawl rate for each individual content item in the content set, wherein the set of change rates and the set of importance scores are input to the cost function, wherein the crawl rate schedule is a solution to the cost function that minimizes a sum of costs across all content items and is constrained by the bandwidth constraint, such that a sum of crawl rates for all content items in the content set is equal to the bandwidth constraint (Col 14 ln 39-67, Col 15 ln 1-4). [Scheduling of crawling is calculated using at least importance scores, update rates and an interval for crawling, so the result of the formula is calculated using analogous inputs. Although the formula is not identical to the one recited in the instant limitation, it would have been obvious to one of ordinary skill in the art before the effective filing date to arrive at such a formula. One would have been motivated to do so in order to produce a crawling schedule best optimized for particular data. Furthermore, the exact form of the formula may be regarded as a design choice because the cited art arrives at an analogous result using a differently described formula.]
outputting the crawl rate schedule for use by the crawler (Col 14 ln 39-67, Col 15 ln 1-4).
Claim 2: Brawer discloses the method of Claim 1 above, and Brawer further discloses wherein the bandwidth constraint comprises a crawler constraint and a host constraint, wherein the crawler constraint is defined as a first sum of crawl events the crawler is allowed to make against all content sources per unit of time, and wherein the host constraint is a second sum of crawl events allowed against an individual host content source per unit of time (Col 14 ln 39-67, Col 15 ln 1-4). [Also see at least the above reasoning for why the description of the formula in the instant claim is analogous to Brawer.]
Claim 6: Brawer discloses the method of Claim 1 above, and Brawer further discloses wherein change information for content items in the content set is incomplete (Col 14 ln 39-67, Col 15 ln 1-4). [See at least identifying likelihood of a candidate to crawl.]
Claim 7: Brawer discloses the method of Claim 1 above, and Brawer further discloses wherein the individual crawl rate for each individual content item in the content set is greater than zero (Col 14 ln 39-67, Col 15 ln 1-4).
Claim 8: Brawer discloses a method for preparing a crawl probability vector for a crawler to follow when seeking updated copies of content items in a content set, the method comprising: 
determining a crawl probability vector that comprises an individual crawl probability for each individual content item in the content, wherein the individual crawl probability is used to determine whether to crawl an associated individual content item upon receiving a change notification for the associated individual content item (Col 6 ln 25-38, Col 14 ln 39-67, Col 15 ln 1-4). [A crawler receives notification that at least one URL is updated. Then, a determination is made of whether a URL “is likely to have been updated”. In response to the determination, a crawl schedule is identified. Identifying a likelihood of an updated URL is analogous to a probability, which is then used to determine a crawl schedule. Although the formula is not identical to the one recited in the instant limitation, it would have been obvious to one of ordinary skill in the art before the effective filing date to arrive at such a formula. One would have been motivated to do so in order to produce a crawling schedule best optimized for particular data. Furthermore, the exact form of the formula may be regarded as a design choice because the cited art arrives at an analogous result using a differently described formula.]
outputting a crawl probability vector comprising the individual crawl probability for each individual content item (Col 14 ln 39-67, Col 15 ln 1-4).
Claim 9: Brawer discloses the method of Claim 8 above, and Brawer further discloses:
receiving a bandwidth constraint that limits a total amount of crawls the crawler is allowed to make across the content set per unit of time (Col 13 ln 54-58; Col 14 ln 39-67). [See at least “limits on the number of documents that the crawler is allowed to download during the time period of the crawl”.]
using the bandwidth constraint when said determining the crawl probability vector (Col 14 ln 39-67, Col 15 ln 1-4). [A determination is made of whether a URL “is likely to have been updated”. In response to the determination, a crawl schedule is identified.]
Claim 10: Brawer discloses the method of Claim 9 above, and Brawer further discloses:
receiving a set of importance scores that comprises an individual importance score for each content item in the content set (Col 11 ln 7-12). [See at least “importance score” of URLs.]
receiving a set of change rates that comprises an individual change rate for each content item in the content set (Col 6 ln 1-24). [See at least update rate or change frequency of a URL.]
using the set of importance scores and the set of change rates when said determining the crawl probability vector (Col 14 ln 39-67, Col 15 ln 1-4).
Claim 11: Brawer discloses the method of Claim 9 above, and Brawer further discloses wherein the crawl probability vector is a solution to a cost function that minimizes a sum of costs across all content items and is constrained by the bandwidth constraint, such that a sum of crawl rates for all content items in the content set is equal to the bandwidth constraint (Col 14 ln 39-67, Col 15 ln 1-4). [Scheduling of crawling is calculated using at least importance scores, likelihood of updated content and an interval for crawling, so the result of the formula is calculated using analogous inputs. Although the formula is not identical to the one recited in the instant limitation, it would have been obvious to one of ordinary skill in the art before the effective filing date to arrive at such a formula. One would have been motivated to do so in order to produce a crawling schedule best optimized for particular data. Furthermore, the exact form of the formula may be regarded as a design choice because the cited art arrives at an analogous result using a differently described formula.]
Claim 8’ (second instance of Claim 8): Brawer discloses the method of Claim 9 above, and Brawer further discloses:
wherein the bandwidth constraint comprises a crawler constraint and a host constraint, wherein the crawler constraint is defined as a first sum of crawl events the crawler is allowed to make against all content sources per unit of time, wherein the host constraint is a second sum of crawl events allowed against an individual host content source set per unit of time (Col 14 ln 39-67, Col 15 ln 1-4). [Also see at least the above reasoning for why the description of the formula in the instant claim is analogous to Brawer.]
Claim 10’ (second instance of Claim 10): Brawer discloses the method of Claim 8 above, and Brawer further discloses wherein change information for content items in the content set is complete (Col 14 ln 39-67, Col 15 ln 1-4).
Claim 11’ (second instance of Claim 11): Brawer discloses a computer storage media that, when executed by a computing device, causes the computing device to perform a method of preparing a schedule for a crawler to follow when seeking updated copies of content items in a content set, the method comprising: 
receiving a bandwidth constraint that limits a total amount of crawls the crawler is allowed to make across the content set per unit of time, wherein the content set comprises a first subset of content items with incomplete change information and a second subset of content items with complete change information (Col 13 ln 54-58; Col 14 ln 39-67). [See at least “limits on the number of documents that the crawler is allowed to download during the time period of the crawl”.] 
receiving a set of importance scores that comprises an individual importance score for each content item in the content set (Col 11 ln 7-12). [See at least “importance score” of URLs.]
receiving a set of change rates that comprises an individual change rate for each content item in the content set (Col 6 ln 1-24). [See at least update rate or change frequency of a URL.]
determining an optimized split of bandwidth between the first subset and the second subset to produce a first bandwidth allocation to the first subset and a second bandwidth allocation to the second subset by minimizing a combined cost function that comprises a first cost function for the first subset and a second cost function for the second subset (Col 14 ln 39-67, Col 15 ln 1-4). [Scheduling of crawling is calculated using at least importance scores, update rates and an interval for crawling of identified subsets, so the result of the formula is calculated using analogous inputs. Although the formula is not identical to the one recited in the instant limitation, it would have been obvious to one of ordinary skill in the art before the effective filing date to arrive at such a formula. One would have been motivated to do so in order to produce a crawling schedule best optimized for particular data. Furthermore, the exact form of the formula may be regarded as a design choice because the cited art arrives at an analogous result using a differently described formula.]
outputting the first bandwidth allocation for use calculating a crawl rate schedule for the first subset and the second bandwidth allocation for use calculating a crawl probability vector for the second subset (Col 14 ln 39-67, Col 15 ln 1-4).
Claim 12’ (second instance of Claim 12): Brawer discloses the media of Claim 11 above, and Brawer further discloses calculating, using the first cost function for the first subset, a crawl rate schedule for the first subset that comprises an individual crawl rate for each individual content item in the first subset of content items, wherein the set of change rates and the set of importance scores are input to the first cost function, wherein the crawl rate schedule is a solution to the first cost function that minimizes a sum of costs across all content items and is constrained by the bandwidth constraint, such that a sum of crawl rates for all content items in the first subset is equal to the first bandwidth allocation; and outputting the crawl rate schedule (Col 6 ln 25-38, Col 14 ln 39-67, Col 15 ln 1-4). [A crawler receives notification that at least one URL is updated. Then, a determination is made of whether a URL “is likely to have been updated”. In response to the determination, a crawl schedule is identified. Identifying a likelihood of an updated URL is analogous to a probability, which is then used to determine a crawl schedule. Although the formula is not identical to the one recited in the instant limitation, it would have been obvious to one of ordinary skill in the art before the effective filing date to arrive at such a formula. One would have been motivated to do so in order to produce a crawling schedule best optimized for particular data. Furthermore, the exact form of the formula may be regarded as a design choice because the cited art arrives at an analogous result using a differently described formula.]
Claim 18: Brawer discloses the media of Claim 11 above, and Brawer further discloses calculating, using the first cost function for the first subset, a crawl rate schedule for the first subset that comprises an individual crawl rate for each individual content item in the first subset of content items, wherein the set of change rates and the set of importance scores are input to the first cost function, wherein the crawl rate schedule is a solution to the first cost function that minimizes a sum of costs across all content items and is constrained by the bandwidth constraint, such that a sum of crawl rates for all content items in the first subset is equal to the first bandwidth allocation; and outputting the crawl rate schedule (Col 14 ln 39-67, Col 15 ln 1-4). [Scheduling of crawling is calculated using at least importance scores, update rates and an interval for crawling, so the result of the formula is calculated using analogous inputs. Although the formula is not identical to the one recited in the instant limitation, it would have been obvious to one of ordinary skill in the art before the effective filing date to arrive at such a formula. One would have been motivated to do so in order to produce a crawling schedule best optimized for particular data. Furthermore, the exact form of the formula may be regarded as a design choice because the cited art arrives at an analogous result using a differently described formula.]
Claim 14’ (second instance of Claim 14): Brawer discloses the media of Claim 11 above, and Brawer further discloses wherein a first change information for the first subset of content items in the content set is incomplete and a second change information for the second subset of content items in the content set is complete (Col 14 ln 39-67, Col 15 ln 1-4).
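
For illustration only, the role of an individual crawl probability as recited in Claim 8 can be sketched as a random draw made when a change notification arrives. This is a minimal sketch under stated assumptions and is not taken from Brawer or from the application; the names should_crawl and crawl_probability and the example URLs are hypothetical.

```python
import random


def should_crawl(item_id, crawl_probability, rng):
    """Decide whether to crawl an item after a change notification.

    crawl_probability is a hypothetical mapping from item id to a
    probability in [0, 1]; rng is a random.Random instance.
    """
    # rng.random() is uniform on [0.0, 1.0), so the item is crawled
    # with exactly the probability stored for it.
    return rng.random() < crawl_probability[item_id]


# Hypothetical crawl probability vector for three content items.
crawl_probability = {
    "https://example.com/a": 1.0,   # always crawl when notified
    "https://example.com/b": 0.25,  # crawl about one notification in four
    "https://example.com/c": 0.0,   # never crawl on notification
}

rng = random.Random(0)
decisions = {url: should_crawl(url, crawl_probability, rng)
             for url in crawl_probability}
```

A probability of 1.0 always triggers a crawl and a probability of 0.0 never does; intermediate values spend bandwidth on only a fraction of the notifications received.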


Claims 3, 9’ and 13’ are rejected under 35 U.S.C. 103 as being unpatentable over Brawer et al (US Patent 7,769,742) in view of Hajaj et al (US Patent Application Publication 2013/0212100).

Claim 3: Brawer discloses the method of Claim 1 above, but Brawer alone does not explicitly disclose wherein the individual change rate is estimated based on analysis of previous crawl events against the individual content item and a determination whether the individual content item changed between crawl events.
However, Hajaj [0021] discloses estimating a change rate for documents at least with some crawls history in order to schedule crawls of documents.
Therefore, it would have been obvious for one of ordinary skill in the art before the effective filing date to modify Brawer with Hajaj. One would have been motivated to do so in order to identify a schedule to crawl documents so that a crawl index would be up to date.
Claim 9’ (second instance of Claim 9): Brawer discloses the method of Claim 9 above, but Brawer alone does not explicitly disclose wherein the individual change rate is estimated based on analysis of previous crawl events against the individual content item and a determination whether the individual content item changed between crawl events.
However, Hajaj [0021] discloses estimating a change rate for documents at least with some crawls history in order to schedule crawls of documents.
Therefore, it would have been obvious for one of ordinary skill in the art before the effective filing date to modify Brawer with Hajaj. One would have been motivated to do so in order to identify a schedule to crawl documents so that a crawl index would be up to date.
Claim 13’ (second instance of Claim 13): Brawer discloses the method of Claim 9 above, but Brawer alone does not explicitly disclose wherein the individual change rate is estimated based on analysis of previous crawl events against the individual content item and a determination whether the individual content item changed between crawl events.
However, Hajaj [0021] discloses estimating a change rate for documents at least with some crawls history in order to schedule crawls of documents.
Therefore, it would have been obvious for one of ordinary skill in the art before the effective filing date to modify Brawer with Hajaj. One would have been motivated to do so in order to identify a schedule to crawl documents so that a crawl index would be up to date.
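
For illustration only, estimating a change rate from previous crawl events and a changed/unchanged determination, as recited in Claims 3, 9’ and 13’, can be sketched as follows. This is a minimal sketch that is not taken from Hajaj or from the application. It assumes that crawls occur at a fixed interval and that changes follow a Poisson process, so that the fraction of crawls finding no change approximates exp(-rate * interval). The function name estimate_change_rate and the 0.5 smoothing term (added to avoid taking the logarithm of zero) are assumptions.

```python
import math


def estimate_change_rate(changed_flags, interval):
    """Estimate changes per unit time from a crawl history.

    changed_flags: one boolean per past crawl, True if the item was
                   found to have changed since the previous crawl.
    interval:      assumed fixed time between consecutive crawls.
    """
    if not changed_flags or interval <= 0:
        raise ValueError("need at least one crawl and a positive interval")
    n = len(changed_flags)
    unchanged = n - sum(1 for flag in changed_flags if flag)
    # Under the Poisson assumption, P(no change in one interval) is
    # exp(-rate * interval); invert that with 0.5 smoothing.
    return -math.log((unchanged + 0.5) / (n + 0.5)) / interval


# Hypothetical history: 10 daily crawls, 3 of which detected a change.
history = [True, False, False, True, False, False, False, True, False, False]
rate_per_day = estimate_change_rate(history, interval=1.0)
```

An item that never appeared changed yields an estimate of zero, and the estimate grows as a larger share of crawls detect a change.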



Claims 4-5 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Brawer et al (US Patent 7,769,742) in view of Hajaj et al (US Patent Application Publication 2013/0212100) and further in view of Smith et al (US Patent Application Publication 2017/0116653).

Claims 4 and 12: Brawer discloses the method of Claims 3 and 11 above, but Brawer does not explicitly disclose wherein the cost function is minimized using a Lagrange multiplier method.
However, Smith [0081] discloses using a Lagrange multiplier to optimize a particular function.
Therefore, it would have been obvious for one of ordinary skill in the art before the effective filing date to modify Brawer with Smith. One would have been motivated to do so in order to optimize particular functions.
Claim 5: Brawer discloses the method of Claim 3 above, but Brawer does not explicitly disclose wherein a Lagrange multiplier used in the Lagrange multiplier method is determined using a bisection search method.
However, Smith [0081] discloses using a Lagrange multiplier and “performing a bisectional search” to optimize a particular function.
Therefore, it would have been obvious for one of ordinary skill in the art before the effective filing date to modify Brawer with Smith. One would have been motivated to do so in order to optimize particular functions.
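
For illustration only, minimizing a cost function under a bandwidth constraint with a Lagrange multiplier found by bisection search, as recited in Claims 4 and 5, can be sketched as follows. This is a minimal sketch and is not taken from Brawer, Hajaj, Smith or the application. The cost function is an assumption chosen for the example: each item contributes w * d / (d + r), where w is its importance score, d its change rate and r its crawl rate. The function name crawl_rates and all parameter names are hypothetical.

```python
import math


def crawl_rates(importance, change_rate, bandwidth, iterations=200):
    """Return per-item crawl rates that sum (approximately) to bandwidth.

    Minimizes the assumed cost sum(w * d / (d + r)) subject to
    sum(r) == bandwidth and r >= 0.
    """
    pairs = list(zip(importance, change_rate))
    if bandwidth <= 0 or any(w <= 0 or d <= 0 for w, d in pairs):
        raise ValueError("bandwidth, importance and change rates must be positive")

    def rates_for(lam):
        # Lagrangian stationarity: w*d / (d + r)^2 = lam, so
        # r = sqrt(w*d/lam) - d, clipped at zero.
        return [max(0.0, math.sqrt(w * d / lam) - d) for w, d in pairs]

    # At lam = max(w/d) every rate is zero, so the total is below bandwidth.
    hi = max(w / d for w, d in pairs)
    lo = hi
    # The total rate grows without bound as lam shrinks toward zero.
    while sum(rates_for(lo)) < bandwidth:
        lo /= 2.0
    # Bisection: the total rate is non-increasing in lam.
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if sum(rates_for(mid)) > bandwidth:
            lo = mid
        else:
            hi = mid
    return rates_for(hi)


# Hypothetical inputs: three items and a budget of 3 crawls per unit time.
rates = crawl_rates([1.0, 2.0, 0.5], [0.1, 0.5, 1.0], bandwidth=3.0)
```

Because the total crawl rate decreases monotonically as the multiplier increases, the bisection search converges on the single multiplier value at which the rates sum to the bandwidth constraint.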

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:

Erera et al (US Patent Application Publication 2011/0320428), which discloses crawling bandwidth and crawling schedules.



Any inquiry concerning this communication or earlier communications from the examiner should be directed to ALEX GOFMAN whose telephone number is (571)270-1072.  The examiner can normally be reached on Monday-Friday 8-5.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Tony Mahmoudi, can be reached on 571-272-4078.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/ALEX GOFMAN/Primary Examiner, Art Unit 2163