DETAILED ACTION
This action is written in response to the application filed 5/7/20. The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.
Claims 21-40 are rejected on the ground of nonstatutory double patenting as being unpatentable over the corresponding claims of U.S. Patent No. 10,699,210 B2. Although the claims at issue are not identical, they are not patentably distinct from each other for the reasons described in the tables below.
Column key: “App.” = this application (16/869382); “‘210” = US 10,699,210 B2.

App.: 31. A method, comprising:
‘210: 1. A computer-implemented method, comprising:

‘210: maintaining a machine-learned model at a computing system, the model receiving a plurality of input features and generating an output prediction based on the received features;

‘210: maintaining a training set comprising a plurality of examples, each example of the plurality of examples associated with a plurality of features and a label describing the example;

‘210: identifying an additional feature for the model for a specified time, wherein the additional feature is dynamic, a value for the additional feature based on values of one or more characteristics of content maintained by the computing system at one or more times prior to the specified time;

App.: receiving, by a processor, data comprising one or more characteristics of content;
‘210: obtaining data stored by the computing system and including characteristics of content maintained by the computing system;

App.: generating a plurality of partitions of the data, wherein each partition of the plurality of partitions of the data:
comprises values for the one or more characteristics of content, and
corresponds to a respective table that identifies the values of the one or more characteristics of the content;
‘210: generating a plurality of different partitions of the data stored by the computing system based on the content maintained by the computing system, each partition of data including values of one or more characteristics of the content on which the additional feature is based, wherein each partition corresponds to a respective table that identifies the values of the one or more characteristics of the content;

App.: updating each partition of data to comprise one or more values of at least one additional feature,
‘210: modifying each partition of data to include one or more values for the additional feature, a value of the additional feature included in a partition of data determined from values of one or more characteristics of the content on which the additional feature is based that are included in the partition of data and are associated with one or more times prior to the current time;

App.: wherein the one or more values for at least one additional feature is determined, for each example in a training data set, from values of one or more characteristics of the content on which the additional feature is based and that are included in the partition of data; and
‘210: for each example in the training data set, computing the value of the additional feature based on the partition of data comprising values of characteristics of content associated with a time prior to the specified time;

App.: updating the training data set to include the one or more values of the additional feature for each example in the training data set to train a machine learning model.
‘210: updating the training set to include values of the additional feature for each example; and training a modified model including the additional feature using the training set to generate an updated model.
As illustrated above, each limitation in claim 31 of this application has a corresponding equivalent or broader limitation in claim 1 of the ‘210 patent. Thus, claim 1 of the ‘210 patent anticipates claim 31 of this application.
Independent claims 21 and 38 are substantially identical to independent claim 31, varying only in statutory category (namely a system for claim 21 and a computer-readable storage medium for claim 38).

The correspondence in the dependent claims is outlined below.
Column key: “App.” = this application (16/869382); “‘210” = US 10,699,210 B2.

App.: 32. The method of claim 31, wherein the at least one additional feature is identified for the machine learning model for a specified time.
‘210: [From claim 1] identifying an additional feature for the model for a specified time, wherein the additional feature is dynamic, a value for the additional feature based on values of one or more characteristics of content maintained by the computing system at one or more times prior to the specified time;

App.: 33. The method of claim 32, wherein the at least one additional feature is dynamic and the value for the at least one additional feature is based on values of the one or more characteristics of content maintained by a computing system at a time prior to the specified time, wherein content maintained by the computing system is selected from a group comprising at least one of a user or a content item.
‘210: [From claim 1] identifying an additional feature for the model for a specified time, wherein the additional feature is dynamic, a value for the additional feature based on values of one or more characteristics of content maintained by the computing system at one or more times prior to the specified time;

App.: 34. The method of claim 32, further comprising:
computing the one or more values of the at least one additional feature based on the partition of data comprising values of one or more characteristics of content associated with a time prior to the specified time;
modifying the partition of data to include a plurality of values for the at least one additional feature, each of the plurality of values for the at least one additional feature associated with a different time; and
generating an updated machine learning model based on the updated training data set.
‘210: 2. The method of claim 1, wherein modifying each partition of data to include one or more values for the additional feature is based on characteristics of content associated with the partition of data, and wherein modifying each partition of data further comprises:
modifying a plurality of partitions of data in parallel to include one or more values for the additional feature.
[From claim 1] training a modified model including the additional feature using the training set to generate an updated model.

App.: 35. The method of claim 31, wherein the one or more values for at least one additional feature is further determined, for each example in a training data set, from values of one or more characteristics of the content on which the at least one additional feature is based and that are associated with a time prior to a current time.
‘210: [From claim 1] for each example in the training data set, computing the value of the additional feature based on the partition of data comprising values of characteristics of content associated with a time prior to the specified time;

App.: 36. The method of claim 31, wherein updating each partition of data further comprises:
generating a value for the at least one additional feature associated with a time by applying one or more additional machine learning models to one or more values of one or more characteristics of the content, wherein the one or more additional learning models account for at least one of a decay rate or a propagation delay.
‘210: 7. The method of claim 6, wherein the one or more additional models account for one or more selected from a group consisting of: a decay rate, a propagation delay, and any combination thereof.

App.: 37. The method of claim 31, further comprising:
generating, using the updated training data set, one or more alternative results; and
replacing the trained machine learning model with a replacement machine learning model, the replacement machine learning model based on the one or more alternative results.
‘210: 8. The method of claim 1, further comprising:
applying the model to the training set to generate one or more alternative results;
determining whether a difference between the one or more results and the one or more alternative results satisfies one or more criteria; and
responsive to determining that the difference between the one or more results satisfies the one or more criteria, replacing the model with the modified model.
As illustrated above, each limitation in the dependent claims of this application has a corresponding equivalent or broader limitation in a corresponding claim of the ‘210 patent. Thus, the claims of the ‘210 patent anticipate the claims of this application.
Dependent claims 22-30 and 39-40 have a similar correspondence with the other dependent claims of the ‘210 patent.
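For illustration only, the replacement step recited in claim 8 of the ‘210 patent (comparing results with alternative results against one or more criteria, then replacing the model) can be sketched as follows. This is a minimal sketch under stated assumptions: the “results” are taken to be predicted click probabilities on labeled examples, the criterion is assumed to be a minimum reduction in mean log loss, and the names `mean_log_loss`, `select_model`, and `min_improvement` are hypothetical. Neither the metric nor the threshold comes from the ‘210 patent or from this application.

```python
import math


def mean_log_loss(labels, probs, eps=1e-12):
    """Average binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


def select_model(model, modified_model, labels, results, alternative_results,
                 min_improvement=0.01):
    """Return modified_model only if the assumed criterion (loss reduction) is met."""
    # Difference between the results and the alternative results (hypothetical metric).
    difference = mean_log_loss(labels, results) - mean_log_loss(labels, alternative_results)
    return modified_model if difference >= min_improvement else model
```

With labels [1, 0, 1, 0], original predictions of 0.5 everywhere, and alternative predictions [0.9, 0.1, 0.9, 0.1], the loss drops from about 0.693 to about 0.105, so the modified model is selected.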


Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale or otherwise available to the public before the effective filing date of the claimed invention.
Claims 21-24, 26-28, 30-33, 35 and 37-39 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Richardson (Richardson M, Dominowska E, Ragno R. Predicting clicks: estimating the click-through rate for new ads. In Proceedings of the 16th international conference on World Wide Web 2007 May 8, pp. 521-530).

Regarding claims 21, 31 and 38, Richardson discloses a system (and a related method and non-transitory computer-readable storage medium) comprising:
a processor; and a memory storing instructions, which when executed by the processor, causes the processor to:
The Examiner notes that both a processor and a memory are inherent throughout the Richardson disclosure.
receive data comprising one or more characteristics of content;
P. 523, sec. 4: Data set; pp. 523-24, sec. 5: Model.
generate a plurality of partitions of the data,
P. 523, sec. 4: Data set: “We randomly placed 70% of the advertisers in the training set, 10% in the validation set, and 20% in the test set.” The Examiner notes that this 70/10/20 split is equivalent to three partitions of the data set.
wherein each partition of the plurality of partitions of the data: comprises values for the one or more characteristics of content, and
P. 523: “Features may be anything, such as the number of words in the title, the existence of a word, etc.” Pp. 523-24, sec. 5: application of logistic regression value Z to training set, validation set, and test set. Also p. 524: “For each feature fᵢ, we added derived features of log(fᵢ+1), and fᵢ² (the purpose of adding one before taking the log is so as to naturally handle features whose minimum value is 0, such as counts).”
corresponds to a respective table that identifies the values of the one or more characteristics of the content;
See, e.g., data in table form at p. 526, sec. 8, first col. Also p. 524, Table 1.
update each partition of data to comprise one or more values of at least one additional feature, wherein the one or more values for at least one additional feature is determined, for each example in a training data set, from values of one or more characteristics of the content on which the at least one additional feature is based and that are included in the partition of data; and
Pp. 523-24, sec. 5: application of logistic regression value Z to training set. Also p. 524: “For each feature fᵢ, we added derived features of log(fᵢ+1), and fᵢ² (the purpose of adding one before taking the log is so as to naturally handle features whose minimum value is 0, such as counts).”
update the training data set to include the one or more values of the additional feature for each example in the training data set, wherein the updated training data is used to train a machine learning model.
Pp. 523-24, sec. 5: application of logistic regression value Z to the training set and validation set. Also p. 524: “For each feature fᵢ, we added derived features of log(fᵢ+1), and fᵢ² (the purpose of adding one before taking the log is so as to naturally handle features whose minimum value is 0, such as counts).”
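The two Richardson operations relied on in the mapping above, an advertiser-level 70/10/20 split of the data set (p. 523, sec. 4) and the addition of derived features log(fᵢ+1) and fᵢ² (p. 524), can be illustrated with a minimal sketch. It is not Richardson's code: the record layout (a dict with an `advertiser` key), the example feature name `num_title_words`, and the function names are hypothetical, and integer arithmetic is an assumed way of computing the split sizes.

```python
import math
import random


def split_by_advertiser(examples, seed=0):
    """Split examples 70/10/20 by advertiser, so no advertiser spans two partitions."""
    advertisers = sorted({ex["advertiser"] for ex in examples})
    random.Random(seed).shuffle(advertisers)
    n = len(advertisers)
    n_train = (7 * n) // 10          # integer arithmetic avoids float rounding
    n_valid = n // 10
    bucket = {}
    for i, adv in enumerate(advertisers):
        if i < n_train:
            bucket[adv] = "train"
        elif i < n_train + n_valid:
            bucket[adv] = "validation"
        else:
            bucket[adv] = "test"
    partitions = {"train": [], "validation": [], "test": []}
    for ex in examples:
        partitions[bucket[ex["advertiser"]]].append(ex)
    return partitions


def add_derived_features(partition, base_features):
    """Add log(f + 1) and f**2 for each named base feature, in place."""
    for ex in partition:
        for name in base_features:
            value = ex[name]
            ex[f"log1p_{name}"] = math.log(value + 1.0)  # +1 handles counts of 0
            ex[f"sq_{name}"] = value ** 2
```

Splitting by advertiser rather than by individual example keeps every example from a given advertiser in a single partition, which is what the quoted passage describes.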

Regarding claims 22 and 32, Richardson discloses the further limitation wherein the at least one additional feature is identified for the machine learning model for a specified time.

[Image not reproduced]
Excerpt from Richardson, p. 523. See also p. 529, first col.: “A time-dependent model such as this could be kept up-to-date with information about all advertisers, ads, terms, clicks, and views, and would have the power to update its the estimated CTR of all ads any time an ad is shown.” (Emphasis added.)

Regarding claims 23 and 33, Richardson discloses the further limitation wherein the at least one additional feature is dynamic and the one or more values for the at least one additional feature is based on values of the one or more characteristics of content maintained by a computing system at a time prior to the specified time, wherein the content maintained by the computing system is selected from a group comprising at least one of a user or a content item.

[Image not reproduced]
Excerpt from Richardson, p. 523. Also p. 529, first col.: “A time-dependent model such as this could be kept up-to-date with information about all advertisers, ads, terms, clicks, and views, and would have the power to update its the estimated CTR of all ads any time an ad is shown.” (Emphasis added.)
P. 521: “The search system can make expected user behavior predictions based on historical click-through performance of the ad.” (Emphasis added.)
The Examiner notes that the Richardson system ‘maintains’ information pertaining to both users and content items. See p. 523, sec. 4, describing the ad information data set.

Regarding claim 24, Richardson discloses the further limitation wherein the processor is further caused to:
compute the one or more values of the at least one additional feature based on the partition of data comprising values of one or more characteristics of content associated with a time prior to the specified time.

[Image not reproduced]
Excerpt from Richardson, p. 523. See also p. 529, first col.: “A time-dependent model such as this could be kept up-to-date with information about all advertisers, ads, terms, clicks, and views, and would have the power to update its the estimated CTR of all ads any time an ad is shown.” (Emphasis added.)
P. 521: “The search system can make expected user behavior predictions based on historical click-through performance of the ad.” (Emphasis added.)
The Examiner notes that the Richardson system ‘maintains’ information pertaining to both users and content items. See p. 523, sec. 4, describing the ad information data set.
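The limitation at issue in claims 23-24 and 33 (a dynamic feature whose value for a specified time is computed only from characteristics recorded at times prior to that time) can be illustrated with a minimal sketch. The hypothetical feature here is an ad's historical click-through rate computed from view events timestamped strictly before each example's time; the event format, the `ad_id` and `time` keys, the function names, and the 0.0 default for an ad with no prior views are all assumptions, not details taken from Richardson or the application.

```python
from typing import Dict, List, Tuple

# Hypothetical event record for one ad view: (timestamp, clicked?)
Event = Tuple[float, bool]


def historical_ctr(events: List[Event], specified_time: float) -> float:
    """Click-through rate over views strictly before specified_time.

    Returns 0.0 when there is no prior view (an assumed default).
    """
    prior = [clicked for ts, clicked in events if ts < specified_time]
    if not prior:
        return 0.0
    return sum(prior) / len(prior)


def add_point_in_time_feature(training_set: List[dict],
                              events_by_ad: Dict[str, List[Event]]) -> None:
    """Attach historical_ctr to each example using only data prior to its time."""
    for example in training_set:
        events = events_by_ad.get(example["ad_id"], [])
        example["historical_ctr"] = historical_ctr(events, example["time"])
```

Because each example is evaluated against its own time, an example at time 3.0 sees only the views at times 1.0 and 2.0, while a later example sees the full history.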

Regarding claims 26, 35 and 39, Richardson discloses the further limitation wherein the one or more values for at least one additional feature is further determined, for each example in a training data set, from values of one or more characteristics of the content on which the additional feature is based and that are associated with a time prior to a current time.
Pp. 523-24, sec. 5: application of logistic regression value Z to training set, validation set, and test set. Also p. 524: “For each feature fᵢ, we added derived features of log(fᵢ+1), and fᵢ² (the purpose of adding one before taking the log is so as to naturally handle features whose minimum value is 0, such as counts).”

Regarding claim 27, Richardson discloses the further limitation wherein the processor is further caused to:
generate an updated machine learning model based on the updated training data set.
P. 529, first col. “A time-dependent model such as this could be kept up-to-date with information about all advertisers, ads, terms, clicks, and views, and would have the power to update its the estimated CTR of all ads any time an ad is shown.” (Emphasis added.)

Regarding claim 28, Richardson discloses the further limitation wherein updating each partition of data further comprises:
modifying a plurality of partitions of data, in parallel, to include one or more values for the at least one additional feature.
Pp. 523-24, sec. 5: application of logistic regression value Z to the training set and validation set. The Examiner notes that the logistic regression value Z is a derived feature. Also p. 524: “For each feature fᵢ, we added derived features of log(fᵢ+1), and fᵢ² (the purpose of adding one before taking the log is so as to naturally handle features whose minimum value is 0, such as counts).”
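The claim 28 limitation (modifying a plurality of partitions of data, in parallel, to include values for an additional feature) can be illustrated with a minimal sketch using Python's standard `concurrent.futures` module. This illustrates the claim language only and is not Richardson's method; the `views` column, the `log1p_views` feature, and the function names are hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def update_partition(partition):
    """Return a copy of one partition with a hypothetical derived feature added."""
    return [dict(row, log1p_views=math.log(row["views"] + 1.0)) for row in partition]


def update_partitions_in_parallel(partitions, max_workers=4):
    """Apply update_partition to every partition concurrently."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, so output i matches partition i.
        return list(pool.map(update_partition, partitions))
```

A thread pool keeps the sketch self-contained; in CPython, CPU-bound pure-Python work would see real parallel speedup only with `ProcessPoolExecutor`.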

Regarding claims 30 and 37, Richardson discloses the further limitation wherein the processor is further caused to:
generate, using the updated training data set, one or more alternative results; and
P. 529, first col. “A time-dependent model such as this could be kept up-to-date with information about all advertisers, ads, terms, clicks, and views, and would have the power to update its the estimated CTR of all ads any time an ad is shown.” (Emphasis added.)
replace the trained machine learning model with a replacement machine learning model, the replacement machine learning model based on the one or more alternative results.
Id.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The following are the references relied upon in the rejections below:
Richardson, primary reference. (Richardson M, Dominowska E, Ragno R. Predicting clicks: estimating the click-through rate for new ads. In Proceedings of the 16th international conference on World Wide Web 2007 May 8 (pp. 521-530).)
Chandramouli (Chandramouli B, Goldstein J, Duan S. Temporal analytics on big data for web advertising. In 2012 IEEE 28th international conference on data engineering 2012 Apr 1, pp. 90-101. IEEE.)
Lessin (US 2014/0143325 A1)
Claims 25 and 34 are rejected under 35 U.S.C. 103 as being unpatentable over Richardson and Chandramouli.
Regarding claim 25, Chandramouli discloses the following further limitation, which Richardson does not disclose, wherein updating each partition of data further comprises:
modifying the partition of data to include a plurality of values for the at least one additional feature, each of the plurality of values for the at least one additional feature associated with a different time.
P. 95, sec. III (B), Temporal Partitioning.
“Many CQs (e.g., RunningClickCount for a single ad) may not be partitionable by any data column. However, if the CQ uses a window of width w, we can partition computation based on time as follows. We divide the time axis into overlapping spans S₀, S₁, …, such that the overlap between successive spans is w. Each span is responsible for output during a time interval of width s, called the span width. Let t denote a constant reference timestamp. Span Sᵢ receives events with timestamp in the interval [t + s·i − w, t + s·i + s), and produces output for the interval [t + s·i, t + s·i + s). Note that some events at the boundary between spans may belong to multiple partitions.”
At the time of filing, it would have been obvious to a person of ordinary skill in the art to apply temporal partitioning of training data (as taught by Chandramouli) to the click-through rate prediction system of Richardson because many relevant events in advertising are discrete events associated with a particular time. For instance, these may include a new product announcement, the start of a new advertising campaign, news related to the company whose product or service is being advertised, or other news events. Any of these could affect predicted click-through rates in such a way that reasonable models should account for this information if possible. Both disclosures pertain to web advertising, and particularly to click-through rate prediction.
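The interval arithmetic in the Chandramouli passage quoted above can be checked with a short sketch. It encodes only what the quotation states: span Sᵢ receives events with timestamps in [t + s·i − w, t + s·i + s) and produces output for [t + s·i, t + s·i + s). The function names are hypothetical, span indices are assumed to start at 0 as in the quotation, and this is not Chandramouli's implementation.

```python
import math


def span_input_interval(i, t, s, w):
    """Half-open interval [start, end) of event timestamps that span S_i receives."""
    return (t + s * i - w, t + s * i + s)


def span_output_interval(i, t, s):
    """Half-open interval [start, end) for which span S_i produces output."""
    return (t + s * i, t + s * i + s)


def spans_for_event(ts, t, s, w):
    """Indices i >= 0 of every span whose input interval contains timestamp ts."""
    x = ts - t
    # Candidate range, widened by one on each side; the exact test below decides.
    lo = max(0, math.floor(x / s) - 1)
    hi = math.floor((x + w) / s) + 1
    result = []
    for i in range(lo, hi + 1):
        start, end = span_input_interval(i, t, s, w)
        if start <= ts < end:
            result.append(i)
    return result
```

With t = 0, s = 10, and w = 3, an event at timestamp 8 falls in both S₀ and S₁ (the boundary overlap the quotation mentions), while the output intervals [0, 10) and [10, 20) do not overlap.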

Regarding claim 34, its further limitations are substantially identical to those of claims 24, 25 and 27 together. Accordingly, the rejections of claims 24, 25 and 27 apply equally here.

Claims 29, 36 and 40 are rejected under 35 U.S.C. 103 as being unpatentable over Richardson and Lessin.
Regarding claims 29, 36 and 40, Lessin discloses the following further limitation, which Richardson does not disclose, wherein updating each partition of data further comprises:
generating a value for the at least one additional feature associated with a time by applying one or more additional machine learning models to one or more values of one or more characteristics of the content, wherein the one or more additional machine learning models account for at least one of a decay rate or a propagation delay.
[0042], describing fig. 2, “In some embodiments, the social networking system 100 assigns a recency score to an information item that may decay over time by a decay rate; the decay rate for an information item may depend on the type of information item.”
[0042], fig. 1 and fig. 3, “In some embodiments, in addition to identifying information items in a user profile 104 not associated with data, the scoring module 122 identifies information items associated with data that has not been updated for a threshold amount of time or that was not obtained within a threshold amount of time from a current time.”
At the time of filing, it would have been obvious to a person of ordinary skill in the art to include either or both of a decay rate and a propagation delay (as taught by Lessin) in the click-through rate prediction system of Richardson because these additional metrics can affect how likely a user is to interact with a particular object on a web page. Both disclosures pertain to user behavior modeling for use in website content generation.
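The two Lessin passages cited above (a recency score that decays at a type-dependent rate, and identification of data not updated within a threshold amount of time) can be illustrated with a minimal sketch. Lessin states only that the decay rate may depend on the type of information item; the exponential form, the per-day rates, the item types `current_city` and `employer`, and the function names are all assumptions made for illustration.

```python
import math

# Hypothetical per-type decay rates (per day). Lessin says only that the rate
# may depend on the item type; these values and the exponential form are assumptions.
DECAY_RATE_PER_DAY = {"current_city": 0.5, "employer": 0.01}


def recency_score(item_type, age_days, initial_score=1.0):
    """Exponentially decayed recency score for an information item."""
    return initial_score * math.exp(-DECAY_RATE_PER_DAY[item_type] * age_days)


def is_stale(days_since_update, threshold_days):
    """True if the item's data has gone unupdated for at least threshold_days."""
    return days_since_update >= threshold_days
```

Under these assumed rates, a 10-day-old employer item keeps most of its score while a 10-day-old city item has decayed almost entirely, reflecting the type-dependent rate described in [0042].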

Additional Relevant Prior Art
The following references were identified by the Examiner as being relevant to the disclosed invention, but are not relied upon in any particular prior art rejection:
Graepel discloses techniques pertaining to click-through rate prediction based on machine learning using high-dimensional features. See especially secs. 2.2 and 2.3. (Graepel T, Candela JQ, Borchert T, Herbrich R. Web-scale Bayesian click-through rate prediction for sponsored search advertising in Microsoft's Bing search engine. Omnipress. 2010.).

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Vincent Gonzales whose telephone number is (571) 270-3837. The examiner can normally be reached on Monday-Friday 7 a.m. to 4 p.m. MT.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda Huang, can be reached at (571) 270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/Vincent Gonzales/
Primary Examiner, Art Unit 2124