DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Notice to Applicant
The following is a Final Office Action for Application Serial Number: 15/354,944, filed on November 17, 2016.  In response to Examiner’s Non-Final Rejection of September 28, 2020, Applicant, on January 27, 2021, amended Claims 1 and 17, cancelled Claims 5 and 19 and added new Claims 22 and 23.  Claims 1-4, 6-12, 14-18 and 20-23 are pending in this application and have been rejected below.

Response to Amendment
Applicant's amendments are acknowledged. 

Regarding the 35 U.S.C. 101 rejection, Applicant's arguments and amendments have been considered but are insufficient to overcome the rejection. Please refer to the 35 U.S.C. 101 rejection for further explanation and rationale. 

The 35 U.S.C. § 112(f) interpretation of claims 17-20 is hereby maintained.

The 35 U.S.C. § 112(b) rejections of claims 17-20 are hereby withdrawn in light of the structure disclosed in the specification.

The 35 U.S.C. § 103 rejections are hereby maintained notwithstanding Applicant's amendments to claims 1 and 17. A new 35 U.S.C. 103 rejection has been applied to new claims 22-23.

Response to Arguments
Applicant's Arguments/Remarks filed January 27, 2021 (hereinafter Applicant's Remarks) have been fully considered but they are not persuasive. Applicant's Remarks will be addressed herein below in the order in which they appear in the response filed January 27, 2021.

Regarding the 35 U.S.C. 103 rejection, Applicant submits that none of the alleged references alone disclose or in combination teach or suggest these features. In particular, the references do not describe a system to (1) "train a model using machine learning based on the user interaction data to model achievement of the metric by the user population, the model training module employing a penalty term using regularization to reduce overfitting or underfitting of the model to the user interaction data as part of balancing accuracy and complexity of the model, respectively". 
Duncan describes selecting optimal or superior forecasting models for time series data from a group of related forecasting models using state space representations in combination with cross-validation. Duncan, col. 2, lines 40-45. Duncan also describes that "regularization parameters may be used, for example, to introduce additional information into a model for preventing or reducing over-fitting." Duncan, col. 2, lines 57-59. Regularization parameters in Duncan, for instance, "may differ from one model to the other of a family." Duncan, col. 8, lines 2-5. That is, regularization parameters indicate differences between different forecasting models. 

In response, Examiner respectfully disagrees. Applicant’s own specification discloses penalty terms as regularization parameters (see par. 0052 of Applicant’s specification). Thus, Examiner finds the regularization parameters disclosed in Duncan sufficient to teach the penalty terms recited in the claims. 
Additionally, claim 1 (similarly claims 12 and 17) currently requires either overfitting or underfitting. Examiner finds Duncan sufficiently teaches the aforementioned limitation because the reference discloses overfitting which is one of the two options. 
Furthermore, par. 0037 of Applicant’s specification states: 
[0037] On the other hand, under fitting occurs when the model 304 does not capture an underlying trend of the data and thus is not complex enough and "under fits" the user interaction data 118. Accordingly, the segment valuation system 122 may employ a penalty term 308 to balance accuracy and complexity of the model 304 through regularization. This may be performed automatically by the model training module 302 through adjustment of the penalty term 308 through successive training iterations using different portions of the user interaction data 118. The penalty term 308 may also be user specified, such as through interaction with a user interface of the model training module 302.
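For illustration only, and not as a characterization of Applicant's specification or of Duncan, the general role of a penalty term in balancing accuracy against complexity can be sketched as below. The single-feature ridge (L2) form, the names fit_ridge_1d and penalized_loss, and the sample data are all assumptions introduced for this sketch.

```python
# Illustrative sketch only: a one-feature linear model y ~ w * x fitted with an
# L2 (ridge) penalty. All names and data here are hypothetical.

def fit_ridge_1d(xs, ys, penalty):
    """Return the w that minimizes sum((y - w*x)^2) + penalty * w^2."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    # Closed-form minimizer; a larger penalty shrinks w toward zero.
    return sxy / (sxx + penalty)


def penalized_loss(xs, ys, w, penalty):
    """Accuracy term (squared error) plus complexity term (penalty * w^2)."""
    fit_error = sum((y - w * x) ** 2 for x, y in zip(xs, ys))
    return fit_error + penalty * w * w


# Hypothetical data standing in for user interaction data.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

# Fit the same model under three penalty settings.
weights = {p: fit_ridge_1d(xs, ys, p) for p in (0.0, 1.0, 10.0)}
for p, w in weights.items():
    print(f"penalty={p:5.1f}  weight={w:.4f}  loss={penalized_loss(xs, ys, w, p):.4f}")
```

In this sketch a penalty of zero reproduces the ordinary least-squares fit, while larger penalties trade fit accuracy for a simpler (smaller-weight) model.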


A plurality of such related models may be generated, instead of stopping at just one model, because in general it may be hard to pre-select the set of parameters that will tend to provide the most accurate forecasts. After the model family has been generated, one among them may be selected as the “best” (e.g., in terms of some metric indicative of the expected accuracy of its forecasts), and that optimal model may subsequently be used for forecasting.

and, col. 9, ln. 67 – col. 10, ln. 1-15 states:

If the Quality Metric (QM) values for a group of two or more Forecast Models (FM) are reasonably close (e.g., within one standard error) with respect to each other, and superior to the QM values of the remaining FMs, the least complex or most “parsimonious” model in the group (such as the one with the fewest number of parameters) may be designated as the best model of the group. The optimal model 387 may subsequently be utilized for obtaining forecasts for some or all of the variables represented in the TSS 310. Of course, in some scenarios it may be the case that no single model may stand out as the best, either on the basis of the QMs alone or on the basis of QMs combined with secondary criteria such as parameter count parsimony. In such a scenario, if a group of FMs appear to be of equal quality and equally complex, one may be selected at random as the optimal model in some embodiments.

Examiner also finds that Duncan's selection of the best/optimal model is sufficient to teach the accuracy and complexity elements of the claim. 
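
For illustration only, the parsimony rule in the Duncan passage quoted above (choosing the least complex model among those whose quality metrics are within one standard error of the best) can be sketched as below. The sketch assumes the quality metric is a prediction error where lower is better, and the name select_parsimonious, the dictionary layout, and the sample values are hypothetical. Unlike the quoted passage, ties in complexity are broken here by the lower mean error rather than at random.

```python
# Illustrative sketch only: "one standard error" model selection over
# cross-validation errors. All names and values here are hypothetical.
from math import sqrt
from statistics import mean, stdev


def select_parsimonious(models):
    """Pick the simplest model whose mean error is within one SE of the best."""
    summaries = []
    for m in models:
        errs = m["fold_errors"]
        summaries.append({
            "name": m["name"],
            "n_params": m["n_params"],
            "qm": mean(errs),                     # overall quality metric
            "se": stdev(errs) / sqrt(len(errs)),  # standard error of that mean
        })
    best = min(summaries, key=lambda s: s["qm"])
    threshold = best["qm"] + best["se"]
    # Models that are "reasonably close" to the best one.
    close = [s for s in summaries if s["qm"] <= threshold]
    # Among the close models, prefer fewer parameters, then lower error.
    return min(close, key=lambda s: (s["n_params"], s["qm"]))["name"]


candidates = [
    {"name": "A", "n_params": 2, "fold_errors": [1.0, 1.1, 1.05]},
    {"name": "B", "n_params": 5, "fold_errors": [0.9, 1.1, 1.0]},
    {"name": "C", "n_params": 1, "fold_errors": [2.0, 2.2, 2.1]},
]
selected = select_parsimonious(candidates)
print(selected)
```

With these sample values, model B has the lowest mean error, model A falls within one standard error of B and has fewer parameters, and model C is simplest but clearly less accurate.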

Regarding the 35 U.S.C. 103 rejection, Applicant submits that none of the alleged references alone disclose or in combination teach or suggest these features. In particular, the references do not describe a system to (2) "an attribute selection module ... to select a subset of attributes from the plurality of attributes, the selection based on the determined significance".
	Duncan describes selecting optimal or superior forecasting models for time series data from a group of related forecasting models using state space representations in combination with 
Accordingly, withdrawal of the rejection is respectfully requested.

In response, Examiner respectfully disagrees. Applicant is respectfully reminded that, although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims. See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993). Further, one cannot show nonobviousness by attacking references individually where the rejections are based on combinations of references. See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981); In re Merck & Co., 800 F.2d 1091, 231 USPQ 375 (Fed. Cir. 1986). Examiner finds the parameters disclosed in Duncan sufficient to teach the attributes recited in the claims. Duncan discloses training models using a test subset (e.g. quality metrics based on the errors in predicting the test set values) (see col. 4, ln. 10-14), which Examiner finds sufficient to teach the significance of the selected attributes. For at least these reasons, claim 1 remains 

Examiner encourages Applicant to further narrow the claims to include the combination of “overfitting and under fitting of the model” and the significant details mentioned in Applicant’s arguments that are not currently claimed, in order to advance prosecution. 

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:

(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 

Because these claim limitations are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, they are being interpreted to cover the corresponding structure described in at least paragraph 0066 of the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.

35 USC § 101 - Claim Analysis
Claims 1-4, 6-12, 14-18 and 20-23 are eligible under 35 U.S.C. 101 at Step 2A, Prong Two, because the combination of additional elements integrates the abstract idea into a practical application. Claim 1 as a whole implements certain methods of organizing human activity (e.g. commercial interactions involving marketing, sales activities or behaviors) in a specific manner that sufficiently limits the abstract idea to the practical application. This is attributed to the used 

Claim Rejections - 35 USC § 103  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-5, 9-12, and 16-18 are rejected under 35 U.S.C. 103 as being unpatentable over Markey, et al., U.S. Publication No. 2014/0046777 [hereinafter Markey], and further in view of Duncan et al. U.S. Patent No. 10,318,874 [hereinafter Duncan].

Referring to Claim 1, Markey teaches: 
In a digital medium environment to value a segment of a user population towards achieving a metric (Markey, [0134]; [0139]; [0156]), a segment valuation system comprising: 
a model training module implemented at least partially in hardware of a computing device to:
obtain user interaction data, the user interaction data describing the user population, a plurality of attributes associated with the user population, and achievement of the metric by the 
train a model using machine learning based on the user interaction data to model achievement of the metric by the user population (Markey, [0098]), “Each of the algorithms may be trained on a training set of data, and then validated (measured) for predictiveness against the validation set of data… machine learning metrics… . It will be understood that general analytic methods, statistical techniques, and tools for evaluating competing algorithms and models, such as valuation models, as well as analytic methods, statistical techniques, and tools known to a person of ordinary skill in the art are intended to be encompassed by the present invention and may be used to evaluate competing algorithms and valuation models in accordance with the methods and systems of the present invention… how well it predicts the likelihood that showing a particular advertisement to a particular consumer in a particular context is likely to influence a consumer to engage in a desirable action, such as purchasing one of the advertiser's products, engaging with the advertiser product, affecting the consumer perception about the advertiser's 
Markey teaches that an economic valuation model deployed by the real-time bidding machine facility 142 may be refined by the machine learning facility to evaluate information relating to one or more available placements to predict an economic valuation for each of the one or more placements. The learning machine facility 138 may obtain different types of data to refine the economic valuation model, including campaign and historic logs, an identifier for the user, the channel, time, price paid, ad message shown, and resulting user actions, or some other type of campaign or historic log data (see par. 0187), but Markey does not explicitly teach: 
the model training module employing a penalty term using regularization to reduce overfitting or underfitting of the model to the user interaction data as part of balancing accuracy and complexity of the model, respectively; 
an attribute significance determination module implemented at least partially in hardware of a computing device to determine a significance of respective attributes of the plurality of attributes on the achievement of the metric based on the trained model; 
an attribute selection module implemented at least partially in hardware of a computing device to select a subset of attributes from the plurality of attributes, the selection based on the determined significance; and
a segment valuation module implemented at least partially in hardware of a computing device to generate data describing a valuation of the segment based on the selected subset of attributes. 
However, Duncan teaches: 
the model training module employing a penalty term using regularization to reduce overfitting or underfitting of the model to the user interaction data as part of balancing accuracy and complexity of the model, respectively (Duncan, [col. 2, ln. 57-67]), “(Regularization parameters may be used, for example, to introduce additional information into a model for preventing or reducing over-fitting)… After the model family has been generated, one among them may be selected as the “best” (e.g., in terms of some metric indicative of the expected accuracy of its forecasts)…”; (Duncan, [col. 8, ln. 2-12]), “… Regularization-related parameters may differ from one model to the other of a family 250 in some embodiments. The number of model parameters for which non-null values or non-default values have been selected may differ from one model of the family to another in some embodiments. As mentioned earlier, a plurality of related models, all using the same methodology but with different parameter settings, may be generated because it may not be straightforward to select the parameter settings that are likely to provide the most accurate forecasts for the TSS 210”; (Duncan, [col. 9, ln. 60-67, col. 10, ln. 1-15]), “If the QM values for a group of two or more FMs are reasonably close (e.g., within one standard error) with respect to each other, and superior to the QM values of the remaining FMs, the least complex or most “parsimonious” model in the group (such as the one with the fewest number of parameters) may be designated as the best model of the group. The optimal model 387 may subsequently be utilized for obtaining forecasts for some or all of the variables represented in the TSS 310…”;
an attribute significance determination module implemented at least partially in hardware of a computing device to determine a significance of respective attributes of the plurality of attributes on the achievement of the metric based on the trained model (Duncan, [col. 5, ln. 2-5]), “the SSR and the training variant may be used to obtain an estimate of an intermediary set of 
an attribute selection module implemented at least partially in hardware of a computing device to select a subset of attributes from the plurality of attributes, the selection based on the determined significance (Duncan, [col. 9, ln. 39-42]), “the SSR may be used to estimate one or more optimal parameter values for the corresponding FM, and the optimal parameter values may then be used to obtain the predictions”; (Duncan, [col. 7, ln. 22-25]), “A member of a family of ARIMA models may differ from other members of the family in the specific combination of “order” and “seasonal” parameter settings, for example. In such a scenario, the cross-validation technique described herein may be used to select the combination of “order” and “seasonal” parameter values that is likely to result in the most accurate forecasts”; (Duncan, [col. 8, ln. 2-12]; [col. 4, ln. 10-14]); and
a segment valuation module implemented at least partially in hardware of a computing device to generate data describing a valuation of the segment based on the selected subset of attributes (Duncan, [col. 7, ln. 22-25]), “the cross-validation technique described herein may be used to select the combination of “order” and “seasonal” parameter values that is likely to result in the most accurate forecasts”; (Duncan, [col. 15, ln. 38-44]), “The SSR-based cross-validation approach towards forecasting model selection for time series data described above may be useful in a variety of scenarios. For example, the technique may be used for forecasting retail sales, for predicting demand for various commodities, for econometric problems, for weather forecasting, for medical diagnosis, and so on”; (Duncan, [col. 12, ln. 50]).  


Referring to Claim 2, the combination of Markey in view of Duncan teaches the system as described in claim 1. Markey further teaches:
wherein the training of the model using machine learning is supervised to train modelling of the achievement of the metric by the user population (Markey, [0145]), “data may be taken from various formats, including but not limited to information that is not about advertisements, such as successful market demographics data, and the like. This may include specific data streams, translating data into a neutral format, specific machine learning techniques, or some other data type or technique. In embodiments, the learning system may perform an auditing and/or supervisory function, including but not limited to optimizing the methods and systems as described herein. In embodiments, the learning system may learn from multiple data sources, and base optimization of the methods and systems as described herein based at least in part on the multiple data sources”. 

Referring to Claim 3, the combination of Markey in view of Duncan teaches the system as described in claim 1. Markey teaches that multiple models may be run against multiple training algorithms that embody specified objectives, such as key performance indicators (see par. 0143), but Markey does not explicitly teach: 
wherein the model is an ensemble model formed using a plurality of sub-models having weighted contributions towards an overall result of the ensemble model to describe the achievement of the metric by the user population.
However, Duncan teaches: 
wherein the model is an ensemble model formed using a plurality of sub-models having weighted contributions towards an overall result of the ensemble model to describe the achievement of the metric by the user population (Duncan, [col. 11, ln. 9-23]), “predicted values pv0, pv3 and pv4 of prediction set pred0 may respectively be generated for the original observed values ov0, ov3 and ov4 using training variant Tr-variant0, predicted values pv1, pv5 and pv8 of pred1 may be generated using training variant Tr-variant1, and so on. Respective quality metrics 413 QM0, QM1, and QM2 may be generated based on the accuracy of each of the prediction sets Pred0, Pred1, and Pred2 in the depicted embodiment. The individual quality metrics obtained for each of the training/test combinations may then be combined (e.g., using the mean or some other selected aggregation function or functions) to arrive at an overall quality metric QM. The overall QMs for all the different models may then be compared to select the optimal model”. 
At the time the invention was filed, it would have been obvious to a person of ordinary skill in the art to have modified Markey to include the model limitations as taught by Duncan. The motivation for doing this would have been to improve the method of determining an anticipated economic valuation to service consumers in Markey (see par. 0095) to efficiently include the results of selecting parameter settings that are likely to provide the most accurate forecasts (see Duncan col. 8, ln. 9-10).

Referring to Claim 4, the combination of Markey in view of Duncan teaches the system as described in claim 1. Markey teaches: 
wherein the machine learning includes a linear model penalized with L1 normal regularization (Lasso), a random forest, a guided random forest, an adaptive boosting ensemble model (AdaBoost), or gradient boosted trees (GBRT) (Markey, [0148]), “Examples of machine learning algorithms may include, but are not limited to, Naive Bayes, Bayes Net, Support Vector Machines, Logistic Regression, Neural Networks, and Decision Trees…”; (Markey, [0200]; [0237]).

Referring to Claim 5, the combination of Markey in view of Duncan teaches the system as described in claim 1. Markey teaches on demand granularity of target parameters (see par. 0371-0373), but Markey does not explicitly teach: 
wherein the model training module employs a penalty term that is configured to adjust a tradeoff between accuracy of the model and complexity of the model to reduce overfitting or under fitting of the model to the user interaction data.
However, Duncan teaches: 
wherein the model training module employs a penalty term that is configured to adjust a  tradeoff between accuracy of the model and complexity of the model to reduce overfitting or under fitting of the model to the user interaction data (Duncan, [col. 2, ln. 57-67]), “(Regularization parameters may be used, for example, to introduce additional information into a model for preventing or reducing over-fitting)… After the model family has been generated, one among them may be selected as the “best” (e.g., in terms of some metric indicative of the 
At the time the invention was filed, it would have been obvious to a person of ordinary skill in the art to have modified Markey to include the model limitation as taught by Duncan. The motivation for doing this would have been to improve the method of determining an anticipated economic valuation to service consumers in Markey (see par. 0095) to efficiently include the results of selecting parameter settings that are likely to provide the most accurate forecasts (see Duncan col. 8, ln. 9-10).

Referring to Claim 9, the combination of Markey in view of Duncan teaches the system as described in claim 1. Markey further teaches:
wherein the segment valuation module is further implemented to output the generated data in a user interface (Markey, [0228]), “FIG. 40 depicts a data visualization embodiment presenting a summary of page visits by the number of impressions. The methods and system of the present invention may identify the conversion rates that different cohorts of consumers present…”; (Markey, [0226]-[0227]).

Referring to Claim 10, the combination of Markey in view of Duncan teaches the system as described in claim 1. Markey further teaches:
wherein the segment valuation module generates data describing the valuation of the segment of the user population based on the selected subset of attributes independent of use of attributes of the plurality of attributes that are not in the selected subset of attributes (Markey, [0229]), “Measured advertising campaign results, including results that are categorized by user, user groups, and the like, may be subsequently utilized by advertisers to modify advertising campaigns to maximize the effect of the advertisement messages on intended user and/or user group targets”; (Markey, [0231]-[0233]), “…several characteristics of media may be utilized to enable the creation of small segments that may contain anywhere from one or a plurality of individuals, all of whom may share one or more characteristics. Characteristics may include, but are not limited to, a time of day (e.g., the time of day that an advertisement is viewed), a geographic region, an individuals' interest in a type of content. Each characteristic, or combination of characteristics may be used to define and/or describe a set of individuals. Therefore, the characteristics (such as time of the day, day of the week, browser and operating system used, screen resolution, geographic region, and type of content/content category) may be used as targeting parameters. … Targeting parameters may vary among media channels in terms of nature of these channels… Moreover, the nature of these parameters may change… it may be possible to use a combination of multiple parameters (available to a channel) to name definite sections of the channel, irrespective of the channel being chosen by the advertiser. Also, channel sections may be small in some cases and describe few individuals, but may be defined nonetheless by using as many targeting parameters as possible”.

Referring to Claim 11, the combination of Markey in view of Duncan teaches the system as described in claim 1. Markey further teaches: 
wherein the metric is conversion defined using a conversion rate or a monetary amount (Markey, [0228]), “The methods and system of the present invention may identify the conversion rates that different cohorts of consumers present”; (Markey, [0362]-[0363]).

Referring to Claim 12, Markey teaches: 
In a digital medium environment to value a segment of a user population towards achieving a metric, a method implemented by at least one computing device, the method comprising (Markey, [0134]; [0139]; [0156]):
obtaining, by the at least one computing device, user interaction data describing the user population, a plurality of attributes associated with the user population, and achievement of the metric by the user population (Markey, [0142]-[0143]), “the consumer (i.e., the digital media user), and the message/advertisement may be used to predict the success of an advertisement based at least in part on specified key performance indicators 300. Contextual data may include data relating to the type of media, the time of day or week, or some other type of contextual data. Data relating to a consumer, or digital media user, may include demographics, geographic data, and data relating to consumer intent or behavior, or some other type of consumer data. Data relating to the message and/or advertisement may include data associated with the creative content of the message/advertisement, the intention or call to action embodied in the message/advertisement, or some other type of data… data may be collected based at least in part on the interactions of the plurality of digital media users and the selected advertising content”; (Markey [0088]; [0093]; [0231]); 

outputting, by the at least one computing device, the data describing the valuation of the segment of the user population in a user interface (Markey, [0228]), “FIG. 40 depicts a data visualization embodiment presenting a summary of page visits by the number of impressions. The methods and system of the present invention may identify the conversion rates that different cohorts of consumers present…”; (Markey, [0226]-[0227]).
Markey teaches an economic valuation model deployed by the real-time bidding machine facility 142 may be refined by the machine learning facility to evaluate information relating to one or more available placements to predict an economic valuation for each of the one or more placements. The learning machine facility 138 may obtain different types of data to refine the 
the training employing a penalty term using regularization to reduce overfitting or underfitting of the model to the user interaction data as part of balancing accuracy and complexity of the model respectively;
determining, by the at least one computing device, a significance of respective attributes of the plurality of attributes on the achievement of the metric based on the trained model; 
selecting, by the at least one computing device, a subset of attributes from the plurality of attributes, the selecting based on significance of respective ones of the plurality of attributes in the achievement of the metric based on the trained model; and
generating, by the at least one computing device, data describing a valuation of the segment based on inclusion of the selected subset of attributes in the segment and the determined significance. 
However, Duncan teaches: 
the training employing a penalty term using regularization to reduce overfitting or underfitting of the model to the user interaction data as part of balancing accuracy and complexity of the model respectively (Duncan, [col. 2, ln. 57-67]), “(Regularization parameters may be used, for example, to introduce additional information into a model for preventing or reducing over-fitting)… After the model family has been generated, one among them may be selected as the “best” (e.g., in terms of some metric indicative of the expected accuracy of its forecasts”; (Duncan, [col. 8, ln. 2-12]), “… Regularization-related parameters may differ from one model to the other of a family 250 in some embodiments. The number of model parameters 
determining, by the at least one computing device, a significance of respective attributes of the plurality of attributes on the achievement of the metric based on the trained model (Duncan, [col. 9, ln. 39-42]), “the SSR may be used to estimate one or more optimal parameter values for the corresponding FM, and the optimal parameter values may then be used to obtain the predictions”; 

generating, by the at least one computing device, data describing a valuation of the segment based on inclusion of the selected subset of attributes in the segment and the determined significance (Duncan, [col. 5, ln. 22-26]), “the optimal model from the family may then be selected based on a comparison of the summary quality metrics corresponding to each of the models in at least some embodiments. The optimal model may then be used for subsequent forecasting”; (Duncan, [col. 7, ln. 22-25]), “the cross-validation technique described herein may be used to select the combination of “order” and “seasonal” parameter values that is likely to result in the most accurate forecasts”; (Duncan, [col. 15, ln. 38-44]), “The SSR-based cross-validation approach towards forecasting model selection for time series data described above may be useful in a variety of scenarios. For example, the technique may be used for forecasting retail sales, for predicting demand for various commodities, for econometric problems, for weather forecasting, for medical diagnosis, and so on”.  
At the time the invention was filed, it would have been obvious to a person of ordinary skill in the art to have modified Markey to include the training, determining, selecting and generating limitations as taught by Duncan. The motivation for doing this would have been to improve the method of determining an anticipated economic valuation to service consumers in Markey (see par. 0095) to efficiently include the results of selecting parameter settings that are likely to provide the most accurate forecasts (see Duncan col. 8, ln. 9-10).

Referring to Claim 16, the combination of Markey in view of Duncan teaches the method as described in claim 12. Markey further teaches: 
wherein the metric defines conversion of a good or service by a respective said user of the user population (Markey, [0228]), “The methods and system of the present invention may identify the conversion rates that different cohorts of consumers present”; (Markey, [0362]-[0363]).

Referring to Claim 17, Markey teaches: 
In a digital medium environment to value a segment of a user population towards achieving a metric (Markey, [0134]; [0139]; [0156]), a system comprising:
means for obtaining user interaction data, the user interaction data describing the user population, a plurality of attributes associated with the user population, and achievement of the metric by the user population (Markey, [0142]-[0143]), “the consumer (i.e., the digital media user), and the message/advertisement may be used to predict the success of an advertisement based at least in part on specified key performance indicators 300. Contextual data may include data relating to the type of media, the time of day or week, or some other type of contextual data. Data relating to a consumer, or digital media user, may include demographics, geographic data, and data relating to consumer intent or behavior, or some other type of consumer data. Data relating to the message and/or advertisement may include data associated with the creative content of the message/advertisement, the intention or call to action embodied in the message/advertisement, or some other type of data… data may be collected based at least in part 
means for training a model using machine learning based on user interaction data to model achievement of the metric by the user population (Markey, [0098]), “Each of the algorithms may be trained on a training set of data, and then validated (measured) for predictiveness against the validation set of data… machine learning metrics… . It will be understood that general analytic methods, statistical techniques, and tools for evaluating competing algorithms and models, such as valuation models, as well as analytic methods, statistical techniques, and tools known to a person of ordinary skill in the art are intended to be encompassed by the present invention and may be used to evaluate competing algorithms and valuation models in accordance with the methods and systems of the present invention… how well it predicts the likelihood that showing a particular advertisement to a particular consumer in a particular context is likely to influence a consumer to engage in a desirable action, such as purchasing one of the advertiser's products, engaging with the advertiser product, affecting the consumer perception about the advertiser's product, visiting a web page, or taking some other kind of action which is valued by the advertiser”; (Markey, [0099]; [0152]; [0142]-[0143]).
Markey teaches an economic valuation model deployed by the real-time bidding machine facility 142 may be refined by the machine learning facility to evaluate information relating to one or more available placements to predict an economic valuation for each of the one or more placements. The learning machine facility 138 may obtain different types of data to refine the economic valuation model, including campaign and historic logs, an identifier for the user, the channel, time, price paid, ad message shown, and user resulting user actions, or some other type of campaign or historic log data (see par. 0187), but Markey does not explicitly teach: 
the training means employing a penalty term using regularization to reduce overfitting or underfitting of the model to the user interaction data as part of balancing accuracy and complexity of the model, respectively;
means for determining a significance of respective attributes of the plurality of attributes on the achievement of the metric based on the trained model;
means for selecting a subset of attributes from the plurality of attributes, the selecting based on the determined significance; and
means for generating data describing a valuation of the segment based on inclusion of the selected subset of attributes in the segment and the determined significance.
However Duncan teaches: 
the training means employing a penalty term using regularization to reduce overfitting or underfitting of the model to the user interaction data as part of balancing accuracy and complexity of the model, respectively (Duncan, [col. 2, ln. 57-67]), “(Regularization parameters may be used, for example, to introduce additional information into a model for preventing or reducing over-fitting)… After the model family has been generated, one among them may be selected as the “best” (e.g., in terms of some metric indicative of the expected accuracy of its forecasts”; (Duncan, [col. 8, ln. 2-12]), “… Regularization-related parameters may differ from one model to the other of a family 250 in some embodiments. The number of model parameters for which non-null values or non-default values have been selected may differ from one model of the family to another in some embodiments. As mentioned earlier, a plurality of related models, all using the same methodology but with different parameter settings, may be generated because it may not be straightforward to select the parameter settings that are likely to provide the most accurate forecasts for the TSS 210”; (Duncan, [col. 9, ln. 60-67, col. 10, ln. 1-15]), “If the QM 
means for determining a significance of respective attributes of the plurality of attributes on the achievement of the metric based on the trained model (Duncan, [col. 5, ln. 2-5]), “the SSR and the training variant may be used to obtain an estimate of an intermediary set of “optimal” parameter values, which can then be used with the forecasting model to obtain predictions for the test variant entries”; (Duncan, [col. 9, ln. 39-42]), “the SSR may be used to estimate one or more optimal parameter values for the corresponding FM, and the optimal parameter values may then be used to obtain the predictions”; (Duncan, [col. 6, ln. 7-13]), “The machine learning service may include numerous computation engines (e.g., physical and/or virtual machines), with each engine comprising one or more threads of execution. A large number of computation engines spread over numerous geographically-dispersed data centers may be used for machine learning tasks or statistical computing tasks in some provider networks”;   
means for selecting a subset of attributes from the plurality of attributes, the selecting based on the determined significance  (Duncan, [col. 9, ln. 39-42]), “the SSR may be used to estimate one or more optimal parameter values for the corresponding FM, and the optimal parameter values may then be used to obtain the predictions”; (Duncan, [col. 7, ln. 22-25]), “A member of a family of ARIMA models may differ from other members of the family in the specific combination of “order” and “seasonal” parameter settings, for example. In such a scenario, the cross-validation technique described herein may be used to select the combination 
means for generating data describing a valuation of the segment based on inclusion of the selected subset of attributes in the segment and the determined significance (Duncan, [col. 7, ln. 22-25]), “the cross-validation technique described herein may be used to select the combination of “order” and “seasonal” parameter values that is likely to result in the most accurate forecasts”; (Duncan, [col. 15, ln. 38-44]), “The SSR-based cross-validation approach towards forecasting model selection for time series data described above may be useful in a variety of scenarios. For example, the technique may be used for forecasting retail sales, for predicting demand for various commodities, for econometric problems, for weather forecasting, for medical diagnosis, and so on”; (Duncan, [col. 6, ln. 7-13]), “The machine learning service may include numerous computation engines (e.g., physical and/or virtual machines), with each engine comprising one or more threads of execution. A large number of computation engines spread over numerous geographically-dispersed data centers may be used for machine learning tasks or statistical computing tasks in some provider networks”; (Duncan, [col. 12, ln. 50]).  
At the time the invention was filed, it would have been obvious to a person of ordinary skill in the art to have modified Markey to include the training, determining, selecting and generating limitations as taught by Duncan. The motivation for doing this would have been to improve the method of determining an anticipated economic valuation to service consumers in Markey (see par. 0095) to efficiently include the results of selecting parameter settings that are likely to provide the most accurate forecasts (see Duncan col. 8, ln. 9-10).

Referring to Claim 18, the combination of Markey in view of Duncan teaches the system as described in claim 17. Markey further teaches: 
wherein the machine learning includes a random forest machine learning techniques or an adaptive boosting ensemble model (AdaBoost) (Markey, [0148]), “Examples of machine learning algorithms may include, but are not limited to, Naive Bayes, Bayes Net, Support Vector Machines, Logistic Regression, Neural Networks, and Decision Trees…”; (Markey, [0200]; [0237]).

Claims 6, 7, 14, 15 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Markey, et al., U.S. Publication No. 2014/0046777 [hereinafter Markey], in view of Duncan et al. U.S. Patent No. 10,318,874 [hereinafter Duncan], and further in view of Chittilappilly et al. U.S. Publication No. 2016/0210657 [hereinafter Chittilappilly].

Referring to Claim 6, the combination of Markey in view of Duncan teaches the system as described in claim 1. Markey teaches a score may exist for every consumer (see par. 0330) and grouping users based on shared attributes (see par. 0261), but the combination of Markey in view of Duncan does not explicitly teach:
wherein the attribute selection module selects the subset of attributes from the plurality of attributes based on a respective score generated for each attribute of the plurality of attributes regarding the significance in achieving the metric.
However Chittilappilly teaches: 
wherein the attribute selection module selects the subset of attributes from the plurality of attributes based on a respective score generated for each attribute of the plurality of attributes 
At the time the invention was filed, it would have been obvious to a person of ordinary skill in the art to have modified the combination of Markey in view of Duncan to include the select limitation as taught by Chittilappilly. The motivation for doing this would have been to improve the method of determining an anticipated economic valuation to service consumers in Markey (see par. 0095) to efficiently include the results of using statistically accurate user behavior predictions (see Chittilappilly par. 0096).

Referring to Claim 7, the combination of Markey in view of Duncan in view of Chittilappilly teaches the system as described in claim 6. Markey teaches a score may exist for every consumer (see par. 0330) and grouping users based on shared attributes (see par. 0261), but the combination of Markey in view of Duncan does not explicitly teach:
wherein the score defines a measure of accuracy in the significance in achieving the metric.
However Chittilappilly teaches: 
wherein the score defines a measure of accuracy in the significance in achieving the metric (Chittilappilly, [0092]), “…when a representative portion of the accessed sets of user interactions have been scored, a calculation determines how many of the representative portion of the accessed sets of user interactions have a score above a given threshold (see decision 528). If there is a sufficiently high likelihood of conversion with further stimuli (e.g., propensity score is greater than a threshold), more spend might be allocated to further stimuli (see the "Yes" branch of decision 528). If the likelihood of conversion is low (e.g., propensity score is lower than a threshold), then no spend might be allocated to additional stimuli (e.g., see "No" branch of decision 528 and step 530)”.
At the time the invention was filed, it would have been obvious to a person of ordinary skill in the art to have modified the combination of Markey in view of Duncan to include the score limitation as taught by Chittilappilly. The motivation for doing this would have been to improve the method of determining an anticipated economic valuation to service consumers in Markey (see par. 0095) to efficiently include the results of using statistically accurate user behavior predictions (see Chittilappilly par. 0096).

Referring to Claim 14, the combination of Markey in view of Duncan teaches the method as described in claim 12. Markey teaches a score may exist for every consumer (see par. 0330) and grouping users based on shared attributes (see par. 0261), but the combination of Markey in view of Duncan does not explicitly teach:
wherein the selecting is based on a respective score generated for each attribute of the plurality of attributes regarding the significance in achieving the metric.
However Chittilappilly teaches: 
wherein the selecting is based on a respective score generated for each attribute of the plurality of attributes regarding the significance in achieving the metric (Chittilappilly, [0092]-[0093]), “…when a representative portion of the accessed sets of user interactions have been scored, a calculation determines how many of the representative portion of the accessed sets of user interactions have a score above a given threshold (see decision 528). If there is a sufficiently high likelihood of conversion with further stimuli (e.g., propensity score is greater than a threshold), more spend might be allocated to further stimuli (see the "Yes" branch of decision 528). If the likelihood of conversion is low (e.g., propensity score is lower than a threshold), then no spend might be allocated to additional stimuli (e.g., see "No" branch of decision 528 and step 530)…In the event that there is a sufficiently high likelihood of conversion with further stimuli, a next stimulus and/or next set of stimuli can be selected for the subject users (see step 540). For example, the selected user stimuli might be identified based in part on the stimulus selection rules 187…”. 
At the time the invention was filed, it would have been obvious to a person of ordinary skill in the art to have modified the combination of Markey in view of Duncan to include the selecting limitation as taught by Chittilappilly. The motivation for doing this would have been to improve the method of determining an anticipated economic valuation to service consumers in Markey (see par. 0095) to efficiently include the results of using statistically accurate user behavior predictions (see Chittilappilly par. 0096).

Referring to Claim 15, the combination of Markey in view of Duncan in view of Chittilappilly teaches the method as described in claim 14. Markey teaches a score may exist for every consumer (see par. 0330) and grouping users based on shared attributes (see par. 0261), but the combination of Markey in view of Duncan does not explicitly teach:
wherein the score defines a measure of accuracy in the significance in achieving the metric.
However Chittilappilly teaches: 
wherein the score defines a measure of accuracy in the significance in achieving the metric (Chittilappilly, [0092]), “…when a representative portion of the accessed sets of user interactions have been scored, a calculation determines how many of the representative portion of the accessed sets of user interactions have a score above a given threshold (see decision 528). If there is a sufficiently high likelihood of conversion with further stimuli (e.g., propensity score is greater than a threshold), more spend might be allocated to further stimuli (see the "Yes" branch of decision 528). If the likelihood of conversion is low (e.g., propensity score is lower than a threshold), then no spend might be allocated to additional stimuli (e.g., see "No" branch of decision 528 and step 530)”.
At the time the invention was filed, it would have been obvious to a person of ordinary skill in the art to have modified the combination of Markey in view of Duncan to include the score limitation as taught by Chittilappilly. The motivation for doing this would have been to improve the method of determining an anticipated economic valuation to service consumers in Markey (see par. 0095) to efficiently include the results of using statistically accurate user behavior predictions (see Chittilappilly par. 0096).

Referring to Claim 20, the combination of Markey in view of Duncan teaches the system as described in claim 17. Markey teaches a score may exist for every consumer (see par. 0330) and grouping users based on shared attributes (see par. 0261), but the combination of Markey in view of Duncan does not explicitly teach:
wherein the selecting means is to select the subset of attributes from the plurality of attributes based on a respective score generated for each attribute of the plurality of attributes regarding the significance in achieving the metric, the score defining a measure of accuracy in the significance in achieving the metric.
However Chittilappilly teaches:
wherein the selecting means is to select the subset of attributes from the plurality of attributes based on a respective score generated for each attribute of the plurality of attributes regarding the significance in achieving the metric, the score defining a measure of accuracy in the significance in achieving the metric (Chittilappilly, [0092]-[0093]), “…when a representative portion of the accessed sets of user interactions have been scored, a calculation determines how many of the representative portion of the accessed sets of user interactions have a score above a given threshold (see decision 528). If there is a sufficiently high likelihood of conversion with further stimuli (e.g., propensity score is greater than a threshold), more spend might be allocated to further stimuli (see the "Yes" branch of decision 528). If the likelihood of conversion is low (e.g., propensity score is lower than a threshold), then no spend might be allocated to additional stimuli (e.g., see "No" branch of decision 528 and step 530)…In the event that there is a sufficiently high likelihood of conversion with further stimuli, a next stimulus and/or next set of stimuli can be selected for the subject users (see step 540). For example, the selected user stimuli might be identified based in part on the stimulus selection rules 187…”. 

Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Markey, et al., U.S. Publication No. 2014/0046777 [hereinafter Markey], in view of Duncan et al. U.S. Patent No. 10,318,874 [hereinafter Duncan], in view of Chittilappilly et al. U.S. Publication No. 2016/0210657 [hereinafter Chittilappilly], and further in view of Lee et al. U.S. Publication No. 2015/0379429 [hereinafter Lee].

Referring to Claim 8, the combination of Markey in view of Duncan in view of Chittilappilly teaches the system as described in claim 7. Markey teaches a score may exist for every consumer (see par. 0330) and grouping users based on shared attributes (see par. 0261), but the combination of Markey in view of Duncan in view of Chittilappilly does not explicitly teach:
wherein the score is an F-score that describes significance of respective said attributes towards the achievement of the metric.
However Lee teaches:
wherein the score is an F-score that describes significance of respective said attributes towards the achievement of the metric (Lee, [0246]), “operations that may be performed at a 
At the time the invention was filed, it would have been obvious to a person of ordinary skill in the art to have modified the combination of Markey in view of Duncan in view of Chittilappilly to include the score limitation as taught by Lee. The motivation for doing this would have been to improve the method of determining an anticipated economic valuation to service consumers in Markey (see par. 0095) to efficiently include the results of improving the quality of predictions made by a machine learning model (see Lee par. 0179).

Claim 21 is rejected under 35 U.S.C. 103 as being unpatentable over Markey, et al., U.S. Publication No. 2014/0046777 [hereinafter Markey], in view of Duncan et al. U.S. Patent No. 10,318,874 [hereinafter Duncan], and further in view of Achin et al. U.S. Publication No. 2018/0060738 [hereinafter Achin].

Referring to Claim 21, the combination of Markey in view of Duncan teaches the method as described in claim 12. Markey teaches a score may exist for every consumer (see par. 0330) and grouping users based on shared attributes (see par. 0261), but the combination of Markey in view of Duncan does not explicitly teach:
wherein the output data in a user interface includes a ranked list of attributes and corresponding metric significance scores.
However Achin teaches: 
wherein the output data in a user interface includes a ranked list of attributes and corresponding metric significance scores (Achin, [0319]), “the system 100 may present (e.g., display) an evaluation of the dataset to the user (e.g., at step 410 of the method 400), and the presented evaluation may include the predictive values of the dataset's features and/or information derived therefrom. For example, for one or more modeling procedures or models, the system 100 may (1) identify "more important" and/or "less important features", (2) display the predictive values of the features, (3) rank the features by their predictive values, and/or (4) recommend that collection of less important features be halted and/or that less important features be removed from the dataset…”.
At the time the invention was filed, it would have been obvious to a person of ordinary skill in the art to have modified the combination of Markey in view of Duncan to include the output limitation as taught by Achin. The motivation for doing this would have been to improve the method of determining an anticipated economic valuation to service consumers in Markey (see par. 0095) to efficiently include the results of identifying variables that are likely to have significant predictive value (see Achin par. 0111).

Claim 22 is rejected under 35 U.S.C. 103 as being unpatentable over Markey, et al., U.S. Publication No. 2014/0046777 [hereinafter Markey], in view of Duncan et al. U.S. Patent No. 10,318,874 [hereinafter Duncan], and further in view of Nagano et al. U.S. Publication No. 2013/0006991 [hereinafter Nagano].

Referring to Claim 22, the combination of Markey in view of Duncan teaches the system as described in claim 1. Markey teaches KPIs, each with a specific weight (see par. 0298), but Markey does not explicitly teach:
wherein the attribute significance determination module is further configured to adjust a weighting of each of the plurality of attributes in the model to determine a relative effect of each of the attributes on the achievement of the metric.
However Nagano teaches: 
wherein the attribute significance determination module is further configured to adjust a weighting of each of the plurality of attributes in the model to determine a relative effect of each of the attributes on the achievement of the metric (Nagano, [0047]), “all features do not necessarily have equal importance to represent a certain piece of content…the present invention, the weight wk of each of the features Xi,k is learned to show the degree of contribution to the degree of subjective similarity”; (Nagano, [0121]-[0122]). 
At the time the invention was filed, it would have been obvious to a person of ordinary skill in the art to have modified the weights in Markey to include the adjustment limitation as taught by Nagano. The motivation for doing this would have been to improve the method of determining an anticipated economic valuation to service consumers in Markey (see par. 0095).

Claim 23 is rejected under 35 U.S.C. 103 as being unpatentable over Markey, et al., U.S. Publication No. 2014/0046777 [hereinafter Markey], in view of Duncan et al. U.S. Patent No. 10,318,874 [hereinafter Duncan], and further in view of Stajner U.S. Publication No. 2017/0293857 [hereinafter Stajner].

Referring to Claim 23, the combination of Markey in view of Duncan teaches the system as described in claim 17. Markey teaches an economic valuation model deployed by the real-time bidding machine facility may be refined by the machine learning facility to evaluate information relating to one or more available placements to predict an economic valuation for each of the one or more placements. The learning machine facility may obtain different types of data to refine the economic valuation model, including campaign and historic logs, an identifier for the user, the channel, time, price paid, ad message shown, and user resulting user actions, or some other type of campaign or historic log data (see par. 0187) and Duncan teaches regularization parameters (see col. 2, ln. 54-59), but the combination of Markey in view of Duncan does not explicitly teach: 
wherein the penalty term is adjusted automatically through successive training iterations using different portions of the user interaction data.
However Stajner teaches: 
wherein the penalty term is adjusted automatically through successive training iterations using different portions of the user interaction data (Stajner, [0037]), “… the model parameter 
At the time the invention was filed, it would have been obvious to a person of ordinary skill in the art to have modified the combination of Markey in view of Duncan to include the adjustment limitation as taught by Stajner. The motivation for doing this would have been to improve the method of determining an anticipated economic valuation to service consumers in Markey (see par. 0095) to efficiently include the results of providing distributed online learning for personalized predictive models (see Stajner par. 0028).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Doshi et al. (US 20180024859 A1) – Equations may be regularized to reduce the risk of over-fitting. Ridge regression techniques may be utilized to prevent over-fitting of curves, by adjusting coefficients in the Nth degree polynomial. A gradient descent technique may be implemented for several iterations until the equations stabilize in order to ensure the correct minimum is obtained and the cost function is minimized. An appropriate degree (2nd degree) of the features allows the regional boundaries represented by the work classification model equations to be curves rather than straight lines. This may enable more accurate representation of a region shape and is highly suitable for discrete classification.

Chidlovskii et al. (US 8386574 B2) – The effect of selecting different values of γ. Because of the different nature of the features used in the document representation based on the implicit social network, less tuning is needed. The feature values are fixed. No evaluation was made of the effect of reducing the number of features. For the Gaussian kernel, different values of the γ parameter, which controls the smoothness of the decision boundary, were evaluated. The optimal value of γ for the data set, from those tested, was found to be 0.1. Significantly larger values tend to lead to under-fitting: large steps in the convergence. Significantly smaller values also tend to lead to under-fitting: giving good performance until a certain point, with erratic behavior thereafter.

Chapelle (US 20090089274 A1) – Methods, systems, and apparatuses for generating relevance functions for ranking documents obtained in searches are provided. One or more features to be used as predictor variables in the construction of a relevance function are determined. The relevance function is parameterized by one or more coefficients. A query error is defined that measures a difference between a relevance ranking generated by the relevance function and a training set.

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Crystol Stewart whose telephone number is (571)272-1691.  The examiner can normally be reached on 9:00am-5:00pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is 
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Patricia Munson, can be reached on (571)270-5396.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/CRYSTOL STEWART/Primary Examiner, Art Unit 3624