DETAILED ACTION
This Final Office Action is responsive to Applicant’s Amendment filed on 05 Feb 2021 in which claims 1, 9, and 22-24 were amended and claims 17-20 were canceled.
Claims 1-16 and 21-24 are currently pending and under examination, of which claims 1, 9, and 22 are independent claims. No claims are currently in condition for allowance.
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Remarks
Applicant’s remarks dated 02/05/2021 regarding the prior art have been considered, but they are moot in view of the new grounds of rejection necessitated by Applicant’s amendments. The Examiner has updated the search and consideration in view of the present status of the claims. Additional pertinent art is cited at the conclusion of this action; see Malhotra as relevant patent-pending literature.

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) because the claim limitations use a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.  Such claim 
Because these claim limitations are being interpreted under 35 U.S.C. 112(f) they are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have these limitations interpreted under 35 U.S.C. 112(f), applicant may:  (1) amend the claim limitations to avoid them being interpreted under 35 U.S.C. 112(f) (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitations recite sufficient structure to perform the claimed function so as to avoid them being interpreted under 35 U.S.C. 112(f).
To the extent that the claims use the word “or”, it is interpreted as requiring one but not both of the elements.
To the extent that the claims use the word “describes”, it is interpreted as non-functional descriptive matter; see MPEP 2111.05.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-6, 8, and 21-24 are rejected under 35 U.S.C. 103 as being unpatentable over: 
Sifa et al., “Predicting Purchase Decisions in Mobile Free-to-Play Games”, hereinafter Sifa, in view of 
Loshchilov and Hutter, “Online Batch Selection for Faster Training of Neural Networks”, hereinafter Loshchilov, in view of 
Peng, Yuxin, “Adaptive Sampling with Optimal Cost for Class-Imbalance Learning”, hereinafter Peng.
With respect to claim 1, Sifa teaches: 
	In a digital medium environment to predict user interaction with subsequent digital content {Sifa [P.81 ¶2] “our main goal is to predict whether the player is going to purchase an in-game item in the future” by [P.82 ¶6] “learning a model that predicts the average number of future purchases”}, a method implemented by at least one computing device {Sifa [P.81 ¶4] “method to handle learning tasks in imbalanced datasets” where [P.80 ¶7] “player starts the game client on their mobile device”} comprising: 
	receiving training data, by the at least one computing device, that describes a first class of users having interactions with digital content and a second class of users that did not interact with the digital content {Sifa [P.83 ¶4] “Given our dataset with a player’s sessions and purchase, we construct a training dataset akin to the one used for the classification task” is training whereby a first and second class of users corresponds to [P.79 ¶1] “imbalance in between players that spend money, which we will refer to as premium players, and those who do not, here referred to as non-spending players”}; 
predicting a likelihood of interaction with the subsequent digital content using the trained model by the computing device {Sifa [P.83] Tbl. 3 values for “predicted number of future purchases” and/or [P.84 ¶2] “reduce the prediction of a mean λ to a binary” performed as [P.82] “Regression Task: learning a model that predicts the average number of future purchases” is learned probabilistic model, trained on [P.80] “Tbl. 1 describes the processed dataset used for training and testing the prediction models”}.
	Sifa further discloses [P.81 Last¶] “grid search over their parameters”, Figs 1-2.
	However, Sifa does not teach training comprising a “subset parameter” and “batch parameter”, which is disclosed by Loshchilov:
	obtaining, by the at least one computing device, a subset parameter and a batch parameter, the subset parameter is set using machine learning to indicate a total number of samples from a first class sample subset and a second class sample subset to be included in a subset of the training data and the batch parameter set using machine learning to indicate a number of samples to be included in a batch taken from the subset of the training data {Loshchilov [Sect.4 P.3-5] “The procedure we propose for batch selection is given in Algorithm 3” details online batch selection whereby “We propose to online select batches (both size and datapoints) for an algorithm A to maximize its progress (e.g., defined by the objective function f or cross-validation error) over the resource budget (e.g., the number of evaluated datapoints or time)”. Parameters are identified in Algorithm 3 at Lines1-2 which include “batch size b” is batch parameter, and a subset parameter is “number of datapoints N” as computed over resource budget, i.e., “the number of available training datapoints is bounded N” and/or “Neff denote the number of unique datapoints selected among the last N selections”. The effective result is to train a model sampled according to probability of i-th datapoint ranked w.r.t. loss. See also “recompute loss values for top rratio*N datapoints” and “the whole set will be enumerated in N/b steps (denoted here as one epoch)”}; 
	training a model using machine learning {Loshchilov [Title] “training of neural networks”} by the at least one computing device {Loshchilov [P.12 Sect10.2 Last¶] “compute on a GPU”}, the training including: 
selecting a batch from the subset of the training data having the number of samples specified by the batch parameter {Loshchilov [P.4] Algorithm 3 details “Batch Selection Procedure SelectBatch()”. Further, [P.4 ¶2] “batch selection is given in Algorithm 3, and we can integrate it into AdaDelta and Adam by calling it in line 5 of Algorithms 1 and 2”}; and 
		processing the selected batch using machine learning by the at least one computing device to train the model {Loshchilov [P.3 Sect.4 ¶1] “∇ft(xt) is computed w.r.t. a batch”, i.e., a gradient calculation used for model training}; and 
	Both Sifa and Loshchilov are directed to predictive modeling with sample optimization thus being analogous. A person having ordinary skill in the art would have considered it obvious prior to the effective filing date to utilize the online batch selection of Loshchilov in combination with Sifa in order to “control the sources of this information, the batch itself. More specifically, we suggest to more frequently select training datapoints with the greatest contributions to the objective function… demonstrated that online batch selection speeds up the convergence of the state-of-the-art DNN training methods AdaDelta and Adam by a factor of about 5” (Loshchilov [P.9 Sect.9 ¶1-2]).
However, the combination of Sifa and Loshchilov does not expressly teach subsets being “added”.
	This is cured by Peng:
		generating the subset of the training data by: 
			generating the first class sample subset having a number of samples taken from the first class {Peng [P.2923] Algorithm 1 details “training data set labeled as positive set P… divide P into P1 and P2” class label positive is first. Further, [P.2923 Part3] “For each positive example pi in P, calculate npi as the number of samples that needs to be generated”}; and
			generating the second class sample subset having a number of samples sampled from the second class that is added to the number of samples in the first class sample subset to have the number of samples specified by the subset parameter {Peng [P.2923] Algorithm 1 details “training data set labeled as …negative set N… N into N1, N2” class label negative is second. Further, “P1+N1” or “P2+N2” is adding first/second or pos/neg. Parameterization of subset is by weighting-based sampling (Wp, Wn) with sub-classifiers model1 & model2 to consider sample importance whereby [P.2923 Last¶] “each sub-classifierij, which uses (j+1)P+Ni as training data set”. Additional teaching includes under-sampling of negative, over-sampling of positive, modified smote, and weighted fusion set according to accuracy. The process as a whole is described per [Abstract] “different sub-classifiers by different subsets of training data with the best cost ratio adaptively chosen”};  
Peng, published in AAAI like Sifa, is directed to predictive model training for optimized sampling thus being analogous. A person having ordinary skill in the art would have considered it obvious prior to the effective filing date to add subsets of training as in Peng in combination with Sifa and Loshchilov to arrive at the invention as claimed as applying a known technique to a known method to yield predictable results and/or in order to address “the main problems of the existing methods are: (1) In most existing data-level methods, the degree of re-sampling, which is a key factor that affects greatly the performance, needs to be pre-fixed… To address the above issues, a novel approach of adaptively sampling with optimal cost is proposed for class-imbalance learning” (Peng [P.2922 ¶2-3]).
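For illustration only, the subset-generation and batch-selection limitations mapped above can be sketched as follows. This is a minimal sketch under stated assumptions, not the method of Sifa, Loshchilov, or Peng: the function names `build_subset` and `select_batch`, the parameter names, and the use of uniform random sampling are all hypothetical (Loshchilov selects by loss rank and Peng uses weighting-based sampling).

```python
import random


def build_subset(first_class, second_class, n_first, subset_size, rng):
    """Take n_first samples from the first class, then add second-class
    samples until the subset holds subset_size samples in total."""
    if n_first > subset_size:
        raise ValueError("first-class subset cannot exceed the subset parameter")
    first_subset = rng.sample(first_class, n_first)
    # The second-class count is whatever remains to reach the subset parameter.
    second_subset = rng.sample(second_class, subset_size - n_first)
    return first_subset + second_subset


def select_batch(subset, batch_size, rng):
    """Draw one batch of batch_size samples from the subset (uniform, assumed)."""
    return rng.sample(subset, batch_size)


rng = random.Random(0)
positives = [("pos", i) for i in range(20)]    # e.g. users who interacted
negatives = [("neg", i) for i in range(200)]   # e.g. users who did not

subset = build_subset(positives, negatives, n_first=10, subset_size=50, rng=rng)
batch = select_batch(subset, batch_size=8, rng=rng)
```

Here the subset parameter (50) fixes the combined size of both class sample subsets, and the batch parameter (8) fixes how many samples are drawn from that subset per training step.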

With respect to claim 2, the combination of Sifa, Loshchilov, and Peng teaches the method as described in claim 1, wherein
	the machine learning is performed by the at least one computing device using a neural network {Loshchilov [P.1 Sect.1 ¶1] “Deep neural networks (DNNs)… on high-performance GPUs”}. One having ordinary skill in the art would have considered it obvious prior to the effective filing date to implement the machine learning according to the neural network disclosed by Loshchilov as obvious to try from among a finite number of ways to implement a machine learning model with reasonable expectation of success and/or because “(DNNs) are the best-performing method for many classification problems” (Loshchilov [P.1 Sect.1 ¶1]).

With respect to claim 3, the combination of Sifa, Loshchilov, and Peng teaches the method as described in claim 1, wherein
	the batch is iteratively selected and iteratively processed using stochastic gradient descent {Loshchilov [P.4] Algorithm 3 details “Batch Selection Procedure SelectBatch()” with batch parameter “batch size b”. Further, [P.4 ¶2] “batch selection is given in Algorithm 3, and we can integrate it into AdaDelta and Adam by calling it in line 5 of Algorithms 1 and 2”. The call of SelectBatch() in Algs 1 and 2 [P.2-3] is nested within “repeat” loop over time steps, until stopping criterion is met. This is under the heading of [Sect.2] “Stochastic Gradient Descent (SGD)” where gradient is updated algorithmically at ∇ft(xt-1). The process as a whole is online. See also [P.2 Sect.2 Last¶] “we argue that not only the update of xt given ∇ft(xt) is crucial, but the selection of batch {ψbi=1}~Db used to compute ∇ft(xt) also greatly contributes to the overall performance” and [P.6] “epoch index e”}.
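As an illustrative aid, iterative batch selection inside a stochastic gradient descent loop, with datapoints ranked by loss, can be sketched as below. This is a hedged sketch and not a reproduction of Loshchilov's Algorithms 1-3: the exponential rank-decay form, the parameter name `pressure`, the toy one-parameter regression, and recomputing every loss at every step are all assumptions made for brevity.

```python
import math
import random


def rank_probabilities(n, pressure):
    """Selection probability per rank, decaying exponentially with rank
    (rank 0 = highest loss). The decay form is an assumption."""
    raw = [math.exp(-math.log(pressure) * i / n) for i in range(n)]
    total = sum(raw)
    return [r / total for r in raw]


def select_batch(losses, batch_size, pressure, rng):
    """Return indices of a batch, favouring datapoints with larger loss."""
    order = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    probs = rank_probabilities(len(losses), pressure)
    picked_ranks = rng.choices(range(len(order)), weights=probs, k=batch_size)
    return [order[r] for r in picked_ranks]


# Toy problem: fit y = w * x with true w = 3.0.
rng = random.Random(0)
xs = [i / 10 for i in range(1, 41)]
ys = [3.0 * x for x in xs]
w, learning_rate = 0.0, 0.01

for step in range(200):  # the "repeat" loop over time steps
    losses = [(w * x - y) ** 2 for x, y in zip(xs, ys)]
    batch = select_batch(losses, batch_size=8, pressure=100.0, rng=rng)
    grad = sum(2 * (w * xs[i] - ys[i]) * xs[i] for i in batch) / len(batch)
    w -= learning_rate * grad  # gradient step computed w.r.t. the batch
```

Each pass of the loop selects a fresh batch and applies one gradient update, which is the sense in which the batch is both iteratively selected and iteratively processed.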

With respect to claim 4, the combination of Sifa, Loshchilov, and Peng teaches the method as described in claim 1, wherein
	the training data is sampled iteratively by the at least one computing device as part of the training {Peng [P.2923 Alg1 Step4] “Calculate the weight in iteration t” weight calc is training, iterated for positive and negative samples}.

With respect to claim 5, the combination of Sifa, Loshchilov, and Peng teaches the method as described in claim 1, wherein:
the first class is a positive class that describes the user interactions with the digital content that involve selection of the digital content or conversion of a related good or service; the second class is a negative class that describes users that did not select the digital content or convert the related good or service; and the first class has a fewer number of members than the second class {Sifa [P.79] “imbalance in between players that spend money, which we will refer to as premium players, and those who do not, here referred to as non-spending players” premium player is positive (purchasing) class and non-spending player is negative (non-purchasing) class, and the classes are imbalanced where [P.84 LeftCol] “the number of premium users is low”}.  

With respect to claim 6, the combination of Sifa, Loshchilov, and Peng teaches the method as described in claim 1, wherein
the training data is sampled based on a parameter that sets a sample size for the first or second classes {Peng [P.2923 Alg1 Step4] “Wp(pi)” and “Wn(ni)” W is weight parameter for samples of pos(p) and neg(n). Weighting distribution for sample size is processed by under-sample, over-sample, modified smote [P.2923 ¶2-4]. Additionally, [P.2924 Alg2] BestCostRatio balances size}. 

With respect to claim 8, the combination of Sifa, Loshchilov, and Peng teaches the method as described in claim 1, further comprising
controlling exposure of the subsequent digital content, by the at least one computing device, based at least in part on the predicted likelihood of interaction {Sifa [P.82] “exposure to purchasing is a strong indicator of purchase activity in the future”; [P.84] “players are partitioned into groups based on their expected number of future purchases” so as “to strengthen revenue streams from players through smart CRM and to increase engagement via tailored experiences”}. 

With respect to claim 21, the combination of Sifa, Loshchilov, and Peng teaches the method as described in claim 1, wherein 
	the subset parameter and the batch parameter are set using machine learning through use of a grid-search procedure to set an optimum combination of the subset parameter and the batch parameter that leads to a best result in the training of the model to predict the likelihood.
	Sifa discloses [P.81 Last¶] “grid search over their parameters” but does not disclose the parameters specified as subset and batch. Loshchilov discloses parameters for batch and subset as in claim 1. Finally, Peng discloses [P.2922 ¶3,6] “combining these sub-classifiers according to their accuracy… adaptively select the best cost ratio” where Algorithm2 details BestCostRatio for imbalance [P.2924 Alg2] and “final strong-classifier is defined as the weighted fusion of the above m weak-classifiers, the weights of which are set according to their accuracy”. A person having ordinary skill in the art would have considered it obvious prior to the effective filing date to implement the grid search disclosed by Sifa with parameters of Loshchilov so as to balance pos/neg as disclosed by Peng to arrive at the invention as claimed. Doing so would improve accuracy of a training set to optimize classifier performance, “best cost ratio is selected which achieves the highest accuracy… if accuracyk > BestAccuracy then” (Peng [P.2924], [Alg2 Line5]).
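For illustration, a grid search over a subset parameter and a batch parameter can be sketched as follows. This is a minimal sketch under assumptions: `grid_search`, `toy_score`, and the candidate values are hypothetical, and `toy_score` is a placeholder standing in for a real train-and-validate routine; it does not come from Sifa, Loshchilov, or Peng.

```python
from itertools import product


def grid_search(subset_sizes, batch_sizes, evaluate):
    """Try every (subset_size, batch_size) pair and keep the best-scoring one."""
    best_score, best_params = float("-inf"), None
    for subset_size, batch_size in product(subset_sizes, batch_sizes):
        if batch_size > subset_size:
            continue  # a batch cannot be larger than the subset it is drawn from
        score = evaluate(subset_size, batch_size)
        if score > best_score:
            best_score, best_params = score, (subset_size, batch_size)
    return best_params, best_score


def toy_score(subset_size, batch_size):
    # Placeholder for validation accuracy of a model trained with these
    # parameters; the shape of this function is invented for the example.
    return -abs(subset_size - 300) / 300 - abs(batch_size - 32) / 32


best_params, best_score = grid_search([100, 300, 500], [16, 32, 64], toy_score)
```

In practice `evaluate` would train the model with the given pair and return a held-out metric, so the pair returned is the combination that gave the best training result on that metric.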

With respect to claim 22, Sifa teaches: 
In a digital medium environment to predict user interaction with subsequent digital content {Sifa [P.81 ¶2] “our main goal is to predict whether the player is going to purchase an in-game item in the future” by [P.82 ¶6] “learning a model that predicts the average number of future purchases”}, a method implemented by at least one computing device {Sifa [P.81 ¶4] “method to handle learning tasks in imbalanced datasets” where [P.80 ¶7] “player starts the game client on their mobile device”} comprising: 
receiving training data, by the at least one computing device, that describes a first class of users having interactions with digital content and a second class of users that did not interact with the digital content {Sifa [P.83 ¶4] “Given our dataset with a player’s sessions and purchase, we construct a training dataset akin to the one used for the classification task” is training whereby a first and second class of users corresponds to [P.79 ¶1] “imbalance in between players that spend money, which we will refer to as premium players, and those who do not, here referred to as non-spending players”}; 
training a model using machine learning by the at least one computing device, the training including generating a training sample {Sifa [P.80 Last¶] “training and testing the prediction models” with “extracted a random sample from a week of new installs”} by: 
predicting a likelihood of interaction with the subsequent digital content by using the trained model by the computing device {Sifa [P.83] Tbl. 3 values for “predicted number of future purchases” and/or [P.84 ¶2] “reduce the prediction of a mean λ to a binary” performed as [P.82] “Regression Task: learning a model that predicts the average number of future purchases” is learned probabilistic model, trained on [P.80] “Tbl. 1 describes the processed dataset used for training and testing the prediction models”}.
Sifa further discloses [P.81 Last¶] “grid search over their parameters” and [P.81 ¶3 or P.82 ¶2] “oversampling”.
However, Sifa does not expressly disclose a “subset parameter indicating a total number of samples from a first class sample subset and a second class sample subset”, which is cured by Peng:
obtaining a subset parameter set using machine learning, the subset parameter indicating a total number of samples from a first class sample subset and a second class sample subset to be included in a subset of the training data {Peng [P.2923] Algorithm 1 details total number of samples from first/second or positive/negative classes as “P1+N1” or “P2+N2”. Parameterization of subset is by weighting-based sampling (Wp, Wn) through sub-classifiers model1 & model2 to consider sample importance for imbalanced class problem, [P.2922-23]. The process as a whole is described per [Abstract] “different sub-classifiers by different subsets of training data with the best cost ratio adaptively chosen”}; 
generating the first class sample subset from the first class {Peng [P.2923] Algorithm 1 details “training data set labeled as positive set P… divide P into P1 and P2” class label positive is first class}; 
upsampling the first class sample subset to have a number of samples based on an upsampling parameter {Peng [P.2923 Part3] “Wp(pi)… weighting-based oversampling” upsampling is oversampling, and weight W is parameter. Further, “For each positive example pi in P, calculate npi as the number of samples that needs to be generated”}; 
generating the subset of the training data as having the upsampled first class sample subset and the second class sample subset having a number of samples sampled from the second class such that the subset of training data reaches the subset parameter {Peng [P.2923 Last¶ - P.2924] “After re-sampling the data set, the adaptive cost-sensitive learning method is proposed to train classifiers. For each sub-classifierij, which uses (j+1)P+Ni as training data set, the ‘best’ cost ratio C+/C- of the penalty constants C+ and C- is adaptively selected for positive and negative samples, so the ‘best’ model for sub-classifierij is constructed… The final strong-classifier is defined as the weighted fusion of the above m weak-classifiers, the weights of which are set according to their accuracy”}; 
Peng, published in AAAI like Sifa, is directed to predictive model training for optimized sampling thus being analogous. A person having ordinary skill in the art would have considered it obvious prior to the effective filing date to add subsets of training disclosed by Peng in combination with Sifa to arrive at the invention as claimed as applying a known technique to a known method to yield predictable results and/or in order to address “the main problems of the existing methods are: (1) In most existing data-level methods, the degree of re-sampling, which is a key factor that affects greatly the performance, needs to be pre-fixed… To address the above issues, a novel approach of adaptively sampling with optimal cost is proposed for class-imbalance learning” (Peng [P.2922 ¶2-3]).
However, the combination of Sifa and Peng does not expressly disclose a “batch parameter”, which is disclosed by Loshchilov: 
obtaining a batch parameter, the batch parameter set using machine learning to indicate a number of samples to be included in a batch taken from the subset of the training data {Loshchilov [Sect.4 P.3-5] “The procedure we propose for batch selection is given in Algorithm 3” details online batch selection comprising parameters identified in Algorithm 3 at Lines1-2 which include “batch size b” is batch parameter}; 
selecting a batch from the subset of the training data based on the batch parameter {Loshchilov [P.4] Algorithm 3 details “Batch Selection Procedure SelectBatch()”. Further, [P.4 ¶2] “batch selection is given in Algorithm 3, and we can integrate it into AdaDelta and Adam by calling it in line 5 of Algorithms 1 and 2”}; and 
processing the selected batch using machine learning by the at least one computing device to train the model {Loshchilov [P.3 Sect.4 ¶1] “∇ft(xt) is computed w.r.t. a batch”, i.e., a gradient calculation used for model training}; and 
	Loshchilov is directed to predictive modeling with sample optimization thus being analogous. A person having ordinary skill in the art would have considered it obvious prior to the effective filing date to utilize the online batch selection of Loshchilov in combination with Sifa and Peng in order to “control the sources of this information, the batch itself. More specifically, we suggest to more frequently select training datapoints with the greatest contributions to the objective function… demonstrated that online batch selection speeds up the convergence of the state-of-the-art DNN training methods AdaDelta and Adam by a factor of about 5” (Loshchilov [P.9 Sect.9 ¶1-2]).
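For illustration only, the upsampling and subset-generation limitations of claim 22 can be sketched as below. This is a hedged sketch: the names `upsample` and `build_upsampled_subset`, and upsampling by random replication with replacement, are assumptions; Peng instead describes weighting-based oversampling and a modified SMOTE, which are not reproduced here.

```python
import random


def upsample(samples, target, rng):
    """Grow samples to target items by re-drawing existing items with replacement."""
    if target < len(samples):
        raise ValueError("upsampling target is smaller than the input")
    extra = rng.choices(samples, k=target - len(samples))
    return list(samples) + extra


def build_upsampled_subset(first_class, second_class, n_first,
                           upsample_to, subset_size, rng):
    """Upsample a first-class sample subset, then add second-class samples
    until the subset reaches the subset parameter."""
    first_subset = rng.sample(first_class, n_first)
    upsampled = upsample(first_subset, upsample_to, rng)
    if len(upsampled) > subset_size:
        raise ValueError("upsampled first class exceeds the subset parameter")
    second_subset = rng.sample(second_class, subset_size - len(upsampled))
    return upsampled + second_subset


rng = random.Random(0)
positives = [("pos", i) for i in range(20)]
negatives = [("neg", i) for i in range(200)]

subset = build_upsampled_subset(positives, negatives, n_first=10,
                                upsample_to=30, subset_size=60, rng=rng)
```

With these example values the minority class contributes 30 entries drawn from 10 distinct samples, and 30 majority-class samples fill the remainder of the 60-sample subset.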

With respect to claim 23, the combination of Sifa, Loshchilov, and Peng teaches the method as described in claim 22, wherein 
	the subset parameter and the batch parameter are set using machine learning through use of a grid-search procedure. Sifa discloses [P.81 Last¶] “grid search over their parameters” but does not disclose the parameters specified as subset and batch. These elements are disclosed by Loshchilov and Peng as detailed in claim 22. A person having ordinary skill in the art would have considered it obvious prior to the effective filing date to perform the grid search disclosed by Sifa over the parameters of Loshchilov and Peng as applying a known technique to a known method to yield predictable results and/or to update training with the most accurate weighted parameters. 

With respect to claim 24, the combination of Sifa, Loshchilov, and Peng teaches the method as described in claim 22, wherein 
	the subset parameter and the batch parameter are set using machine learning as a part of a technique to set an optimum combination of the subset parameter and the batch parameter that leads to a best result in the training of the model to predict the likelihood.
Peng discloses [P.2922 ¶3,6] “combining these sub-classifiers according to their accuracy… adaptively select the best cost ratio” where Algorithm2 details BestCostRatio for imbalance [P.2924 Alg2] and “final strong-classifier is defined as the weighted fusion of the above m weak-classifiers, the weights of which are set according to their accuracy”. A person having ordinary skill in the art would have considered it obvious prior to the effective filing date to implement the cost-sensitive learning with weighted fusion set to accuracy as disclosed by Peng in arriving at an optimum combination of parameters. Doing so would improve accuracy of a training set to optimize classifier performance, “best cost ratio is selected which achieves the highest accuracy… if accuracyk > BestAccuracy then” (Peng [P.2924], [Alg2 Line5]).

Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Sifa, Loshchilov, and Peng in view of 
Szegedy et al., US Patent No 9129228B1, hereinafter Szegedy.
With respect to claim 6, the combination of Sifa, Loshchilov, and Peng teaches the method as described in claim 1. Szegedy teaches wherein
the training data is sampled based on a parameter that sets a sample size for the first or second classes {Szegedy Fig 2 “Select a set of optimization parameters including a sample size… Sample a subset of training data based on the selected sample size”}. 
Szegedy is directed to training of machine learning models with sample optimization thus being analogous. A person having ordinary skill in the art would have considered it obvious prior to the effective filing date to use the subset sample size parameterization of Szegedy for the first or second classes of Sifa in order to optimize model fitting or tune the parameters (Szegedy [Col1 Lines14-27]).

Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Sifa, Loshchilov, and Peng in view of 
Giannini et al., “Purchase Likelihood Prediction for Targeted Organic Food Marketing Campaigns in China”, hereinafter Giannini.
With respect to claim 7, the combination of Sifa, Loshchilov, and Peng teaches the method as described in claim 1, wherein
the predicted likelihood of interaction includes a probability that the subsequent digital content configured as an advertisement will be selected by the user or a probability of a likelihood of conversion of a related good or service by the user {Sifa [P.83] Tbl. 3 values detail “predicted number of future purchases” for [P.82] “Regression Task: learning a model that predicts the average number of future purchases”}.  
However, the combination of Sifa, Loshchilov, and Peng does not expressly disclose the content configured as an “advertisement” which is disclosed by Giannini per [P.1765 Sect.C] “Purchase likelihood prediction… predict purchase outcomes for the advertised product”; Figs 1, 8 purchase and no-purchase probabilities using [P.1761] “imbalanced classes… imbalanced training sets”.
Giannini is directed to training of machine learning models with sample optimization thus being analogous. A person having ordinary skill in the art would have considered it obvious prior to the effective filing date for the content of Sifa to be configured as an advertisement as disclosed by Giannini because known work in one field of endeavor may prompt variations of it for use in the same or a different field based on market forces. Purchase predictions would apply to a wide range of content as different products or services being reasonable to one of ordinary skill in the art. See for example “optimize marketing strategy decisions and gain competitive advantages” (Giannini [P.1760 ¶2]) and/or “target consumer selection” (Giannini [P.1766 Sect.D]) where “spending among customers is not unique to games” (Sifa [P.79]).

Claims 9-11 and 13-16 are rejected under 35 U.S.C. 103 as being unpatentable over Mehanian et al., US PG Pub No 20150269609A1, hereinafter Mehanian, in view of Peng and Loshchilov.
With respect to claim 9, Mehanian teaches: 
In a digital medium environment to generate a model to predict user interaction with subsequent digital content, an iterative machine learning system {Mehanian teaches a system environment illustrated Fig 3A-3B having user device computer, retailer website server, and personalization server with assorted modules including [0086] “classifier module 354 to compute the purchase probability” and [0049] “iterative algorithm”. Further, [0115] “it should be understood that the modules, routines, features, attributes, methodologies and other aspects of the present subject matter can be implemented using hardware, firmware, software, or any combination” i.e., [0057,60] “system 300 illustrated in Figs. 3A and 3B is representative of an example system… processor(s) may be coupled to the memor(ies) via data/communication bus” as [0062] “non-transitory”} comprising: 
a classifier module implemented by the processing system and computer readable storage medium to process the iteratively selected batch using machine learning to train the model to predict a likelihood of interaction with the subsequent digital content by the user {Mehanian [0086-87] “classifier module 354 to compute the purchase probability” and [0049] “training module 344… the model parameters are updated with the values that maximize the likelihood of the data, given the latent variables. These two steps are iterated until the data likelihood stops changing appreciably”. Both modules training and classifier interrelate as [0069] “training module344 may provide a set of parameters (e.g., to the classifier module 354)” and hardware is [0073] “classifier module 354 includes computer logic executable by a processor”}.
a sample selection module implemented by a processing system and computer readable storage medium {Mehanian [0067-69] “encoder module 342 includes computer logic executable by a computer processor”, [0108] “computer readable storage medium”} to: 
a batch selection module implemented by the processing system and computer readable storage medium {Mehanian [0071-74] “encoder module 352 includes computer logic executable by a processor”, [0108] “computer readable storage medium” where [0065] “Multiple data stores may all be included in the same storage device or system, or disparate storage systems”} to: 
	However, Mehanian does not teach functionality for batch parameter or subset parameter.
	Peng teaches: 
obtain a subset parameter specified using machine learning to indicate a total number of samples from a first class sample subset and a second class sample subset to be included in a subset of training data {Peng [P.2923] Algorithm 1 details total number of samples from first/second or positive/negative classes as “P1+N1” or “P2+N2”. Parameterization of subset is by weighting-based sampling (Wp, Wn) through sub-classifiers model1 & model2 to consider sample importance for imbalanced class problem, [P.2922-23]. The process as a whole is described per [Abstract] “different sub-classifiers by different subsets of training data with the best cost ratio adaptively chosen”}; and 
generate the subset of the training data including the first class sample subset taken from a first class of the training data and the second class sample subset having a number of samples sampled from a second class of the training data such that the subset of the training data reaches the number of samples specified by the subset parameter {Peng [P.2923 Last¶ - P.2924] “After re-sampling the data set, the adaptive cost-sensitive learning method is proposed to train classifiers. For each sub-classifierij, which uses (j+1)P+Ni as training data set, the ‘best’ cost ratio C+/C- of the penalty constants C+ and C- is adaptively selected for positive and negative samples, so the ‘best’ model for sub-classifierij is constructed… The final strong-classifier is defined as the weighted fusion of the above m weak-classifiers, the weights of which are set according to their accuracy”}; 
Peng, published in AAAI, is directed to predictive model training with optimized sampling and is thus analogous art. A person having ordinary skill in the art would have considered it obvious, prior to the effective filing date, to add the training subsets of Peng to the modular system of Mehanian to arrive at the invention as claimed, as applying a known technique to a known method to yield predictable results and/or in order to address “the main problems of the existing methods are: (1) In most existing data-level methods, the degree of re-sampling, which is a key factor that affects greatly the performance, needs to be pre-fixed… To address the above issues, a novel approach of adaptively sampling with optimal cost is proposed for class-imbalance learning” (Peng [P.2922 ¶2-3]).
However, the combination of Mehanian and Peng does not teach “batch parameter”.
		Loshchilov teaches: 
obtain a batch parameter set using machine learning to indicate a number of samples to be included in a batch taken from the subset of the training data {Loshchilov [Sect.4 P.3-5] “The procedure we propose for batch selection is given in Algorithm 3” details online batch selection comprising parameters identified in Algorithm 3 at Lines 1-2, which include “batch size b” as the batch parameter}; and 
iteratively select a batch from the subset of the training data having the number of samples indicated by the batch parameter {Loshchilov [P.4] Algorithm 3 details “Batch Selection Procedure SelectBatch()” with batch parameter “batch size b”. Further, [P.4 ¶2] “batch selection is given in Algorithm 3, and we can integrate it into AdaDelta and Adam by calling it in line 5 of Algorithms 1 and 2”. The call of SelectBatch() in Algs 1 and 2 [P.2-3] is nested within “repeat” loop over time steps, until stopping criterion is met. The process as a whole is online. See also [P.2 Sect.2 Last¶] “we argue that not only the update of xt given ∇ft(xt) is crucial, but the selection of batch {ψbi=1}~Db used to compute ∇ft(xt) also greatly contributes to the overall performance” and [P.6] “epoch index e”}; and 
Loshchilov is directed to predictive modeling with sample optimization and is thus analogous art. A person having ordinary skill in the art would have considered it obvious, prior to the effective filing date, to utilize the online batch selection of Loshchilov in combination with Mehanian and Peng in order to “control the sources of this information, the batch itself. More specifically, we suggest to more frequently select training datapoints with the greatest contributions to the objective function… demonstrated that online batch selection speeds up the convergence of the state-of-the-art DNN training methods” (Loshchilov [P.9 Sect.9 ¶1-2]).
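
For illustration only, the two parameterized steps mapped above (a subset parameter fixing the total number of samples drawn from two classes, and a batch parameter fixing the number of samples per iteratively selected batch) can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Peng, Loshchilov, or the claims: the function names `generate_subset` and `iterate_batches`, the tuple-labeled toy data, and the use of uniform random selection (rather than Peng's weighted re-sampling or Loshchilov's rank-based selection) are all hypothetical choices made for brevity.

```python
import random


def generate_subset(first_class, second_class, subset_size, rng):
    """Hypothetical helper: keep every first-class sample, then fill the
    remainder from the second class until subset_size is reached."""
    if subset_size < len(first_class):
        raise ValueError("subset_size is smaller than the first class")
    n_second = subset_size - len(first_class)
    if n_second > len(second_class):
        raise ValueError("not enough second-class samples to reach subset_size")
    # Uniform sampling without replacement (an assumption, not Peng's weighting).
    return list(first_class) + rng.sample(list(second_class), n_second)


def iterate_batches(subset, batch_size, steps, rng):
    """Hypothetical helper: yield one batch of batch_size samples per step.
    Selection is uniform here, not Loshchilov's rank-based procedure."""
    if batch_size > len(subset):
        raise ValueError("batch_size exceeds the subset size")
    for _ in range(steps):
        yield rng.sample(subset, batch_size)


# Toy data: 5 minority-class samples and 50 majority-class samples.
rng = random.Random(0)
positives = [("pos", i) for i in range(5)]
negatives = [("neg", i) for i in range(50)]

subset = generate_subset(positives, negatives, subset_size=20, rng=rng)
batches = list(iterate_batches(subset, batch_size=4, steps=3, rng=rng))
```

In this sketch `subset_size` plays the role of the subset parameter and `batch_size` the role of the batch parameter; each batch would then be passed to whatever training step updates the model.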

With respect to claim 10, the combination of Mehanian, Peng, and Loshchilov teaches the system as described in claim 9, wherein 
	the batch selection module uses stochastic gradient descent (SGD) to iteratively select the batch {Loshchilov [P.4] Algorithm 3 details “Batch Selection Procedure SelectBatch()” with batch parameter “batch size b”. Further, [P.4 ¶2] “batch selection is given in Algorithm 3, and we can integrate it into AdaDelta and Adam by calling it in line 5 of Algorithms 1 and 2”. The call of SelectBatch() in Algs 1 and 2 [P.2-3] is nested within “repeat” loop over time steps, until stopping criterion is met. This is under the heading of [Sect.2] “Stochastic Gradient Descent (SGD)” where gradient is updated algorithmically at ∇ft(xt-1). The process as a whole is online. See also [P.2 Sect.2 Last¶] “we argue that not only the update of xt given ∇ft(xt) is crucial, but the selection of batch {ψbi=1}~Db used to compute ∇ft(xt) also greatly contributes to the overall performance” and [P.6] “epoch index e”}.

With respect to claim 11, the combination of Mehanian, Peng, and Loshchilov teaches the system as described in claim 9, wherein 
the sample selection module is implemented by the processing system and computer readable storage medium to iteratively sample the subset of the training data from the first class and the subset of the training data from the second class that is used by the batch selection module to iteratively select the batch {Loshchilov teaches iterative selection of the batch as described in the rejection of claims 10 or 3, incorporated herein. Iterative sampling of the training subset is taught by Peng [P.2923] “Calculate the weight in iteration t” and/or “re-sampling the data set”}. 
One of ordinary skill in the art would have considered it obvious prior to the effective filing date to combine Loshchilov and Peng to arrive at the invention as claimed because the batch selection of Loshchilov is called online. The motivation to do so is “rank-based batch selection strategy, where all training datapoints k are ranked (sorted) in descending order w.r.t. their latest computed ψk(x)” loss (Loshchilov [P.4 Last¶], [Abs]). Furthermore, one would expect to achieve the result, with or without the teaching of Peng as evidenced by the motivation statement.

With respect to claim 13, the combination of Mehanian, Peng, and Loshchilov teaches the system as described in claim 9, wherein
the sample selection module samples the training data based on a parameter that sets a sample size {Peng [P.2923 Alg1 Step4] “Wp(pi)” and “Wn(ni)” W is weight parameter for samples of pos(p) and neg(n). Weighting distribution for sample size is processed by under-sample, over-sample, modified smote [P.2923 ¶2-4]. Additionally, [P.2924 Alg2] BestCostRatio balances size}. 

With respect to claim 14, the combination of Mehanian, Peng, and Loshchilov teaches the system as described in claim 9, further comprising
a prediction management module implemented by the processing system and computer readable storage medium to predict a likelihood of interaction with the subsequent digital content by the user using the trained model {Mehanian [0070] “production engine 350 includes computer logic executable by a processor… uses the parameters from the training engine 340 to predict probabilities that a user will perform a given action”}. 

With respect to claim 15, the combination of Mehanian, Peng, and Loshchilov teaches the system as described in claim 14, wherein
the predicted likelihood of interaction includes a probability that the subsequent digital content formed as an advertisement will be selected by the user or a probability of a likelihood of conversion of a related good or service by the user {Mehanian Fig 6 “Purchase probability” and “Purchase likelihood” and Fig 12 purchase probability for Offer A, Offer B, Offer C. An offer is an ad. Reference is replete with regard to product purchase probability/likelihood, for which calculation is per [0086-89]}.  

With respect to claim 16, the combination of Mehanian, Peng, and Loshchilov teaches the system as described in claim 14, further comprising 
a service control module implemented by the processing system and computer readable storage medium to control exposure of the subsequent digital content based at least in part on the predicted likelihood of interaction {Mehanian [0076-78] “offer module 356 is coupled to and may include computer logic executable by a processor of the personalization server 330 to formulate offers (e.g., a discount or other offer to send to the user device 310) based on probability data… determine a customized offer for a given user based on the probability that user will purchase”}. 

Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Mehanian, Peng, and Loshchilov in view of Sifa.
With respect to claim 12, the combination of Mehanian, Peng, and Loshchilov teaches the system as described in claim 9. Sifa teaches wherein:
the first class is a positive class that describes the user interactions with the digital content that involve selection of the digital content or conversion of a related good or service; the second class is a negative class that describes users that did not select the digital content or convert the related good or service; and the first class has a fewer number of members than the second class {Sifa [P.79] “imbalance in between players that spend money, which we will refer to as premium players, and those who do not, here referred to as non-spending players” premium player is positive (purchasing) class and non-spending player is negative (non-purchasing) class, and the classes are imbalanced where [P.84 LeftCol] “the number of premium users is low”}.  
	A person having ordinary skill in the art would have considered it obvious, prior to the effective filing date, to specify the purchase and non-purchase of Mehanian Fig 5 according to the imbalanced distribution disclosed by Sifa, as applying a known technique to a known method with a reasonable expectation of success and/or since “more than half of the players can essentially be neglected because it is very unlikely that they will ever purchase any virtual goods” (Sifa [P.83 Last¶]).

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure: 
Malhotra et al., US PG Pub No 20180365715A1 “Method and System for Purchase Behavior Prediction of Customers” disclosure from Tata comprises class-imbalance and grid-search parameter tuning for probabilistic purchase prediction LSTM model, of note pars.  [0037], [0033], [0028]. Earliest priority date precedes instant application. Examiner has attached a certified copy of foreign priority document as FOR Malhotra.
Zhang et al., “Towards Class-Imbalance Aware Multi-Label Learning” publication IJCAI discloses multi-class imbalance learner with training subsets and ensemble mixture.
Zhang and Pennacchiotti, “Predicting Purchase Behaviors from Social Media” discloses user purchase prediction with rank probabilistic SVM model weighted by purchase and grid search over training subsets, see equations 1-2 and 6.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Chase P Hinckley whose telephone number is (571)272-7935.  The examiner can normally be reached on M-F 9:00 - 5:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda M. Huang can be reached on 571-270-7092.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/CHASE P. HINCKLEY/Examiner, Art Unit 2124
/MIRANDA M HUANG/Supervisory Patent Examiner, Art Unit 2124