Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION

This action is in reference to the communication filed on 7 FEB 2022.
Claim 1 is pending and has been examined; the amendments therein have been considered.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claim 1 is rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter. As explained below, the claim is directed to an abstract idea without significantly more.

	
With respect to claim 1, the independent claim (claim 1) is directed, in part, to computing a bid for available inventory of advertisements, sending the bid, receiving an auction result in response, calculating a reward based on parameters, calculating Q-values, calculating losses, and using that information to recalculate a model. These claim elements are considered to be abstract ideas because they are directed to a method of organizing human activity, which includes commercial or legal interactions (including agreements in the form of contracts; legal obligations; advertising, marketing or sales activities or behaviors; business relations). For example, facilitating a bidding process for an advertisement auction is an advertising, marketing, or sales activity or behavior. If a claim limitation, under its broadest reasonable interpretation, covers commercial or legal interactions, then it falls within the “method of organizing human activity” grouping of abstract ideas. Further, the claim also recites mathematical concepts in the form of a computation of a bid; calculations of a reward, Q-values, and losses; and backpropagation of the losses through the model itself. If a claim limitation, under its broadest reasonable interpretation, covers mathematical relationships, formulas or equations, or mathematical calculations, then it falls within the “mathematical concepts” grouping of abstract ideas.
This judicial exception is not integrated into a practical application. In particular, the claim recites the following additional elements: first and second neural networks, a database used to store tactics, and a real time bid server that facilitates the auction (which, Examiner notes, does not appear to participate in the subsequent calculations or modeling). The database is recited at a high level of generality; storing information in a database is extra-solution activity and amounts to no more than mere instructions to apply the exception using a generic computer component (data storage). Similarly, the “server” is recited at a high level of generality and, at best, performs the extra-solution activity of sending and receiving data. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
The independent claim additionally recites claim elements such as the general use of neural networks/techniques, a database, and a real time bid server. When considered individually, the database and server claim elements contribute only generic recitations of technical elements to the claim. Similarly, the neural networks are not even identified by name in the claim and are essentially recited as “applied” rather than as a meaningful part of the claim. It is readily apparent, for example, that the claim is not directed to any specific improvement of these elements. Examiner looks to Applicant’s specification at [fig 1 and related text], which describes databases 105 and servers 110 in purely functional terms, i.e., any machine capable of executing instructions, storing data, and communicating via a network. These passages, as well as others, make it clear that the invention is not directed to a technical improvement. When the claim is considered both element by element and as a whole, the additional elements noted above appear merely to apply the abstract concept to a technical environment in a very general sense, i.e., a generic computer receives information from another generic computer, processes the information, and then sends information back. The most significant elements of the claim, that is, the elements that outline the inventive concept, are those identified above as the abstract idea. The fact that generic computing devices facilitate the abstract concept is not enough to confer statutory subject matter eligibility.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claim 1 is rejected under 35 U.S.C. 103 as being unpatentable over Basu et al. (US 20210065250, hereinafter Basu) in view of Gou et al. (US 20190303765 A1, hereinafter Gou), further in view of Van de Wiele et al. (US 20210357731 A1, hereinafter Van de Wiele).

In reference to claim 1
Basu teaches: A system for training a bidding model (at least [fig 1 and related text]) using first and second neural networks (at least [fig. 6 and related text]: at step 606, a first model is trained by a first algorithm, and at step 608, a second model is trained by a second algorithm), the system comprising:
strategy, e.g., to maximize revenue for low-impression keywords.”; 
a plurality of hyperparameters (at least [073] “In one example, the bid constraint system 128 may be configured as a P-controller-based agent to tune hyperparameters of the low-impression keyword model 134. The bid constraint system 128 is configured to do so based on the live demand data 302 as the low-impression keyword model 134 is used in real-time for keyword bidding with the search engine platform 106…”); 
the system configured to perform operations, in response to an available inventory from a publisher relayed through a real time bid server, including computing a bid on the available inventory (at least [022]: “The portfolio optimization platform continues to update the low-impression keyword model as it is used in real-time based on this data and according to the sparse-data algorithm, e.g., the temporal difference learning algorithm. For instance, the portfolio optimization platform tunes parameters of the model in real-time based on this data, such as a parameter that defines a number of days' worth of data used to generate the bids or a parameter that defines a learning rate for updating the model using the data.” At [043]: “…the demand data 118 and the performance data 116 may be used as it is received in real-time to update the low-impression keyword model 134 once the model is deployed to make actual bids.” At [0102]: model trainer 212 updates the low-impression keyword model 134 in real-time using the live demand data 302 that corresponds to the low-impression keywords.);
sending the bid to the real time bid server (at least [022, 028, 066]: “…impression keyword model 134 is deployed to submit actual bids to the search engine platform 106. In other words, the model trainer 212 is also used when the low-impression keyword model 134 is “online….”);
receiving an auction result in response to the bid (at least [047, 066, 070] and fig. 4 and related text: when a bid wins, the respective digital content is shown);
calculating a plurality of rewards based on the auction result and the tactics (at least [056-066]: based on a comparison of the auction result (i.e., a win) against the goal/strategy (i.e., the tactic), a reward is calculated);
using the first neural network, calculate a plurality of target q values based on the rewards (at least [056-066]; e.g., at [066]: “This reward is propagated to the state, as a feedback, to learn about a mean reward achieved due to the action taken in the state. The mean reward is used along with the state and the action to update the Q-value of the state, which is used subsequently for bidding in a next training iteration.” and at [060]: “By determining an optimal Q-value for each state s.sub.t iteratively, the model trainer 212 trains the low-impression keyword model…”);
training the second neural network to update the target q-values (at least [065]: “With the Bellman equation, a state's learned Q-values can be used to make a determination regarding adjusting bids in real time and also to provide feedback for updating learned Q-values with new data and rewards observed in live data. However, this approach has the disadvantage that it requires computation and storage of a probability p(s′|s, a) of a given keyword to transition from a state s given an action a to negation s′ of the state. This computation is difficult in connection with low-impression keywords due to the dynamics and uncertainty in behavior of the low-impression keywords.” At [066]: “By using the temporal difference approach (i.e, a single state is capable of holding multiple bid units so that the low-impression keyword model 134 can bid (according to the action) using optimal Q-values for the given state. Responsive to submitting the training keyword bids 208, each bid unit has different feedback (e.g., a different reward). This reward is propagated to the state, as a feedback, to learn about a mean reward achieved due to the action taken in the state. The mean reward is used along with the state and the action to update the Q-value of the state, which is used subsequently for bidding in a next training iteration. The model trainer 212 is also used when the low-impression keyword model 134 is deployed to submit actual bids to the search engine platform 106. In other words, the model trainer 212 is also used when the low-impression keyword model 134 is “online.” When the low-impression keyword model 134 is deployed, the model trainer 212 may continuously update the low-impression keyword model 134 using the temporal difference learning algorithm discussed above.
In so doing, the model trainer 212 updates the low-impression keyword model 134 as live data—describing actual behavior of the search engine platform 106 and users in connection with keyword bidding and content exposure—is received. In the context of submitting actual keyword bids to the search engine platform 106 and updating the low-impression keyword model 134 as it is being used to submit actual bids to the search engine platform 106, consider FIG. 3.”);
calculate a plurality of losses (at least [056-066]; at [060]: “Here, the term γ represents a discount factor for delayed rewards, which discounts rewards obtained in the future due to certain bids more than rewards now are discounted…” (i.e., a loss of a reward in the current calculation));
[using] the losses [to train] through the bidding model (at least [056-060]; at [060]: “Here, the term γ represents a discount factor for delayed rewards, which discounts rewards obtained in the future due to certain bids more than rewards now are discounted. By determining an optimal Q-value for each state s.sub.t iteratively, the model trainer 212 trains the low-impression keyword model 134 to identify an optimized sequential decision, e.g., of actions to take in order and in terms of bid submissions.”).
Basu discloses a plurality of means of training a model, including the use of the Bellman equation (a recursive relationship used to train Q-value models, including neural networks), and one of ordinary skill in the art could reasonably infer that backpropagation is an effective way to converge data values in a manner similar to the cited reference. Nevertheless, in the interest of compact prosecution, Examiner notes that Basu does not specifically disclose backpropagation.
Gou, however, does teach:
A system for training a bidding model using first and second neural networks (at least [0139-0140]);
A plurality of tactics stored on at least one database (at least [0252], fig. 23 and related text: a plurality of strategies are stored for subsequent application);
A plurality of hyperparameters (at least [0246, 0240, 0169, 0248]: hyperparameters are given as examples);
Calculating a plurality of rewards based on a result and a tactic (at least [0244-0247]: reward values are compared to the goal, i.e., the tactic);
Calculating a plurality of losses (at least [0249-0250, 0140]; at [0140]: “For example, the update may be based on backpropagation, a Bellman equation, a quality value (e.g., q value), a loss value (e.g., a squared error loss), and/or the like.”);
Backpropagating the losses through the model (at least [0140, 0221, 0247-0250]; at [0169]: “Additionally or alternatively, the learn stage 640 may include updating the parameters of the first neural network (e.g., P) by reducing (e.g., minimizing) the loss 641 (e.g., the difference between q and q.sub.t), e.g., by backpropagation and/or the like.”).
Basu and Gou are analogous references, at the very least in that they both teach use of the Bellman equation to train a machine learning model. Gou specifically discloses that backpropagation is commonly used in training machine learning models (see [0169]), and specifically to improve machine learning/prediction using the Bellman equation (see [0163]). Gou further discloses that backpropagation is especially useful when modifying parameters over several iterations (see [0140]). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the backpropagation of Gou in the Bellman-equation-based machine learning training of Basu.
Both references discuss the use of multiple neural networks: Basu discloses training multiple neural networks, and Gou specifically discloses doing so using backpropagation. Further, one of ordinary skill in the art would recognize that backpropagation is commonly used with stochastic gradient techniques. Nevertheless, in the interest of compact prosecution, Examiner notes that neither reference explicitly discloses the use of stochastic gradient descent (SGD) in updating a value.
Van de Wiele, however, does teach:
Using a stochastic gradient descent based on a plurality of values, training the neural network to update the target values (at least [0106]: “The system determines a gradient of an error between the target output and the Q value for the first action with respect to the Q network parameters and determines, from the gradient, an update to the current values of the Q network parameters (step 414). For example, the error can be a mean squared error and the system can determine the gradient through backpropagation. The system can then determine the update by applying an update rule to gradient, e.g., a stochastic gradient descent update rule…”; see also [0118]).
Van de Wiele is analogous to both Basu and Gou, as all three references disclose uses of Q-value calculation to optimize machine learning/neural networks. In particular, both Gou and Van de Wiele discuss the use of backpropagation to update or train the needed values. Van de Wiele further teaches that using SGD to update the values is itself a known means of updating the training of a Q neural network, i.e., SGD is a well-known update rule applied to a gradient (see [0118]). Therefore, one of ordinary skill in the art before the effective filing date of the claimed invention would have found it obvious to use SGD in training one or more Q-value neural networks.

Response to Arguments 
Applicant's arguments filed 7 FEB 2022 have been fully considered but they are not persuasive.
As per the rejection under 35 USC 101, Applicant cites various USPTO guidance on page 3 of the remarks and then concludes that the claims provide a clear and distinct improvement; however, Applicant does not explicitly state what that improvement is beyond reciting the claim limitations. Training a model is not per se evidence of an improvement in the functioning of a computer or computing technology. Contrary to Applicant’s recitation of paragraph [093] on page 4, Examiner respectfully submits that Applicant merely applies techniques to train a model, rather than the training itself embodying any sort of novelty or improvement. Applicant does not supply any additional remarks, and as such, Examiner is unable to comment further. If Applicant believes other portions of the specification recite such improvements, Applicant is encouraged to incorporate them into the claim language.
Applicant’s remarks regarding the prior art rejection, beginning on page 4 of the remarks, are found unpersuasive/moot at least in view of the new grounds of rejection above. Examiner notes that the previously cited references at least imply the use of SGD; however, in the interest of compact prosecution, the newly cited reference explicitly discloses the use of SGD in updating Q-values from a target set.
Examiner suggests incorporating elements of the disclosure found in figure 6 and related text to advance prosecution. 
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
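A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.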
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.  
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ilana Spar, can be reached at 571-270-7537. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/KATHERINE KOLOSOWSKI-GAGER/Primary Examiner, Art Unit 3622