Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Reasons for Allowance
The following is an examiner’s statement of reasons for allowance: Claims 1, 12 and 19 are considered allowable because, when the claims are read in light of the specification, as per MPEP § 2111.01, none of the references of record, either alone or in combination, fairly discloses or suggests the combination of limitations specified in the independent claims, including at least:

In claim 1
comparing, by the one or more computing devices, the candidate learning rate control value to a maximum previously observed learning rate control value, wherein the maximum previously observed learning rate control value is a maximum value of all previously determined candidate learning rate control values for the machine-learned model;
in response to determining that the candidate learning rate control value is greater than the maximum previously observed learning rate control value,
setting the maximum previously observed learning rate control value equal to the candidate learning rate control value;
setting a current learning rate control value equal to the maximum previously observed learning rate control value;
determining, by the one or more computing devices, a current learning rate based at least in part on the current learning rate control value;
training the machine-learned model, by the one or more computing devices, based on an updated set of values for the plurality of parameters of the machine-learned model,
wherein the determining the updated set of values is based at least in part on the gradient of the loss function and the current learning rate.
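
The steps recited above can be sketched for a single scalar parameter as follows. This is an illustrative sketch only, not the claimed implementation: the function name `train_step`, the moving-average form of the candidate control value, the formula `base_lr / (sqrt(control) + eps)`, and all default constants are assumptions introduced for the example.

```python
import math

def train_step(param, grad, v_prev, v_max, base_lr=0.1, beta2=0.9, eps=1e-8):
    """One illustrative update for a single scalar parameter (hypothetical helper)."""
    # Candidate control value: moving average of squared gradients (assumed form).
    candidate = beta2 * v_prev + (1.0 - beta2) * grad * grad
    # Compare the candidate to the maximum previously observed control value
    # and replace the maximum only when the candidate is greater.
    if candidate > v_max:
        v_max = candidate
    # The current control value equals the (possibly updated) running maximum.
    current_control = v_max
    # Current learning rate derived from the current control value (assumed form).
    current_lr = base_lr / (math.sqrt(current_control) + eps)
    # Updated parameter value based on the gradient and the current learning rate.
    new_param = param - current_lr * grad
    return new_param, candidate, v_max, current_lr
```

Under these assumptions, a small gradient following a large one leaves the running maximum, and therefore the current learning rate, unchanged.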

In claim 12
selecting a minimum of the candidate learning rate and a minimum previously observed learning rate to serve as a current learning rate based on a maximum previously observed learning rate control value that is a maximum value of all previously determined candidate learning rate control values for the machine-learned model; 
 training the machine learned model by updating at least one of the plurality of parameters of the machine-learned model based at least in part on the gradient of the loss function and to the current learning rate.
	
In claim 19
selecting a maximum of the candidate learning rate and a maximum previously observed learning rate to serve as a current learning rate, wherein the maximum previously observed learning rate control value is a maximum value of all previously determined candidate learning rate control values for the machine-learned model;
 training the machine learned model by updating at least one of the plurality of parameters of the machine-learned model based at least in part on the gradient of the loss function and to the current learning rate.

The closest prior art of record is Kingma et al., “Adam: a method for stochastic optimization.” Kingma teaches an algorithm, AdaMax, which sets the current learning rate control value according to a maximum function of two parameters. The current learning rate control value is then used to determine a current learning rate, and training is performed according to the gradient and the determined current learning rate. However, regarding the two parameters addressed above, Kingma does not teach that the claimed previously observed learning rate control value is a maximum of all previously determined candidate learning rate control values.
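The distinction can be illustrated with a short sketch. It assumes the AdaMax control value takes the form published in Kingma, u_t = max(β2 · u_{t-1}, |g_t|); the function names, the sample gradients, and the value of `beta2` are hypothetical and chosen only for illustration.

```python
def adamax_control_values(grads, beta2=0.9):
    """Control values u_t = max(beta2 * u_{t-1}, |g_t|) (assumed AdaMax form)."""
    u = 0.0
    values = []
    for g in grads:
        # The previous value is decayed by beta2 before the comparison.
        u = max(beta2 * u, abs(g))
        values.append(u)
    return values

def running_max(values):
    """Maximum of all values observed up to and including each step."""
    best = float("-inf")
    out = []
    for v in values:
        best = max(best, v)
        out.append(best)
    return out

u = adamax_control_values([1.0, 0.0, 0.0])
m = running_max(u)
```

Because the previous value is decayed before each comparison, the control value in this sketch can fall over time, whereas a maximum over all previously determined values cannot.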
Furthermore, the instant application states that the invention solves convergence issues present in similar gradient descent algorithms. The instant application notes at paras. 0060-0061: “One aim is to devise a new strategy with guaranteed convergence while preserving the practical benefits of Adam and RMSprop.” The inventors achieve this explicitly as follows: “One key difference of AMSGrad with Adam is that it maintains the maximum of all vt until the present time step and uses this maximum value for normalizing the running average of the gradient instead of vt in Adam. By doing this, AMSGrad results in a non-increasing step size and avoids the pitfalls of Adam and RMSprop i.e. Γt ≥ 0 … even with constant β2”. The steps recited in the claim ensure that the current learning rate control value is never less than its value in the previous iteration, so that the step size, or current learning rate, derived from it is non-increasing.
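The effect of maintaining the maximum can be sketched as below. This is a simplified illustration under stated assumptions, not the algorithm of the instant application: it tracks only the second-moment estimate, omits the first-moment average and bias correction, and the function name `step_sizes`, the constants, and the sample gradients are hypothetical.

```python
import math

def step_sizes(grads, use_running_max, alpha=0.001, beta2=0.99, eps=1e-8):
    """Per-step sizes alpha / (sqrt(control) + eps) for a scalar parameter."""
    v = 0.0      # moving average of squared gradients
    v_hat = 0.0  # control value actually used for normalization
    sizes = []
    for g in grads:
        v = beta2 * v + (1.0 - beta2) * g * g
        # With the running maximum, v_hat never decreases; without it, v_hat is v.
        v_hat = max(v_hat, v) if use_running_max else v
        sizes.append(alpha / (math.sqrt(v_hat) + eps))
    return sizes

grads = [10.0, 0.1, 0.1, 0.1]
without_max = step_sizes(grads, use_running_max=False)
with_max = step_sizes(grads, use_running_max=True)
```

In this sketch the step size without the maximum grows after the large first gradient decays out of the average, while the step size with the maximum never increases.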
Further still, Huang et al., “Nostalgic Adam: Weighing more of the past gradients when designing the adaptive learning rate,” addresses the same issue. Huang notes the following on pg. 2: “Algorithms that used exponential moving average to estimate the second moment of gradient cannot guarantee positive semi-definiteness of the quantity Γt… In this paper, we address the same issue as in Reddi et al… and provide another approach to utilize the past gradients in the design of the adaptive learning rate. We propose a new family of algorithms called Nostalgic Adam”. The algorithm proposed by Huang, however, does not use a memory of previous learning rate control values to ensure that the step size is non-increasing; instead, Huang uses a carefully selected function for β2, which is not constant, in contrast to the instant application. It would not have been obvious to one of ordinary skill in the art before the effective filing date to combine these references to teach at least the limitations above. In particular, neither reference teaches determining a current learning rate based on “all previously determined candidate learning rate control values”.
With respect to claim 12 above:
Similar to claim 1, the minimum function selects or compares the candidate learning rate control value to the previously observed learning rate control value to derive the current learning rate. The previously observed learning rate control value is based on a maximum of all previously determined candidate learning rate control values.
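One way to read this relationship is sketched below. It assumes, purely for illustration, that the learning rate is a strictly decreasing function of the control value, here `alpha / (sqrt(v) + eps)`; under that assumption, the minimum learning rate observed so far is the one derived from the maximum control value observed so far. The function names and constants are hypothetical.

```python
import math

def lr_from_control(v, alpha=0.01, eps=1e-8):
    """Learning rate as a decreasing function of the control value (assumed form)."""
    return alpha / (math.sqrt(v) + eps)

def select_min_lr(candidate_controls):
    """Keep the minimum of each candidate learning rate and the minimum seen so far."""
    min_lr = float("inf")
    for v in candidate_controls:
        min_lr = min(min_lr, lr_from_control(v))
    return min_lr

controls = [0.5, 2.0, 1.0]
selected = select_min_lr(controls)
```

Here `selected` equals `lr_from_control(max(controls))`, which is the sense in which selecting a minimum learning rate mirrors taking a maximum of the control values.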
With respect to claim 19 above:
Similar to claim 1, the maximum function selects or compares the candidate learning rate control value to the previously observed learning rate control value to derive the current learning rate. The previously observed learning rate control value is based on a maximum of all previously determined candidate learning rate control values.

Dependent claims 2-11, 13-18 and 20 are allowed as they depend upon an allowable independent claim.
Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee. Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”

Conclusion
Claims 1-20 are allowed.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JOHNATHAN R GERMICK whose telephone number is (571)272-8363. The examiner can normally be reached on Monday-Friday 7:30 am – 4:00 pm (EST).
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki, can be reached at telephone number (571)272-3719. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://portal.uspto.gov/external/portal. Should you have questions about access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.


/J.R.G./Examiner, Art Unit 2122  


/KAKALI CHAKI/Supervisory Patent Examiner, Art Unit 2122