Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claim 30 recites the limitation "the computing system of claim 7" in lines.  There is insufficient antecedent basis for this limitation in the claim. In the interest of compact prosecution, the examiner will reject claim 30 as if it were dependent on claim 27.

Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.
Claims 21, 31 and 35 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-2 and 8 of U.S. Patent No. 10,769,592. Although the claims at issue are not identical, they are not patentably distinct from each other because US Patent 10,769,592 discloses all the limitations of the instant application, except that the instant application additionally recites: obtaining input data for a task that the machine-learned model is optimized to perform, and processing that data to obtain output data. This would have been obvious to one of ordinary skill in the art before the effective filing date of the instant application because the purpose of US 10,769,529B2 was to create the optimized machine-learned model. Saving that model and using it to perform a task, by taking input data and obtaining output data, would have been obvious because that is exactly what the machine-learned model was created for.

Instant Application – 17/014,139
US Patent 10,769,529 B2
[Application claim 21] 21. A computing system for utilization of a machine-learned model that is optimized to perform a task, the computing system comprising:
[Patent claim 8] 8. A computing system, comprising:

[Application claim 21] one or more processors;
[Patent claim 8] one or more processors; and

[Application claim 21] an optimized machine-learned model comprising a plurality of optimized parameters, wherein the plurality of optimized parameters have been optimized over a plurality of iterations based at least in part on a gradient of a loss function and an effective learning rate,
[Patent claim 8] determining a gradient of a loss function that evaluates a performance of a machine-learned model that comprises a plurality of parameters;
[Patent claim 8] determining an updated set of values for the plurality of parameters of the machine-learned model based at least in part on the gradient of the loss function and according to the current effective learning rate.

[Application claim 21] wherein the effective learning rate is based at least in part on a current learning rate
[Patent claim 8] determining a current effective learning rate based at least in part on the current learning rate control value; and

[Application claim 21, continued] control value that equals a recent learning rate control value minus an update value, wherein a magnitude of the update value is a function of the gradient of the loss function, and wherein a polarity of the update value is a function of both the gradient of the loss function and the recent learning rate control value; and
[Patent claim 8] determining a current learning rate control value based on the gradient of the loss function, wherein the current learning rate control value equals a most recent learning rate control value minus an update value, wherein a magnitude of the update value is equal to a square of the gradient of the loss function times a scaling coefficient, and wherein a polarity of the update value is a function of both the gradient of the loss function and the most recent learning rate control value;

[Application claim 21] one or more tangible, non-transitory computer readable media storing computer-readable instructions that when executed by the one or more processors cause the one or more processors to perform operations, the operations comprising:
[Patent claim 8] one or more non-transitory computer-readable media that store instructions that, when executed by the one or more processors, cause the one or more processors to perform operations, the operations comprising, for each of one or more iterations:

[Application claim 21] obtaining input data for a task that the optimized machine-learned model is optimized to perform; and
[Application claim 21] processing the input data using the optimized machine-learned model to obtain optimized output data associated with the task.




[Application claim 31] 31. A process for the production of an optimized machine-learned model product trained using an adaptive learning algorithm with improved convergence properties and stored on one or more tangible, non-transitory computer readable media, wherein the process comprises the steps of:
[Patent claim 1] 1. A computer-implemented method for optimizing machine-learned models that provides improved convergence properties, the method comprising:

[Application claim 31] (a) for a plurality of iterations:
[Patent claim 1] for each of a plurality of iterations:

[Application claim 31] determining a gradient of a loss function that evaluates a performance of a machine-learned model, wherein the machine-learned model comprises a plurality of parameters;
[Patent claim 1] determining, by one or more computing devices, a gradient of a loss function that evaluates a performance of a machine-learned model that comprises a plurality of parameters;

[Application claim 31] determining a current learning rate control value based on the gradient of the loss function, wherein the current learning rate control value equals a most recent learning rate control value minus an update value, wherein a magnitude of the update value is a function of the gradient of the loss function but not the most recent learning rate control value, and wherein a polarity of the update value is a function of both the gradient of the loss function and the most recent learning rate control value; and
[Patent claim 1] determining, by the one or more computing devices, a current learning rate control value based on the gradient of the loss function, wherein the current learning rate control value equals a most recent learning rate control value minus an update value, wherein a magnitude of the update value is a function of the gradient of the loss function but not the most recent learning rate control value, and wherein a polarity of the update value is a function of both the gradient of the loss function and the most recent learning rate control value;

[Application claim 31] determining an updated set of values for the plurality of parameters of the machine-learned model based at least in part on the gradient of the loss function and according to the current effective learning rate;
[Patent claim 1] determining, by the one or more computing devices, an updated set of values for the plurality of parameters of the machine-learned model based at least in part on the gradient of the loss function and according to the current effective learning rate; and

[Application claim 31] wherein, for at least one of the plurality of iterations, the polarity of the update value is positive such that the current learning rate control value is less than the most recent learning rate control value, whereby the current effective learning rate is greater than a most recent effective learning rate; and
[Patent claim 1] wherein, for at least one of the plurality of iterations, the polarity of the update value is positive such that the current learning rate control value is less than the most recent learning rate control value, whereby the current effective learning rate is greater than a most recent effective learning rate.

[Application claim 31] (b) storing the optimized machine-learned model product on the one or more tangible, non-transitory computer readable media, wherein the optimized machine-learned model product comprises an optimized version of the machine-learned model comprising a final set of values for the parameters.
[Patent claim 1] providing, by the one or more computing devices, an optimized version of the machine-learned model as an output, the optimized version of the machine-learned model comprising a final set of values for the plurality of parameters



[Application claim 35] 35. A computer-implemented method for utilization of a machine-learned model that is optimized to perform a task, comprising:
[Patent claim 1] 1. A computer-implemented method for optimizing machine-learned models that provides improved convergence properties, the method comprising:

[Application claim 35] obtaining, by a computing system comprising one or more computing devices, input data for a task that an optimized machine-learned model is optimized to perform,

[Application claim 35] wherein the optimized machine-learned model comprises a plurality of optimized parameters,
[Patent claim 1] determining, by one or more computing devices, a gradient of a loss function that evaluates a performance of a machine-learned model that comprises a plurality of parameters;

[Application claim 35] wherein the plurality of optimized parameters have been optimized over a plurality of iterations based at least in part on a gradient of a loss function and an effective learning rate,
[Patent claim 1] for each of a plurality of iterations:

[Patent claim 1] determining, by the one or more computing devices, a current learning rate control value based on the gradient of the loss function,

[Application claim 35] wherein the effective learning rate is based at least in part on a current learning rate control value that equals a recent learning rate control value minus an update value,
[Patent claim 1, continued] wherein the current learning rate control value equals a most recent learning rate control value minus an update value, wherein a magnitude of the update value is a function of the gradient of the loss function but not the most recent learning rate control value, and wherein a polarity of the update value is a function of both the gradient of the loss function and the most recent learning rate control value;

[Patent claim 1] determining, by the one or more computing devices, a current effective learning rate based at least in part on the current learning rate control value; and

[Patent claim 1] determining, by the one or more computing devices, an updated set of values for the plurality of parameters of the machine-learned model based at least in part on the gradient of the loss function and according to the current effective learning rate; and

[Patent claim 1] providing, by the one or more computing devices, an optimized version of the machine-learned model as an output, the optimized version of the machine-learned model comprising a final set of values for the plurality of parameters;

[Patent claim 1] wherein, for at least one of the plurality of iterations, the polarity of the update value is positive such that the current learning rate control value is less than the most recent learning rate control value, whereby the current effective learning rate is greater than a most recent effective learning rate.

[Application claim 35] wherein the update value is equal to a square of the gradient of the loss function multiplied by a sign function applied to the recent learning rate control value minus the square of the gradient of the loss function and multiplied by a scaling coefficient that is equal to one minus an update scaling parameter; and
[Patent claim 2] 2. The computer-implemented method of claim 1, wherein the update value is equal to a square of the gradient of the loss function multiplied by a sign function applied to the most recent learning rate control value minus the square of the gradient of the loss function and multiplied by a scaling coefficient that is equal to one minus an update scaling parameter.

[Application claim 35] processing, by the computing system, the input data using the optimized machine-learned model to obtain optimized output data associated with the task.
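
For orientation only, the update value recited in instant claim 35 and in reference claim 2 can be illustrated numerically. The following is a minimal sketch of one reading of that claim language for a single scalar parameter, not a statement of either party's implementation. The function names, the default values of `beta2`, `base_lr` and `eps`, and in particular the formula used in `effective_learning_rate` are assumptions made for illustration; the claims state only that the effective learning rate is "based at least in part on" the control value.

```python
import math


def sign(x: float) -> float:
    """Sign function: +1.0, 0.0 or -1.0."""
    if x > 0:
        return 1.0
    if x < 0:
        return -1.0
    return 0.0


def control_value_step(v_prev: float, grad: float, beta2: float = 0.999):
    """One update of the learning rate control value.

    update = (1 - beta2) * grad^2 * sign(v_prev - grad^2)
    v_new  = v_prev - update

    beta2 plays the role of the "update scaling parameter" (assumed name).
    The magnitude of the update depends only on grad; its polarity depends
    on both grad and v_prev.
    """
    g2 = grad * grad
    update = (1.0 - beta2) * g2 * sign(v_prev - g2)
    return v_prev - update, update


def effective_learning_rate(v: float, base_lr: float = 0.01, eps: float = 1e-3) -> float:
    """Assumed mapping from control value to effective learning rate."""
    return base_lr / (math.sqrt(v) + eps)


# Positive polarity: the control value falls, so the effective rate rises.
v_new, update = control_value_step(v_prev=1.0, grad=0.5, beta2=0.9)
print(update > 0, v_new < 1.0,
      effective_learning_rate(v_new) > effective_learning_rate(1.0))
```

Under these assumptions, a squared gradient smaller than the most recent control value gives an update of positive polarity, which lowers the control value and raises the effective learning rate, matching the final "wherein" clause of reference claim 1.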





Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 21-23 and 27-30 are rejected under 35 U.S.C. 103 as being unpatentable over Kingma et al. (Kingma et al. – “ADAM: A Method for Stochastic Optimization”) and further in view of Alistarh et al. (US 2018/0075347 A1 – hereinafter referred to as Alistarh.)

In regards to claim 21, Kingma discloses a computing system for utilization of a machine-learned model that is optimized to perform a task, the computing system comprising: 
one or more processors; 

an optimized machine-learned model comprising a plurality of optimized parameters, wherein the plurality of optimized parameters have been optimized over a plurality of iterations based at least in part on a gradient of a loss function and an effective learning rate, (Kingma page 2, Algorithm 1, last step, discloses returning a set of updated parameter values, which is the optimized version of the machine learning model with a final set of values for the parameters.)

wherein the effective learning rate is based at least in part on a current learning rate (Kingma section 2.1 teaches finding an effective step size, which is the effective current learning rate, and teaches that it is based on a current learning rate value.) control value that equals a recent learning rate control value minus an update value, wherein a magnitude of the update value is a function of the gradient of the loss function, and wherein a polarity of the update value is a function of both the gradient of the loss function and the recent learning rate control value; and (Kingma page 2, Algorithm 1, and last line of section 2 on page 2.)

However, Kingma does not disclose one or more processors; one or more tangible, non-transitory computer readable media storing computer-readable instructions that when executed by the one or more processors cause the one or more processors to perform operations; obtaining input data for a task that the optimized machine-learned model is optimized to perform; and processing the input data using the optimized machine-learned model to obtain optimized output data associated with the task.

Alistarh discloses one or more processors; (Alistarh figure 1, element 112) one or more tangible, non-transitory computer readable media storing computer-readable instructions that when executed by the one or more processors cause the one or more processors to perform operations; (Alistarh figure 1, element 114) obtaining input data for a task that the optimized machine-learned model is optimized to perform and processing the input data using the optimized machine-learned model to obtain optimized output data associated with the task. (Alistarh para. [0034] teaches that once the machine learning model is trained it is deployed to users, where it is saved on user devices, and a user can then input an image and get output from the machine learning model in response.)

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Kingma with those of Alistarh in order to allow for storing and using the machine learning model to get output based on the input, as both references deal with machine learning models that use gradients of loss functions, and the benefit of doing so is that it allows quick and accurate results by using trained and optimized machine learning models.
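
For reference, the Adam update of Kingma Algorithm 1 relied upon above can be sketched for a single scalar parameter as follows. This is a minimal illustration rather than a reproduction of the reference: the function name `adam_step` and the variable names are assumptions, and the default hyperparameter values are those commonly given for Adam. The second-moment estimate is written in the algebraically equivalent form "previous value minus an update" so that the step can be read alongside the claim language.

```python
import math


def adam_step(theta: float, m: float, v: float, grad: float, t: int,
              alpha: float = 0.001, beta1: float = 0.9,
              beta2: float = 0.999, eps: float = 1e-8):
    """One Adam step for a scalar parameter; t is the 1-based step count."""
    # First-moment (mean) estimate of the gradient.
    m = beta1 * m + (1.0 - beta1) * grad
    # Second-moment estimate, identical to beta2 * v + (1 - beta2) * grad**2
    # but written as the previous value minus an update.
    v = v - (1.0 - beta2) * (v - grad * grad)
    # Bias-corrected estimates.
    m_hat = m / (1.0 - beta1 ** t)
    v_hat = v / (1.0 - beta2 ** t)
    # Parameter update; alpha / (sqrt(v_hat) + eps) scales the step taken.
    theta = theta - alpha * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v


theta, m, v = adam_step(theta=1.0, m=0.0, v=0.0, grad=2.0, t=1)
print(theta, m, v)
```

On the first step the bias correction makes the parameter move by approximately `alpha` regardless of the gradient scale, which is the bounded step size behavior discussed in Kingma section 2.1.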


In regards to claim 22, Kingma in view of Alistarh disclose the computing system of claim 21, wherein the operations further comprise: performing, based at least in part on the optimized output data, one or more actions associated with the task. (Alistarh para. [0034] teaches carrying out a task using the machine learning model, wherein the task is the action.)

In regards to claim 23, Kingma in view of Alistarh disclose the computing system of claim 21, wherein the operations further comprise: evaluating the loss function, wherein the loss function is configured to evaluate a difference between the optimized output data and a ground truth label associated with the input data.  (Alistarh para. [0019] cites “The training data instance is labeled and so the ground truth output of the neural network is known and the difference or error between the observed output and the ground truth output is found and provides information about a loss function.”)

In regards to claim 27, Kingma in view of Alistarh disclose the computing system of claim 21, wherein the task that the optimized machine-learned model is optimized to perform comprises: an image analysis task; a predictive task; or a classification task. (Alistarh para. [0034] teaches an image analysis task, which can also be considered a classification task because it classifies the image as a number from 0 to 9.)


In regards to claim 28, Kingma in view of Alistarh disclose the computing system of claim 27, wherein: the task comprises the image analysis task; the input data comprises image data that depicts one or more objects; and the optimized output data comprises a descriptive annotation of at least one of the one or more objects.  (Alistarh para. [0034] teaches an image analysis task in which the captured image data is input into the machine learning model, wherein the image data depicts an object (a number) and the system outputs data descriptive of the object, namely which number it is.)

In regards to claim 29, Kingma in view of Alistarh disclose the computing system of claim 27, wherein: the task comprises a classification task; the input data comprises data descriptive of an entity; and the optimized output data comprises a classification of the entity.  (Alistarh para. [0034] teaches a classification task in which the captured image data is input into the machine learning model, wherein the image data depicts an object (a number) and the system outputs data descriptive of the object, classifying it as a number from 0 to 9. Also see para. [0026], which teaches classifying handwritten images.)

In regards to claim 30, Kingma in view of Alistarh disclose the computing system of claim 7, wherein: the task comprises a prediction task; the input data comprises sensor data from one or more sensors; and the optimized output data comprises a decision. (Alistarh para. [0034] teaches capturing an image, which indicates a sensor that captures images, such as a camera. The image is input into the machine learning model and a decision on what number is in the image is output; as such, a prediction task is performed.)



Allowable Subject Matter
Claims 24-26, 32-34 and 36-40 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to PAULINHO E SMITH whose telephone number is (571)270-1358. The examiner can normally be reached Mon-Fri. 10AM-6PM CST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Abdullah Kawsar, can be reached at 571-270-3169. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/PAULINHO E SMITH/Primary Examiner, Art Unit 2127