Detailed Action
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 1-7 are pending for examination. Claims 1, 6, and 7 are independent.

Information Disclosure Statement
The information disclosure statements (IDS) were submitted on 10/08/2018 and 10/23/2019. The submissions are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-7 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 

When considering subject matter eligibility under 35 U.S.C. 101, it must be determined whether the claim is directed to one of the four statutory categories of invention, i.e., process, machine, manufacture, or composition of matter (Step 1). If the claim does fall within one of the statutory categories, the second step in the analysis is to determine whether the claim is directed to a judicial exception (Step 2A). The Step 2A analysis is broken into two prongs. In the first prong (Step 2A, Prong 1), it is determined 2019 PEG for more details of the analysis.

Step 1
According to the first part of the analysis, in the instant case, claims 1-5 are directed to a learning apparatus, claim 6 is directed to a method, and claim 7 is directed to a non-transitory computer-readable recording medium. Thus, each of the claims falls within one of the four statutory categories (i.e. process, machine, manufacture, or composition of matter).

Step 2A, Prong 1
Following the determination of whether or not the claims fall within one of the four categories (Step 1), it must be determined if the claims recite a judicial exception (e.g. 
	
	Regarding Claims 1, 6, and 7
calculate a first-order gradient in the stochastic gradient descent (This step appears to be mere calculation which is understood to be a recitation of mathematical calculations.);
calculate a statistic of the first-order gradient (This step appears to be mere calculation which is understood to be a recitation of mathematical calculations.); 
remove an initialization bias when calculating the statistic of the first-order gradient from the statistic of the first-order gradient calculated (This step appears to be removing a bias value and could be practically implementable in the human mind and is understood to be a recitation of a mental process and math.); 
adjust a learning rate by dividing the learning rate by standard deviation of the first-order gradient based on the statistic of the first-order gradient (This step appears to be mere calculation which is understood to be a recitation of mathematical calculations.); and
update a parameter of a learning model using the learning rate adjusted (This step appears to be updating a model with the calculated parameters and could be practically implementable in the human mind and is understood to be a recitation of a mental process and math.).

Step 2A, Prong 2
Following the determination that the claims recite a judicial exception, it must be determined if the claims recite additional elements that integrate the exception into a practical application of the exception (Step 2A, Prong 2). In this case, after considering all claim elements individually and as an ordered combination, it is determined that the claims do not include additional elements that integrate the exception into a practical application of the exception as explained below.
	
Regarding Claims 1, 6, and 7
	A learning apparatus (The “learning apparatus” and other hardware components are understood to be generic computer equipment. See MPEP 2106.05(f).);
	a processor (The “processor” and other hardware components are understood to be generic computer equipment. See MPEP 2106.05(f).);
A non-transitory computer-readable recording medium storing therein a learning program that causes a computer to execute a process (The “non-transitory computer-readable recording medium” and other hardware components are understood to be generic computer equipment. See MPEP 2106.05(f).);

Step 2B
Based on the determination in Step 2A of the analysis that the claims are directed to a judicial exception, it must be determined if the claims contain any element or combination of elements sufficient to ensure that the claim amounts to significantly 

Regarding Claims 1, 6, and 7
	A learning apparatus (The “learning apparatus” and other hardware components are understood to be generic computer equipment. See MPEP 2106.05(f).);
	a processor (The “processor” and other hardware components are understood to be generic computer equipment. See MPEP 2106.05(f).);
A non-transitory computer-readable recording medium storing therein a learning program that causes a computer to execute a process (The “non-transitory computer-readable recording medium” and other hardware components are understood to be generic computer equipment. See MPEP 2106.05(f).);

Step 2A, Prong 1 for Dependent Claims

Regarding Claim 2
wherein the processor is further configured to: calculate an approximate value of a moving average of the first-order gradient and a moving average of variance of the (Excluding the recitation of generic computer equipment (“processor”) See MPEP 2106.05(f), this step appears to be mere calculation which is understood to be a recitation of mathematical calculations.), and 
adjust the learning rate by calculating a product of the learning rate and a value obtained by dividing the approximate value of the moving average of the first-order gradient by the standard deviation of the first-order gradient that is a square root of the moving average of the variance of the first-order gradient (This step appears to be mere calculation which is understood to be a recitation of mathematical calculations.).
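
For illustration only, the two calculations recited in claim 2 can be sketched as follows. The recursive update rules, the function name adjusted_step, the weight names beta1 and beta2, and the small constant eps are all assumptions made for this sketch; they are not taken from the claims or the specification.

```python
import math

def adjusted_step(lr, grad, mean, var, beta1=0.9, beta2=0.99, eps=1e-8):
    """Hypothetical sketch of the claim 2 calculations for one scalar parameter."""
    # Assumed form: exponential moving average of the first-order gradient.
    new_mean = beta1 * mean + (1.0 - beta1) * grad
    # Assumed form: moving average of the variance, using the squared
    # deviation of the gradient from the previous moving average.
    new_var = beta2 * var + (1.0 - beta2) * (grad - mean) ** 2
    # Standard deviation as the square root of the moving average of the variance.
    std = math.sqrt(new_var)
    # Product of the learning rate and (moving average / standard deviation);
    # eps is an assumed guard against division by zero.
    step = lr * new_mean / (std + eps)
    return step, new_mean, new_var

step, mean, var = adjusted_step(lr=0.1, grad=2.0, mean=0.0, var=0.0)
```

Every line of the sketch is ordinary arithmetic on scalar values.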

Step 2A, Prong 2 

Regarding Claim 2
processor (The “processor” and other hardware components are understood to be generic computer equipment. See MPEP 2106.05(f).)

Step 2B 

Regarding Claim 2
processor (The “processor” and other hardware components are understood to be generic computer equipment. See MPEP 2106.05(f).)

Step 2A, Prong 1 

Regarding Claim 3
The learning apparatus according to claim 1, wherein the processor is further configured to: 
calculate a moving average of the first-order gradient  and a moving average of variance of the first-order gradient as the statistics of the first-order gradient (Excluding the recitation of generic computer equipment (“processor”) See MPEP 2106.05(f), this step appears to be mere calculation which is understood to be a recitation of mathematical calculations.), and 
adjust the learning rate by calculating a product of the learning rate and a value obtained by dividing the first-order gradient by the standard deviation of the first-order gradient that is a square root of the moving average of the variance of the first-order gradient (This step appears to be mere calculation which is understood to be a recitation of mathematical calculations.).

Step 2A, Prong 2 

Regarding Claim 3
processor (The “processor” and other hardware components are understood to be generic computer equipment. See MPEP 2106.05(f).)

Step 2B 

Regarding Claim 3
processor (The “processor” and other hardware components are understood to be generic computer equipment. See MPEP 2106.05(f).)

Step 2A, Prong 1 

Regarding Claim 4
wherein the processor is further configured to:  
remove an initialization bias of the approximate value of the moving average of the first-order gradient by dividing the approximate value of the moving average of the first-order gradient by a value obtained by subtracting a weight in calculating the moving average of the first-order gradient from one (Excluding the recitation of generic computer equipment (“processor”) See MPEP 2106.05(f), this step appears to be removing a bias value and could be practically implementable in the human mind and is understood to be a recitation of a mental process and math calculation.), and 
remove an initialization bias of the approximate value of the moving average of the variance of the first-order gradient by dividing the moving average of the variance of the first-order gradient by a value obtained by subtracting a weight in calculating the moving average of the variance of the first-order gradient from one (This step appears to be removing a bias value and could be practically implementable in the human mind and is understood to be a recitation of a mental process and math calculation.).
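
As a worked illustration of why a division of this kind removes an initialization bias, assume that the moving average follows the recursion $m_t = \beta m_{t-1} + (1-\beta) g_t$ with $m_0 = 0$, and that the gradient $g_i$ has a constant expected value $\mathbb{E}[g]$. Both are assumptions made for this sketch and are not recited in the claims. The divisor $1 - \beta^t$ is likewise an assumption; at $t = 1$ it equals one minus the weight.

```latex
\begin{align*}
m_t &= (1-\beta)\sum_{i=1}^{t} \beta^{\,t-i}\, g_i \\
\mathbb{E}[m_t] &= \mathbb{E}[g]\,(1-\beta)\sum_{i=1}^{t} \beta^{\,t-i}
                 = \mathbb{E}[g]\,\bigl(1-\beta^{t}\bigr) \\
\hat{m}_t &= \frac{m_t}{1-\beta^{t}}
  \quad\Longrightarrow\quad \mathbb{E}[\hat{m}_t] = \mathbb{E}[g]
\end{align*}
```

Under these assumptions the factor $(1-\beta^{t})$ is the bias introduced by starting the moving average at zero, and the division cancels it.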

Step 2A, Prong 2 

Regarding Claim 4
processor (The “processor” and other hardware components are understood to be generic computer equipment. See MPEP 2106.05(f).)

Step 2B 

Regarding Claim 4
processor (The “processor” and other hardware components are understood to be generic computer equipment. See MPEP 2106.05(f).)

Step 2A, Prong 1 

Regarding Claim 5
wherein the processor is further configured to remove the initialization bias of the approximate value of the moving average of the variance of the first-order gradient by dividing the moving average of the variance of the first-order gradient by a value obtained by subtracting a weight in calculating the moving average of the variance of the first-order gradient from one (Excluding the recitation of generic computer equipment (“processor”) See MPEP 2106.05(f), this step appears to be removing a bias value and could be practically implementable in the human mind and is understood to be a recitation of a mental process and math calculation.).

Step 2A, Prong 2 

Regarding Claim 5
processor (The “processor” and other hardware components are understood to be generic computer equipment. See MPEP 2106.05(f).)

Step 2B 

Regarding Claim 5
processor (The “processor” and other hardware components are understood to be generic computer equipment. See MPEP 2106.05(f).)

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-7 are rejected under 35 U.S.C. 103 as being unpatentable over Kingma et al. ("ADAM: A METHOD FOR STOCHASTIC OPTIMIZATION", hereinafter "Kingma") in view of He et al. (US 2016/0379112, hereinafter "He").

Regarding Claim 1
Kingma discloses: A learning apparatus that performs learning using a stochastic gradient descent method in machine learning ([Introduction second Para] “We propose Adam, a method for efficient stochastic optimization that only requires first-order gradients with little memory requirement. The method computes individual adaptive learning rates for different parameters from estimates of first and second moments of the gradients;”):
calculate a first-order gradient in the stochastic gradient descent method ([Algorithm 1 on page 2 and Section 2] “gt ← ∇θft(θt−1) (Get gradients w.r.t. stochastic objective at timestep t)” Examiner reads gt as the calculated first order gradient.);
calculate a statistic of the first-order gradient ([Algorithm 1 on page 2 and Section 2] “mt ← β1·mt−1+(1 − β1)·gt  (Update biased first moment estimate)” Examiner reads the moving average (i.e. mt) as a statistic of the first-order gradient (i.e. gt). vt in Algorithm 1 is also a statistic of the first-order gradient ); 
remove an initialization bias when calculating the statistic of the first-order gradient from the statistic of the first-order gradient calculated ([Algorithm 1 on page 2 and Section 2] “$\hat{m}_t \leftarrow m_t / (1 - \beta_1^t)$ (Compute bias-corrected first moment estimate)” Examiner reads the bias-corrected $\hat{m}_t$ as removing the bias in calculating the statistic of the first-order gradient (i.e. mt) from the statistic of the first-order gradient calculated (i.e. gt). $\hat{v}_t$ also reads on the limitation. [Section 3], Examiner interprets section 3 as also disclosing removing initialization bias when calculating the statistic.); 
adjust a learning rate by dividing the learning rate by standard deviation of the first-order gradient based on the statistic of the first-order gradient ([Algorithm 1 on page 2 and Section 2] “$\theta_t \leftarrow \theta_{t-1} - \alpha \cdot \hat{m}_t / (\sqrt{\hat{v}_t} + \epsilon)$ (Update parameters)” Examiner interprets $\alpha$ as the learning rate that is adjusted. The formula for $\theta_t$, which divides the learning rate by the standard deviation, is similar to Applicant's specification (para 0042-0043 and Formula 7 of Spec).); and
update a parameter of a learning model using the learning rate adjusted ([Algorithm 1 on page 2 and Section 2] “$\theta_t \leftarrow \theta_{t-1} - \alpha \cdot \hat{m}_t / (\sqrt{\hat{v}_t} + \epsilon)$ (Update parameters)” The algorithm states that the parameters are updated (i.e. θ) using the learning rate adjusted).
Kingma does not explicitly disclose: the learning apparatus comprising: 
a processor configured to;
However, He discloses in the same field of endeavor: a processor configured to ([Para 0029 and Fig 1] “computer-readable media 114 can store instructions executable by the processing unit(s) 112 that, as discussed above, can represent a processing unit incorporated in computing device 102.”);
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the algorithm disclosed by Kingma with the processing unit taught by He. One of ordinary skill in the art would have been motivated to make this modification in order to provide a computing environment in which training or use methods can be performed (Para 0019, He).
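
For reference, the steps of Kingma's Algorithm 1 relied upon above can be sketched for a single scalar parameter as follows. This is an illustrative sketch only: the function name adam_step and the quadratic objective are hypothetical, the second-moment update for v_t comes from Kingma's Algorithm 1 and is referenced but not quoted in this action, and the default values are the settings suggested in Kingma.

```python
import math

def adam_step(theta, grad, m, v, t, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One parameter update in the form of Kingma, Algorithm 1 (scalar case)."""
    # m_t <- beta1 * m_{t-1} + (1 - beta1) * g_t   (biased first moment estimate)
    m = beta1 * m + (1.0 - beta1) * grad
    # v_t <- beta2 * v_{t-1} + (1 - beta2) * g_t^2 (biased second raw moment estimate)
    v = beta2 * v + (1.0 - beta2) * grad * grad
    # Bias-corrected estimates; t is the 1-based timestep.
    m_hat = m / (1.0 - beta1 ** t)
    v_hat = v / (1.0 - beta2 ** t)
    # theta_t <- theta_{t-1} - alpha * m_hat / (sqrt(v_hat) + eps)
    theta = theta - alpha * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v

# Hypothetical objective f(theta) = theta**2, whose gradient is 2 * theta.
theta, m, v = 1.0, 0.0, 0.0
for t in range(1, 4):
    theta, m, v = adam_step(theta, 2.0 * theta, m, v, t)
```

In this form the learning rate alpha is scaled by the ratio of the bias-corrected first moment to the square root of the bias-corrected second moment before the parameter is updated.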

Regarding Claim 6
Kingma discloses: A learning method that performs learning using a stochastic gradient descent method in machine learning ([Introduction second Para] “We propose Adam, a method for efficient stochastic optimization that only requires first-order gradients with little memory requirement. The method computes individual adaptive learning rates for different parameters from estimates of first and second moments of the gradients;”), the learning method comprising: 
calculating a first-order gradient in the stochastic gradient descent method ([Algorithm 1 on page 2 and Section 2] “gt ← ∇θft(θt−1) (Get gradients w.r.t. stochastic objective at timestep t)” Examiner reads gt as the calculated first order gradient.); 
calculating a statistic of the first-order gradient ([Algorithm 1 on page 2 and Section 2] “mt ← β1·mt−1+(1 − β1)·gt  (Update biased first moment estimate)” Examiner reads the moving average (i.e. mt) as a statistic of the first-order gradient (i.e. gt). vt in Algorithm 1 is also a statistic of the first-order gradient); 
removing an initialization bias when calculating the statistic of the first-order gradient in calculation of the statistic from the statistic of the first-order gradient ([Algorithm 1 on page 2 and Section 2] “$\hat{m}_t \leftarrow m_t / (1 - \beta_1^t)$ (Compute bias-corrected first moment estimate)” Examiner reads the bias-corrected $\hat{m}_t$ as removing the bias in calculating the statistic of the first-order gradient (i.e. mt) from the statistic of the first-order gradient calculated (i.e. gt). $\hat{v}_t$ also reads on the limitation. [Section 3], Examiner interprets section 3 as also disclosing removing initialization bias when calculating the statistic.); 
adjusting a learning rate by dividing the learning rate by standard deviation of the first-order gradient based on the statistic of the first-order gradient, ([Algorithm 1 on page 2 and Section 2] “$\theta_t \leftarrow \theta_{t-1} - \alpha \cdot \hat{m}_t / (\sqrt{\hat{v}_t} + \epsilon)$ (Update parameters)” Examiner interprets $\alpha$ as the learning rate that is adjusted. The formula for dividing the learning rate by the standard deviation is similar to Applicant's specification (para 0042-0043 and Formula 7 of Spec).); and 
updating a parameter of a learning model using the learning rate adjusted in the adjustment ([Algorithm 1 on page 2 and Section 2] “$\theta_t \leftarrow \theta_{t-1} - \alpha \cdot \hat{m}_t / (\sqrt{\hat{v}_t} + \epsilon)$ (Update parameters)” The algorithm states that the parameters are updated (i.e. θ) using the learning rate adjusted).
Kingma does not explicitly disclose: A learning method executed by a learning apparatus; by a processor;
However, He discloses in the same field of endeavor: A learning method executed by a learning apparatus; by a processor ([Para 0019 and Fig 1] “FIG. 1 shows an example environment 100 in which examples of computational model training systems”);
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the algorithm disclosed by Kingma with the processing unit taught by He. One of ordinary skill in the art would have been motivated to make this modification in order to provide a computing environment in which training or use methods can be performed (Para 0019, He).

Regarding Claim 7
Kingma discloses: calculating a first-order gradient in a stochastic gradient descent method in a case where learning is executed using the stochastic gradient descent method in machine learning ([Algorithm 1 on page 2 and Section 2] “gt ← ∇θft(θt−1) (Get gradients w.r.t. stochastic objective at timestep t)” Examiner reads gt as the calculated first order gradient.); 
calculating a statistic of the first-order gradient ([Algorithm 1 on page 2 and Section 2] “mt ← β1·mt−1+(1 − β1)·gt  (Update biased first moment estimate)” Examiner reads the moving average (i.e. mt) as a statistic of the first-order gradient (i.e. gt). vt in Algorithm 1 is also a statistic of the first-order gradient); 
removing an initialization bias used in calculating the statistic of the first-order at the calculating the statistic from the statistic of the first-order gradient ([Algorithm 1 on page 2 and Section 2] “$\hat{m}_t \leftarrow m_t / (1 - \beta_1^t)$ (Compute bias-corrected first moment estimate)” Examiner reads the bias-corrected $\hat{m}_t$ as removing the bias in calculating the statistic of the first-order gradient (i.e. mt) from the statistic of the first-order gradient calculated (i.e. gt). $\hat{v}_t$ also reads on the limitation. [Section 3], Examiner interprets section 3 as also disclosing removing initialization bias when calculating the statistic.); 
adjusting a learning rate by dividing the learning rate by standard deviation of the first-order gradient based on the statistic of the first-order gradient ([Algorithm 1 on page 2 and Section 2] “$\theta_t \leftarrow \theta_{t-1} - \alpha \cdot \hat{m}_t / (\sqrt{\hat{v}_t} + \epsilon)$ (Update parameters)” Examiner interprets $\alpha$ as the learning rate that is adjusted. The formula for dividing the learning rate by the standard deviation is similar to Applicant's specification (para 0042-0043 and Formula 7 of Spec).); and 
updating a parameter of a learning model using the learning rate adjusted at the adjusting ([Algorithm 1 on page 2 and Section 2] “$\theta_t \leftarrow \theta_{t-1} - \alpha \cdot \hat{m}_t / (\sqrt{\hat{v}_t} + \epsilon)$ (Update parameters)” The algorithm states that the parameters (i.e. $\theta$) are updated using the adjusted learning rate.).
Kingma does not explicitly disclose: A non-transitory computer-readable recording medium storing therein a learning program that causes a computer to execute a process comprising: 
However, He discloses in the same field of endeavor: A non-transitory computer-readable recording medium storing therein a learning program that causes a computer to execute a process comprising: ([Para 0137 and Fig 1] “computer-readable medium, e.g., a computer storage medium, having thereon computer-executable instructions, the computer-executable instructions upon execution configuring a computer to perform operations…”)
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the algorithm disclosed by Kingma with the processing unit taught by He. One of ordinary skill in the art would have been motivated to make this modification in order to provide a computing environment in which training or use methods can be performed (Para 0019, He).
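For orientation, the lines of Kingma's Algorithm 1 quoted in this rejection can be collected into a single update step. The following is a minimal scalar sketch, not Kingma's or Applicant's implementation; the function name `adam_step`, the default hyperparameter values, and the example objective are illustrative assumptions.

```python
import math


def adam_step(theta, grad, m, v, t, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One scalar update following the quoted lines of Algorithm 1 (hypothetical helper)."""
    m = beta1 * m + (1 - beta1) * grad        # biased first moment estimate (m_t)
    v = beta2 * v + (1 - beta2) * grad ** 2   # biased second raw moment estimate (v_t)
    m_hat = m / (1 - beta1 ** t)              # bias-corrected first moment estimate
    v_hat = v / (1 - beta2 ** t)              # bias-corrected second raw moment estimate
    theta = theta - alpha * m_hat / (math.sqrt(v_hat) + eps)  # parameter update
    return theta, m, v


# Illustrative use on f(theta) = theta**2, whose gradient is 2 * theta.
theta, m, v = 1.0, 0.0, 0.0
for t in range(1, 4):
    theta, m, v = adam_step(theta, 2 * theta, m, v, t, alpha=0.01)
    print(t, theta)
```

At the first timestep the bias-corrected ratio has magnitude close to one, so the parameter moves by roughly the learning rate.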

Regarding Claim 2
Kingma in view of He discloses: The learning apparatus according to claim 1, wherein the processor is further configured to:  
calculate an approximate value of a moving average of the first-order gradient ([Algorithm 1 on page 2 and Section 2], Kingma “$m_t \leftarrow \beta_1 \cdot m_{t-1} + (1 - \beta_1) \cdot g_t$ (Update biased first moment estimate)” Examiner reads the moving average as $m_t$.) and a moving average of variance of the first-order gradient as the statistics of the first-order gradient ([Algorithm 1 on page 2 and Section 2], Kingma “$v_t \leftarrow \beta_2 \cdot v_{t-1} + (1 - \beta_2) \cdot g_t^2$ (Update biased second raw moment estimate)” Examiner reads the moving average of variance as $v_t$.), and 
adjust the learning rate by calculating a product of the learning rate and a value obtained by dividing the approximate value of the moving average of the first-order gradient by the standard deviation of the first-order gradient that is a square root of the moving average of the variance of the first-order gradient ([Algorithm 1 on page 2 and Section 2], Kingma “$\theta_t \leftarrow \theta_{t-1} - \alpha \cdot \hat{m}_t / (\sqrt{\hat{v}_t} + \epsilon)$ (Update parameters)” Examiner interprets $\alpha$ as the learning rate).
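The value that multiplies the learning rate in the quoted update is the ratio of the bias-corrected moving average to the square root of the bias-corrected second moment. The sketch below computes that ratio for a short gradient sequence and for the same sequence scaled by 100, to illustrate that the ratio is nearly independent of gradient scale. The helper name `adam_ratio`, the hyperparameter defaults, and the gradient values are illustrative assumptions, not taken from Kingma or the specification.

```python
import math


def adam_ratio(grads, beta1=0.9, beta2=0.999, eps=1e-8):
    """Return m_hat / (sqrt(v_hat) + eps) after a non-empty gradient sequence (hypothetical helper)."""
    m = v = 0.0
    for t, g in enumerate(grads, start=1):
        m = beta1 * m + (1 - beta1) * g       # moving average of the gradient
        v = beta2 * v + (1 - beta2) * g * g   # moving average of the squared gradient
    m_hat = m / (1 - beta1 ** t)              # t is the final timestep here
    v_hat = v / (1 - beta2 ** t)
    return m_hat / (math.sqrt(v_hat) + eps)


grads = [0.5, -0.2, 0.9, 0.4]                 # made-up gradient values
small = adam_ratio(grads)
large = adam_ratio([100 * g for g in grads])
print(small, large)
```

Scaling every gradient by a constant scales both the numerator and the square-root denominator by that constant, so the two printed values differ only through the small `eps` term.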

Regarding Claim 3
Kingma in view of He discloses: The learning apparatus according to claim 1, wherein the processor is further configured to: ([Fig 1], He)  
calculate a moving average of the first-order gradient ([Algorithm 1 on page 2 and Section 2], Kingma “$m_t \leftarrow \beta_1 \cdot m_{t-1} + (1 - \beta_1) \cdot g_t$ (Update biased first moment estimate)” Examiner reads the moving average as $m_t$.) and a moving average of variance of the first-order gradient as the statistics of the first-order gradient ([Algorithm 1 on page 2 and Section 2], Kingma “$v_t \leftarrow \beta_2 \cdot v_{t-1} + (1 - \beta_2) \cdot g_t^2$ (Update biased second raw moment estimate)” Examiner reads the moving average of variance as $v_t$.), and 
adjust the learning rate by calculating a product of the learning rate and a value obtained by dividing the first-order gradient by the standard deviation of the first-order gradient that is a square root of the moving average of the variance of the first-order gradient ([Algorithm 1 on page 2 and Section 2], Kingma “$\theta_t \leftarrow \theta_{t-1} - \alpha \cdot \hat{m}_t / (\sqrt{\hat{v}_t} + \epsilon)$ (Update parameters)” Examiner interprets $\alpha$ as the learning rate).

Regarding Claim 4
Kingma in view of He discloses: The learning apparatus according to claim 2, wherein the processor is further configured to: ([Fig 1], He)  
remove an initialization bias of the approximate value of the moving average of the first-order gradient by dividing the approximate value of the moving average of the first-order gradient by a value obtained by subtracting a weight in calculating the moving average of the first-order gradient from one ([Algorithm 1 on page 2 and Section 2], Kingma “$\hat{m}_t \leftarrow m_t / (1 - \beta_1^t)$ (Compute bias-corrected first moment estimate)” Examiner reads the bias-corrected estimate $\hat{m}_t$ as removing the bias of the moving average (i.e. $m_t$) by dividing it by a value obtained by subtracting a weight used in calculating the moving average of the first-order gradient (i.e. $\beta_1$) from one (i.e. $1 - \beta_1^t$).), and 
remove an initialization bias of the approximate value of the moving average of the variance of the first-order gradient by dividing the moving average of the variance of the first-order gradient by a value obtained by subtracting a weight in calculating the moving average of the variance of the first-order gradient from one ([Algorithm 1 on page 2 and Section 2], Kingma “$\hat{v}_t \leftarrow v_t / (1 - \beta_2^t)$ (Compute bias-corrected second raw moment estimate)” Examiner reads the bias-corrected estimate $\hat{v}_t$ as removing the bias of the moving average of variance (i.e. $v_t$) by dividing it by a value obtained by subtracting a weight used in calculating the moving average of the variance of the first-order gradient (i.e. $\beta_2$) from one (i.e. $1 - \beta_2^t$).).
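As a worked illustration of why the division removes the initialization bias, consider the first timestep under the assumption that both moving averages are initialized to zero ($m_0 = v_0 = 0$) and that the second-moment correction uses $\beta_2$, the same weight that appears in the $v_t$ recursion. This is a sketch for the reader, not part of the cited disclosure.

```latex
% Assumes zero initialization, m_0 = v_0 = 0, evaluated at t = 1.
\begin{align*}
m_1 &= \beta_1 m_0 + (1 - \beta_1)\, g_1 = (1 - \beta_1)\, g_1, &
\hat{m}_1 &= \frac{m_1}{1 - \beta_1^{1}} = g_1, \\
v_1 &= \beta_2 v_0 + (1 - \beta_2)\, g_1^2 = (1 - \beta_2)\, g_1^2, &
\hat{v}_1 &= \frac{v_1}{1 - \beta_2^{1}} = g_1^2 .
\end{align*}
```

Without the division, $m_1$ would understate $g_1$ by the factor $1 - \beta_1$ (for example, 0.1 when $\beta_1 = 0.9$), which is the bias introduced by the zero initialization.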

Regarding Claim 5
Kingma in view of He discloses: The learning apparatus according to claim 3, wherein the processor ([Fig 1], He) is further configured to remove the initialization bias of the approximate value of the moving average of the variance of the first-order gradient by dividing the moving average of the variance of the first-order gradient by a ([Algorithm 1 on page 2 and Section 2], Kingma “$\hat{v}_t \leftarrow v_t / (1 - \beta_2^t)$ (Compute bias-corrected second raw moment estimate)” Examiner reads the bias-corrected estimate $\hat{v}_t$ as removing the bias of the moving average of variance (i.e. $v_t$) by dividing it by a value obtained by subtracting a weight used in calculating the moving average of the variance of the first-order gradient (i.e. $\beta_2$) from one (i.e. $1 - \beta_2^t$).).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Sinyavsky et al. (US 20130325774, hereinafter "Sinyavsky") similarly discloses adjusting control parameters according to a learning stochastic calculation. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to TEWODROS E MENGISTU whose telephone number is (571)270-7714. The examiner can normally be reached Mon-Fri 9:30-5:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, ABDULLAH KAWSAR can be reached on (571)270-3169. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.






/TEWODROS E MENGISTU/Examiner, Art Unit 2127                                                                                                                                                                                                        

/ABDULLAH AL KAWSAR/Supervisory Patent Examiner, Art Unit 2127