DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 7-12 and 17-18 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor, or for pre-AIA the applicant, regards as the invention.

Claim 7, at line 9, line 11, and lines 19-20, recites performing a/the floating-point multiplication.  It is unclear whether each recitation refers to the same floating-point multiplication operation (including the values operated on) or to different floating-point multiplication operations.  If the operations are different, Examiner suggests differentiating each operation, such as “first floating-point multiplication operation”, “second floating-point multiplication operation”, “third floating-point multiplication operation”, or equivalent. Claim 9 similarly recites a/the floating-point multiplication.  Claim 12 similarly recites a/the floating-point multiplication. Claim 19 similarly recites a/the floating-point multiplication. Claim 8 inherits the same deficiency as claim 7 by reason of dependence. Claims 10-11 inherit the same deficiency as claim 9 by reason of dependence. Claim 20 inherits the same deficiency as claim 19 by reason of dependence.

Claim 10 recites “repeating a Newton-Raphson method”.  It is unclear which limitations are repeated, as there is no prior recitation of a Newton-Raphson method.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-20 are rejected under 35 U.S.C. § 101 because the claimed invention is directed to a judicial exception (i.e., a law of nature, a natural phenomenon, or an abstract idea) without significantly more.  

Regarding claim 1, under the Alice framework Step 1, the claim falls within the four statutory categories of patentable subject matter identified by 35 USC 101: a process, machine, manufacture or composition of matter.
Under the Alice framework Step 2A prong 1, the claim recites Mathematical Concepts.  Claim 1 recites applying a Gradient Descent algorithm by transforming a cumulative change function of a gradient for an error function into an inverse square root function in the Gradient Descent algorithm and operating an inverse square root approximate value by using a Newton-Raphson method for the inverse square root function.
A Gradient Descent algorithm is a mathematical algorithm that obtains a gradient of a cost function through the use of partial differential equations, using model parameters and updating the model parameters in the direction of the gradient. See specification [00102], [00171-00205], and figures 2-4.  Equations to apply the partial differential include an inverse square root and application of a Newton-Raphson mathematical method.  See specification [00193-00197].  Specific implementation of the method as claimed further includes mathematical concepts: division by 2, multiplication, inverse square root of integer and floating point numbers. See specification [00206-00209].  For these reasons, the claim recites Mathematical Concepts.
Under the Alice framework Step 2A prong 2 analysis, the claim recites the following additional elements: a learning apparatus to optimize a neural network, the learning apparatus comprising: an input interface configured to obtain input data or training data; a memory configured to store the input data, the training data, and a neural network model for deep learning; and a learning processor configured to apply a Gradient Descent algorithm to the neural network model. The claim does no more than generally link the additional elements to the mathematical concepts, the application of a Gradient Descent algorithm in a manner that in effect merely recites “apply it” on a computer comprising a processor, an input interface and a memory.  Furthermore the claim merely applies the mathematical concept to a particular technological environment, a learning apparatus for the intended result of optimizing a neural network.  At most, these elements merely recite an insignificant extra-solution activity. For these reasons the claims are not integrated into a practical application.
Moreover, under the Alice Framework Step 2B analysis, the claim, considered both element by element and as an ordered combination, does not include additional elements that are sufficient to amount to significantly more than the abstract idea. As discussed in the Step 2A prong 2 analysis, the claim either merely generally links an additional element to the math, by applying the math in a computer or a particular technological environment, or recites insignificant extra-solution activity.  The innovative concept is in the mathematical concepts, i.e., the specific gradient descent algorithm applied.  Furthermore, these elements are well understood, routine and conventional activities commonly performed in computing systems.  See, e.g., D.A. Patterson et al., Computer Organization and Design: The Hardware/Software Interface, Elsevier, Ch 1, and Ch 3, 2007 (hereinafter “Patterson”).  See chapter 1, specifically figure 1.5 which depicts an input interface configured to obtain input data, a memory storing the input data, and a processor operating on the data.  See also chapter 3, which discloses various approaches for configuring a processor to perform arithmetic functions. For these reasons claim 1, when considering the mathematical concepts as a whole in combination with these generally applied and/or well understood, routine, and conventional activities, does not amount to significantly more than the abstract idea.
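
For reference, the Newton-Raphson step for an inverse square root referred to in the Step 2A prong 1 analysis can be written out as below. This is the standard textbook derivation and is offered only as an illustration; it is not taken from the specification, and the symbols f, x, and y_n are chosen here for convenience.

```latex
% Goal: y = 1/\sqrt{x}, i.e. a root of f(y) = y^{-2} - x for x > 0.
\begin{aligned}
f(y)    &= y^{-2} - x, \qquad f'(y) = -2\,y^{-3} \\
y_{n+1} &= y_n - \frac{f(y_n)}{f'(y_n)}
         = y_n + \tfrac{1}{2}\,y_n^{3}\left(y_n^{-2} - x\right) \\
        &= y_n\left(1.5 - 0.5\,x\,y_n^{2}\right)
\end{aligned}
```

Each iteration therefore needs only multiplications and one subtraction, with no division or square root.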

Claims 23 and 24 are rejected for at least the reasons cited with respect to claim 22.  Under the Step 2A prong 1 analysis, in addition to the mathematical concepts recited in claim 22, both claims 23 and 24 recite further mathematical steps to execute third (claims 23, 24) and fourth Jacobi variants on a smaller problem size, i.e., a subproblem of the first subproblem, and pass results back (claim 24).  Under the step 2A prong 2 analysis, the additional elements third kernel method, call by the second kernel method (claims 23, 24), and fourth kernel method are insignificant extra-solution activities consistent with the claim 22 analysis.  For this reason the claims are not integrated into a practical application.  Under the step 2B analysis these additional elements, when considered as a whole in combination with all limitations, comprise well-understood, routine, and conventional activities.  The innovative concept remains within the mathematical calculations, the first, second, third (claims 23, 24) and fourth Jacobi variant mathematical algorithms applied (claim 24), consistent with the claim 22 analysis.  For these reasons, the claims do not amount to significantly more than the abstract idea.

Regarding claim 2, claim 2 is rejected for at least the reasons cited with respect to claim 1.  Under the step 2A prong 1 analysis, claim 2 merely recites further mathematical concepts by transforming an error prevention constant value ϵ into the inverse square root function by shifting the error prevention constant value ϵ into an inverse square root, in the cumulative change function of the gradient. As to the shifting, shifting is mathematically equal to dividing by two. See specification [00216].  Claim 2 contains no further additional elements that would require further analysis under step 2A prong 2 or step 2B.

Regarding claim 3, claim 3 is rejected for at least the reasons cited with respect to claim 1.  Furthermore, claim 3 recites the following additional elements: the learning processor includes an inverse square root operator including a shifter, an integer subtractor, a floating-point subtractor, and a floating-point multiplier.  Under the step 2A prong 2 and step 2B analysis, these additional elements are merely generally linked to the processor in a manner that merely recites performing the math in generic math hardware.

Regarding claims 4-6, claims 4-6 are rejected for at least the reasons cited with respect to claim 3.  Under the step 2A prong 1 analysis, claims 4-6 merely recite further mathematical calculations and relationships performed by the inverse square root operator.  Claims 4-6 contain no further additional elements that would require further analysis under step 2A prong 2 or step 2B.

Regarding claims 7-8, claims 7-8 are rejected for at least the reasons cited with respect to claim 3.  Under the step 2A prong 1 analysis, claim 7 recites a series of mathematical calculations that include transforming number formats from floating point to integer, shifting integer bits, and performing a series of subtracting, squaring, multiplying operations.  Without further limitation, the element “operation” is interpreted to include mathematical calculations. Furthermore the element “receive an integer form of a single precision floating-point x” is not being considered as an additional element based on the further limitation “by transforming the single precision floating-point x into integer form”, which is a mathematical calculation.  Claim 8 merely further recites mathematical calculation by repeating the claim 1 Newton-Raphson method.  Claims 7-8 contain no further additional elements that would require further analysis under step 2A prong 2 or step 2B.

Regarding claim 9, under the Alice framework Step 1, the claim falls within the four statutory categories of patentable subject matter identified by 35 USC 101: a process, machine, manufacture or composition of matter.
Under the Alice framework Step 2A prong 1, the claim recites Mathematical Concepts.  Claim 9 recites a series of mathematical calculations that include transforming number formats from floating point to integer, shifting integer bits, and performing a series of subtracting, squaring, and multiplying operations.  Without further limitation, the element “operation” is interpreted to include mathematical calculations. Furthermore the element “receive an integer form of a single precision floating-point x” is not being considered as an additional element based on the further limitation “by transforming the single precision floating-point x into integer form”, which is a mathematical calculation.  For these reasons, the claim recites Mathematical Concepts.
Claim 9 contains no further additional elements that would require further analysis under step 2A prong 2 or step 2B.

Claims 10-11 are rejected for at least the reasons cited with respect to claim 9.  Claims 10-11 merely further mathematically limit the limitations of claim 9.  Claims 10-11 contain no further additional elements that would require further analysis under step 2A prong 2 or step 2B.

Claim 12 is directed to a computer readable recording medium in which a computer program for implementing the method of claim 9 has been recorded.  All steps recited in the method of claim 9 are performed by the computer readable recording medium of claim 12.  The claim 9 analysis applies equally to claim 12.

Claims 13-20 are directed to a method that would be practiced by the apparatus of claims 1-8.  All steps recited in the method of claims 13-20 are performed by the apparatus of claims 1-8, with the exception that the steps performed by the learning apparatus of claims 1-8 are performed by a learning processor in the method of claims 13-20.  The claim 1-8 analysis applies equally to claims 13-20, with the analysis with respect to the learning apparatus applying equally to the learning processor.


Allowable Subject Matter
Claims 1-20 would be allowable if rewritten to overcome the rejections under 35 USC 101, and with respect to claims 7-12, and 17-18 further rewritten to overcome the rejection under 35 USC 112(b).
The following is a statement of reasons for the indication of allowable subject matter. Applicant claims apparatus and methods for optimizing a neural network.  
The apparatus as in claim 1 comprises an input interface, a memory, and a learning processor.  The input interface is configured to obtain input data or training data.  The memory is configured to store the input data, the training data, and a neural network model for deep learning.  The learning processor transforms a cumulative change function of a gradient for an error function into an inverse square root function in the Gradient Descent algorithm, and operates an inverse square root approximate value by using a Newton-Raphson method for the inverse square root function.  
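
As a rough illustration of the kind of computation summarized above, the following Python sketch shows one gradient descent step in which the accumulated squared gradient is applied through an approximate inverse square root refined by Newton-Raphson. It is a sketch under stated assumptions, not the claimed implementation: the AdaGrad-style accumulator, the names `inv_sqrt` and `adagrad_step`, the learning rate, the constant `eps`, and the bit-level starting constant `0x5F3759DF` are all hypothetical choices made here and do not come from the application.

```python
import struct


def inv_sqrt(x: float, iterations: int = 2) -> float:
    """Approximate 1/sqrt(x) for x > 0 (illustrative helper, not from the application)."""
    # Reinterpret the single-precision bit pattern of x as an unsigned integer.
    i = struct.unpack("<I", struct.pack("<f", x))[0]
    # Assumed starting constant; gives a coarse initial estimate.
    y = struct.unpack("<f", struct.pack("<I", 0x5F3759DF - (i >> 1)))[0]
    # Newton-Raphson refinement: y <- y * (1.5 - 0.5 * x * y^2).
    for _ in range(iterations):
        y = y * (1.5 - 0.5 * x * y * y)
    return y


def adagrad_step(theta, grad, accum, lr=0.1, eps=1e-8):
    """One AdaGrad-style update (assumed form of the 'cumulative change function')."""
    # Accumulate squared gradients per parameter.
    new_accum = [a + g * g for a, g in zip(accum, grad)]
    # Scale each gradient by the approximate inverse square root of (accumulator + eps).
    new_theta = [
        t - lr * g * inv_sqrt(a + eps)
        for t, g, a in zip(theta, grad, new_accum)
    ]
    return new_theta, new_accum
```

For example, calling `adagrad_step([1.0], [2.0], [0.0])` accumulates 4.0 and moves the parameter by roughly 0.1 * 2.0 * 0.5. Note that Python performs the arithmetic in double precision, so this only mimics the single-precision data path.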
The method as in claim 9 comprises receiving an integer form of a single precision floating-point x by transforming the single precision floating-point x into the integer form; obtaining a data value after shifting a data of the integer form to the right by 1 bit; obtaining an initial estimated value y₀ by subtracting the data value from a constant R; obtaining y₀² by performing a floating-point square operation for the initial estimated value y₀; obtaining 0.5x by performing a floating-point multiplication operation for the single precision floating-point x; obtaining 0.5xy₀² by performing the floating-point multiplication operation for the 0.5x and y₀²; obtaining 1.5-0.5xy₀² by performing a floating-point subtraction operation for the 0.5xy₀² from 1.5; and obtaining an approximate value for y₁ by performing the floating-point multiplication operation by using the 1.5-0.5xy₀² and the initial estimated value y₀.
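
The sequence of operations recited above can be sketched step by step in Python as follows. This is a minimal illustration only: the value used for the constant R (`0x5F3759DF`, a widely known choice for this kind of approximation) is an assumption, since the action does not state R, and the function names are hypothetical. Python computes in double precision, so the floating-point steps only approximate single-precision hardware behavior.

```python
import struct

# Assumed value for the constant R; the office action does not give one.
MAGIC_R = 0x5F3759DF


def float_to_int_bits(x: float) -> int:
    """Integer form of x: its IEEE 754 single-precision bit pattern."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def int_bits_to_float(i: int) -> float:
    """Reinterpret a 32-bit pattern as a single-precision float."""
    return struct.unpack("<f", struct.pack("<I", i & 0xFFFFFFFF))[0]


def approx_inv_sqrt(x: float) -> float:
    """Approximate 1/sqrt(x) for x > 0 with one Newton-Raphson step."""
    i = float_to_int_bits(x)                  # integer form of x
    shifted = i >> 1                          # shift right by 1 bit
    y0 = int_bits_to_float(MAGIC_R - shifted) # initial estimate: R minus shifted value
    y0_sq = y0 * y0                           # floating-point square
    half_x = 0.5 * x                          # floating-point multiplication
    prod = half_x * y0_sq                     # floating-point multiplication
    factor = 1.5 - prod                       # floating-point subtraction
    return factor * y0                        # refined estimate
```

With this assumed constant, a single refinement step typically lands within a fraction of a percent of the exact value; repeating the last four operations with the refined estimate in place of the initial one would tighten the result further.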

The primary reason for indication of allowable subject matter is the specific combination of mathematical calculations and functions performed by the learning processor as in claim 1 and the specific combination of arithmetic operations performed on specific values as in claim 9.
Z. Ma et al., Privacy-Preserving Outsourced Speech Recognition for Smart IoT Devices, IEEE Internet of Things Journal, Vol 6, No 5, 21 May 2019 (hereinafter “Ma”) discloses an implementation of a neural network training process (abstract).  Ma further discloses methods to find a better approximation to the root of reciprocal function with respect to activation functions in the neural network using the Newton-Raphson method to determine the square root (Section V.C,D).  Ma further discloses a gradient descent algorithm (section IV.B.2).  Ma does not, however, explicitly disclose the combination of functions wherein the learning processor transforms a cumulative change function of a gradient for an error function into an inverse square root function in the Gradient Descent algorithm, and operates an inverse square root approximate value by using a Newton-Raphson method for the inverse square root function as in claim 1.  Ma further does not explicitly disclose the specific combination of arithmetic operations performed on specific values as in claim 9.
H. A. Moghaddam et al., Fast adaptive algorithms and networks for class-separability features, Pattern Recognition 36, p. 1695-1702, 2003 (hereinafter “Moghaddam”) discloses adaptive algorithms for a self-organizing neural network for adaptive computation of the square root of the inverse covariance matrix (abstract).  Moghaddam does not, however, explicitly disclose the combination of functions wherein the learning processor transforms a cumulative change function of a gradient for an error function into an inverse square root function in the Gradient Descent algorithm, and operates an inverse square root approximate value by using a Newton-Raphson method for the inverse square root function as in claim 1.  Moghaddam further does not explicitly disclose the specific combination of arithmetic operations performed on specific values as in claim 9.
N. Shazeer et al., Adafactor: Adaptive Learning Rates with Sublinear Memory Cost, Proceedings of the 35th International Conference on Machine Learning, PMLR 80, 2018 (hereinafter “Shazeer”) discloses a neural network training algorithm that reduces memory usage while retaining the benefits of adaptivity by maintaining a factored representation of squared gradient accumulations across training steps (abstract, introduction).  Shazeer does not, however, explicitly disclose the combination of functions wherein the learning processor transforms a cumulative change function of a gradient for an error function into an inverse square root function in the Gradient Descent algorithm, and operates an inverse square root approximate value by using a Newton-Raphson method for the inverse square root function as in claim 1.  Shazeer further does not explicitly disclose the specific combination of arithmetic operations performed on specific values as in claim 9.
Y. Lu et al., Block Mean Approximation for Efficient Second Order Optimization, arXiv:1804.05484v3 [cs.LG], 2018 (hereinafter “Lu”) discloses a matrix approximation method that allows for efficient computation of inverse square root as applied to the Newton AdaGrad method in training deep neural networks (abstract).  Lu does not, however, explicitly disclose the combination of functions wherein the learning processor transforms a cumulative change function of a gradient for an error function into an inverse square root function in the Gradient Descent algorithm, and operates an inverse square root approximate value by using a Newton-Raphson method for the inverse square root function as in claim 1.  Lu further does not explicitly disclose the specific combination of arithmetic operations performed on specific values as in claim 9.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to EMILY E LAROCQUE whose telephone number is (469)295-9289.  The examiner can normally be reached on 10:00am - 12:00pm, 2:00pm - 8:00pm ET, M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor Jyoti Mehta can be reached on 571-270-3995.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


/EMILY E LAROCQUE/Primary Examiner, Art Unit 2182