DETAILED ACTION

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1-8, 10-17, and 19 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Kingsbury (US 2014/0067738).
As to claim 1, Kingsbury teaches a system for training a neural network (paragraph [0003]...methods for training of deep neural network models to perform structured classification tasks), the system comprising: 
a memory (paragraph [0055]...memory 702);
a processor (paragraph [0055]...processor 701) coupled to the memory;
a block diagonal Hessian free optimizer (paragraph [0019]...Hessian free optimization) stored in the memory and implemented using the processor, the block diagonal Hessian free optimizer configured to:
divide the neural network (paragraph [0018]...deep neural network model) into a first block (paragraph [0047]...worker node 602) and a second block (paragraph [0047]...worker node 603) (paragraph [0024]...the master can partition the data into pieces of approximately equal size, and assign each part to a different worker. The master can implement the Hessian free optimization and coordinate the activity of the workers); 
generate a gradient from a gradient mini-batch included in training data (paragraph [0047]...determine the gradients and the curvature-vector products from the training data); 
generate a curvature-vector product from a curvature mini-batch included in the training data (paragraph [0047]...determine the gradients and the curvature-vector products from the training data);
generate a conjugate gradient from the gradient and the curvature-vector product (paragraph [0035]...case where the gradient is determined on only part of the training data, the workers can perform the sampling at block 302. The current search direction can be updated in block 304 by running one step of the conjugate gradient); 
determine, using the conjugate gradient a change (paragraph [0035]...each search direction is tested in turn to determine if it reduces the loss on the held-out set) in at least one first parameter (paragraph [0035]...each search direction) of the first block;
determine, using the conjugate gradient a change (paragraph [0035]...each search direction is tested in turn to determine if it reduces the loss on the held-out set) in at least one second parameter (paragraph [0035]...each search direction) of the second block; and
determine the at least one first parameter in the first block using the at least one first parameter and the change in the at least one first parameter, and the at least one second parameter in the second block using the at least one second parameter and the change in the at least one second parameter (paragraph [0035]... a case where the gradient is determined on only part of the training data, the workers can perform the sampling at block 302. The current search direction can be updated in block 304 by running one step of the conjugate gradient. At block 305, if the update to the search direction reduces the quadratic approximation of the loss function by less than a target amount, for example 0.5%, the loop terminates. Note that blocks 304 and 305 together implement a truncated conjugate gradient search. The result of blocks 304 and 305 includes a set of search directions. At block 306, each search direction is tested in turn to determine if it reduces the loss on the held-out set. At block 307, if no search direction improves the held-out loss, that is, a loss determined for the held-out set, the search direction can be re-set to zero and the damping parameter increased at block 308).

As to claim 2, Kingsbury teaches a system, wherein the block diagonal Hessian free optimizer (paragraph [0019]...Hessian free optimization) is further configured to determine the change (paragraph [0035]...each search direction is tested in turn to determine if it reduces the loss on the held-out set) in the at least one first parameter (paragraph [0035]...each search direction) in parallel (paragraph [0009]...plurality of distributed worker computing devices configured to perform data-parallel computation of gradients and curvature matrix-vector products) with determining the change in the at least one second parameter (paragraph [0035]...each search direction).

As to claim 3, Kingsbury teaches a system, wherein a size of the first block (paragraph [0047]...worker node 602) and a size (paragraph [0047]...the master node 601 can partition the data into pieces of approximately equal size, and assign each part to a different worker node, e.g., 602 or 603) of a second block (paragraph [0047]...worker node 603) depend on characteristics of the neural network (paragraph [0018]...deep neural network model).

As to claim 4, Kingsbury teaches a system, wherein a block diagonal Hessian matrix (paragraph [0042]...a Gauss-Newton approximation can be used to determine an approximation of the Hessian matrix) in the curvature-vector product (paragraph [0047]...determine the gradients and the curvature-vector products from the training data) includes non-zero values (paragraph [0035]...if no search direction improves the held-out loss, that is, a loss determined for the held-out set, the search direction can be re-set to zero and the damping parameter increased at block 308) for one ... (paragraph [0035]...each search direction).

As to claim 5, Kingsbury teaches a system, wherein a block diagonal Hessian matrix (paragraph [0042]...a Gauss-Newton approximation can be used to determine an approximation of the Hessian matrix) in the curvature-vector product (paragraph [0047]...determine the gradients and the curvature-vector products from the training data) includes zero values (paragraph [0035]...if no search direction improves the held-out loss, that is, a loss determined for the held-out set, the search direction can be re-set to zero and the damping parameter increased at block 308) for one or more terms that correspond to a pair of parameters that include the at least one first parameter (paragraph [0035]...each search direction) and the at least one second parameter (paragraph [0035]...each search direction).

As to claim 6, Kingsbury teaches a system, wherein the block diagonal Hessian free optimizer (paragraph [0019]...Hessian free optimization) is further configured to determine the change in at least one first parameter (paragraph [0035]...each search direction) until the change is below a conjugate gradient stop criterion (paragraph [0035]...if a best search direction among the set of search directions is used to update the network parameters, a check can be made for a stopping criterion at block 309, for example, based on the magnitude of the gradient. If the stopping criterion is not met at block 309, then the damping parameter can be adjusted at block 310).

As to claim 7, Kingsbury teaches a system, wherein the block diagonal Hessian free optimizer (paragraph [0019]...Hessian free optimization) is further configured to determine the change (paragraph [0035]...each search direction is tested in turn to determine if it reduces the loss on the held-out set) in at least one first parameter until the block diagonal Hessian free optimizer (paragraph [0019]...Hessian free optimization) performs a maximum (paragraph [0018]...the loss function can be based on an expected Hidden Markov Model state error rate, maximum mutual information, boosted maximum mutual information, and the like) number of conjugate gradient iterations.

As to claim 8, Kingsbury teaches a system, wherein the curvature mini-batch is smaller than the gradient mini-batch (paragraph [0042]...the samples may be about 1% of the training data, such that matrix-vector products of curvature are determined across discrete batches of the training data).

Claim 10 has similar limitations as claim 1. Therefore, the claim is rejected for the same reasons as above. 

Claim 11 has similar limitations as claim 2. Therefore, the claim is rejected for the same reasons as above. 

Claim 12 has similar limitations as claim 3. Therefore, the claim is rejected for the same reasons as above. 

Claim 13 has similar limitations as claim 4. Therefore, the claim is rejected for the same reasons as above. 
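
Claim 14 has similar limitations as claim 5. Therefore, the claim is rejected for the same reasons as above. 
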
Claim 15 has similar limitations as claim 6. Therefore, the claim is rejected for the same reasons as above. 

Claim 16 has similar limitations as claim 7. Therefore, the claim is rejected for the same reasons as above. 

Claim 17 has similar limitations as claim 8. Therefore, the claim is rejected for the same reasons as above. 

Claim 18 has similar limitations as claim 9 and is therefore addressed in the rejection under 35 U.S.C. 103 below.



Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 9, 18, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Kingsbury (US 2014/0067738).
Kingsbury discloses the claimed invention except for “the block diagonal Hessian free optimizer is further configured to determine the change in the at least one first parameter by solving: 
    [equation image: media_image1.png]
and wherein b is the first block, ℓ is a loss function, and G is a block diagonal Hessian matrix, and 
    [equation image: media_image2.png]
is the change in the at least one first parameter.”
It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to configure the block diagonal Hessian free optimizer to determine the change in the at least one first parameter by solving: 
    [equation image: media_image1.png]
and wherein b is the first block, ℓ is a loss function, and G is a block diagonal Hessian matrix, and 
    [equation image: media_image2.png]
is the change in the at least one first parameter, since it has been held that discovering an optimum value of a result-effective variable involves only routine skill in the art. In re Boesch, 617 F.2d 272, 205 USPQ 215 (CCPA 1980).



Claims 18 and 20 have similar limitations as claim 9. Therefore, the claims are rejected for the same reasons as above. 


Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to BRANDON S COLE whose telephone number is (571)270-5075.  The examiner can normally be reached on Mon - Fri 7:30am - 5pm EST (alternate Fridays off).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki, can be reached on 571-272-3719.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  

/BRANDON S COLE/Primary Examiner, Art Unit 2122