DETAILED ACTION

This action is made FINAL in response to the amendments filed on 11/04/2021.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1-3, 5-12, and 14-22 are rejected under 35 U.S.C. 103 as being unpatentable over Kingsbury (US 2014/0067738) in view of HEALEY et al (US 2018/0096241).
As to claim 1, Kingsbury teaches a system for training a neural network (paragraph [0003]...methods for training of deep neural network models to perform structured classification tasks), the system comprising: 
a memory (paragraph [0055]...memory 702);
a processor (paragraph [0055]...processor 701) coupled to the memory;
a block diagonal Hessian free optimizer (paragraph [0019]...Hessian free optimization) stored in the memory and implemented using the processor, the block diagonal Hessian free optimizer configured to:
divide the layers (paragraph [0047]...worker node 602, worker node 603) of a neural network (paragraph [0018]...deep neural network model) into a first block (paragraph [0047]...worker node 602) and a second block (paragraph [0047]...worker node 603) (paragraph [0024]...the master can partition the data into pieces of approximately equal size, and assign each part to a different worker. The master can implement the Hessian free optimization and coordinate the activity of the workers); 
generate a gradient from a gradient mini-batch included in training data (paragraph [0047]...determine the gradients and the curvature-vector products from the training data); 
generate a curvature-vector product from a curvature mini-batch included in the training data (paragraph [0047]...determine the gradients and the curvature-vector products from the training data);
generate a conjugate gradient from the gradient and the curvature-vector product (paragraph [0035]...case where the gradient is determined on only part of the training data, the workers can perform the sampling at block 302. The current search direction can be updated in block 304 by running one step of the conjugate gradient); 
determine, using the conjugate gradient, a change (paragraph [0035]...each search direction is tested in turn to determine if it reduces the loss on the held-out set) in at least one first parameter (paragraph [0035]...each search direction) of the first block;
determine, using the conjugate gradient, a change (paragraph [0035]...each search direction is tested in turn to determine if it reduces the loss on the held-out set) in at least one second parameter (paragraph [0035]...each search direction) of the second block; and
paragraph [0035]... a case where the gradient is determined on only part of the training data, the workers can perform the sampling at block 302. The current search direction can be updated in block 304 by running one step of the conjugate gradient. At block 305, if the update to the search direction reduces the quadratic approximation of the loss function by less than a target amount, for example 0.5%, the loop terminates. Note that blocks 304 and 305 together implement a truncated conjugate gradient search. The result of blocks 304 and 305 includes a set of search directions. At block 306, each search direction is tested in turn to determine if it reduces the loss on the held-out set. At block 307, if no search direction improves the held-out loss, that is, a loss determined for the held-out set, the search direction can be re-set to zero and the damping parameter increased at block 308).
Kingsbury fails to explicitly show/teach that the first block has first adjacent layers of the layers and the second block has second adjacent layers of the layers. 
However, HEALEY et al figure 4 shows and teaches the first block (410, 412, 414) has first adjacent layers of the layers and the second block (416, 418, 420) has second adjacent layers of the layers (paragraph [0087]...Communications grid computing system 400 (which can be referred to as a "communications grid") also includes one or more worker nodes. Shown in FIG. 4 are six worker nodes 410-420. Although FIG. 4 shows six worker nodes, a communications grid can include more or less than six worker nodes. The number of worker nodes included in a communications grid may be dependent upon how large the project or data set is being processed by the communications grid, the capacity of each worker node, the time designated for the communications grid to complete the project, among others. Each worker node within the communications grid computing system 400 may be connected (wired or wirelessly, and directly or indirectly) to control nodes 402-406. Each worker node may receive information from the control nodes (e.g., an instruction to perform work on a project) and may transmit information to the control nodes (e.g., a result from work performed on a project). Furthermore, worker nodes may communicate with each other directly or indirectly. For example, worker nodes may transmit data between each other related to a job being performed or an individual task within a job being performed by that worker node. In some examples, worker nodes may not be connected (communicatively or otherwise) to certain other worker nodes. For example, a worker node 410 may only be able to communicate with a particular control node 402. The worker node 410 may be unable to communicate with other worker nodes 412-420 in the communications grid, even if the other worker nodes 412-420 are controlled by the same control node 402) (Examiner’s Note: It would be obvious for someone to label a section of a layer as “a block” in order to make it easier to describe their invention. 
The examiner does not believe doing this makes the case novel in any way.)
Therefore, it would have been obvious for one having ordinary skill in the art, before the effective filing date of the claimed invention, for Kingsbury’s first block to have first adjacent layers of the layers and the second block to have second adjacent layers of the layers, as in HEALEY et al, for the purpose of making the neural network more adaptive.
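For illustration only, the limitation of blocks made of adjacent layers can be sketched as a split of an ordered layer list into contiguous runs. The helper name `partition_adjacent_layers` and the layer names are hypothetical; the sketch assumes the block sizes are known in advance and is not taken from Kingsbury or HEALEY et al.

```python
from typing import List, Sequence


def partition_adjacent_layers(layers: Sequence[str], block_sizes: Sequence[int]) -> List[List[str]]:
    """Split an ordered list of layers into consecutive blocks.

    Hypothetical helper: each block receives a contiguous run of adjacent
    layers, so no block skips over a layer owned by another block.
    """
    if sum(block_sizes) != len(layers):
        raise ValueError("block sizes must cover every layer exactly once")
    blocks = []
    start = 0
    for size in block_sizes:
        if size <= 0:
            raise ValueError("each block must contain at least one layer")
        blocks.append(list(layers[start:start + size]))
        start += size
    return blocks


# Hypothetical six-layer network split into two blocks of three adjacent layers.
layers = ["conv1", "conv2", "conv3", "fc1", "fc2", "fc3"]
first_block, second_block = partition_adjacent_layers(layers, [3, 3])
print(first_block)   # ['conv1', 'conv2', 'conv3']
print(second_block)  # ['fc1', 'fc2', 'fc3']
```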

As to claim 2, Kingsbury teaches a system, wherein the block diagonal Hessian free optimizer (paragraph [0019]...Hessian free optimization) is further configured to determine the change (paragraph [0035]...each search direction is tested in turn to determine if it reduces the loss on the held-out set) in the at least one first parameter (paragraph [0035]...each search direction) in parallel (paragraph [0009]...plurality of distributed worker computing devices configured to perform data-parallel computation of gradients and curvature matrix-vector products) with determining the change in the at least one second parameter (paragraph [0035]...each search direction).
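As a hedged illustration of why the per-block changes can be determined in parallel: once cross-block curvature terms are dropped, each block's subproblem G_bb d_b = -g_b depends only on that block's own gradient and curvature. The sketch assumes small dense per-block curvature matrices and uses `numpy.linalg.solve` as a stand-in for the conjugate gradient solver; the function names and numbers are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def solve_block(curvature_block: np.ndarray, gradient_block: np.ndarray) -> np.ndarray:
    """Return the step d_b solving G_bb d_b = -g_b for one block.

    A dense direct solve stands in for conjugate gradient only to keep
    the sketch short.
    """
    return np.linalg.solve(curvature_block, -gradient_block)


def parallel_block_steps(curvature_blocks, gradient_blocks):
    """Solve every block concurrently; no block reads another block's result."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(solve_block, curvature_blocks, gradient_blocks))


# Toy two-block problem with made-up numbers.
G1 = np.array([[2.0, 0.0], [0.0, 4.0]])
G2 = np.array([[1.0]])
g1 = np.array([2.0, 4.0])
g2 = np.array([3.0])
d1, d2 = parallel_block_steps([G1, G2], [g1, g2])
print(d1, d2)  # [-1. -1.] [-3.]
```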


As to claim 3, Kingsbury teaches a system, wherein a size of a first block (paragraph [0047]...worker node 602) and a size (paragraph [0047]...the master node 601 can partition the data into pieces of approximately equal size, and assign each part to a different worker node, e.g., 602 or 603) of a second block (paragraph [0047]...worker node 603) depend on characteristics of the neural network (paragraph [0018]...deep neural network model).

As to claim 5, Kingsbury teaches a system, further comprising: generate a block diagonal Hessian matrix (paragraph [0042]...a Gauss-Newton approximation can be used to determine an approximation of the Hessian matrix) of curvature-vector products (paragraph [0047]...determine the gradients and the curvature-vector products from the training data) that correspond to parameter pairs from the at least one first parameter of the first block and the at least one second parameter of the second block; and 
set curvature-vector products to zero values (paragraph [0035]...if no search direction improves the held-out loss, that is, a loss determined for the held-out set, the search direction can be re-set to zero and the damping parameter increased at block 308) for one or more terms that correspond to a pair of parameters that include a parameter from the at least one first parameter (paragraph [0035]...each search direction) and a parameter from the at least one second parameter (paragraph [0035]...each search direction).
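A minimal sketch of the block diagonal structure recited in claim 5, assuming a small dense curvature matrix purely so that the zeroed cross-block terms are visible (a Hessian free optimizer would not form this matrix explicitly). The helper names and matrix entries are hypothetical.

```python
import numpy as np


def block_diagonal_mask(block_sizes):
    """Boolean mask that is True only where both parameters share a block."""
    labels = np.repeat(np.arange(len(block_sizes)), block_sizes)
    return labels[:, None] == labels[None, :]


def block_diagonalize(curvature: np.ndarray, block_sizes) -> np.ndarray:
    """Zero every curvature term that pairs parameters from different blocks."""
    mask = block_diagonal_mask(block_sizes)
    if mask.shape != curvature.shape:
        raise ValueError("block sizes must match the curvature matrix dimension")
    return np.where(mask, curvature, 0.0)


# Made-up 3x3 curvature matrix: parameters 0 and 1 form the first block,
# parameter 2 forms the second block.
H = np.array([[4.0, 1.0, 0.5],
              [1.0, 3.0, 0.2],
              [0.5, 0.2, 2.0]])
M = block_diagonalize(H, [2, 1])
# Cross-block entries (0, 2), (1, 2), (2, 0), (2, 1) are now zero;
# within-block entries are unchanged.
print(M)
```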

As to claim 6, Kingsbury teaches a system, wherein the block diagonal Hessian free optimizer (paragraph [0019]...Hessian free optimization) is further configured to determine the change in at least one first parameter (paragraph [0035]...each search direction) until the change is below a conjugate gradient stop criterion (paragraph [0035]...if a best search direction among the set of search directions is used to update the network parameters, a check can be made for a stopping criterion at block 309, for example, based on the magnitude of the gradient. If the stopping criterion is not met at block 309, then the damping parameter can be adjusted at block 310).

As to claim 7, Kingsbury teaches a system, wherein the block diagonal Hessian free optimizer (paragraph [0019]...Hessian free optimization) is further configured to determine the change (paragraph [0035]...each search direction is tested in turn to determine if it reduces the loss on the held-out set) in at least one first parameter until the block diagonal Hessian free optimizer (paragraph [0019]...Hessian free optimization) performs a maximum (paragraph [0018]...the loss function can be based on an expected Hidden Markov Model state error rate, maximum mutual information, boosted maximum mutual information, and the like) number of conjugate gradient iterations.
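The two stopping conditions of claims 6 and 7 can be illustrated with a truncated conjugate gradient loop that accesses curvature only through curvature-vector products. This sketch assumes the stop criterion is a residual-norm tolerance; Kingsbury's criterion (a reduction of the quadratic approximation below a target amount, e.g. 0.5%) or another measure could be substituted. The function name and default values are hypothetical.

```python
import numpy as np


def truncated_cg(curvature_vector_product, gradient, tol=1e-6, max_iters=50):
    """Approximately solve G d = -g using only curvature-vector products G @ v.

    Stops when the residual norm drops below `tol` (stop criterion, cf. claim 6)
    or after `max_iters` iterations (iteration cap, cf. claim 7).
    Returns the step d and the number of iterations actually run.
    """
    d = np.zeros_like(gradient, dtype=float)
    r = -gradient - curvature_vector_product(d)  # residual of G d = -g at d = 0
    if np.linalg.norm(r) < tol:
        return d, 0  # gradient already (near) zero: no step needed
    p = r.copy()
    rs_old = r @ r
    iters = 0
    for iters in range(1, max_iters + 1):
        Gp = curvature_vector_product(p)
        alpha = rs_old / (p @ Gp)  # assumes G is positive definite (e.g. damped Gauss-Newton)
        d = d + alpha * p
        r = r - alpha * Gp
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return d, iters


# Hypothetical 2x2 curvature matrix, accessed only through products.
G = np.array([[4.0, 1.0], [1.0, 3.0]])
g = np.array([1.0, 2.0])
d, n = truncated_cg(lambda v: G @ v, g)
print(np.allclose(G @ d, -g), n)  # True 2
```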

As to claim 8, Kingsbury teaches a system, wherein the curvature mini-batch is smaller than the gradient mini-batch (paragraph [0042]...the samples may be about 1% of the training data, such that matrix-vector products of curvature are determined across discrete batches of the training data).
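A hedged sketch of drawing a curvature mini-batch that is smaller than the gradient mini-batch, using the roughly 1% curvature sample mentioned in Kingsbury paragraph [0042] as the example ratio. The 10% gradient fraction, the function name, and the choice to draw the curvature sample from within the gradient sample are assumptions for illustration.

```python
import numpy as np


def sample_minibatches(num_examples, gradient_fraction, curvature_fraction, seed=0):
    """Return index arrays for a gradient mini-batch and a smaller curvature mini-batch."""
    rng = np.random.default_rng(seed)
    grad_size = max(1, int(num_examples * gradient_fraction))
    curv_size = max(1, int(num_examples * curvature_fraction))
    if curv_size >= grad_size:
        raise ValueError("curvature mini-batch must be smaller than gradient mini-batch")
    grad_idx = rng.choice(num_examples, size=grad_size, replace=False)
    # Assumption: the curvature sample is drawn from within the gradient sample.
    curv_idx = rng.choice(grad_idx, size=curv_size, replace=False)
    return grad_idx, curv_idx


grad_idx, curv_idx = sample_minibatches(100_000, gradient_fraction=0.10, curvature_fraction=0.01)
print(len(grad_idx), len(curv_idx))  # 10000 1000
```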

As to claim 9, Kingsbury discloses the claimed invention except for “the block diagonal Hessian free optimizer is further configured to determine the change in the at least one first parameter by solving: 
[equation image: media_image1.png]
and wherein b is the first block, l is a loss function, and G is a block diagonal Hessian matrix, and 
[expression image: media_image2.png]
is the change in the at least one first parameter.”

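It would have been obvious to one having ordinary skill in the art, before the effective filing date of the claimed invention, to configure Kingsbury’s block diagonal Hessian free optimizer to determine the change in the at least one first parameter by solving:
[equation image: media_image1.png]
and wherein b is the first block, l is a loss function, and G is a block diagonal Hessian matrix, and 
[expression image: media_image2.png]
is the change in the at least one first parameter, since it has been held that discovering an optimum value of a result effective variable involves only routine skill in the art. In re Boesch, 617 F.2d 272, 205 USPQ 215 (CCPA 1980).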

Claim 10 has similar limitations as claim 1. Therefore, the claim is rejected for the same reasons as above. 

Claim 11 has similar limitations as claim 2. Therefore, the claim is rejected for the same reasons as above. 

Claim 12 has similar limitations as claim 3. Therefore, the claim is rejected for the same reasons as above. 

Claim 14 has similar limitations as claim 5. Therefore, the claim is rejected for the same reasons as above. 

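Claim 15 has similar limitations as claim 6. Therefore, the claim is rejected for the same reasons as above. 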

Claim 16 has similar limitations as claim 7. Therefore, the claim is rejected for the same reasons as above. 

Claim 17 has similar limitations as claim 8. Therefore, the claim is rejected for the same reasons as above. 

Claim 18 has similar limitations as claim 9. Therefore, the claim is rejected for the same reasons as above. 

Claim 19 has similar limitations as claim 1. Therefore, the claim is rejected for the same reasons as above. 

Claim 20 has similar limitations as claim 9. Therefore, the claim is rejected for the same reasons as above. 

As to claim 21, modified Kingsbury teaches the system, wherein the block diagonal Hessian free optimizer (paragraph [0019]...Hessian free optimization) is further configured to identify a type of the neural network (paragraph [0019]...an exemplary Hessian free optimization having a structured loss function can be applied to machine learning tasks in general where multiple, local decisions can be combined to make a larger-scale decision (e.g., a structured classification), and where local decisions can be made using a deep neural network, for example, in the field of speech recognition. Other exemplary practical applications can include handwriting recognition and video processing); and
identify the first adjacent layers for the first block (410, 412, 414 of HEALEY et al figure 4) and the second adjacent layers for the second block (416, 418, 420 of HEALEY et al figure 4) based on the type of the neural network (paragraph [0087]...Communications grid computing system 400 (which can be referred to as a "communications grid") also includes one or more worker nodes. Shown in FIG. 4 are six worker nodes 410-420. Although FIG. 4 shows six worker nodes, a communications grid can include more or less than six worker nodes. The number of worker nodes included in a communications grid may be dependent upon how large the project or data set is being processed by the communications grid, the capacity of each worker node, the time designated for the communications grid to complete the project, among others. Each worker node within the communications grid computing system 400 may be connected (wired or wirelessly, and directly or indirectly) to control nodes 402-406. Each worker node may receive information from the control nodes (e.g., an instruction to perform work on a project) and may transmit information to the control nodes (e.g., a result from work performed on a project). Furthermore, worker nodes may communicate with each other directly or indirectly. For example, worker nodes may transmit data between each other related to a job being performed or an individual task within a job being performed by that worker node. In some examples, worker nodes may not be connected (communicatively or otherwise) to certain other worker nodes. For example, a worker node 410 may only be able to communicate with a particular control node 402. The worker node 410 may be unable to communicate with other worker nodes 412-420 in the communications grid, even if the other worker nodes 412-420 are controlled by the same control node 402) (Examiner’s Note: It would be obvious for someone to label a section of a layer as “a block” in order to make it easier to describe their invention. The examiner does not believe doing this makes the case novel in any way.)


paragraph [0019]...Hessian free optimization) of curvature-vector products (paragraph [0047]...determine the gradients and the curvature-vector products from the training data), the curvature-vector products correspond to parameter pairs from the at least one first parameter (paragraph [0035]...each search direction) of the first block and the at least one second parameter (paragraph [0035]...each search direction) of the second block; and setting curvature-vector products from parameters of different blocks to zero (paragraph [0035]...if no search direction improves the held-out loss, that is, a loss determined for the held-out set, the search direction can be re-set to zero and the damping parameter increased at block 308) for

Response to Arguments
Applicant's arguments filed 11/04/2021 have been fully considered but they are not persuasive. 
Kingsbury fails to explicitly show/teach that the first block has first adjacent layers of the layers and the second block has second adjacent layers of the layers. 
However, HEALEY et al figure 4 shows and teaches the first block (410, 412, 414) has first adjacent layers of the layers and the second block (416, 418, 420) has second adjacent layers of the layers (paragraph [0087]...Communications grid computing system 400 (which can be referred to as a "communications grid") also includes one or more worker nodes. Shown in FIG. 4 are six worker nodes 410-420. Although FIG. 4 shows six worker nodes, a communications grid can include more or less than six worker nodes. The number of worker nodes included in a communications grid may be dependent upon how large the project or data set is being processed by the communications grid, the capacity of each worker node, the time designated for the communications grid to complete the project, among others. Each worker node within the communications grid computing system 400 may be connected (wired or wirelessly, and directly or indirectly) to control nodes 402-406. Each worker node may receive information from the control nodes (e.g., an instruction to perform work on a project) and may transmit information to the control nodes (e.g., a result from work performed on a project). Furthermore, worker nodes may communicate with each other directly or indirectly. For example, worker nodes may transmit data between each other related to a job being performed or an individual task within a job being performed by that worker node. In some examples, worker nodes may not be connected (communicatively or otherwise) to certain other worker nodes. For example, a worker node 410 may only be able to communicate with a particular control node 402. The worker node 410 may be unable to communicate with other worker nodes 412-420 in the communications grid, even if the other worker nodes 412-420 are controlled by the same control node 402) (Examiner’s Note: It would be obvious for someone to label a section of a layer as “a block” in order to make it easier to describe their invention. 
The examiner does not believe doing this makes the case novel in any way.)
Therefore, Kingsbury in view of HEALEY et al clearly shows all the limitations as claimed. The examiner believes the rejection over Kingsbury in view of HEALEY et al could easily be overcome if the applicant would go into more detail about the first and second parameters and how each block is defined. 


Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
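A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.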
Any inquiry concerning this communication or earlier communications from the examiner should be directed to BRANDON S COLE whose telephone number is (571)270-5075. The examiner can normally be reached Mon - Fri 7:30am - 5pm EST (alternate Fridays off).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Omar Fernandez can be reached on 571-272-2589. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 
/BRANDON S COLE/Primary Examiner, Art Unit 2128