DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 9/8/2022 has been entered.
 
Response to Amendments
Acknowledgement is made of Applicant's claim amendments on 9/8/2022. The claim amendments are entered. Presently, claims 1, 4-39, and 42-61 remain pending. Claims 2, 3, 40, and 41 have previously been cancelled. Claims 12, 45, 55, and 61 have been amended.
  
Response to Arguments
Applicant's arguments filed on 9/8/2022 have been fully considered and are addressed below.

Applicant argues that Baker ‘960 allegedly does not teach the c2 step (Applicant’s reply pgs. 15-16). This argument has been considered and is moot because this reference is no longer being used to teach this limitation. 

Applicant argues that Choi allegedly does not teach the newly amended claim limitations because it allegedly does not teach the “target training datum” and argues that Choi allegedly does not teach the amended step c2 (Applicant’s reply pgs. 16-17). These arguments are not persuasive. First, the target training datum is described in the citations, and the increase in performance of the neural network is in correlation with the target training datum as described in the various citations. For instance, one citation teaches a target training datum such as a node with a largest load being selected for improvement, and other citations provide other such examples. Second, Choi is not being used to teach the amended step c2, so the argument is moot. Thus, these arguments are not persuasive.

Applicant argues that the citations allegedly do not teach claims 6 and 42 regarding the node selection and classification error (Applicant’s reply pgs. 17-18). This argument is not persuasive. The citations describe an error determination of the neural network and its nodes, wherein the error determination can be for a classification task. As such, when errors are present, modifications to the neural network and its nodes are made to minimize such errors. That is, the nodes in the neural network are selected and optimized to minimize this error. As stated in the citations, the nodes are selected for various reasons and metrics which relate to the performance of the nodes, wherein their performance affects the performance of the neural network, leading to correct or erroneous classification results. Upon the determination of erroneous results, the nodes are selected and optimized to improve the performance. Thus, the nodes are being selected when there are classification errors. The use of metrics such as activation and entropy as part of the selection does not negate the fact that selection and optimization of the nodes are occurring due to the classification errors as well. Thus, this argument is not persuasive. 

Applicant argues that Choi allegedly does not teach claim 9 because it allegedly does not teach a threshold comparison of activation values (Applicant’s reply pgs. 18-19). This argument is not persuasive. The citations clearly show a comparison of activation values with a threshold and selecting a node based on a comparison of an activation value with the threshold value, with various examples of such selection. Thus, this argument is not persuasive. 

Applicant argues that Baker ‘960 allegedly does not teach the limitations of claims 10 and 43 and allegedly is not combinable with Choi. Applicant also cites a triple node set in Baker ‘960 that acts to replace output nodes [0146] (Applicant’s reply pg. 19). These arguments are not persuasive. The triple node is not being used to teach the claim limitations. The citations in Baker ‘960 are used simply to teach the detector node and detecting data using the detector node. Baker ‘960 is combinable with Choi because both are in the same field of endeavor teaching neural network systems with various components and nodes that can be used for analyzing data for machine learning, wherein Choi can be modified to include a detector node to detect various data that can improve its detection of data, and consequently its data analysis for machine learning. Thus, these arguments are not persuasive.

Applicant argues that Baker ‘542 allegedly does not teach the limitations of claims 11 and 44 because the ensemble neural network and joint optimization neural network allegedly do not require a discrimination node, and that Baker ‘542 allegedly is not combinable with Choi (Applicant’s reply pgs. 19-20). These arguments are not persuasive. The citations in Baker ‘542 are used simply to teach the discriminator node for discriminating between various data. The discrimination nodes are not being used to enable an ensemble network or a joint optimization network. Baker ‘542 is combinable with Choi because both are in the same field of endeavor teaching neural network systems with various components and nodes that can be used for analyzing data for machine learning, wherein Choi can be modified to include a discriminator node to distinguish between various data that can improve its data analysis for machine learning. Thus, these arguments are not persuasive.

Applicant argues that Choi allegedly does not teach claims 12 and 45 because it allegedly does not teach the error-prediction node and error-correction node (Applicant’s reply pg. 20). This argument is not persuasive. Choi is not being used to teach the error-prediction node. Choi is being used to teach the error-correction node and the classification error. Furthermore, the error-correcting nodes are identified as shown in the citations. Thus, this argument is not persuasive.

Applicant argues that Choi allegedly does not teach claims 14 and 46 because it allegedly does not teach the regularization link, in that the link taught in Choi includes a weight and allegedly does not consider a loss function as part of back-propagation (Applicant’s reply pg. 20). Applicant reiterates this argument for claims 26, 50, and 35 and also states that Choi allegedly does not use the term “regularization” and therefore cannot teach the claim limitations (Applicant’s reply pgs. 21-22). These arguments are not persuasive. The links as shown in Figs. 7-9 connect the various nodes together and enable operations aside from acting merely as weight connections, i.e., they also enable back-propagation with an error/loss function, as part of regulating the nodes and their performance. The fact that the weights can be part of the link does not negate the fact that the link also serves to enable back-propagation with an error/loss function. Additionally, the link in Choi does consider back-propagation as shown in the citations. Moreover, Choi is not required to use the exact same term “regularization” as Applicant in order to teach the claim limitations. The fact that Choi teaches the concepts being recited in the claim limitations is sufficient. Thus, these arguments are not persuasive.

Applicant argues that Baker ‘960 allegedly does not teach claims 17 and 49 because it allegedly does not teach the node selection (Applicant’s reply pg. 21). This argument is not persuasive. The citations teach a selection of nodes. The various citations are used to teach the various aspects as described in the mapping. The citation at [0459] was used to teach the objective function in relation to an error cost function. Thus, this argument is not persuasive.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1, 4-9, 14, 15, 25-28, 35-39, 42, 46, 47, 50, 51, 53-58, 60, and 61 are rejected under 35 U.S.C. 103 as being unpatentable over Choi et al. (U.S. Pat. App. Pre-Grant Pub. No. 2016/0155049, hereinafter Choi) in view of Baker (WIPO No. WO 2018/226527, hereinafter Baker) and Hackbarth et al., “Modular Connectionist Structure for 100-Word Recognition” (hereinafter Hackbarth). 

Regarding claim 1, Choi teaches:
A method of training a neural network, the method comprising, by a programmed computer system ([0082] and [0086]-[0088]: describing training of a neural network (NN).): 
(a) training, at least partially, a base neural network on a first set of training data ([0082]-[0084] and [0129]: describing training of NN using training data.), 
wherein training the base neural network comprises computing for each datum in the first set of training data, activation values for nodes in the base neural network ([0074] and [0099]-[0102]: describing generation of activation data for the various nodes in the NN.) and 
…, 
wherein the base neural network comprises an input layer, an output layer, and one or more inner layers between the input and output layers ([0076]-[0081]: describing that the NN has input layer, hidden layers, and output layer.); 
(b) after step (a) and based on the training, selecting, based on specified criteria, a target node of the base neural network for targeted improvement ([0087]-[0089], [0100], and [0103]: describing the selection of a target node for improvement, e.g. extension, “based on a variety of information”, i.e. criteria.); and 
(c) after step (b), adding a target-specific improvement network sub-component to the base network to form an expanded neural network, wherein the target-specific improvement network sub-component comprises one or more nodes ([0090]-[0091], [0135], and [0138]: describing adding in an additional/new node to expand the NN.) and 
wherein the target-specific improvement network sub-component, when added to the base neural network, improves performance of the base network ([0085], [0126], and [0132]: describing improved performance of the NN based on its extension by adding in additional node(s).), 
wherein adding the target-specific improvement network sub-component comprises, by the programmed computer system: (c1) selecting the target-specific improvement network sub-component ([0087]-[0090], [0100], and [0103]: describing the selection of the additional node.); 
…; and 
(c3) after step (c2), merging the target-specific improvement network sub-component with the base neural network to form the expanded neural network ([0092]-[0093]: describing connecting the new node to the NN to create an extended NN.).

While the cited reference teaches the above limitations of claim 1, it does not explicitly teach: “estimates of partial derivatives of an objective function for the base neural network for the nodes in the base neural network” on lines 5-7. Baker discloses the claim limitations, teaching: a determination of the partial derivatives of an objective function, e.g. an error cost function, for each node in a base NN (Baker [0022]-[0024]).
Thus, it would have been obvious to a Person Having Ordinary Skill in the Art (PHOSITA) before the effective filing date (EFD) to modify the method of training the NN in the cited reference to include the partial derivatives computation for the base NN in Baker. Doing so would enable a technique to “improve the performance of a network that has converged such that the gradient of the network and all the partial derivatives are zero…. The present system and method can create a new network by splitting the candidate nodes or arcs that diverge from zero and then trains the resulting network with each selected node trained on the corresponding cluster of the data.” (Baker Abstract).
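For illustration only, and not as a characterization of Baker or of the claimed method, the following minimal sketch shows one way per-datum partial derivatives of an objective function with respect to node activations could be computed. The one-hidden-layer network, the squared-error objective, and all names and values (`W_IN`, `W_OUT`, `DATA`, `node_partials`) are hypothetical assumptions.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(x, w_in, w_out):
    """Return the hidden activations and scalar output of a one-hidden-layer net."""
    hidden = [sigmoid(sum(wi * xi for wi, xi in zip(row, x))) for row in w_in]
    output = sum(wo * h for wo, h in zip(w_out, hidden))
    return hidden, output

def node_partials(x, target, w_in, w_out):
    """Partial derivatives of J = 0.5 * (output - target)^2 with respect to
    each hidden node's activation, for a single training datum."""
    hidden, output = forward(x, w_in, w_out)
    err = output - target
    # The output is linear in the hidden activations, so dJ/dh_j = err * w_out[j].
    return hidden, [err * wo for wo in w_out]

# Hypothetical weights and training data (assumptions, not from any reference).
W_IN = [[0.5, -0.3], [0.8, 0.1]]
W_OUT = [1.2, -0.7]
DATA = [([1.0, 0.0], 1.0), ([0.0, 1.0], 0.0)]

# One (activations, partial derivatives) pair per training datum.
per_datum = [node_partials(x, t, W_IN, W_OUT) for x, t in DATA]
```

In this sketch the activation values and the partial derivatives are both stored per datum, which is the kind of per-node information a later node-selection step could consult.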

While the cited references in combination teach the above limitations of claim 1, they do not explicitly teach: “(c2) after step (c1), training the target-specific improvement network sub-component separately from the base network” on lines 19-20. Hackbarth teaches: a process for training neural networks, wherein subnets of hidden node units are all trained separately outside of the neural network prior to being inserted and assembled into the neural network (Hackbarth Section 3). Wherein each of the subnets of the hidden node units denotes a target-specific network sub-component since each is a target-specific sub-component that is being targeted for improvement.
Thus, it would have been obvious to a Person Having Ordinary Skill in the Art (PHOSITA) before the effective filing date (EFD) to modify the method of training the NN in the combined cited references to include the separate training in Hackbarth. Doing so would enable improved accuracy and performance of the neural network in performing its tasks (Hackbarth Section 5).
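For illustration only, the ordering of steps (c2) and (c3) can be sketched as follows. This is not Hackbarth's procedure or the claimed method. It assumes a hypothetical fixed base predictor `base_net`, a single linear node as the sub-component, and fitting to the base network's residual errors as a stand-in for "separate" training.

```python
def base_net(x):
    """Stand-in for an already trained base network (hypothetical fixed weight)."""
    return 0.5 * x

def train_subcomponent(data, epochs=500, lr=0.1):
    """Train a single linear node on the base network's residuals.
    The base network is only evaluated here; its weight is never updated."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, t in data:
            residual = t - base_net(x)        # what the base network gets wrong
            err = (w * x + b) - residual      # sub-component's own error
            w -= lr * err * x                 # gradient step on the sub-component only
            b -= lr * err
    return w, b

def merge(w, b):
    """Form the expanded network by adding the trained sub-component's output."""
    return lambda x: base_net(x) + (w * x + b)

def mse(net, data):
    return sum((net(x) - t) ** 2 for x, t in data) / len(data)

# Hypothetical training data where target = x + 0.1.
DATA = [(0.0, 0.1), (0.5, 0.6), (1.0, 1.1)]

w, b = train_subcomponent(DATA)   # (c2): train separately from the base network
expanded = merge(w, b)            # (c3): merge after the separate training
```

Comparing `mse(base_net, DATA)` with `mse(expanded, DATA)` shows whether the merged network performs better on this toy data than the base network alone.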

Regarding claim 4, the rejection of claim 1 is incorporated. Choi teaches:
The method of claim 1, further comprising, by the programmed computer system, after step (c3), training the expanded neural network ([0087] and [0096]: describing training of the extended neural network.).


Regarding claim 5, the rejection of claim 1 is incorporated. Choi teaches:
The method of claim 1, further comprising, by the programmed computer system, after step (c), training the expanded neural network ([0087] and [0096]: describing training of the extended neural network.).

Regarding claim 6, the rejection of claim 1 is incorporated. Choi teaches:
The method of claim 1, wherein the specified criteria for selecting the target node comprises selecting the target node upon a determination that the target node made a classification error for a first datum in the first set of training data ([0083]-[0084], [0124], and [0126]: describing an error determination between an actual output value and an expected value for training in correlation with the nodes in the NN for extension of the NN, wherein such error can relate to a classification.).

Regarding claim 7, the rejection of claim 6 is incorporated. Choi teaches:
The method of claim 6, wherein the target node is an output node of the base neural network ([0080]: describing the nodes in NN that can be targeted for extension, e.g. output of the hidden layer nodes.).

Regarding claim 8, the rejection of claim 6 is incorporated. Choi teaches:
	The method of claim 6, wherein the target node is on a first inner layer of the base neural network ([0080] and [0090]: describing the nodes in NN that can be targeted for extension, e.g. hidden layer nodes. Wherein a hidden layer node can denote an inner layer of the NN.).

Regarding claim 9, the rejection of claim 8 is incorporated. Choi teaches:
	The method of claim 8, wherein the specified criteria for selecting the target node comprises a comparison of the activation value for the target node to a threshold value ([0103]-[0106]: describing comparison of the activation value in correlation with a predetermined threshold to determine the target node to select for extension.).
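For illustration only, a threshold comparison of activation values for node selection can be sketched as follows. The aggregation by mean over the training data, the direction of the comparison (greater than), and all names and values are hypothetical assumptions; neither claim 9 nor the cited paragraphs of Choi is represented as requiring this particular form.

```python
def select_target_nodes(activations, threshold):
    """Return indices of nodes whose mean activation over the data exceeds the threshold.

    activations: list of rows, one row per training datum, one column per node.
    """
    n_nodes = len(activations[0])
    means = [
        sum(row[j] for row in activations) / len(activations)
        for j in range(n_nodes)
    ]
    return [j for j, m in enumerate(means) if m > threshold]

# Hypothetical activation values: two data, three nodes.
ACTIVATIONS = [[0.1, 0.9, 0.4],
               [0.2, 0.8, 0.6]]

selected = select_target_nodes(ACTIVATIONS, threshold=0.7)
```

Here only the second node (index 1) has a mean activation above the assumed threshold of 0.7, so it is the only node selected.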

Regarding claim 14, the rejection of claim 1 is incorporated. Choi teaches:
	The method of claim 1, wherein the target-specific improvement network sub-component comprises a second node that is a copy of the target node, such that incoming and outgoing connections, and corresponding weight values, for the target node are initially copied to the second node ([0095] and [0116]-[0117]: describing the copying/duplication of additional nodes from a selected node such that the connection weights and input and output connections between the selected node and additional nodes can be copied/duplicated with each other.), and 
	wherein there is a directional relationship regularization link between the target node and the second node (Figs. 7-9: showing a directional connection link between the selected nodes and additional nodes. Wherein regularization of the directional link can be achieved by the weights and errors associated with the link, which allows the link to regulate the connections between the nodes based on the weights and errors through error back-propagation and implementing error/loss function ([0083]-[0085], [0116]-[0117], [0119], and [0122]).).
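For illustration only, copying a target node together with its incoming and outgoing connections and weights can be sketched as follows. The edge-dictionary representation and the names `BASE`, `copy_node`, and `h1_copy` are hypothetical assumptions, and the sketch does not rescale the outgoing weights after duplication.

```python
def copy_node(weights, target, new_name):
    """Return a new edge-weight dict in which new_name duplicates every
    incoming and outgoing connection (and weight) of target."""
    expanded = dict(weights)  # leave the original network untouched
    for (src, dst), w in weights.items():
        if dst == target:
            expanded[(src, new_name)] = w   # copy incoming connection
        if src == target:
            expanded[(new_name, dst)] = w   # copy outgoing connection
    return expanded

# Hypothetical base network: edges keyed by (source, destination).
BASE = {("x1", "h1"): 0.5, ("x2", "h1"): -0.3, ("h1", "y"): 1.2}

EXPANDED = copy_node(BASE, "h1", "h1_copy")
```

The second node initially has the same connection weights as the target node; any later divergence between the two would come from subsequent training.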


Regarding claim 15, the rejection of claim 14 is incorporated. Choi teaches:
	The method of claim 14, wherein the directional relationship regularization link comprises a bidirectional relationship regularization link ([0083]: describing a forward direction and a backward direction, i.e. a bi-directionality, for the connection links and their corresponding weight of the nodes in the NN. The bi-directionality links being utilized to estimate and minimize errors, i.e., regularization of the links.).

Regarding claim 25, the rejection of claim 1 is incorporated. Choi teaches:
The method of claim 1, wherein the target-specific improvement network sub-component, when added to the base neural network, changes a layer structure of the base neural network ([0082] and [0085]: describing that the structure of the initial NN can change via extension of the NN with additional nodes, wherein such an extension can provide an improvement to the performance of the NN.).

Regarding claim 26, the rejection of claim 1 is incorporated. Choi teaches:
The method of claim 1, wherein: 
the target-specific improvement network sub-component comprises a second node ([0090]-[0091] and [0102]: describing the generation of new nodes.); 
there is a node-to-node relationship regularization link between the second node and the target node (Figs. 7-9: showing a connection link between the selected nodes and additional nodes. Wherein the link can regulate elements such as weights or errors between the nodes through error back-propagation and implementing error/loss function ([0083]-[0085], [0116]-[0117], [0119], and [0122]).); and 
the node-to-node relationship regularization link imposes a node-specific regularization cost on the second node for a training datum if the activation value computed for the target node during a prior feed forward computation for the training datum violates a specified relation for the node-to-node relationship regularization link ([0082]-[0084]: describing that the connection links with corresponding weights can be analyzed and updated as needed to reduce errors in correlation with the new and selected nodes and the training data. Wherein the computation to reduce the errors can denote a regularization cost related to a violation, e.g. an error between an actual output vs. an expected output, and the computation can occur in a prior feed forward computation in order to perform a back propagation of the errors. Whereby the connection links with corresponding weights are related to an activation function ([0074]).).

Regarding claim 27, the rejection of claim 26 is incorporated. Choi teaches:
The method of claim 26, wherein: 
there is a node-to-node relationship regularization link between every node of the base network and a corresponding node of the expanded network (Figs. 7-9: showing a connection link between the selected nodes and additional nodes. Wherein the link can regulate elements such as weights or errors between the nodes ([0116]-[0117], [0119], and [0122]).); and 
the node-to-node relationship regularization links impose node-specific regularization costs for the training datum on each node in the expanded network if the activation value computed for the corresponding node in the base network for the training datum during a prior feed forward computation violates a specified relation for the node-to-node relationship regularization link ([0082]-[0084]: describing that the connection links with corresponding weights can be analyzed and updated as needed to reduce errors in correlation with the new and selected nodes and the training data. Wherein the computation to reduce the errors can denote a cost related to a violation, e.g. an error between an actual output vs. an expected output, and the computation can occur in a prior feed forward computation in order to perform a back propagation of the errors. Whereby the connection links with corresponding weights are related to an activation function ([0074]).).

Regarding claim 28, the rejection of claim 26 is incorporated. Choi teaches:
The method of claim 26, wherein the specified relation is that an activation value for the second node for the training datum equals the activation value for the target node for the training datum ([0095]: describing the additional nodes can have duplicate connection weight values as the selected node. Wherein such weights can comprise activation function values ([0074]).).
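For illustration only, a node-specific regularization cost for an "is equal to" relation between two activation values can be sketched as follows. The quadratic penalty, the `strength` parameter, and the function names are hypothetical assumptions; they are not taken from the claims or from Choi.

```python
def is_equal_cost(target_activation, second_activation, strength=1.0):
    """Cost imposed on the second node when its activation differs from the
    target node's activation for the same training datum (assumed quadratic form)."""
    diff = second_activation - target_activation
    return strength * diff * diff

def is_equal_cost_gradient(target_activation, second_activation, strength=1.0):
    """Derivative of the cost with respect to the second node's activation,
    which could be added to that node's back-propagated derivative."""
    return 2.0 * strength * (second_activation - target_activation)

# The relation holds, so no cost is imposed.
no_violation = is_equal_cost(0.5, 0.5)

# The relation is violated, so a positive cost is imposed on the second node.
violation = is_equal_cost(0.5, 0.7, strength=2.0)
```

Under these assumptions the cost is zero when the specified relation holds for the datum and grows with the size of the violation.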

Regarding claim 35, the rejection of claim 1 is incorporated. Choi teaches:
The method of claim 1, wherein adding the target-specific improvement network sub-component comprises, by the programmed computer system: 
creating an expanded network that doubles the base network by having two nodes for each node in the base network, such that each node in the base network has first and second corresponding nodes in the expanded network ([0088], [0120], [0135], and [0138]: describing the generation of new additional nodes for the expanded neural networks, wherein “a plurality of nodes are selected and a plurality of new nodes are generated”. Whereby doing so can result in a doubling of the original NN.); and 
creating a node-to-node relationship regularization link from one or more nodes in the base network to each of the one or more node’s first and second corresponding nodes (Figs. 7-9: showing a connection link between the selected nodes and additional nodes. Wherein the link can regulate elements such as weights or errors between the nodes through error back-propagation and implementing error/loss function ([0083]-[0085], [0116]-[0117], [0119], and [0122]).).

Regarding claim 36, the rejection of claim 35 is incorporated. Choi teaches:
The method of claim 35, wherein creating the node-to-node relationship regularization link from the one or more nodes in the base network to the first and second corresponding nodes in the expanded network comprises creating an is-equal-to regularization link from the one or more nodes in the base network to the first and second corresponding nodes in the expanded network ([0094] and [0116]-[0117]: describing that the connection link edges between the new additional nodes in the extended NN and the selected nodes in the base network can be equal as a result of equal connection weights.).

Regarding claim 37, the rejection of claim 35 is incorporated. Choi teaches:
The method of claim 35, wherein creating the node-to-node relationship regularization link from the one or more nodes in the base network to the one or more nodes first and second corresponding nodes in the expanded network comprises creating a directional is-not-equal-to regularization link from the one or more nodes in the base network to the one or more nodes first and second corresponding nodes in the expanded network ([0083]: describing a forward direction and a backward direction, i.e. a bi-directionality, for the connection links and their corresponding weight of the nodes in the NN. The bi-directionality links being utilized to estimate and minimize errors, i.e. regularization of the links. Wherein the relationship in the direction link is-not-equal since the connection weights differ, i.e. are not equal, during a training of the neural network because the weights are being updated/changed in the backward direction to minimize the errors.).

Regarding claim 38, the rejection of claim 37 is incorporated. Choi teaches:
The method of claim 37, wherein the one or more directional is-not-equal-to regularization links comprise one or more bidirectional is-not-equal-to regularization links ([0083]: describing a forward direction and a backward direction, i.e. a bi-directionality, for the connection links and their corresponding weight of the nodes in the NN. The bi-directionality links being utilized to estimate and minimize errors, i.e. regularization of the links. Wherein the relationship in the direction link is-not-equal since the connection weights differ, i.e. are not equal, during a training of the neural network because the weights are being updated/changed in the backward direction to minimize the errors.).

Regarding independent claim 39, claim 39 is substantially similar to independent claim 1 and therefore is rejected on the same grounds as claim 1. Claim 39 is a system claim that corresponds to method claim 1. A mapping is shown below for the limitations of claim 39 that differ from claim 1. Choi teaches:
A computer system for training a neural network, the computer system comprising: 
one or more processor units ([0153] and [0163]-[0164]: describing various processors.); and 
memory in communication with the one or more processor units, where the memory stores computer instructions that, when executed by the one or more processor units, cause the one or more processor units, to ([0163]-[0164] and [0166]-[0167]: describing memory with instructions that can be executed by the processing units.): 
…
wherein the memory stores computer instructions that, when executed by the one or more processor units, causes the one or more processor units ([0163]-[0164] and [0166]-[0167]: describing memory with instructions that can be executed by the processing units.) to add the target-specific improvement network sub-component by ([0088] and [0149]: describing the addition of a new/additional node to a neural network.): ….

Regarding claim 42, claim 42 is substantially similar to claim 6 and therefore is rejected on the same ground as claim 6. Claim 42 is a system claim that corresponds to method claim 6.

Regarding claim 46, claim 46 is substantially similar to claim 14 and therefore is rejected on the same ground as claim 14. Claim 46 is a system claim that corresponds to method claim 14.

Regarding claim 47, claim 47 is substantially similar to claim 15 and therefore is rejected on the same ground as claim 15. Claim 47 is a system claim that corresponds to method claim 15.

Regarding claim 50, claim 50 is substantially similar to claim 26 and therefore is rejected on the same ground as claim 26. Claim 50 is a system claim that corresponds to method claim 26.

Regarding claim 51, claim 51 is substantially similar to claim 27 and therefore is rejected on the same ground as claim 27. Claim 51 is a system claim that corresponds to method claim 27.

Regarding claim 53, claim 53 is substantially similar to claim 35 and therefore is rejected on the same ground as claim 35. Claim 53 is a system claim that corresponds to method claim 35.

Regarding independent claim 54, Choi teaches: 
A method of training a neural network, the method comprising, by a programmed computer system ([0082] and [0086]-[0088]: describing training of a neural network (NN).): 
(a) training, at least partially, a base neural network on a first set of training data ([0082]-[0084] and [0129]: describing training of NN using training data.), 
wherein training the base neural network comprises computing for each datum in the first set of training data, activation values for nodes in the base neural network ([0074] and [0099]-[0102]: describing generation of activation data for the various nodes in the NN.) and
…, 
wherein the base neural network comprises an input layer, an output layer, and one or more inner layers between the input and output layers ([0076]-[0081]: describing that the NN has input layer, hidden layers, and output layer.); 
(b) after step (a), selecting a target training datum ([0085]-[0086], [0149], and [0154]: describing examples of selecting a target training datum, e.g. a target node or a set of text or digital images related to handwritings out of a general training data set stored in memory to train a neural network for performing a particular task, e.g. recognizing English handwriting patterns.); and 
(c) after step (b), adding a target-specific improvement network sub-component to the base neural network to form an expanded neural network, wherein the target-specific improvement network sub-component comprises one or more nodes ([0090]-[0091], [0135], and [0138]: describing adding in an additional/new node to expand the NN.) and wherein the target-specific improvement network sub-component, when added to the neural network, improves performance of the neural network ([0085], [0126], and [0132]: describing improved performance of the NN based on its extension by adding in additional node(s).) for the target training datum ([0085]-[0086], [0100], and [0126]: describing improved performance of the NN in correlation with the target training datum related to the various tasks or nodes. Wherein the datum was previously described above.). 

While the cited reference teaches the above limitations of claim 54, it does not explicitly teach: “estimates of partial derivatives of an objective function for the base neural network for the nodes in the base neural network” on lines 5-7. Baker discloses the claim limitations, teaching: a determination of the partial derivatives of an objective function, e.g. an error cost function, for each node in a base NN (Baker [0022]-[0024]).
Thus, it would have been obvious to a Person Having Ordinary Skill in the Art (PHOSITA) before the effective filing date (EFD) to modify the method of training the NN in the cited reference to include the partial derivatives computation for the base NN in Baker. Doing so would enable a technique to “improve the performance of a network that has converged such that the gradient of the network and all the partial derivatives are zero…. The present system and method can create a new network by splitting the candidate nodes or arcs that diverge from zero and then trains the resulting network with each selected node trained on the corresponding cluster of the data.” (Baker Abstract).

Regarding claim 55, the rejection of claim 54 is incorporated. Choi teaches:
The method of claim 54, wherein the step of adding the target-specific improvement network sub-component comprises: 
(c1) selecting the target-specific improvement network sub-component ([0087]-[0090], [0100], and [0103]: describing the selection of the additional node.); 
…; and
(c3) after step (c2), merging the target-specific improvement network sub-component with the neural network to form an expanded neural network ([0092]-[0093]: describing connecting the new node to the NN to create an extended NN.).

While the cited reference Choi teaches the above limitations of claim 55, it does not explicitly teach: “(c2) after step (c1), training the target-specific improvement network sub-component separately from the base network”. Hackbarth teaches: a process for training neural networks, wherein subnets of hidden node units are all trained separately outside of the neural network prior to being inserted and assembled into the neural network (Hackbarth Section 3). Wherein each of the subnets of hidden node units denotes a target-specific network sub-component, since each subnet is a sub-component that is specifically targeted for improvement. 
Thus, it would have been obvious to a Person Having Ordinary Skill in the Art (PHOSITA) before the effective filing date (EFD) to modify the method of training the NN in the combined cited references to include the separate training in Hackbarth. Doing so would enable improved accuracy and performance of the neural network in performing its tasks (Hackbarth Section 5). 

Regarding claim 56, claim 56 is substantially similar to claim 4 and therefore is rejected on the same ground as claim 4. Claim 56 is a method claim that corresponds to another method claim 4.

Regarding claim 57, the rejection of claim 54 is incorporated. Baker further teaches:
The method of claim 54, wherein the target training datum is not a member of the set of training data (Baker [0034]-[0035]: describing that the training data is a member either in a first group or in a second group of training data.).
Thus, it would have been obvious to a Person Having Ordinary Skill in the Art (PHOSITA) before the effective filing date (EFD) to modify the method of training the NN in Choi to include the target training data computation in Baker. Doing so would enable “different sub-networks of a neural network could be trained with the different groups of data” (Baker [0035]). Whereby doing so can “improve the performance of a network that has converged such that the gradient of the network and all the partial derivatives are zero…. The present system and method can create a new network by splitting the candidate nodes or arcs that diverge from zero and then trains the resulting network with each selected node trained on the corresponding cluster of the data.” (Baker Abstract).

Regarding claim 58, the rejection of claim 54 is incorporated. Baker further teaches:
The method of claim 54, wherein the target training datum is a member of the set of training data (Baker [0034]-[0035] and [0049]: describing that training data is a member in a respective group, e.g. a first group or a second group of training data.).
Thus, it would have been obvious to a Person Having Ordinary Skill in the Art (PHOSITA) before the effective filing date (EFD) to modify the method of training the NN in Choi to include the target training data computation in Baker. Doing so would enable “different sub-networks of a neural network could be trained with the different groups of data” (Baker [0035]). Whereby doing so can “improve the performance of a network that has converged such that the gradient of the network and all the partial derivatives are zero…. The present system and method can create a new network by splitting the candidate nodes or arcs that diverge from zero and then trains the resulting network with each selected node trained on the corresponding cluster of the data.” (Baker Abstract).

Regarding independent claim 60, claim 60 is substantially similar to independent claim 54 and therefore is rejected on the same grounds as claim 54. Claim 60 is a system claim that corresponds to method claim 54.
A mapping is shown below for the limitations of claim 60 that differ from claim 54. Choi teaches:
A computer system for training a neural network, the computer system comprising: 
one or more processor units ([0153] and [0163]-[0164]: describing various processors.); and 
memory in communication with the one or more processor units, where the memory stores computer instructions that, when executed by the one or more processor units, cause the one or more processor units, to ([0163]-[0164] and [0166]-[0167]: describing memory with instructions that can be executed by the processing units.): ….

Regarding claim 61, the rejection of claim 60 is incorporated. Choi teaches:
The computer system of claim 60, wherein the memory stores computer instructions that, when executed by the one or more processor units, causes the one or more processor units to add the target-specific improvement network sub-component by: 
(c1) select the target-specific improvement network sub-component ([0087]-[0090], [0100], and [0103]: describing the selection of the additional node.); 
…; and
(c3) after step (c2), merge the target-specific improvement network sub-component with the base neural network to form the expanded neural network ([0092]-[0093]: describing connecting the new node to the NN to create an extended NN.).

While the cited reference Choi teaches the above limitations of claim 61, it does not explicitly teach: “(c2) after step (c1), train the target-specific improvement network sub-component separately from the base network”. Hackbarth teaches: a process for training neural networks, wherein subnets of hidden node units are all trained separately outside of the neural network prior to being inserted and assembled into the neural network (Hackbarth Section 3). Wherein each of the subnets of hidden node units denotes a target-specific network sub-component, since each subnet is a sub-component that is specifically targeted for improvement. 
Thus, it would have been obvious to a Person Having Ordinary Skill in the Art (PHOSITA) before the effective filing date (EFD) to modify the method of training the NN in the combined cited references to include the separate training in Hackbarth. Doing so would enable improved accuracy and performance of the neural network in performing its tasks (Hackbarth Section 5). 

Claims 10, 16-20, 30-32, 43, 48, 49, 52, and 59 are rejected under 35 U.S.C. 103 as being unpatentable over Choi et al. (U.S. Pat. App. Pre-Grant Pub. No. 2016/0155049, hereinafter Choi), Baker (WIPO No. WO 2018/226527, hereinafter Baker), and Hackbarth et al., “Modular Connectionist Structure for 100-Word Recognition” (hereinafter Hackbarth) in view of Baker (WIPO No. WO 2019/067960, hereinafter Baker ‘960). 

Regarding claim 10, the rejection of claim 6 is incorporated. The cited references in combination do not explicitly teach: “comprises a detector node that detects instances of the first datum and data that is within a threshold distance of the first datum.” Baker ‘960 discloses the claim limitations, teaching: detector nodes in a NN (Baker ‘960 [0170], [0424], and [0552]), wherein the detector nodes can determine training data and a split of the training data such that the data can be within some threshold distance of each other (Baker ‘960 [0120] and [0552]).
	Thus, it would have been obvious to a Person Having Ordinary Skill in the Art (PHOSITA) before the effective filing date (EFD) to modify the method of training the NN in the combined cited references to include the detector node in Baker ‘960. Doing so would enable a detection technique for detecting data samples via a detector (Baker ‘960 [0120]).
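
For illustration only, the claimed detector-node behavior (firing on the first datum and on data within a threshold distance of it) can be sketched as follows. The Euclidean distance measure, the hard 0/1 output, and the names `detector_activation` and `FIRST_DATUM` are assumptions made for this sketch and are not taken from the claims or from Baker ‘960.

```python
import math

def detector_activation(datum, first_datum, threshold):
    """Return 1.0 when datum lies within `threshold` of first_datum, else 0.0."""
    return 1.0 if math.dist(datum, first_datum) <= threshold else 0.0

# Hypothetical first datum and three candidate data items.
FIRST_DATUM = [1.0, 1.0]
detections = [detector_activation(d, FIRST_DATUM, threshold=0.5)
              for d in ([1.0, 1.0], [1.2, 0.9], [3.0, 3.0])]
```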

Regarding claim 16, the rejection of claim 14 is incorporated. The cited references in combination do not explicitly teach: “wherein the directional relationship regularization link enforces an "is-not-equal-to" relationship between activation values of the target node and the second node on a second set of training data, such that the second node is trained to produce different activation values than the target node on data in the second set of training data”. Baker ‘960 discloses the claim limitations, teaching: regularization link via, e.g., soft-tying techniques, such that the activation values of the various nodes can be different from each other (Baker ‘960 [0240] and [0273]).
	Thus, it would have been obvious to a Person Having Ordinary Skill in the Art (PHOSITA) before the effective filing date (EFD) to modify the method of training the NN in the combined cited references to include the different activation values in Baker ‘960. Doing so would enable soft tying of various parameters, e.g. activation values, of different data values (Baker ‘960 [0081]).

Regarding claim 17, the rejection of claim 1 is incorporated. The cited references in combination do not explicitly teach: “comprises selecting the target node upon a determination that an average, over the first set of training data, of the estimate of the partial derivative of the objective function for the base network with respect to an activation function for the target node is less than a threshold value.” Baker ‘960 discloses the claim limitations, teaching: a node selection process comprising an average computation/determination for the partial derivatives over a training data set (Baker ‘960 [0527]-[0529] and [0367]-[0370]). Wherein the activation values can be less than a threshold value (Baker ‘960 [0400]) and the partial derivatives can be related to an objective function such as an error cost function (Baker ‘960 [0206] and [0459]). 
	Thus, it would have been obvious to a Person Having Ordinary Skill in the Art (PHOSITA) before the effective filing date (EFD) to modify the method of training the NN in the combined cited references to include the average determination in Baker ‘960. Doing so would enable “accurate estimates for a small percentage of the partial derivatives” (Baker ‘960 [0370]).
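
For illustration only, the claimed selection of a target node based on an average of partial derivative estimates being less than a threshold can be sketched as follows. Comparing the magnitude of the average to the threshold is an assumption of this sketch, as are the function name `select_target_nodes` and the numeric values; none of these are taken from the claims or from Baker ‘960.

```python
def select_target_nodes(derivs_per_datum, threshold):
    """derivs_per_datum[d][j] is the estimated partial derivative for node j on datum d.

    Returns the indices of nodes whose derivative, averaged over the data,
    has a magnitude below `threshold`.
    """
    n_data = len(derivs_per_datum)
    n_nodes = len(derivs_per_datum[0])
    selected = []
    for j in range(n_nodes):
        average = sum(row[j] for row in derivs_per_datum) / n_data
        if abs(average) < threshold:
            selected.append(j)
    return selected

# Hypothetical estimates for three nodes over three training data items.
derivs = [[0.40, -0.02, 0.30],
          [-0.38, 0.01, 0.25],
          [0.01, 0.00, 0.35]]
targets = select_target_nodes(derivs, threshold=0.05)
```

Node 0 in this sketch is selected even though its per-datum derivatives are large, because they nearly cancel when averaged.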

Regarding claim 18, the rejection of claim 17 is incorporated. Choi teaches:
	The method of claim 17, further comprising, by the programmed computer system:
	prior to step (c), selecting a target datum ([0089], [0100]-[0103], and [0134]: describing target data/criteria, e.g. having an activation frequency value within some predetermined threshold or a performance metric.); and 
	selecting the target-specific improvement network sub-component based on a combination of the selected target node and the selected target datum ([0089]-[0091], [0103], [0105], and [0135]: describing that a new node can be generated based on considerations involving the selected node and the target data/criteria.).

Regarding claim 19, the rejection of claim 18 is incorporated. The cited references in combination do not explicitly teach: “comprises selecting an arbitrary datum in the first set of training data for which a magnitude of a partial derivative for the target node is non-zero and greater than a magnitude of the partial derivative averaged over a set of data.” Baker ’960 discloses the claim limitations, teaching: a data selector that can select arbitrary data (Baker ‘960 [0135]) and can have a partial derivative with a magnitude greater than a specified value (Baker ‘960 [0353] and [0371]) and non-zero (Baker ‘960 [0360] and [0716]). Wherein the data can comprise an arbitrary set of data (Baker ‘960 [0359] and [0709]).
	Thus, it would have been obvious to a Person Having Ordinary Skill in the Art (PHOSITA) before the effective filing date (EFD) to modify the method of training the NN in the combined cited references to include the partial derivative and arbitrary data in Baker ‘960. Doing so would enable “improving the aggressive development of machine learning systems…. [Wherein] various systems and methods can be utilized to separate the process of detailed learning and knowledge acquisition and the process of imposing restrictions and smoothing estimates, thereby allowing machine learning systems to aggressively learn from training data, while mitigating the effects of overfitting on the training data.” (Baker ‘960 Abstract).

Regarding claim 20, the rejection of claim 18 is incorporated. The cited references in combination do not explicitly teach: “comprises selecting a datum in the first set of training data for which a value of an absolute value of the derivative for the target node is greater than a threshold value.” Baker ’960 discloses the claim limitations, teaching: selection of a node via an absolute value determination of a partial derivative in correlation with the node that has “an absolute value above some specified threshold” (Baker ‘960 [0459]). Wherein the computation on the node can comprise training data (Baker ‘960 [0457], [0460], and [0465]-[0466]). 
Thus, it would have been obvious to a Person Having Ordinary Skill in the Art (PHOSITA) before the effective filing date (EFD) to modify the method of training the NN in the combined cited references to include the absolute value in Baker ‘960. Doing so would enable “improving the aggressive development of machine learning systems…. [Wherein] various systems and methods can be utilized to separate the process of detailed learning and knowledge acquisition and the process of imposing restrictions and smoothing estimates, thereby allowing machine learning systems to aggressively learn from training data, while mitigating the effects of overfitting on the training data.” (Baker ‘960 Abstract). 
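
For illustration only, the datum selection criteria recited in claims 19 and 20 can be sketched as follows. Reading "magnitude of the partial derivative averaged over a set of data" as the magnitude of the mean is an assumption of this sketch, as are the function names and numeric values; none of these are taken from the claims or from Baker ‘960.

```python
def data_above_average(node_derivs):
    """Indices of data whose derivative is non-zero and larger in magnitude
    than the magnitude of the mean derivative (claim 19 style criterion)."""
    mean_magnitude = abs(sum(node_derivs) / len(node_derivs))
    return [i for i, d in enumerate(node_derivs)
            if d != 0.0 and abs(d) > mean_magnitude]

def data_above_threshold(node_derivs, threshold):
    """Indices of data whose derivative exceeds `threshold` in absolute value
    (claim 20 style criterion)."""
    return [i for i, d in enumerate(node_derivs) if abs(d) > threshold]

# Hypothetical per-datum derivative estimates for a single target node.
node_derivs = [0.40, -0.38, 0.01, 0.0]
by_average = data_above_average(node_derivs)
by_threshold = data_above_threshold(node_derivs, threshold=0.1)
```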

Regarding claim 30, the rejection of claim 26 is incorporated. The cited references in combination do not explicitly teach: “wherein a strength of the node-to-node relationship regularization link is controlled by a node-to-node relationship regularization link hyperparameter”. Baker ‘960 discloses the claim limitations, teaching: that the strength of the neural network and its respective linked nodes and objectives are related to hyperparameter values (Baker ‘960 [0239], [0243], [0250], [0273], and [0519]-[0520]).
Thus, it would have been obvious to a Person Having Ordinary Skill in the Art (PHOSITA) before the effective filing date (EFD) to modify the method of training the NN in the combined cited references to include the regularization and hyperparameter in Baker ‘960. Doing so would enable “improving the aggressive development of machine learning systems…. [Wherein] various systems and methods can be utilized to separate the process of detailed learning and knowledge acquisition and the process of imposing restrictions and smoothing estimates, thereby allowing machine learning systems to aggressively learn from training data, while mitigating the effects of overfitting on the training data.” (Baker ‘960 Abstract). 

Regarding claim 31, the rejection of claim 26 is incorporated. The cited references in combination do not explicitly teach: “wherein a value of the node-to-node relationship regularization link hyperparameter is controlled by an intelligent learning management system”. Baker ‘960 discloses the claim limitations, teaching: that the computer system can evaluate and determine the hyperparameter values that can control a strength in correlation with the neural network and its respective linked nodes and objectives (Baker ‘960 [0236], [0243], [0319], and [0519]). Wherein the computer system can comprise “a machine learning system”, i.e. an intelligent learning management system (Baker ‘960 [0066]). 
Thus, it would have been obvious to a Person Having Ordinary Skill in the Art (PHOSITA) before the effective filing date (EFD) to modify the method of training the NN in the combined cited references to include the intelligent system in Baker ‘960. Doing so would enable “hyperparameter tuning” to optimize a machine learning system (Baker ‘960 [0265]).

Regarding claim 32, the rejection of claim 1 is incorporated. The cited references in combination do not explicitly teach: “determining a range, over a selected set of data, of a value of a summation of incoming connections to the target node; and upon a determination that the range is greater than a threshold value, creating a second node, wherein incoming connections for the second node are initialized by copying the incoming connections and weights from the target node, and wherein a bias for the second node is initialized to discriminate a first datum in the selected set of data from a second datum in the selected set of data.” Baker ‘960 discloses the claim limitations, teaching: 
“determining a range, over a selected set of data (Baker ‘960 [0255], [0400], and [0449]: describing a range of data), of a value of a summation of incoming connections to the target node (Baker ‘960 [0320], [0400], and [0415]: describing a summation of the data. Wherein a summing neuron can perform a summation (Baker ‘960 [0456]).); and 
upon a determination that the range is greater than a threshold value (Baker ‘960 [0261]-[0262] and [0400]: describing that the range, e.g. via a standard deviation, can be greater than some specified value or threshold.), 
creating a second node, wherein incoming connections for the second node are initialized by copying the incoming connections and weights from the target node (Baker ‘960 [0456], [0461], and [0492]: describing a copy of a connection of the new/additional nodes with the previous nodes that they are connected to.), and 
wherein a bias for the second node is initialized to discriminate a first datum in the selected set of data from a second datum in the selected set of data (Baker ‘960 [0260] and [0529]-[0531]: describing a bias for the nodes, wherein the bias can correlate with a splitting of the data to enable the nodes to be more decisive, i.e. discriminatory towards the data.).”
Thus, it would have been obvious to a Person Having Ordinary Skill in the Art (PHOSITA) before the effective filing date (EFD) to modify the method of training the NN in the combined cited references to include the bias and range determinations in Baker ‘960. Doing so would enable “improving the aggressive development of machine learning systems…. [Wherein] various systems and methods can be utilized to separate the process of detailed learning and knowledge acquisition and the process of imposing restrictions and smoothing estimates, thereby allowing machine learning systems to aggressively learn from training data, while mitigating the effects of overfitting on the training data.” (Baker ‘960 Abstract).
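
For illustration only, the limitations of claim 32 (a range determination over the summed inputs to a target node, followed by creation of a second node with copied incoming weights and a discriminating bias) can be sketched as follows. Placing the bias at the negated midpoint of the two weighted sums is one possible way to discriminate two data items and is an assumption of this sketch, as are the function names and numeric values; none of these are taken from the claims or from Baker ‘960.

```python
def weighted_sum(weights, x):
    """Summation of the incoming connections to a node for input x (bias excluded)."""
    return sum(w * xi for w, xi in zip(weights, x))

def maybe_create_second_node(weights, data, first, second, threshold):
    """Return (new_weights, new_bias) when the range of summed inputs over
    `data` exceeds `threshold`; otherwise return None."""
    sums = [weighted_sum(weights, x) for x in data]
    if max(sums) - min(sums) <= threshold:
        return None
    new_weights = list(weights)  # copy incoming connections and weights
    midpoint = (weighted_sum(weights, first) + weighted_sum(weights, second)) / 2.0
    new_bias = -midpoint  # summed input plus bias is positive for one datum, negative for the other
    return new_weights, new_bias

# Hypothetical target-node weights and selected set of data.
TARGET_WEIGHTS = [1.0, -1.0]
DATA = [[3.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
FIRST, SECOND = DATA[0], DATA[1]
result = maybe_create_second_node(TARGET_WEIGHTS, DATA, FIRST, SECOND, threshold=1.0)
```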

Regarding claim 43, claim 43 is substantially similar to claim 10 and therefore is rejected on the same ground as claim 10. Claim 43 is a system claim that corresponds to method claim 10.

Regarding claim 48, claim 48 is substantially similar to claim 16 and therefore is rejected on the same ground as claim 16. Claim 48 is a system claim that corresponds to method claim 16.

Regarding claim 49, claim 49 is substantially similar to claim 17 and therefore is rejected on the same ground as claim 17. Claim 49 is a system claim that corresponds to method claim 17.

Regarding claim 52, claim 52 is substantially similar to claim 32 and therefore is rejected on the same ground as claim 32. Claim 52 is a system claim that corresponds to method claim 32.

Regarding claim 59, the rejection of claim 58 is incorporated. The cited references in combination do not explicitly teach: “comprises selecting a data item in the set of training data on which a target node of the base network made a classification error”. Baker ‘960 discloses the claim limitations, teaching: selecting a data example for a model in which a main classifier makes an error (Baker ‘960 [0382]-[0383]). Wherein the model can be a neural network with various nodes (Baker ‘960 [0341] and [0388]).
	Thus, it would have been obvious to a Person Having Ordinary Skill in the Art (PHOSITA) before the effective filing date (EFD) to modify the method of training the NN in the combined cited references to include the classification error in Baker ‘960. Doing so would enable “improving the aggressive development of machine learning systems…. [Wherein] various systems and methods can be utilized to separate the process of detailed learning and knowledge acquisition and the process of imposing restrictions and smoothing estimates, thereby allowing machine learning systems to aggressively learn from training data, while mitigating the effects of overfitting on the training data.” (Baker ‘960 Abstract).

Claims 11, 24, and 44 are rejected under 35 U.S.C. 103 as being unpatentable over Choi et al. (U.S. Pat. App. Pre-Grant Pub. No. 2016/0155049, hereinafter Choi), Baker (WIPO No. WO 2018/226527, hereinafter Baker), and Hackbarth et al., “Modular Connectionist Structure for 100-Word Recognition” (hereinafter Hackbarth) in view of Baker (WIPO No. WO 2019/067542, hereinafter Baker ‘542). 

Regarding claim 11, the rejection of claim 6 is incorporated. The cited references in combination do not explicitly teach: “comprises a discriminator node that discriminates between the first datum and a second datum in the first set of training data”. Baker ‘542 discloses the claim limitations, teaching: a “discrimination node” that can discriminate with regard to a pair of data items (Baker ‘542 [0055]-[0056]). 
Thus, it would have been obvious to a Person Having Ordinary Skill in the Art (PHOSITA) before the effective filing date (EFD) to modify the method of training the NN in the combined cited references to include the discrimination node in Baker ‘542. Doing so would enable a machine learning system that is optimized for continued learning (Baker ‘542 [0054]). 

Regarding claim 24, the rejection of claim 1 is incorporated. The cited references in combination do not explicitly teach: “comprises training the target-specific improvement network sub-component with one-shot learning”. Baker ‘542 discloses the claim limitations, teaching: a process comprising “one-shot learning, a node, called herein a "template node," is added to a neural network based on a single data item example” (Baker ‘542 [0054]-[0055]). Wherein the node can continue learning via the one-shot learning process (Baker ‘542 [0056]).
Thus, it would have been obvious to a Person Having Ordinary Skill in the Art (PHOSITA) before the effective filing date (EFD) to modify the method of training the NN in the combined cited references to include the one-shot learning in Baker ‘542. Doing so would enable the additional ensemble node in the neural network to “continue[] learning from additional training data items” (Baker ‘542 [0054]).

Regarding claim 44, claim 44 is substantially similar to claim 11 and therefore is rejected on the same ground as claim 11. Claim 44 is a system claim that corresponds to method claim 11.


Claims 12, 13, and 45 are rejected under 35 U.S.C. 103 as being unpatentable over Choi et al. (U.S. Pat. App. Pre-Grant Pub. No. 2016/0155049, hereinafter Choi), Baker (WIPO No. WO 2018/226527, hereinafter Baker), and Hackbarth et al., “Modular Connectionist Structure for 100-Word Recognition” (hereinafter Hackbarth) in view of Baker (WIPO No. WO 2019/005507, hereinafter Baker ‘507).

Regarding claim 12, the rejection of claim 1 is incorporated. Choi teaches:
	The method of claim 1 wherein the target-specific improvement network sub-component comprises: 
	…; and 
	an error-correction node that passes through an activation value from the target node unless certain conditions apply ([0083]-[0084]: describing nodes for propagating weights based on an error calculation to minimize errors, i.e., error correction, with the nodes being nodes in various layers of the neural network ([0077]). Wherein the weights being propagated operate in correlation with activation values of the nodes ([0074]).), wherein the conditions comprise 
	(i) the target node made a classification choice ([0084], [0112], [0124], and [0126]: describing output values or classifications that can be made by the NN and its nodes.) and 
(ii) … predicts that the classification choice by the target node is erroneous ([0084]: describing a determination/prediction that an error has occurred when an actual output is different than an expected output.). 


While the cited references in combination teach the above limitations of claim 12, they do not explicitly teach: “an error-prediction node that is trained to detect training data on which the target node makes classification errors” on lines 3-4 and “the error prediction node” on line 7. Baker ‘507 discloses the claim limitations, teaching: a node that can predict the errors related to activation functions for related data items in the various layers comprising the node (Baker ‘507 [0020]). Wherein a prediction of the error can be associated with the training data (Baker ‘507 [0042]) which can include classification data (Baker ‘507 [0029]). 
Thus, it would have been obvious to a Person Having Ordinary Skill in the Art (PHOSITA) before the effective filing date (EFD) to modify the method of training the NN in the combined cited references to include the error-prediction node in Baker ‘507. Doing so would enable guidance in a neural network that “can be provided by aligning sets of nodes or entire layers in a network being trained with sets of nodes in a reference system. This guidance facilitates the trained network to more efficiently learn features learned by the reference system using fewer parameters and with faster training. The guidance also enables training of a new system with a deeper network, i.e., more layers, which tend to perform better than shallow networks. Also, with fewer parameters, the new network has fewer tendencies to overfit the training data.” (Baker ‘507 Abstract). 

Regarding claim 13, the rejection of claim 12 is incorporated. Choi teaches:
	The method of claim 12, wherein the error-correction node reverses an output of the target node relative to a threshold value when conditions (i) and (ii) apply ([0083]-[0085]: describing error computations and backpropagation through the respective nodes to improve performance of the NN based on a predetermined level.).

Regarding claim 45, claim 45 is substantially similar to claim 12 and therefore is rejected on the same ground as claim 12. Claim 45 is a system claim that corresponds to method claim 12.
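
For illustration only, the error-correction node behavior recited in claims 12 and 13 (pass-through of the target node's activation unless a classification choice is predicted to be erroneous, in which case the output is reversed relative to a threshold) can be sketched as follows. Treating "reverses relative to a threshold" as a reflection about the threshold, treating any activation different from the threshold as a classification choice, and the function name `error_correction` are assumptions of this sketch; none of these are taken from the claims, Choi, or Baker ‘507.

```python
def error_correction(target_activation, error_predicted, threshold=0.5):
    """Pass the target node's activation through unchanged, unless
    (i) the target node made a classification choice and
    (ii) the error-prediction node predicts that the choice is erroneous,
    in which case the activation is reflected about `threshold`."""
    made_choice = target_activation != threshold
    if made_choice and error_predicted:
        return 2.0 * threshold - target_activation
    return target_activation

passed_through = error_correction(0.75, error_predicted=False)
reversed_output = error_correction(0.75, error_predicted=True)
no_choice = error_correction(0.5, error_predicted=True)
```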

Claims 21-23 are rejected under 35 U.S.C. 103 as being unpatentable over Choi et al. (U.S. Pat. App. Pre-Grant Pub. No. 2016/0155049, hereinafter Choi), Baker (WIPO No. WO 2018/226527, hereinafter Baker), and Hackbarth et al., “Modular Connectionist Structure for 100-Word Recognition” (hereinafter Hackbarth) in view of Baker (WIPO No. WO 2018/226492, hereinafter Baker ‘492).

Regarding claim 21, the rejection of claim 1 is incorporated. Choi teaches: 
	The method of claim 1, wherein merging the target-specific improvement network sub-component with the base neural network comprises establishing an incoming connection from a first node in the base network to a first node of the target-specific improvement network sub-component ([0091]-[0093], [0116]-[0117] and [0121]: describing the connection between the new nodes and the selected nodes in the base network.), ….

While the cited references in combination teach the above limitations of claim 21, they do not explicitly teach: “wherein a weight for the incoming connection is initialized to zero prior to training of the expanded network”. Baker ‘492 discloses the claim limitations, teaching: “[a] weight for a new incoming arc may be initially set to zero prior to subsequently training the updated deep neural network” (Baker ‘492 [0223]). 
Thus, it would have been obvious to a Person Having Ordinary Skill in the Art (PHOSITA) before the effective filing date (EFD) to modify the method of training the NN in the combined cited references to include the zero-initialized incoming weight in Baker ‘492. Doing so would enable techniques to “improve a trained base deep neural network by structurally changing the base deep neural network to create an updated deep neural network, such that the updated deep neural network has no degradation in performance relative to the base deep neural network on the training data. The updated deep neural network is subsequently trained.” (Baker ‘492 Abstract). 

Regarding claim 22, the rejection of claim 21 is incorporated. Choi teaches:
The method of claim 21, wherein merging the target-specific improvement network sub-component with the base neural network further comprises establishing an outgoing connection from the first node of the target-specific improvement network sub-component to a second node of the base network ([0091]-[0093], [0116]-[0117] and [0121]: describing the connections between the new nodes and the selected nodes in the base network.), ….

While the cited references in combination teach the above limitations of claim 22, they do not explicitly teach: “wherein a weight for the outgoing connection is initialized to zero prior to training of the expanded network”. Baker ‘492 discloses the claim limitations, teaching: “a weight of the new outgoing arc may be initially set to zero prior to subsequently training the updated deep neural network” (Baker ‘492 [0223]).
Thus, it would have been obvious to a Person Having Ordinary Skill in the Art (PHOSITA) before the effective filing date (EFD) to modify the method of training the NN in the combined cited references to include the zero-initialized outgoing weight in Baker ‘492. Doing so would provide that the “arc weight is initialized to zero, so there is no immediate change in the activations, so no change in performance” when new node elements are added to the NN (Baker ‘492 [0121]).

Regarding claim 23, the rejection of claim 22 is incorporated. Choi teaches:
The method of claim 22, wherein the target node is the second node of the base network, such that there is an outgoing connection from the first node of the target-specific improvement network sub-component to the target node ([0090]-[0093] and [0107]: describing connection of a new node to a selected node.).
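
For illustration only, the effect of initializing the weights of new connections to zero, as recited in claims 21 and 22, can be sketched as follows: the expanded network produces the same output as the base network until training changes those weights. The tiny network, the use of network inputs as the source of the incoming connection (a simplification of a connection from a node in the base network), and all names and values are assumptions of this sketch; none of these are taken from the claims, Choi, or Baker ‘492.

```python
import math

def forward(x, hidden, out_weights):
    """hidden is a list of (weights, bias); the output is a linear sum of
    tanh hidden activations weighted by out_weights."""
    acts = [math.tanh(sum(w * xi for w, xi in zip(ws, x)) + b)
            for ws, b in hidden]
    return sum(v * a for v, a in zip(out_weights, acts))

# Hypothetical base network with two hidden nodes.
base_hidden = [([0.5, -0.3], 0.0), ([0.1, 0.8], 0.1)]
base_out = [1.2, -0.7]

# Added node: incoming weights and outgoing weight all initialized to zero.
expanded_hidden = base_hidden + [([0.0, 0.0], -0.2)]
expanded_out = base_out + [0.0]

x = [1.0, 2.0]
before = forward(x, base_hidden, base_out)
after = forward(x, expanded_hidden, expanded_out)
```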

Claim 29 is rejected under 35 U.S.C. 103 as being unpatentable over Choi et al. (U.S. Pat. App. Pre-Grant Pub. No. 2016/0155049, hereinafter Choi), Baker (WIPO No. WO 2018/226527, hereinafter Baker), and Hackbarth et al., “Modular Connectionist Structure for 100-Word Recognition” (hereinafter Hackbarth) in view of Baker (WIPO No. WO 2018/231708, hereinafter Baker ‘708).

Regarding claim 29, the rejection of claim 28 is incorporated. The cited references in combination do not explicitly teach: “wherein the node-specific regularization cost comprises an absolute value of a difference between the activation value for the second node for the training datum and the activation value for the target node for the training datum”. Baker ‘708 discloses the claim limitations, teaching: an absolute value of a difference for a gate node that comprises several nodes with their respective activation values (Baker ‘708 [0043]-[0045]). 
Thus, it would have been obvious to a Person Having Ordinary Skill in the Art (PHOSITA) before the effective filing date (EFD) to modify the method of training the NN in the combined cited references to include the absolute value in Baker ‘708. Doing so would enable techniques to “improve the robustness of a network that has been trained to convergence, particularly with respect to small or imperceptible changes to the input data. Various techniques … can include adding biases to the input nodes of the network, increasing the minibatch size of the training data, adding special nodes to the network that have activations that do not necessarily change with each data example of the training data, splitting the training data based upon the gradient direction, and making other intentionally adversarial changes to the input of the neural network.” (Baker ‘708 Abstract).
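
For illustration only, a node-specific regularization cost of the kind recited in claim 29 (an absolute value of the difference between the activation values of the second node and the target node for a training datum) can be sketched as follows. Summing the per-datum cost over several data items and scaling it by a `strength` value are assumptions of this sketch, as are the function names; none of these are taken from the claims or from Baker ‘708.

```python
def datum_cost(second_activation, target_activation):
    """Absolute difference between the two nodes' activations on one datum."""
    return abs(second_activation - target_activation)

def regularization_cost(second_acts, target_acts, strength):
    """Per-datum costs summed over the data and scaled by a hypothetical
    strength value."""
    return strength * sum(datum_cost(s, t)
                          for s, t in zip(second_acts, target_acts))

# Hypothetical activations of the second node and the target node on three data items.
cost = regularization_cost([0.5, 0.25, 1.0], [0.5, 0.75, 0.0], strength=0.1)
```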

Claims 33 and 34 are rejected under 35 U.S.C. 103 as being unpatentable over Choi et al. (U.S. Pat. App. Pre-Grant Pub. No. 2016/0155049, hereinafter Choi), Baker (WIPO No. WO 2018/226527, hereinafter Baker), and Hackbarth et al., “Modular Connectionist Structure for 100-Word Recognition” (hereinafter Hackbarth) in view of Baker (WIPO No. WO 2018/231708, hereinafter Baker ‘708).

Regarding claim 33, the rejection of claim 32 is incorporated. The cited references in combination do not explicitly teach: “wherein the second datum is selected as datum in the selected set of data that maximizes an absolute value of a difference between the value of the summation of the incoming connections to the target node for the target node and the value of the summation of the incoming connections to the target node for the second datum”. Baker ‘708 discloses the claim limitations, teaching: a selection of values based on an absolute value of a difference for a gate node that comprises several nodes with their respective activation values (Baker ‘708 [0043]-[0045]), wherein the activation values for the nodes can be summed (Baker ‘708 [0035], [0043], and [0090]).
Thus, it would have been obvious to a person having ordinary skill in the art (PHOSITA) before the effective filing date (EFD) to modify the method of training the NN in the combined cited references to include the absolute value of a difference taught in Baker ‘708. Doing so would enable techniques to “improve the robustness of a network that has been trained to convergence, particularly with respect to small or imperceptible changes to the input data. Various techniques … can include adding biases to the input nodes of the network, increasing the minibatch size of the training data, adding special nodes to the network that have activations that do not necessarily change with each data example of the training data, splitting the training data based upon the gradient direction, and making other intentionally adversarial changes to the input of the neural network.” (Baker ‘708 Abstract).
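For illustration only, the selection recited in claim 33 can be sketched as follows. This is a hypothetical rendering of the claim language and is not code drawn from Baker ‘708 or from Applicant's disclosure. It assumes that the "summation of the incoming connections" is a weighted sum of the input values to the target node; the function names, the weight vector, and the sample data are all assumptions introduced for the example.

```python
from typing import Sequence


def incoming_sum(weights: Sequence[float], datum: Sequence[float]) -> float:
    """Weighted sum of the incoming connections to the target node (assumed form)."""
    return sum(w * x for w, x in zip(weights, datum))


def select_second_datum(weights: Sequence[float],
                        target_datum: Sequence[float],
                        candidates: Sequence[Sequence[float]]) -> Sequence[float]:
    """Pick the candidate whose incoming sum differs most, in absolute value,
    from the incoming sum computed for the target datum."""
    target_sum = incoming_sum(weights, target_datum)
    return max(candidates,
               key=lambda d: abs(target_sum - incoming_sum(weights, d)))


# Hypothetical two-input target node and a small selected set of data.
weights = [1.0, -1.0]
target_datum = [1.0, 0.0]                      # incoming sum = 1.0
selected_set = [[1.0, 0.5], [0.0, 2.0], [2.0, 0.0]]
second_datum = select_second_datum(weights, target_datum, selected_set)
```

In this example the candidate `[0.0, 2.0]` has an incoming sum of -2.0, which differs from the target datum's sum of 1.0 by 3.0, the largest absolute difference in the set, so it is the datum selected.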

Regarding claim 34, the rejection of claim 33 is incorporated. Choi teaches:
The method of claim 33, wherein adding the target-specific improvement network sub-component further comprises adding a connection from the second node to the target node ([0090]-[0093] and [0107]: describing connection of a new node to a selected node).

Conclusion
The prior art made of record and not relied upon is considered pertinent to Applicant's disclosure:
Jayadeva (U.S. Pat. App. Pre-Grant Pub. No. 2018/0144246)

Any inquiry concerning this communication or earlier communications from the examiner should be directed to SELENE A HAEDI whose telephone number is (571)270-5762. The examiner can normally be reached M-F 11 AM - 7 PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, OMAR FERNANDEZ RIVAS, can be reached on (571)272-2589. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/SELENE A. HAEDI/Examiner, Art Unit 2128