DETAILED ACTION
This action is in response to claims filed 16 February, 2021 for application 16/334204 filed 18 March, 2019. Currently claims 1-4, 6, 11-15, 17-20, 25, 26, 36, and 44-49 are pending.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
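3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.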

Claims 1-4, 6, 11, 18, 19, 25, 36, 44 and 45 are rejected under 35 U.S.C. 103 as being unpatentable over Tomkins et al. (US 2007/0288410) in view of Hill (US 2009/0276385).

Regarding claims 1, 25 and 36, Tomkins discloses: A machine learning computer system comprising: 
a first student machine learning system that comprises a deep neural network that comprises one or more inner layers, wherein the first student machine learning system, using machine learning, automatically learns from and makes predictions on input source data (“This invention relates to signal enhancement and transformation. It is also related to a self-learning method to derive a proper mapping transformation that maps an input signal to an output signal where the output signal is an enhancement or a transformation of the input signal.” [0002],
“The configurations of a neural network records the connections of the weighted synaptic links among nodes and the computational functions of each the nodes in at least one chromosome layer; [0009] 
(c).  performing a first training on the plurality of neural networks by adjusting the weighted synaptic links to learn the mapping transformation using a data set.  The data set comprises a set of the input signals and a set of target signals.  The target signal is obtained from the subject using a value of the parameter different from the input signal; [0010] 
(d).  performing a second training on the plurality of neural networks by modifying the configurations of the plurality of neural networks.” [0011],
note: each neural network is a student machine learning system that is a deep neural network (Figs. 9A & 9B)); and 
a first learning coach machine learning system that is in communication with the first student machine learning system (Fig. 2, note: Figure 2 shows the genetic algorithm (coach machine learning system) for learning the structure of the student machine learning systems (neural networks)), wherein: 
input to the first learning coach machine learning system comprises values related to learned parameters and activation values of nodes of the one or more inner layers of the deep neural network of the first student machine learning system (“modifying the configurations of the plurality of neural networks by repetitively performing the steps of: 
i. selecting at least one candidate chromosome from the plurality of chromosomes according to a pre-specified criteria; 
ii.  generating at least one child chromosome by a genetic operator, and  
iii. applying at least one global constraint to the child chromosome and repeating steps (i) and (ii) if the child chromosome fails to satisfy the at least one constraint so that a plurality of child chromosomes can be generated. The plurality of child chromosomes defines the configurations of the plurality of neural networks; and 
(g).  repeating steps (e) and (f) for a predetermined number of generations such that in each generation the configuration of each neural network may be altered and selected flexibly by the genetic operator to derive an optimal neural network for the mapping transformation.” [0035-40], 
“Step 39 is to choose the number of nodes in each hidden layer. In one implementation, the total number of hidden nodes in the entire neural network is restricted to be less than the global parameter S. This restricts the size of the overall neural network such that the computation resources (memory and training time) will not be overly stretched. Moreover, each hidden node has a plurality of functions selected from a plurality of function categories. In one embodiment, each hidden node has a transfer function (also known as the activation function) selected from a transfer function category, a bias function selected from a bias function category and a weight function selected from a weight function category.” [0144], note: the genetic algorithm is the learning coach machine learning system; the number of nodes and the weight functions are values related to learned parameters, and the activation functions correspond to activation values); and 
the first learning coach machine learning system, using machine learning, automatically learns and implements an enhancement to the first student machine learning system based on the values related to learned parameters and activation values of nodes of the one or more inner layers of the deep neural network of the first student machine learning system to improve operation of the first student machine learning system ([0035-40], Figs. 7 & 8).

However, Tomkins does not explicitly disclose: the first learning coach machine learning system comprises a neural network.

Hill teaches: the first learning coach machine learning system comprises a neural network (“Referring to FIG. 3, a first subset of the weight values shown in FIG. 2 that may be used in a training set for a trainer artificial-neural-network is disclosed.  The phrase "trainer artificial-neural-network" is used herein to refer to an artificial-neural-network that can generate output values to be used as weight values in another artificial-neural-network.  The first subset of the weight values includes the first n weight values of FIG. 2 associated with each connection of the artificial-neural-network 100 and the final (i.e., the T.sup.th) weight value associated with each connection of the artificial-neural-network 100.” [0032]).

Tomkins and Hill are both in the same field of endeavor of using coach machine learning systems to train a neural network and are analogous. Tomkins teaches an evolutionary algorithm approach. Hill teaches a neural network approach wherein a training neural network learns the weights of the student network. It would have been obvious to one of ordinary skill in the art before the effective filing date to substitute the evolutionary algorithm as taught by Tomkins with the training neural network as taught by Hill to yield predictable results. One would have been motivated to combine to have a trained neural network that has the goal of generating weights for another neural network (Hill [0033]).
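For illustration only, the select / generate / constrain loop quoted from Tomkins [0035-40] and [0144] can be sketched as follows. This is a minimal sketch under stated assumptions, not Tomkins's implementation: the chromosome is assumed to be a list of hidden-layer node counts, `S`, `mutate`, `fitness` and `evolve` are hypothetical names, and the fitness function is a placeholder where a real system would train and score each network.

```python
import random

S = 12  # hypothetical global cap on the total number of hidden nodes


def satisfies_constraint(chromosome, max_nodes=S):
    """Global constraint: total hidden nodes must be less than max_nodes."""
    return sum(chromosome) < max_nodes


def mutate(chromosome, rng):
    """Hypothetical genetic operator: change one layer's node count by +/-1."""
    child = list(chromosome)
    i = rng.randrange(len(child))
    child[i] = max(1, child[i] + rng.choice([-1, 1]))
    return child


def fitness(chromosome):
    """Placeholder score; a real system would train the network and score it."""
    target = [4, 3]  # assumed optimum, for demonstration only
    return -sum((a - b) ** 2 for a, b in zip(chromosome, target))


def evolve(population, generations, rng):
    for _ in range(generations):
        # (i) select candidate chromosomes by a pre-specified criterion
        population.sort(key=fitness, reverse=True)
        parents = population[: len(population) // 2]
        children = []
        while len(children) < len(population) - len(parents):
            # (ii) generate a child chromosome with a genetic operator
            child = mutate(rng.choice(parents), rng)
            # (iii) keep the child only if it meets the global constraint;
            # otherwise repeat steps (i) and (ii)
            if satisfies_constraint(child):
                children.append(child)
        population = parents + children
    return max(population, key=fitness)


rng = random.Random(0)
population = [[1, 1], [2, 2], [6, 5], [3, 1]]
best = evolve(population, generations=20, rng=rng)
```

Because the better half of each generation is carried forward unchanged, the returned configuration is never worse than the best starting configuration under the placeholder score.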


Regarding claim 2, Tomkins does not explicitly disclose: The machine learning computer system of claim 1, wherein the first learning coach machine learning system comprises a pattern recognition system that recognizes different patterns than the first student machine learning system.

Hill teaches: wherein the first learning coach machine learning system comprises a pattern recognition system that recognizes different patterns than the first student machine learning system (“Referring to FIG. 3, a first subset of the weight values shown in FIG. 2 that may be used in a training set for a trainer artificial-neural-network is disclosed. The phrase "trainer artificial-neural-network" is used herein to refer to an artificial-neural-network that can generate output values to be used as weight values in another artificial-neural-network. The first subset of the weight values includes the first n weight values of FIG. 2 associated with each connection of the artificial-neural-network 100 and the final (i.e., the T.sup.th) weight value associated with each connection of the artificial-neural-network 100.” [0032], “The final weight value (i.e., the T.sup.th value) in each sequence of weight values associated with a connection of the artificial-neural-network 100 is mapped to an output of the trainer artificial-neural-network. The artificial-neural-network 100 should perform best when operated in a feed-forward manner when the weight values for each connection are set to the final weight value of the sequence of weight values generated for that connection during the training of the artificial-neural-network 100.” [0033], see also [0038], note: neural networks are pattern recognition systems; the trainer of Hill recognizes the pattern of weights for a neural network, while the student network recognizes patterns for whichever task it is given).

Tomkins and Hill are both in the same field of endeavor of using coach machine learning systems to train a neural network and are analogous. Tomkins teaches an evolutionary algorithm approach. Hill teaches a neural network approach wherein a training neural network learns the weights of the student network. It would have been obvious to one of ordinary skill in the art before the effective filing date to substitute the evolutionary algorithm as taught by Tomkins with the training neural network as taught by Hill to yield predictable results. One would have been motivated to combine to have a trained neural network that has the goal of generating weights for another neural network (Hill [0033]).
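For illustration only, the mapping quoted from Hill [0032]-[0033], in which the first n weight values of a connection are inputs and the final (T-th) weight value is the target output, can be sketched as follows. This is a sketch under loud assumptions: the weight trajectories are synthetic (an assumed exponential approach to the final value), and a linear least-squares fit stands in for Hill's trainer artificial-neural-network. All names and constants are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, T, num_connections = 3, 50, 200  # hypothetical sizes
decay = 0.8  # assumed convergence rate of the synthetic training run

# Synthetic weight sequences: each connection's weight approaches its
# final value geometrically. trajectories[c, t] is weight c at step t.
w_final = rng.normal(size=num_connections)
w_start = rng.normal(size=num_connections)
steps = np.arange(T + 1)
trajectories = (
    w_final[:, None] + (w_start - w_final)[:, None] * decay ** steps[None, :]
)

X = trajectories[:, :n]  # first n weight values per connection (inputs)
y = trajectories[:, -1]  # T-th weight value per connection (target)

# Fit the stand-in "trainer" on 150 connections, hold out the other 50.
X_train, y_train = X[:150], y[:150]
X_test, y_test = X[150:], y[150:]
coef, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

# Predicted final weights for connections the trainer has not seen.
predictions = X_test @ coef
```

The linear stand-in is exact here only because the synthetic trajectories share a single decay rate; real weight sequences would generally need the nonlinear trainer network that Hill describes.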

Regarding claim 3, Tomkins discloses: The machine learning computer system of claim 1, wherein the first learning coach machine learning system has a different objective than the first student machine learning system (“training the plurality of neural networks to learn the mapping transformation by adjusting the weight values of the weighted synaptic links so that a fitness score can be optimized. The fitness score measures the mapping transformation performance of the neural network” [0034], see also [0038]).

Regarding claim 4, Tomkins discloses: The machine learning computer system of claim 1, wherein the values related to learned parameters and activation values comprise observed and/or computed values ([0035-40], Figure 10, note: values are observed).

Regarding claim 5, Tomkins discloses: The machine learning computer system of claim 1, wherein the first learning coach machine learning system comprises a machine learning architecture that is not a deep neural network ([0035-40], note: coach architecture is a genetic/evolutionary algorithm).

Regarding claim 6, Tomkins discloses: The machine learning computer system of claim 1, wherein the enhancement comprises one or more revised hyperparameters for the deep neural network of the first student machine learning system that improve learning by the deep neural network of the first student machine learning system ([0035-40], Figs. 9A & 9B, note: hyperparameters such as the number of nodes, layer position, etc. are revised in the cloned and updated child (student) network).

Regarding claim 11, Tomkins discloses: The machine learning computer system of claim 1, wherein the enhancement comprises a structural change to the deep neural network of the first student machine learning system ([0035-40], Figs. 9A & 9B, note: layer position and number of nodes are structural changes).

Regarding claim 18, Tomkins discloses: The machine learning computer system of claim 11, wherein: the deep neural network of the first student machine learning system comprises a network with multiple layers, wherein each layer comprises one or more nodes (Figs. 9A & 9B); and 
the structural change comprises one or more additional layers to be added to the deep neural network of the first student machine learning system (Figs. 9A & 9B).

Regarding claim 19, Tomkins does not explicitly disclose: The machine learning computer system of claim 1, wherein the enhancement comprises selectively controlling training data input to the first student machine learning system to control the learning of the deep neural network of the first student machine learning system.

Hill teaches: wherein the enhancement comprises selectively controlling training data input to the first student machine learning system to control the learning of the deep neural network of the first student machine learning system (“The two artificial-neural-networks 100A, 100B are trained using two different training sets. In particular embodiments, the two artificial-neural-networks 100A, 100B are both trained to work on similar pattern recognition problems. For example, both artificial-neural-networks 100A, 100B may be trained to work on image recognition problems. However, the first artificial-neural-network 100A may be trained to recognize a particular image, such as an image of a particular face or an image of a particular military target, for example, and the second artificial-neural-network 100B may be trained to recognize a different particular image, such as an image of a different particular face or an image of a different particular military target. Similarly, both artificial-neural-networks 100A, 100B may be trained to recognize voice patterns while each artificial-neural-network is trained to recognize a different voice pattern.”).

Tomkins and Hill are both in the same field of endeavor of using coach machine learning systems to train a neural network and are analogous. Tomkins teaches an evolutionary algorithm approach. Hill teaches a neural network approach wherein a training neural network learns the weights of the student network. It would have been obvious to one of ordinary skill in the art before the effective filing date to substitute the evolutionary algorithm as taught by Tomkins with the training neural network as taught by Hill to yield predictable results. One would have been motivated to combine to have a trained neural network that has the goal of generating weights for another neural network (Hill [0033]).

Regarding claim 44, Tomkins does not explicitly disclose: The machine learning system of claim 1, wherein the first learning coach machine learning system comprises a pattern recognition system that recognizes patterns of learning performance of a machine learning system.

Hill teaches: wherein the first learning coach machine learning system comprises a pattern recognition system that recognizes patterns of learning performance of a machine learning system (“Referring to FIG. 3, a first subset of the weight values shown in FIG. 2 that may be used in a training set for a trainer artificial-neural-network is disclosed. The phrase "trainer artificial-neural-network" is used herein to refer to an artificial-neural-network that can generate output values to be used as weight values in another artificial-neural-network. The first subset of the weight values includes the first n weight values of FIG. 2 associated with each connection of the artificial-neural-network 100 and the final (i.e., the T.sup.th) weight value associated with each connection of the artificial-neural-network 100.” [0032], “The final weight value (i.e., the T.sup.th value) in each sequence of weight values associated with a connection of the artificial-neural-network 100 is mapped to an output of the trainer artificial-neural-network. The artificial-neural-network 100 should perform best when operated in a feed-forward manner when the weight values for each connection are set to the final weight value of the sequence of weight values generated for that connection during the training of the artificial-neural-network 100.” [0033], see also [0038], note: neural networks are pattern recognition systems; the trainer of Hill recognizes the pattern of weights for a neural network, while the student network recognizes patterns for whichever task it is given).

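Tomkins and Hill are both in the same field of endeavor of using coach machine learning systems to train a neural network and are analogous. Tomkins teaches an evolutionary algorithm approach. Hill teaches a neural network approach wherein a training neural network learns the weights of the student network. It would have been obvious to one of ordinary skill in the art before the effective filing date to substitute the evolutionary algorithm as taught by Tomkins with the training neural network as taught by Hill to yield predictable results. One would have been motivated to combine to have a trained neural network that has the goal of generating weights for another neural network (Hill [0033]).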

Regarding claim 45, Tomkins discloses: The machine learning system of claim 1, wherein the learned parameters comprise connection weights and biases for the nodes of the one or more inner layers of the deep neural network of the first student machine learning system (“Step 39 is to choose the number of nodes in each hidden layer. In one implementation, the total number of hidden nodes in the entire neural network is restricted to be less than the global parameter S. This restricts the size of the overall neural network such that the computation resources (memory and training time) will not be overly stretched. Moreover, each hidden node has a plurality of functions selected from a plurality of function categories. In one embodiment, each hidden node has a transfer function (also known as the activation function) selected from a transfer function category, a bias function selected from a bias function category and a weight function selected from a weight function category.” [0144]).


Claims 12, 13, and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Tomkins in view of Hill and further in view of Sharma et al. (Constructive Neural Networks: a Review).

Regarding claim 12, Tomkins discloses: The machine learning computer system of claim 11, wherein: the deep neural network of the first student machine learning system comprises multiple layers, wherein each layer comprises one or more nodes (“(d).  performing a second training on the plurality of neural networks by modifying the configurations of the plurality of neural networks.” [0011], Fig 9A&B).
However, Tomkins does not explicitly disclose: and the structural change comprises one or more additional nodes to be added to a selected layer of the deep neural network of the first student machine learning system, wherein the selected layer is one of the one or more inner layers of the deep neural network of the first student machine learning system.

Sharma teaches: structural change comprises one or more additional nodes to be added to a selected layer of the deep neural network of the first student machine learning system, wherein the selected layer is one of the one or more inner layers of the deep neural network of the first student machine learning system (“Constructive algorithm starts with a minimal network architecture and adds layers, nodes and connections during the training, as required by the given problem. The architecture adaptation process is continued till the training algorithm finds a near optimal architecture that gives satisfactory solution of the problem.” P7848 §3 ¶1).

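Tomkins, Hill and Sharma are in the same field of endeavor of constructing and training neural networks and are analogous. Tomkins teaches an exemplary deep neural network with a coach machine learning system to assist in construction and training. Sharma teaches a survey of neural network construction methods, particularly adding nodes. It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the machine learning coach that constructs and alters the network architecture as taught by Tomkins and Hill with adding a node as taught by Sharma to yield predictable results. One would have been motivated to modify Tomkins with adding nodes because adding nodes individually has many benefits as stated by Sharma (p7848-9 six motivations 1-6).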

Regarding claim 13, Tomkins discloses the learning coach but does not explicitly disclose: implements the one or more additional nodes by providing a set of virtual nodes and activation levels for the virtual nodes associated with a particular set of data input values to the first student machine learning system.

Sharma teaches: implements the one or more additional nodes by providing a set of virtual nodes and activation levels for the virtual nodes associated with a particular set of data input values to the first student machine learning system (“Constructive algorithm starts with a minimal network architecture and adds layers, nodes and connections during the training, as required by the given problem. The architecture adaptation process is continued till the training algorithm finds a near optimal architecture that gives satisfactory solution of the problem.” P7848 §3 ¶1, “There are a variety of ways of training the resulting network after each hidden node addition in constructive algorithms. These can be classified into two general methods. The first consists of training the whole network after the addition of a new hidden node. The second consists of only training a newly added node, with the remaining weights frozen.” P7850 ¶3, note: the virtual node is interpreted as the added node until it is fully integrated into the network through training).

Tomkins, Hill and Sharma are in the same field of endeavor of constructing and training neural networks and are analogous. Tomkins teaches an exemplary deep neural network with a coach machine learning system to assist in construction and training. Sharma teaches a survey of neural network construction methods, particularly adding nodes. It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the machine learning coach that constructs and alters the network architecture as taught by Tomkins and Hill with adding a node as taught by Sharma to yield predictable results. One would have been motivated to modify Tomkins with adding nodes because adding nodes individually has many benefits as stated by Sharma (p7848-9 six motivations 1-6).

Regarding claim 17, Tomkins discloses the learning coach but does not explicitly disclose: implements the one or more additional nodes by providing connection weights for the additional nodes of the deep neural network of the first student machine learning system.

Sharma teaches: implements the one or more additional nodes by providing connection weights for the additional nodes of the deep neural network of the first student machine learning system (“There are a variety of ways of training the resulting network after each hidden node addition in constructive algorithms. These can be classified into two general methods. The first consists of training the whole network after the addition of a new hidden node. The second consists of only training a newly added node, with the remaining weights frozen. The method for adding a new hidden node is standard across many constructive algorithms and in general consists of either adding a new hidden node when the error fails to meet a set amount over a given period or testing for some criterion such as a local minimum.” P7850 ¶3).

Tomkins, Hill and Sharma are in the same field of endeavor of constructing and training neural networks and are analogous. Tomkins teaches an exemplary deep neural network with a coach machine learning system to assist in construction and training. Sharma teaches a survey of neural network construction methods, particularly adding nodes. It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the machine learning coach that constructs and alters the network architecture as taught by Tomkins and Hill with adding a node as taught by Sharma to yield predictable results. One would have been motivated to modify Tomkins with adding nodes because adding nodes individually has many benefits as stated by Sharma (p7848-9 six motivations 1-6).
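For illustration only, the node addition discussed for claims 12, 13 and 17 (adding a hidden node and supplying its connection weights while the existing weights stay frozen) can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Sharma: a single hidden layer with tanh activation is assumed, the function names are hypothetical, and the new node's outgoing weights are initialized to zero so that the network's output is unchanged until the new node is trained.

```python
import numpy as np


def forward(x, W1, b1, W2, b2):
    """One-hidden-layer network: tanh hidden layer, linear output."""
    h = np.tanh(W1 @ x + b1)
    return W2 @ h + b2


def add_hidden_node(W1, b1, W2, rng, scale=0.1):
    """Return enlarged copies of the weights with one extra hidden node.

    Existing weights are copied unchanged (they can stay frozen); only the
    new node's incoming weights are random, and its outgoing weights are 0.
    """
    new_in = rng.normal(scale=scale, size=(1, W1.shape[1]))
    W1_new = np.vstack([W1, new_in])            # new row: incoming weights
    b1_new = np.append(b1, 0.0)                 # new node's bias
    W2_new = np.hstack([W2, np.zeros((W2.shape[0], 1))])  # new column
    return W1_new, b1_new, W2_new


rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)
x = rng.normal(size=3)

before = forward(x, W1, b1, W2, b2)
W1n, b1n, W2n = add_hidden_node(W1, b1, W2, rng)
after = forward(x, W1n, b1n, W2n, b2)
```

Zero-initializing the outgoing weights is one design choice among several; it makes the structural change safe to apply mid-training because the enlarged network starts from the same function as the original.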

Claims 20 and 49 are rejected under 35 U.S.C. 103 as being unpatentable over Tomkins in view of Hill and further in view of Talathi et al. (US 2016/0224903).

Regarding claim 20, Tomkins discloses: The machine learning computer system of claim 1, wherein: the machine learning system comprises a computer network that comprises: 
a first computer system that comprises at least one processor and high-speed memory (“In one implementation, the present invention can be implemented as a software application that runs on an exemplary data processing system 800 as shown in FIG. 11.  In the present embodiment, the data processing system 800 is a single processor personal computer.  In alternative embodiments, this data processing device is a computer server, an embedded system, a multi-processor machine, a grid computer, or an equivalent computer system thereof.  The hardware components in the present embodiment further comprises a Central Processing Unit (CPU) 810, memory 811, storage 812, and external interface module 813 which serves to communicate with external peripherals.” [0184]); and 
remote secondary storage that is in communication with the first computer system  [0184].
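However, Tomkins does not explicitly disclose: connection weights and activations for the first student machine learning system are stored in the high speed memory so that the first student machine learning system can be run when the first student machine learning system is active; and the connection weights and activations for the first student machine learning system are stored in the secondary storage when the first student machine learning system is not active.
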
Talathi teaches: connection weights and activations for the first student machine learning system are stored in the high speed memory so that the first student machine learning system can be run when the first student machine learning system is active (“By way of example, a software module may be loaded into RAM from a hard drive when a triggering event occurs.  During execution of the software module, the processor may load some of the instructions into cache to increase access speed.  One or more cache lines may then be loaded into a general register file for execution by the processor.” [0138]); and 
the connection weights and activations for the first student machine learning system are stored in the secondary storage when the first student machine learning system is not active (“By way of example, a software module may be loaded into RAM from a hard drive when a triggering event occurs.  During execution of the software module, the processor may load some of the instructions into cache to increase access speed.  One or more cache lines may then be loaded into a general register file for execution by the processor.” [0138]).
Tomkins, Hill and Talathi are all in the same field of endeavor of neural networks and are analogous. Tomkins teaches an exemplary deep neural network with a coach machine learning system to assist in construction and training. Talathi teaches hyper parameter selection for deep neural networks implemented with high speed (Talathi [0138]).
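For illustration only, the storage arrangement recited in claim 20 (parameters held in fast memory while a student is active and moved to secondary storage while it is inactive) can be sketched as follows. This is a sketch under stated assumptions and does not reproduce Talathi [0138]: a local temporary directory stands in for remote secondary storage, a Python dictionary stands in for high-speed memory, and `StudentStore` and its methods are hypothetical names.

```python
import os
import tempfile

import numpy as np


class StudentStore:
    """Keeps active students' parameters in memory, inactive ones on disk."""

    def __init__(self, directory):
        self.directory = directory  # stand-in for secondary storage
        self.active = {}            # stand-in for high-speed memory

    def _path(self, name):
        return os.path.join(self.directory, name + ".npz")

    def register(self, name, params):
        """Place a student's parameters in memory (student is active)."""
        self.active[name] = params

    def deactivate(self, name):
        """Write parameters to storage and release them from memory."""
        params = self.active.pop(name)
        np.savez(self._path(name), **params)

    def activate(self, name):
        """Load parameters from storage back into memory."""
        with np.load(self._path(name)) as data:
            self.active[name] = {key: data[key] for key in data.files}
        return self.active[name]


storage_dir = tempfile.mkdtemp()
store = StudentStore(storage_dir)
original = {
    "weights": np.arange(6.0).reshape(2, 3),
    "activations": np.array([0.1, 0.9]),
}
store.register("student_a", original)

store.deactivate("student_a")
inactive_names = list(store.active)   # nothing held in memory now
restored = store.activate("student_a")
```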

Regarding claim 49, Tomkins does not explicitly disclose: The machine learning system of claim 6, wherein the revised hyperparameter comprises a hyperparameter selected from the group consisting of a mini batch size for the first student machine learning system, a learning rate for the first student machine learning system, a regularization parameter for the first student machine learning system and a momentum parameter for the first student machine learning system.

Talathi teaches: wherein the revised hyperparameter comprises a hyperparameter selected from the group consisting of a mini batch size for the first student machine learning system [0057], a learning rate for the first student machine learning system [0053], a regularization parameter for the first student machine learning system [0055] and a momentum parameter for the first student machine learning system [0054].

Tomkins, Hill and Talathi are all in the same field of endeavor of neural networks and are analogous. Tomkins teaches an exemplary deep neural network with (Talathi [0031]).


Claim 26 is rejected under 35 U.S.C. 103 as being unpatentable over Tomkins in view of Hill and further in view of Thieberger (US 2013/0103624).

Regarding claim 26, Tomkins does not explicitly disclose: The machine learning computer system of claim 4, wherein the computed values comprise partial derivatives for the nodes of the one or more inner layers of the deep neural network of the first student machine learning system.

Thieberger teaches: wherein the computed values comprise partial derivatives for the nodes of the one or more inner layers of the deep neural network of the first student machine learning system (“Optionally, the model comprises information derived from the analysis of the importance and/or contribution of some of the variables to the predicted response.  For example, by utilizing methods such as computing the partial derivatives of the output neurons in the neural network, with respect to the input neurons.” [0418], see also [0419], “FIG. 13 illustrates one embodiment of a system that trains a machine learning-based situation predictor.  The sample generator 697 is configured to receive temporal windows token instances 693 and possibly other inputs 695 such as information regarding a baseline values.  The sample generator 697 produces samples 702 corresponding to the temporal windows of token instances 693 and the other inputs 695; the samples 702 are provided to a machine learning classifier trainer 732, which also receives situation identifiers 730 that serve as target values for the sample.  The trainer 730 utilizes a machine learning model training algorithm to train situation classifier model 734 (e.g., the model may be for a classifier such as a decision tree, neural network, or a naive Bayes classifier).” [0279]).

Tomkins, Hill and Thieberger are all in the same field of endeavor of trainers for neural networks and are analogous. Tomkins teaches an exemplary deep neural network with a coach machine learning system to assist in construction and training. Thieberger teaches training neural networks utilizing partial derivatives. It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the machine learning coach that constructs and alters the network architecture as taught by Tomkins and Hill with the trainer utilizing partial derivatives as taught by Thieberger to yield predictable results.
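For illustration only, the computation quoted from Thieberger [0418], partial derivatives of the output neurons with respect to the input neurons, can be sketched with the chain rule for a small network. This is a minimal sketch under stated assumptions: a single tanh hidden layer with a linear output is assumed, and the function names and sizes are hypothetical rather than drawn from any cited reference.

```python
import numpy as np


def forward(x, W1, b1, W2, b2):
    """One-hidden-layer network: tanh hidden layer, linear output."""
    h = np.tanh(W1 @ x + b1)
    return W2 @ h + b2


def input_jacobian(x, W1, b1, W2):
    """Partial derivatives d(output_j) / d(input_i), shape (outputs, inputs).

    Chain rule: dy/dx = W2 @ diag(1 - h**2) @ W1, because the derivative
    of tanh(z) is 1 - tanh(z)**2.
    """
    h = np.tanh(W1 @ x + b1)
    return W2 @ (np.diag(1.0 - h ** 2) @ W1)


rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)
x = rng.normal(size=3)

# J[j, i] indicates how strongly input i contributes to output j at x.
J = input_jacobian(x, W1, b1, W2)
```

Entries of `J` with large magnitude mark the inputs that most affect a given output at the point `x`, which is the sense in which the quoted passage uses partial derivatives to gauge variable importance.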

Claims 46-48 are rejected under 35 U.S.C. 103 as being unpatentable over Tomkins in view of Hill and further in view of Bacchiani et al. (US 2015/0127327).

Regarding claim 46, Tomkins does not explicitly disclose: The machine learning computer system of claim 1, wherein the first learning coach machine learning system comprises a deep neural network.
Bacchiani teaches: wherein the first learning coach machine learning system comprises a deep neural network (Fig 1, Deep Neural Network 130).

Tomkins, Hill and Bacchiani teach machine learning systems for training a neural network. Tomkins teaches a genetic algorithm and Hill teaches a neural network for training other neural networks. Bacchiani teaches a second deep neural network. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the neural network coach of Tomkins and Hill with the deep neural network as taught by Bacchiani to yield predictable results. 

Regarding claim 47, Tomkins does not explicitly disclose: The machine learning computer system of claim 1, wherein the first learning coach machine learning system models learning performance of the first student machine learning system as a regression.

Bacchiani teaches: wherein the first learning coach machine learning system models learning performance of the first student machine learning system as a regression (“In some implementations, decision trees can be constructed using various techniques, including, for example, the technique described in the publication: P. Chou, "Optimal partitioning for classification and regression trees," IEEE Trans.  on Pattern Analysis and Machine Intelligence, vol. 13, no. 4, pp.  340-354, 1991, the content of which is hereby incorporated by reference.  For example, starting from a single node tree for each CI state n, leaf nodes that have the largest likelihood gain under a Gaussian model are greedily split.  Potential new left and right child nodes can be defined by a Gaussian distribution such that any leaf in the tree is characterized, for example, by a Gaussian centroid model.  The system can be made efficient by assuming a Gaussian distribution for the activation vectors, because such an assumption allows the use of sufficient statistics to implement the process.  Potential splits can be evaluated by partitioning the activation vectors in a parent node into left and right child sets and computing the likelihood gain from modeling the activation vectors in the left/right child partitioning (rather than, for example, jointly using the parent distribution).” [0029]).

Tomkins, Hill and Bacchiani teach machine learning systems for training a neural network. Tomkins teaches a genetic algorithm and Hill teaches a neural network for training other neural networks. Bacchiani teaches a second deep neural network. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the neural network coach of Tomkins and Hill with the deep neural network as taught by Bacchiani to yield predictable results.



Bacchiani teaches: wherein the first learning coach machine learning system models learning performance of the first student machine learning system as a classification task (“The hidden layers 133 can be configured to compute weighted sums of the activation vectors received from the connections and a bias, and output activation vectors based on a non-linear function applied to the sums.  In some implementations, the output softmax layer 135 can be configured to produce network output for N classes, wherein the n-th class probability for the t-th acoustic input observation vector x(t) is defined as: (Equation 1)
where w_n denotes the weight vector associated with the output neuron of the neural network 130 for the n-th class, b_n denotes the bias of that neuron, a(t) denotes the activation output from the final hidden layer (i.e., the one connected as an input to the softmax layer 135) for the t-th input pattern and [.]^T denotes a transposition.” [0025]).
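
For illustration only, the class-probability computation described in the quoted paragraph [0025] can be sketched as follows. Equation 1 is not reproduced in this action, so the sketch assumes the standard softmax form, in which the n-th class score is the weight vector for class n applied to the final hidden-layer activation a(t), plus the bias for class n. The function name and the example values are hypothetical.

```python
import math


def softmax_class_probabilities(weights, biases, activation):
    """Return one probability per class for a single activation vector.

    weights[n] plays the role of the weight vector for class n, biases[n]
    the bias of that output neuron, and `activation` the final hidden-layer
    output a(t). The standard softmax form is an assumption.
    """
    scores = [
        sum(w_i * a_i for w_i, a_i in zip(w_n, activation)) + b_n
        for w_n, b_n in zip(weights, biases)
    ]
    # Subtract the largest score before exponentiating for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    norm = sum(exps)
    return [e / norm for e in exps]


# Hypothetical three-class output layer over a two-dimensional activation.
probs = softmax_class_probabilities(
    weights=[[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]],
    biases=[0.0, 0.5, 0.0],
    activation=[0.2, 0.9],
)
```

The outputs are non-negative and sum to one, so the output layer assigns each input pattern a distribution over the N classes.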

Tomkins, Hill and Bacchiani teach machine learning systems for training a neural network. Tomkins teaches a genetic algorithm and Hill teaches a neural network for training other neural networks. Bacchiani teaches a second deep neural network. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the neural network coach of Tomkins and Hill with the deep neural network as taught by Bacchiani to yield predictable results.

Response to Arguments
Applicant’s arguments, see pp. 8-10, filed 16 February, 2021, with respect to the rejection(s) of claim(s) 1-6, 11, 18, 19, and 25 under 35 USC 102 have been fully considered and are persuasive.  Therefore, the rejection has been withdrawn.  However, upon further consideration, a new ground(s) of rejection is made in view of Tomkins and Hill.
Applicant's arguments filed 16 February, 2021 have been fully considered but they are not persuasive.
Regarding claim 6, Applicant argues that hyperparameters do not include parameters of the network structure such as number and positions of nodes. Examiner respectfully disagrees. The Wikipedia page cited by Applicant provides support for Examiner’s interpretation. “The time required to train and test a model can depend upon the choice of its hyperparameters.[1] A hyperparameter is usually of continuous or integer type, leading to mixed-type optimization problems.[1] The existence of some hyperparameters is conditional upon the value of others, e.g. the size of each hidden layer in a neural network can be conditional upon the number of layers.[1]” (Emphasis added). See also cited art Talathi [0033-57]. The number of nodes is likewise a setting chosen before the network is created, and it can be adjusted through hyperparameter tuning, which is equivalent to the revising recited in claim 6 by adding or removing nodes. The specific hyperparameters identified by Applicant are not recited in claim 6; they are addressed in the rejection of claim 49 in view of Talathi.
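
For illustration only, the conditional relationship described in the quoted passage can be sketched as follows. This is a hypothetical search-space sampler and is not drawn from Talathi or any other cited reference; the function name, the layer limit, and the size choices are assumptions. The number of layers is sampled first, and one node count exists per layer, so the node-count hyperparameters exist only conditionally on the layer count.

```python
import random


def sample_network_config(rng, max_layers=4, size_choices=(16, 32, 64)):
    """Sample structural hyperparameters for a feed-forward network.

    The number of hidden layers is chosen first; a node count is then chosen
    for each layer that exists, so the length of `layer_sizes` depends on
    `num_layers`. All ranges here are illustrative assumptions.
    """
    num_layers = rng.randint(1, max_layers)
    layer_sizes = [rng.choice(size_choices) for _ in range(num_layers)]
    return {"num_layers": num_layers, "layer_sizes": layer_sizes}


rng = random.Random(0)  # fixed seed so the sketch is repeatable
configs = [sample_network_config(rng) for _ in range(5)]
```

Under this sketch, changing a sampled node count from one trial to the next adds nodes to or removes nodes from the corresponding layer.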

Allowable Subject Matter
Claims 14 and 15 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Specifically, none of the prior art fairly discloses, either alone or in combination:
From claim 14:
the existing nodes do not back-propagate to virtual nodes during training; 
the virtual nodes back-propagate to a layer of the deep neural network of the first student machine learning system below the selected layer; and 
activations of the virtual nodes are controlled by the first learning coach machine learning system.

From claim 15:
a first learning coach machine learning system controls a regularization to the second set of nodes so differences between activation values for the second set of nodes and activation values for the first set of nodes is less than a threshold value to control a drop-out rate of the nodes in the first and second sets.


Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ERIC NILSSON whose telephone number is (571)272-5246.  The examiner can normally be reached on M-F: 9-5.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki, can be reached at (571) 272-3719.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/ERIC NILSSON/Primary Examiner, Art Unit 2122