Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
EXAMINER’S AMENDMENT
An examiner’s amendment to the record appears below. Should the changes and/or additions be unacceptable to applicant, an amendment may be filed as provided by 37 CFR 1.312. To ensure consideration of such an amendment, it MUST be submitted no later than the payment of the issue fee.
Authorization for this examiner’s amendment was given in an interview with Mark G. Knedeisen on 10/18/2021.
The application has been amended as follows. Claim 22 has been amended to clarify how the machine-learning classifier is trained.
1. 	(Original) A method for building a machine learning ensemble, the method comprising:
selecting, by a computer system, n selected network elements of a base machine-learning network, where n > 1; 
making, by the computer system, M copies of the base machine-learning network, wherein the value of M is greater than or equal to 2, and less than or equal to 2^n; 
wherein prior to making the M copies of the base machine-learning network, iteratively for each training data item in an initial set of training data items: 
computing, by a computer system, in a forward computation through the base machine-learning network, an activation value for each non-input layer node of the base machine-learning network; and 

for each non-input layer node, a partial derivative for an objective of the base machine-learning network with respect to the activation value for the non-input layer node; and 
for each directed arc in the base machine-learning network, a partial derivative for the objective with respect to a weight parameter for the directed arc; 
training, by the computer system, each of the M copies of the base machine-learning network such that each of the M copies of the base machine-learning network is trained to change its learned parameters in a different direction than any of the other M copies, wherein training each of the M copies of the base machine-learning network comprises training, by the computer system, the m-th copy of the base network, where m=1,..., M, with a m-th set of training data items, wherein the m-th set of training data items comprises each training data item in an initial set of training data items where there is agreement between a value of a k-th bit of an n-bit Boolean vector and a sign for the kth selected network element of the n selected network elements of the base network for the training data item, for all k=1,..., n, where:  
if the k-th selected network element is a node, the value of the k-th bit of the n-bit Boolean vector is compared to the sign of the partial derivative for the objective of the base network with respect to the activation value for the node to determine agreement; and 
if the k-th selected network element is a directed arc, the value of the k-th bit of the n-bit Boolean vector is compared to the sign of the partial derivative with respect to the weight parameter for the directed arc to determine agreement;

combining, by the computer system, the M copies of the base machine-learning network into an ensemble.  
2. 	(Original) The method of claim 1, wherein the base machine-learning network comprises a base neural network.  
3. 	(Original) The method of claim 2, wherein: the base neural network comprises a plurality of nodes and plurality of directed arcs; each directed arc is between two nodes of the base neural network; and the n selected network elements of the base machine-learning network comprise s nodes of the base neural network and t directed arcs of the base neural network, where s and t are integers greater than or equal to zero, and where s + t = n.  
4. 	(Original) The method of claim 2, wherein the base neural network comprises a base deep neural network.  
5. 	(Original) The method of claim 4, wherein the base deep neural network comprises a base feed forward deep neural network.  
6-7. 	(Canceled)  
8. 	(Original) The method of claim 1, wherein the selected network elements of the base machine-learning network are selected by a machine-learning learning coach.  
9.	 (Original) The method of claim 1, further comprising, after combining the base machine-learning network and the M copies of the base machine-learning network into the ensemble, training, by the computer system, the ensemble with a joint optimization network.  

11. 	(Original) The method of claim 10, where the M subsets of training data comprise M unique subsets of training data.  
12. 	(Original) The method of claim 11, wherein the M unique subsets of training data comprise M disjoint sets of training data.  
13. 	(Original) The method of claim 10, wherein there is an upper limit F on the number of M subsets of training data on which every training data example in the initial set of training data can be included, such that no training data examples in the initial training set may be placed into more than F of the M subsets.  
14-15.	(Canceled)  
16. 	(Canceled)  
17. 	(Original) The method of claim 1, wherein training each of the M copies of the base machine-learning network comprises: training, by the computer system, a machine-learning classifier, to classify each training data item into at least one of two or more classification categories, wherein training the machine-learning classifier comprises using partial derivatives computed in the back-propagation computation through the base network as input variables; partitioning, by the computer system, the training data items into subsets of training data items based on the classification categories; and training, by the computer system, each of the M copies of the base machine-learning network with one of the subsets of training data items.  
18. 	(Canceled)  

20. 	(Original) The method of claim 19, wherein the distance measure is computed using a formula that comprises a hyperparameter, wherein the hyperparameter is a relative weight given to the distance measure compared to a weight given to a difference in signs of partial derivatives for the pairs of training data items.
21.	 (Original) The method of claim 17, wherein the machine-learning classifier comprises a classifier form selected from the group consisting of a decision tree, a neural network and a clustering algorithm.  
22. 	(Currently Amended) The method of claim 17, wherein training the machine-learning classifier [[with]] is trained through supervised learning.  
23. 	(Original) A computer system for building a machine learning ensemble, the computer system comprising one or more processing units that are programmed to: select n selected network elements of a base machine-learning network, where n > 1; 
make M copies of the base machine-learning network, wherein the value of M is greater than or equal to 2, and less than or equal to 2^n; 
wherein prior to making the M copies of the base machine-learning network, iteratively for each training data item in an initial set of training data items: 
compute, in a forward computation through the base machine-learning network, an activation value for each non-input layer node of the base machine-learning network; and 

for each non-input layer node, a partial derivative for an objective of the base machine-learning network with respect to the activation value for the node; and 
for each directed arc in the base machine-learning network, a partial derivative for the objective with respect to a weight parameter for the directed arc; 
train each of the M copies of the base machine-learning network such that each of the M copies of the base machine-learning network is trained to change its learned parameters in a different direction than any of the other M copies, wherein the one or more processing units are programmed to train each of the M copies of the base machine-learning network by training the m-th copy of the base network, where m=1,..., M, with a m-th set of training data items, wherein the m-th set of training data items comprises each training data item in an initial set of training data items where there is agreement between a value of a k-th bit of an n-bit Boolean vector and a sign for the kth selected network element of the n selected network elements of the base network for the training data item, for all k=1,..., n, where: 
if the k-th selected network element is a node, the value of the k-th bit of the n-bit Boolean vector is compared to the sign of the partial derivative for the objective of the base network with respect to the activation value for the node to determine agreement, and 
if the k-th selected network element is a directed arc, the value of the k-th bit of the n-bit Boolean vector is compared to the sign of the partial derivative with respect to the weight parameter for the directed arc to determine agreement;

combine the M copies of the base machine-learning network into an ensemble.   
24. 	(Original) The computer system of claim 23, wherein the base machine-learning network comprises a base neural network.  
25. 	(Original) The computer system of claim 24, wherein: the base neural network comprises a plurality of nodes and plurality of directed arcs; each directed arc is between two nodes of the base neural network; and the n selected network elements of the base machine-learning network comprise s nodes of the base neural network and t directed arcs of the base neural network, where s and t are integers greater than or equal to zero, and where s + t = n.  
26. 	(Original) The computer system of claim 24, wherein the base neural network comprises a base deep neural network.  
27. 	(Original) The computer system of claim 26, wherein the base deep neural network comprises a base feed forward deep neural network.  
28-29.  (Canceled)  
30. 	(Original) The computer system of claim 23, wherein the computer system comprises a machine learning coach that selects the n selected network elements of the base machine-learning network.  
31. 	(Original) The computer system of claim 23, wherein the one or more processing units are further programmed to, after combining the base machine-learning network and the M copies 
32. 	(Original) The computer system of claim 23, wherein the one or more processing units are further programmed to train each of the M copies of the base machine-learning network by: partitioning a initial set of training data for the M copies into M subsets of training data; and training each of the M copies on a separate subset of training data.  
33. 	(Original) The computer system of claim 32, where the M subsets of training data comprise M unique subsets of training data.  
34. 	(Original) The computer system of claim 33, wherein the M unique subsets of training data comprise M disjoint sets of training data.  
35. 	(Original) The computer system of claim 32, wherein there is an upper limit F on the number of M subsets of training data on which every training data example in the initial set of training data can be included, such that no training data examples in the initial training set may be placed into more than F of the M subsets.  
36-37.  (Canceled)  
38. 	(Canceled)  
39. 	(Original) The computer system of claim 23, wherein the one or more processing units are further programmed to train each of the M copies of the base machine-learning network by: training a machine-learning classifier, to classify each training data item into at least one of two or more classification categories, wherein training the machine-learning classifier comprises using partial derivatives computed in the back-propagation computation through the base network as input variables; partitioning the training data items into subsets of training data items 
40. 	(Canceled)  
41. 	(Original) The computer system of claim 39, wherein the machine-learning classifier is trained to classify data items to the two or more classification categories based on a distance measure between pairs of training data items.  
42. 	(Original) The computer system of claim 41, wherein the distance measure is computed using a formula that comprises a hyperparameter, wherein the hyperparameter is a relative weight given to the distance measure compared to a weight given to a difference in signs of partial derivatives for the pairs of training data items.
43. 	(Original) The computer system of claim 39, wherein the machine-learning classifier comprises a classifier form selected from the group consisting of a decision tree, a neural network and a clustering algorithm.  
44. 	(Original) The computer system of claim 39, wherein the machine-learning classifier is trained through supervised learning.   
Allowable Subject Matter
Claims 1-5, 8-13, 17, 19-27, 30-35, 39, and 41-44 are allowed.
Reasons for Allowance
The following is an examiner’s statement of reasons for allowance: Claims 1 and 23
are considered allowable because, when the claims are read in light of the specification as per
MPEP § 2111.01, none of the references of record, either alone or in combination, fairly discloses
or suggests the combination of limitations specified in the independent claims, including at least:

1 and 23:
…
wherein training each of the M copies of the base machine-learning network comprises training, by the computer system, the m-th copy of the base network, where m=1,..., M, with a m-th set of training data items, wherein the m-th set of training data items comprises each training data item in an initial set of training data items where there is agreement between a value of a k-th bit of an n-bit Boolean vector and a sign for the kth selected network element of the n selected network elements of the base network for the training data item, for all k=1,..., n, where: 
 if the k-th selected network element is a node, the value of the k-th bit of the n-bit Boolean vector is compared to the sign of the partial derivative for the objective of the base network with respect to the activation value for the node to determine agreement; and 
if the k-th selected network element is a directed arc, the value of the k-th bit of the n-bit Boolean vector is compared to the sign of the partial derivative with respect to the weight parameter for the directed arc to determine agreement;
…
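The sign-agreement selection recited in the limitation above can be illustrated with a minimal sketch. It is not taken from the application: the function names, the sample derivative values, and the convention that a bit value of 1 corresponds to a non-negative partial derivative are all assumptions made for illustration only.

```python
from itertools import product


def sign_bit(value):
    # Assumed convention (not stated in the claims): bit 1 for a
    # non-negative partial derivative, bit 0 for a negative one.
    return 1 if value >= 0 else 0


def partition_by_sign_agreement(derivs, bit_vectors):
    """Group training items by the sign pattern of their partial derivatives.

    derivs[i][k] is a hypothetical partial derivative of the objective for
    training item i with respect to the k-th selected network element
    (a node activation or an arc weight). An item joins the subset for a
    Boolean vector only if its sign agrees with every one of the n bits.
    """
    subsets = {bv: [] for bv in bit_vectors}
    for i, row in enumerate(derivs):
        pattern = tuple(sign_bit(d) for d in row)
        if pattern in subsets:
            subsets[pattern].append(i)
    return subsets


# Hypothetical derivatives for 5 training items and n = 2 selected elements.
derivs = [[0.3, -0.1], [-0.2, -0.4], [0.5, 0.7], [0.1, -0.9], [-0.6, 0.2]]

# All 2^n Boolean vectors; using fewer vectors gives M < 2^n copies.
bit_vectors = list(product((0, 1), repeat=2))

subsets = partition_by_sign_agreement(derivs, bit_vectors)
for bv in bit_vectors:
    print(bv, subsets[bv])
```

In this sketch each Boolean vector indexes one copy of the base network, and the m-th copy would be trained only on the items listed under the m-th vector. Because each item has exactly one sign pattern, the resulting subsets are disjoint.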

The closest prior art of record that touches upon the recited limitation is Kong et al., "Error-correcting output coding corrects bias and variance," which teaches the error-correcting output coding (ECOC) technique, in which k distinct binary strings of length L are constructed to turn a k-way classification problem into a set of binary classification problems; the L binary classification functions (i.e., ensembles) are constructed and used to compute a vector of binary decisions. The class (i.e., codeword) with the smallest Hamming distance to the vector of binary decisions is the predicted class (see Kong, pg. 2, right column). However, Kong does not teach the limitation of: if the k-th selected network element is a node, the value of the k-th bit of the n-bit Boolean vector is compared to the sign of the partial derivative for the objective of the base network with respect to the activation value for the node to determine agreement; and if the k-th selected network element is a directed arc, the value of the k-th bit of the n-bit Boolean vector is compared to the sign of the partial derivative with respect to the weight parameter for the directed arc to determine agreement.
The examiner has found that the distinct feature of the applicant’s claimed
invention over the prior art is the explicit claiming of the aforementioned limitations as specified
in independent Claims 1 and 23 in combination with all the other limitations recited therein.
When taken in context, the claims as a whole were not uncovered in the prior art.
Dependent Claims 2-5, 8-13, 17, 19-22, 24-27, 30-35, 39, and 41-44 are allowed as they depend upon an
allowable independent claim.
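For contrast with the claimed sign-agreement selection, the decoding step of the error-correcting output coding technique attributed to Kong above can be sketched as follows. This is an illustrative sketch only: the codewords, class labels, and function names are hypothetical and are not drawn from the Kong reference or from the application.

```python
def hamming(a, b):
    # Number of positions at which two equal-length bit strings differ.
    return sum(x != y for x, y in zip(a, b))


def ecoc_decode(codewords, decisions):
    """Return the class whose codeword is nearest to the binary decisions.

    codewords maps each class label to its length-L binary string;
    decisions is the length-L vector produced by the L binary classifiers.
    Ties go to the first class in dictionary order (an assumption).
    """
    return min(codewords, key=lambda label: hamming(codewords[label], decisions))


# Hypothetical 3-class problem with codewords of length L = 5.
codewords = {
    "A": (0, 0, 0, 0, 0),
    "B": (1, 1, 1, 0, 0),
    "C": (0, 0, 1, 1, 1),
}
decisions = (1, 0, 1, 0, 0)  # hypothetical outputs of the 5 binary classifiers

predicted = ecoc_decode(codewords, decisions)
print(predicted)  # prints "B"
```

Here the bit strings encode class labels and are compared with classifier outputs, whereas the claimed Boolean vector is compared with signs of partial derivatives to select training data for each copy of the base network.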
Any comments considered necessary by applicant must be submitted no later than the
payment of the issue fee and, to avoid processing delays, should preferably accompany the issue
fee. Such submissions should be clearly labeled “Comments on Statement of Reasons for
Allowance.”
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Adam Clark Standke whose telephone number is (571)270-1806. The examiner can normally be reached 10AM-9PM M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor can be reached on (303) 297-4307. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
Adam Clark Standke
Assistant Examiner
Art Unit 2129



/MICHAEL J HUNTLEY/Supervisory Patent Examiner, Art Unit 2129