United States Patent and Trademark Office
    
        
            
                                
            
        
    

Commissioner for Patents
United States Patent and Trademark Office
P.O. Box 1450
Alexandria, VA 22313-1450
www.uspto.gov











BEFORE THE BOARD OF PATENT APPEALS 
AND INTERFERENCES


Application Number: 15976514
Filing Date: 5/10/2018
Appellant(s): Ge Yang, Nicolo Fusi, and Francesco Paolo Casale



__________________
Jose M. Nunez (Reg. No. 59,979)
For Appellant


EXAMINER’S ANSWER





This is in response to the appeal brief filed 7/25/2022.


(1)  Grounds of Rejection to be Reviewed on Appeal
Every ground of rejection set forth in the Office action dated March 2nd, 2022 from which the appeal is taken is being maintained by the examiner except for the grounds of rejection (if any) listed under the subheading “WITHDRAWN REJECTIONS.” 
(2)  Response to Argument
1) Rejection of claim 1 under 35 U.S.C. 102(a)(1) as being anticipated by Brock.
Issue 1: the Appellant argues Brock fails to disclose "accessing a machine learning problem space associated with a machine learning problem and a plurality of untrained candidate neural networks for solving the machine learning problem" (Pages 7-9 of Appeal Brief). This argument is not persuasive.
Brock clearly discloses "accessing a machine learning problem space associated with a machine learning problem and a plurality of untrained candidate neural networks for solving the machine learning problem, wherein the machine learning problem space comprising data to be processed by a trained network". First, Brock discloses a machine learning problem space that is directed to selectively finding suitable untrained candidate neural network architectures using a HyperNet-based auxiliary neural network (lines 1-9 of Algorithm 1) for subsequent training and evaluation (line 10 of Algorithm 1; see Brock [p. 3, Section 3]: “We hypothesize that so long as the HyperNet learns to generate reasonable weights, the validation error of networks with generated weights will correlate with the performance when using normally trained weights, with the difference in architecture being the primary factor of variation.”). Second, Brock discloses that the problem space comprises an “original training set” (data in the problem space) that is partitioned to form a training set and a held-out validation set. The validation set (also data in the problem space) is processed by a neural network with a selected architecture that has been trained over 30 epochs (at the last step of Algorithm 1) after the HyperNet-based auxiliary neural network (SMASH network) has learned to predict that architecture, also using the original training set (see page 5 of the 2 March 2022 FOA; Brock, p. 1, Section 1; and Brock, p. 5, Section 4.1: “First, we train a SMASH network for 300 epochs on CIFAR-100, using a standard annealing schedule [15], then sample 250 random architectures and evaluate their SMASH score on a held-out validation set formed of 5,000 random examples from the original training set. We then sort the architectures by their SMASH score and select every 5th architecture for full training and evaluation, using an accelerated training schedule of 30 epochs.”). Accordingly, Brock meets the claimed limitation.
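For illustration only, the selection procedure quoted from Brock, Section 4.1 (hold out 5,000 random examples, score 250 sampled architectures, sort by score, keep every 5th) can be sketched as follows. This is a minimal sketch and not Brock's implementation: the function names, the random placeholder scores, the ascending sort order, and the 50,000-example size assumed for the original CIFAR-100 training set are all assumptions introduced for the example.

```python
import random


def split_holdout(n_examples, n_holdout, rng):
    """Randomly partition example indices into (training, held-out validation)."""
    indices = list(range(n_examples))
    rng.shuffle(indices)
    return indices[n_holdout:], indices[:n_holdout]


def select_every_kth(scores, k):
    """Sort architecture names by score and keep every k-th one.

    Assumption: scores are validation errors, so the sort is ascending.
    """
    ranked = sorted(scores, key=scores.get)
    return ranked[::k]


rng = random.Random(0)

# Assumed size of the original training set (CIFAR-100 has 50,000 training images).
train_idx, val_idx = split_holdout(50_000, 5_000, rng)

# Placeholder scores: a real run would evaluate each sampled architecture
# on the held-out validation set using HyperNet-generated weights.
smash_scores = {f"arch_{i}": rng.random() for i in range(250)}

# Every 5th architecture of the sorted list goes on to full training.
selected = select_every_kth(smash_scores, 5)
```

With 250 sampled architectures and a stride of 5, the sketch passes 50 architectures on to full training, and the held-out validation indices are disjoint from the training indices.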

Issue 2: the Appellant argues Brock fails to disclose "computing, for each untrained candidate neural network, at least one expressivity measure based on data of the machine learning problem space, the expressivity measure capturing an expressivity of the candidate neural network with respect to the machine learning problem, the expressivity measure being computed without training the candidate neural network" (Pages 9-11 of Appeal Brief). This argument is not persuasive.
Brock clearly discloses "computing, for each untrained candidate neural network, at least one expressivity measure based on data of the machine learning problem space, the expressivity measure capturing an expressivity of the candidate neural network with respect to the machine learning problem, the expressivity measure being computed without training the candidate neural network". Brock teaches the computation of weights (expressivity measures, H(c)) at line 4 of Algorithm 1, which are used to characterize a candidate neural network architecture (untrained neural network) that is not trained until the last step of Algorithm 1. The original training set data, as taught in Brock, serves multiple purposes (see arguments addressed in Issue 1 above). It provides the validation data (for scoring the HyperNet-generated architecture candidates as well as for the fully trained neural network selected from those candidate architectures), and it provides the specific training data used for training either the HyperNet-based auxiliary neural network (i.e., learning H in Algorithm 1) or the neural network with a selected candidate architecture (i.e., the “neural network” recited in the claims). Accordingly, Brock meets the claimed limitation.

Issue 3: the Appellant argues Brock fails to disclose "computing, for each untrained candidate neural network, at least one trainability measure based on data of the machine learning problem space, the trainability measure capturing a trainability of the candidate neural network with respect to the machine learning problem" (Pages 11-12 of Appeal Brief). This argument is not persuasive.
Brock clearly discloses "computing, for each untrained candidate neural network, at least one trainability measure based on data of the machine learning problem space, the trainability measure capturing a trainability of the candidate neural network with respect to the machine learning problem". Brock teaches the computation of trainability measures (training and validation errors) at lines 5 and 8 of Algorithm 1. These error/trainability measures are used to characterize the performance of the HyperNet-based auxiliary neural network for predicting a candidate neural network architecture. The neural network with the candidate architecture is not trained until the last step of Algorithm 1 (see arguments addressed in Issue 2 above). Accordingly, Brock meets the claimed limitation.

Issue 4: the Appellant argues Brock fails to disclose "selecting, based on the at least one expressivity measure and the at least one trainability measure, at least one candidate neural network for solving the machine learning problem" (Pages 13-14 of Appeal Brief). This argument is not persuasive.
Brock clearly discloses "selecting, based on the at least one expressivity measure and the at least one trainability measure, at least one candidate neural network for solving the machine learning problem". The candidate neural network (c in Algorithm 1) is selected at line 8 of Algorithm 1 based on the error (trainability measure, see arguments addressed in Issue 3 above) and on the weights (expressivity measure, see arguments addressed in Issue 2 above); see the error and H(c) in Algorithm 1, especially at line 8. Accordingly, Brock meets the claimed limitation.

2) Rejection of claim 6 under 35 U.S.C. 102(a)(1) as being anticipated by Brock.
Issue 1: the Appellant argues Brock fails to disclose "wherein selecting the at least one candidate neural network for solving the machine learning problem comprises: selecting the at least one candidate neural network having the at least one expressivity measure exceeding a threshold and the at least one trainability measure within a range" (Pages 14-16 of Appeal Brief). This argument is not persuasive.
Brock clearly discloses "wherein selecting the at least one candidate neural network for solving the machine learning problem comprises: selecting the at least one candidate neural network having the at least one expressivity measure exceeding a threshold and the at least one trainability measure within a range". First, Brock teaches that the number of weights in any candidate architecture is finite (i.e., it is specifically limited by the constraints on the architecture); consequently, this set of weights must have a minimum value which determines a threshold that each weight exceeds (see, for example, Brock [p. 5, Section 3.2]: “We make the spatial extent of c some fraction k of the size of W, and place k units at the output of the HyperNet, then reshape the resulting 1 × k × height × width tensor to the required size of W.”). The claim does not recite any “checking” functionality associated with the threshold. Second, the error (trainability) measures implicitly must span a range of values (which inherently also includes a range between “0 and infinity”). In addition, the candidate neural network architectures are scored at the end (also forming an ordered range of finite non-negative values), and the candidate architectures are selected from those scores (see Brock at p. 5, Section 4.1: “We then sort the architectures by their SMASH score and select every 5th architecture for full training and evaluation, using an accelerated training schedule of 30 epochs.”). Moreover, Figure 4 discloses a range of values for the error (trainability) measure over a set of candidate architectures. The claim does not recite any “checking” functionality associated with the range. Accordingly, Brock meets the claimed limitation.

3) Rejection of claim 2 under 35 U.S.C. 103(a) as being unpatentable over Brock in view of Hamel.
Issue 1: the Appellant argues Brock fails to disclose "wherein the at least one expressivity measure represents a measure of separation of samples from the machine learning problem space" (Pages 17-18 of Appeal Brief). This argument is not persuasive.
Brock and Hamel clearly disclose "the system, wherein the at least one expressivity measure represents a measure of separation of samples from the machine learning problem space". First, Brock teaches that the weights/expressivity represent a measure of a correspondence between candidate architectures in an embedding space of the HyperNet-based auxiliary network and the data (from the “original training set”). Both the embedded architecture representations and the data are samples in the problem space. Specifically, the HyperNet-based auxiliary neural network learns a transform from the embedded architecture space to the weights (weight space) so that, as the auxiliary neural network is trained, the weights change in order to adapt the architecture (embedded representation samples) to the data of the problem space (original training set samples) (see Brock [p. 2, Section 2] “In our case we learn a transform from a binary encoding of an architecture to the weight space, rather than learning to adapt weights based on the model input.” and [p. 5, Section 3.2] “A HyperNet [12] is a neural net used to parameterize the weights of another network, the main network. For a Static HyperNet with parameters H, the main network weights W are some function (e.g. a multilayer perceptron) of a learned embedding z, …. For a Dynamic HyperNet, the weights W are generated conditioned on the network input x, …. We propose a variant of a Dynamic HyperNet which generates the weights W based on a tensor encoding of the main network architecture c. Our goal is to learn a mapping W = H(c) that is reasonably close to the optimal W for any given c, such that we can rank each c based on the validation error using HyperNet-generated weights.”).
Second, Hamel discloses that the expressivity measure, which represents a measure associated with samples from the machine learning problem space, specifically represents a measure of separation of those samples (see Hamel at p. 4, Section 6.1 and p. 3, Section 3.2). Accordingly, Brock and Hamel meet the claimed limitation.
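For illustration only, the mapping W = H(c) quoted from Brock, Section 3.2, and the ranking of each c by validation error with generated weights, can be sketched with a toy example. This is a sketch under stated assumptions and not Brock's network: the HyperNet is reduced to a fixed (not learned) linear map, the main network is a two-class linear classifier, and every name and data value below is hypothetical.

```python
def hypernet(H, c):
    """Toy stand-in for W = H(c): a linear map from encoding c to weights W."""
    return [sum(h * ci for h, ci in zip(row, c)) for row in H]


def validation_error(W, val_set):
    """Fraction of validation examples misclassified by sign(W . x)."""
    wrong = 0
    for x, y in val_set:
        score = sum(w * xi for w, xi in zip(W, x))
        pred = 1 if score >= 0 else -1
        wrong += pred != y
    return wrong / len(val_set)


def rank_architectures(H, encodings, val_set):
    """Score each encoding c with generated weights, lowest error first."""
    scored = [(validation_error(hypernet(H, c), val_set), c) for c in encodings]
    return sorted(scored, key=lambda pair: pair[0])


# Hypothetical HyperNet parameters (identity map) and binary encodings.
H = [[1, 0], [0, 1]]
encodings = [(1, 0), (0, 1), (1, 1)]

# Hypothetical held-out validation examples: (input, label in {-1, +1}).
val_set = [((1, 0), 1), ((-1, 0), -1), ((2, 1), 1), ((-2, 1), -1)]

ranking = rank_architectures(H, encodings, val_set)
```

In this toy setup no candidate network is trained; each encoding is ranked solely by the validation error obtained with the generated weights.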



(11) Related Proceeding(s) Appendix
No decision rendered by a court or the Board is identified by the examiner in the Related Appeals and Interferences section of this examiner’s answer.
For the above reasons, it is believed that the rejections should be sustained.
Respectfully submitted,
/R.L.K./
Examiner, Art Unit 2124
Conferees:
/MIRANDA M HUANG/
Supervisory Patent Examiner, Art Unit 2124
/RYAN M STIGLIC/
Primary Examiner


Requirement to pay appeal forwarding fee.  In order to avoid dismissal of the instant appeal in any application or ex parte reexamination proceeding, 37 CFR 41.45 requires payment of an appeal forwarding fee within the time permitted by 37 CFR 41.45(a), unless appellant had timely paid the fee for filing a brief required by 37 CFR 41.20(b) in effect on March 18, 2013.