DETAILED ACTION
This final rejection is responsive to amendments and remarks filed 09 November 2021.
Claims 1-2, 6-7, 9-10, 14-16, and 20 are amended. No claims have been added, cancelled, or withdrawn. Therefore, claims 1-20 are presently pending.

Response to Arguments
In view of the amendments, the previous 112(b) rejection of claim 6 is withdrawn.
Applicant's arguments filed with respect to the claims under 35 USC 102 have been fully considered but they are not persuasive.
The Applicant first argues that Brock does not teach the amended limitation of “computing, for each untrained neural network, at least one expressivity measure…,” and that the “present claims calculate the expressivity measure based on the data to be processed by the trained neural network, which is different from the weights resulting from training a neural network” (Remarks, pp. 8-9). The Applicant also argues that “the trainability measure is based on the data of the machine learning problem space, which is different from the error obtained during training of the neural network” (Remarks, p. 9).
The Examiner respectfully disagrees. Independent claim 1 recites multiple instances of “data of the machine learning problem space,” but the claim does not recite that each instance refers to the same “data.” Under broadest reasonable interpretation, the “data to be processed by a trained neural network” may differ from the data recited in the limitations “at least one expressivity measure based on data of the machine learning problem space” and “at least one trainability measure based on data of the machine learning problem space” (see claim 1). The updated rejection below reflects Brock’s disclosure of the amended limitations. 
The Applicant further argues that the “cited art does not teach selecting the at least one candidate neural network for solving the machine learning problem [comprising] selecting the at least one candidate neural network having the at least one expressivity measure exceeding a threshold and the at least one trainability measure within a range” (Remarks, p. 10).
The Examiner respectfully disagrees with this argument, although the Examiner agrees that the cited art was misinterpreted. Nevertheless, the weights in Brock (relied upon to disclose the claimed “at least one expressivity measure”) exceed a threshold. Under broadest reasonable interpretation, the claimed threshold could be any value, because neither the claims nor the Specification recites or explains how this threshold is determined or whether it is even predetermined. In response to applicant's argument that the references fail to show certain features of applicant’s invention, it is noted that the features upon which applicant relies (i.e., “a check must be made to determine that the threshold is exceeded”) are not recited in the rejected claim(s). Although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims. See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993).
The Applicant applies the same argument to the claimed range (Remarks, p. 11). The Examiner likewise agrees that the range of values of the training error disclosed in Brock was misinterpreted. Nevertheless, the training error in Brock (relied upon to disclose the claimed “trainability measure”) falls within a range. Under broadest reasonable interpretation, the claimed “range” could represent any range of values, because neither the claims nor the Specification recites or explains how the range is determined.
Applicant's arguments filed with respect to the claims under 35 USC 103 have been fully considered but they are not persuasive.
The Applicant argues that “Brock cannot be combined with Hamel as discussed by the Office, because the references teach different things” (Remarks, p. 12). 
The Examiner respectfully disagrees. Brock discloses that, for “a Static HyperNet with parameters H, the main network weights W are some function (e.g. a multilayer perceptron) of a learned embedding z” (Brock, p. 5, Section 3.2). In light of this disclosure, Brock and Hamel’s disclosure of a measure of separation are both directed to embeddings in a space.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless:
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claim(s) 1, 6-9, 14-15, and 20 is/are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Brock et al. (“SMASH: One-Shot Model Architecture Search through HyperNetworks”, In Journal of Computing Research Repository, August 2017, pp. 1-21) (“Brock”).
Regarding claim 1, Brock teaches a system comprising: 
processing hardware; and 
a memory storing instructions which cause the processing hardware to perform operations (Brock, p. 5, Section 4, “Our code is written in PyTorch [24] to leverage dynamic graphs, and explicitly defines each sampled network in line with the memory-bank view to avoid obfuscating its inner workings behind (potentially more efficient) abstractions.” The use of code to implement the disclosed method would generally require the use of processing hardware and a memory storing instructions.) comprising: 
accessing a machine learning problem space associated with a machine learning problem and a plurality of untrained candidate neural networks for solving the machine learning problem (Brock, p. 2, Section 3, “In SMASH (Algorithm 1), our goal is to rank a set of neural network configurations relative to one another based on each configuration’s validation performance, which we accomplish using weights generated by an auxiliary network.” Brock, p. 3, Algorithm 1, the input shows the “Space of all candidate architectures.”), 
the machine learning problem space comprising data to be processed by a trained network (Brock, p. 1, Section 1, “We validate our one-Shot Model Architecture Search through Hypernetworks (SMASH) for Convolutional Neural Networks (CNN) on CIFAR-10 and CIFAR-100.” Brock, p. 5, Section 4.1, in one embodiment, “we train a SMASH network for 300 epochs on CIFAR-100.” The training samples represent an example of data to be processed by a trained network.); 
computing, for each untrained candidate neural network, at least one expressivity measure based on data of the machine learning problem space, the expressivity measure capturing an expressivity of the candidate neural network with respect to the machine learning problem, the expressivity measure being computed without training the candidate neural network (Brock, p. 2, Section 3, “At each training step, we randomly sample a network architecture [based on data of the machine learning problem space], generate the weights for that architecture [example of at least one expressivity measure, which are generated before or without training the candidate neural network] using a HyperNet, and train the entire system end-to-end through backpropagation. When the model is finished training, we sample a number of random architectures and evaluate their performance on a validation set, using weights generated by the HyperNet.”);
computing, for each untrained candidate neural network, at least one trainability measure based on data of the machine learning problem space, the trainability measure capturing a trainability of the candidate neural network with respect to the machine learning problem (Brock, p. 2, Section 3, “At each training step, we randomly sample a network architecture, generate the weights for that architecture using a HyperNet, and train the entire system end-to-end through backpropagation. When the model is finished training, we sample a number of random architectures and evaluate their performance on a validation set, using weights generated by the HyperNet.” Brock, p. 3, Algorithm 1 shows “Get training error… [example of at least one trainability measure based on data of the machine learning problem space], backprop and update H.” Brock, p. 3, Section 3, Algorithm 1 shows that all candidate architectures exist in space $R_c$, and input minibatch $x_i$ is sampled. Brock, p. 5, Section 4.1, in one embodiment, “we train a SMASH network for 300 epochs on CIFAR-100.”); 
selecting, based on the at least one expressivity measure and the at least one trainability measure, at least one candidate neural network for solving the machine learning problem (Brock, p. 3, Algorithm 1, “Sample random c and evaluate error on validation set $E_v = f_c(H(c), x_v)$,” which is based on the at least one expressivity measure and the at least one trainability measure.); and 
providing an output representing the selected at least one candidate neural network (Brock, p. 2, Section 3, “We then select the architecture with the best estimated validation performance and train its weights normally.”).  
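For illustration only, the sample-evaluate-select sequence quoted above from Brock (sample a random architecture c, generate weights with the HyperNet, evaluate validation error E_v = f_c(H(c), x_v), and select the best-ranked architecture) can be sketched as follows. This is a toy sketch, not Brock's implementation: the architecture encoding, the functions `sample_architecture`, `hypernet_weights`, and `validation_error`, and the parameter `h` are hypothetical stand-ins, and the HyperNet training phase is omitted entirely.

```python
import random


def sample_architecture(rng):
    # Hypothetical architecture encoding c = (depth, width).
    return (rng.randint(1, 4), rng.choice([8, 16, 32]))


def hypernet_weights(arch, h):
    # Hypothetical stand-in for W = H(c): the weights are a deterministic
    # function of the architecture encoding and a scalar parameter h.
    depth, width = arch
    return [h * (d + 1) / width for d in range(depth)]


def validation_error(weights, x_val):
    # Hypothetical stand-in for E_v = f_c(H(c), x_v): mean squared error of a
    # toy model that scales each validation input by the sum of its weights.
    scale = sum(weights)
    return sum((scale * x - x) ** 2 for x in x_val) / len(x_val)


def rank_architectures(num_samples, h, x_val, seed=0):
    # Sample architectures, score each with generated (untrained) weights,
    # and return (error, architecture) pairs sorted from best to worst.
    rng = random.Random(seed)
    scored = []
    for _ in range(num_samples):
        c = sample_architecture(rng)
        w = hypernet_weights(c, h)
        scored.append((validation_error(w, x_val), c))
    scored.sort(key=lambda pair: pair[0])
    return scored


ranking = rank_architectures(10, h=4.0, x_val=[0.5, 1.0, 1.5])
best_error, best_arch = ranking[0]
```

In this sketch the first entry of `ranking` corresponds to the architecture with the best estimated validation performance, which would then be trained normally.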

Regarding claim 6, Brock teaches the system of claim 1, wherein selecting the at least one candidate neural network for solving the machine learning problem comprises: 
selecting the at least one candidate neural network having the at least one expressivity measure exceeding a threshold and the at least one trainability measure within a range (Brock, p. 2, Section 3, “At each training step, we randomly sample a network architecture, generate the weights for that architecture [it is implicit that the weights, or the at least one expressivity measure, exceed a threshold] using a HyperNet, and train the entire system end-to-end through backpropagation [it is implicit that the training error, or the at least one trainability measure, calculated during backpropagation is within a range]. When the model is finished training, we sample a number of random architectures and evaluate their performance on a validation set, using weights generated by the HyperNet.” Brock, p. 2, Section 3, “We then select the architecture with the best estimated validation performance and train its weights normally.”).
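For illustration only, a selection in which an expressivity measure is compared against a threshold and a trainability measure is compared against a range might look like the sketch below. The `Candidate` type, the function `select_candidates`, and all numeric values are hypothetical; they are not drawn from the claims, the Specification, or Brock.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    # Hypothetical record of per-candidate measures.
    name: str
    expressivity: float
    trainability: float


def select_candidates(candidates, threshold, low, high):
    # Keep candidates whose expressivity exceeds the threshold and whose
    # trainability lies within the closed range [low, high].
    return [
        c
        for c in candidates
        if c.expressivity > threshold and low <= c.trainability <= high
    ]


candidates = [
    Candidate("net_a", expressivity=0.9, trainability=0.5),
    Candidate("net_b", expressivity=0.2, trainability=0.5),
    Candidate("net_c", expressivity=0.9, trainability=2.0),
]
selected = select_candidates(candidates, threshold=0.5, low=0.1, high=1.0)
print([c.name for c in selected])  # prints ['net_a']
```

Because the threshold and range are plain parameters here, any values could be supplied; the sketch takes no position on how such values would be determined.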

Regarding claim 7, Brock teaches the system of claim 1, the operations further comprising: 
training the at least one candidate neural network to solve the machine learning problem (Brock, p. 2, Section 3, “We then select the architecture with the best estimated validation performance and train its weights normally.”).

Regarding claim 8, Brock teaches the system of claim 7, the operations further comprising: 
(Brock, p. 7, Section 4.3, “We next investigate how well our best-found CIFAR-100 architecture performs on ModelNet10 [41], a 3D object classification benchmark. We train on the voxelated instances of the ModelNet10 training set using the settings of [5], and report accuracy on the ModelNet10 test set [running the trained at least one candidate neural network]. Our 8M parameter model achieves an accuracy of 93.28%, compared to a 93.61% accuracy from a hand-designed Inception-ResNet [5] with 18M parameters trained on the larger ModelNet40 dataset.”); and 
providing a solution to the machine learning problem generated by the trained at least one candidate neural network (Brock, p. 7, Section 4.3, “We next investigate how well our best-found CIFAR-100 architecture performs on ModelNet10 [41], a 3D object classification benchmark. We train on the voxelated instances of the ModelNet10 training set using the settings of [5], and report accuracy on the ModelNet10 test set [running the model on a test set implicitly discloses providing a solution to the machine learning problem generated by the trained at least one candidate neural network]. Our 8M parameter model achieves an accuracy of 93.28%, compared to a 93.61% accuracy from a hand-designed Inception-ResNet [5] with 18M parameters trained on the larger ModelNet40 dataset.”).

Regarding claims 9 and 14, claims 9 and 14 are directed to a non-transitory machine-readable medium storing instructions which cause one or more machines to perform steps similar to those recited in claims 1 and 6. Therefore, the rejections of claims 1 and 6 are applied to claims 9 and 14.

Regarding claims 15 and 20, claims 15 and 20 are directed to a method reciting limitations similar to those in claims 1 and 6. Therefore, the rejections of claims 1 and 6 are applied to claims 15 and 20.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 2-4, 10-12, and 16-18 is/are rejected under 35 U.S.C. 103 as being unpatentable over Brock in view of Hamel et al. (“Transfer Learning in MIR: Sharing Learned Latent Representations for Music Audio Classification and Similarity,” 2013, International Society for Music Information Retrieval, 6 pages) (“Hamel”).
Regarding claim 2, Brock teaches the system of claim 1, wherein the at least one expressivity measure represents a measure … of samples from the machine learning problem space (Brock, pp. 2-3, Section 3 and Algorithm 1, “At each training step, we randomly sample a network architecture, generate the weights for that architecture using a HyperNet, and train the entire system end-to-end through backpropagation.” Because the weights are a function of the HyperNet generated before training, they disclose a measure of samples from the machine learning space.).
Brock does not explicitly disclose the system, wherein the at least one expressivity measure represents a measure of separation of samples from the machine learning problem space (emphasis added).
However, Hamel teaches the system, wherein the at least one expressivity measure represents a measure of separation of samples from the machine learning problem space (Hamel, p. 4, Section 6.1, “To do this, we compute a distance matrix using an L1-distance on the embeddings of all the classes. Then, for each class, we look at which classes are the closest and perform a qualitative evaluation.” Hamel, p. 3, Section 3.2, “In this work, we use the L1-distance on our different feature sets in order to obtain a similarity matrix. We also tested Euclidian distance and Cosine distance and obtained similar results.” The distance between embeddings discloses a measure of separation of samples, where the embeddings disclose samples from the machine learning space.).
Both Brock and Hamel are directed to applications in transfer learning. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the at least one expressivity measure in Brock to represent a measure of separation, as disclosed in Hamel. Doing so provides a way to assess if “the embedding process learns semantic information” (Hamel, p. 4, Section 6.1).

Regarding claim 3, Brock in view of Hamel teaches the system of claim 2.
Hamel further teaches the system, wherein the measure of separation is a magnitude (Hamel, p. 4, Section 6.1, “To do this, we compute a distance matrix using an L1-distance on the embeddings of all the classes. Then, for each class, we look at which classes are the closest and perform a qualitative evaluation.” Hamel, p. 3, Section 3.2, “In this work, we use the L1-distance on our different feature sets in order to obtain a similarity matrix. We also tested Euclidian distance and Cosine distance and obtained similar results.”).

Regarding claim 4, Brock in view of Hamel teaches the system of claim 2.
Hamel further teaches the system, wherein the measure of separation is an angle (Hamel, p. 4, Section 6.1, “To do this, we compute a distance matrix using an L1-distance on the embeddings of all the classes. Then, for each class, we look at which classes are the closest and perform a qualitative evaluation.” Hamel, p. 3, Section 3.2, “In this work, we use the L1-distance on our different feature sets in order to obtain a similarity matrix. We also tested Euclidian distance and Cosine distance and obtained similar results.”).
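For illustration only, the three distances named in the quoted passages of Hamel (L1, Euclidean, and cosine), together with the angle that underlies the cosine distance, can be computed between embedding vectors as sketched below. The function names and the example embeddings are hypothetical and are not taken from Hamel; the sketch assumes non-zero vectors of equal length.

```python
import math


def l1_distance(u, v):
    # Sum of absolute coordinate differences (a magnitude).
    return sum(abs(a - b) for a, b in zip(u, v))


def euclidean_distance(u, v):
    # Straight-line distance between the two vectors (a magnitude).
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_similarity(u, v):
    # Cosine of the angle between the vectors; assumes neither is all zeros.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cosine_distance(u, v):
    return 1.0 - cosine_similarity(u, v)


def angle_between(u, v):
    # Angle in radians; the cosine is clamped to guard against rounding.
    return math.acos(max(-1.0, min(1.0, cosine_similarity(u, v))))


def distance_matrix(embeddings, dist):
    # Pairwise distances between all embeddings under the given distance.
    return [[dist(u, v) for v in embeddings] for u in embeddings]


embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
matrix = distance_matrix(embeddings, l1_distance)
```

The L1 and Euclidean distances depend on how far apart the vectors are, whereas the cosine distance depends only on the angle between them.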

Regarding claims 10-12, claims 10-12 are directed to a non-transitory machine-readable medium storing instructions which cause one or more machines to perform steps similar to those recited in claims 2-4. Therefore, the rejection made to claims 2-4 is applied to claims 10-12.

Regarding claims 16-18, claims 16-18 are directed to a method reciting limitations similar to those in claims 2-4. Therefore, the rejection made to claims 2-4 is applied to claims 16-18.

Claims 5, 13, and 19 is/are rejected under 35 U.S.C. 103 as being unpatentable over Brock in view of Elsken et al. (“Simple and Efficient Architecture Search for Convolutional Neural Networks,” 2017, arXiv:1711.04528v1 [stat.ML], pp. 1-14) (“Elsken”).
Regarding claim 5, Brock teaches the system of claim 1.
Brock does not explicitly disclose the system, wherein the at least one trainability measure represents a stochastic gradient descent of weights in the candidate neural network during a first phase of training.
However, Elsken teaches the system, wherein the at least one trainability measure represents a stochastic gradient descent of weights in the candidate neural network during a first phase of training (Elsken, p. 5, Algorithm 1, candidate models are trained with SGDR, or stochastic gradient descent with restarts).
Both Brock and Elsken are directed to neural network architecture search. While Brock discloses computing training error and performing back-propagation, Brock does not disclose the at least one trainability measure representing a stochastic gradient descent of weights. However, Elsken discloses this limitation. It would have been obvious to one of ordinary skill in the art to modify the backpropagation in Brock to include stochastic gradient descent, as disclosed in Elsken, to yield predictable results of training a neural network model.
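For illustration only, a stochastic gradient descent update of a weight from a per-sample training error can be sketched as below. This is a generic one-parameter example under assumed values (the function `sgd_fit`, the learning rate, and the data are hypothetical); it is not the training procedure of Brock or Elsken and does not implement the restart schedule of SGDR.

```python
import random


def sgd_fit(data, lr=0.05, epochs=50, seed=0):
    # Fit y = w * x by stochastic gradient descent on the squared error,
    # updating w after every individual sample.
    rng = random.Random(seed)
    samples = list(data)
    w = 0.0
    for _ in range(epochs):
        rng.shuffle(samples)  # visit samples in a random order each epoch
        for x, y in samples:
            error = w * x - y        # per-sample training error
            grad = 2.0 * error * x   # derivative of error**2 with respect to w
            w -= lr * grad           # gradient descent step
    return w


data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
w = sgd_fit(data)
```

With these assumed values, each step shrinks the gap between `w` and the underlying slope of 2.0, so `w` approaches 2.0 as training proceeds.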

Regarding claim 13, claim 13 is directed to a non-transitory machine-readable medium storing instructions which cause one or more machines to perform steps similar to those recited in claim 5. Therefore, the rejection made to claim 5 is applied to claim 13.

Regarding claim 19, claim 19 is directed to a method reciting limitations similar to those in claim 5. Therefore, the rejection made to claim 5 is applied to claim 19.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to CATHERINE F LEE whose telephone number is (571)270-7487. The examiner can normally be reached Monday thru Friday, 10:00AM-6:00PM PST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda Huang, can be reached at (571)270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/C.F.L./Examiner, Art Unit 2124
/MIRANDA M HUANG/Supervisory Patent Examiner, Art Unit 2124