Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION
This office action is in response to the correspondence of 08/02/22 regarding application 16/744,674, in which claims 1, 4, 8, 11, 15, and 18 were amended and claims 3, 10, and 17 were cancelled. Claims 1, 2, 4-9, 11-16, and 18-20 are pending in the application and have been considered.

Response to Arguments
Applicant’s arguments on pages 11-12 regarding the 35 U.S.C. 102(a)(1) rejections based on Guo have been considered but are moot in view of the new grounds for rejection, necessitated by Applicant’s amendments.
Applicant’s arguments on pages 12-17 regarding the 35 U.S.C. 103 rejections based on Guo in view of Matsuba have been considered but are not persuasive. 

On page 13, Applicant asserts that none of the references cited or any combination thereof teaches or suggests 
“…the network parameter of the super network comprises a weight parameter of the super network;
after obtaining the network parameter of the super network, for each of the candidate network sub-structures, storing a mapping relation between a structure identifier and a network parameter of the respective candidate network sub-structure”. 
The examiner respectfully disagrees, and maintains that Guo teaches the network parameter of the super network comprises a weight parameter of the super network (Supernet weights, page 3, Section 3.1. Revisiting One-Shot NAS);
after obtaining the network parameter of the super network, for each of the candidate network sub-structures, storing a relation between a structure identifier and a network parameter of the respective candidate network sub-structure (using a choice block to search the bit widths of the weights and feature maps… during subsequent training, for each choice block feature bit width and weight bit width are randomly sampled. They are determined in the evolutionary step, pages 7-8, Application: Mixed-Precision Quantization, Fig 5). 
Guo does not specifically mention storing a mapping relation.
Matsuba discloses a mapping relation (mapping of relation, Col 8 lines 13-15).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Guo by including a mapping relation in order to increase pattern recognition accuracy, as suggested by Matsuba (Col 2 lines 10-13).

Next, on page 14 Applicant argues that because Guo randomly samples the feature bit width and weight bit width for each block during the Supernet training, Guo somehow does not disclose “after obtaining the network parameter of the super network, for each of the candidate network sub-structures, storing a mapping relation between a structure identifier and a network parameter of the respective candidate network sub-structure”. 

In response, it is unclear why Guo’s random sampling of the feature bit width and weight bit width for each block during Supernet training would necessarily mean that Guo does not disclose “after obtaining the network parameter of the super network, for each of the candidate network sub-structures, storing a relation between a structure identifier and a network parameter of the respective candidate network sub-structure”, as set forth in the Office Action. While Applicant is correct that Guo is not considered to specifically disclose storing a mapping relation, one cannot show non-obviousness by attacking references individually when the rejection (as set forth on page 10 of the Non-Final Rejection of 05/13/22) is based upon a combination of references, here the combination of Guo with Matsuba. While Guo does not specifically mention storing a mapping relation, Matsuba discloses a mapping relation (mapping of relation, Col 8 lines 13-15). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Guo by including a mapping relation in order to increase pattern recognition accuracy, as suggested by Matsuba (Col 2 lines 10-13).

Next, on page 15 Applicant argues that Matsuba fails to cure the deficiencies of Guo, allegedly because Matsuba discloses a mapping of relation between the input data Xi to Xn and the teacher data Xn+1 to XN, rather than “a mapping relation between a structure identifier and a network parameter of the respective candidate network sub-structure” as recited in claim 1.
In response, if Matsuba discloses a mapping of relation between the input data Xi to Xn and the teacher data Xn+1 to XN as alleged by Applicant, then this evidence would appear to support the assertion on page 10 of the Non-Final Rejection of 05/13/22 that Matsuba discloses “a mapping relation”. The examiner therefore disagrees that Matsuba fails to cure the deficiencies of Guo, since Guo merely does not specifically describe the stored relations between the subnetwork structures and weights as a “mapping relation”.

Finally, on page 16, Applicant asserts that neither reference discloses “training a super network to obtain a network parameter of the super network… wherein the network parameter of the super network comprises a weight parameter of the super network”.
In response, the examiner respectfully disagrees and contends that in Guo, training the Supernet results in optimization of the Supernet weight (Supernet weights, page 3, Section 3.1. Revisiting One-Shot NAS).

The remaining arguments on pages 16-17 regarding independent claims 8 and 15 as well as the dependent claims are similar to those addressed above, and are not persuasive for similar reasons.

35 USC § 101 Abstract Idea Analysis (NOT A REJECTION)
Claims 1, 2, 4-9, 11-16, and 18-20 are directed to training a neural network, which amounts to adjusting parameters of an abstract model (the neural network) that is necessarily implemented on a computer. Since the training improves the functioning of the computer itself, it provides a meaningful limitation that transforms the abstract model into a patent eligible application of the abstract model, such that the claims amount to significantly more than the abstract model itself. This is solely for clarity of the record and is NOT a rejection.


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.


Claims 1, 2, 4-9, 11-16, and 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over Guo et al. (“Single Path One-Shot Neural Architecture Search with Uniform Sampling”. arXiv.org, Cornell University Library, 201 Olin Library, Cornell University, Ithaca, NY 14853. March 31, 2019. 14 pages) in view of Matsuba et al. (5,255,347).

Consider claim 1, Guo discloses a method for training a neural network (Supernet is trained by a uniform path sampling method, Abstract), comprising:
training a super network to obtain a network parameter of the super network (Supernet weights, page 3, Section 3.1. Revisiting One-Shot NAS), wherein each network layer of the super network comprises multiple candidate network sub-structures in parallel (choice blocks, Fig 1, page 3, Section 3.1. Revisiting One-Shot NAS), wherein the network parameter of the super network comprises a weight parameter of the super network (Supernet weights, page 3, Section 3.1. Revisiting One-Shot NAS);
after obtaining the network parameter of the super network, for each of the candidate network sub-structures, storing a relation between a structure identifier and a network parameter of the respective candidate network sub-structure (using a choice block to search the bit widths of the weights and feature maps… during subsequent training, for each choice block feature bit width and weight bit width are randomly sampled. They are determined in the evolutionary step, pages 7-8, Application: Mixed-Precision Quantization, Fig 5);
for each network layer of the super network, selecting, from the multiple candidate network sub-structures, a candidate network sub-structure to be a target network sub-structure (Algorithm 1 Evolutionary Architecture Search, Section 3.4., pages 4-5); 
constructing a sub-network based on target network sub-structures each selected in a respective network layer of the super network (running the mixed-precision quantization search space, Section 3.5, Summary, pages 7-8); and 
training the sub-network, by taking the network parameter inherited from the super network as an initial parameter of the sub-network, to obtain a network parameter of the sub-network (all underlying architectures and their weights get trained fully and equally, Abstract, page 1, using a choice block to search the bit widths of the weights and feature maps… during subsequent training, for each choice block feature bit width and weight bit width are randomly sampled. They are determined in the evolutionary step, pages 7-8, Application: Mixed-Precision Quantization, Fig 5).
Guo does not specifically mention storing a mapping relation.
Matsuba discloses a mapping relation (mapping of relation, Col 8 lines 13-15).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Guo by including a mapping relation in order to increase pattern recognition accuracy, as suggested by Matsuba (Col 2 lines 10-13).
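For illustration only, and not as part of the grounds of rejection, the sequence of steps recited in claim 1 can be sketched as follows. This is a minimal sketch under stated assumptions: the function names, the (layer, candidate) tuple used as a structure identifier, the toy list-of-floats weights, and the use of random selection are all hypothetical, and the sketch does not reproduce code from Guo, Matsuba, or the application.

```python
import random

# Hypothetical toy super network: N layers, each with M parallel candidates.
# A "weight parameter" is modeled as a plain list of floats for illustration.
N_LAYERS, M_CANDIDATES = 3, 4


def train_supernet(rng):
    """Stand-in for super network training: one weight list per candidate."""
    return [[[rng.random() for _ in range(2)] for _ in range(M_CANDIDATES)]
            for _ in range(N_LAYERS)]


def store_mapping(supernet_weights):
    """Store a mapping from structure identifier (layer, candidate) to weights."""
    return {(n, m): list(w)
            for n, layer in enumerate(supernet_weights)
            for m, w in enumerate(layer)}


def select_targets(rng):
    """Select one candidate index per layer (random selection, an assumption)."""
    return [rng.randrange(M_CANDIDATES) for _ in range(N_LAYERS)]


def build_subnetwork(mapping, targets):
    """Construct the sub-network, initializing each layer by querying the mapping."""
    return [list(mapping[(n, m)]) for n, m in enumerate(targets)]


rng = random.Random(0)
supernet_weights = train_supernet(rng)
mapping = store_mapping(supernet_weights)
targets = select_targets(rng)
subnet = build_subnetwork(mapping, targets)
# `subnet` holds copies of the inherited weights; further training of the
# sub-network would start from these values as initial parameters.
```

In this sketch the dictionary plays the role of the stored mapping relation, and copying the looked-up weights into the sub-network stands in for inheriting the initial parameters from the super network.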
 
Consider claim 8, Guo discloses an apparatus for training a neural network, comprising: a processor (GPU, page 8); and a memory configured to store instructions executable by the processor (GPU memory, page 8), wherein the processor is configured to:
train a super network to obtain a network parameter of the super network (Supernet weights, page 3, Section 3.1. Revisiting One-Shot NAS), wherein each network layer of the super network comprises multiple candidate network sub-structures in parallel (choice blocks, Fig 1, page 3, Section 3.1. Revisiting One-Shot NAS), wherein the network parameter of the super network comprises a weight parameter of the super network (Supernet weights, page 3, Section 3.1. Revisiting One-Shot NAS);
after obtaining the network parameter of the super network, for each of the candidate network sub-structures, store a relation between a structure identifier and a network parameter of the respective candidate network sub-structure (using a choice block to search the bit widths of the weights and feature maps… during subsequent training, for each choice block feature bit width and weight bit width are randomly sampled. They are determined in the evolutionary step, pages 7-8, Application: Mixed-Precision Quantization, Fig 5);
for each network layer of the super network, select, from the multiple candidate network sub-structures, a candidate network sub-structure to be a target network sub-structure (Algorithm 1 Evolutionary Architecture Search, Section 3.4., pages 4-5); 
construct a sub-network based on target network sub-structures each selected in a respective network layer of the super network (running the mixed-precision quantization search space, Section 3.5, Summary, pages 7-8); and 
train the sub-network, by taking the network parameter inherited from the super network as an initial parameter of the sub-network, to obtain a network parameter of the sub-network (all underlying architectures and their weights get trained fully and equally, Abstract, page 1, using a choice block to search the bit widths of the weights and feature maps… during subsequent training, for each choice block feature bit width and weight bit width are randomly sampled. They are determined in the evolutionary step, pages 7-8, Application: Mixed-Precision Quantization, Fig 5). 
Guo does not specifically mention storing a mapping relation.
Matsuba discloses a mapping relation (mapping of relation, Col 8 lines 13-15).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Guo by including a mapping relation for reasons similar to those for claim 1.

Consider claim 15, Guo discloses a non-transitory computer-readable storage medium having stored therein instructions (GPU memory, page 8) that, when executed by a processor of a device, cause the device to perform a method for training a neural network, the method comprising: 
training a super network to obtain a network parameter of the super network (Supernet weights, page 3, Section 3.1. Revisiting One-Shot NAS), wherein each network layer of the super network comprises multiple candidate network sub-structures in parallel (choice blocks, Fig 1, page 3, Section 3.1. Revisiting One-Shot NAS), wherein the network parameter of the super network comprises a weight parameter of the super network (Supernet weights, page 3, Section 3.1. Revisiting One-Shot NAS);
after obtaining the network parameter of the super network, for each of the candidate network sub-structures, storing a relation between a structure identifier and a network parameter of the respective candidate network sub-structure (using a choice block to search the bit widths of the weights and feature maps… during subsequent training, for each choice block feature bit width and weight bit width are randomly sampled. They are determined in the evolutionary step, pages 7-8, Application: Mixed-Precision Quantization, Fig 5);
for each network layer of the super network, selecting, from the multiple candidate network sub-structures, a candidate network sub-structure to be a target network sub-structure (Algorithm 1 Evolutionary Architecture Search, Section 3.4., pages 4-5); 
constructing a sub-network based on target network sub-structures each selected in a respective network layer of the super network (running the mixed-precision quantization search space, Section 3.5, Summary, pages 7-8); and 
training the sub-network, by taking the network parameter inherited from the super network as an initial parameter of the sub-network, to obtain a network parameter of the sub-network (all underlying architectures and their weights get trained fully and equally, Abstract, page 1, using a choice block to search the bit widths of the weights and feature maps… during subsequent training, for each choice block feature bit width and weight bit width are randomly sampled. They are determined in the evolutionary step, pages 7-8, Application: Mixed-Precision Quantization, Fig 5). 
Guo does not specifically mention storing a mapping relation.
Matsuba discloses a mapping relation (mapping of relation, Col 8 lines 13-15).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Guo by including a mapping relation for reasons similar to those for claim 1.

Consider claim 2, Guo discloses the super network comprises N network layers, and each of the network layers comprises M candidate network sub-structures, where N is a positive integer no smaller than 2, and M is a positive integer no smaller than 2 (considering the choice blocks as network layers, and the choices within the blocks as the candidate network sub-structures, Fig 1, page 3, Section 3.1. Revisiting One-Shot NAS); and wherein for each network layer of the super network, selecting, from the multiple candidate network sub-structures, a candidate network sub-structure to be a target network sub-structure comprises: selecting an mth candidate network sub-structure of an nth network layer of the super network to be the target network sub-structure constructing an nth network layer of the sub-network, where n is a positive integer smaller than or equal to N, and m is a positive integer smaller than or equal to M (for instance, selecting choice 3 for a layer, page 3, Section 3.1. Revisiting One-Shot NAS).

Consider claim 4, Guo discloses wherein training the sub-network, by taking the network parameter inherited from the super network as the initial parameter of the sub-network, to obtain a network parameter of the sub-network comprises: for each of the candidate network sub-structures contained in the sub-network, querying, based on a structure identifier of the candidate network sub-structure, the relation to obtain a network parameter of the candidate network sub-structure (this work uses evolutionary search, which can be repeated many times on the same Supernet once trained, page 3, section 3.1. Revisiting One-Shot NAS, using a choice block to search the bit widths of the weights and feature maps… during subsequent training, for each choice block feature bit width and weight bit width are randomly sampled. They are determined in the evolutionary step, pages 7-8, Application: Mixed-Precision Quantization, Fig 5); and training, based on the obtained network parameters of the candidate network sub-structures, the sub-network, to obtain the network parameter of the sub-network (during search, i.e. query, each sample architecture inherits its weights from the Supernet weights, page 3, section 3.1. Revisiting One-Shot NAS).
Guo does not specifically mention storing a mapping relation.
Matsuba discloses a mapping relation (mapping of relation, Col 8 lines 13-15).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Guo by including a mapping relation for reasons similar to those for claim 1. 


Consider claim 5, Guo discloses for each network layer of the super network, selecting, from the multiple candidate network sub-structures, the candidate network sub-structure to be the target network sub-structure comprises: selecting, based on a set search algorithm, a candidate network sub-structure from the multiple candidate network sub-structures of each network layer of the super network to be a target network sub-structure; wherein the set search algorithm comprises at least one of: a random search algorithm, a Bayesian search algorithm, an evolutionary learning algorithm, a reinforcement learning algorithm, an evolutionary and reinforcement learning combined algorithm, or a gradient based algorithm (this work uses evolutionary search, which can be repeated many times on the same Supernet once trained, page 3, section 3.1). 

Consider claim 6, Guo discloses: processing input data based on the trained sub-network, wherein a type of the input data comprises at least one of: an image data type, a text data type, or an audio data type (Experiments on ImageNet, page 5, Section 4. Experiment Results). 

Consider claim 7, Guo discloses: conducting performance evaluation on the trained sub-network based on a test data set, to obtain an evaluation result, wherein a type of test data in the test data set comprises at least one of: an image data type, a service data type or an audio data type (results from ImageNet image processing evaluation compared to other methods in Table 4, page 7). 

Consider claim 9, Guo discloses the super network comprises N network layers, and each of the network layers comprises M candidate network sub-structures, where N is a positive integer no smaller than 2, and M is a positive integer no smaller than 2 (considering the choice blocks as network layers, and the choices within the blocks as the candidate network sub-structures, Fig 1, page 3, Section 3.1. Revisiting One-Shot NAS); and wherein the processor is further configured to: select an mth candidate network sub-structure of an nth network layer of the super network to be the target network sub-structure constructing an nth network layer of the sub-network, where n is a positive integer smaller than or equal to N, and m is a positive integer smaller than or equal to M (for instance, selecting choice 3 for a layer, page 3, Section 3.1. Revisiting One-Shot NAS).

Consider claim 11, Guo discloses for each of the candidate network sub-structures contained in the sub-network, query, based on a structure identifier of the candidate network sub-structure, the relation to obtain a network parameter of the candidate network sub-structure (this work uses evolutionary search, which can be repeated many times on the same Supernet once trained, page 3, section 3.1. Revisiting One-Shot NAS, using a choice block to search the bit widths of the weights and feature maps… during subsequent training, for each choice block feature bit width and weight bit width are randomly sampled. They are determined in the evolutionary step, pages 7-8, Application: Mixed-Precision Quantization, Fig 5); and train, based on the obtained network parameters of the candidate network sub-structures, the sub-network, to obtain the network parameter of the sub-network (during search, i.e. query, each sample architecture inherits its weights from the Supernet weights, page 3, section 3.1. Revisiting One-Shot NAS).
Guo does not specifically mention storing a mapping relation.
Matsuba discloses a mapping relation (mapping of relation, Col 8 lines 13-15).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Guo by including a mapping relation for reasons similar to those for claim 1. 

Consider claim 12, Guo discloses the processor is further configured to: select, based on a set search algorithm, a candidate network sub-structure from the multiple candidate network sub-structures of each network layer of the super network to be a target network sub-structure; wherein the set search algorithm comprises at least one of: a random search algorithm, a Bayesian search algorithm, an evolutionary learning algorithm, a reinforcement learning algorithm, an evolutionary and reinforcement learning combined algorithm, or a gradient based algorithm (this work uses evolutionary search, which can be repeated many times on the same Supernet once trained, page 3, section 3.1).

Consider claim 13, Guo discloses the processor is further configured to: process input data based on the trained sub-network, wherein a type of the input data comprises at least one of: an image data type, a text data type, or an audio data type (Experiments on ImageNet, page 5, Section 4. Experiment Results). 

Consider claim 14, Guo discloses the processor is further configured to: conduct performance evaluation on the trained sub-network based on a test data set, to obtain an evaluation result, wherein a type of test data in the test data set comprises at least one of: an image data type, a service data type or an audio data type (results from ImageNet image processing evaluation compared to other methods in Table 4, page 7). 

Consider claim 16, Guo discloses the super network comprises N network layers, and each of the network layers comprises M candidate network sub-structures, where N is a positive integer no smaller than 2, and M is a positive integer no smaller than 2 (considering the choice blocks as network layers, and the choices within the blocks as the candidate network sub-structures, Fig 1, page 3, Section 3.1. Revisiting One-Shot NAS); and wherein for each network layer of the super network, selecting, from the multiple candidate network sub-structures, a candidate network sub-structure to be a target network sub-structure comprises: selecting an mth candidate network sub-structure of an nth network layer of the super network to be the target network sub-structure constructing an nth network layer of the sub-network, where n is a positive integer smaller than or equal to N, and m is a positive integer smaller than or equal to M (for instance, selecting choice 3 for a layer, page 3, Section 3.1. Revisiting One-Shot NAS).

Consider claim 18, Guo discloses wherein training the sub-network, by taking the network parameter inherited from the super network as the initial parameter of the sub-network, to obtain a network parameter of the sub-network comprises: for each of the candidate network sub-structures contained in the sub-network, querying, based on a structure identifier of the candidate network sub-structure, the relation to obtain a network parameter of the candidate network sub-structure (this work uses evolutionary search, which can be repeated many times on the same Supernet once trained, page 3, section 3.1. Revisiting One-Shot NAS, using a choice block to search the bit widths of the weights and feature maps… during subsequent training, for each choice block feature bit width and weight bit width are randomly sampled. They are determined in the evolutionary step, pages 7-8, Application: Mixed-Precision Quantization, Fig 5); and training, based on the obtained network parameters of the candidate network sub-structures, the sub-network, to obtain the network parameter of the sub-network (during search, i.e. query, each sample architecture inherits its weights from the Supernet weights, page 3, section 3.1. Revisiting One-Shot NAS).
Guo does not specifically mention storing a mapping relation.
Matsuba discloses a mapping relation (mapping of relation, Col 8 lines 13-15).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Guo by including a mapping relation for reasons similar to those for claim 1. 

Consider claim 19, Guo discloses for each network layer of the super network, selecting, from the multiple candidate network sub-structures, the candidate network sub-structure to be the target network sub-structure comprises: selecting, based on a set search algorithm, a candidate network sub-structure from the multiple candidate network sub-structures of each network layer of the super network to be a target network sub-structure; wherein the set search algorithm comprises at least one of: a random search algorithm, a Bayesian search algorithm, an evolutionary learning algorithm, a reinforcement learning algorithm, an evolutionary and reinforcement learning combined algorithm, or a gradient based algorithm (this work uses evolutionary search, which can be repeated many times on the same Supernet once trained, page 3, section 3.1). 

Consider claim 20, Guo discloses the method further comprises: processing input data based on the trained sub-network, wherein a type of the input data comprises at least one of: an image data type, a text data type, or an audio data type (Experiments on ImageNet, page 5, Section 4. Experiment Results).

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Jesse Pullias whose telephone number is 571/270-5135. The examiner can normally be reached on M-F 8:00 AM - 4:30 PM. The examiner’s fax number is 571/270-6135.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Andrew Flanders, can be reached at 571/272-7516.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).


/Jesse S Pullias/
Primary Examiner, Art Unit 2655                                              08/30/22