DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This action is in response to amendments and remarks filed on 01/24/2022. In the current amendments, claims 1, 4, 5, 10, 13, 14, 16, 19, and 20 are amended. Claims 1-20 are pending and have been examined.
In response to the amendments and remarks filed on 01/24/2022, the 35 U.S.C. 112(b) rejection of claims 4-5, 13-14, and 19-20 set forth in the previous Office Action has been withdrawn.

Claim Interpretation
Claims 10-20 recite “one or more computer readable mediums.” Specification [0087] notes the following, “A computer readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.” Therefore, for examination purposes, “one or more computer readable mediums” in claims 10-20 has been interpreted as “one or more non-transitory computer readable mediums.”

Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.


Claims 1, 6-8, 10, 15, and 16 are provisionally rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1, 5-7, 9, 13, and 15 of copending Application No. 16/047,526 in view of HUANG et al. (US 2018/0365564 A1). This is a provisional nonstatutory double patenting rejection.
The conflicting claims are compared below; each claim of the instant application is followed by the corresponding claim of reference Application No. 16/047,526.

Instant Application, Claim 1:
A computer-implemented method comprising:
selecting a teacher neural network among a plurality of teacher neural networks,
inputting at least an input data to the selected teacher neural network to obtain a soft label output generated by the selected teacher neural network,
training a student neural network with at least the input data and the soft label output generated by the selected teacher neural network, and
repeating selecting the teacher neural network, inputting the input data to the selected teacher neural network, and training the student neural network.

Reference Application No. 16/047,526, Claim 1:
A computer-implemented method, comprising:
inputting input data to each teacher neural network among a plurality of teacher neural networks to obtain a soft label output among a plurality of soft label outputs from each teacher neural network among the plurality of teacher neural networks, and
training a student neural network with the input data and the plurality of soft label outputs.

Instant Application, Claim 6:
The method of Claim 1, wherein the input data is audio data and the soft label output is a classification of the audio data.

Reference Application No. 16/047,526, Claim 5:
The method of Claim 1, wherein the input data is audio data and each soft label output is a classification of the audio data.

Instant Application, Claim 7:
The method of Claim 6, wherein the classification of the audio data identifies phonemes.

Reference Application No. 16/047,526, Claim 6:
The method of Claim 5, wherein the classification of the audio data identifies phonemes.

Instant Application, Claim 8:
The method of Claim 1, wherein training the student neural network with at least the input data and the soft label output generated by the selected teacher neural network includes:
training the student neural network with at least the input data, the soft label output generated by the selected teacher neural network, and a correct data corresponding to the input data.

Reference Application No. 16/047,526, Claim 7:
The method of Claim 1, wherein training the student neural network includes: training the student neural network with at least the input data, the plurality of soft label outputs, and a correct data corresponding to the input data.

Instant Application, Claim 10:
An apparatus comprising:
a processor or a programmable circuitry; and one or more computer readable mediums collectively including instructions that, when executed by the processor or the programmable circuitry, cause the processor or the programmable circuitry to perform operations including:
selecting a teacher neural network among a plurality of teacher neural networks,
inputting at least an input data to the selected teacher neural network to obtain a soft label output generated by the selected teacher neural network, and
training a student neural network with at least the input data and the soft label output generated by the selected teacher neural network, and
repeating selecting the teacher neural network, inputting the input data to the selected teacher neural network, and training the student neural network.

Reference Application No. 16/047,526, Claim 9:
A neural network training apparatus comprising:
a processor or a programmable circuitry; and one or more computer readable mediums collectively including instructions that, when executed by the processor or the programmable circuitry, cause the processor or the programmable circuitry to perform operations including:
inputting an input data to each teacher neural network among a plurality of teacher neural networks to obtain a soft label output among a plurality of soft label outputs from each teacher neural network among the plurality of teacher neural networks, and
training a student neural network with the input data and the plurality of soft label outputs.

Instant Application, Claim 15:
The apparatus of Claim 10, wherein the input data is audio data and the soft label output is a classification of the audio data.

Reference Application No. 16/047,526, Claim 13:
The apparatus of Claim 9, wherein the input data is audio data and each soft label output is a classification of the audio data.

Instant Application, Claim 16:
A computer program product including one or more computer readable storage mediums collectively storing program instructions that are executable by a processor or programmable circuitry to cause the processor or programmable circuitry to perform operations comprising:
selecting a teacher neural network among a plurality of teacher neural networks,
inputting at least an input data to the selected teacher neural network to obtain a soft label output generated by the selected teacher neural network, and
training a student neural network with at least the input data and the soft label output generated by the selected teacher neural network, and
repeating selecting the teacher neural network, inputting the input data to the selected teacher neural network, and training the student neural network.

Reference Application No. 16/047,526, Claim 15:
A computer program product including one or more computer readable storage mediums collectively storing program instructions for improving neural network training that are executable by a processor or programmable circuitry to cause the processor or programmable circuitry to perform operations comprising:
inputting an input data to each teacher neural network among a plurality of teacher neural networks to obtain a soft label output among a plurality of soft label outputs from each teacher neural network among the plurality of teacher neural networks, and
training a student neural network with the input data and the plurality of soft label outputs.


Regarding instant claim 1, claim 1 of the reference application does not appear to explicitly teach “selecting a teacher neural network among a plurality of teacher neural networks”; however, HUANG et al. teaches this limitation at pg. 3 [0038]-[0039]. Moreover, claim 1 of the reference application does not appear to explicitly teach “repeating selecting the teacher neural network, inputting the input data to the selected teacher neural network, and training the student neural network”; however, HUANG et al. teaches this limitation at pg. 3 [0040]-[0041]. One of ordinary skill in the art would have been motivated to make this modification (modifying the reference application with HUANG et al., both of which are directed to neural networks) in order to “train and obtain student networks with a broader application range through aligning features of middle layers of teacher networks with those of student networks, and can adopt both classification models and regression models; in another aspect, it can work with other methods to further enhance the performance and the accuracy of student networks” (HUANG et al. pg. 2 [0036]). Instant claims 6-8 are rejected on the same rationale as instant claim 1.
Regarding instant claim 10, claim 9 of the reference application does not appear to explicitly teach “selecting a teacher neural network among a plurality of teacher neural networks”; however, HUANG et al. teaches this limitation at pg. 3 [0038]-[0039]. Moreover, claim 9 of the reference application does not appear to explicitly teach “repeating selecting the teacher neural network, inputting the input data to the selected teacher neural network, and training the student neural network”; however, HUANG et al. teaches this limitation at pg. 3 [0040]-[0041]. One of ordinary skill in the art would have been motivated to make this modification (modifying the reference application with HUANG et al., both of which are directed to neural networks) in order to “train and obtain student networks with a broader application range through aligning features of middle layers of teacher networks with those of student networks, and can adopt both classification models and regression models; in another aspect, it can work with other methods to further enhance the performance and the accuracy of student networks” (HUANG et al. pg. 2 [0036]). Instant claim 15 is rejected on the same rationale as instant claim 10.
Regarding instant claim 16, claim 15 of the reference application does not appear to explicitly teach “selecting a teacher neural network among a plurality of teacher neural networks”; however, HUANG et al. teaches this limitation at pg. 3 [0038]-[0039]. Moreover, claim 15 of the reference application does not appear to explicitly teach “repeating selecting the teacher neural network, inputting the input data to the selected teacher neural network, and training the student neural network”; however, HUANG et al. teaches this limitation at pg. 3 [0040]-[0041]. One of ordinary skill in the art would have been motivated to make this modification (modifying the reference application with HUANG et al., both of which are directed to neural networks) in order to “train and obtain student networks with a broader application range through aligning features of middle layers of teacher networks with those of student networks, and can adopt both classification models and regression models; in another aspect, it can work with other methods to further enhance the performance and the accuracy of student networks” (HUANG et al. pg. 2 [0036]).




Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 8, 10, and 16 are rejected under 35 U.S.C. 103 as being unpatentable over HUANG et al. (US 2018/0365564 A1) in view of OH et al. (US 2019/0034764 A1).
Regarding Claim 1,
HUANG et al. teaches A computer-implemented method comprising (pg. 6 [0094] teaches computer-implemented):
selecting a teacher neural network among a plurality of teacher neural networks (pg. 3 [0038]: “Step 101: selecting, by a training device, a teacher network performing the same functions of a student network” and pg. 3 [0039]: “The functions include such as image classification, object detection, semantic segmentation and so on. The teacher network is characterized by high performance and high accuracy; but, compared to the student network, it has some obvious disadvantages such as complex structure, a large number of parameters and weights and low computation speed. The student network is characterized by fast computation speed, average or poor performance, and simple network structure. The high-performance teacher network with the same functions of the student network could be selected from a set of preset neural network models” teach selecting a teacher neural network among a plurality of preset neural networks (corresponding to a plurality of teacher neural networks)),
...and repeating selecting the teacher neural network, inputting the input data to the selected teacher neural network, and training the student neural network (pg. 3 [0040]: “Step 102: iteratively training the student network and obtaining a target network, through aligning distributions of features between a first middle layer and a second middle layer corresponding to the same training sample data, so as to transfer knowledge of features of a middle layer of the teacher network to the student network” and pg. 3 [0041]: “Where, the features of the first middle layer refer to feature maps output from a first specific network layer of the teacher network after the training sample data are provided to the teacher network, and the features of the second middle layer refer to feature maps output from a second specific network layer of the student network after the training sample data are provided to the student network” teach iteratively (corresponds to repeatedly) training the student neural network, wherein the iterative training comprises iteratively (repeatedly) obtaining and aligning distributions of features between a first and a second middle layer; the iterative obtaining of features from the first middle layer comprises selecting the teacher neural network and inputting training sample data into the selected teacher neural network; and the iterative obtaining of features from the second middle layer comprises training the student neural network with the training sample data).
HUANG et al. does not appear to explicitly teach inputting at least an input data to the selected teacher neural network to obtain a soft label output generated by the selected teacher neural network, and training a student neural network with at least the input data and the soft label output generated by the selected teacher neural network.
However, OH et al. teaches inputting at least an input data to the selected teacher neural network to obtain a soft label output generated by the selected teacher neural network (Fig. 5 and pg. 6 [0078]: “Referring to FIG. 5, the training data generating apparatus inputs input data 510 provided in a form of image to the teacher model ensemble 520. Also, the training data generating apparatus acquires label values including probabilities that a position of an object and a type of the object in the input data 510 match a preset class from the teacher model ensemble 520. The training data generating apparatus visually outputs the acquired label values to input data 530. While extracting the object in the input data 530, the teacher model ensemble 520 outputs a probability that the object is a pedestrian, for example, 0.92, a probability that the object is a cyclist, for example, 0.31, and a probability that the object is a car, for example, 0.001” teach inputting training data (input data) to the teacher model to obtain labels and their probabilities (correspond to soft labels); pg. 3 [0037]: “In an example, each of the teacher model 120 and the student model 150 are a model trained to generate output data with respect to input data and include, for example, a neural network” teaches teacher and student neural networks),
training a student neural network with at least the input data and the soft label output generated by the selected teacher neural network (Fig. 5 and pg. 6 [0079]: “The training data generating apparatus determines a label value associated with the pedestrian and a label value associated with the car to be a label value appropriate for training the student model 550 among the label values acquired from the teacher model ensemble 520” teach training a student model with the soft labels from the teacher model and based on the input data (see Fig. 5 element 510); pg. 3 [0037]: “In an example, each of the teacher model 120 and the student model 150 are a model trained to generate output data with respect to input data and include, for example, a neural network” teaches teacher and student models are neural networks).
HUANG et al. and OH et al. are analogous art to the claimed invention because they are directed to student and teacher neural networks.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the limitation(s) above as taught by OH et al. into the disclosed invention of HUANG et al.
One of ordinary skill in the art would have been motivated to make this modification in order to “[detect] data inappropriate for training the student model from output data output by the trained teacher model based on the input data. The training data generating apparatus changes output data inappropriate for training the student model, to an ignore value such that the inappropriate output data is not to be used for training the student model. In terms of the output data appropriate for training the student model, the training data generating apparatus changes the output data such that the student model outputs an improved result in comparison to output data of the teacher model additionally to apply the output data to the training of the student model directly” (OH et al. pg. 8 [0096]).
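As an illustration only, the loop recited in claim 1 (select a teacher, input the data to obtain that teacher's soft label output, train the student on the input data and soft label, and repeat) can be sketched as follows. This sketch is not taken from the application, HUANG et al., or OH et al.; the single-layer softmax "networks", the learning rate, the number of rounds, and the default random selection rule are all assumptions chosen to keep the example small.

```python
import math
import random


def softmax(logits):
    # Numerically stable softmax: turns raw scores into a probability vector (a soft label).
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear_logits(weights, x):
    # weights[k] is the (hypothetical) weight vector for class k; returns one score per class.
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def distill(teachers, inputs, num_classes, dim, rounds=200, lr=0.5,
            select=random.choice, seed=0):
    """Hypothetical sketch of the claimed loop: select a teacher, obtain its soft
    label for the input data, train the student on (input, soft label), repeat."""
    random.seed(seed)  # seeds the global RNG used by the default random.choice selector
    student = [[0.0] * dim for _ in range(num_classes)]
    for step in range(rounds):
        teacher = select(teachers)                 # selecting a teacher network
        x = inputs[step % len(inputs)]             # input data for this round
        soft = softmax(linear_logits(teacher, x))  # soft label output of the selected teacher
        pred = softmax(linear_logits(student, x))
        # For softmax cross-entropy with a soft target, d(loss)/d(logit_k) = pred_k - soft_k.
        for k in range(num_classes):
            g = pred[k] - soft[k]
            for j in range(dim):
                student[k][j] -= lr * g * x[j]
    return student
```

With a single teacher, the student's outputs converge toward that teacher's soft labels; with several teachers, swapping in a different `select` function changes only the selection step while the repeat loop stays the same.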
Regarding Claim 8,
HUANG et al. in view of OH et al. teaches the method of Claim 1.
OH et al. further teaches wherein training the student neural network with at least the input data and the soft label output generated by the selected teacher neural network includes: training the student neural network with at least the input data, the soft label output generated by the selected teacher neural network, and a correct data corresponding to the input data (Fig. 5 and pg. 6 [0079]: “The training data generating apparatus determines a label value associated with the pedestrian and a label value associated with the car to be a label value appropriate for training the student model 550 among the label values acquired from the teacher model ensemble 520” teach training a student model with the soft labels from the teacher model and based on the input data (see Fig. 5 element 510); pg. 3 [0037]: “In an example, each of the teacher model 120 and the student model 150 are a model trained to generate output data with respect to input data and include, for example, a neural network” teaches teacher and student models are neural networks; pg. 3 [0041]: “The training data generating apparatus generates input data to be input to an input layer of the student model 150 and training data matching truth data to be acquired based on the input data using the student model 150 in order to train the student model 150” teaches training the student neural network with input data and the truth data (corresponds to correct data) corresponding to the input data).
HUANG et al. and OH et al. are analogous art to the claimed invention because they are directed to student and teacher neural networks.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the limitation(s) above as taught by OH et al. into the disclosed invention of HUANG et al.
One of ordinary skill in the art would have been motivated to make this modification in order to “[detect] data inappropriate for training the student model from output data output by the trained teacher model based on the input data. The training data generating apparatus changes output data inappropriate for training the student model, to an ignore value such that the inappropriate output data is not to be used for training the student model. In terms of the output data appropriate for training the student model, the training data generating apparatus changes the output data such that the student model outputs an improved result in comparison to output data of the teacher model additionally to apply the output data to the training of the student model directly” (OH et al. pg. 8 [0096]).
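For illustration only, one common way to train a student with both the teacher's soft label output and the correct data (the limitation of claim 8) is to minimize a weighted sum of two cross-entropy terms. The sketch below assumes that formulation and a hypothetical mixing weight `alpha`; neither is stated in the claims or in OH et al. Because cross-entropy is linear in its target, the weighted sum equals a single cross-entropy against the blended target.

```python
import math


def softmax(logits):
    # Numerically stable softmax over a list of scores.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def combined_loss_and_grad(student_logits, soft_label, correct_class, alpha=0.5):
    """Assumed loss: alpha * CE(soft label) + (1 - alpha) * CE(one-hot correct data).
    Returns the loss and its gradient with respect to the student's logits."""
    pred = softmax(student_logits)
    hard = [1.0 if k == correct_class else 0.0 for k in range(len(pred))]
    # Blend the teacher's soft label with the one-hot correct data.
    target = [alpha * s + (1 - alpha) * h for s, h in zip(soft_label, hard)]
    # Skip zero-weight terms so a probability underflowing to 0 cannot produce 0 * log(0).
    loss = -sum(t * math.log(p) for t, p in zip(target, pred) if t > 0)
    grad = [p - t for p, t in zip(pred, target)]  # d(loss)/d(logits)
    return loss, grad
```

Setting `alpha=1.0` reduces this to training on the soft label alone, as in claim 1; values below 1 add the correct-data term.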
Regarding Claim 10,
Claim 10 recites analogous limitations to claim 1. Therefore, claim 10 is rejected based on the same rationale as claim 1. 
HUANG et al. further teaches An apparatus comprising: a processor or a programmable circuitry; and one or more computer readable mediums collectively including instructions that, when executed by the processor or the programmable circuitry, cause the processor or the programmable circuitry to perform operations including (pg. 6 [0093]-[0094] teach computer-readable storage medium, instructions, and processor).
Regarding Claim 16,
Claim 16 recites analogous limitations to claim 1. Therefore, claim 16 is rejected based on the same rationale as claim 1. 
HUANG et al. further teaches A computer program product including one or more computer readable storage mediums collectively storing program instructions that are executable by a processor or programmable circuitry to cause the processor or programmable circuitry to perform operations comprising (pg. 6 [0093]-[0094] teach computer-readable storage medium, instructions, and processor).

Claims 2, 3, 5, 11, 12, 14, 17, 18, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over HUANG et al. (US 2018/0365564 A1) in view of OH et al. (US 2019/0034764 A1) and further in view of KANG et al. (US 2017/0083829 A1).
Regarding Claim 2,
HUANG et al. in view of OH et al. teaches the method of Claim 1.
HUANG et al. further teaches wherein selecting the teacher neural network among the plurality of teacher neural networks includes (pg. 3 [0038]: “Step 101: selecting, by a training device, a teacher network performing the same functions of a student network” and pg. 3 [0039]: “The functions include such as image classification, object detection, semantic segmentation and so on. The teacher network is characterized by high performance and high accuracy; but, compared to the student network, it has some obvious disadvantages such as complex structure, a large number of parameters and weights and low computation speed. The student network is characterized by fast computation speed, average or poor performance, and simple network structure. The high-performance teacher network with the same functions of the student network could be selected from a set of preset neural network models” teach selecting a teacher neural network among a plurality of preset neural networks (corresponding to a plurality of teacher neural networks)).
HUANG et al. in view of OH et al. does not appear to explicitly teach randomly selecting the teacher neural network among the plurality of teacher neural networks.
However, KANG et al. teaches randomly selecting the teacher neural network among the plurality of teacher neural networks (pg. 8 [0133]: “the process of training a student model may be iteratively performed by randomly selecting at least one teacher model from a plurality of teacher models until the accuracy of the student model satisfies a predetermined condition(s)”).
HUANG et al., OH et al., and KANG et al. are analogous art to the claimed invention because they are directed to student and teacher neural networks.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the limitation(s) above as taught by KANG et al. into the disclosed invention of HUANG et al. in view of OH et al.
One of ordinary skill in the art would have been motivated to make this modification in order to “select at least one teacher model from a plurality of teacher models and train a student model using the selected at least one teacher model, thereby effectively increasing a learning rate and an accuracy of the student model” (KANG et al. pg. 8 [0131]).
Regarding Claim 3,
HUANG et al. in view of OH et al. teaches the method of Claim 1.
HUANG et al. further teaches wherein selecting the teacher neural network among the plurality of teacher neural networks includes (pg. 3 [0038]: “Step 101: selecting, by a training device, a teacher network performing the same functions of a student network” and pg. 3 [0039]: “The functions include such as image classification, object detection, semantic segmentation and so on. The teacher network is characterized by high performance and high accuracy; but, compared to the student network, it has some obvious disadvantages such as complex structure, a large number of parameters and weights and low computation speed. The student network is characterized by fast computation speed, average or poor performance, and simple network structure. The high-performance teacher network with the same functions of the student network could be selected from a set of preset neural network models” teach selecting a teacher neural network among a plurality of preset neural networks (corresponding to a plurality of teacher neural networks)).
HUANG et al. in view of OH et al. does not appear to explicitly teach selecting the teacher neural network based on an accuracy of the soft label output in comparison with a correct data corresponding to the input data.
However, KANG et al. teaches selecting the teacher neural network based on an accuracy of the soft label output in comparison with a correct data corresponding to the input data (pg. 1 [0022]: “The data recognizing method may further include selecting the teacher model based on accuracies of the plurality of teacher models or a correlation between output data of the plurality of teacher models, the output data corresponding to the input data” teaches selecting a teacher model based on accuracies of the teacher models (corresponds to comparing the output of the teacher model with the correct output), or based on the correlation (comparison) between the output of the teacher model and the actual output corresponding to the input data; pg. 3 [0047]: “the output data of the teacher model 110 may be a value of logic output from the teacher model 110, a probability value, or an output value of a classifier layer derived from a hidden layer of the teacher model 110” teaches that the output of the teacher model can be a probability value (soft label)).
HUANG et al., OH et al., and KANG et al. are analogous art to the claimed invention because they are directed to student and teacher neural networks.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the limitation(s) above as taught by KANG et al. into the disclosed invention of HUANG et al. in view of OH et al.
One of ordinary skill in the art would have been motivated to make this modification in order to “select at least one teacher model from a plurality of teacher models and train a student model using the selected at least one teacher model, thereby effectively increasing a learning rate and an accuracy of the student model” (KANG et al. pg. 8 [0131]).
Regarding Claim 5,
HUANG et al. in view of OH et al. and further in view of KANG et al. teaches the method of Claim 3.
HUANG et al. further teaches wherein selecting the teacher neural network among the plurality of teacher neural networks includes (pg. 3 [0038]: “Step 101: selecting, by a training device, a teacher network performing the same functions of a student network” and pg. 3 [0039]: “The functions include such as image classification, object detection, semantic segmentation and so on. The teacher network is characterized by high performance and high accuracy; but, compared to the student network, it has some obvious disadvantages such as complex structure, a large number of parameters and weights and low computation speed. The student network is characterized by fast computation speed, average or poor performance, and simple network structure. The high-performance teacher network with the same functions of the student network could be selected from a set of preset neural network models” teach selecting a teacher neural network among a plurality of preset neural networks (corresponding to a plurality of teacher neural networks)).
KANG et al. further teaches selecting the teacher neural network that outputs the soft label output that is a closest to the correct data corresponding to the input data, among the plurality of teacher neural networks (pg. 3-4 [0059]: “The model training apparatus may select one teacher model from the plurality of teacher models based on accuracies of the plurality of teacher models. For example, the model training apparatus may select a teacher model having a highest accuracy from the plurality of teacher models” teaches selecting the teacher model with the highest accuracy, i.e., the teacher model whose output is closest to the correct output is selected; pg. 3 [0047]: “the output data of the teacher model 110 may be a value of logic output from the teacher model 110, a probability value, or an output value of a classifier layer derived from a hidden layer of the teacher model 110” teaches that the output of the teacher model can be a probability value (soft label)).
HUANG et al., OH et al., and KANG et al. are analogous art to the claimed invention because they are directed to student and teacher neural networks.
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the limitation(s) above as taught by KANG et al. into the disclosed invention of HUANG et al. in view of OH et al.
One of ordinary skill in the art would have been motivated to make this modification in order to “select at least one teacher model from a plurality of teacher models and train a student model using the selected at least one teacher model, thereby effectively increasing a learning rate and an accuracy of the student model” (KANG et al. pg. 8 [0131]).
Regarding Claim 11,
Claim 11 recites analogous limitations to claim 2. Therefore, claim 11 is rejected based on the same rationale as claim 2. 
Regarding Claim 12,
Claim 12 recites analogous limitations to claim 3. Therefore, claim 12 is rejected based on the same rationale as claim 3. 
Regarding Claim 14,
Claim 14 recites analogous limitations to claim 5. Therefore, claim 14 is rejected based on the same rationale as claim 5. 
Regarding Claim 17,
Claim 17 recites analogous limitations to claim 2. Therefore, claim 17 is rejected based on the same rationale as claim 2. 
Regarding Claim 18,
Claim 18 recites analogous limitations to claim 3. Therefore, claim 18 is rejected based on the same rationale as claim 3. 
Regarding Claim 20,
Claim 20 recites analogous limitations to claim 5. Therefore, claim 20 is rejected based on the same rationale as claim 5. 

Claims 6-7 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over HUANG et al. (US 2018/0365564 A1) in view of OH et al. (US 2019/0034764 A1) and further in view of Schalkwyk et al. (US 2015/0340034 A1).
Regarding Claim 6,
HUANG et al. in view of OH et al. teaches the method of Claim 1.
HUANG et al. in view of OH et al. does not appear to explicitly teach wherein the input data is audio data and the soft label output is a classification of the audio data.
However, Schalkwyk et al. teaches wherein the input data is audio data and the soft label output is a classification of the audio data (pg. 2 [0016]: “The acoustic model 110 is a neural network-based model that that receives an audio input and generates a respective score for each of a set of phoneme label sequences, e.g., phoneme label scores 116 for the audio input 102” and pg. 2 [0018]: “The phoneme CTC layer 114 receives the acoustic LSTM output generated by the LSTM memory blocks 112 and generates a set of phoneme label scores in accordance with current values of a set of phoneme CTC parameters. For example, the phoneme CTC layer 114 may be a softmax classifier layer that generates scores for each of the phoneme labels, with the scores being probabilities that the corresponding phoneme label represents the audio input and, if the phoneme labels include a "blank" label, the score for the "blank" label being a probability that none of the other phoneme labels accurately represent the audio input” teach a neural network-based acoustic model that receives audio input and generates a probability label (soft label) as the classification of the audio data).
HUANG et al., OH et al., and Schalkwyk et al. are analogous art to the claimed invention because they are directed to neural network-based classification.
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the limitation(s) above as taught by Schalkwyk et al. into the disclosed invention of HUANG et al. in view of OH et al.
One of ordinary skill in the art would have been motivated to make this modification because “[by] using a speech recognition system that is entirely neural network-based, the performance of the speech recognition system on previously unseen audio data can be improved” (Schalkwyk et al. pg. 1 [0005]).
Regarding Claim 7,
HUANG et al. in view of OH et al. and further in view of Schalkwyk et al. teaches the method of Claim 6.
Schalkwyk et al. further teaches wherein the classification of the audio data identifies phonemes (pg. 2 [0016]: “The acoustic model 110 is a neural network-based model that that receives an audio input and generates a respective score for each of a set of phoneme label sequences, e.g., phoneme label scores 116 for the audio input 102” and pg. 2 [0018]: “The phoneme CTC layer 114 receives the acoustic LSTM output generated by the LSTM memory blocks 112 and generates a set of phoneme label scores in accordance with current values of a set of phoneme CTC parameters. For example, the phoneme CTC layer 114 may be a softmax classifier layer that generates scores for each of the phoneme labels, with the scores being probabilities that the corresponding phoneme label represents the audio input and, if the phoneme labels include a "blank" label, the score for the "blank" label being a probability that none of the other phoneme labels accurately represent the audio input” teach a neural network-based acoustic model that provides classification of audio data identifying phonemes).
HUANG et al., OH et al., and Schalkwyk et al. are analogous art to the claimed invention because they are directed to neural network-based classification.
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the limitation(s) above as taught by Schalkwyk et al. into the disclosed invention of HUANG et al. in view of OH et al.
One of ordinary skill in the art would have been motivated to make this modification because “[by] using a speech recognition system that is entirely neural network-based, the performance of the speech recognition system on previously unseen audio data can be improved” (Schalkwyk et al. pg. 1 [0005]).
Regarding Claim 15,
Claim 15 recites analogous limitations to claim 6. Therefore, claim 15 is rejected based on the same rationale as claim 6. 

Response to Arguments
Applicant's arguments filed on 01/24/2022 with respect to the Double Patenting rejection have been fully considered but they are not persuasive. Applicant asserts “that this is a provisional rejection and that no further response is needed at this time. Although Applicant disagrees with the rejection, Applicant will consider filing a terminal disclaimer after such time as the co-pending Application issues and all other matters in the present Application are resolved” (Remarks, pg. 10).
Examiner’s Response:
Applicant did not put forth specific arguments regarding Applicant’s disagreement with the Double Patenting rejection. Please see above for a detailed explanation of the current Double Patenting rejection necessitated by the amendments.

Applicant's arguments filed on 01/24/2022 with respect to the 35 U.S.C. 103 rejection to independent claims 1, 10, and 16 and dependent claims 2, 3, 5-8, 11, 12, 14, 15, 17, 18, and 20 have been fully considered but they are not persuasive.
Applicant asserts that “[w]ith regard to the rejection of the claim feature reciting "training a student neural network with at least the input data and the soft label output generated by the selected teacher neural network", the Examiner contends on page 10 of the present Office Action that this feature is taught by the discussion in paragraph [0131] of Kang regarding a teacher model. However, it is respectfully asserted that Kang does not discuss any particular data used in this broad statement regarding training in the Kang specification” (Remarks, pg. 12).
Examiner’s Response:
Examiner notes that the previous Office Action, on pg. 10, does not assert that the Kang reference teaches the limitation “training a student neural network with at least the input data and the soft label output generated by the selected teacher neural network.” In contrast, the previous (and current) Office Action asserts that the limitation “training a student neural network with at least the input data and the soft label output generated by the selected teacher neural network” is disclosed by the Oh reference.

Applicant asserts that “The Examiner further contends that Oh is used for the data type in combination with Kang on page 10 of the present Office Action. However, it is respectfully asserted that there does not appear to be any reference to training being performed "with at least the input data and the soft label output generated by the selected teacher network", as essentially recited in independent claims 1, 10, and 16, but rather the art is silent regarding the same” (Remarks, pg. 12).
Examiner’s Response:
The Examiner respectfully disagrees. As indicated in the rejection above, OH et al. teaches training a student neural network with at least the input data and the soft label output generated by the selected teacher neural network (Fig. 5 and pg. 6 [0079]: “The training data generating apparatus determines a label value associated with the pedestrian and a label value associated with the car to be a label value appropriate for training the student model 550 among the label values acquired from the teacher model ensemble 520” teach training a student model with the soft labels from the teacher model and based on the input data (see Fig. 5 element 510); pg. 3 [0037]: “In an example, each of the teacher model 120 and the student model 150 are a model trained to generate output data with respect to input data and include, for example, a neural network” teaches that the teacher and student models are neural networks). Please see the current prior art rejection for additional information. More specifically, OH et al. in Fig. 5 element 510 teaches that the input data is used to train the student neural network, and pg. 6 [0079] teaches that soft labels (in the form of label values) from the teacher model are used to train the student neural network.

Applicant asserts that “[i]t is respectfully asserted that claims 1, 10, and 16, as amended, are clearly not taught or suggested by the presently cited art at least because none of the cited references discuss "repeating selecting the teacher neural network, inputting the input data to the selected teacher neural network, and training the student neural network", as essentially recited in amended claims 1, 10, and 16, but rather are silent regarding the same. It is further respectfully asserted that claims 1, 10, and 16 are not taught or suggested by the cited art at least because of the inclusion of a portion of the indicated allowable subject matter from claim 4 in amended claims 1, 10, and 16” (Remarks, pg. 13).
Examiner’s Response:
The Examiner respectfully disagrees. The previous Office Action indicated that claim 4 “[is] objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims, and if the ground(s) of rejection to claims 4, 13, and 19 are overcome” (Office Action mailed on 10/22/2021, pg. 19). However, amended claim 1 does not incorporate all of the limitations of claim 4. The incorporation of a subset of limitations from original claim 4 to claim 1 changes the scope of claim 1, thus necessitating a new ground of rejection.
As indicated in the rejection above, HUANG et al. teaches repeating selecting the teacher neural network, inputting the input data to the selected teacher neural network, and training the student neural network (pg. 3 [0040]: “Step 102: iteratively training the student network and obtaining a target network, through aligning distributions of features between a first middle layer and a second middle layer corresponding to the same training sample data, so as to transfer knowledge of features of a middle layer of the teacher network to the student network” and pg. 3 [0041]: “Where, the features of the first middle layer refer to feature maps output from a first specific network layer of the teacher network after the training sample data are provided to the teacher network, and the features of the second middle layer refer to feature maps output from a second specific network layer of the student network after the training sample data are provided to the student network” teach iteratively (corresponding to repeatedly) training the student neural network, wherein the iterative training comprises iteratively (repeatedly) obtaining and aligning distributions of features between a first and second middle layer; the iterative obtaining of features from the first middle layer comprises selecting the teacher neural network and inputting training sample data into the selected teacher neural network; the iterative obtaining of features from the second middle layer comprises training the student neural network with the training sample data).

Applicant relies on the above arguments regarding independent claims 1, 10, and 16 for dependent claims 2, 3, 5-8, 11, 12, 14, 15, 17, 18, and 20; therefore, the above corresponding responses are applicable to the aforementioned dependent claims.

Allowable Subject Matter
Claims 4, 9, 13, and 19 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to YING YU CHEN whose telephone number is (571)270-1484. The examiner can normally be reached Monday-Friday 7:30 am-5:00 pm (EST).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kamran Afshar can be reached on (571) 272-7796. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/YING YU CHEN/Examiner, Art Unit 2125