DETAILED ACTION
Notice of Pre-AIA  or AIA  Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. This Office action is in response to an application filed on 01/19/2020. The applicant submitted an Information Disclosure Statement dated 01/29/2021. The applicant does not claim domestic priority. The applicant claims foreign priority to a Chinese application dated 02/02/2019.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1 – 10 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea, namely a mental concept of evaluation or observation, without significantly more. The claims recite a neural network training method. The claims fail the first prong of the 2019 Subject Matter Eligibility Guidance. The independent claim features of training a neural network using sample data, determining an indicator parameter, determining an update manner, and updating a parameter of a batch normalization layer are broad. The features do not recite with specificity any structure that gathers specific sample data. The claims do not further identify what the neural network is trained to do. USPTO guidance Example 39 shows the requirements for claiming the training of a neural network. The example is specific in defining the data as digital facial images collected from a database, which are processed through transformation operations, and the network is trained through the creation of training sets. This judicial exception is not integrated into a practical application because the claims do not identify what the network is trained to do. The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception because the claims do not identify any structure, such as a database, any operations that are performed on the data, or how the neural network is trained to perform a specific operation. Thus, the claims fail the second prong of the 2019 Subject Matter Eligibility Guidance and are not patentable.
Claims 11 – 19 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea, namely a mental concept of evaluation or observation, without significantly more. The claims recite a neural network training apparatus. The claims fail the first prong of the 2019 Subject Matter Eligibility Guidance. The independent claim features of training a neural network using sample data, determining an indicator parameter, determining an update manner, and updating a parameter of a batch normalization layer are broad. The features do not recite with specificity any structure that gathers specific sample data. The claims do not further identify what the neural network is trained to do. USPTO guidance Example 39 shows the requirements for claiming the training of a neural network. The example is specific in defining the data as digital facial images collected from a database, which are processed through transformation operations, and the network is trained through the creation of training sets. This judicial exception is not integrated into a practical application because the claims do not identify what the network is trained to do. The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception because the claims do not identify any structure, such as a database, any operations that are performed on the data, or how the neural network is trained to perform a specific operation. Thus, the claims fail the second prong of the 2019 Subject Matter Eligibility Guidance and are not patentable.
Claim 20 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea, namely a mental concept of evaluation or observation, without significantly more. The claim recites a computer readable media on which a computer program instruction is stored. The claim fails the first prong of the 2019 Subject Matter Eligibility Guidance. The independent claim features of training a neural network using sample data, determining an indicator parameter, determining an update manner, and updating a parameter of a batch normalization layer are broad. The features do not recite with specificity any structure that gathers specific sample data. The claim does not further identify what the neural network is trained to do. USPTO guidance Example 39 shows the requirements for claiming the training of a neural network. The example is specific in defining the data as digital facial images collected from a database, which are processed through transformation operations, and the network is trained through the creation of training sets. This judicial exception is not integrated into a practical application because the claim does not identify what the network is trained to do. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception because the claim does not identify any structure, such as a database, any operations that are performed on the data, or how the neural network is trained to perform a specific operation. Thus, the claim fails the second prong of the 2019 Subject Matter Eligibility Guidance and is not patentable.


Claim 20 is rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter. Claim 20 encompasses transitory signals (computer readable media that are not limited to non-transitory forms) or a set of instructions (such as a game or software per se), which are not included in the four patent-eligible subject matter categories. The claim needs to be amended to recite "a non-transitory computer readable media" if supported by the specification.

Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA  35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claims 1 – 20 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claims contain subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventors, at the time the application was filed, had possession of the claimed invention. The claims do not identify any specific features of the sample data, parameters, or thresholds for which the operations are to be performed. Claims 7, 10, 17, and 19 recite a feature map; however, that feature is not determinative in the training of the neural network.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 1 – 8, 11 – 18, and 20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention. The claims contain the term “if”, which is conditional. The MPEP allows for positive and negative claiming but not conditional claiming. The applicant is encouraged either to positively claim the features by deleting the term “if” or to negatively claim the features.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1 – 20 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Towal (US 2016/0321540).
As per claim 1, A neural network training method, comprising: 
training a first neural network to be trained by using sample data; (Towal paragraph 0071 discloses, “As an example, the network may be tasked with discriminating between a dog and a cat. In this example, a limited number of training samples or an error in training may be present.”)
determining an indicator parameter of the first neural network in a current training process; (Towal paragraph 0076 discloses, “FIG. 8 illustrates an example of filters 800 trained from a first training iteration (epoch 1) and the same filters 800 after a ninetieth training iteration (epoch 90). The training iterations may sometimes be referred to as training passes. In this example, a data set may have a specific number of images, such as ten thousand. The training uses the images from the data sets to adjust the weights of the filters based on the weight update equation (EQUATION 3). The weights of the filters may be adjusted after training on a specific number of images from the data set, such as one hundred images.”)
determining an update manner corresponding to a preset condition if the indicator parameter meets the preset condition; (Towal paragraph 0057 discloses, “The results of the deep neural network may then be thresholded 522 and passed through an exponential smoothing block 524 in the classify application 510.”) and 
updating a parameter of a batch normalization layer in the first neural network based on the update manner. (Towal paragraph 0063 discloses, “The first convolution layer 604 outputs the results of the convolution to the second convolution layer 606. Furthermore, the second convolution layer 606 outputs the results of the convolution to a third convolution layer 608. Finally, a predicted label 610 is output from the third convolution layer 608. Of course, aspects of the present disclosure are not limited to three convolution layers and more or less convolution layers may be specified as desired.”)
As per claim 2, The neural network training method of claim 1, wherein the determining an update manner corresponding to a preset condition if the indicator parameter meets the preset condition comprises: 
if the indicator parameter meets the preset condition, determining that the update manner is to reduce a translation parameter of the batch normalization layer by a sum of a penalty parameter and a product of a gradient and a learning rate that are updated when each training is performed through backpropagation. (Towal paragraph 0043 discloses, “In lower layers, the gradient may depend on the value of the weights and on the computed error gradients of the higher layers. The weights may then be adjusted so as to reduce the error. This manner of adjusting the weights may be referred to as “back propagation” as it involves a “backward pass” through the neural network.”)
As per claim 3, The neural network training method of claim 2, wherein the determining an update manner corresponding to a preset condition if the indicator parameter meets the preset condition further comprises: 
if the indicator parameter does not meet the preset condition, determining that the update manner is to reduce a translation parameter of the batch normalization layer by a product of a gradient and a learning rate that are updated when each training is performed through backpropagation. (Towal paragraph 0083 discloses, “In another configuration, the training of filters that have a particular specificity is terminated to reduce computation costs. That is, the learning of filters that have a specificity that is greater than or equal to a threshold is stopped so that the weights of the filters are no longer updated.”)
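For reference only, the two update manners recited in claims 2 and 3 can be read as the following arithmetic. This is a minimal sketch of the claim language as written, not an implementation taken from the application or from Towal; the function name update_translation and all argument names are hypothetical.

```python
def update_translation(beta, grad, lr, penalty, condition_met):
    """Return the updated translation (shift) parameter of a batch normalization layer.

    All names are illustrative assumptions:
      beta          current translation parameter
      grad          gradient obtained through backpropagation
      lr            learning rate
      penalty       penalty parameter (used only when the preset condition is met)
      condition_met whether the indicator parameter meets the preset condition
    """
    step = grad * lr
    if condition_met:
        # Claim 2: reduce by the sum of the penalty and (gradient * learning rate).
        return beta - (penalty + step)
    # Claim 3: reduce by (gradient * learning rate) only.
    return beta - step
```

Under this reading, the only difference between the two update manners is the additional penalty term subtracted when the preset condition is met.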
As per claim 4, The neural network training method of claim 1, wherein the determining an indicator parameter of the first neural network in a current training process comprises: 
determining a translation parameter of the batch normalization layer of the first neural network in the current training process, wherein the determining an update manner corresponding to a preset condition if the indicator parameter meets the preset condition comprises: (Towal paragraph 0059 discloses, “According to certain aspects of the present disclosure, each local processing unit 202 may be configured to determine parameters of the model based upon desired one or more functional features of the model, and develop the one or more functional features towards the desired functional features as the determined parameters are further adapted, tuned and updated.”)
determining the update manner corresponding to the preset condition if the translation parameter is greater than a predetermined translation threshold. (Towal paragraph 0059 discloses, “According to certain aspects of the present disclosure, each local processing unit 202 may be configured to determine parameters of the model based upon desired one or more functional features of the model, and develop the one or more functional features towards the desired functional features as the determined parameters are further adapted, tuned and updated.”)
As per claim 5, The neural network training method of claim 1, wherein the determining an indicator parameter of the first neural network in a current training process comprises: 
determining times of training of the first neural network in the current training process, wherein the determining an update manner corresponding to a preset condition if the indicator parameter meets the preset condition comprises: (Towal paragraph 0093 discloses, “In one configuration, if the determined specificity of a filter is greater than a threshold, the network stops training the filter (block 1208). Additionally, or alternatively, the network stops training the filter (block 1208) when a change in the specificity of the specific filter is less than a threshold after the predetermined number of training iterations. In another configuration, as shown in block 1210, a filter is eliminated from the neural network model when the specificity of the specific filter is less than a threshold after the predetermined number of training iterations.”)
determining the update manner corresponding to the preset condition if the times of training is greater than a predetermined times threshold. (Towal paragraph 0093 discloses, “In one configuration, if the determined specificity of a filter is greater than a threshold, the network stops training the filter (block 1208). Additionally, or alternatively, the network stops training the filter (block 1208) when a change in the specificity of the specific filter is less than a threshold after the predetermined number of training iterations. In another configuration, as shown in block 1210, a filter is eliminated from the neural network model when the specificity of the specific filter is less than a threshold after the predetermined number of training iterations.”)
As per claim 6, The neural network training method of claim 1, wherein the determining an indicator parameter of the first neural network in a current training process comprises: 
determining training precision of the first neural network in the current training process, wherein the determining an update manner corresponding to a preset condition if the indicator parameter meets the preset condition comprises: (Towal paragraph 0093 discloses, “In one configuration, if the determined specificity of a filter is greater than a threshold, the network stops training the filter (block 1208). Additionally, or alternatively, the network stops training the filter (block 1208) when a change in the specificity of the specific filter is less than a threshold after the predetermined number of training iterations. In another configuration, as shown in block 1210, a filter is eliminated from the neural network model when the specificity of the specific filter is less than a threshold after the predetermined number of training iterations.”)
determining the update manner corresponding to the preset condition if the training precision is greater than a predetermined precision threshold. (Towal paragraph 0093 discloses, “In one configuration, if the determined specificity of a filter is greater than a threshold, the network stops training the filter (block 1208). Additionally, or alternatively, the network stops training the filter (block 1208) when a change in the specificity of the specific filter is less than a threshold after the predetermined number of training iterations. In another configuration, as shown in block 1210, a filter is eliminated from the neural network model when the specificity of the specific filter is less than a threshold after the predetermined number of training iterations.”)
As per claim 7, The neural network training method of claim 1, wherein the determining an indicator parameter of the first neural network in a current training process comprises: 
determining a ratio of zero elements to all elements in a feature map output from one or more layers of the first neural network in the current training process, wherein the determining an update manner corresponding to a preset condition if the indicator parameter meets the preset condition comprises: (Towal paragraph 0070 discloses, “FIG. 7 illustrates a set of weak filters 702 compared to a set of strong filters 704. As shown in FIG. 7, the weak filters 702 do not have specific definitions. For example, each of the weak filters 702 is generalized and does not have a well-defined outline. In contrast, the definition of the strong filters 704 is greater than the definition of the weak filters 702, such that various lines and angles are visible. The strong filters 704 improve the detection of specific features of an input, such as whether one or more horizontal lines are present in an image.”)
determining the update manner corresponding to the preset condition if the ratio of zero elements to all elements is less than a first ratio threshold. (Towal paragraph 0070 discloses, “FIG. 7 illustrates a set of weak filters 702 compared to a set of strong filters 704. As shown in FIG. 7, the weak filters 702 do not have specific definitions. For example, each of the weak filters 702 is generalized and does not have a well-defined outline. In contrast, the definition of the strong filters 704 is greater than the definition of the weak filters 702, such that various lines and angles are visible. The strong filters 704 improve the detection of specific features of an input, such as whether one or more horizontal lines are present in an image.”)
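For reference only, the indicator parameter recited in claim 7 can be read as the following computation. This is a minimal sketch under the assumption that a feature map is a two-dimensional list of numbers; the function names zero_ratio and meets_condition and the argument first_ratio_threshold are hypothetical and are not taken from the application or from Towal.

```python
def zero_ratio(feature_map):
    """Return the ratio of zero elements to all elements in a 2-D feature map."""
    flat = [value for row in feature_map for value in row]
    if not flat:
        raise ValueError("feature map must contain at least one element")
    zeros = sum(1 for value in flat if value == 0)
    return zeros / len(flat)


def meets_condition(feature_map, first_ratio_threshold):
    """Claim 7 reading: the condition is met when the zero ratio is below the threshold."""
    return zero_ratio(feature_map) < first_ratio_threshold
```

For example, a 2 x 2 feature map with two zero elements has a zero ratio of 0.5, which meets the condition for a hypothetical first ratio threshold of 0.6 and does not meet it for 0.4.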
As per claim 8, The neural network training method of claim 7, wherein the determining an update manner corresponding to a preset condition if the indicator parameter meets the preset condition comprises: 
determining whether the ratio of zero elements to all elements is less than a second ratio threshold if the ratio of zero elements to all elements is greater than the first ratio threshold, wherein the second ratio threshold is greater than the first ratio threshold; (Towal paragraph 0077 discloses, “As shown in FIG. 8 at the first training pass, each filter has a specific entropy. For example, a first filter 802 has an entropy of 2.006 and a second filter 804 has an entropy of 2.018. The filters in the first training pass are ordered from low entropy to high entropy. Furthermore, as shown in FIG. 8, the entropy of each filter is modified after the ninetieth training pass (epoch 90). The filters in the ninetieth training pass are ordered from low entropy to high entropy. It should be noted that because the filters in both epoch 1 and epoch 90 are ordered from low entropy to high entropy, the same filters do not have the same positions in each figure. That is, the first filter 808 of epoch 1 may or may not be the first filter 808 of epoch 90. In other words, the first filter 802 of epoch 1 may have had a greater change in entropy in comparison to neighboring filters such that the first filter 802 of epoch 1 may be, for example, an eleventh filter 814 of epoch 90.”) and 
determining the update manner corresponding to the preset condition if the ratio of zero elements to all elements is less than the second ratio threshold. (Towal paragraph 0077 discloses, “As shown in FIG. 8 at the first training pass, each filter has a specific entropy. For example, a first filter 802 has an entropy of 2.006 and a second filter 804 has an entropy of 2.018. The filters in the first training pass are ordered from low entropy to high entropy. Furthermore, as shown in FIG. 8, the entropy of each filter is modified after the ninetieth training pass (epoch 90). The filters in the ninetieth training pass are ordered from low entropy to high entropy. It should be noted that because the filters in both epoch 1 and epoch 90 are ordered from low entropy to high entropy, the same filters do not have the same positions in each figure. That is, the first filter 808 of epoch 1 may or may not be the first filter 808 of epoch 90. In other words, the first filter 802 of epoch 1 may have had a greater change in entropy in comparison to neighboring filters such that the first filter 802 of epoch 1 may be, for example, an eleventh filter 814 of epoch 90.”)
As per claim 9, The neural network training method of claim 7, wherein the first ratio threshold is updated as a number of iterations increases. (Towal paragraph 0030 discloses, “Specifically, in one configuration, when training a neural network model, a specificity of one or more filters is determined after a predetermined number of training iterations. Furthermore, in this configuration, the network determines whether to continue training each filter based on the specificity.” and paragraph 0063 discloses various iterations)
As per claim 10, The neural network training method of claim 1, wherein the determining an indicator parameter of the first neural network in a current training process comprises: 
outputting a first feature map of the sample data through a predetermined layer of the first neural network in the current training process; (Towal paragraph 0049 discloses, “The outputs of the convolutional connections may be considered to form a feature map in the subsequent layer 318 and 320, with each element of the feature map (e.g., 320) receiving input from a range of neurons in the previous layer (e.g., 318) and from each of the multiple channels. The values in the feature map may be further processed with a non-linearity, such as a rectification, max(0,x). Values from adjacent neurons may be further pooled, which corresponds to down sampling, and may provide additional local invariance and dimensionality reduction. Normalization, which corresponds to whitening, may also be applied through lateral inhibition between neurons in the feature map.”)
outputting a second feature map of the sample data through a corresponding predetermined layer of a trained second neural network; (Towal paragraph 0049 discloses, “The outputs of the convolutional connections may be considered to form a feature map in the subsequent layer 318 and 320, with each element of the feature map (e.g., 320) receiving input from a range of neurons in the previous layer (e.g., 318) and from each of the multiple channels. The values in the feature map may be further processed with a non-linearity, such as a rectification, max(0,x). Values from adjacent neurons may be further pooled, which corresponds to down sampling, and may provide additional local invariance and dimensionality reduction. Normalization, which corresponds to whitening, may also be applied through lateral inhibition between neurons in the feature map.”) and 
determining the indicator parameter of the first neural network in the current training process based on a loss function value between the first feature map and the second feature map. (Towal paragraph 0049 discloses, “The outputs of the convolutional connections may be considered to form a feature map in the subsequent layer 318 and 320, with each element of the feature map (e.g., 320) receiving input from a range of neurons in the previous layer (e.g., 318) and from each of the multiple channels. The values in the feature map may be further processed with a non-linearity, such as a rectification, max(0,x). Values from adjacent neurons may be further pooled, which corresponds to down sampling, and may provide additional local invariance and dimensionality reduction. Normalization, which corresponds to whitening, may also be applied through lateral inhibition between neurons in the feature map.”)
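For reference only, the indicator parameter recited in claim 10 can be read as a loss value computed between two feature maps of the same sample data. The claim does not name a particular loss function; the sketch below assumes mean squared error purely for illustration. The function name feature_map_loss and its arguments are hypothetical and are not taken from the application or from Towal.

```python
def feature_map_loss(first_map, second_map):
    """Return the mean squared error between two 2-D feature maps.

    first_map  is assumed to come from the first network being trained.
    second_map is assumed to come from the trained second network.
    Mean squared error is an assumption; the claim only recites "a loss function value".
    """
    a = [value for row in first_map for value in row]
    b = [value for row in second_map for value in row]
    if not a or len(a) != len(b):
        raise ValueError("feature maps must be non-empty and the same size")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
```

Under this reading, the resulting value would then be compared against a preset condition in the same way as the other indicator parameters.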
As per claim 11, A neural network training apparatus, comprising: 
a processor; (Towal paragraph 0013 discloses, “Another aspect of the present disclosure is directed to an apparatus for training a neural network model having a memory and one or more processors coupled to the memory. The processor(s) is configured to determine a specificity of multiple filters after a predetermined number of training iterations. The processor(s) is also configured to train each of the filters based on the specificity.”) and 
a memory on which a computer program instruction is stored, wherein when the computer program instruction is executed by the processor, the processor performs the following steps: (Towal paragraph 0013 discloses, “Another aspect of the present disclosure is directed to an apparatus for training a neural network model having a memory and one or more processors coupled to the memory. The processor(s) is configured to determine a specificity of multiple filters after a predetermined number of training iterations. The processor(s) is also configured to train each of the filters based on the specificity.”)
training a first neural network to be trained by using sample data; (Towal paragraph 0071 discloses, “As an example, the network may be tasked with discriminating between a dog and a cat. In this example, a limited number of training samples or an error in training may be present.”)
determining an indicator parameter of the first neural network in a current training process; (Towal paragraph 0093 discloses, “In one configuration, if the determined specificity of a filter is greater than a threshold, the network stops training the filter (block 1208). Additionally, or alternatively, the network stops training the filter (block 1208) when a change in the specificity of the specific filter is less than a threshold after the predetermined number of training iterations. In another configuration, as shown in block 1210, a filter is eliminated from the neural network model when the specificity of the specific filter is less than a threshold after the predetermined number of training iterations.”)
determining an update manner corresponding to a preset condition if the indicator parameter meets the preset condition; (Towal paragraph 0057 discloses, “The results of the deep neural network may then be thresholded 522 and passed through an exponential smoothing block 524 in the classify application 510.”)  and 
updating a parameter of a batch normalization layer in the first neural network based on the update manner. (Towal paragraph 0063 discloses, “The first convolution layer 604 outputs the results of the convolution to the second convolution layer 606. Furthermore, the second convolution layer 606 outputs the results of the convolution to a third convolution layer 608. Finally, a predicted label 610 is output from the third convolution layer 608. Of course, aspects of the present disclosure are not limited to three convolution layers and more or less convolution layers may be specified as desired.”)
As per claim 12, The neural network training apparatus of claim 11, wherein the determining an update manner corresponding to a preset condition if the indicator parameter meets the preset condition comprises: 
if the indicator parameter meets the preset condition, determining that the update manner is to reduce a translation parameter of the batch normalization layer by a sum of a penalty parameter and a product of a gradient and a learning rate that are updated when each training is performed through backpropagation. (Towal paragraph 0043 discloses, “In lower layers, the gradient may depend on the value of the weights and on the computed error gradients of the higher layers. The weights may then be adjusted so as to reduce the error. This manner of adjusting the weights may be referred to as “back propagation” as it involves a “backward pass” through the neural network.”)
As per claim 13, The neural network training apparatus of claim 12, wherein the determining an update manner corresponding to a preset condition if the indicator parameter meets the preset condition further comprises: 
if the indicator parameter does not meet the preset condition, determining that the update manner is to reduce a translation parameter of the batch normalization layer by a product of a gradient and a learning rate that are updated when each training is performed through backpropagation. (Towal paragraph 0083 discloses, “In another configuration, the training of filters that have a particular specificity is terminated to reduce computation costs. That is, the learning of filters that have a specificity that is greater than or equal to a threshold is stopped so that the weights of the filters are no longer updated.”)
As per claim 14, The neural network training apparatus of claim 11, wherein the determining an indicator parameter of the first neural network in a current training process comprises: 
determining a translation parameter of the batch normalization layer of the first neural network in the current training process; wherein the determining an update manner corresponding to a preset condition if the indicator parameter meets the preset condition comprises: (Towal paragraph 0059 discloses, “According to certain aspects of the present disclosure, each local processing unit 202 may be configured to determine parameters of the model based upon desired one or more functional features of the model, and develop the one or more functional features towards the desired functional features as the determined parameters are further adapted, tuned and updated.”)
determining the update manner corresponding to the preset condition if the translation parameter is greater than a predetermined translation threshold. (Towal paragraph 0059 discloses, “According to certain aspects of the present disclosure, each local processing unit 202 may be configured to determine parameters of the model based upon desired one or more functional features of the model, and develop the one or more functional features towards the desired functional features as the determined parameters are further adapted, tuned and updated.”)
As per claim 15, The neural network training apparatus of claim 11, wherein the determining an indicator parameter of the first neural network in a current training process comprises: 
determining times of training of the first neural network in the current training process; wherein the determining an update manner corresponding to a preset condition if the indicator parameter meets the preset condition comprises: (Towal paragraph 0093 discloses, “In one configuration, if the determined specificity of a filter is greater than a threshold, the network stops training the filter (block 1208). Additionally, or alternatively, the network stops training the filter (block 1208) when a change in the specificity of the specific filter is less than a threshold after the predetermined number of training iterations. In another configuration, as shown in block 1210, a filter is eliminated from the neural network model when the specificity of the specific filter is less than a threshold after the predetermined number of training iterations.”)
determining the update manner corresponding to the preset condition if the times of training is greater than a predetermined times threshold. (Towal paragraph 0093 discloses, “In one configuration, if the determined specificity of a filter is greater than a threshold, the network stops training the filter (block 1208). Additionally, or alternatively, the network stops training the filter (block 1208) when a change in the specificity of the specific filter is less than a threshold after the predetermined number of training iterations. In another configuration, as shown in block 1210, a filter is eliminated from the neural network model when the specificity of the specific filter is less than a threshold after the predetermined number of training iterations.”)
As per claim 16, The neural network training apparatus of claim 11, wherein the determining an indicator parameter of the first neural network in a current training process comprises: 
determining training precision of the first neural network in the current training process; wherein the determining an update manner corresponding to a preset condition if the indicator parameter meets the preset condition comprises: (Towal paragraph 0093 discloses, “In one configuration, if the determined specificity of a filter is greater than a threshold, the network stops training the filter (block 1208). Additionally, or alternatively, the network stops training the filter (block 1208) when a change in the specificity of the specific filter is less than a threshold after the predetermined number of training iterations. In another configuration, as shown in block 1210, a filter is eliminated from the neural network model when the specificity of the specific filter is less than a threshold after the predetermined number of training iterations.”)
determining the update manner corresponding to the preset condition if the training precision is greater than a predetermined precision threshold. (Towal paragraph 0093 discloses, “In one configuration, if the determined specificity of a filter is greater than a threshold, the network stops training the filter (block 1208). Additionally, or alternatively, the network stops training the filter (block 1208) when a change in the specificity of the specific filter is less than a threshold after the predetermined number of training iterations. In another configuration, as shown in block 1210, a filter is eliminated from the neural network model when the specificity of the specific filter is less than a threshold after the predetermined number of training iterations.”)
As per claim 17, The neural network training apparatus of claim 11, wherein the determining an indicator parameter of the first neural network in a current training process comprises: 
determining a ratio of zero elements to all elements in a feature map output from one or more layers of the first neural network in the current training process; wherein the determining an update manner corresponding to a preset condition if the indicator parameter meets the preset condition comprises: (Towal paragraph 0070 discloses, “FIG. 7 illustrates a set of weak filters 702 compared to a set of strong filters 704. As shown in FIG. 7, the weak filters 702 do not have specific definitions. For example, each of the weak filters 702 is generalized and does not have a well-defined outline. In contrast, the definition of the strong filters 704 is greater than the definition of the weak filters 702, such that various lines and angles are visible. The strong filters 704 improve the detection of specific features of an input, such as whether one or more horizontal lines are present in an image.”)
determining the update manner corresponding to the preset condition if the ratio of zero elements to all elements is less than a first ratio threshold. (Towal paragraph 0070 discloses, “FIG. 7 illustrates a set of weak filters 702 compared to a set of strong filters 704. As shown in FIG. 7, the weak filters 702 do not have specific definitions. For example, each of the weak filters 702 is generalized and does not have a well-defined outline. In contrast, the definition of the strong filters 704 is greater than the definition of the weak filters 702, such that various lines and angles are visible. The strong filters 704 improve the detection of specific features of an input, such as whether one or more horizontal lines are present in an image.”)
As per claim 18, The neural network training apparatus of claim 17, wherein the determining an update manner corresponding to a preset condition if the indicator parameter meets the preset condition comprises: 
determining whether the ratio of zero elements to all elements is less than a second ratio threshold if the ratio of zero elements to all elements is greater than the first ratio threshold, wherein the second ratio threshold is greater than the first ratio threshold; (Towal paragraph 0077 discloses, “As shown in FIG. 8 at the first training pass, each filter has a specific entropy. For example, a first filter 802 has an entropy of 2.006 and a second filter 804 has an entropy of 2.018. The filters in the first training pass are ordered from low entropy to high entropy. Furthermore, as shown in FIG. 8, the entropy of each filter is modified after the ninetieth training pass (epoch 90). The filters in the ninetieth training pass are ordered from low entropy to high entropy. It should be noted that because the filters in both epoch 1 and epoch 90 are ordered from low entropy to high entropy, the same filters do not have the same positions in each figure. That is, the first filter 808 of epoch 1 may or may not be the first filter 808 of epoch 90. In other words, the first filter 802 of epoch 1 may have had a greater change in entropy in comparison to neighboring filters such that the first filter 802 of epoch 1 may be, for example, an eleventh filter 814 of epoch 90.”) and 
determining the update manner corresponding to the preset condition if the ratio of zero elements to all elements is less than the second ratio threshold. (Towal paragraph 0077 discloses, “As shown in FIG. 8 at the first training pass, each filter has a specific entropy. For example, a first filter 802 has an entropy of 2.006 and a second filter 804 has an entropy of 2.018. The filters in the first training pass are ordered from low entropy to high entropy. Furthermore, as shown in FIG. 8, the entropy of each filter is modified after the ninetieth training pass (epoch 90). The filters in the ninetieth training pass are ordered from low entropy to high entropy. It should be noted that because the filters in both epoch 1 and epoch 90 are ordered from low entropy to high entropy, the same filters do not have the same positions in each figure. That is, the first filter 808 of epoch 1 may or may not be the first filter 808 of epoch 90. In other words, the first filter 802 of epoch 1 may have had a greater change in entropy in comparison to neighboring filters such that the first filter 802 of epoch 1 may be, for example, an eleventh filter 814 of epoch 90.”)
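For illustration only, the zero-element ratio of claim 17 and the two-threshold check of claim 18 can be sketched as follows. This is one possible reading of the claim language, under the assumption that a feature map is a nested list of numbers. The function names are hypothetical, and the handling of a ratio exactly equal to the first threshold is an assumption, since the claims recite only "less than" and "greater than".

```python
def zero_ratio(feature_map):
    """Ratio of zero elements to all elements in a 2-D feature map.

    Hypothetical helper: the nested-list representation is an assumption.
    """
    values = [v for row in feature_map for v in row]
    if not values:
        raise ValueError("feature map must contain at least one element")
    return sum(1 for v in values if v == 0) / len(values)


def meets_ratio_condition(ratio, first_threshold, second_threshold):
    """Decide whether the preset condition is met, as recited in claims 17-18."""
    # Claim 18 requires the second threshold to exceed the first.
    if second_threshold <= first_threshold:
        raise ValueError("second threshold must be greater than the first")
    # Claim 17: condition met when the ratio is below the first threshold.
    if ratio < first_threshold:
        return True
    # Claim 18: above the first threshold, compare against the second.
    if ratio > first_threshold:
        return ratio < second_threshold
    # Assumption: a ratio exactly equal to the first threshold does not
    # satisfy either recited comparison.
    return False
```

As a usage example, a 2x2 feature map with two zero elements has a ratio of 0.5, which satisfies the condition for thresholds of 0.3 and 0.6 through the second comparison.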
As per claim 19, The neural network training apparatus of claim 11, wherein the determining an indicator parameter of the first neural network in a current training process comprises: 
outputting a first feature map of the sample data through a predetermined layer of the first neural network in the current training process; (Towal paragraph 0049 discloses, “The outputs of the convolutional connections may be considered to form a feature map in the subsequent layer 318 and 320, with each element of the feature map (e.g., 320) receiving input from a range of neurons in the previous layer (e.g., 318) and from each of the multiple channels. The values in the feature map may be further processed with a non-linearity, such as a rectification, max(0,x). Values from adjacent neurons may be further pooled, which corresponds to down sampling, and may provide additional local invariance and dimensionality reduction. Normalization, which corresponds to whitening, may also be applied through lateral inhibition between neurons in the feature map.”)
outputting a second feature map of the sample data through a corresponding predetermined layer of a trained second neural network; (Towal paragraph 0049 discloses, “The outputs of the convolutional connections may be considered to form a feature map in the subsequent layer 318 and 320, with each element of the feature map (e.g., 320) receiving input from a range of neurons in the previous layer (e.g., 318) and from each of the multiple channels. The values in the feature map may be further processed with a non-linearity, such as a rectification, max(0,x). Values from adjacent neurons may be further pooled, which corresponds to down sampling, and may provide additional local invariance and dimensionality reduction. Normalization, which corresponds to whitening, may also be applied through lateral inhibition between neurons in the feature map.”) and 
determining the indicator parameter of the first neural network in the current training process based on a loss function value between the first feature map and the second feature map. (Towal paragraph 0049 discloses, “The outputs of the convolutional connections may be considered to form a feature map in the subsequent layer 318 and 320, with each element of the feature map (e.g., 320) receiving input from a range of neurons in the previous layer (e.g., 318) and from each of the multiple channels. The values in the feature map may be further processed with a non-linearity, such as a rectification, max(0,x). Values from adjacent neurons may be further pooled, which corresponds to down sampling, and may provide additional local invariance and dimensionality reduction. Normalization, which corresponds to whitening, may also be applied through lateral inhibition between neurons in the feature map.”)
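For illustration only, the indicator of claim 19 can be sketched as a loss value computed between the two feature maps. The claim does not name a particular loss function, so the mean squared error used here is an assumption chosen for simplicity. The function name and the nested-list representation of a feature map are likewise hypothetical.

```python
def feature_map_loss(first_map, second_map):
    """Mean squared error between two feature maps of the same shape.

    Hypothetical helper: MSE is an assumed choice of loss function; the
    claim recites only "a loss function value".
    """
    # Flatten both maps so that elements can be compared pairwise.
    a = [v for row in first_map for v in row]
    b = [v for row in second_map for v in row]
    if not a or len(a) != len(b):
        raise ValueError("feature maps must be non-empty and equal in size")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
```

Here `first_map` stands for the output of the predetermined layer of the network being trained, and `second_map` for the output of the corresponding layer of the trained second network.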
As per claim 20, A computer readable media on which a computer program instruction is stored, wherein when the computer program instruction is executed by a processor, the processor performs the following steps: 
training a first neural network to be trained by using sample data; (Towal paragraph 0071 discloses, “As an example, the network may be tasked with discriminating between a dog and a cat. In this example, a limited number of training samples or an error in training may be present.”)
determining an indicator parameter of the first neural network in a current training process; (Towal paragraph 0093 discloses, “In one configuration, if the determined specificity of a filter is greater than a threshold, the network stops training the filter (block 1208). Additionally, or alternatively, the network stops training the filter (block 1208) when a change in the specificity of the specific filter is less than a threshold after the predetermined number of training iterations. In another configuration, as shown in block 1210, a filter is eliminated from the neural network model when the specificity of the specific filter is less than a threshold after the predetermined number of training iterations.”)
determining an update manner corresponding to a preset condition if the indicator parameter meets the preset condition; (Towal paragraph 0057 discloses, “The results of the deep neural network may then be thresholded 522 and passed through an exponential smoothing block 524 in the classify application 510.”) and 
updating a parameter of a batch normalization layer in the first neural network based on the update manner. (Towal paragraph 0063 discloses, “The first convolution layer 604 outputs the results of the convolution to the second convolution layer 606. Furthermore, the second convolution layer 606 outputs the results of the convolution to a third convolution layer 608. Finally, a predicted label 610 is output from the third convolution layer 608. Of course, aspects of the present disclosure are not limited to three convolution layers and more or less convolution layers may be specified as desired.”)

Examiner Request
The examiner requests that, in response to this office action, support be shown for language added to any original claims on amendment and for any new claims. That is, indicate support for newly added claim language by specifically pointing to page(s) and line number(s) in the specification and/or drawing figure(s). This will assist the examiner in prosecuting the application. When responding to this office action, applicant is advised to clearly point out the patentable novelty which he or she thinks the claims present, in view of the state of the art disclosed by the references cited or the objections made. He or she must also show how the amendments avoid such references or objections. The same requirements apply to an applicant or patent owner amending in reply to a rejection of claims in an application or patent under reexamination.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to TYLER D PAIGE whose telephone number is (571)270-5425. The examiner can normally be reached M-F 7:00am - 6:00pm (MST).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Thomas Black, can be reached at 571-272-6956. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/TYLER D PAIGE/Examiner, Art Unit 3666