DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment
Acknowledgement is made of Applicant's claim amendments filed on 2/16/2021. The claim amendments are entered. Claims 1, 2, 4-6, 9, 10, 13, 16, 17, 19-21, and 23 are presently pending. Claims 3, 7, 8, 11, 12, 14, 15, 18, and 22 have been cancelled. Claims 1, 9, and 16 have been amended.

Response to Arguments
Applicant's arguments filed on 2/16/2021 have been fully considered but they are not persuasive.

Applicant argues that the cited references do not teach the amended claim limitations (Applicant’s reply pgs. 9-10). This argument is not persuasive because the amendments merely introduce a labeling scheme into the previously recited word description of the computation. That is, the amendments add the labels P, M, and N to the word description of the computation recited in the previous claims, together with an equation representing that word description. Accordingly, contrary to Applicant’s arguments, the cited references do teach the amended limitations, as shown below.


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date.


Claims 1, 6, 16, 19, 21, and 23 are rejected under 35 U.S.C. 103 as being unpatentable over Annapureddy et al. (U.S. Pat. App. Pre-Grant Pub. No. 2016/0217369, hereinafter Annapureddy) in view of Wang et al. (U.S. Pat. App. Pre-Grant Pub. No. 2019/0279089, hereinafter Wang, which claims priority to a CN filing and was published as PCT publication WO 2018/090706).

Regarding claim 1, Annapureddy teaches:
A method of modifying a neural network, comprising: 
identifying a first layer of a neural network and a second layer of the neural network adjacent to the first layer of the neural network, wherein a first connection number of neuron connections are between the first layer and the second layer (Figs. 5A, 5B, 8A, and 8B: showing a first layer and a second layer with connections between the two layers. Similarly, see [0078]-[0079], [0093], and [0102]: describing the various figures.);
selecting a first number “P” of insertion neurons for an insertion layer of the neural network (see “Selection of design parameters”: describing “[s]everal of the design parameters involved in compression (e.g., the number of intermediate neurons in the case of fully-connected layers…)” ([0114]), whereby “[i]n one example, the number of neurons in the intermediate layer r may be selected using a parameter sweep” ([0115]). The neurons of an intermediate layer, such as an initial intermediate layer, can comprise a first number of neurons.), 
the first number “P” of insertion neurons in the insertion layer being less than a number “M” of first neurons in the first layer of the neural network and less than a number “N” of second neurons in the second layer of the neural network ([0079]: describing the compression of the fully-connected layer fc, which has n input neurons and m output neurons, into the compressed layers fc1 and fc2 having a reduced number r of neurons, wherein “[t]he number of neurons r1 and r2 in the intermediate layers may be determined based on the compression they achieve, and the performance of the resulting compressed network.” 
See also [0119]-[0122]: describing the insertion of nonlinear layers to compress the network and enhance its performance. Further, “[t]he [compression] process may further specify the size of the second layer, which may be denoted (r), such that the combined memory footprint of the compressed neural network, which includes the second and third layer, is smaller than the memory footprint (M) of the uncompressed neural network.” ([0135]). See also Figs. 5A-9B: showing the compressed and uncompressed networks with the various adjacent layers, wherein the numbers of neurons r1 and r2 are less than both the number of neurons in the first layer and the number of neurons in the second layer (Figs. 6A and 6B).), 
wherein “P” is calculated according to the following equation: P < (M x N)/(M + N) ([0115]: “the number of neurons in the intermediate layer r [corresponding to the compressed layer] may be selected using a parameter sweep. The value of r may be swept from 16 to min(n, m) in increments of 16”, wherein n denotes the input neurons (i.e., a first layer), m denotes the output neurons (i.e., a second layer) (see [0078] for definitions of n and m), and r corresponds to the number of neurons P. Similarly, see [0118] for the pseudocode stating that “the compressed network may be fine-tuned for each determined parameter value (e.g., r)” by calculating min(n, m). That is, due to the min(n, m) bound, the number r of neurons selected for a compressed layer is less than the given numbers of neurons n and m of the original layer.); 
modifying the neural network by inserting the insertion layer between the first layer and the second layer of the neural network ([0119]-[0121]: describing a modification of the neural network by compressing an original 9x9 neural net into a final compressed version with 3x3 convolution layers and inserting nonlinear layers between the 3x3 convolution layers. See also [0124]: “In block 1004, the process inserts nonlinearity [layers] between the compressed layers of the compressed network. In some aspects, the process inserts the nonlinearity by applying a nonlinear activation function to neurons of the compressed layers.”), 
wherein the first number of insertion neurons is also selected based on the first number of insertion neurons being such that a second connection number of neuron connections between the insertion layer and the first layer and between the insertion layer and the second layer is less than the first connection number of neuron connections between the first layer and the second layer ([0116]-[0117]: describing techniques to determine the number of neurons in the compressed layer, wherein the number of neurons in the compressed layer is less than the number of neurons in the original uncompressed layer. See also [0079]: describing the compression of the fully-connected layer fc, which has n input neurons and m output neurons, into the compressed layers fc1 and fc2 having a reduced number r of neurons. 
See also “Insertion of Nonlinear Layers Between Compressed Layers”, describing that “nonlinear layers may be added [i.e., inserted] between compressed layers to improve the representational capacity of the network” ([0119]). Further, “[t]he [compression] process may further specify the size of the second layer .... Alternatively, or in addition to a reduction in memory footprint, the size of the second layer may be chosen so that the inference time of the compressed neural network may be smaller than the inference time (T) of the uncompressed neural network.” ([0135]). See also Figs. 5A-9B showing the compressed and uncompressed networks with the various adjacent layers.); 
… by replacing the insertion layer ([0042]: “a layer may be compressed by replacing it with multiple layers of the same type”.) … and 
the performance threshold is based on performance of the neural network without the insertion layer inserted ([0079]: describing that “a single fully-connected layer is replaced with two fully-connected layers [in Figs. 5A and 5B]”, as well as “an uncompressed fully-connected layer ‘fc’ is replaced with three fully-connected layers ‘fc1’, ‘fc2’ and ‘fc3’ [in Figs. 6A and 6B]” wherein “[t]he number of neurons r1 and r2 in the intermediate layers may be determined based on the compression they achieve, and the performance of the resulting compressed network.” That is, the performance threshold is based on the replaced layers rather than on any inserted layers.).
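For reference, the recited inequality follows from the connection-count limitation by simple arithmetic. This sketch assumes, for illustration only, that each pair of adjacent layers is fully connected and that only neuron-to-neuron connections are counted (bias terms ignored). The symbols C_direct (the first connection number) and C_inserted (the second connection number) are notation introduced here, not claim language.

```latex
\begin{aligned}
C_{\text{direct}} &= M N, \\
C_{\text{inserted}} &= P M + P N = P\,(M + N), \\
C_{\text{inserted}} < C_{\text{direct}}
  &\iff P\,(M + N) < M N
  \iff P < \frac{M N}{M + N} \qquad (M + N > 0).
\end{aligned}
```

For example, M = 200 and N = 300 give P < 60000/500 = 120.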

While Annapureddy teaches the limitations of claim 1 and “insertion”, Annapureddy does not explicitly teach: “obtaining a first value and a second value for a characteristic of the … layer, the characteristic including one or more of: a fraction of inactive neurons, weight initialization methods, a fraction of fixed-weight synapses, and a fraction of inactive synapses; determining a first performance of the modified neural network using the first value of the characteristic of the … layer; determining a second performance of the modified neural network using the second value of the characteristic of the … layer; and in response to at least one of the first performance and the second performance satisfying a performance threshold, ….” Wang discloses the claim limitations, teaching: 
“obtaining a first value and a second value for a characteristic of the … layer, the characteristic including one or more of (Wang [0032] and [0033]: describing importance value and diversity value of neurons in a neural network layer.):
a fraction of inactive neurons (Wang [0069]: “At step 402a, a set of neurons is initialized as a null set C”), weight initialization methods (Wang [0092]: “the neural network having the weights adjusted may be used as an initial network model which can be re-trained based on original training data T at a low learning rate, so as to further improve the network accuracy of the pruned neural network”.), a fraction of fixed-weight synapses, and a fraction of inactive synapses; 
determining a first performance of the modified neural network using the first value of the characteristic of the … layer (Wang [0095]: “An importance value determining unit 81 may be configured to determine importance values of neurons in a network layer to be pruned based on activation values of the neurons.” Similarly, see also Wang [0103]: describing an “importance value determining module 814 may be configured to obtain the importance value of each neuron by normalizing the variance for the neuron based on the neuron variance importance vector.” 
This determination is used to evaluate the neural network’s performance, wherein “[a] neuron selecting unit 83 may be configured to select, from the network layer to be pruned, neurons to be retained based on the importance values and diversity values of the neurons in the network layer to be pruned in accordance with a volume maximization neuron selection policy.
A pruning unit 84 may be configured to prune the other neurons from the network layer to be pruned to obtain a pruned network layer.” (Wang [0097]-[0098]).); 
determining a second performance of the modified neural network using the second value of the characteristic of the … layer (Wang [0096]: “A diversity value determining unit 82 may be configured to determine a diversity value of each neuron in the network layer to be pruned based on connecting weights between the neuron and neurons in a next network layer.” 
This determination is used to evaluate the neural network’s performance, wherein “[a] neuron selecting unit 83 may be configured to select, from the network layer to be pruned, neurons to be retained based on the importance values and diversity values of the neurons in the network layer to be pruned in accordance with a volume maximization neuron selection policy.
A pruning unit 84 may be configured to prune the other neurons from the network layer to be pruned to obtain a pruned network layer.” (Wang [0097]-[0098]).); and 
in response to at least one of the first performance and the second performance satisfying a performance threshold, modifying the modified neural network … with an updated layer having a second number of neurons, wherein the second number is less than the first number (Wang [0075]: describing that “importance value of a neuron reflects a degree of impact the neuron has on an output result from the neural network, and a diversity of a neuron reflects its expression capability. Hence, the neurons selected in accordance with the volume maximization neuron selection policy have greater contributions to the output result from the neural network and higher expression capabilities, while the pruned neurons are neurons having smaller contributions to the output result from the neural network and lower expression capabilities. Accordingly, when compared with the original neural network, the pruned neural network may achieve good compression and acceleration effects while having little accuracy loss. Therefore, the pruning method according to the embodiments of the present disclosure may achieve good compression and acceleration effects while maintaining the accuracy of the neural network.” 
The pruned neural network thus has fewer neurons than the original neural network, since the neurons that do not meet the “volume maximization neuron selection policy” with respect to the importance and diversity values are not retained. This pruning process does not entail an insertion, since pruning removes those original neurons of the neural network that do not meet the “volume maximization neuron selection policy” (Wang [0061], [0097], and [0116]). 
“[T]he neurons finally selected to be retained are optimal” (Wang [0061]) to maintain “the accuracy of the neural network” (Wang [0075] and [0126]) upon compression. That is, when the evaluated importance and diversity values of the neurons meet some predetermined optimal threshold that can maintain the neural network’s accuracy, those neurons are retained, while the other neurons that fall below the threshold are pruned. The pruned neural network thus comprises an updated layer with fewer neurons than the corresponding layer in the original neural network, resulting in a more compressed version of the original neural network.)….”
Thus, it would have been obvious to a person having ordinary skill in the art (PHOSITA) before the effective filing date (EFD) to modify the method in Annapureddy to include the performance analysis in Wang. Doing so would enable “a method and an apparatus for neural network pruning … [that] includes: determining (101) importance values of neurons in a network layer to be pruned based on activation values of the neurons; determining (102) a diversity value of each neuron in the network layer to be pruned based on connecting weights between the neuron and neurons in a next network layer; selecting (103), from the network layer to be pruned, neurons to be retained based on the importance values and diversity values of the neurons in the network layer to be pruned in accordance with a volume maximization neuron selection policy; and pruning (104) the other neurons from the network layer to be pruned to obtain a pruned network layer. With the above method, good compression and acceleration effects can be achieved while maintaining the accuracy of the neural network.” (Wang Abstract).
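For illustration only, the pruning steps (101)-(104) quoted from Wang can be sketched in Python. This is a minimal sketch under stated assumptions, not Wang's implementation: importance is taken as each neuron's activation variance normalized to sum to one (cf. Wang [0103]); the neuron's outgoing weight vector to the next layer stands in for its diversity (cf. Wang [0096]); and the volume maximization neuron selection policy is approximated by a greedy Gram-Schmidt selection over importance-scaled weight vectors. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def importance_values(activations: Sequence[Sequence[float]]) -> List[float]:
    """Per-neuron activation variance, normalized so the values sum to 1.

    activations[s][i] is the activation of neuron i on sample s (assumed layout).
    """
    num_samples = len(activations)
    num_neurons = len(activations[0])
    variances = []
    for i in range(num_neurons):
        column = [activations[s][i] for s in range(num_samples)]
        mean = sum(column) / num_samples
        variances.append(sum((a - mean) ** 2 for a in column) / num_samples)
    total = sum(variances)
    return [v / total if total > 0 else 0.0 for v in variances]


def select_neurons(importance: Sequence[float],
                   out_weights: Sequence[Sequence[float]],
                   keep: int) -> List[int]:
    """Greedily pick `keep` neurons whose importance-scaled outgoing weight
    vectors span the largest volume; the remaining neurons would be pruned."""
    # Scale each neuron's outgoing weight vector by its importance value.
    residuals = [[importance[i] * w for w in out_weights[i]]
                 for i in range(len(out_weights))]
    selected: List[int] = []
    for _ in range(min(keep, len(residuals))):
        # The longest residual (component orthogonal to the vectors already
        # chosen) adds the most volume to the spanned parallelotope.
        best = max((i for i in range(len(residuals)) if i not in selected),
                   key=lambda i: sum(x * x for x in residuals[i]))
        selected.append(best)
        norm = math.sqrt(sum(x * x for x in residuals[best]))
        if norm == 0.0:
            continue  # all remaining residuals are zero as well
        unit = [x / norm for x in residuals[best]]
        # Gram-Schmidt step: remove the chosen direction from every residual.
        for i in range(len(residuals)):
            dot = sum(a * b for a, b in zip(residuals[i], unit))
            residuals[i] = [a - dot * b for a, b in zip(residuals[i], unit)]
    return sorted(selected)
```

In a toy case with activations [[-2.0, -2.0, -1.0], [2.0, 2.0, 1.0]], the importance values are 4/9, 4/9, and 1/9. With outgoing weights [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]] and keep=2, the sketch retains neurons 0 and 2: neuron 1 is as important as neuron 0 but adds no new direction, so the less important but more diverse neuron 2 is kept instead.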

Regarding claim 6, Annapureddy teaches:
The method of claim 1, further comprising selecting the second number of neurons for the updated layer based on the second number of neurons for the updated layer increasing efficiency as compared to that of the insertion layer while maintaining a level of performance greater than the performance threshold ([0043]: “To combat the drop in accuracy, in some aspects, the compressed model may be fine-tuned using training examples.… The fine-tuning of the compressed model may be conducted on a layer-by-layer basis.” See also [0117]: “each layer may be compressed to the maximum extent possible so that the functional performance does not drop below a threshold”, wherein this maximum extent corresponds to the teaching that “[t]he lowest value of r for which the drop in classification accuracy is acceptable (e.g., below a threshold) may be selected” ([0115]). Here, the value of r is the number of neurons in the compressed layer of the compressed network. That is, when creating the compressed layer, the value of r should be optimized to ensure that the level of performance is retained (see [0139]-[0140]).
See also [0056] and [0135]: describing neurons in a second layer, i.e., a second number of neurons.).
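The parameter sweep quoted from Annapureddy [0115] and [0117] can be illustrated with a short Python sketch. It is a minimal sketch, not Annapureddy's code: `select_rank` and the `evaluate` callback (which would compress the layer to r neurons, fine-tune, and report accuracy) are hypothetical, and "acceptable" is modeled as an accuracy drop strictly below `max_drop`.

```python
from typing import Callable, Optional


def select_rank(n: int, m: int, baseline_accuracy: float,
                evaluate: Callable[[int], float],
                max_drop: float, step: int = 16) -> Optional[int]:
    """Return the lowest swept r whose accuracy drop stays below max_drop.

    r is swept from `step` to min(n, m) in increments of `step` (16 by
    default, per the quoted passage). `evaluate(r)` is a hypothetical callback
    that compresses the layer to r intermediate neurons, fine-tunes it, and
    returns the accuracy of the compressed network.
    """
    for r in range(step, min(n, m) + 1, step):
        if baseline_accuracy - evaluate(r) < max_drop:
            return r  # the sweep is ascending, so this is the lowest acceptable r
    return None  # no swept value kept the drop below the threshold
```

Because the sweep runs in ascending order and stops at the first acceptable value, the returned r is the smallest swept value meeting the threshold, even if accuracy does not vary monotonically with r.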

Regarding claim 16, Annapureddy teaches:
A system, comprising: 
one or more processors configured to perform operations or cause the system to perform operations, the operations comprising: ([0074]: “The [machine learning] model includes replacing means, inserting means and tuning means. In one aspect, the replacing means, inserting means, and/or tuning means may be the general-purpose processor 102, program memory associated with the general-purpose processor 102, memory block 118, local processing units 202, and or the routing connection processing units 216 configured to perform the functions recited.”): 
identify a first layer of a neural network and a second layer of the neural network adjacent to the first layer of the neural network, wherein a first connection number of neuron connections are between the first layer and the second layer (Figs. 5A, 5B, 8A, and 8B: showing a first layer and a second layer with connections between the two layers. Similarly, see [0078]-[0079], [0093], and [0102]: describing the various figures.);
select a first number “P” of insertion neurons for an insertion layer of the neural network (see “Selection of design parameters”: describing “[s]everal of the design parameters involved in compression (e.g., the number of intermediate neurons in the case of fully-connected layers…)” ([0114]), whereby “[i]n one example, the number of neurons in the intermediate layer r may be selected using a parameter sweep” ([0115]). The neurons of an intermediate layer, such as an initial intermediate layer, can comprise a first number of neurons.), 
the first number “P” of insertion neurons in the insertion layer being less than a number “M” of first neurons in the first layer of the neural network and less than a number “N” of second neurons in the second layer of the neural network ([0079]: describing the compression of the fully-connected layer fc, which has n input neurons and m output neurons, into the compressed layers fc1 and fc2 having a reduced number r of neurons, wherein “[t]he number of neurons r1 and r2 in the intermediate layers may be determined based on the compression they achieve, and the performance of the resulting compressed network.” 
See also [0119]-[0122]: describing the insertion of nonlinear layers to compress the network and enhance its performance. Further, “[t]he [compression] process may further specify the size of the second layer, which may be denoted (r), such that the combined memory footprint of the compressed neural network, which includes the second and third layer, is smaller than the memory footprint (M) of the uncompressed neural network.” ([0135]). See also Figs. 5A-9B: showing the compressed and uncompressed networks with the various adjacent layers, wherein the numbers of neurons r1 and r2 are less than both the number of neurons in the first layer and the number of neurons in the second layer (Figs. 6A and 6B).), 
wherein “P” is calculated according to the following equation: P < (M x N)/(M + N) ([0115]: “the number of neurons in the intermediate layer r [corresponding to the compressed layer] may be selected using a parameter sweep. The value of r may be swept from 16 to min(n, m) in increments of 16”, wherein n denotes the input neurons (i.e., a first layer), m denotes the output neurons (i.e., a second layer) (see [0078] for definitions of n and m), and r corresponds to the number of neurons P. Similarly, see [0118] for the pseudocode stating that “the compressed network may be fine-tuned for each determined parameter value (e.g., r)” by calculating min(n, m). That is, due to the min(n, m) bound, the number r of neurons selected for a compressed layer is less than the given numbers of neurons n and m of the original layer.); 
modify the neural network by inserting the insertion layer between the first layer and the second layer of the neural network ([0119]-[0121]: describing a modification of the neural network by compressing an original 9x9 neural net into a final compressed version with 3x3 convolution layers and inserting nonlinear layers between the 3x3 convolution layers. See also [0124]: “In block 1004, the process inserts nonlinearity [layers] between the compressed layers of the compressed network. In some aspects, the process inserts the nonlinearity by applying a nonlinear activation function to neurons of the compressed layers.”), 
wherein the first number of insertion neurons is also selected based on the first number of insertion neurons being such that a second connection number of neuron connections between the insertion layer and the first layer and between the insertion layer and the second layer is less than the first connection number of neuron connections between the first layer and the second layer ([0116]-[0117]: describing techniques to determine the number of neurons in the compressed layer, wherein the number of neurons in the compressed layer is less than the number of neurons in the original uncompressed layer. See also [0079]: describing the compression of the fully-connected layer fc, which has n input neurons and m output neurons, into the compressed layers fc1 and fc2 having a reduced number r of neurons.
See also “Insertion of Nonlinear Layers Between Compressed Layers”, describing that “nonlinear layers may be added [i.e., inserted] between compressed layers to improve the representational capacity of the network” ([0119]). 
Further, “[t]he [compression] process may further specify the size of the second layer .... Alternatively, or in addition to a reduction in memory footprint, the size of the second layer may be chosen so that the inference time of the compressed neural network may be smaller than the inference time (T) of the uncompressed neural network.” ([0135]). See also Figs. 5A-9B showing the compressed and uncompressed networks with the various adjacent layers.); 
… by replacing the insertion layer ([0042]: “a layer may be compressed by replacing it with multiple layers of the same type”.) … and 
the performance threshold is based on performance of the neural network without the insertion layer inserted ([0079]: describing that “a single fully-connected layer is replaced with two fully-connected layers [in Figs. 5A and 5B]”, as well as “an uncompressed fully-connected layer ‘fc’ is replaced with three fully-connected layers ‘fc1’, ‘fc2’ and ‘fc3’ [in Figs. 6A and 6B]” wherein “[t]he number of neurons r1 and r2 in the intermediate layers may be determined based on the compression they achieve, and the performance of the resulting compressed network.” That is, the performance threshold is based on the replaced layers rather than on any inserted layers.).


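While Annapureddy teaches the limitations of claim 16 and “insertion”, Annapureddy does not explicitly teach: “obtain a first value and a second value for a characteristic of the … layer, the characteristic including one or more of: a fraction of inactive neurons, weight initialization methods, a fraction of fixed-weight synapses, and a fraction of inactive synapses; determine a first performance of the modified neural network using the first value of the characteristic of the … layer; determine a second performance of the modified neural network using the second value of the characteristic of the … layer; and in response to at least one of the first performance and the second performance satisfying a performance threshold, ….” Wang discloses the claim limitations, teaching: 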
“obtain a first value and a second value for a characteristic of the … layer, the characteristic including one or more of (Wang [0032] and [0033]: describing importance value and diversity value of neurons in a neural network layer.):
a fraction of inactive neurons (Wang [0069]: “At step 402a, a set of neurons is initialized as a null set C”), weight initialization methods (Wang [0092]: “the neural network having the weights adjusted may be used as an initial network model which can be re-trained based on original training data T at a low learning rate, so as to further improve the network accuracy of the pruned neural network”.), a fraction of fixed-weight synapses, and a fraction of inactive synapses; 
determine a first performance of the modified neural network using the first value of the characteristic of the … layer (Wang [0095]: “An importance value determining unit 81 may be configured to determine importance values of neurons in a network layer to be pruned based on activation values of the neurons.” Similarly, see also Wang [0103]: describing an “importance value determining module 814 may be configured to obtain the importance value of each neuron by normalizing the variance for the neuron based on the neuron variance importance vector.” 
This determination is used to evaluate the neural network’s performance, wherein “[a] neuron selecting unit 83 may be configured to select, from the network layer to be pruned, neurons to be retained based on the importance values and diversity values of the neurons in the network layer to be pruned in accordance with a volume maximization neuron selection policy.
A pruning unit 84 may be configured to prune the other neurons from the network layer to be pruned to obtain a pruned network layer.” (Wang [0097]-[0098]).); 
determine a second performance of the modified neural network using the second value of the characteristic of the … layer (Wang [0096]: “A diversity value determining unit 82 may be configured to determine a diversity value of each neuron in the network layer to be pruned based on connecting weights between the neuron and neurons in a next network layer.” 
This determination is used to evaluate the neural network’s performance, wherein “[a] neuron selecting unit 83 may be configured to select, from the network layer to be pruned, neurons to be retained based on the importance values and diversity values of the neurons in the network layer to be pruned in accordance with a volume maximization neuron selection policy.
A pruning unit 84 may be configured to prune the other neurons from the network layer to be pruned to obtain a pruned network layer.” (Wang [0097]-[0098]).); and 
in response to at least one of the first performance and the second performance satisfying a performance threshold, modify the modified neural network … with an updated layer having a second number of neurons, wherein the second number is less than the first number (Wang [0075]: describing that “importance value of a neuron reflects a degree of impact the neuron has on an output result from the neural network, and a diversity of a neuron reflects its expression capability. Hence, the neurons selected in accordance with the volume maximization neuron selection policy have greater contributions to the output result from the neural network and higher expression capabilities, while the pruned neurons are neurons having smaller contributions to the output result from the neural network and lower expression capabilities. Accordingly, when compared with the original neural network, the pruned neural network may achieve good compression and acceleration effects while having little accuracy loss. Therefore, the pruning method according to the embodiments of the present disclosure may achieve good compression and acceleration effects while maintaining the accuracy of the neural network.” 
The pruned neural network thus has fewer neurons than the original neural network, since the neurons that do not meet the “volume maximization neuron selection policy” with respect to the importance and diversity values are not retained. This pruning process does not entail an insertion, since pruning removes those original neurons of the neural network that do not meet the “volume maximization neuron selection policy” (Wang [0061], [0097], and [0116]). 
“[T]he neurons finally selected to be retained are optimal” (Wang [0061]) to maintain “the accuracy of the neural network” (Wang [0075] and [0126]) upon compression. That is, when the evaluated importance and diversity values of the neurons meet some predetermined optimal threshold that can maintain the neural network’s accuracy, those neurons are retained, while the other neurons that fall below the threshold are pruned. The pruned neural network thus comprises an updated layer with fewer neurons than the corresponding layer in the original neural network, resulting in a more compressed version of the original neural network.) ….”
Thus, it would have been obvious to a person having ordinary skill in the art (PHOSITA) before the effective filing date (EFD) to modify the model in Annapureddy to include the performance analysis in Wang. Doing so would enable “a method and an apparatus for neural network pruning … [that] includes: determining (101) importance values of neurons in a network layer to be pruned based on activation values of the neurons; determining (102) a diversity value of each neuron in the network layer to be pruned based on connecting weights between the neuron and neurons in a next network layer; selecting (103), from the network layer to be pruned, neurons to be retained based on the importance values and diversity values of the neurons in the network layer to be pruned in accordance with a volume maximization neuron selection policy; and pruning (104) the other neurons from the network layer to be pruned to obtain a pruned network layer. With the above method, good compression and acceleration effects can be achieved while maintaining the accuracy of the neural network.” (Wang Abstract).

Regarding claim 19, claim 19 is substantially similar to claim 6 and therefore is rejected on the same ground as claim 6. Claim 19 is a system claim that corresponds to method claim 6.

Regarding claim 21, Annapureddy teaches:
The method of claim 1, further comprising: 
([0079]: describing “an uncompressed fully-connected layer ‘fc’ is replaced with three fully-connected layers ‘fc1’, ‘fc2’ and ‘fc3’. The number of neurons r1 and r2 in the intermediate layers may be determined based on the compression they achieve, and the performance of the resulting compressed network.” See also Figs. 6A and 9B: showing that the number of neurons in the replacement compressed layer is less than in the original layer.); and 
modifying the neural network by replacing the first layer with the replacement layer to reduce the number of connections in the neural network ([0079]: describing “an uncompressed fully-connected layer ‘fc’ is replaced with three fully-connected layers ‘fc1’, ‘fc2’ and ‘fc3’. The number of neurons r1 and r2 in the intermediate layers may be determined based on the compression they achieve, and the performance of the resulting compressed network.” See also Figs. 6A and 9B: showing that the number of neurons in the replacement compressed layer is less than in the original layer.).
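To illustrate the connection reduction discussed for [0079], the following Python sketch counts neuron-to-neuron connections in a chain of fully connected layers. The layer sizes (n = 512, m = 256, r1 = r2 = 64) are hypothetical values chosen only for illustration, and bias terms are ignored.

```python
from typing import Sequence


def dense_connections(layer_sizes: Sequence[int]) -> int:
    """Count neuron-to-neuron connections in a chain of fully connected layers."""
    return sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))


original = dense_connections([512, 256])          # fc: n = 512, m = 256
replaced = dense_connections([512, 64, 64, 256])  # fc1, fc2, fc3 with r1 = r2 = 64
print(original, replaced)  # prints: 131072 53248
```

With these assumed sizes, replacing the single layer fc with fc1, fc2, and fc3 reduces the connection count from 131072 to 53248.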

Regarding claim 23, claim 23 is substantially similar to claim 21 and therefore is rejected on the same ground as claim 21. Claim 23 is a system claim that corresponds to method claim 21.

Claims 2, 4, 5, 17, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Annapureddy et al. (U.S. Pat. App. Pre-Grant Pub. No. 2016/0217369, hereinafter Annapureddy) and Wang et al. (U.S. Pat. App. Pre-Grant Pub. No. 2019/0279089, hereinafter Wang, which claims priority to the CN filing and has PCT publication WO 2018/090706), and in further view of Tomita (U.S. Pat. No. 6,049,793, hereinafter Tomita), Brothers et al. (U.S. Pat. App. Pre-Grant Pub. No. 2016/0358070, hereinafter Brothers), and Tamura (U.S. Pat. No. 5,596,681, hereinafter Tamura).

Regarding claim 2, the rejection of claim 1 is incorporated. While the cited references teach the claim limitations and “insertion”, they do not explicitly teach: “segmenting the … layer into a first segment and a second segment; and …” on lines 1-2. Tomita discloses the claim limitations, teaching: “At step 27 of FIG. 2, a structure of a network of multiple artificial neurons is created based on regions defined in the 2D/3D map. The network is configured as a compound of multiple segments, each of which consists of one- to multiple-layers. At each segment the precise number of required artificial neurons and layers are identified.” (Tomita col. 13, lines 1-6). That is, “[t]he configured artificial neural network can provide a network structure as a group of connected artificial neurons or as multiple segments (groups of connected neurons) of one to three layer structures.” (Tomita col. 5, lines 50-52). 
Thus, it would have been obvious to a Person Having Ordinary Skill in the Art (PHOSITA) before the effective filing date (EFD) to modify the method in the cited references to include the segments in Tomita. Doing so would enable an artificial neural network to be built wherein “[t]he numbers of required artificial neurons at the boundary and region tiers exactly correspond to the respective numbers of the boundaries and regions in the 2D/3D map” (Tomita col. 13, lines 23-26), such that the “network structure realized by this [technique] provides the least number of artificial neurons with no unnecessary connection between them” (Tomita col. 13, lines 34-36). That is, “[t]he artificial neural network is configured in accordance with the data points, clusters, boundaries, and regions, such that each boundary represents a different artificial neuron of the artificial neural network, and the geometric relationship of the regions on the map to the classes defines the logic connectivity of the artificial neurons. The synaptic weights and threshold of each artificial neuron in the network are graphically determined based on the data points of the map [, resulting in an improved neural network configuration].” (Tomita Abstract). 

While the cited references teach the limitations of claim 2, they do not explicitly teach: “wherein determining the first performance includes using the first value … and determining the second performance includes using the second value…” on lines 5-7. Brothers discloses the claim limitations, teaching: 
“wherein determining the first performance includes using the first value (Brothers [0087]-[0088]: describing “[t]he relative importance of each input feature map is effectively folded into the convolution matrix applied to that feature map for a given output feature map. As such, the convolutions applied to a given feature map are often scaled versions of each other or approximately scaled versions of each other. The neural network analyzer may leverage this characteristic by detecting these instances of similar convolution kernels and applying an optimization that may significantly reduce the number of multiply-accumulate (MAC) operations performed when executing neural network 106 [in order to generate a compressed neural network].” That is, an importance determination is made regarding the convolutional kernels via their feature maps. 
See also Brothers [0041] and [0064]: describing various types of performance requirements that can be specified and thresholds, respectively.) … and 
determining the second performance includes using the second value (Brothers [0046]-[0047]: describing examples of additional different modification parameters used for analysis and modification of the neural network. See also Brothers [0057]: “For example, during a first iteration, the neural network analyzer may identify a first subset of convolution kernels for substitution and, during a subsequent iteration, identify a second and different set of convolution kernels for substitution. During further iterations, other types of analysis may be applied.” See also Brothers [0079]: describing that “[t]he neural network analyzer may determine whether the calculation of different feature maps use similar feature map calculations”, wherein an expression of the output feature map for the convolutional kernels of a convolutional layer can be used to show the expression capabilities, i.e., diversity determinations, of the kernels in order to evaluate the performance of those kernels and determine whether they can be replaced by a more concise base kernel (Brothers [0086]). 
See also Brothers [0041] and [0064]: describing various types of performance requirements that can be specified and thresholds, respectively.) …” 
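For illustration only, the kernel-similarity analysis described in Brothers [0087]-[0088] can be sketched as follows. The sketch assumes that a kernel is treated as a "scaled version" of a base kernel when a least-squares scale factor reconstructs it within a relative tolerance; the function names and the `rel_tol` parameter are hypothetical and are not taken from Brothers. It shows why grouping scaled kernels reduces the number of distinct convolutions, and thus MAC operations, needed for a given input feature map.

```python
import numpy as np


def scale_factor(base, kernel, rel_tol=0.05):
    """Return alpha if kernel is approximately alpha * base, else None (hypothetical helper)."""
    base = np.asarray(base, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    denom = float(np.sum(base * base))
    if denom == 0.0:
        return None  # an all-zero base kernel cannot represent anything by scaling
    alpha = float(np.sum(base * kernel)) / denom  # least-squares scale factor
    residual = float(np.linalg.norm(kernel - alpha * base))
    if residual <= rel_tol * float(np.linalg.norm(kernel)):
        return alpha
    return None


def group_scaled_kernels(kernels, rel_tol=0.05):
    """Greedily group kernels that are (approximately) scaled copies of a base kernel.

    Returns a list of (base_index, [(member_index, alpha), ...]) tuples.
    """
    groups = []
    for i, kernel in enumerate(kernels):
        for base_idx, members in groups:
            alpha = scale_factor(kernels[base_idx], kernel, rel_tol)
            if alpha is not None:
                members.append((i, alpha))
                break
        else:
            # No existing base kernel matches: this kernel starts a new group.
            groups.append((i, [(i, 1.0)]))
    return groups
```

Under these assumptions, each group needs only one convolution with its base kernel; the outputs for the other members are obtained by multiplying that result by their scale factors.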
Thus, it would have been obvious to a Person Having Ordinary Skill in the Art (PHOSITA) before the effective filing date (EFD) to modify the method in the cited references to include the performances in Brothers. Doing so would enable “automatically tun[ing] parameters of an input neural network and output[ting] a modified (e.g., optimized) neural network” (Brothers [0023]), whereby “tuning a neural network may include selecting a portion of a first neural network for modification to increase computational efficiency and generating, using a processor, a second neural network based upon the first neural network by modifying the selected portion of the first neural network” (Brothers Abstract).
While the cited references teach the limitations of claim 2, they do not explicitly teach: “selecting at least one segment-dependent characteristic for the first segment of the layer,” on lines 3-4 and “for the segment-dependent characteristic of the first segment” on lines 5-7. Tamura discloses the claim limitations, teaching: “When the rank value R is smaller than the number of neurons, N, (N-R) neurons can be eliminated. That is, neurons from the second hidden layer can be eliminated without significantly increasing learning errors (more precisely, without increasing the minimum square errors or the sum total of the singular values is equal to or smaller than the error value e).” (Tamura col. 6, lines 12-18). Here, the segment dependency comprises the eliminated neurons, and the rank value R comprises “the number of neurons in the hidden layers” (Tamura col. 3, lines 53-56). 
Thus, it would have been obvious to a Person Having Ordinary Skill in the Art (PHOSITA) before the effective filing date (EFD) to modify the method in the cited references to include the segment-dependence in Tamura. Doing so would enable a determination of “which neurons in a hidden layer should be left [i.e., segment-independent] and which neurons in the hidden layer should be eliminated [i.e., segment-dependent] based on the learning data irrespective of the number of neurons composing the hidden layer. Therefore, an FFNN [feed forward neural network] which allows efficient calculation and effective results can be constructed. Furthermore, those networks which uselessly increase the amount of calculations can be reduced, whereby an efficient optimum neural network can be constructed [by reducing the number of segment-dependent neurons]. As a result, calculation time and memory size can be saved, and a computer with this network can treat more advanced questions compared with the other computers of the same capacities.” (Tamura col. 11, lines 20-32).

Regarding claim 4, while the cited references teach the claim limitations and “insertion”, they do not explicitly teach: “further comprising setting the at least one selected segment-dependent characteristic to one of a plurality of settings to enhance a performance of the … layer”. Tamura discloses the claim limitations, teaching: “In the construction of a learned neural network, the number of the optimum neurons is determined by calculating the rank value.” (Tamura col. 7, lines 33-35). The rank value calculation for the neurons in the hidden layer that should be eliminated [i.e., segment-dependent] can comprise an “initial setting, [whereby] the … error value e is preset to a specific … [value] such as 0.1” (Tamura col. 7, lines 10-11). This enables the construction of an optimal network since the “procedure for obtaining the optimum network … [calls for] those neurons which are small in singular value … [to be] eliminated”, with the eliminated neurons in those layers being segment-dependent. This elimination can be done without increasing the network error (Tamura col. 6, lines 12-18), resulting in an enhanced performance of the network since the network now has a reduced number of neurons and no substantial increase in error stemming from the reduction.
Thus, it would have been obvious to a Person Having Ordinary Skill in the Art (PHOSITA) before the effective filing date (EFD) to modify the reduction method in the cited references to include the segment-dependence in Tamura. A motivation to combine the cited references with Tamura is given above with respect to claim 2.
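As a non-authoritative illustration of the rank-based elimination described in Tamura (col. 6, lines 2-18; col. 7, lines 10-11), the following Python sketch computes a rank value R from the singular values of a matrix of hidden-layer outputs (one row per training sample, one column per neuron) and reports N - R as the number of neurons that could be eliminated. It assumes one reading of Tamura's criterion, namely that R is the smallest rank for which the discarded singular values sum to no more than the preset error value e (0.1, following Tamura's example). The function name and data layout are hypothetical.

```python
import numpy as np


def rank_and_removable(hidden_outputs, e=0.1):
    """Return (R, N - R) for a (samples x N) matrix of hidden-layer outputs.

    Assumption: R is the smallest rank whose discarded singular values
    sum to no more than the error value e.
    """
    H = np.asarray(hidden_outputs, dtype=float)
    n_neurons = H.shape[1]
    s = np.linalg.svd(H, compute_uv=False)  # singular values, descending order
    for r in range(s.size + 1):
        if s[r:].sum() <= e:
            return r, n_neurons - r
    return s.size, n_neurons - s.size  # only reached if e is negative
```

When R equals N, nothing can be removed, which mirrors Tamura's statement that all neurons are then "independent and necessary"; a neuron whose output is a linear combination of the others contributes a near-zero singular value and can be eliminated.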

Regarding claim 5, Annapureddy teaches:
The method of claim 1, further comprising selecting at least one of neuron activation methods ([0042]: “The neurons in the compressed layer [after replacement] may be configured with an identity activation function.” For insertion layers, see [0124]: “the process inserts the nonlinearity by applying a nonlinear activation function to neurons of the compressed layers. The nonlinear activation function may comprise a rectifier, an absolute value function, hyperbolic tangent function, a sigmoid function or other nonlinear activation function.”), a fraction of inactive neurons, weight initialization methods, a fraction of fixed-weight synapses, and a fraction of inactive synapses ….

While the cited references teach the limitations of claim 5 and “insertion”, they do not explicitly teach “as a segment-independent characteristic for the … layer” on lines 3-4. Tamura discloses “as a segment-independent characteristic for the layer” on lines 3-4, teaching: “When the rank value R agrees with the number of neurons, N, it means that all the neurons are independent and necessary in a neural network and therefore cannot be eliminated. If any more neurons are omitted, errors in learning increase.” (Tamura col. 6, lines 2-11). Here, the rank value R comprises “the number of neurons in the hidden layers” (Tamura col. 3, lines 53-56). 
Thus, it would have been obvious to a Person Having Ordinary Skill in the Art (PHOSITA) before the effective filing date (EFD) to modify the reduction method in the cited references to include the segment-independence in Tamura. A motivation to combine the cited references with Tamura is given above with respect to claim 2.

Regarding claim 17, claim 17 is substantially similar to claim 2 and therefore is rejected on the same ground as claim 2. Claim 17 is a system claim that corresponds to method claim 2.

Regarding claim 20, while the cited references teach the system of claim 16 and “insertion”, they do not explicitly teach: “wherein the operations further comprise select at least one segment-independent characteristic for at least one identified segment of the … layer.” Tamura discloses the claim limitations, teaching: “When the rank value R agrees with the number of neurons, N, it means that all the neurons are independent and necessary in a neural network and therefore cannot be eliminated. If any more neurons are omitted, errors in learning increase.” (Tamura col. 6, lines 2-11). Here, the rank value R comprises “the number of neurons in the hidden layers” (Tamura col. 3, lines 53-56). 
Thus, it would have been obvious to a Person Having Ordinary Skill in the Art (PHOSITA) before the effective filing date (EFD) to modify the reduction method in the cited references to include the segment-independence in Tamura. A motivation to combine the cited references with Tamura is given above with respect to claim 2.

Claims 9, 10, and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Annapureddy et al. (U.S. Pat. App. Pre-Grant Pub. No. 2016/0217369, hereinafter Annapureddy) and Wang et al. (U.S. Pat. App. Pre-Grant Pub. No. 2019/0279089, hereinafter Wang, which claims priority to the CN filing and has PCT publication WO 2018/090706), and in further view of Lane et al., “DeepX: A Software Accelerator for Low-Power Deep Learning Inference on Mobile Devices” (hereinafter Lane).

Regarding claim 9, Annapureddy teaches:
One or more non-transitory computer-readable media that include instructions that, when executed by one or more processors, are configured to cause the one or more processors to ([0151]: “The machine-readable media may comprise a number of software modules. The software modules include instructions that, when executed by the processor, cause the processing system to perform various functions. The software modules may include a transmission module and a receiving module.”): 
identifying a first layer of a neural network and a second layer of the neural network adjacent to the first layer of the neural network, wherein a first connection number of neuron connections are between the first layer and the second layer (Figs. 5A, 5B, 8A, and 8B: showing a first layer and a second layer with connections between the two layers. Similarly, see [0078]-[0079], [0093], and [0102]: describing the various figures.);
selecting a first number of replacement neurons for a replacement layer of the neural network (see “Selection of design parameters”: describing “[s]everal of the design parameters involved in compression (e.g., the number of intermediate neurons in the case of fully-connected layers…)” ([0114]), whereby “[i]n one example, the number of neurons in the intermediate layer r may be selected using a parameter sweep” ([0115]). The neurons of an intermediate layer, such as an initial intermediate layer, can comprise a first number of neurons.), 
the first number of replacement neurons in the replacement layer being less than a number of first neurons in the first layer of the neural network ([0079]: describing the compression of the fully-connected layer fc, which has n input neurons and m output neurons, into the compressed layers fc1 and fc2 having a reduced number r of intermediate neurons. That is, “[t]he number of neurons r1 and r2 in the intermediate layers may be determined based on the compression they achieve, and the performance of the resulting compressed network.” ([0079]).
See also [0119]-[0122]: describing the insertion of non-linear layers to compress the network and enhance its performance. Further, “[t]he [compression] process may further specify the size of the second layer, which may be denoted (r), such that the combined memory footprint of the compressed neural network, which includes the second and third layer, is smaller than the memory footprint (M) of the uncompressed neural network.” ([0135]). See also Figs. 5A-9B: showing the compressed and uncompressed network with the various adjacent layers, wherein the number of neurons r1 and r2 is less than the number of neurons in each of the first and second layers (Figs. 6A and 6B).); 
modifying the neural network by replacing the first layer with the replacement layer to reduce a number of connections in the neural network (see “Compression of Fully-Connected Layers” on pp. 7-8 and “Compression of Convolutional Layers” on p. 8, describing a modification of the neural network by compressing the neural network layers. See also [0123] and [0135]: describing that “the process replaces one or more layers in the neural network with multiple compressed layers to produce a compressed network.”), 
wherein the first number of replacement neurons is also selected based on the first number of replacement neurons being such that a second connection number of neuron connections are between the replacement layer and the second layer and the second connection number is less than the first connection number ([0116]-[0117]: describing techniques to determine the number of neurons in the compressed layer, wherein the number of neurons in the compressed layer is less than the number of neurons in the original uncompressed layer. See also [0079]: describing the compression of the fully-connected layer fc into the compressed layers fc1 and fc2 having a reduced number r of intermediate neurons relative to the n input neurons and m output neurons.
See also “Insertion of Nonlinear Layers Between Compressed Layers”, describing that “nonlinear layers may be added [i.e., inserted] between compressed layers to improve the representational capacity of the network” ([0119]). Further, “[t]he [compression] process may further specify the size of the second layer .... Alternatively, or in addition to a reduction in memory footprint, the size of the second layer may be chosen so that the inference time of the compressed neural network may be smaller than the inference time (T) of the uncompressed neural network.” ([0135]). See also Figs. 5A-9B, showing the compressed and uncompressed network with the various adjacent layers.);
… 
by inserting an insertion layer between the replacement layer and the second layer ([0042]: “a layer may be compressed by replacing it with multiple layers of the same type”. 
See also “Insertion of Nonlinear Layers Between Compressed Layers”, describing the compression of an original 9x9 neural net into a final compressed version with 3x3 convolution layers, with nonlinear layers inserted between the 3x3 convolution layers ([0121]-[0122]). Further, “the process inserts nonlinearity [layers] between the compressed layers of the compressed network. In some aspects, the process inserts the nonlinearity by applying a nonlinear activation function to neurons of the compressed layers.” ([0124]).), 
…. 
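The parameter sweep that Annapureddy describes for choosing the intermediate-layer size r ([0115]), subject to the memory-footprint condition of [0135], can be illustrated with the following Python sketch. It assumes that the footprint is measured by weight count only (biases ignored), that an uncompressed fully-connected layer with n inputs and m outputs is replaced by two layers n → r → m, and that `evaluate` is a hypothetical caller-supplied function returning a performance score for a given r; none of these names come from Annapureddy.

```python
from typing import Callable, Iterable, Optional


def sweep_intermediate_size(
    n: int,
    m: int,
    evaluate: Callable[[int], float],
    min_score: float,
    candidates: Iterable[int],
) -> Optional[int]:
    """Return the smallest candidate r that shrinks the layer and meets min_score."""
    uncompressed = n * m  # weights in the original n -> m layer
    for r in sorted(candidates):
        compressed = n * r + r * m  # weights in the n -> r and r -> m layers
        if compressed >= uncompressed:
            break  # the footprint only grows with r, so larger candidates fail too
        if evaluate(r) >= min_score:
            return r
    return None
```

Any r returned by this sketch satisfies r(n + m) < nm, so the replacement reduces the number of connections; if no candidate both shrinks the layer and meets the score, the sweep returns `None`.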

While Annapureddy teaches the limitations of claim 9 and “replacement” and “insertion”, Annapureddy does not explicitly teach: “obtaining a first value and a second value for a characteristic of the … layer, the characteristic including one or more of: a fraction of inactive neurons, weight initialization methods, a fraction of fixed-weight synapses, and a fraction of inactive synapses; …”. Wang discloses the claim limitations, teaching: 
“obtaining a first value and a second value for a characteristic of the … layer, the characteristic including one or more of (Wang [0032] and [0033]: describing the importance value and the diversity value of neurons in a neural network layer.):
a fraction of inactive neurons (Wang [0069]: “At step 402a, a set of neurons is initialized as a null set C”), weight initialization methods (Wang [0092]: “the neural network having the weights adjusted may be used as an initial network model which can be re-trained based on original training data T at a low learning rate, so as to further improve the network accuracy of the pruned neural network”.), a fraction of fixed-weight synapses, and a fraction of inactive synapses; 
determining a first performance of the modified neural network using the first value of the characteristic of the … layer (Wang [0095]: “An importance value determining unit 81 may be configured to determine importance values of neurons in a network layer to be pruned based on activation values of the neurons.” See also Wang [0103]: describing that an “importance value determining module 814 may be configured to obtain the importance value of each neuron by normalizing the variance for the neuron based on the neuron variance importance vector.” 
This determination is used to evaluate the neural network’s performance, wherein “[a] neuron selecting unit 83 may be configured to select, from the network layer to be pruned, neurons to be retained based on the importance values and diversity values of the neurons in the network layer to be pruned in accordance with a volume maximization neuron selection policy.
A pruning unit 84 may be configured to prune the other neurons from the network layer to be pruned to obtain a pruned network layer.” (Wang [0097]-[0098]).); 
determining a second performance of the modified neural network using the second value of the characteristic of the … layer (Wang [0096]: “A diversity value determining unit 82 may be configured to determine a diversity value of each neuron in the network layer to be pruned based on connecting weights between the neuron and neurons in a next network layer.” 
This determination is used to evaluate the neural network’s performance, wherein “[a] neuron selecting unit 83 may be configured to select, from the network layer to be pruned, neurons to be retained based on the importance values and diversity values of the neurons in the network layer to be pruned in accordance with a volume maximization neuron selection policy.
A pruning unit 84 may be configured to prune the other neurons from the network layer to be pruned to obtain a pruned network layer.” (Wang [0097]-[0098]).); and 
in response to at least one of the first performance and the second performance satisfying a performance threshold, modifying the modified neural network … the … layer having a second number of neurons (Wang [0075]: describing that “importance value of a neuron reflects a degree of impact the neuron has on an output result from the neural network, and a diversity of a neuron reflects its expression capability. Hence, the neurons selected in accordance with the volume maximization neuron selection policy have greater contributions to the output result from the neural network and higher expression capabilities, while the pruned neurons are neurons having smaller contributions to the output result from the neural network and lower expression capabilities. Accordingly, when compared with the original neural network, the pruned neural network may achieve good compression and acceleration effects while having little accuracy loss. Therefore, the pruning method according to the embodiments of the present disclosure may achieve good compression and acceleration effects while maintaining the accuracy of the neural network.” 
Here, the pruned neural network has a smaller number of neurons than the original neural network because the neurons that do not meet the “volume maximization neuron selection policy” as it relates to the importance and diversity values are not retained. This pruning process does not entail an insertion, since pruning involves removing those original neurons in a neural network that do not meet the “volume maximization neuron selection policy” (Wang [0061], [0097], and [0116]).
“[T]he neurons finally selected to be retained are optimal” (Wang [0061]) to maintain “the accuracy of the neural network” (Wang [0075] and [0126]) upon compression. That is, when the evaluation of the importance and diversity values for the neurons meets some predetermined optimal threshold that can maintain the neural network’s accuracy, those neurons are retained, while the other neurons that fall below the threshold are pruned. The pruned neural network thus comprises an updated layer with a smaller number of neurons than the corresponding layer in the original neural network, resulting in a more compressed version of the original neural network.),….”
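The retain-and-prune flow described in Wang [0061], [0069], and [0095]-[0098] can be sketched in Python as below, purely as an illustration. The specific formulas are assumptions, not Wang's: importance is taken as each neuron's activation variance normalized to sum to one (cf. Wang [0103]); diversity is represented by the direction of each neuron's outgoing weight vector to the next layer; and the volume maximization policy is approximated greedily, starting from an empty retained set C and repeatedly retaining the neuron whose importance-scaled direction adds the most volume (the largest residual after projecting out the directions already retained). The function name `select_neurons` is hypothetical.

```python
import numpy as np


def select_neurons(activations, next_weights, k):
    """Greedy volume-maximization selection (illustrative sketch).

    activations: (samples, N) activation values of the layer to be pruned
    next_weights: (N, N_next) connecting weights to the next layer
    k: number of neurons to retain
    Returns the sorted indices of the retained neurons.
    """
    acts = np.asarray(activations, dtype=float)
    W = np.asarray(next_weights, dtype=float)
    var = acts.var(axis=0)
    total = var.sum()
    if total > 0:
        importance = var / total  # normalized activation variance
    else:
        importance = np.full(var.shape, 1.0 / var.size)
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    directions = np.divide(W, norms, out=np.zeros_like(W), where=norms > 0)
    residual = importance[:, None] * directions  # importance-scaled directions
    selected = []  # the retained set C starts empty
    for _ in range(min(k, residual.shape[0])):
        scores = np.linalg.norm(residual, axis=1)  # volume gained by each candidate
        if selected:
            scores[selected] = -1.0  # never re-pick a retained neuron
        best = int(np.argmax(scores))
        selected.append(best)
        b = residual[best].copy()
        bb = float(b @ b)
        if bb > 0.0:
            # Project the chosen direction out of every remaining vector.
            residual = residual - np.outer(residual @ b, b) / bb
    return sorted(selected)
```

In a small example where neuron 2 has the same outgoing direction as neuron 0 but a lower activation variance, the sketch retains neurons 0 and 1 and prunes neuron 2, since neuron 2 adds almost no volume once neuron 0 is retained.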
Thus, it would have been obvious to a Person Having Ordinary Skill in the Art (PHOSITA) before the effective filing date (EFD) to modify the model in Annapureddy to include the performance analysis in Wang. Doing so would enable “a method and an apparatus for neural network pruning … [that] includes: determining (101) importance values of neurons in a network layer to be pruned based on activation values of the neurons; determining (102) a diversity value of each neuron in the network layer to be pruned based on connecting weights between the neuron and neurons in a next network layer; selecting (103), from the network layer to be pruned, neurons to be retained based on the importance values and diversity values of the neurons in the network layer to be pruned in accordance with a volume maximization neuron selection policy; and pruning (104) the other neurons from the network layer to be pruned to obtain a pruned network layer. With the above method, good compression and acceleration effects can be achieved while maintaining the accuracy of the neural network.” (Wang Abstract).

While the cited references teach the limitations of claim 9, they do not explicitly teach: “wherein the second number is less than the first number of replacement neurons and is less than a third number of second neurons of the second layer in which the second number is represented by “P”, the first number is represented by “M” and the third number is represented by “N”, and “P” is calculated according to the following expression: P<(M x N)/(M +N), wherein a third connection number of neuron connections between the insertion layer and the replacement layer and between the insertion layer and the second layer is less than the second connection number”. Lane discloses the claim limitations, teaching as follows. Lane Section IV describes the “runtime layer compression” computation process, in which the weight matrix “for [the] two adjacent layers (L and L+1) with m and n units respectively undergoes singular value decomposition (SVD)” as embodied by equations 1-3. The new weight matrices produce the compression, “which is achieved by introducing a new layer L′ with c ≪ m, n units between layer L and L+1. Because L and L+1 units are fully connected, the introduction of L′ causes the number of pairwise calculations and weight parameters to fall dramatically – from mn to (m + n) × c, this in turn translates into both a lower memory requirements and lower computational load. An overview of the SVD-based compression of a deep architecture layer is given in Figure 4.” Here, c denotes the number of “biggest singular values” that are retained (Lane Section IV). Because (m + n) × c < mn holds exactly when c < (m × n)/(m + n), Lane’s inserted layer of c units reduces the number of connections in the manner recited by the claimed expression, with c, m, and n corresponding to P, M, and N, respectively.
See also Lane Fig. 4, showing that the insertion or replacement layer, L′, has fewer neurons than the original uncompressed adjacent layers, i.e., layers L and L+1. The caption of Fig. 4 states: “Layer compression [of the modified neural network] is shown taking place; the new generated layer is inserted into between two prior adjacent layers from the original model.”
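The correspondence between Lane’s SVD-based layer insertion and the claimed expression can be checked numerically with the following NumPy sketch. It assumes a weight matrix W of shape (m, n) between adjacent layers under the row-vector convention y = xW, keeps the c largest singular values to form the two factor matrices of the inserted layer, and evaluates the condition P < (M × N)/(M + N) with exact rational arithmetic. The function names are hypothetical, and the code is not taken from Lane.

```python
from fractions import Fraction

import numpy as np


def svd_insert_layer(W, c):
    """Split W (m x n) into A (m x c) and B (c x n) using the c largest singular values."""
    U, s, Vt = np.linalg.svd(np.asarray(W, dtype=float), full_matrices=False)
    A = U[:, :c] * s[:c]  # weights from layer L into the inserted layer L'
    B = Vt[:c, :]         # weights from L' into layer L+1
    return A, B


def connection_counts(M, N, P):
    """Connections without and with an inserted layer of P neurons."""
    return M * N, (M + N) * P


def satisfies_expression(M, N, P):
    """The claimed condition P < (M x N) / (M + N), evaluated exactly."""
    return P < Fraction(M * N, M + N)
```

For M = N = 4 the bound is 2: one inserted neuron reduces 16 connections to 8, while two inserted neurons give 16 connections and no reduction, consistent with the condition (M + N)P < MN.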
Thus, it would have been obvious to a Person Having Ordinary Skill in the Art (PHOSITA) before the effective filing date (EFD) to modify the model in the cited references to include the plurality of neurons in Lane. Doing so would provide the advantages described in Lane, namely that “DeepX amplifies the advantages they offer through two inference-time resource control algorithms, namely: (1) Runtime Layer Compression (RLC) and (2) Deep Architecture Decomposition (DAD). Through these runtime algorithms, DeepX can automatically decompose a deep model across available processors to maximize energy-efficiency and execution time, within fluctuating mobile resource constraints such as computation and memory.” (Lane Section I). 

Regarding claim 10, claim 10 is substantially similar to claim 2 and therefore is rejected on the same ground as claim 2. Claim 10 is a media claim that corresponds to method claim 2.

Regarding claim 13, claim 13 is substantially similar to claim 5 and therefore is rejected on the same ground as claim 5. Claim 13 is a media claim that corresponds to method claim 5.

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to SELENE A HAEDI whose telephone number is (571)270-5762.  The examiner can normally be reached on M-F 11 AM - 7 PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li B Zhen can be reached on (571)272-3768.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/S.H./Examiner, Art Unit 2121

/Li B. Zhen/Supervisory Patent Examiner, Art Unit 2121