DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Interpretation
The examiner notes that the applicant asserts, in the remarks filed 08/29/2022 (page 6), that the specification discloses how to densify a sparse neural network, as depicted in Fig. 4 and Fig. 5. The examiner notes that these figures depict the same data points and connections, which at best appear to be spatially displaced closer together. The specification and remarks appear to use the terms sparse, dense, and densified broadly, covering arrangements of network elements and/or the process of learning their parameters, and neither the claims nor the specification provides an express definition requiring the features described in the applicant’s remarks filed 08/29/2022.
During patent examination, the pending claims must be "given their broadest reasonable interpretation consistent with the specification." The Federal Circuit’s en banc decision in Phillips v. AWH Corp., 415 F.3d 1303, 1316, 75 USPQ2d 1321, 1329 (Fed. Cir. 2005) expressly recognized that the USPTO employs the "broadest reasonable interpretation" standard, see MPEP 2111. This standard is what has been applied in the examination of the pending claims in the current application. 

Response to Arguments
Applicant’s remarks and arguments filed 08/29/2022, pgs. 6-18 have been fully considered.
Regarding the applicant’s remarks on the written description rejection of the claims under 35 USC § 112(a), the rejection made in the previous action has been withdrawn in light of the claim amendments.
Regarding the applicant’s remarks on the rejections of the claims under 35 USC § 112(b), the rejections from the previous action that have been withdrawn based on the claim amendments have been removed from the current Office action.
Regarding the rejection under 35 USC § 112(b) of the phrase “zero value range”, the remarks are unpersuasive. The applicant argues that the phrase would be interpreted as [0,0]; this again fails to clarify the scope, as there are no known numbers that would lie within such a range. Thus, the rejection has been maintained. The applicant is advised to clarify the scope as an actual range or as the value zero. Regarding the antecedent basis rejection of claim 16, no amendments have been made to address this deficiency, and thus the rejection is maintained.
Regarding the applicant’s remarks on the rejection of the claims under 35 USC § 103, the arguments are directed to matter not required by the claim limitations. The applicant has highlighted the preferred embodiment for densifying as requiring remapping, as depicted in Fig. 1 and Fig. 4 of the applicant’s specification.
In response, the examiner notes that the courts have held that "[t]hough understanding the claim language may be aided by explanations contained in the written description, it is important not to import into a claim limitations that are not part of the claim. For example, a particular embodiment appearing in the written description may not be read into a claim when the claim language is broader than the embodiment." Superguide Corp. v. DirecTV Enterprises, Inc., 358 F.3d 870, 875, 69 USPQ2d 1865, 1868 (Fed. Cir. 2004). See also Liebel-Flarsheim Co. v. Medrad Inc., 358 F.3d 898, 906, 69 USPQ2d 1801, 1807 (Fed. Cir. 2004), see MPEP 2111. The examiner notes that in this case the claim limitations are simply broader in scope than the embodiment noted in the applicant’s remarks, which is taken from the applicant’s original specification. The examiner notes that the limitations should be amended to require the intended scope highlighted in the remarks, as the embodiment cannot be imported into the claim language as currently recited.
In addition, the amended claim limitations have not been previously reviewed by the examiner; thus, the applicant’s arguments are rendered moot. The examiner refers to the rejection under 35 U.S.C. 103 for more details.


Claim Rejections - 35 USC § 112(b)
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 2-7 and 16-17 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Regarding claim 2, the limitation “zero value range” renders the claim indefinite. It is unclear if the scope requires a zero value, or some range of values close to zero, or some combination using a value and a range. It is unclear what the intended scope of the recited phrase should be and thus the claim is rendered indefinite. The examiner interprets any value less than 1 as within the scope of the claim limitation as claimed.
Regarding claims 3-5, 12, and 16, the limitation “zero value range” is recited similarly to claim 2, and the claims are rejected under the same rationale.
Regarding claims 6-7 and 17, the limitations do not resolve the noted deficiencies in claim 2, and the claims are therefore rejected as well.

Regarding claim 6, the limitation “value range” renders the claim indefinite because it is unclear whether the intended scope is a value, a range, or some combination. The examiner interprets any value as the claimed “value range”.

Regarding claim 7, the limitation “value ranges in a threshold layer” renders the claim indefinite because it is unclear what the intended scope should be. The term “threshold layer” is not a term of art. If there is an identified “threshold layer”, how it is ascertained is not clear from the claim limitation. In regard to the phrase “value ranges”, it is unclear if a value is captured, or a range of values is captured, or some combination of both. The intended scope of the claimed limitation is unclear and thus the claim is rendered indefinite. The examiner interprets any layer as the claimed threshold layer and any value below 1 as within the scope of the claimed “value ranges”.

Regarding claim 16, the limitation “the zero value range” renders the claim indefinite because there is insufficient antecedent basis for this limitation in the claim. There is no recitation of “a zero value range” in claim 16 or in claims 9 and 1, from which claim 16 depends. The phrase “zero value range” is also rejected under the same rationale as in the claim 2 rejection.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 3 and 5-17 are rejected under 35 U.S.C. 103 as being unpatentable over David et al. (US Pub. No. 20190108436, hereinafter David) in view of Han et al. (NPL: “Dsd: Dense-sparse-dense training for deep neural networks”, hereinafter ‘Song’).

Regarding claim 1, David teaches: A method of memory remapping for utilizing dense neural network computations with a sparse neural network, the method comprising:  
densifying the … neural network into a densified neural network; (David teaches training a neural network to learn all connection weights/synapse connections as the claimed densification of a neural network, in 0003-0004: An artificial neural network, or simply "neural network," is a computer model, resembling a biological network of neurons, which is trained by machine learning. A traditional neural network has an input layer, multiple middle or hidden layer(s), and an output layer. Each layer has a plurality (e.g., 100s to 1000s) of artificial "neurons." Each neuron in a layer (N) may be connected by an artificial "synapse" to some or all neurons in a prior (N-1) layer and subsequent (N+1) layer to form a "partially-connected" or "fully-connected" neural network. The strength of each synapse connection is represented by a weight… A neural network (NN) is trained based on a learning dataset to solve or learn a weight of each synapse indicating the strength of that connection [determining non-zero weights as claimed densifying the … neural network into a densified neural network]. The weights of the synapses are generally initialized, e.g., randomly… Training may be repeated until the error is minimized or converges. Typically multiple passes (e.g., tens or hundreds) through the training set is performed (e.g., each sample is input into the neural network multiple times). Each complete pass over the entire training set is referred to as one "epoch"; And in 0103: In operation 600, a processor may generate [claimed densifying the … neural network into a densified neural network] or receive and store a dense neural network in a memory. The dense neural network may have a majority or above threshold percentage of neurons in adjacent layers connected to each other [claimed densifying the … neural network into a densified neural network]. An example of a dense neural network is described in reference to FIG. 1; where the neural network that can be densified is a sparse neural network, in 0019: Embodiments of the invention provide a novel system and method to generate a sparse neural network by pruning weak synapse connections during the training phase (instead of only during post-training processing) or by evolving a sparse neural network (e.g., using evolutionary computation). Embodiments of the invention further provide a novel compact data representation for sparse neural networks that independently indexes each weight to eliminate the need to store pruned synapse weights… )
remapping input and output data onto the densified neural network; (David teaches remapping the input and output as learning the weights using backpropagation until the error is minimized, in 0003-0004: An artificial neural network, or simply "neural network," is a computer model, resembling a biological network of neurons, which is trained by machine learning. A traditional neural network has an input layer, multiple middle or hidden layer(s), and an output layer. Each layer has a plurality (e.g., 100s to 1000s) of artificial "neurons." Each neuron in a layer (N) may be connected by an artificial "synapse" to some or all neurons in a prior (N-1) layer and subsequent (N+1) layer to form a "partially-connected" or "fully-connected" neural network. The strength of each synapse connection [claimed remapping input and output data onto the densified neural network] is represented by a weight… A neural network (NN) is trained based on a learning dataset to solve or learn a weight of each synapse indicating the strength of that connection. The weights of the synapses are generally initialized, e.g., randomly [claimed densifying the … neural network]… Training may be repeated until the error is minimized or converges [claimed remapping input and output data onto the densified neural network]. Typically multiple passes (e.g., tens or hundreds) through the training set is performed (e.g., each sample is input into the neural network multiple times) [claimed remapping input and output data onto the densified neural network]. Each complete pass over the entire training set is referred to as one "epoch"…; And in 0103: In operation 600, a processor may generate [claimed remapping input and output data onto the densified neural network] or receive and store a dense neural network in a memory. The dense neural network may have a majority or above threshold percentage of neurons in adjacent layers connected to each other. An example of a dense neural network is described in reference to FIG. 1.)
utilizing the dense neural network computations for a prediction using the remapped input and output data (David teaches generating the claimed dense neural network, as noted above in 0103 and 0003-0004; And the utilization of the disclosed generated dense neural network depicted in Fig. 1 for making predictions, in 0005: State-of-the-art neural networks typically have between millions and billions of weights, and as a result require specialized hardware (usually a GPU) for both training and runtime (prediction) phases. It is thereby impractical to run deep learning models, even in prediction mode, on most endpoint devices (e.g., IoT devices, mobile devices, or even laptops and desktops without dedicated accelerator hardware); And in 0025: All conventional neural networks today represent the weights connecting one layer to another as a dense matrix. For example, in order to store the weights connecting two layers of sizes 10 and 20 neurons, and assuming the network is fully connected… This representation is useful for forward [claimed utilizing the dense neural network computations for a prediction using the remapped input and output data] and backward propagation of activations as well, e.g., given an input of 10 values in the above example, the output values of the 20 neurons in the subsequent layer could be calculated by multiplying the vector of values (size=10) by the matrix of weights (size=10x20), and obtaining the output vector (size=20) [claimed utilizing the dense neural network computations for a prediction using the remapped input and output data].…).
While David teaches generating a dense neural network stored in memory, as noted above, David does not expressly teach generating a dense neural network from a sparse neural network, through training or network evolution, as claimed densifying … sparse neural network into a densified neural network.
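For illustration only, the dense forward propagation described in the quoted passage of David (0025) can be sketched as below. This is a minimal Python sketch and is not taken from David or from the application; the function name dense_forward, the random weight values, and the all-ones input are hypothetical.

```python
import random


def dense_forward(inputs, weights):
    """Multiply a length-n input vector by an n x m dense weight matrix.

    Hypothetical helper: weights[i][j] is the weight of the connection
    from input neuron i to output neuron j.
    """
    n_in = len(weights)
    n_out = len(weights[0])
    if len(inputs) != n_in:
        raise ValueError("input length must match the number of weight rows")
    return [
        sum(inputs[i] * weights[i][j] for i in range(n_in))
        for j in range(n_out)
    ]


# Layer sizes follow the example quoted from David 0025: 10 inputs, 20 outputs.
random.seed(0)
weights = [[random.uniform(-1.0, 1.0) for _ in range(20)] for _ in range(10)]
inputs = [1.0] * 10
outputs = dense_forward(inputs, weights)
print(len(outputs))
```

Every entry of the 10x20 matrix takes part in the computation, whether or not its weight is zero, which is the property of a dense representation that the quoted passage relies on.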
Song does expressly teach generating a dense neural network from a sparse neural network, as the claimed densifying the sparse neural network into a densified neural network. (Song teaches, as depicted in Fig. 1:

[Image: Song, Fig. 1]

In Pg. 3 Sec. Final Dense Training: … The un-pruned network parameters adjust themselves during the retraining phase, so in (c), the boundary becomes soft and forms a bimodal distribution. In (d), at the beginning of the re-dense training step [claimed densifying … sparse neural network into a densified neural network], all the pruned weights come back again and are reinitialized to zero. Finally, in (e), the pruned weights are retrained together with the un-pruned weights. In this step, we kept the same learning hyper-parameters (weight decay, learning rate, etc.) for pruned weights and un-pruned weights…)
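For illustration only, the sparse step and the re-dense step described in the quoted passage of Song can be sketched as below. This is a minimal Python sketch under stated assumptions: pruning is modeled as a simple magnitude threshold (an assumption, since the quoted passage does not state the pruning criterion), retraining is omitted, and the names prune_by_magnitude and redense are hypothetical.

```python
def prune_by_magnitude(weights, threshold):
    """Sparse step (assumed criterion): drop weights with |w| < threshold.

    Returns the sparse weights, with None marking a pruned connection,
    and a mask that is True where the connection was kept.
    """
    mask = [abs(w) >= threshold for w in weights]
    sparse = [w if keep else None for w, keep in zip(weights, mask)]
    return sparse, mask


def redense(sparse):
    """Re-dense step: pruned connections come back, reinitialized to zero."""
    return [0.0 if w is None else w for w in sparse]


weights = [0.8, -0.02, 0.5, 0.01, -0.9]
sparse, mask = prune_by_magnitude(weights, threshold=0.1)
dense_again = redense(sparse)
print(dense_again)
```

After the re-dense step every connection exists again, so a subsequent training pass could update the restored weights together with the un-pruned ones.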
Additionally, Song teaches remapping input and output data onto the densified neural network (Song teaches the re-dense connections added between the data nodes, as the claimed input and output data points mapped onto the densified network, as shown in Fig. 1 as the re-added pruned weights:

[Image: Song, Fig. 1]

)
The examiner notes that the broadest reasonable interpretation (BRI) has been given in light of the specification, where the specification does not redefine densify or densification, which is a process of increasing the capacity/amount/weight of one or more things in a given entity/composite/object/thing/space.
The David and Song references would have been recognized by those of ordinary skill in the art as useful for the applicant’s purpose in developing a method for performing neural network operations using dense and sparse neural network operations.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of the prior art for densifying a sparse neural network by adding weight data, as disclosed by Song, with the method for performing neural network operations using generated dense and sparse neural networks, as disclosed by David.
One of ordinary skill in the art would have been motivated to combine the disclosed methods of Song and David in order to perform neural network operations using re-dense operations for densifying a sparse neural network when training deep neural networks with a large number of parameters (Song, Abstract). Doing so provides a training flow for regularizing deep neural networks and achieving better optimization performance (Song, Abstract).


Regarding claim 3, the rejection of claim 2 is incorporated and David in combination with Song further teaches the method according to claim 2: wherein the identifying of the edges of the dense neural network having the zero value range includes locating multiplication operations with a zero weight in layers of the dense neural network (David teaches in 0029: Some embodiments may generate a sparse convolutional neural network (CNN). A CNN is represented by a plurality of filters that connect a channel of an input layer to a channel of a convolutional layer. The filter scans the input channel, operating on each progressive region of neurons (e.g., representing a NxN pixel image region), and maps the convolution [claimed multiplication operations] or other transformation of each region to a single neuron in the convolution channel. By connecting entire regions of multiple neurons to each single convolution neuron, filters form synapses having a many-to-one neuron connection, which reduces the number of synapses in CNNs… Some embodiments may generate a sparse CNN by pruning or zeroing entire filters that have all zero or near zero weights representing weak convolutional relationships between channels [claimed wherein the identifying of the edges of the dense neural network having the zero value range includes locating multiplication operations with a zero weight in layers of the dense neural network]. A new CNN indexing is used that independently and uniquely identifies each filter in the CNN so that pruned filters are not stored, reducing convolution operations and memory usage; And the convolution as a plurality of multiplication operations, in 0114: … In a regular img2col function, two custom matrices are constructed to represent every convolutional operation performed by a layer, such that each row and column multiplication represents a convolutional operation [claimed multiplication operations]. Embodiments of the invention may provide a modified img2col function, in which some of the kernels are zeroed out, and the associated matrices can be modified to omit or delete these rows and columns [claimed wherein the identifying of the edges of the dense neural network having the zero value range includes locating multiplication operations with a zero weight in layers of the dense neural network]. This results in more compact matrices associated with fewer multiplication operations to achieve the same convolutional results, compared to standard img2col operations…).

Regarding claim 5, the rejection of claim 3 is incorporated and David in combination with Song further teaches the method according to claim 3: wherein the identifying of the edges of the dense neural network having the zero value range further includes locating negative weight values in layers which are followed by a conditional rectifier linear unit (ReLU) expressed as a threshold layer. (David teaches in 0093: A bias unit may "bias" the weights of a neuron during training by adding a constant value to all of the neuron's weights. If a bias value is low enough (e.g., a large magnitude negative value), the bias unit may shift all the neuron's weights to a negative value [claimed wherein the identifying of the edges of the dense neural network having the zero value range further includes locating negative weight values in layers which are followed by a rectifier linear unit]. The bias unit may eliminate any output from neuron, e.g., with an activation function such as rectified linear unit (RELU) [claimed wherein the identifying of the edges of the dense neural network having the zero value range further includes locating negative weight values in layers which are followed by a conditional rectifier linear unit (ReLU) expressed as a threshold layer], in which all negative or below threshold values are zeroed out…)
Examiner notes that the activation function is applied successively through the layers of the neural network wherein the rectifier linear unit is an activation function. 
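For illustration only, the effect described in the quoted passage of David (0093) can be sketched as below. This is a minimal Python sketch under stated assumptions: the bias is modeled, following the wording of the quotation, as a constant added to every weight of the neuron, and the inputs are assumed to be non-negative (for example, outputs of a preceding ReLU layer). The names relu and neuron_output and all numeric values are hypothetical.

```python
def relu(x):
    """Rectified linear unit: negative values are zeroed out."""
    return x if x > 0.0 else 0.0


def neuron_output(inputs, weights, bias=0.0):
    """Output of one neuron whose weights are all shifted by a constant bias.

    Assumption: inputs are non-negative, so all-negative weights always
    give a non-positive pre-activation.
    """
    shifted = [w + bias for w in weights]
    pre_activation = sum(x * w for x, w in zip(inputs, shifted))
    return relu(pre_activation)


inputs = [0.5, 1.0, 2.0]    # non-negative activations (assumed)
weights = [0.3, -0.2, 0.7]

active = neuron_output(inputs, weights)               # no bias shift
silenced = neuron_output(inputs, weights, bias=-5.0)  # large negative shift
print(active > 0.0, silenced)
```

Under these assumptions, the neuron with the large negative shift outputs zero for any non-negative input, so the multiplications feeding it do not affect the result.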


Regarding claim 6, the rejection of claim 2 is incorporated and David in combination with Song further teaches the method according to claim 2: wherein the sparse neural network is further formed by removing edges which have a value range which is less than a predetermined threshold. (David teaches in 0082: Weights and their entries may be physically deleted when the weight, though not zero, is below a near zero threshold:
[Image: expression from David, 0082]
[claimed wherein the sparse neural network is further formed by removing edges which have a value range which is less than a predetermined threshold]… Example thresholds include, but are not limited to, 0.1, 0.001, 0.0001, 0.00001, etc.)
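For illustration only, deleting entries whose weight is below a near-zero threshold, as described in the quoted passage of David (0082), can be sketched as below. This is a minimal Python sketch under stated assumptions: the expression in David is not reproduced here, so comparing the absolute value of the weight against the threshold is an assumption, as are the (row, column, weight) entry layout and the name prune_below_threshold.

```python
def prune_below_threshold(entries, threshold):
    """Keep only (row, col, weight) entries with |weight| >= threshold.

    Assumption: magnitude is compared against the threshold; entries
    that fail the test are dropped rather than stored as zeros.
    """
    return [(i, j, w) for (i, j, w) in entries if abs(w) >= threshold]


entries = [(0, 0, 0.5), (0, 1, 0.00005), (1, 0, -0.3), (1, 1, -0.0002)]
# 0.001 is one of the example thresholds listed in the quoted passage.
kept = prune_below_threshold(entries, threshold=0.001)
print(kept)
```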

Regarding claim 7, the rejection of claim 2 is incorporated and David in combination with Song further teaches the method according to claim 2: further comprising determining whether value ranges in a threshold layer are always less than or always greater than a predetermined threshold, removing computations prior to the threshold layer, and using either a first value or a second value for computations following the threshold layer depending on the determination of whether the value ranges in the threshold layer are always less than or always greater than the predetermined threshold. (David teaches threshold layers as the j layer for which the weight is the first or second value and threshold as the predetermined threshold, in 0082: Weights and their entries may be physically deleted when the weight, though not zero, is below a near zero threshold:
[Image: expression from David, 0082]
[claimed determining whether value ranges in a threshold layer are always less than or always greater than a predetermined threshold, removing computations prior to the threshold layer, and using either a first value or a second value for computations following the threshold layer depending on the determination of whether the value ranges in the threshold layer are always less than or always greater than the predetermined threshold] …; And alternately the claimed threshold layer for eliminating the prior edges associated with pruned weights, in 0093: … The bias unit may eliminate any output from neuron, e.g., with an activation function such as rectified linear unit (RELU), in which all negative or below threshold values are zeroed out. In effect, this turns the whole neuron off, pruning such a neuron in its entirety from the network, including all of its incoming and outgoing weights. This can be achieved by regularization methods e.g. disclosed herein, but in this case pushing the value to a negative or below threshold target instead of zero [claimed determining whether value ranges in a threshold layer are always less than or always greater than a predetermined threshold, removing computations prior to the threshold layer, and using either a first value or a second value for computations following the threshold layer depending on the determination of whether the value ranges in the threshold layer are always less than or always greater than the predetermined threshold]….)

Regarding claim 8, the rejection of claim 1 is incorporated and David in combination with Song further teaches the method according to claim 1: further comprising generating code for instructing a processor or hardware layout to utilize the dense neural network computations based on the densified neural network. (David teaches in 0096: Remote server 510 may have a memory 515 for storing a neural network and a processor 516 for training and/or predicting based on the neural network [claimed further comprising generating code for instructing a processor or hardware layout to utilize the dense neural network computations based on the densified neural network]. Remote server 510 may prune a dense neural network (e.g., 100 of FIG. 1) to generate a sparse neural network (e.g., 200 of FIG. 1), or may initially generate or receive a sparse neural network. In some embodiments, remote server 510 may have specialized hardware including a large memory 515 for storing a neural network and a specialized processor 516 (e.g., a GPU), for example, when a dense neural network is used [claimed further comprising generating code for instructing a processor or hardware layout to utilize the dense neural network computations based on the densified neural network]. Memory 515 may store data 517 including a training dataset and data representing a plurality of weights of the neural network. Data 517 may also include code (e.g., software code) or logic, e.g., to enable storage and retrieval of data 517 according to embodiments of the invention.)

Regarding claim 9, the rejection of claim 1 is incorporated and David in combination with Song further teaches the method according to claim 1: wherein the sparse network is formed from an initial sparse or dense neural network using an iterative process of identifying and removing disconnected edges of the initial sparse or dense neural network which do not contribute to a final result. (David teaches forming a sparse neural network from a dense neural network as depicted in Fig. 6; And pruning (i.e., removing) the claimed edges that do not contribute to the final result, in 0096: Remote server 510 may have a memory 515 for storing a neural network and a processor 516 for training and/or predicting based on the neural network. Remote server 510 may prune a dense neural network (e.g., 100 of FIG. 1) [claimed wherein the sparse network is formed from an initial sparse or dense neural network using an iterative process of identifying and removing disconnected edges of the initial sparse or dense neural network which do not contribute to a final result] to generate a sparse neural network (e.g., 200 of FIG. 1), or may initially generate or receive a sparse neural network. In some embodiments, remote server 510 may have specialized hardware including a large memory 515 for storing a neural network and a specialized processor 516 (e.g., a GPU), for example, when a dense neural network is used. Memory 515 may store data 517 including a training dataset and data representing a plurality of weights of the neural network…; And in 0029: …Some embodiments may generate a sparse CNN by pruning or zeroing entire filters that have all zero or near zero weights representing weak convolutional relationships between channels [claimed disconnected edges of the initial sparse or dense neural network which do not contribute to a final result]…).

Regarding claim 10, the rejection of claim 9 is incorporated and David in combination with Song further teaches the method according to claim 9: wherein the iterative process goes from an output layer toward an input layer. (David teaches in 0078-0079: Some embodiments of the invention may prune neuron connections using L1 regularization during neural network training in each of one or more iterations (e.g., in addition to weight correcting updates such as backpropagation [claimed wherein the iterative process goes from an output layer toward an input layer]). The weights w_ij of the neural network may be updated to weights w'_ij in each training iteration, for example, as follows: … the faster the weights will approach zero, and the larger the portion of the weights that will become absolute zero, representing a disconnection (pruning of the connection) between neurons… In one embodiment, pruning may be performed using L1 regularization with a modification: The moment a weight becomes zero (or changes sign), the weight's memory entry is physically removed or deleted from storage (from the triplet representation table), …)
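For illustration only, pruning with L1 regularization in which an entry is deleted the moment its weight becomes zero or changes sign, as described in the quoted passage of David (0078-0079), can be sketched as below. This is a minimal Python sketch under stated assumptions: the update formula is elided in the quotation, so a generic gradient step with an L1 penalty term is substituted and should not be read as David's formula. The name l1_step, the dictionary layout, and all numeric values are hypothetical.

```python
def l1_step(weights, grads, learning_rate, l1_strength):
    """One generic L1-regularized update over stored weights.

    weights: dict mapping (i, j) -> weight for stored connections only.
    grads:   dict mapping (i, j) -> loss gradient (missing keys mean 0.0).
    A weight that reaches zero or changes sign is not written back,
    which models deleting its entry from storage.
    """
    updated = {}
    for key, w in weights.items():
        sign = (w > 0) - (w < 0)
        new_w = w - learning_rate * (grads.get(key, 0.0) + l1_strength * sign)
        if new_w == 0.0 or (new_w > 0) != (w > 0):
            continue  # entry removed: the connection is pruned
        updated[key] = new_w
    return updated


weights = {(0, 0): 0.5, (0, 1): 0.005, (1, 0): -0.4}
updated = l1_step(weights, grads={}, learning_rate=0.1, l1_strength=0.1)
print(sorted(updated))
```

In this example the small weight at (0, 1) crosses zero during the step and its entry is dropped, while the two larger weights shrink slightly toward zero and remain stored.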

Regarding claim 11, David teaches: A system for memory remapping to transform a sparse neural network into a dense neural network, the system comprising memory and one or more processors which, alone or in combination, are configured to provide for execution of a method (Fig. 6 and [0102] e.g., “The operations of FIG. 6 may be executed by a processor (e.g., one or more processor(s) 556 of FIG. 5) using data stored in a memory (e.g., one or more memory unit(s) 558 of FIG. 5)”) comprising: 
Claim 11 limitations are similar to those recited in claim 1, and are similarly rejected.

Regarding claim 12, the rejection of claim 11 is incorporated and David in combination with Song further teaches the system according to claim 11, being further configured to form the sparse neural network from a dense neural network by identifying and removing edges of the dense neural network having a zero value range which do not contribute to a final result. (David teaches in 0102-0104: In operation 600, a processor may generate or receive and store a dense neural network in a memory. The dense neural network may have a majority or above threshold percentage of neurons in adjacent layers connected to each other… the processor may generate the sparse neural network by pruning the weights of the dense neural network of operation 600 [claimed form the sparse neural network from a dense neural network by identifying and removing edges of the dense neural network having a zero value range which do not contribute to a final result], …. The processor may prune the neural network during and/or after a training phase of the neural network. The processor may prune weights using L1 regularization, thresholding, rounding, and/or random zeroing [claimed and removing edges of the dense neural network having a zero value range which do not contribute to a final result]. The processor may prune weights randomly, probabilistically, and/or heuristically… ). And as depicted in Fig. 6:

[Image: David, Fig. 6]

And the sparse generated by removing a majority of the weights for a dense neural network, in 0034-0035: The dense neural network 100 of FIG. 1 may be transformed to generate the sparse neural network 200 of FIG. 2 by pruning a majority or an above threshold percentage of connections 104 or their associated weights of the dense neural network 100 of FIG. 1. Weights may be pruned by disconnecting previously connected neuron pairs. Additionally, sparse neural network 200 may be trained using methods such as genetic algorithms, genetic programming, reinforcement learning, etc., that evolve the neural network. Sparse neural network 200 may have a hybrid mixture of various types of connections, such as, e.g., locally connections, recurrent connections, skip connections, etc. with a sparse representation…  Sparse neural network 200 may be represented by a plurality of weights of connections 204. In conventional matrices, pruned or omitted weights are set to zero, and treated the same as connected weights, which yields no significant storage or processing benefit to pruning… Accordingly, when two neurons are disconnected (by pruning) or not connected in the first place, data structure 206 simply deletes [dense network with pruned weights and neurons claimed and removing edges of the dense neural network having a zero value range which do not contribute to a final result] or omits an entry for that connection entirely (e.g., no record of a weight or any information is stored for that connection)).
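For illustration only, a representation that omits entries for pruned connections, as described in the quoted passage of David (0034-0035), can be sketched as below. This is a minimal Python sketch and is not David's data structure 206: the (row, column, weight) layout and the names to_triplets and sparse_forward are hypothetical.

```python
def to_triplets(matrix):
    """Store only non-zero weights as (row, col, weight) entries."""
    return [
        (i, j, w)
        for i, row in enumerate(matrix)
        for j, w in enumerate(row)
        if w != 0.0
    ]


def sparse_forward(inputs, triplets, n_out):
    """Forward pass that touches only the stored connections."""
    outputs = [0.0] * n_out
    for i, j, w in triplets:
        outputs[j] += inputs[i] * w
    return outputs


# Dense 2 x 3 weight matrix in which four of six connections are pruned.
matrix = [[0.5, 0.0, 0.0],
          [0.0, 0.0, -1.0]]
triplets = to_triplets(matrix)
result = sparse_forward([2.0, 3.0], triplets, n_out=3)
print(triplets, result)
```

Here two multiplications are performed instead of six, whereas a conventional matrix that stores pruned weights as zeros would still process every entry.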
Additionally, David teaches a neural network that is used to create a sparse neural network, which corresponds to the claimed dense neural network from which the claimed sparse neural network is generated. Song teaches the claimed densified neural network, which is generated from the sparse neural network by adding edges to the nodes of the sparse neural network; the densified neural network thus retains the non-pruned neurons, with connections/edges added to the neurons remaining in the sparse neural network, as set forth above in the rejection of claim 1.
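For illustration only, and not as part of the cited disclosure, the pruning and entry deletion described in David 0034-0035 can be sketched as follows. The sketch assumes that a connection is pruned when the magnitude of its weight does not exceed a threshold; the function names, the `threshold` parameter, and the example weights are hypothetical and do not appear in David.

```python
from typing import List, Tuple

# One stored entry per surviving connection: (from_index, to_index, weight).
Triplet = Tuple[int, int, float]


def prune_to_triplets(dense: List[List[float]], threshold: float = 0.0) -> List[Triplet]:
    """Keep only connections whose |weight| exceeds `threshold` (hypothetical parameter).

    Pruned or absent connections get no entry at all, instead of a stored zero.
    """
    triplets = []
    for i, row in enumerate(dense):
        for j, w in enumerate(row):
            if abs(w) > threshold:
                triplets.append((i, j, w))
    return triplets


def forward(triplets: List[Triplet], inputs: List[float], n_out: int) -> List[float]:
    """Weighted sum per output neuron, visiting only the stored connections."""
    out = [0.0] * n_out
    for i, j, w in triplets:
        out[j] += inputs[i] * w
    return out


# Hypothetical 4x4 layer in which 12 of the 16 weights are already zero.
dense = [
    [0.5, 0.0, -0.2, 0.0],
    [0.0, 0.0, 0.0, 0.3],
    [0.0, 0.7, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
sparse = prune_to_triplets(dense)  # 4 entries are stored instead of 16
```

Under these assumptions the zero-valued edges contribute nothing to the weighted sums, so deleting their entries leaves the layer output unchanged while reducing what is stored.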


Regarding claim 13, the rejection of claim 11 is incorporated; the claim recites a system whose limitations are similar to those of claim 9, and is similarly rejected.

Regarding claim 14, the rejection of claim 11 is incorporated; the claim recites a system whose limitations are similar to those of claim 8, and is similarly rejected.

Regarding claim 15, David teaches: A tangible, non-transitory computer-readable medium having instructions thereon, which, upon being executed by memory and one or more processors, provide for execution of a method (Fig. 6 and [0102] e.g., “The operations of FIG. 6 may be executed by a processor (e.g., one or more processor(s) 556 of FIG. 5) using data stored in a memory (e.g., one or more memory unit(s) 558 of FIG. 5)” [0118] e.g., “instructions, e.g., computer-executable instructions, which, when executed by a processor or controller (e.g., processor 556 of FIG. 5), carry out methods disclosed herein.”) 

The limitations of claim 15 are similar to those recited in claim 1, and claim 15 is similarly rejected.

Regarding claim 16, the rejection of claim 9 is incorporated and David in combination with Song further teaches the method according to claim 9: wherein the identifying and removing disconnected edges includes, in a single iteration, removing a first edge having the zero value range from a first layer of the dense neural network and, based on removing the first edge from the first layer, removing a second edge from a second layer. (David teaches in 0078-0079: Some embodiments of the invention may prune neuron connections using L1 regularization during neural network training in each of one or more iterations (e.g., in addition to weight correcting updates such as backpropagation [claimed wherein the identifying and removing disconnected edges includes, in a single iteration, removing a first edge having the zero value range from a first layer of the dense neural network and, based on removing the first edge from the first layer, removing a second edge from a second layer]). The weights w_ij of the neural network may be updated to weights w'_ij in each training iteration, for example, as follows: … the faster the weights will approach zero, and the larger the portion of the weights that will become absolute zero, representing a disconnection (pruning of the connection) between neurons… In one embodiment, pruning may be performed using L1 regularization with a modification: The moment a weight becomes zero (or changes sign), the weight's memory entry is physically removed or deleted from storage (from the triplet representation table), …; Examiner notes that backpropagation determines the weight values based on the first layer preceding a second layer; and in 0093: A bias unit may "bias" the weights of a neuron during training by adding a constant value to all of the neuron's weights. If a bias value is low enough (e.g., a large magnitude negative value), the bias unit may shift all the neuron's weights to a negative value.
The bias unit may eliminate any output from neuron, e.g., with an activation function such as rectified linear unit (RELU), in which all negative or below threshold values are zeroed out. In effect, this turns the whole neuron off, pruning such a neuron in its entirety from the network, including all of its incoming and outgoing weights [claimed identifying and removing disconnected edges includes, in a single iteration, removing a first edge having the zero value range from a first layer of the dense neural network and, based on removing the first edge from the first layer, removing a second edge from a second layer]. This can be achieved by regularization methods e.g. disclosed herein, but in this case pushing the value to a negative or below threshold target instead of zero.)
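For illustration only, the modified L1 regularization quoted from David 0078-0079 (an entry is deleted the moment its weight becomes zero or changes sign) can be sketched as below. The update formula is elided in the quotation above, so the sketch assumes a simple rule that moves each weight toward zero by a fixed amount; that rule, the `step` parameter, and all names are assumptions made for the example.

```python
from typing import Dict, Tuple

Edge = Tuple[int, int]  # (from_index, to_index), hypothetical key layout


def l1_prune_step(weights: Dict[Edge, float], step: float) -> Dict[Edge, float]:
    """Apply one assumed shrink step and drop entries that hit zero or flip sign."""
    updated = {}
    for edge, w in weights.items():
        sign = (w > 0) - (w < 0)      # sgn(w) as -1, 0, or +1
        new_w = w - sign * step       # assumed update: move toward zero by `step`
        if new_w == 0 or (new_w > 0) != (w > 0):
            continue                  # entry is removed from storage, not set to zero
        updated[edge] = new_w
    return updated


# Hypothetical weights: two small-magnitude edges and two larger ones.
weights = {(0, 0): 0.5, (0, 1): 0.05, (1, 0): -0.3, (1, 1): 0.1}
after = l1_prune_step(weights, step=0.1)
```

In this example the two small-magnitude edges are deleted in a single step, while the remaining two are only reduced in magnitude.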


Claim 2 is rejected under 35 U.S.C. 103 as being unpatentable over David et al. (US Pub. No. 20190108436, hereinafter David) in view of Han et al. (NPL: “Dsd: Dense-sparse-dense training for deep neural networks”, hereinafter ‘Song’) in further view of Zeng et al (NPL: “Compressing and Accelerating Neural Network for Facial Point Localization”, hereinafter ‘Zeng’).

Regarding claim 2, the rejection of claim 1 is incorporated and David in combination with Song further teaches the method according to claim 1: wherein the sparse neural network is formed from a dense neural network by identifying and removing edges of the dense neural network having a zero value range which do not contribute to a final result, and wherein the densified neural network has fewer neurons than the dense neural network. (David teaches in 0102-0104: In operation 600, a processor may generate or receive and store a dense neural network in a memory. The dense neural network may have a majority or above threshold percentage of neurons in adjacent layers connected to each other… the processor may generate the sparse neural network by pruning the weights of the dense neural network of operation 600 [claimed wherein the sparse neural network is formed from a dense neural network by identifying and removing edges of the dense neural network … which do not contribute to a final result], …. The processor may prune the neural network during and/or after a training phase of the neural network. The processor may prune weights using L1 regularization, thresholding, rounding, and/or random zeroing [claimed removing edges of the dense neural network having a zero value range which do not contribute to a final result]. The processor may prune weights randomly, probabilistically, and/or heuristically…), and as depicted in Fig. 6:

    [Image: media_image3.png]

Examiner notes that the claimed fewer neurons are disclosed by the neural network having fewer neurons mapped to non-zero value weights than the densified neural network, in 0032: Neural network 100 is a “dense” neural network, in which a majority or greater than or equal to a threshold percentage of neurons 102 in adjacent layers are connected (e.g., having non-zero connection weights). The threshold may be any percentage in a range of from greater than 50% (majority connected) [claimed wherein the densified neural network has fewer neurons than the dense neural network] to 100% (“fully-connected”), and is typically 90-99% connected. In the example shown in FIG. 1, all neurons 102 in adjacent layers are connected to each other, so neural network 100 is a fully-connected neural network [claimed the dense neural network]. In this example, each pair of adjacent layers of four neurons has 16 possible connections, and with two pairs of adjacent layers, there are 32 neuron connections and associated weights. The sparse neural network is generated by removing a majority of the weights of a dense neural network, in 0034-0035: The dense neural network 100 of FIG. 1 may be transformed to generate the sparse neural network 200 of FIG. 2 by pruning a majority or an above threshold percentage of connections 104 or their associated weights of the dense neural network 100 of FIG. 1. Weights may be pruned by disconnecting previously connected neuron pairs. Additionally, sparse neural network 200 may be trained using methods such as genetic algorithms, genetic programming, reinforcement learning, etc., that evolve the neural network. Sparse neural network 200 may have a hybrid mixture of various types of connections, such as, e.g., locally connections, recurrent connections, skip connections, etc. with a sparse representation… Sparse neural network 200 may be represented by a plurality of weights of connections 204.
In conventional matrices, pruned or omitted weights are set to zero, and treated the same as connected weights, which yields no significant storage or processing benefit to pruning… Accordingly, when two neurons are disconnected (by pruning) or not connected in the first place, data structure 206 simply deletes [dense network with pruned weights and neurons; claimed the dense neural network] or omits an entry for that connection entirely (e.g., no record of a weight or any information is stored for that connection).
David teaches a neural network that is used to create a sparse neural network, which corresponds to the claimed dense neural network from which the claimed sparse neural network is generated. Song teaches the claimed densified neural network, which is generated from the sparse neural network by adding edges to the nodes of the sparse neural network; the densified neural network thus retains the non-pruned neurons, with connections/edges added to the neurons remaining in the sparse neural network, as disclosed above.
Additionally, Zeng teaches generating a densified neural network from a sparse neural network whose neurons have been pruned, so that it has fewer neurons than the original dense neural network, as claimed: wherein the densified neural network has fewer neurons than the dense neural network. (Zeng, as depicted in Fig. 1:

    [Image: media_image4.png]


Sec: Pruning Neurons and Connections
With the pre-trained dense network, a pruning ratio R (R > 1) is used to control the number of neurons [claimed wherein the densified neural network has fewer neurons than the dense neural network] or connections that will be pruned. Concretely, only 1/R of neurons or connections will be preserved… A fairly straightforward approach is to iteratively drop a neuron with minimum prediction error [claimed wherein the densified neural network has fewer neurons than the dense neural network]:… where x is the input, Δy is the error of output, and W and Ŵ are the original weight matrix and the pruned weight matrix, respectively. To build the Ŵ, all matrix columns corresponding to the pruned neurons will be set to zeros…)
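For illustration only, the neuron pruning quoted from Zeng (preserve 1/R of the neurons and zero the weight-matrix columns of the pruned neurons) can be sketched as follows. Zeng's error expression is elided in the quotation, so the sketch substitutes an assumed stand-in: the summed squared activation that a neuron contributes over a few sample inputs. That error measure, the rounding up of the kept count, and all names are assumptions made for the example.

```python
import math
from typing import List, Tuple


def prune_neurons(W: List[List[float]], X: List[List[float]], R: float) -> Tuple[List[List[float]], List[int]]:
    """Zero the columns of W for the neurons that are cheapest to drop.

    W has one row per input and one column per neuron; X holds sample inputs.
    Roughly 1/R of the neurons are preserved (rounded up, an assumption).
    """
    n_in, n_neurons = len(W), len(W[0])
    keep = math.ceil(n_neurons / R)

    def drop_error(j: int) -> float:
        # Assumed stand-in for the prediction error caused by removing neuron j.
        return sum(sum(x[i] * W[i][j] for i in range(n_in)) ** 2 for x in X)

    order = sorted(range(n_neurons), key=drop_error)
    pruned = set(order[: n_neurons - keep])
    W_hat = [[0.0 if j in pruned else W[i][j] for j in range(n_neurons)] for i in range(n_in)]
    return W_hat, sorted(pruned)


# Hypothetical layer with 2 inputs and 4 neurons, pruned with R = 2.
W = [[1.0, 0.1, -2.0, 0.0],
     [0.5, -0.1, 1.0, 0.2]]
X = [[1.0, 1.0], [1.0, -1.0]]
W_hat, pruned = prune_neurons(W, X, R=2)
```

Because each column affects only its own output under this stand-in measure, dropping neurons one at a time in order of minimum error is equivalent to the single sort used here.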
The David, Song, and Zeng references would have been recognized by those of ordinary skill in the art as useful for applicant’s purpose in developing a method for performing neural network operations using dense and sparse neural network operations.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of the prior art for densifying a sparse neural network by pruning neurons as disclosed by Zeng with the method for performing neural network operations using generated dense and sparse neural networks as collectively disclosed by David and Song.
One of ordinary skill in the art would have been motivated to combine the disclosed methods of Zeng, Song and David in order to perform neural network operations using importance-based pruning when training deep neural networks with a large number of parameters (Zeng, Abstract). Doing so provides a method for compressing and accelerating large deep neural network models while maintaining the performance (Zeng, Abstract).


	
Claims 2 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over David et al. (US Pub. No. 20190108436, hereinafter David) in view of Han et al. (NPL: “Dsd: Dense-sparse-dense training for deep neural networks”, hereinafter ‘Song’) in further view of Han et al (NPL: “Learning both Weights and Connections for Efficient Neural Networks”, hereinafter ‘Han’).

Regarding claim 2, the rejection of claim 1 is incorporated and David in combination with Song further teaches the method according to claim 1: wherein the sparse neural network is formed from a dense neural network by identifying and removing edges of the dense neural network having a zero value range which do not contribute to a final result, and wherein the densified neural network has fewer neurons than the dense neural network. (David teaches in 0102-0104: In operation 600, a processor may generate or receive and store a dense neural network in a memory. The dense neural network may have a majority or above threshold percentage of neurons in adjacent layers connected to each other… the processor may generate the sparse neural network by pruning the weights of the dense neural network of operation 600 [claimed wherein the sparse neural network is formed from a dense neural network by identifying and removing edges of the dense neural network … which do not contribute to a final result], …. The processor may prune the neural network during and/or after a training phase of the neural network. The processor may prune weights using L1 regularization, thresholding, rounding, and/or random zeroing [claimed removing edges of the dense neural network having a zero value range which do not contribute to a final result]. The processor may prune weights randomly, probabilistically, and/or heuristically…), and as depicted in Fig. 6:

    [Image: media_image3.png]

Examiner notes that the claimed fewer neurons are disclosed by the neural network having fewer neurons mapped to non-zero value weights than the densified neural network, in 0032: Neural network 100 is a “dense” neural network, in which a majority or greater than or equal to a threshold percentage of neurons 102 in adjacent layers are connected (e.g., having non-zero connection weights). The threshold may be any percentage in a range of from greater than 50% (majority connected) [claimed wherein the densified neural network has fewer neurons than the dense neural network] to 100% (“fully-connected”), and is typically 90-99% connected. In the example shown in FIG. 1, all neurons 102 in adjacent layers are connected to each other, so neural network 100 is a fully-connected neural network [claimed the dense neural network]. In this example, each pair of adjacent layers of four neurons has 16 possible connections, and with two pairs of adjacent layers, there are 32 neuron connections and associated weights. The sparse neural network is generated by removing a majority of the weights of a dense neural network, in 0034-0035: The dense neural network 100 of FIG. 1 may be transformed to generate the sparse neural network 200 of FIG. 2 by pruning a majority or an above threshold percentage of connections 104 or their associated weights of the dense neural network 100 of FIG. 1. Weights may be pruned by disconnecting previously connected neuron pairs. Additionally, sparse neural network 200 may be trained using methods such as genetic algorithms, genetic programming, reinforcement learning, etc., that evolve the neural network. Sparse neural network 200 may have a hybrid mixture of various types of connections, such as, e.g., locally connections, recurrent connections, skip connections, etc. with a sparse representation… Sparse neural network 200 may be represented by a plurality of weights of connections 204.
In conventional matrices, pruned or omitted weights are set to zero, and treated the same as connected weights, which yields no significant storage or processing benefit to pruning… Accordingly, when two neurons are disconnected (by pruning) or not connected in the first place, data structure 206 simply deletes [dense network with pruned weights and neurons; claimed the dense neural network] or omits an entry for that connection entirely (e.g., no record of a weight or any information is stored for that connection).
David teaches a neural network that is used to create a sparse neural network, which corresponds to the claimed dense neural network from which the claimed sparse neural network is generated. Song teaches the claimed densified neural network, which is generated from the sparse neural network by adding edges to the nodes of the sparse neural network; the densified neural network thus retains the non-pruned neurons, with connections/edges added to the neurons remaining in the sparse neural network, as disclosed above.
Additionally, Han teaches generating a densified neural network from a sparse neural network whose neurons have been pruned, so that it has fewer neurons than the original dense neural network, as claimed: wherein the densified neural network has fewer neurons than the dense neural network. (Han, as depicted in Fig. 2:

    [Image: media_image5.png]


Sec. 3.5: Pruning Neurons
After pruning connections, neurons with zero input connections or zero output connections may be safely pruned [claimed wherein the densified neural network has fewer neurons than the dense neural network]. This pruning is furthered by removing all connections to or from a pruned neuron. The retraining phase automatically arrives at the result where dead neurons will have both zero input connections and zero output connections. This occurs due to gradient descent and regularization. A neuron that has zero input connections (or zero output connections) will have no contribution to the final loss, leading the gradient to be zero for its output connection (or input connection), respectively. Only the regularization term will push the weights to zero. Thus, the dead neurons will be automatically removed during retraining.)
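For illustration only, the neuron pruning quoted from Han Sec. 3.5 (a neuron with zero input connections or zero output connections may be pruned, together with all connections to or from it) can be sketched as a graph cleanup. Han describes this outcome as arising during retraining through gradient descent and regularization; the explicit loop below is a simplified stand-in, and the edge representation and all names are hypothetical.

```python
from typing import Iterable, Set, Tuple

Edge = Tuple[str, str]  # (source neuron, destination neuron)


def prune_dead_neurons(edges: Iterable[Edge], hidden: Iterable[str]) -> Tuple[Set[Edge], Set[str]]:
    """Remove hidden neurons lacking inputs or outputs, plus their connections.

    Repeats until stable, since removing one neuron can leave a neuron in the
    next layer without any input connections.
    """
    edges = set(edges)
    hidden = set(hidden)
    removed: Set[str] = set()
    changed = True
    while changed:
        changed = False
        for n in sorted(hidden - removed):
            has_in = any(dst == n for _, dst in edges)
            has_out = any(src == n for src, _ in edges)
            if not (has_in and has_out):
                removed.add(n)
                edges = {(s, d) for s, d in edges if s != n and d != n}
                changed = True
    return edges, removed


# Hypothetical network: h1 lost all of its inputs when connections were pruned.
edges = {("x0", "h0"), ("h0", "y"), ("h1", "h2"), ("h2", "y")}
remaining, removed = prune_dead_neurons(edges, hidden={"h0", "h1", "h2"})
```

In this example, removing h1 also removes its outgoing edge, which in turn leaves h2 in the following layer with no inputs, so h2 and its outgoing edge are removed as well.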
The David, Song, and Han references would have been recognized by those of ordinary skill in the art as useful for applicant’s purpose in developing a method for performing neural network operations using dense and sparse neural network operations.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of the prior art for pruning neural network connections and neurons as disclosed by Han with the method for performing neural network operations using generated dense and sparse neural networks as collectively disclosed by David and Song.
One of ordinary skill in the art would have been motivated to combine the disclosed methods of Han, Song and David in order to implement a method to reduce the storage and computation by training the neural networks to learn only the important connections (Han, Abstract). Doing so provides a method for compressing and accelerating large deep neural network models while maintaining the performance with no loss of accuracy (Han, Abstract).

Regarding claim 17, the rejection of claim 2 is incorporated and David in combination with Song further teaches the method according to claim 2: wherein … of a layer of the dense neural network are removed based on a determination that a corresponding … of a preceding layer has a zero output. (David teaches in 0093: A bias unit may "bias" the weights [claimed input value] of a neuron during training by adding a constant value to all of the neuron's weights. If a bias value [claimed bias value] is low enough (e.g., a large magnitude negative value [claimed locating negative bias values in layers of the dense neural network in which a maximum input value is smaller than the bias values]), the bias unit may shift all the neuron's weights to a negative value. The bias unit may eliminate any output from neuron, e.g., with an activation function such as rectified linear unit (RELU), in which all negative or below threshold values are zeroed out. In effect, this turns the whole neuron off, pruning such a neuron in its entirety from the network, including all of its incoming and outgoing weights [claimed wherein … of a layer of the dense neural network are removed based on a determination that a corresponding … of a preceding layer has a zero output]. This can be achieved by regularization methods e.g. disclosed herein, but in this case pushing the value to a negative or below threshold target instead of zero…; and in 0097: … The unique index may uniquely identify a pair of artificial neurons that have a connection represented by that weight. In one embodiment, each weight may be represented by a triplet defining: (1) a first index value identifying a neuron in a first or "from" layer connected by the weight, (2) a second index value identifying a neuron in a second or "to" layer connected by the weight, and (3) the value of the weight.
By independently indexing the weights, memory 558 may only store entries for connections with non-zero weights (e.g., deleting or omitting entries for disconnections or no connections associated with zero weights)…. Local endpoint device(s) 550 may each include one or more processor(s) 556 for training, and/or executing prediction based on, the weights of the sparse neural network stored in memory 558. During prediction, the neural network is run forward once. During training, a neural network is run twice, once forward to generate an output and once backwards for error correction (e.g., backpropagation [claimed wherein … of a layer of the dense neural network are removed based on a determination that a corresponding … of a preceding layer has a zero output]); and where the removed elements are neuron elements, in 0034-0035: The dense neural network 100 of FIG. 1 may be transformed to generate the sparse neural network 200 of FIG. 2 by pruning a majority or an above threshold percentage of connections 104 or their associated weights of the dense neural network 100 of FIG. 1. Weights may be pruned by disconnecting previously connected neuron pairs. Additionally, sparse neural network 200 may be trained using methods such as genetic algorithms, genetic programming, reinforcement learning, etc., that evolve the neural network. Sparse neural network 200 may have a hybrid mixture of various types of connections, such as, e.g., locally connections, recurrent connections, skip connections, etc. with a sparse representation… Sparse neural network 200 may be represented by a plurality of weights of connections 204.
In conventional matrices, pruned or omitted weights are set to zero, and treated the same as connected weights, which yields no significant storage or processing benefit to pruning… Accordingly, when two neurons are disconnected (by pruning) [claimed wherein neurons of a layer of the dense neural network are removed based on a determination that a corresponding neurons of a preceding layer has a zero output] or not connected in the first place, data structure 206 simply deletes [dense network with pruned weights and neurons; claimed the dense neural network] or omits an entry for that connection entirely (e.g., no record of a weight or any information is stored for that connection [claimed wherein neurons of a layer of the dense neural network are removed based on a determination that a corresponding neurons of a preceding layer has a zero output])).
David discloses not storing information for connections that are set to a zero connection weight, as the neuron output is determined by the product of the connection weight value and the activation value, and discloses pruning neuron elements, in 0093: A bias unit may "bias" the weights of a neuron during training by adding a constant value to all of the neuron's weights. If a bias value is low enough (e.g., a large magnitude negative value), the bias unit may shift all the neuron's weights to a negative value. The bias unit may eliminate any output from neuron, e.g., with an activation function such as rectified linear unit (RELU), in which all negative or below threshold values are zeroed out. In effect, this turns the whole neuron off, pruning such a neuron [claimed wherein neurons of a layer of the dense neural network are removed based on a determination that a corresponding neurons of a preceding layer has a zero output] in its entirety from the network, including all of its incoming and outgoing weights. Additionally, Han teaches wherein neurons of a layer of the dense neural network are removed based on a determination that a corresponding neurons of a preceding layer has a zero output (Han teaches, as depicted in Fig. 2:

    [Image: media_image5.png]


Sec. 3.5: Pruning Neurons
After pruning connections, neurons with zero input connections or zero output connections may be safely pruned [claimed wherein neurons of a layer of the dense neural network are removed based on a determination that a corresponding neurons of a preceding layer has a zero output]. This pruning is furthered by removing all connections to or from a pruned neuron. The retraining phase automatically arrives at the result where dead neurons will have both zero input connections and zero output connections. This occurs due to gradient descent and regularization. A neuron that has zero input connections (or zero output connections) will have no contribution to the final loss, leading the gradient to be zero for its output connection (or input connection), respectively. Only the regularization term will push the weights to zero. Thus, the dead neurons will be automatically removed during retraining.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of David, Song, and Han for the same reasons disclosed above.

Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over David et al. (US Pub. No. 20190108436, hereinafter David), in view of Han et al. (NPL: “Dsd: Dense-sparse-dense training for deep neural networks”, hereinafter ‘Song’), in further view of Davis et al. (NPL: “Low-Rank Approximations for Conditional Feedforward Computation in Deep Neural Networks”, hereinafter ‘Davis’).

Regarding claim 4, the rejection of claim 1 is incorporated and David in combination with Song further teaches the method according to claim 3: wherein the identifying of the edges of the dense neural network having the zero value range further includes locating negative bias values in layers of the dense neural network in which a maximum input value is smaller than the negative bias values and which are followed by a conditional rectifier linear unit (ReLU) expressed as a threshold layer. (David teaches, 0093: A bias unit may "bias" the weights [claimed input value] of a neuron during training by adding a constant value to all of the neuron's weights. If a bias value [claimed negative bias values] is low enough (e.g., a large magnitude negative value [claimed locating negative bias values in layers of the dense neural network in which a maximum input value is smaller than the bias values]), the bias unit may shift all the neuron's weights to a negative value. The bias unit may eliminate any output from neuron, e.g., with an activation function [claimed locating negative bias values in layers of the dense neural network in which a maximum input value is smaller than the negative bias values and which are followed by a conditional rectifier linear unit (ReLU) expressed as a threshold layer] such as rectified linear unit (RELU), in which all negative or below threshold values are zeroed out [claimed wherein the identifying of the edges of the dense neural network having the zero value range further includes locating negative bias values in layers of the dense neural network in which a maximum input value is smaller than the bias values]. In effect, this turns the whole neuron off, pruning such a neuron in its entirety from the network, including all of its incoming and outgoing weights. This can be achieved by regularization methods e.g. disclosed herein, but in this case pushing the value to a negative or below threshold target instead of zero.
Examiner notes that the phrase “conditional rectifier linear unit” is not a term of art, and the disclosed rectified linear unit is considered the claimed conditional rectifier linear unit).
Examiner notes that the activation function is applied successively through the layers of the neural network wherein the rectifier linear unit is an activation function that can take into account. 
David and Song do not expressly teach the rectifier linear unit is an activation function that can take into account a maximum input value.
Davis does expressly teach the rectifier linear unit is an activation function that can take into account a maximum input value. (Davis teaches in Sec. 3.1: Given the activation a_l of layer l of a neural network, the activation a_{l+1} of layer l + 1 is given by: a_{l+1} = σ(a_l W_l) (1), where σ(·) denotes the function defining the neuron’s non-linearity, a_l ∈ R^{n×h_1}, a_{l+1} ∈ R^{n×h_2}, W_l ∈ R^{h_1×h_2}… When σ(·) is the rectified-linear function, σ(x) = max(0, x) [claimed a maximum input value is smaller than the negative bias values] such that all negative elements of the linear transform a_l W_l become zero, one only needs to estimate the sign of the elements of the linear transform in order to predict the zero-valued elements… Given a low-rank approximation W_l ≈ U_l V_l = Ŵ_l, the estimated sign of a_{l+1} is given by sgn(a_{l+1}) ≈ sgn(a_l Ŵ_l) … Each element (a_{l+1})_{i,j} is given by a dot product between the row vector a_l^{(i)} and the column vector W_l^{(j)}. If sgn(a_l Ŵ_l^{(j)}) = −1 [claimed a maximum input value is smaller than the negative bias values], then the true activation (a_{l+1})_{i,j} is likely negative, and will likely become zero after the rectified-linear function is applied [claimed wherein the identifying of the edges of the dense neural network having the zero value range further includes locating negative bias values in layers of the dense neural network in which a maximum input value is smaller than the negative bias values]…)
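For illustration only, the sign estimation quoted from Davis Sec. 3.1 can be sketched as follows: the product of the activation with a low-rank factorization is used to predict which outputs the rectified-linear function will set to zero. The matrices below are hypothetical, and the rank-1 factors are chosen by hand for the example rather than computed by any method described in Davis.

```python
from typing import List

Matrix = List[List[float]]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    """Plain matrix product of two list-of-rows matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def relu(M: Matrix) -> Matrix:
    """Rectified-linear function applied element-wise: max(0, x)."""
    return [[max(0.0, v) for v in row] for row in M]


def predicted_zero_mask(a: Matrix, U: Matrix, V: Matrix) -> List[List[bool]]:
    """True where the low-rank estimate a @ U @ V is negative (predicted zero after ReLU)."""
    estimate = matmul(matmul(a, U), V)
    return [[v < 0 for v in row] for row in estimate]


# Hypothetical layer: W is only approximately equal to the rank-1 product U @ V.
W = [[1.0, -2.0, 0.6],
     [-1.0, 2.1, -0.5]]
U = [[1.0], [-1.0]]
V = [[1.0, -2.0, 0.5]]
a = [[2.0, 1.0]]

mask = predicted_zero_mask(a, U, V)   # sign estimate from the cheaper product
exact = relu(matmul(a, W))            # true activation, for comparison
```

In this example the entry flagged by the mask is also zero in the exact activation, so that dot product could have been skipped; with a poorer approximation the predicted signs may differ from the true ones.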
The David, Song, and Davis references would have been recognized by those of ordinary skill in the art as useful for applicant’s purpose in developing a method for performing neural network operations using dense and sparse neural network operations.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of the prior art for networks using rectified-linear activation functions in hidden neurons for processing deep neural network operations as disclosed by Davis with the method for performing neural network operations using generated dense and sparse neural networks as disclosed by David and Song.
One of ordinary skill in the art would have been motivated to combine the disclosed methods of Davis, Song and David in order to perform neural network operations using activation functions such as rectified-linear units to help process computations in a layer of a neural network (Davis, Sec. 3.1). Doing so helps speed up neural network operations (Davis, Sec. 3.1).

Claim 18 is rejected under 35 U.S.C. 103 as being unpatentable over David et al. (US Pub. No. 20190108436, hereinafter David), in view of Han et al. (NPL: “Dsd: Dense-sparse-dense training for deep neural networks”, hereinafter ‘Song’), in further view of Ankit et al. (NPL: “ TraNNsformer: Neural Network Transformation for Memristive Crossbar based Neuromorphic System Design”,  hereinafter ‘Kit’).

Regarding claim 18, the rejection of claim 1 is incorporated and David in combination with Song further teaches the method according to claim 1, wherein remapping the input and output data onto the densified neural network further comprises remapping the input and output data onto a remaining set of neurons of the densified neural network, (David teaches remapping the input and output as the as learning the weights using backprop until the error is minimized, in 0003-0004: An artificial neural network, or simply "neural network," is a computer model, resembling a biological network of neurons, which is trained by machine learning. A traditional neural network has an input layer, multiple middle or hidden layer(s), and an output layer. Each layer has a plurality ( e.g., 100 s to 1000 s) of artificial "neurons." Each neuron in a layer (N) may be connected by an artificial "synapse" to some or all neurons in a prior (N-1) layer and subsequent (N+l) layer to form a "partially-connected" or "fully-connected" neural network. The strength of each synapse connection  [claimed remapping input and output data onto the densified neural network] is represented by a weight… A neural network (NN) is trained based on a learning dataset to solve or learn a weight of each synapse indicating the strength of that connection. The weights of the synapses are generally initialized, e.g., randomly [claimed densifying the … neural network]… Training may be repeated until the error is minimized or converges [claimed remapping input and output data onto the densified neural network]. Typically multiple passes (e.g., tens or hundreds) through the training set is performed (e.g., each sample is input into the neural network multiple times) [claimed remapping input and output data onto the densified neural network]. 
Each complete pass over the entire training set is referred to as one "epoch"…; and in 0103: In operation 600, a processor may generate [claimed remapping input and output data onto the densified neural network] or receive and store a dense neural network in a memory. The dense neural network may have a majority or above threshold percentage of neurons in adjacent layers connected to each other. An example of a dense neural network is described in reference to FIG. 1.)
which have different memory locations than corresponding neurons in the sparse neural network. (David teaches in 0043: According to embodiments of the invention, weak or near zero filters may be pruned and deleted to avoid their associated convolution operations and speed-up training and/or prediction of CNN 400… In contrast, the new data structure 406 uses a triplet representation with two channel indices (columns 1-2) that uniquely define the input/output channels to which the filter 404 applies and one filter representation (column 3) that defines the filter's weights. Because filters 404 are explicitly indexed in each data entry, the matrix position of the data entries no longer serves as their implicit index, and filters 404 entries may be shuffled [which have different memory locations than corresponding neurons in the sparse neural network], reordered [which have different memory locations than corresponding neurons in the sparse neural network] or deleted with no loss of information… )
David in combination with Song thus teaches remapping a neural network after pruning network elements and reordering the data entry indexes of the pruned elements, which corresponds to the claimed different memory locations.
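For illustration only, the triplet data-entry format quoted from David at 0043 can be sketched as follows. This is a minimal sketch and not David's implementation: the function names (to_triplets, to_dense), the threshold parameter, and the sample weights are all hypothetical. It shows only that explicitly indexed entries may be stored in any order, i.e. at different positions in memory, and still reconstruct the same pruned network.

```python
import random

# Hypothetical illustration of an explicitly indexed (triplet) representation:
# each entry is (input_channel, output_channel, filter_weights).
# All names and values are assumptions made for this sketch.

def to_triplets(dense, threshold=0.0):
    """Keep only filters with at least one weight larger in magnitude than threshold."""
    triplets = []
    for i, row in enumerate(dense):
        for j, filt in enumerate(row):
            if any(abs(w) > threshold for w in filt):
                triplets.append((i, j, tuple(filt)))
    return triplets

def to_dense(triplets, n_in, n_out, filt_len):
    """Rebuild the channel-by-channel matrix; pruned positions become zero filters."""
    dense = [[(0.0,) * filt_len for _ in range(n_out)] for _ in range(n_in)]
    for i, j, filt in triplets:
        dense[i][j] = filt
    return dense

# A 2x2 grid of two-weight filters; two of the four filters are all zero.
dense = [
    [(0.5, -0.2), (0.0, 0.0)],
    [(0.0, 0.0), (1.0, 0.3)],
]

triplets = to_triplets(dense)        # zero filters are dropped (pruned)
shuffled = triplets[:]
random.Random(0).shuffle(shuffled)   # entries may now sit at different positions

# The explicit indices, not the storage position, determine where each filter applies.
restored = to_dense(shuffled, n_in=2, n_out=2, filt_len=2)
```

Because each entry carries its own channel indices, the reconstruction does not depend on the order in which the entries are stored.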
Additionally, Kit teaches reordered indexes for memory associated with processing hardware elements, which correspond to the claimed limitation of remapping the input and output data …which have different memory locations than corresponding neurons in the sparse neural network. (Kit, as depicted in Fig. 2:

[Image: Kit, Fig. 2 (media_image6.png)]

Fig. 2: (a) Logical Flow Diagram of TraNNsformer Framework. The original DNN architecture during training undergoes clustering to form regions that can be mapped onto MCAs with high utilization factors, while pruning the connections that don't contribute to cluster formation [wherein remapping the input and output data onto the densified neural network further comprises remapping the input and output data onto a remaining set of neurons of the densified neural network]. (b) Toy example to illustrate the impact of Network pruning and TraNNsformer on a DNN connectivity matrix. Network pruning leads to irregular sparsity that cannot be mapped directly onto MCAs. TraNNsformer forms smaller clusters that can be mapped onto MCAs [remapping the input and output data …which have different memory locations than corresponding neurons in the sparse neural network.]. Note that 1/0 only represents a connection being present and not the actual value of the weight.
Pg. 535: Left col.: Neural Networks are a class of machine learning algorithms that are comprised of multiple layers of neurons (activations) interconnected with synapses (weights). MLPs are a class of neural networks with fully connected topology i.e. each neuron in a layer receives inputs from all the neurons in the previous layer… Fig. 1(a) shows a two-layered MLP topology being mapped onto an MCA (shown in Fig. 1(b))…. Sec. III: TRANNSFORMER FRAMEWORK
In this section, we discuss in detail about the TraNNsformer framework (shown in Fig. 2(a)), its effect on DNN sparsity (shown in Fig. 2(b)) and the resulting benefits on two types of architectures 1. MCA based architecture and 2. CMOS based general-purpose architecture. Subsection 3.1 describes the Size Constrained Clustering Algorithm (SCIC), which converts a DNN's connectivity structure into a set of high utilization clusters that can be mapped onto MCAs…; and Pg. 537: Left Col.: … Hence, the number of cores (num_core) has a linear dependence on the number of MCAs (num_mca) as shown in eqn. 1 (where "k" is a microarchitecture dependent constant). TraNNsformer enables technology aware optimization to learn an optimally clustered network structure such that a learnt cluster can be mapped onto an MCA with high utilization factor. Consequently, it ensures that the network sparsity efficiently translates to reduction in the number of MCAs required to map the transformed DNN [wherein remapping the input and output data onto the densified neural network further comprises remapping the input and output data onto a remaining set of neurons of the densified neural network] with respect to the original DNN… The energy profile for an MCA based architecture is comprised of MCA energy (includes neuron energy) and peripheral energy components….

Examiner notes that the broadest reasonable interpretation of the densified neural network includes the disclosed network having pruned connections, which can be remapped based on the newly pruned network elements.)

The David, Song, and Kit references would have been recognized by those of ordinary skill in the art as useful for applicant’s purpose of developing a method for performing neural network operations using dense and sparse neural network operations.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of the prior art for remapping hardware memory indexes after pruning actions used in processing deep neural network operations as disclosed by Kit with the method for performing neural network operations using generated dense and sparse neural networks as disclosed by David and Song.
One of ordinary skill in the art would have been motivated to combine the disclosed methods of Song and David with the teachings of Kit in order to use pruning techniques and maximize clustered processing elements (Kit, Abstract). Doing so allows the mapping of a given deep neural network to a processing element of any size using a maximized topology without loss in the reliability of operations (Kit, Abstract).




Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

The prior art made of record and not relied upon that is considered pertinent to applicant's disclosure is listed below:
Lau et al. (NPL: “Review of Adaptive Activation Function in Deep Neural Network”) teaches the model of neural networks as having the activation function in the neurons that make up the neuron layers of the neural network. 
Suo Qiu et al. (NPL: “FReLU: Flexible Rectified Linear Units for Improving Convolutional Neural Networks”): teaches the definition of a rectified linear unit as a conditional based activation function used to process bias values as input values to the layer comprising the activation function for processing the input values.
Alberto Marchisio (NPL: “Optimizing Deep Neural Networks on GPUs”): teaches the use of rectified linear units to help optimize pruning operations for parallel computations using graphics processing units in implementing deep neural networks.
Rzayev et al. (NPL: “DeepRecon: Dynamically Reconfigurable Architecture for Accelerating Deep Neural Networks”): teaches reconfigurable architecture for storage resources for accelerating deep learning techniques.
Lin et al. (NPL: “Accelerating Convolutional Networks via Global & Dynamic Filter Pruning”): teaches updating the memory mapping of the neurons and weights for accelerating convolutional networks.
Schwartz et al. (US 20180314926 A1): teaches memory handling and data management in machine learning.
Li et al. (US 20190050734 A1): teaches compression method for neural networks (e.g. LSTM), which may effectively shorten the training period of a neural network by combining pruning operation into the training process.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to OLUWATOSIN ALABI whose telephone number is (571)272-0516. The examiner can normally be reached Monday-Friday, 8:00am-5:00pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Michael Huntley, can be reached on (303) 297-4307. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/O.O.A./Examiner, Art Unit 2129
/MICHAEL J HUNTLEY/Supervisory Patent Examiner, Art Unit 2129