DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant’s remarks and arguments filed 02/15/2022 (pgs. 5-10) have been fully considered.
Regarding Applicant’s remarks on the rejection of claims under 35 U.S.C. §§ 102 and 103, the arguments are directed to matter addressed by newly cited prior art. Applicant’s arguments with respect to the claims under 35 U.S.C. §§ 102 and 103 in the previous Office action have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument. See the updated rejection in the current Office action below.

Claim Rejections - 35 USC § 112(a): Written Description Requirement
The following is a quotation of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claims 1-17 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claims contain subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.

Regarding independent claim 1, the limitation “densifying the sparse neural network” recites subject matter which was not described in the specification; thus, the original disclosure does not satisfy the written description requirement.
First, the specification re-recites the claim language and the benefits associated with the use of sparse and dense neural networks. Second, the portions of the specification cited in the remarks filed 02/15/2022 are devoid of the algorithm or steps/procedure for performing the claimed function (i.e., densifying the sparse neural network) in sufficient detail to allow one of ordinary skill in the art to understand how the inventor intended the function to be performed. See MPEP §§ 2163.02 and 2181, subsection IV. Similarly, Figures 4 and 5 disclose a remapping process but contain no description of how the claimed step of “densifying the sparse neural network” is performed. The applicant’s algorithm for the function labeled ‘Dense’ in paragraphs 0045 - 0040 of the original specification filed 08/16/2019 discloses the use of matrix multiplication iterations for removing information from a neural network. Finally, Applicant’s remarks appear to suggest that the intended meaning of the claimed densification of a sparse neural network is a memory remapping process; however, this does not clarify how the claimed function is performed. Thus, the original specification does not appear to provide sufficient detail for the functional language recited in the claimed invention.
The algorithm or steps/procedure for performing the claimed computer function are not explained in sufficient detail. Therefore, the original disclosure does not satisfy the written description requirement.
Regarding independent claims 11 and 15, the claims recite limitations similar to those of claim 1 and are therefore rejected under the same rationale.

Regarding claims 2-10 and 16-17, which depend from claim 1, the claims do not resolve the noted issue and are therefore also rejected.
Regarding claims 12-14, which depend from claim 11, the claims do not resolve the noted issue and are therefore also rejected.

Claim Rejections - 35 USC § 112(b)
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 1-17 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.

Claim 1 recites the limitation "the densified neural network," which renders the claim indefinite because there is insufficient antecedent basis for this limitation in the claim. Alternatively, the limitation "the densified neural network" could refer back to the dense neural network recited in the preamble for performing the dense neural network computations with a sparse neural network, to the dense neural network that results from densifying the sparse neural network, or to something else. The intended scope of the limitation is unclear, and thus the claim is indefinite. The examiner interprets any dense neural network or densified sparse neural network as within the scope of the claim limitation.
Regarding claims 2-10 and 16-17, which depend from claim 1, the claims do not resolve the noted issue and are therefore also rejected.

Regarding claim 2, the limitation “zero value range” renders the claim indefinite. It is unclear whether the scope requires a zero value, a range of values close to zero, or some combination of a value and a range. Because the intended scope of the recited phrase is unclear, the claim is indefinite. The examiner interprets any value less than 1 as within the scope of the claim limitation.
Regarding claims 3-5, 12, and 16, the claims recite the limitation “zero value range” similarly to claim 2 and are rejected under the same rationale.
Regarding claim 4, the claim recites the limitation “the bias value,” which renders the claim indefinite because there is insufficient antecedent basis for this limitation in the claim.

Regarding claims 6-7 and 17, the claims do not resolve the noted deficiency in the limitation of claim 2 and are therefore also rejected.

Regarding claim 6, the limitation “value range” renders the claim indefinite because it is unclear whether the intended scope is a single value, a range, or some combination of both. The examiner interprets any value as the claimed “value range”.

Regarding claim 7, the limitation “value ranges in a threshold layer” renders the claim indefinite because its intended scope is unclear. The term “threshold layer” is not a term of art, and the claim does not make clear how a “threshold layer” is identified. As to the phrase “value ranges,” it is unclear whether a single value is captured, a range of values is captured, or some combination of both. The intended scope of the claimed limitation is unclear, and thus the claim is indefinite. The examiner interprets any layer as the claimed threshold layer and any value below 1 as within the scope of the claimed “value ranges”.

Regarding claim 8, the claim recites the limitation “the densified neural network” noted above with respect to claim 1 and is rejected under the same rationale.

Regarding claim 16, the limitation “the zero value range” renders the claim indefinite because there is insufficient antecedent basis for this limitation in the claim. There is no recitation of “a zero value range” in claim 16 or in claims 9 and 1, from which claim 16 depends.

Regarding claims 11 and 15, the claims recite the limitation “the densified neural network,” which renders the claims indefinite because there is insufficient antecedent basis for this limitation. Specifically, it is unclear whether it refers back to the recited densified sparse neural network, the recited dense neural network, or something else. The examiner notes that any dense neural network reads on the claim limitation.

Regarding dependent claims 12-13 (which depend from claim 11), the claims do not resolve the noted deficiencies, and the examiner notes that some of these claims also recite the noted limitations; therefore, the dependent claims are rejected under the same rationale as claim 11.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-3 and 5-17 are rejected under 35 U.S.C. 103 as being unpatentable over David et al. (US Pub. No. 20190108436, hereinafter David) in view of Han et al. (NPL: “Dsd: Dense-sparse-dense training for deep neural networks”, hereinafter Song).

Regarding claim 1, David teaches: A method of memory remapping for utilizing dense neural network computations with a sparse neural network, the method comprising:  
densifying the … neural network; (David teaches training a neural network to learn all connection (synapse) weights as the claimed densification of a neural network, in 0003-0004: An artificial neural network, or simply "neural network," is a computer model, resembling a biological network of neurons, which is trained by machine learning. A traditional neural network has an input layer, multiple middle or hidden layer(s), and an output layer. Each layer has a plurality (e.g., 100s to 1000s) of artificial "neurons." Each neuron in a layer (N) may be connected by an artificial "synapse" to some or all neurons in a prior (N-1) layer and subsequent (N+1) layer to form a "partially-connected" or "fully-connected" neural network. The strength of each synapse connection is represented by a weight… A neural network (NN) is trained based on a learning dataset to solve or learn a weight of each synapse indicating the strength of that connection. The weights of the synapses are generally initialized, e.g., randomly [claimed densifying the sparse neural network]… Training may be repeated until the error is minimized or converges. Typically multiple passes (e.g., tens or hundreds) through the training set is performed (e.g., each sample is input into the neural network multiple times). Each complete pass over the entire training set is referred to as one "epoch"; and in 0103: In operation 600, a processor may generate [claimed densifying the sparse neural network] or receive and store a dense neural network in a memory. The dense neural network may have a majority or above threshold percentage of neurons in adjacent layers connected to each other. An example of a dense neural network is described in reference to FIG. 1.)
remapping input and output data onto the densified neural network; (David teaches the claimed remapping of input and output data as learning the weights using backpropagation until the error is minimized, in 0003-0004: An artificial neural network, or simply "neural network," is a computer model, resembling a biological network of neurons, which is trained by machine learning. A traditional neural network has an input layer, multiple middle or hidden layer(s), and an output layer. Each layer has a plurality (e.g., 100s to 1000s) of artificial "neurons." Each neuron in a layer (N) may be connected by an artificial "synapse" to some or all neurons in a prior (N-1) layer and subsequent (N+1) layer to form a "partially-connected" or "fully-connected" neural network. The strength of each synapse connection is represented by a weight… A neural network (NN) is trained based on a learning dataset to solve or learn a weight of each synapse indicating the strength of that connection. The weights of the synapses are generally initialized, e.g., randomly [claimed densifying the sparse neural network]… Training may be repeated until the error is minimized or converges [claimed remapping input and output data onto the densified neural network]. Typically multiple passes (e.g., tens or hundreds) through the training set is performed (e.g., each sample is input into the neural network multiple times) [claimed remapping input and output data onto the densified neural network]. Each complete pass over the entire training set is referred to as one "epoch"…; and in 0103: In operation 600, a processor may generate [claimed remapping input and output data onto the densified neural network] or receive and store a dense neural network in a memory. The dense neural network may have a majority or above threshold percentage of neurons in adjacent layers connected to each other. An example of a dense neural network is described in reference to FIG. 1.)
utilizing the dense neural network computations for a prediction using the remapped input and output data (David teaches generating the claimed dense neural network, as noted above in 0103 and 0003-0004, and the use of the generated dense neural network depicted in Fig. 1 for making predictions, in 0005: State-of-the-art neural networks typically have between millions and billions of weights, and as a result require specialized hardware (usually a GPU) for both training and runtime (prediction [claimed utilizing the dense neural network computations for a prediction using the remapped input and output data]) phases. It is thereby impractical to run deep learning models, even in prediction mode, on most endpoint devices (e.g., IoT devices, mobile devices, or even laptops and desktops without dedicated accelerator hardware); and in 0025: All conventional neural networks today represent the weights connecting one layer to another as a dense matrix. For example, in order to store the weights connecting two layers of sizes 10 and 20 neurons, and assuming the network is fully connected… This representation is useful for forward [claimed utilizing the dense neural network computations for a prediction using the remapped input and output data] and backward propagation of activations as well, e.g., given an input of 10 values in the above example, the output values of the 20 neurons in the subsequent layer could be calculated by multiplying the vector of values (size=10) by the matrix of weights (size=10x20), and obtaining the output vector (size=20) [claimed utilizing the dense neural network computations for a prediction using the remapped input and output data]…).
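For illustration, the dense-matrix computation described in David at 0025 can be sketched in Python. In that passage, an input of 10 values is multiplied by a 10x20 weight matrix to obtain 20 output values. The function name dense_forward and the list-of-rows weight layout are assumptions made for this sketch and are not taken from David.

```python
import random


def dense_forward(x, W):
    """Multiply an input vector of n values by an n x m dense weight matrix.

    W is assumed to be stored as a list of n rows, each holding m weights.
    Returns the m output values of the subsequent layer.
    """
    if len(W) != len(x):
        raise ValueError("W must have one row per input value")
    m = len(W[0])
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(m)]


# Sizes from David's example: 10 input values, a 10x20 weight matrix, 20 outputs.
random.seed(0)
x = [random.uniform(-1.0, 1.0) for _ in range(10)]
W = [[random.uniform(-1.0, 1.0) for _ in range(20)] for _ in range(10)]
y = dense_forward(x, W)
print(len(y))              # 20
print(len(W) * len(W[0]))  # 200 weights stored for a fully connected 10x20 layer
```

Every weight is stored and multiplied, including zeros. That is why the dense representation serves both forward prediction and backward propagation.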
	While David teaches generating a dense neural network stored in memory, as noted above, David does not expressly teach generating a dense neural network from a sparse neural network, i.e., the claimed densifying … sparse neural network.
Song expressly teaches generating a dense neural network from a sparse neural network, i.e., the claimed densifying … sparse neural network. (Song teaches, as depicted in Fig. 1:

[Song, Fig. 1: image not reproduced]

On pg. 3, Sec. “Final Dense Training”: … The un-pruned network parameters adjust themselves during the retraining phase, so in (c), the boundary becomes soft and forms a bimodal distribution. In (d), at the beginning of the re-dense training step [claimed densifying … sparse neural network], all the pruned weights come back again and are reinitialized to zero. Finally, in (e), the pruned weights are retrained together with the un-pruned weights. In this step, we kept the same learning hyper-parameters (weight decay, learning rate, etc.) for pruned weights and un-pruned weights…)
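Song's dense-sparse-dense flow can be illustrated with a short Python sketch. In that flow, weights are pruned, the surviving weights are retrained, and the pruned weights then come back reinitialized to zero and are retrained with the rest. This is not Song's implementation. Magnitude pruning by a sparsity ratio, the flat weight list, and the helper names prune, masked_sgd_step, and re_dense are assumptions made for illustration.

```python
def prune(weights, sparsity):
    """Sparse step (assumed magnitude pruning): zero the smallest-|w| fraction.

    Returns the pruned weights and a 0/1 mask marking the surviving weights.
    """
    k = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:k])
    mask = [0 if i in pruned else 1 for i in range(len(weights))]
    return [w if m else 0.0 for w, m in zip(weights, mask)], mask


def masked_sgd_step(weights, grads, mask, lr=0.1):
    """One gradient step that updates only weights whose mask entry is 1."""
    return [w - lr * g if m else w for w, g, m in zip(weights, grads, mask)]


def re_dense(weights, mask):
    """Re-dense step: pruned weights return initialized to zero and the mask
    becomes all ones, so every weight is trained in subsequent steps."""
    restored = [w if m else 0.0 for w, m in zip(weights, mask)]
    return restored, [1] * len(weights)


w = [0.5, -0.05, 0.8, 0.01]
w_sparse, mask = prune(w, sparsity=0.5)
print(mask)  # [1, 0, 1, 0]
w_sparse = masked_sgd_step(w_sparse, [0.5, 0.5, 0.5, 0.5], mask)  # retrain survivors only
w_dense, dense_mask = re_dense(w_sparse, mask)
print(dense_mask)                # [1, 1, 1, 1]
print([w_dense[1], w_dense[3]])  # [0.0, 0.0] (pruned weights restored at zero)
```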

The David and Song references would have been recognized by those of ordinary skill in the art as useful for Applicant’s purpose of developing a method for performing neural network operations using dense and sparse neural network operations.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of the prior art for densifying a sparse neural network by adding weight data, as disclosed by Song, with the method for performing neural network operations using generated dense and sparse neural networks, as disclosed by David.
One of ordinary skill in the art would have been motivated to combine the disclosed methods of Song and David in order to perform neural network operations using re-dense operations for densifying a sparse neural network when training deep neural networks with a large number of parameters (Song, Abstract). Doing so provides a training flow for regularizing deep neural networks and achieving better optimization performance (Song, Abstract).

	
Regarding claim 2, the rejection of claim 1 is incorporated and David in combination with Song further teaches the method according to claim 1: wherein the sparse neural network is formed from a dense neural network by identifying and removing edges of the dense neural network having a zero value range which do not contribute to a final result (David teaches in 0102-0104: In operation 600, a processor may generate or receive and store a dense neural network in a memory. The dense neural network may have a majority or above threshold percentage of neurons in adjacent layers connected to each other… the processor may generate the sparse neural network by pruning the weights of the dense neural network of operation 600 [claimed wherein the sparse neural network is formed from a dense neural network by identifying and removing edges of the dense neural network … which do not contribute to a final result]…. The processor may prune the neural network during and/or after a training phase of the neural network. The processor may prune weights using L1 regularization, thresholding, rounding, and/or random zeroing [claimed removing edges of the dense neural network having a zero value range which do not contribute to a final result]. The processor may prune weights randomly, probabilistically, and/or heuristically…; and as depicted in Fig. 6:

[David, Fig. 6: image not reproduced]

).

Regarding claim 3, the rejection of claim 2 is incorporated and David in combination with Song further teaches the method according to claim 2: wherein the identifying of the edges of the dense neural network having the zero value range includes locating multiplication operations with a zero weight in layers of the dense neural network (David teaches in 0029: Some embodiments may generate a sparse convolutional neural network (CNN). A CNN is represented by a plurality of filters that connect a channel of an input layer to a channel of a convolutional layer. The filter scans the input channel, operating on each progressive region of neurons (e.g., representing a NxN pixel image region), and maps the convolution [claimed multiplication operations] or other transformation of each region to a single neuron in the convolution channel. By connecting entire regions of multiple neurons to each single convolution neuron, filters form synapses having a many-to-one neuron connection, which reduces the number of synapses in CNNs…. Some embodiments may generate a sparse CNN by pruning or zeroing entire filters that have all zero or near zero weights representing weak convolutional relationships between channels [claimed wherein the identifying of the edges of the dense neural network having the zero value range includes locating multiplication operations with a zero weight in layers of the dense neural network]. A new CNN indexing is used that independently and uniquely identifies each filter in the CNN so that pruned filters are not stored, reducing convolution operations and memory usage; and teaches the convolution as a plurality of multiplication operations, in 0114: … In a regular img2col function, two custom matrices are constructed to represent every convolutional operation performed by a layer, such that each row and column multiplication represents a convolutional operation [claimed multiplication operations]. Embodiments of the invention may provide a modified img2col function, in which some of the kernels are zeroed out, and the associated matrices can be modified to omit or delete these rows and columns [claimed wherein the identifying of the edges of the dense neural network having the zero value range includes locating multiplication operations with a zero weight in layers of the dense neural network]. This results in more compact matrices associated with fewer multiplication operations to achieve the same convolutional results, compared to standard img2col operations…).

Regarding claim 5, the rejection of claim 3 is incorporated and David in combination with Song further teaches the method according to claim 3: wherein the identifying of the edges of the dense neural network having the zero value range further includes locating negative weight values in layers which are followed by a rectifier linear unit (David teaches in 0093: A bias unit may "bias" the weights of a neuron during training by adding a constant value to all of the neuron's weights. If a bias value is low enough (e.g., a large magnitude negative value), the bias unit may shift all the neuron's weights to a negative value [claimed wherein the identifying of the edges of the dense neural network having the zero value range further includes locating negative weight values in layers which are followed by a rectifier linear unit]. The bias unit may eliminate any output from neuron, e.g., with an activation function such as rectified linear unit (RELU) [claimed wherein the identifying of the edges of the dense neural network having the zero value range further includes locating negative weight values in layers which are followed by a rectifier linear unit], in which all negative or below threshold values are zeroed out…).
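The mechanism quoted from David at 0093 can be illustrated with the following Python sketch. There, a bias unit shifts all of a neuron's weights to negative values, so that a rectified linear unit zeroes the neuron's output. The sketch assumes the neuron's inputs are non-negative (e.g., outputs of a preceding ReLU layer); with negative inputs, negative weights alone would not guarantee a zero output. The function names are hypothetical.

```python
def relu(z):
    """Rectified linear unit: negative values are zeroed out."""
    return max(0.0, z)


def neuron_output(inputs, weights):
    """ReLU applied to the weighted sum of a neuron's inputs."""
    return relu(sum(x * w for x, w in zip(inputs, weights)))


def shift_weights(weights, bias_shift):
    """Add the same constant to every weight of the neuron, as the quoted
    passage describes the bias unit doing."""
    return [w + bias_shift for w in weights]


# Assumed non-negative inputs, e.g. outputs of a previous ReLU layer.
inputs = [1.0, 0.5, 2.0]
weights = [0.5, -0.25, 0.75]
print(neuron_output(inputs, weights))   # 1.875
shifted = shift_weights(weights, -5.0)  # every weight is now negative
print(neuron_output(inputs, shifted))   # 0.0 (the neuron contributes nothing)
```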

Regarding claim 6, the rejection of claim 2 is incorporated and David in combination with Song further teaches the method according to claim 2: wherein the sparse neural network is further formed by removing edges which have a value range which is less than a predetermined threshold. (David teaches in 0082: Weights and their entries may be physically deleted when the weight, though not zero, is below a near zero threshold: 
[David, ¶0082: threshold equation image not reproduced]
  [claimed wherein the sparse neural network is further formed by removing edges which have a value range which is less than a predetermined threshold]… Example thresholds include, but are not limited to, 0.1, 0.001, 0.0001, 0.00001, etc.)
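The threshold deletion described in David at 0082 can be sketched in Python as below. David's equation image is not reproduced in this action, so the sketch assumes the test compares the weight magnitude |w| against the threshold. The (row, column, weight) storage follows the "triplet representation table" referenced in David at 0078-0079, and the function name prune_to_triplets is hypothetical.

```python
def prune_to_triplets(W, threshold):
    """Return (row, col, weight) triplets only for weights with |w| >= threshold.

    Weights below the threshold, including exact zeros, are dropped and not
    stored at all (assumed magnitude test).
    """
    return [
        (i, j, w)
        for i, row in enumerate(W)
        for j, w in enumerate(row)
        if abs(w) >= threshold
    ]


W = [[0.5, 0.00005],
     [-0.0002, 0.0]]
print(prune_to_triplets(W, 0.001))   # [(0, 0, 0.5)]
print(prune_to_triplets(W, 0.0001))  # [(0, 0, 0.5), (1, 0, -0.0002)]
```

The two calls use two of the example thresholds listed in the quoted passage. A smaller threshold keeps more near-zero weights.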

Regarding claim 7, the rejection of claim 2 is incorporated and David in combination with Song further teaches the method according to claim 2: further comprising determining whether value ranges in a threshold layer are always less than or always greater than a predetermined threshold, removing computations prior to the threshold layer, and using either a first value or a second value for computations following the threshold layer depending on the determination of whether the value ranges in the threshold layer are always less than or always greater than the predetermined threshold. (David teaches the claimed threshold layer as the j layer for which the weight is the first or second value, and the disclosed threshold as the claimed predetermined threshold, in 0082: Weights and their entries may be physically deleted when the weight, though not zero, is below a near zero threshold: 
[David, ¶0082: threshold equation image not reproduced]
 [claimed determining whether value ranges in a threshold layer are always less than or always greater than a predetermined threshold, removing computations prior to the threshold layer, and using either a first value or a second value for computations following the threshold layer depending on the determination of whether the value ranges in the threshold layer are always less than or always greater than the predetermined threshold] …; and, alternatively, David teaches the claimed threshold layer as eliminating the prior edges associated with pruned weights, in 0093: … The bias unit may eliminate any output from neuron, e.g., with an activation function such as rectified linear unit (RELU), in which all negative or below threshold values are zeroed out. In effect, this turns the whole neuron off, pruning such a neuron in its entirety from the network, including all of its incoming and outgoing weights. This can be achieved by regularization methods e.g. disclosed herein, but in this case pushing the value to a negative or below threshold target instead of zero [claimed determining whether value ranges in a threshold layer are always less than or always greater than a predetermined threshold, removing computations prior to the threshold layer, and using either a first value or a second value for computations following the threshold layer depending on the determination of whether the value ranges in the threshold layer are always less than or always greater than the predetermined threshold]….)

Regarding claim 8, the rejection of claim 1 is incorporated and David in combination with Song further teaches the method according to claim 1: further comprising generating code for instructing a processor or hardware layout to utilize the dense neural network computations based on the densified neural network. (David teaches in 0096: Remote server 510 may have a memory 515 for storing a neural network and a processor 516 for training and/or predicting based on the neural network [claimed further comprising generating code for instructing a processor or hardware layout to utilize the dense neural network computations based on the densified neural network]. Remote server 510 may prune a dense neural network (e.g., 100 of FIG. 1) to generate a sparse neural network (e.g., 200 of FIG. 1), or may initially generate or receive a sparse neural network. In some embodiments, remote server 510 may have specialized hardware including a large memory 515 for storing a neural network and a specialized processor 516 (e.g., a GPU), for example, when a dense neural network is used [claimed further comprising generating code for instructing a processor or hardware layout to utilize the dense neural network computations based on the densified neural network]. Memory 515 may store data 517 including a training dataset and data representing a plurality of weights of the neural network. Data 517 may also include code (e.g., software code) or logic, e.g., to enable storage and retrieval of data 517 according to embodiments of the invention.)

Regarding claim 9, the rejection of claim 1 is incorporated and David in combination with Song further teaches the method according to claim 1: wherein the sparse network is formed from an initial sparse or dense neural network using an iterative process of identifying and removing disconnected edges of the initial sparse or dense neural network which do not contribute to a final result. (David teaches forming a sparse neural network from a dense neural network, as depicted in Fig. 6; and pruning (i.e., removing) the claimed edges that do not contribute to the final result, in 0096: Remote server 510 may have a memory 515 for storing a neural network and a processor 516 for training and/or predicting based on the neural network. Remote server 510 may prune a dense neural network (e.g., 100 of FIG. 1) [claimed wherein the sparse network is formed from an initial sparse or dense neural network using an iterative process of identifying and removing disconnected edges of the initial sparse or dense neural network which do not contribute to a final result] to generate a sparse neural network (e.g., 200 of FIG. 1), or may initially generate or receive a sparse neural network. In some embodiments, remote server 510 may have specialized hardware including a large memory 515 for storing a neural network and a specialized processor 516 (e.g., a GPU), for example, when a dense neural network is used. Memory 515 may store data 517 including a training dataset and data representing a plurality of weights of the neural network…; and in 0029: …Some embodiments may generate a sparse CNN by pruning or zeroing entire filters that have all zero or near zero weights representing weak convolutional relationships between channels [claimed disconnected edges of the initial sparse or dense neural network which do not contribute to a final result]…).

Regarding claim 10, the rejection of claim 9 is incorporated and David in combination with Song further teaches the method according to claim 9: wherein the iterative process goes from an output layer toward in input layer. (David teaches in 0078-0079: Some embodiments of the invention may prune neuron connections using L1 regularization during neural network training in each of one or more iterations (e.g., in addition to weight correcting updates such as backpropagation [claimed wherein the iterative process goes from an output layer toward in input layer]). The weights w_ij of the neural network may be updated to weights w'_ij in each training iteration, for example, as follows: … the faster the weights will approach zero, and the larger the portion of the weights that will become absolute zero, representing a disconnection (pruning of the connection) between neurons… In one embodiment, pruning may be performed using L1 regularization with a modification: The moment a weight becomes zero (or changes sign), the weight's memory entry is physically removed or deleted from storage (from the triplet representation table), …)
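The L1-regularized pruning quoted from David at 0078-0079 can be sketched in Python as below. In that passage, weights are driven toward zero during training, and a weight's entry is deleted from the triplet representation table the moment it becomes zero or changes sign. David's update formula is elided ("…") in the quoted passage, so the sketch assumes a standard L1 subgradient step of size lr * lam toward zero. The names l1_prune_step, lr, and lam are hypothetical.

```python
def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def l1_prune_step(table, grads, lr, lam):
    """One training iteration over a sparse {(i, j): weight} table.

    Each weight takes a gradient step plus an assumed L1 step of size
    lr * lam toward zero; an entry is deleted as soon as it reaches zero
    or changes sign.
    """
    updated = {}
    for key, w in table.items():
        w_new = w - lr * grads.get(key, 0.0) - lr * lam * sign(w)
        if w_new == 0.0 or sign(w_new) != sign(w):
            continue  # entry physically removed from the table
        updated[key] = w_new
    return updated


table = {(0, 0): 0.5, (0, 1): 0.25, (1, 0): -0.25}
table = l1_prune_step(table, grads={}, lr=1.0, lam=0.375)
print(table)  # {(0, 0): 0.125}
```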

Regarding claim 11, David teaches: A system for memory remapping to transform a sparse neural network into a dense neural network, the system comprising memory and one or more processors which, alone or in combination, are configured to provide for execution of a method (Fig. 6 and [0102] e.g., “The operations of FIG. 6 may be executed by a processor (e.g., one or more processor(s) 556 of FIG. 5) using data stored in a memory (e.g., one or more memory unit(s) 558 of FIG. 5)”) comprising: 
The remaining limitations of claim 11 are similar to those recited in claim 1 and are rejected under the same rationale.

Regarding claim 12, the rejection of claim 11 is incorporated; claim 12 recites the system corresponding to the method of claim 2 and is similarly rejected.

Regarding claim 13, the rejection of claim 11 is incorporated; claim 13 recites the system corresponding to the method of claim 9 and is similarly rejected.

Regarding claim 14, the rejection of claim 11 is incorporated; claim 14 recites the system corresponding to the method of claim 8 and is similarly rejected.

Regarding claim 15, David teaches: A tangible, non-transitory computer-readable medium having instructions thereon, which, upon being executed by memory and one or more processors, provide for execution of a method (Fig. 6 and [0102] e.g., “The operations of FIG. 6 may be executed by a processor (e.g., one or more processor(s) 556 of FIG. 5) using data stored in a memory (e.g., one or more memory unit(s) 558 of FIG. 5)” [0118] e.g., “instructions, e.g., computer-executable instructions, which, when executed by a processor or controller (e.g., processor 556 of FIG. 5), carry out methods disclosed herein.”) 
The limitations of claim 15 are similar to those recited in claim 1, and claim 15 is similarly rejected.

Regarding claim 16, the rejection of claim 9 is incorporated and David in combination with Song further teaches the method according to claim 9: wherein the identifying and removing disconnected edges includes, in a single iteration, removing a first edge having the zero value range from a first layer of the dense neural network and, based on removing the first edge from the first layer, removing a second edge from a second layer. (David teaches in 0078-0079: Some embodiments of the invention may prune neuron connections using L1 regularization during neural network training in each of one or more iterations (e.g., in addition to weight correcting updates such as backpropagation [claimed wherein the identifying and removing disconnected edges includes, in a single iteration, removing a first edge having the zero value range from a first layer of the dense neural network and, based on removing the first edge from the first layer, removing a second edge from a second layer]). The weights $w_{ij}$ of the neural network may be updated to weights $w'_{ij}$ in each training iteration, for example, as follows: … the faster the weights will approach zero, and the larger the portion of the weights that will become absolute zero, representing a disconnection (pruning of the connection) between neurons… In one embodiment, pruning may be performed using L1 regularization with a modification: The moment a weight becomes zero (or changes sign), the weight's memory entry is physically removed or deleted from storage (from the triplet representation table), …; Examiner notes that backpropagation determines the weight values based on the first layer preceding a second layer; And in 0093: A bias unit may "bias" the weights of a neuron during training by adding a constant value to all of the neuron's weights. If a bias value is low enough (e.g., a large magnitude negative value), the bias unit may shift all the neuron's weights to a negative value. The bias unit may eliminate any output from neuron, e.g., with an activation function such as rectified linear unit (RELU), in which all negative or below threshold values are zeroed out. In effect, this turns the whole neuron off, pruning such a neuron in its entirety from the network, including all of its incoming and outgoing weights [claimed identifying and removing disconnected edges includes, in a single iteration, removing a first edge having the zero value range from a first layer of the dense neural network and, based on removing the first edge from the first layer, removing a second edge from a second layer]. This can be achieved by regularization methods e.g. disclosed herein, but in this case pushing the value to a negative or below threshold target instead of zero.)

Regarding claim 17, the rejection of claim 2 is incorporated and David in combination with Song further teaches the method according to claim 2: wherein elements of a layer of the dense neural network are removed based on a determination that a corresponding element of a preceding layer has a zero output. (David teaches in 0093: A bias unit may "bias" the weights [claimed input value] of a neuron during training by adding a constant value to all of the neuron's weights. If a bias value [claimed bias value] is low enough (e.g., a large magnitude negative value [claimed locating negative bias values in layers of the dense neural network in which a maximum input value is smaller than the bias values]), the bias unit may shift all the neuron's weights to a negative value. The bias unit may eliminate any output from neuron, e.g., with an activation function such as rectified linear unit (RELU), in which all negative or below threshold values are zeroed out. In effect, this turns the whole neuron off, pruning such a neuron in its entirety from the network, including all of its incoming and outgoing weights [claimed wherein elements of a layer of the dense neural network are removed based on a determination that a corresponding element of a preceding layer has a zero output]. This can be achieved by regularization methods e.g. disclosed herein, but in this case pushing the value to a negative or below threshold target instead of zero…; And in 0097: … The unique index may uniquely identify a pair of artificial neurons that have a connection represented by that weight. In one embodiment, each weight may be represented by a triplet defining: (1) a first index value identifying a neuron in a first or "from" layer connected by the weight, (2) a second index value identifying a neuron in a second or "to" layer connected by the weight, and (3) the value of the weight. By independently indexing the weights, memory 558 may only store entries for connections with non-zero weights (e.g., deleting or omitting entries for disconnections or no connections associated with zero weights)…. Local endpoint device(s) 550 may each include one or more processor(s) 556 for training, and/or executing prediction based on, the weights of the sparse neural network stored in memory 558. During prediction, the neural network is run forward once. During training, a neural network is run twice, once forward to generate an output and once backwards for error correction (e.g., back-propagation [claimed wherein elements of a layer of the dense neural network are removed based on a determination that a corresponding element of a preceding layer has a zero output]).)
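For illustration only, the negative-bias and RELU mechanism of David 0093, combined with the triplet storage of 0097, may be sketched as follows. The sketch rests on assumptions that are not stated in the reference: inputs to the layer are non-negative and bounded by a hypothetical `x_max` (e.g., because they are themselves RELU outputs), every neuron of the layer has an entry in the hypothetical `bias` dictionary, and the triplet tables are plain dictionaries. Under those assumptions, a neuron whose largest achievable pre-activation is not positive always outputs zero after the RELU; in effect, the largest weighted input it can receive is no greater than the magnitude of its negative bias. Its incoming triplets, and the outgoing triplets that consume its zero output in the next layer, can then be deleted.

```python
from typing import Dict, Set, Tuple

# Hypothetical triplet tables: (from-neuron, to-neuron) -> weight.
Triplets = Dict[Tuple[int, int], float]


def dead_neurons(w_in: Triplets, bias: Dict[int, float], x_max: float) -> Set[int]:
    """Neurons whose RELU output is zero for every input in [0, x_max].

    Assumes inputs are non-negative and bounded by x_max, and that `bias`
    has an entry for every neuron of the layer.
    """
    upper = dict(bias)  # upper bound on each neuron's pre-activation
    for (_, j), w in w_in.items():
        if w > 0:
            upper[j] += w * x_max  # worst case: this input at its maximum
    return {j for j, u in upper.items() if u <= 0.0}


def prune_dead(w_in: Triplets, w_out: Triplets, dead: Set[int]) -> Tuple[Triplets, Triplets]:
    """Delete each dead neuron's incoming and outgoing triplets."""
    kept_in = {k: w for k, w in w_in.items() if k[1] not in dead}
    kept_out = {k: w for k, w in w_out.items() if k[0] not in dead}
    return kept_in, kept_out


w_in = {(0, 0): 1.0, (1, 0): 0.5, (0, 1): 0.2, (1, 1): -0.3}
w_out = {(0, 0): 0.7, (1, 0): 0.9}
dead = dead_neurons(w_in, bias={0: 0.1, 1: -1.0}, x_max=1.0)
w_in, w_out = prune_dead(w_in, w_out, dead)
# Neuron 1 is removed together with its incoming and outgoing weights.
```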

Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over David et al. (US Pub. No. 20190108436, hereinafter David) in view of Han et al. (NPL: “DSD: Dense-Sparse-Dense Training for Deep Neural Networks”, hereinafter ‘Song’), and further in view of Davis et al. (NPL: “Low-Rank Approximations for Conditional Feedforward Computation in Deep Neural Networks”, hereinafter ‘Davis’).

Regarding claim 4, the rejection of claim 1 is incorporated and David in combination with Song further teaches the method according to claim 3: wherein the identifying of the edges of the dense neural network having the zero value range further includes locating negative bias values in layers of the dense neural network in which a maximum input value is smaller than the bias values and which are followed by a rectifier linear unit. (David teaches in 0093: A bias unit may "bias" the weights [claimed input value] of a neuron during training by adding a constant value to all of the neuron's weights. If a bias value [claimed bias value] is low enough (e.g., a large magnitude negative value [claimed locating negative bias values in layers of the dense neural network in which a maximum input value is smaller than the bias values]), the bias unit may shift all the neuron's weights to a negative value. The bias unit may eliminate any output from neuron, e.g., with an activation function [claimed locating negative bias values in layers of the dense neural network in which a maximum input value is smaller than the bias values and which are followed by a rectifier linear unit] such as rectified linear unit (RELU), in which all negative or below threshold values are zeroed out [claimed wherein the identifying of the edges of the dense neural network having the zero value range further includes locating negative bias values in layers of the dense neural network in which a maximum input value is smaller than the bias values]. In effect, this turns the whole neuron off, pruning such a neuron in its entirety from the network, including all of its incoming and outgoing weights. This can be achieved by regularization methods e.g. disclosed herein, but in this case pushing the value to a negative or below threshold target instead of zero.)
Examiner notes that the activation function is applied successively through the layers of the neural network, and that the rectifier linear unit is an activation function that can take into account a maximum input value.
David and Song do not expressly teach that the rectifier linear unit is an activation function that can take into account a maximum input value.
Davis does expressly teach that the rectifier linear unit is an activation function that can take into account a maximum input value. (Davis teaches in Sec. 3.1: Given the activation $a_l$ of layer $l$ of a neural network, the activation $a_{l+1}$ of layer $l+1$ is given by: $a_{l+1} = \sigma(a_l W_l)$ (1), where $\sigma(\cdot)$ denotes the function defining the neuron's non-linearity, $a_l \in \mathbb{R}^{n \times h_1}$, $a_{l+1} \in \mathbb{R}^{n \times h_2}$, $W_l \in \mathbb{R}^{h_1 \times h_2}$… When $\sigma(\cdot)$ is the rectified-linear function, $\sigma(x) = \max(0, x)$ [claimed a maximum input value is smaller than the bias values] such that all negative elements of the linear transform $a_l W_l$ become zero, one only needs to estimate the sign of the elements of the linear transform in order to predict the zero-valued elements… Given a low-rank approximation $W_l \approx U_l V_l = \hat{W}_l$, the estimated sign of $a_{l+1}$ is given by $\operatorname{sgn}(a_{l+1}) \approx \operatorname{sgn}(a_l \hat{W}_l)$ … Each element $(a_{l+1})_{i,j}$ is given by a dot product between the row vector $a_l^{(i)}$ and the column vector $W_l^{(j)}$. If $\operatorname{sgn}(a_l \hat{W}_l^{(j)}) = -1$ [claimed a maximum input value is smaller than the bias values], then the true activation $(a_{l+1})_{i,j}$ is likely negative, and will likely become zero after the rectified-linear function is applied [claimed wherein the identifying of the edges of the dense neural network having the zero value range further includes locating negative bias values in layers of the dense neural network in which a maximum input value is smaller than the bias values]…)
The David, Song, and Davis references would have been recognized by those of ordinary skill in the art as useful for applicant’s purpose in developing a method for performing neural network operations using dense and sparse neural network operations.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of the prior art for networks using rectified-linear activation functions in hidden neurons for processing deep neural network operations, as disclosed by Davis, with the method for performing neural network operations using generated dense and sparse neural networks, as disclosed by David and Song.
One of ordinary skill in the art would have been motivated to combine the teachings of Davis with the disclosed methods of David and Song in order to perform neural network operations using activation functions such as rectified-linear units to help process computations in a layer of a neural network (Davis, Sec. 3.1). Doing so helps speed up neural network operations (Davis, Sec. 3.1).
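For illustration only, the sign-estimation approach quoted from Davis Sec. 3.1 may be sketched with NumPy. The sketch assumes the low-rank factors are obtained by a truncated singular value decomposition and that the layer has no bias term, matching equation (1); both are illustrative choices, and the function names are hypothetical. Only entries whose low-rank estimate is non-negative are computed exactly; the remaining entries are set to zero, which is what the rectified-linear function would produce whenever the estimated sign is correct.

```python
import numpy as np


def predict_zero_mask(a: np.ndarray, W: np.ndarray, rank: int) -> np.ndarray:
    """Return True where ReLU(a @ W) is predicted to be zero.

    W is replaced by a rank-`rank` approximation U @ V (truncated SVD,
    an assumption of this sketch), and only the sign of a @ U @ V is used.
    """
    u, s, vt = np.linalg.svd(W, full_matrices=False)
    U = u[:, :rank] * s[:rank]  # shape (h1, rank), singular values folded in
    V = vt[:rank, :]            # shape (rank, h2)
    return (a @ U) @ V < 0      # sgn(a @ W_hat) == -1


def conditional_relu(a: np.ndarray, W: np.ndarray, rank: int) -> np.ndarray:
    """Compute ReLU(a @ W), skipping dot products predicted to be negative."""
    mask = predict_zero_mask(a, W, rank)
    out = np.zeros((a.shape[0], W.shape[1]))
    rows, cols = np.nonzero(~mask)
    for i, j in zip(rows, cols):
        # Exact dot product only where the estimate is non-negative.
        out[i, j] = max(0.0, float(a[i] @ W[:, j]))
    return out


rng = np.random.default_rng(0)
a = rng.standard_normal((4, 6))
W = rng.standard_normal((6, 5))
approx = conditional_relu(a, W, rank=2)
```

With `rank` equal to the full rank of `W`, the estimate matches the exact layer output up to rounding; smaller ranks trade accuracy of the zero prediction for fewer multiplications.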

Conclusion
The prior art made of record and not relied upon, which is considered pertinent to applicant's disclosure, is listed below:
Schwartz et al. (US 20180314926 A1): teaches memory handling and data management in machine learning.
Li et al. (US 20190050734 A1): teaches a compression method for neural networks (e.g., LSTM), which may effectively shorten the training period of a neural network by incorporating the pruning operation into the training process.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to OLUWATOSIN ALABI whose telephone number is (571)272-0516. The examiner can normally be reached Monday-Friday, 8:00am-5:00pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Michael Huntley can be reached on (303) 297-4307. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/OLUWATOSIN O ALABI/Examiner, Art Unit 2129