DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claim(s) 1-4, 6, 8-9, and 11-15 is/are rejected under 35 U.S.C. 102(a)(2) as anticipated by David et al. (US 20190108436 A1, hereinafter David).

Regarding claim 1, David teaches: A method of memory remapping for utilizing dense neural network computations with a sparse neural network (Fig. 6 and [0102] e.g., “Reference is made to FIG. 6, which is a flowchart of a method for generating a sparse neural network in accordance with some embodiments of the invention.”), the method comprising: 
densifying the sparse neural network ([0061] e.g., “A compressed sparse row (CSR) data representation may be used to reduce storage for a sparse matrix.” [0104 - 0106] Generating a sparse neural network by pruning the weights of the dense neural network; storing non-zero weights in memory that represent connections between pairs of neurons with an association to a unique index. The unique index may uniquely identify a pair of artificial neurons that have a connection represented by the weight.); 
remapping input and output data onto the densified neural network ([0061] e.g., “A compressed sparse row (CSR) data representation may be used to reduce storage for a sparse matrix.” [0062] e.g., “A map representation may replace the conventional matrix with a map where the “from” and the “to” neuron IDs (or filter IDs) are mapped to the weight w.” [0106] Storing non-zero weights in memory that represent connections between pairs of neurons with an association to a unique index. The unique index may uniquely identify a pair of artificial neurons that have a connection represented by the weight.); and 
utilizing the dense neural network computations for a prediction using the remapped input and output data ([0059] e.g., “A new storage system optimized for sparse data representations (e.g., 206, 306, 406) may provide a significant benefit in the training and prediction performance of neural networks. … Pre-fetching non-zero values based on the NN's sparsity pattern, pre-identifies which indices need to be accessed and skips indices for zero valued weights or filters that do not need to be accesses.” [0107] e.g., “In operation 606, a processor, e.g., in prediction mode, may retrieve from memory and run the sparse neural network of operation 604 to compute an output based only on the non-zero weights (and not based on the zero weights) of the sparse neural network.”).
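
For illustration only, the storage and prediction scheme that the cited passages of David describe (non-zero weights stored under a unique index, with the output computed from the non-zero weights only) can be sketched as follows. This is a minimal sketch under stated assumptions, not code from David or from the instant application: the function names `to_sparse_map` and `forward`, the use of a Python dictionary as the map, and the example values are all hypothetical.

```python
# Illustrative sketch only. Names and data layout are assumptions,
# not taken from David or from the instant application.

def to_sparse_map(dense_weights):
    """Keep only non-zero weights, keyed by a unique (from_id, to_id) index."""
    return {
        (i, j): w
        for i, row in enumerate(dense_weights)
        for j, w in enumerate(row)
        if w != 0.0
    }


def forward(sparse_map, inputs, num_outputs):
    """Compute one layer's outputs by visiting stored (non-zero) weights only."""
    outputs = [0.0] * num_outputs
    for (i, j), w in sparse_map.items():
        outputs[j] += inputs[i] * w
    return outputs


# Hypothetical 2x3 weight matrix with four zero entries.
dense = [
    [0.0, 2.0, 0.0],
    [1.0, 0.0, 0.0],
]
sparse = to_sparse_map(dense)            # two entries are stored instead of six
result = forward(sparse, [3.0, 4.0], 3)  # zero weights are never visited
```

In this sketch the index pair plays the role of the "unique index" of David [0106], and skipping absent entries corresponds to computing "based only on the non-zero weights" as in David [0107].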

Regarding claim 2, David teaches: The method according to claim 1. 
David further teaches: wherein the sparse neural network is formed from a dense neural network by identifying and removing edges of the dense neural network having a zero value range which do not contribute to a final result ([0009] e.g., “Each weight may represent a unique connection between a pair of a plurality of artificial neurons in different layers of a plurality of neuron layers. … Only non-zero weights may be stored that represent connections between pairs of neurons (and zero weights may not be stored that represent no connections between pairs of neurons).” [0018] e.g., “weights could be removed without detrimental effect” [0019] e.g., “Embodiments of the invention provide a novel system and method to generate a sparse neural network by pruning weak synapse connections during the training phase (instead of only during post-training processing) or by evolving a sparse neural network (e.g., using evolutionary computation).” Examiner notes that “without detrimental effect” and/or “weak synapse connections” are mapped to “which do not contribute to a final result”.).

Regarding claim 3, David teaches: The method according to claim 2.
([0035] e.g., “In conventional matrices, pruned or omitted weights are set to zero”).

Regarding claim 4, David teaches: The method according to claim 3.
David further teaches: wherein the identifying of the edges of the dense neural network having the zero value range further includes locating negative bias values in layers of the dense neural network in which a maximum input value is smaller than the bias values and which are followed by a rectifier linear unit (ReLU) ([0093] e.g., “A bias unit may “bias” the weights of a neuron during training by adding a constant value to all of the neuron's weights. If a bias value is low enough (e.g., a large magnitude negative value), the bias unit may shift all the neuron's weights to a negative value. The bias unit may eliminate any output from neuron, e.g., with an activation function such as rectified linear unit (RELU)”).
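
For illustration only, the effect described in David [0093] (a sufficiently negative bias followed by a ReLU eliminating a neuron's output) can be sketched as a simple range check. The sketch assumes that the "maximum input value" is the largest pre-bias value the neuron can receive and that it is compared against the magnitude of the negative bias; the function names are hypothetical and do not come from David or the instant application.

```python
# Illustrative sketch only. Function names and the interpretation of
# "maximum input value" are assumptions made for this example.

def relu(x):
    """Rectified linear unit: negative values are zeroed out."""
    return max(0.0, x)


def always_zero_after_relu(max_input, bias):
    """True when input + bias can never exceed zero, so ReLU always outputs 0."""
    return max_input + bias <= 0.0


# Hypothetical neuron: inputs never exceed 3.0 and the bias is -5.0.
removable = always_zero_after_relu(3.0, -5.0)
# With a bias of -1.0 some inputs still pass through the ReLU.
not_removable = always_zero_after_relu(3.0, -1.0)
```

When the check returns true, every edge feeding the neuron has a zero value range after the ReLU, which is the condition the claim language ties to removal.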

Regarding claim 6, David teaches: The method according to claim 2.
David further teaches: wherein the sparse neural network is further formed by removing edges which have a value range which is less than a predetermined threshold ([0082] e.g., “Weights and their entries may be physically deleted when the weight, though not zero, is below a near zero threshold”).
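
For illustration only, the near-zero threshold deletion of David [0082] can be sketched on a map of indexed weights. The sketch assumes that "below a near zero threshold" refers to the magnitude of the weight; the function name `prune_below_threshold` and the example values are hypothetical.

```python
# Illustrative sketch only. The magnitude comparison is an assumption
# about how "below a near zero threshold" is evaluated.

def prune_below_threshold(sparse_map, threshold):
    """Delete entries whose weight magnitude is below the threshold."""
    return {idx: w for idx, w in sparse_map.items() if abs(w) >= threshold}


weights = {(0, 0): 0.8, (0, 1): 0.001, (1, 0): -0.0005, (1, 1): -0.6}
kept = prune_below_threshold(weights, 0.01)
```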

Regarding claim 8, David teaches: The method according to claim 1.
([0096] e.g., “Data 517 may also include code (e.g., software code) or logic, e.g., to enable storage and retrieval of data 517 according to embodiments of the invention.” [0107] e.g., “In operation 606, a processor, e.g., in prediction mode, may retrieve from memory and run the sparse neural network of operation 604 to compute an output based only on the non-zero weights (and not based on the zero weights) of the sparse neural network.” [0027] e.g., “This independent indexing thereby eliminates the need to store entries for disconnected synapses (reducing memory consumption) and eliminates computations performed based on disconnected synapses (increasing processing speed).”).

Regarding claim 9, David teaches: The method according to claim 1.
David further teaches: wherein the sparse network is formed from an initial sparse or dense neural network using an iterative process of identifying and removing disconnected edges of the initial sparse or dense neural network which do not contribute to a final result ([0018] e.g., “weights could be removed without detrimental effect” [0019] e.g., “a novel system and method to generate a sparse neural network by pruning weak synapse connections during the training phase” [0109] e.g., “during training mode as the network becomes sparser in each iteration of training)” Examiner notes that “without detrimental effect” and/or “weak synapse connections” are mapped to “which do not contribute to a final result”.).

Regarding claim 11, David teaches: A system for memory remapping to transform a sparse neural network into a dense neural network, the system comprising memory and one or more processors which, alone or in combination, are configured to provide for execution of a method (Fig. 6 and [0102] e.g., “The operations of FIG. 6 may be executed by a processor (e.g., one or more processor(s) 556 of FIG. 5) using data stored in a memory (e.g., one or more memory unit(s) 558 of FIG. 5)”). The remainder of the claim recites the method of claim 1 and is similarly analyzed.

Regarding claim 12, the claim recites the system of claim 2, and is similarly analyzed.

Regarding claim 13, the claim recites the system of claim 9, and is similarly analyzed.

Regarding claim 14, the claim recites the system of claim 8, and is similarly analyzed.

Regarding claim 15, David teaches: A tangible, non-transitory computer-readable medium having instructions thereon, which, upon being executed by memory and one or more processors, provide for execution of a method (Fig. 6 and [0102] e.g., “The operations of FIG. 6 may be executed by a processor (e.g., one or more processor(s) 556 of FIG. 5) using data stored in a memory (e.g., one or more memory unit(s) 558 of FIG. 5)” [0118] e.g., “instructions, e.g., computer-executable instructions, which, when executed by a processor or controller (e.g., processor 556 of FIG. 5), carry out methods disclosed herein.”). The remainder of the claim recites the method of claim 1 and is similarly analyzed.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 5 is/are rejected under 35 U.S.C. 103 as being unpatentable over David in view of Alakuijala et al. (US 20190251444 A1, hereinafter Alakuijala).

Regarding claim 5, David teaches: The method according to claim 3. 
David further teaches: in layers which are followed by a rectifier linear unit (ReLU) ([0093] e.g., “with an activation function such as rectified linear unit (RELU), in which all negative or below threshold values are zeroed out.” [0033] e.g., “neural network 200 includes a plurality of artificial neurons 202 arranged in a hierarchy of multiple layers.”).
	David does not explicitly teach: wherein the identifying of the edges of the dense neural network having the zero value range further includes locating negative weight values.
	However, Alakuijala teaches: wherein the identifying of the edges of the dense neural network having the zero value range further includes locating negative weight values ([0035] e.g., “In one example, the proposed change in the weight can equal the negative of the weight's value if the full utility of the edge is being estimated (e.g., for use in pruning or removal decisions).”).
In view of the teachings of Alakuijala it would have been obvious for a person of ordinary skill in the art to apply the teachings of Alakuijala to David before the effective filing date of the claimed invention in order to provide resulting improvements to computing technology tasked with the distribution and use of machine-learned models (cf. [0052] e.g., “The systems and methods described herein also provide resulting improvements to computing technology tasked with the distribution and use of machine-learned models. For example, through the use of advanced compression techniques for machine-learned model distribution as described herein, computing systems may optimize bandwidth use and reduce transfer costs and more efficiently provide machine-learned models for use in various applications, such as mobile applications.”).
 
Claim(s) 7 is/are rejected under 35 U.S.C. 103 as being unpatentable over David in view of Shoaib et al. (US 20170132496 A1, hereinafter Shoaib).

Regarding claim 7, David teaches: The method according to claim 2.
David does not explicitly teach: further comprising determining whether value ranges in a threshold layer are always less than or always greater than a predetermined threshold, removing computations prior to the threshold layer, and using either a first value or a second value for computations following the threshold layer depending on the determination of whether the value ranges in the threshold layer are always less than or always greater than the predetermined threshold.
However, Shoaib teaches: further comprising determining whether value ranges in a threshold layer are always less than or always greater than a predetermined threshold ([0019] e.g., “neural networks are composed of interconnected “neurons” that make decisions based on input value(s) and threshold(s). At a neuron, a non-linear function (also referred to as an activation function) is applied to an input, and the output of the non-linear function is compared to a threshold. Example non-linear functions include a rectified linear unit (ReLU), hyperbolic tangent (tan h), sigmoid function, or other non-linear function. The neuron can, for example, provide an output of “1” if the value of the non-linear function applied to the input is greater than the threshold or an output of “0” if the value of the non-linear function applied to the input is less than the threshold.” Examiner notes that the Instant Specification discloses “a ReLU can be expressed as a threshold layer” in [0036].), 
removing computations prior to the threshold layer ([0059] e.g., “This shows that in the frequency domain, ReLu( ) acts as a convolution with the function of known form. However, this function depends on the input, so positions are found in the x space domain: fx.sub.i>0. This can be accomplished by taking the inverse transforms of the input and solving the inequality. Thus, once x has been found, the transfer function of the ReLu is known for this input, and FFTs do not need to be calculated.”), and 
using either a first value or a second value for computations following the threshold layer depending on the determination of whether the value ranges in the threshold layer are always less than or always greater than the predetermined threshold ([0055] e.g., “Mathematically, ReLu(f(x)) can be expressed through f(x) as a multiplication with the sign(f(x)): which is equal to 1 if f(x)>0 and 0 otherwise:” Examiner notes that “1” is mapped to a “first value” and “0” is mapped to a “second value” as the Instant Specification discloses “[i]n a threshold layer (if(I <= T) O = X, else O = V) it can be that the input value range is always smaller than T, so that X can always be used, or if always bigger than T, then V can always be used. The output is designated by O, while X is the input and V is a predefined value which gets set if the threshold condition is not satisfied.” in [0036].).
In view of the teachings of Shoaib it would have been obvious for a person of ordinary skill in the art to apply the teachings of Shoaib to David before the effective filing date of the claimed invention in order to obtain improved device processing speed by using ReLU (cf. [0034] e.g., “Convolution in the time domain can be converted to multiplication in the frequency domain, which reduces the complexity of convolutional weighting and results in improved device processing speed and reduced power consumption.”).
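
For illustration only, the threshold layer quoted above from [0036] of the Instant Specification (if(I <= T) O = X, else O = V) and the associated value-range determination can be sketched as follows. This is a minimal sketch under stated assumptions: the function names, the string labels, and the representation of a value range as a (minimum, maximum) pair are hypothetical and are not drawn from the Instant Specification, David, or the secondary reference.

```python
# Illustrative sketch only. Names and the (min, max) range representation
# are assumptions made for this example.

def threshold_layer(i, t, x, v):
    """Threshold layer as quoted from [0036]: if (I <= T) O = X, else O = V."""
    return x if i <= t else v


def resolve_threshold_layer(input_min, input_max, t):
    """Decide whether the branch taken is fixed for every possible input.

    Returns "always_X" when the whole input range satisfies I <= T,
    "always_V" when the whole range is above T, and None otherwise.
    """
    if input_max <= t:
        return "always_X"
    if input_min > t:
        return "always_V"
    return None


# Hypothetical ranges checked against a threshold of 0.0.
below = resolve_threshold_layer(-4.0, -1.0, 0.0)
above = resolve_threshold_layer(0.5, 2.0, 0.0)
mixed = resolve_threshold_layer(-1.0, 1.0, 0.0)
```

When the result is not None, the comparison itself, and any computation that exists only to produce the compared value, no longer needs to be evaluated at prediction time, which corresponds to the claimed removal of computations prior to the threshold layer.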

Claim(s) 10 is/are rejected under 35 U.S.C. 103 as being unpatentable over David in view of de Vries (US 5812992 A, hereinafter Vries).

Regarding claim 10, David teaches: The method according to claim 9.
David further teaches: wherein the iterative process ([0109] e.g., “during training mode as the network becomes sparser in each iteration of training)”) 
David does not explicitly teach: a process goes from an output layer toward an input layer.
However, Vries teaches: a process goes from an output layer toward an input layer ([Col. 2 ln. 33-34] e.g., “This algorithm is sequentially applied from the output layer 150 back toward the input layer 130.” Fig. 3 and [Col. 4 ln. 1-49]: the method in a signal processing system computes an “unpruned” output and selects components for pruning from the output where they have minimal influence on the output of the system. The method goes from the output layer toward the input layer to compute the error for each new input.

[Image: media_image1.png (greyscale)]
).
In view of the teachings of Vries it would have been obvious for a person of ordinary skill in the art to apply the teachings of Vries to David before the effective filing date of the claimed invention in order to provide performance improvement of a neural network by reducing parameters (cf. [Col. 4 ln. 43-46] e.g., “Removing eigenmodes reduces the effective number of parameters and generally improves generalization, i.e., performance on an out-of-sample data set.”).
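
For illustration only, an iterative removal of disconnected edges that proceeds from the output layer toward the input layer can be sketched as follows. This sketch does not implement the eigenmode-based pruning of Vries; it substitutes a simpler, hypothetical rule (an edge is removed when the neuron it feeds has no remaining outgoing edge) to show why the output-to-input order lets one sweep catch cascading removals. The function name and the data layout are assumptions.

```python
# Illustrative sketch only. The removal rule is a simplified stand-in,
# not the algorithm of Vries or David.

def prune_dead_edges(layers):
    """Remove edges that feed neurons with no outgoing edges.

    layers[k] maps (from_id, to_id) -> weight for the edges between
    layer k and layer k + 1. The last entry feeds the output layer.
    """
    pruned = [dict(layer) for layer in layers]
    # Walk from the edges nearest the output back toward the input.
    for k in range(len(pruned) - 2, -1, -1):
        live_sources = {src for (src, _dst) in pruned[k + 1]}
        pruned[k] = {
            (src, dst): w
            for (src, dst), w in pruned[k].items()
            if dst in live_sources
        }
    return pruned


# Hypothetical network: only neuron 0 of the last hidden layer reaches the output.
network = [
    {(0, 0): 1.0, (1, 1): 1.0},
    {(0, 0): 1.0, (1, 1): 1.0},
    {(0, 0): 1.0},
]
cleaned = prune_dead_edges(network)
```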
 
Prior Art
The prior art made of record and not relied upon, which is considered pertinent to applicant's disclosure, is listed below:
Schwartz et al. (US 20180314926 A1): teaches memory handling and data management in machine learning.
Li et al. (US 20190050734 A1): teaches compression method for neural networks (e.g. LSTM), which may effectively shorten the training period of a neural network by combining pruning operation into the training process.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JAEYONG J PARK whose telephone number is (571) 272-3898. The examiner can normally be reached on M-F 9:00 a.m. - 6:00 p.m.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. 
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Michael Huntley, can be reached at (303) 297-4307. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status 

/JAEYONG J PARK/Examiner, Art Unit 2129
/MICHAEL J HUNTLEY/Supervisory Patent Examiner, Art Unit 2129