Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Remarks
This Office Action is responsive to Applicant's Amendment filed on August 30, 2022, in which claims 1, 3, 11, and 13 were amended and claims 2 and 12 were canceled. Claims 1, 3-4, 6-7, 9-11, 13-14, 16-17, and 19-20 are currently pending.

Response to Arguments
Applicant's arguments with respect to the rejection of claims 1, 3-4, 6-7, 9-11, 13-14, 16-17, and 19-20 under 35 U.S.C. 103, presented in view of the amendment, have been considered but are not persuasive. Applicant's arguments fail to comply with 37 CFR 1.111(b) because they amount to a general allegation that the claims define a patentable invention without specifically pointing out how the language of the claims patentably distinguishes them from the references.
Examiner respectfully disagrees with Applicant's argument that Judd does not teach the amended limitations. Judd explicitly teaches input activations and weights being represented as input neurons and synapses, respectively. Judd teaches modifications to a known neural network parallelization architecture (DiDianNao) and improvements including skipping over zero valued weights and activations (synapses and neurons) ([¶0010] "FIG. 1 is a bar graph showing the average fraction of convolutional layer multiplication input neuron values that are zero;" [¶0140] "According to an embodiment, an accelerator may also speed up backpropagation training procedures by selectively skipping values that are close to zero."). Judd similarly teaches having a multiplier per neuron/activation ([¶0054] "Each neuron lane 140 and synapse sublane is fed respectively with a single element from an Input Neuron Buffer (NBin) 120 lane and a Synapse Buffer (SB) 110 lane. Every cycle, each neuron lane 140 broadcasts its neuron to the two corresponding synapse sublanes 160 resulting into four pairs of neurons and synapses, one per synapse sublane. A multiplier 171 per synapse sublane multiplies the neuron and synapse inputs" [¶0067] "To exploit the significant fraction of zeroes in the neuron stream, the prior art structure in which all neuron lanes are coupled together is changed...With this organization, the neuron lanes 280 are now capable of proceeding independently from one another and thus have the potential to skip over zeroes.").

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.


	Claims 1, 3-4, 9-11, 13-14, and 19-20 are rejected under 35 U.S.C. § 103 as being unpatentable over the combination of Vijayanarasimhan (US 2016/0180200 A1) and Korthikanti (US 2018/0189227 A1), and further in view of Judd (US 2019/0205740 A1).

	Regarding claim 1, Vijayanarasimhan teaches A method of accelerating a training process of a neural network, the method comprising: ([¶0004] "The neural network uses a fast locality-sensitive hashing technique to approximate a result of the matrix multiplication to allow the neural network to generate scores for a large number, e.g., millions, of output classes." Hashing is interpreted as being motivated by accelerating the network.)
	generating a bit-vector with reference to the values of output activations of forward propagation (See FIG. 1. [¶0008] "In some implementations, the activation vector includes real number values. The method may include converting each of the real numbers in the activation vector to binary values to create a binary vector" [¶0031] "the classification system 100 only propagates gradients based on the top K nodes that were retrieved during the forward pass of the neural network" Forward pass interpreted as synonymous with forward propagation.)
	selecting activations requiring an operation from among the acquired activations by using the bit-vector; and ([¶0008] "Selecting the nodes in the particular layer using the activation vector and the hash table may include selecting the one or more nodes in the particular layer by using the integers as input to the hash table." Hash table interpreted as synonymous with bit-vector corresponding to the activations.)
	backward propagation using the activations performed zero padding and filters corresponding to the selected activations ([¶0003] "Each layer in a deep neural network may perform a specific function, e.g., convolution, pooling, normalization, or matrix multiplication and non-linear activation." [¶0027] "When the number of nodes in a particular layer y of the neural network 102 is large, the classification system 100 only needs output from the K nodes with the highest probabilities of activating based on the activation vector x" [¶0031] "The classification system 100 trains the neural network 102 using downpour stochastic gradient descent (SGD). During back-propagation, the classification system 100 only propagates gradients based on the top K nodes that were retrieved during the forward pass of the neural network 102" Filter interpreted as weight layer to perform matrix multiplication for the purpose of convolution, which requires a filter by definition.).
	However, Vijayanarasimhan does not explicitly teach determining whether the input activation of the backward propagation is 0 with reference to the bit-vector;
	performing zero padding on the selected activations, and
	wherein the filters corresponding to the selected activations are obtained by rearranging filters used in the forward propagation where the selected activations are generated,
	and wherein the selecting comprises selecting activations representing a non-zero value from among the acquired activations by interpreting bits included in the bit-vector until a number of the selected activations is equal to a number of multipliers.

	Judd, in the same field of endeavor, teaches determining whether the input activation of the backward propagation is 0 with reference to the bit-vector; ([¶0010] "FIG. 1 is a bar graph showing the average fraction of convolutional layer multiplication input neuron values that are zero;" [¶0140] "According to an embodiment, an accelerator may also speed up backpropagation training procedures by selectively skipping values that are close to zero." Input neuron value interpreted as synonymous with input activation.)
	performing zero padding on the selected activations, and ([¶0077] "Bricks are stored starting at the position their first neuron would have been stored in the conventional 3D array format adjusted to account for the offset fields and are zero padded. The grouping in bricks maintains the ability to index the activation array in the granularity necessary to process each layer.")
	and wherein the selecting comprises selecting activations representing a non-zero value from among the acquired activations by interpreting bits included in the bit-vector until a number of the selected activations is equal to a number of multipliers. ([¶0054] "Each neuron lane 140 and synapse sublane is fed respectively with a single element from an Input Neuron Buffer (NBin) 120 lane and a Synapse Buffer (SB) 110 lane. Every cycle, each neuron lane 140 broadcasts its neuron to the two corresponding synapse sublanes 160 resulting into four pairs of neurons and synapses, one per synapse sublane. A multiplier 171 per synapse sublane multiplies the neuron and synapse inputs" [¶0067] "To exploit the significant fraction of zeroes in the neuron stream, the prior art structure in which all neuron lanes are coupled together is changed...With this organization, the neuron lanes 280 are now capable of proceeding independently from one another and thus have the potential to skip over zeroes." Multiplier per synapse sublane interpreted as synonymous with the selected activations being equal to a number of multipliers. Skipping over zeros interpreted as synonymous with selecting activations representing a non-zero value from among the acquired activations. Judd teaches that the neurons contain the activation and the synapse contains the weight.).
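	For illustration only, the selection limitation as characterized above can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Judd, Vijayanarasimhan, or the application: the function names `make_bit_vector` and `select_nonzero`, and the convention that a bit of 1 marks a non-zero activation, are hypothetical.

```python
def make_bit_vector(activations):
    """Return one bit per activation: 1 if the value is non-zero, else 0.

    The bit convention is an assumption made for this sketch.
    """
    return [1 if a != 0 else 0 for a in activations]


def select_nonzero(activations, bit_vector, num_multipliers, start=0):
    """Scan bits from `start`, collecting non-zero activations.

    Scanning stops when `num_multipliers` activations have been selected
    or when every bit has been interpreted. Returns the selected
    (index, value) pairs and the position where the next scan resumes.
    """
    selected = []
    pos = start
    while pos < len(bit_vector) and len(selected) < num_multipliers:
        if bit_vector[pos]:
            selected.append((pos, activations[pos]))
        pos += 1
    return selected, pos


acts = [0.0, 1.5, 0.0, 0.0, 2.0, 0.5, 0.0, 3.0]
bits = make_bit_vector(acts)
batch, nxt = select_nonzero(acts, bits, num_multipliers=2)
```

	Calling `select_nonzero` again with `start=nxt` would continue from the first uninterpreted bit, which is one way to read the "until a number of the selected activations is equal to a number of multipliers" language.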

	Vijayanarasimhan as well as Judd are directed towards accelerating neural networks.  Therefore, Vijayanarasimhan as well as Judd are analogous art in the same field of endeavor.  It would have been obvious before the effective filing date of the claimed invention to combine the teachings of Vijayanarasimhan with the teachings of Judd by zero padding output matrices and skipping zero valued weights and activations.  Judd provides as additional motivation for combination ([¶0077] “The grouping in bricks maintains the ability to index the activation array in the granularity necessary to process each layer.”).  The advantage to zero padding is that many neural network accelerators are designed to ignore zero elements in matrices, as is the case in Judd ([¶0035] "Embodiments of the invention employ hierarchical data-parallel units, allowing groups of lanes to proceed mostly independently enabling them to skip over the ineffectual computations" [¶0036] "Once the capability to skip zero-operand multiplications is in place, the ineffectual operation identification criteria can be relaxed or loosened to enable further improvements with no accuracy loss."). 
	However, the combination of Vijayanarasimhan and Judd does not explicitly teach wherein the filters corresponding to the selected activations are obtained by rearranging filters used in the forward propagation where the selected activations are generated.

	Korthikanti, in the same field of endeavor, teaches wherein the filters corresponding to the selected activations are obtained by rearranging filters used in the forward propagation where the selected activations are generated. ([¶0091] "As an example, a convolution operation may need the dimensions of its filter to be arranged differently for forward propagation operations versus backward propagation operations in an artificial neural network. Accordingly, a dimension shuffle operation may be used to reorder the dimensions of a matrix in memory.").
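	For illustration only, one way a filter might be rearranged between forward and backward propagation is sketched below. The sketch rests on assumptions that do not come from Korthikanti or the application: the function name `rearrange_filters` is hypothetical, the forward layout is assumed to be [out_ch][in_ch][kh][kw], and the rearrangement shown (swapping the two channel dimensions and rotating each kernel by 180 degrees) is a common textbook convention for convolution backpropagation rather than the dimension shuffle operation of the reference.

```python
def rearrange_filters(filters):
    """Rearrange forward filters [out_ch][in_ch][kh][kw] for a backward pass.

    Assumed convention for this sketch: swap the two channel dimensions and
    rotate each kernel by 180 degrees, giving [in_ch][out_ch][kh][kw].
    """
    out_ch = len(filters)
    in_ch = len(filters[0])
    return [
        [
            # Reverse the row order, then reverse each row: 180-degree rotation.
            [row[::-1] for row in filters[o][i][::-1]]
            for o in range(out_ch)
        ]
        for i in range(in_ch)
    ]


# Two output channels, one input channel, 2x2 kernels.
forward = [
    [[[1, 2], [3, 4]]],
    [[[5, 6], [7, 8]]],
]
backward = rearrange_filters(forward)
```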

	The combination of Vijayanarasimhan and Judd, as well as Korthikanti, are directed towards accelerating neural networks. Therefore, the combination of Vijayanarasimhan and Judd, as well as Korthikanti, are analogous art in the same field of endeavor. It would have been obvious before the effective filing date of the claimed invention to combine the teachings of the combination of Vijayanarasimhan and Judd with the teachings of Korthikanti by rearranging the filters in the neural network described in the combination of Vijayanarasimhan and Judd. Korthikanti provides as additional motivation for combination ([¶0015] “For example, any dimension shuffle operation may be performed to reorder the dimensions of a matrix from one format to another using a minimum number of conversions. These advantages result in reduced processing time for matrix operations, which improves performance for applications that involve complex matrix operations, such as artificial intelligence and machine learning functionality implemented using artificial neural networks”).

	Regarding claim 3, the combination of Vijayanarasimhan, Judd, and Korthikanti teaches The method of claim 1, wherein the selecting comprises selecting the activations requiring the operation from among the acquired activations, (Vijayanarasimhan [¶0013] "FIG. 1 is an example of a classification system that uses a hash table to determine for which nodes in a particular layer y to perform matrix multiplication using an activation vector x.")
	in response to a number of selected activations being less than N, where N is the number of multipliers which is a number of multipliers in a single neural functional unit. (Judd [¶0054] "Each neuron lane 140 and synapse sublane is fed respectively with a single element from an Input Neuron Buffer (NBin) 120 lane and a Synapse Buffer (SB) 110 lane. Every cycle, each neuron lane 140 broadcasts its neuron to the two corresponding synapse sublanes 160 resulting into four pairs of neurons and synapses, one per synapse sublane. A multiplier 171 per synapse sublane multiplies the neuron and synapse inputs" [¶0067] "To exploit the significant fraction of zeroes in the neuron stream, the prior art structure in which all neuron lanes are coupled together is changed...With this organization, the neuron lanes 280 are now capable of proceeding independently from one another and thus have the potential to skip over zeroes." Multiplier per synapse sublane interpreted as synonymous with the selected activations being equal to a number of multipliers. Skipping over zeros interpreted as synonymous with selecting activations representing a non-zero value from among the acquired activations. Judd explicitly teaches that there is one activation per multiplier and that those activations may be skipped, which is interpreted as synonymous with the selected activations being less than the number of multipliers.).
	
	Regarding claim 4, the combination of Vijayanarasimhan, Judd, and Korthikanti teaches The method of claim 1, wherein the selecting comprises selecting the activations requiring the operation from among the acquired activations, in response to interpretation of all bits in the bit-vector not being completed. (Vijayanarasimhan [¶0047] "FIG. 2 is a flow diagram of a process 200 for processing an activation vector using selected nodes in a layer to generate an output for the layer." The limitation is interpreted as satisfied when zero bits of the bit-vector have been interpreted, since interpretation of all bits is then not completed.).
	
	Regarding claim 9, the combination of Vijayanarasimhan, Judd, and Korthikanti teaches The method of claim 1, further comprising updating filters used in forward propagation using a result of performing the backward propagation. (Vijayanarasimhan [¶0031] "During back-propagation, the classification system 100 only propagates gradients based on the top K nodes that were retrieved during the forward pass of the neural network 102. The classification system 100 may update only the weight vectors for the top K nodes that were retrieved during the forward pass of the neural network 102 using an error vector for the output of the neural network 102." Convolution filters are interpreted as synonymous with weights.).
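	For illustration only, the quoted passage on updating only the weight vectors of the top K nodes can be sketched as a restricted gradient step. This is a minimal sketch under assumptions that are not drawn from Vijayanarasimhan: the function name `update_top_k`, the dictionary layout, the plain (non-downpour) SGD rule, and the learning rate are all hypothetical.

```python
def update_top_k(weights, top_k_nodes, activation, error, lr=0.1):
    """Apply one SGD step only to the weight vectors of the retrieved nodes.

    weights:     dict mapping node id -> list of weights (one per input)
    top_k_nodes: node ids retrieved during the forward pass
    activation:  input activation vector for the layer
    error:       dict mapping node id -> scalar output error
    Nodes outside `top_k_nodes` are left untouched.
    """
    for node in top_k_nodes:
        grad = [error[node] * x for x in activation]
        weights[node] = [w - lr * g for w, g in zip(weights[node], grad)]
    return weights


weights = {0: [1.0, 1.0], 1: [2.0, 2.0], 2: [3.0, 3.0]}
updated = update_top_k(
    weights,
    top_k_nodes=[0, 2],
    activation=[1.0, 0.0],
    error={0: 1.0, 2: -1.0},
    lr=0.5,
)
```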
	
	Regarding claim 10, the combination of Vijayanarasimhan, Judd, and Korthikanti teaches A non-transitory computer-readable recording medium that, when executed by a processor, cause the processor to perform the method of claim 1. (Vijayanarasimhan [¶0065] "Embodiments of the subject matter described in this specification can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions encoded on a tangible non-transitory program carrier for execution by, or to control the operation of, data processing apparatus.").

	Claims 11, 13-14, and 19 are directed towards a neural network device implementing the method of claims 1, 3-4, and 9, respectively. Therefore, the rejections applied to claims 1, 3-4, and 9 also apply to claims 11, 13-14, and 19. Claims 11, 13-14, and 19 also recite the additional elements of at least one processor (Vijayanarasimhan [¶0066] “The term “data processing apparatus” encompasses all kinds of apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers”) and a memory (Vijayanarasimhan [¶0070] “Generally, a central processing unit will receive instructions and data from a read-only memory or a random access memory or both”).
	
	Regarding claim 20, the combination of Vijayanarasimhan, Korthikanti, and Judd teaches The neural network device of claim 11, wherein the processor is further configured to record the updated filters in a memory (Vijayanarasimhan [¶0019] "The classification system retrieves the top K of those weight vectors, e.g., from the hash table or another location in memory" [¶0031] "The classification system 100 may update only the weight vectors" Filters interpreted as synonymous with weights.).

	Claims 6-7 and 16-17 are rejected under 35 U.S.C. § 103 as being unpatentable over the combination of Vijayanarasimhan, Korthikanti, and Judd, and further in view of Jourjine (US 4967369 A).

	Regarding claim 6, the combination of Vijayanarasimhan, Judd, and Korthikanti teaches The method of claim 1.
	However, the combination of Vijayanarasimhan, Judd, and Korthikanti does not explicitly teach that the performing of the backward propagation comprises performing a multiplication and accumulation operation on the selected activations and the filters corresponding to the selected activations.

	Jourjine, in the same field of endeavor, teaches the performing of the backward propagation comprises performing a multiplication and accumulation operation on the selected activations and the filters corresponding to the selected activations. ([Col. 1 l. 35] "Various methods are known in the prior art for feature extraction and pattern recognition. One method, known as error back propagation, involves minimization of error functional which is the sum over squared differences between the desired and actual outputs of the output processors...where each term of the sum depends on the activations and the weights of all or large part of the neural network").

	Vijayanarasimhan, Korthikanti, Judd, and Jourjine are all directed towards accelerating neural networks. Therefore, Vijayanarasimhan, Korthikanti, Judd, and Jourjine are analogous art in the same field of endeavor. It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the teachings of Vijayanarasimhan, Korthikanti, and Judd with the teachings of Jourjine by using multiply and accumulate operations for backpropagation. MAC units in neural network accelerators are very well known in the art. Judd explicitly teaches the operation of the MAC units and implicitly teaches their use for backpropagation, but does not explicitly outline the combination. Jourjine, on the other hand, explicitly teaches that multiply and accumulate operations during back-propagation are well known in the art ([Col. 1 l. 32] “Various methods are known in the prior art for feature extraction and pattern recognition. One method, known as error back propagation, involves minimization of error functional which is the sum over squared differences between the desired and actual outputs of the output processors”).
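	For illustration only, a multiplication and accumulation operation over selected activations and their corresponding filter weights can be sketched as follows. This is a minimal sketch under assumptions and does not reproduce any MAC unit of Judd or Jourjine: the function name `multiply_accumulate` and the (index, value) pairing of selected activations are hypothetical.

```python
def multiply_accumulate(selected, filter_weights):
    """Sum activation * weight over the selected (index, value) pairs.

    Each selected activation is multiplied by the weight stored at the
    same index, so activations that were not selected contribute nothing.
    """
    acc = 0.0
    for index, value in selected:
        acc += value * filter_weights[index]
    return acc


# Only the activations at positions 1 and 4 were selected as non-zero.
selected = [(1, 1.5), (4, 2.0)]
filter_weights = [9.0, 2.0, 9.0, 9.0, 0.5]
result = multiply_accumulate(selected, filter_weights)
```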

	Regarding claim 7, the combination of Vijayanarasimhan, Judd, Korthikanti, and Jourjine teaches The method of claim 6, wherein the performing of the backward propagation comprises updating the acquired activations using a result of the multiplication and accumulation operation. (Jourjine [Col. 6 l. 60-65] "Transmission activation slope Wa provides the time scale of response of activation updating with regard to input from activation update unit 32").
	
	Claims 16-17 are directed towards a neural network device implementing the method of claims 6-7, respectively. Therefore, the rejections applied to claims 6-7 also apply to claims 16-17. Claims 16-17 also recite the additional elements of at least one processor (Vijayanarasimhan [¶0066] “The term “data processing apparatus” encompasses all kinds of apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers”) and a memory (Vijayanarasimhan [¶0070] “Generally, a central processing unit will receive instructions and data from a read-only memory or a random access memory or both”).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Turakhia (US 2018/0164866 A1) is directed towards accelerating convolutional neural networks by skipping zero valued multiplication elements. Kim (“Zena: Zero-aware neural network accelerator”, 2017) is also directed towards skipping zero valued CNN elements stored in bit vectors.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SIDNEY VINCENT BOSTWICK whose telephone number is (571)272-4720.  The examiner can normally be reached on M-F 7:30am-5:00pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda Huang, can be reached on (571)270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/SB/Examiner, Art Unit 2124


/MIRANDA M HUANG/Supervisory Patent Examiner, Art Unit 2124