Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function;
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
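Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.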
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.  Such claim limitation(s) is/are:
“processor configured to” in claim 11.
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.


Claim Rejections - 35 USC § 101
101 Rejection
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-20 are rejected under 35 USC § 101 because the claimed invention is directed to non-statutory subject matter.

Regarding Claim 1:  Claim 1 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 1 is directed to a method, which is a process, one of the statutory categories.
Step 2A Prong One Analysis:  Claim 1 recites a computer implemented method of processing neural networks, which, under its broadest reasonable interpretation is a series of mental processes.  For example, but for the generic computer components language, the above limitations in the context of this claim encompass neural network processing, including the following: selecting activations requiring an operation from among the acquired activations by using the bit-vector (observation, evaluation, and judgement).  Therefore, claim 1 recites an abstract idea which is a judicial exception.
Step 2A Prong Two Analysis: Claim 1 recites the additional element “acquiring activations used in the training process and a bit-vector corresponding to the activations,” which amounts to gathering data, which is insignificant extra-solution activity. Claim 1 also recites the additional element “performing backward propagation using the selected activations and filters corresponding to the selected activations”. However, these additional features are generic functions performed on generic computer components (“A non-transitory computer-readable recording medium”, “a processor”) recited at a high level of generality, such that they amount to no more than mere instructions to apply the judicial exception using a generic computer component. An additional element that merely recites the words “apply it” (or an equivalent) with the judicial exception, or merely includes instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea, does not integrate the judicial exception into a practical application. Therefore, claim 1 is directed to a judicial exception.
Step 2B Analysis:  Claim 1 does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to the lack of integration of the abstract idea into a practical application, the additional elements recited in claim 1 amount to no more than mere instructions to apply the judicial exception using a generic computer component.
This rejection applies equally to independent claims 10 and 11, which recite a computer program product and a system, respectively, as well as to dependent claims 2-9 and 12-20. The additional limitations of the dependent claims are addressed briefly below.
Dependent claims 2 and 12 recite additional insignificant extra-solution activity “the selecting comprises selecting activations representing a non-zero value from among the acquired activations by interpreting bits included in the bit-vector” comprising data gathering.
Dependent claims 3 and 13 recite additional mental processes “selecting the activations requiring the operation from among the acquired activations, in response to a number of selected activations being less than N, where N is a number of multipliers in a single neural functional unit.”
Dependent claims 4 and 14 recite additional mental processes “selecting the activations requiring the operation from among the acquired activations, in response to interpretation of all bits in the bit-vector not being completed.”
Dependent claims 5 and 15 recite additional generic functions performed on a generic computing component “performing forward propagation on the neural network.”
Dependent claims 6 and 16 recite additional generic functions performed on a generic computing component “wherein the performing of the backward propagation comprises performing a multiplication and accumulation operation on the selected activations and the filters corresponding to the selected activations.”
Dependent claims 7 and 17 recite additional generic functions performed on a generic computing component “wherein the performing of the backward propagation comprises updating the acquired activations using a result of the multiplication and accumulation operation.”
Dependent claims 8 and 18 recite additional mental processes “rearranging filters used in forward propagation where the selected activations are generated.”
Dependent claims 9 and 19 recite additional generic functions performed on a generic computing component “updating filters used in forward propagation using a result of performing the backward propagation.”
Dependent claim 20 recites additional generic functions performed on a generic computing component “to record the updated filters in a memory.”

Taken alone, the additional elements of the dependent claims above do not amount to significantly more than the above-identified judicial exception (the abstract idea). Looking at the limitations as an ordered combination adds nothing that is not already present when looking at the elements taken individually. There is no indication that the combination of elements improves the functioning of a computer or improves any other technology. Their collective functions merely provide conventional computer implementation.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.



Claims 1, 2, 4, 5, 9-12, 14, 15, 19, and 20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Vijayanarasimhan (US 2016/0180200 A1).

Regarding claim 1, Vijayanarasimhan teaches A method of accelerating a training process of a neural network, the method comprising: ([¶0004] "The neural network uses a fast locality-sensitive hashing technique to approximate a result of the matrix multiplication to allow the neural network to generate scores for a large number, e.g., millions, of output classes." Hashing is interpreted as being motivated by accelerating the network.).
acquiring activations used in the training process and a bit-vector corresponding to the activations; (See FIG. 1.  [¶0008] "In some implementations, the activation vector includes real number values. The method may include converting each of the real numbers in the activation vector to binary values to create a binary vector").
selecting activations requiring an operation from among the acquired activations by using the bit-vector; and ([¶0008] "Selecting the nodes in the particular layer using the activation vector and the hash table may include selecting the one or more nodes in the particular layer by using the integers as input to the hash table." The hash table is interpreted as synonymous with the bit-vector corresponding to the activations.).
performing backward propagation using the selected activations and filters corresponding to the selected activations. ([¶0003] "Each layer in a deep neural network may perform a specific function, e.g., convolution, pooling, normalization, or matrix multiplication and non-linear activation." [¶0027] "When the number of nodes in a particular layer y of the neural network 102 is large, the classification system 100 only needs output from the K nodes with the highest probabilities of activating based on the activation vector x" [¶0031] "The classification system 100 trains the neural network 102 using downpour stochastic gradient descent (SGD). During back-propagation, the classification system 100 only propagates gradients based on the top K nodes that were retrieved during the forward pass of the neural network 102" Filter interpreted as weight layer to perform matrix multiplication for the purpose of convolution, which requires a filter by definition.  Top K nodes interpreted as synonymous with representing selected activations.). 
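
For illustration only, the passage quoted from [¶0031] (propagating gradients based only on the top K nodes retrieved during the forward pass) can be sketched as follows. This is a minimal sketch and not the reference's implementation: the function names `top_k_indices` and `masked_gradients` and all numeric values are hypothetical, and a plain sort stands in for the hash-table retrieval that the reference describes.

```python
def top_k_indices(scores, k):
    """Return the indices of the k largest scores, in ascending index order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def masked_gradients(upstream_grads, selected):
    """Keep gradients only for the selected nodes; zero out all others."""
    keep = set(selected)
    return [g if i in keep else 0.0 for i, g in enumerate(upstream_grads)]


# Hypothetical values, chosen only to make the sketch concrete.
scores = [0.1, 0.7, 0.05, 0.9, 0.3]   # node outputs from the forward pass
grads = [0.2, -0.4, 0.6, 0.1, -0.3]   # upstream gradients for the same nodes

selected = top_k_indices(scores, k=2)           # nodes kept for back-propagation
sparse_grads = masked_gradients(grads, selected)
```

In this sketch only the two highest-scoring nodes contribute gradients; every other gradient entry is set to zero before any weight update would take place.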

Regarding claim 2, Vijayanarasimhan teaches The method of claim 1, wherein the selecting comprises selecting activations representing a non-zero value from among the acquired activations by interpreting bits included in the bit-vector. ([¶0007] "The method may include creating a modified activation vector by setting the values in the activation vector that correspond to the nodes that were not selected to zero. Processing the activation vector using the selected nodes to generate the output for the particular layer may include processing the modified activation vector to generate the output for the particular layer...Selecting the one or more nodes may include computing a hash code for at least a portion of the activation vector" Vijayanarasimhan explicitly teaches that only selected activations are non-zero during processing. Utilizing the computed hash code for at least a portion of the activation vector is interpreted as synonymous with interpreting the bits included in the bit-vector). 
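
For illustration only, the claim language "selecting activations representing a non-zero value ... by interpreting bits included in the bit-vector" can be read as in the sketch below. The sketch assumes one bit per activation, with a 1 marking a non-zero activation; that encoding, the function names `make_bit_vector` and `select_by_bits`, and the sample values are assumptions made for this example and are not taken from the application or from the reference.

```python
def make_bit_vector(activations):
    """Build one bit per activation: 1 if the activation is non-zero, else 0."""
    return [1 if a != 0 else 0 for a in activations]


def select_by_bits(activations, bit_vector):
    """Return (index, value) pairs for the activations whose bit is set."""
    return [
        (i, a)
        for i, (a, bit) in enumerate(zip(activations, bit_vector))
        if bit == 1
    ]


# Hypothetical activations, for example the output of a ReLU layer.
activations = [0.0, 1.5, 0.0, 0.0, 2.0, 0.25]
bits = make_bit_vector(activations)
selected = select_by_bits(activations, bits)
```

Under this reading, the selection step never inspects the activation values themselves; it reads only the bits and then fetches the activations those bits point to.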

Regarding claim 4, Vijayanarasimhan teaches The method of claim 1, wherein the selecting comprises selecting the activations requiring the operation from among the acquired activations, in response to interpretation of all bits in the bit-vector not being completed. ([¶0047] "FIG. 2 is a flow diagram of a process 200 for processing an activation vector using selected nodes in a layer to generate an output for the layer."). 

Regarding claim 5, Vijayanarasimhan teaches The method of claim 1, further comprising generating the bit-vector by performing forward propagation on the neural network. ([¶0032] "the classification system 100 may update the weight vectors for the top K nodes that were retrieved during the forward pass and the positive output nodes, e.g., the output nodes that identify a correct classification of the input example. The classification system uses the updated weight vectors to compute updated hash codes for the top K nodes and moves the identifiers for the top K nodes, or a subset of these nodes, to the locations in the hash table 104 pointed to by the updated hash codes." Updating hash table with respect to the forward pass is interpreted as synonymous with generating a bit-vector by performing forward propagation.). 

Regarding claim 9, Vijayanarasimhan teaches The method of claim 1, further comprising updating filters used in forward propagation using a result of performing the backward propagation ([¶0031] "During back-propagation, the classification system 100 only propagates gradients based on the top K nodes that were retrieved during the forward pass of the neural network 102. The classification system 100 may update only the weight vectors for the top K nodes that were retrieved during the forward pass of the neural network 102 using an error vector for the output of the neural network 102." Convolution filters are interpreted as synonymous with weights. [¶0070]).

Regarding claim 10, Vijayanarasimhan teaches A non-transitory computer-readable recording medium that, when executed by a processor, cause the processor to perform the method of claim 1 ([¶0065] "Embodiments of the subject matter described in this specification can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions encoded on a tangible non-transitory program carrier for execution by, or to control the operation of, data processing apparatus." [¶0070] “Computers suitable for the execution of a computer program include, by way of example, can be based on general or special purpose microprocessors or both, or any other kind of central processing unit.”).

Regarding claim 11, claim 11 effectively mirrors claim 1 and is therefore rejected under a similar interpretation.

Regarding claim 12, claim 12 effectively mirrors claim 2 and is therefore rejected under a similar interpretation.

Regarding claim 14, claim 14 effectively mirrors claim 4 and is therefore rejected under a similar interpretation.

Regarding claim 15, claim 15 effectively mirrors claim 5 and is therefore rejected under a similar interpretation.

Regarding claim 19, claim 19 effectively mirrors claim 9 and is therefore rejected under a similar interpretation.

Regarding claim 20, Vijayanarasimhan teaches The neural network device of claim 11, wherein the processor is further configured to record the updated filters in a memory ([¶0019] "The classification system retrieves the top K of those weight vectors, e.g., from the hash table or another location in memory" [¶0031] "The classification system 100 may update only the weight vectors" Filters interpreted as synonymous with weights.).

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 3 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Vijayanarasimhan in view of Dally (US 2018/0046906 A1).

Regarding claim 3, Vijayanarasimhan teaches The method of claim 1, wherein the selecting comprises selecting the activations requiring the operation from among the acquired activations, ([¶0013] "FIG. 1 is an example of a classification system that uses a hash table to determine for which nodes in a particular layer y to perform matrix multiplication using an activation vector x."). However, Vijayanarasimhan does not explicitly teach that the selecting is in response to a number of selected activations being less than N, where N is a number of multipliers in a single neural functional unit.  

Dally, who teaches a related method of accelerating a convolutional neural network, teaches that the selecting is in response to a number of selected activations being less than N, where N is a number of multipliers in a single neural functional unit. ([¶0031] "transmitting the compact encoding may reduce the number of transitions on buses, further reducing energy consumption. Finally, only the non-zero elements of weights and input activations are provided as operands to the multipliers, ensuring that each multiplier within a processing element (PE) generates a product that affects an output activation value." Dally explicitly teaches selecting only non-zero elements, which is interpreted as synonymous with selecting activations requiring the operation from among the acquired activations. Furthermore, Dally teaches ensuring that each multiplier product within a single neural functional unit corresponds to a selected activation, which is interpreted as synonymous with selecting in response to the number of selected activations being less than or equal to the number of multipliers.). 

Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to distribute the multiply operations of Vijayanarasimhan among available processing units as recommended by Dally. The combination would have been obvious because a person of ordinary skill in the art would understand the advantages of maximizing parallelization of multiply operations in an accelerator. Dally teaches that applying only non-zero operands to the multipliers “exploits weight and/or activation sparsity to reduce energy consumption and improve processing throughput” ([¶0030]).
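
For illustration only, one way to read "selecting ... in response to a number of selected activations being less than N, where N is a number of multipliers" is that selection continues until N operands have been gathered, so that every multiplier receives a non-zero operand. The sketch below follows that reading. It is an assumption made for this example and is not taken from the application, from Vijayanarasimhan, or from Dally; the function name `batches_for_multipliers` and the sample values are hypothetical.

```python
def batches_for_multipliers(activations, bit_vector, n_multipliers):
    """Group the activations whose bit is set into batches of at most N.

    Selection into the current batch continues only while the batch holds
    fewer than n_multipliers entries; a full batch is dispatched as a unit.
    """
    batches, current = [], []
    for index, bit in enumerate(bit_vector):
        if bit == 0:
            continue                    # zero activation: needs no operation
        current.append((index, activations[index]))
        if len(current) == n_multipliers:
            batches.append(current)     # every multiplier has an operand
            current = []
    if current:
        batches.append(current)         # final, possibly partial, batch
    return batches


# Hypothetical data: nine activations and a functional unit with N = 2.
activations = [0.0, 1.5, 0.0, 0.0, 2.0, 0.25, 0.0, 3.0, 1.0]
bits = [0, 1, 0, 0, 1, 1, 0, 1, 1]
batches = batches_for_multipliers(activations, bits, n_multipliers=2)
```

With five non-zero activations and two multipliers, the sketch produces two full batches and one partial batch, so no multiplier is given a zero operand.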

Regarding claim 13, claim 13 effectively mirrors claim 3 and is therefore rejected under a similar interpretation.

Claims 6, 7, 16, and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Vijayanarasimhan in view of Jourjine (US 4,967,369 A).

Regarding claim 6, Vijayanarasimhan teaches The method of claim 1. Vijayanarasimhan explicitly teaches performing back-propagation with respect to the selected activations and the corresponding weights ([¶0030] "The neural network 102 determines the weight vectors W(K) for the top K nodes and computes the probabilities for the top K nodes using the activation vector x and the weight vectors W(K)" [¶0031] "the classification system 100 only propagates gradients based on the top K nodes that were retrieved during the forward pass of the neural network 102"). However, Vijayanarasimhan does not explicitly teach wherein the performing of the backward propagation comprises performing a multiplication and accumulation operation on the selected activations and the filters corresponding to the selected activations.

Jourjine teaches wherein the performing of the backward propagation comprises performing a multiplication and accumulation operation on the selected activations and the filters corresponding to the selected activations. ([Col. 1 l. 35] "Various methods are known in the prior art for feature extraction and pattern recognition. One method, known as error back propagation, involves minimization of error functional which is the sum over squared differences between the desired and actual outputs of the output processors...where each term of the sum depends on the activations and the weights of all or large part of the neural network" Multiply and accumulation operation is interpreted as synonymous with calculating the sum over squared differences.  Neural network filters are interpreted as weights which are learned during back propagation.). 

Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the neural network system of Vijayanarasimhan with the well-known neural network system of Jourjine. The combination would have been obvious because a person of ordinary skill in the art would be able to determine from Jourjine that multiply and accumulate operations during back-propagation are well known in the art ([Col. 1 l. 32] “Various methods are known in the prior art for feature extraction and pattern recognition. One method, known as error back propagation, involves minimization of error functional which is the sum over squared differences between the desired and actual outputs of the output processors”). 
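
For illustration only, a multiplication and accumulation operation on selected activations and their corresponding filters can be sketched as below. The pairing of each selected activation with the filter weight at the same index, the function name `multiply_accumulate`, and the sample values are assumptions made for this example and are not taken from the application or from either reference.

```python
def multiply_accumulate(selected, filters):
    """Sum value * filters[index] over the selected (index, value) pairs."""
    accumulator = 0.0
    for index, value in selected:
        accumulator += value * filters[index]   # one multiply, one accumulate
    return accumulator


# Hypothetical data: two selected activations and a five-element filter.
selected = [(1, 1.5), (4, 2.0)]
filters = [0.5, 2.0, -1.0, 0.0, 0.25]
result = multiply_accumulate(selected, filters)  # 1.5 * 2.0 + 2.0 * 0.25
```

Only the filter weights at the selected indices are read; the weights paired with unselected activations do not take part in the accumulation.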

Regarding claim 7, the combination of Vijayanarasimhan and Jourjine teaches The method of claim 6, wherein the performing of the backward propagation comprises updating the acquired activations using a result of the multiplication and accumulation operation. (Jourjine [Col. 6 l. 61] “Transmission activation slope Wa provides the time scale of response of activation updating with regard to input from activation update unit 32”).

Regarding claim 16, claim 16 effectively mirrors claim 6 and is therefore rejected under a similar interpretation.

Regarding claim 17, claim 17 effectively mirrors claim 7 and is therefore rejected under a similar interpretation.

Claims 8 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Vijayanarasimhan in view of Korthikanti (US 2018/0189227 A1).

Regarding claim 8, Vijayanarasimhan teaches The method of claim 1.  However, Vijayanarasimhan does not explicitly teach wherein the filters corresponding to the selected activations are obtained by rearranging filters used in forward propagation where the selected activations are generated.  

Korthikanti, who teaches a related art of dimension shuffling matrices for processor acceleration, teaches The method of claim 1, wherein the filters corresponding to the selected activations are obtained by rearranging filters used in forward propagation where the selected activations are generated. ([¶0091] "As an example, a convolution operation may need the dimensions of its filter to be arranged differently for forward propagation operations versus backward propagation operations in an artificial neural network. Accordingly, a dimension shuffle operation may be used to reorder the dimensions of a matrix in memory."). 

Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the hashed activation neural network in Vijayanarasimhan with the reordering of filters in Korthikanti. Korthikanti explicitly teaches the relevance of dimension shuffling with regards to neural networks ([¶0010] “These complex matrix operations (e.g., matrix multiplication and convolutions) may be used to implement the fundamental operations of neural networks, such as forward propagation, backward propagation, and weight updates.”).  The combination would have been obvious because a person of ordinary skill in the art would be able to determine from Korthikanti the added performance advantage of dimension shuffling ([¶0015] “For example, any dimension shuffle operation may be performed to reorder the dimensions of a matrix from one format to another using a minimum number of conversions. These advantages result in reduced processing time for matrix operations, which improves performance for applications that involve complex matrix operations, such as artificial intelligence and machine learning functionality implemented using artificial neural networks”).  
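
For illustration only, the idea that a filter may need its dimensions arranged differently for forward versus backward propagation can be sketched with a fully connected layer, where the forward pass multiplies by a weight matrix and the backward pass multiplies by its transpose. The two-dimensional transpose used here is a simplified stand-in for the dimension shuffle operation quoted from [¶0091]; the function names `transpose` and `matvec` and the sample values are hypothetical.

```python
def transpose(matrix):
    """Reorder a 2-D filter from (outputs x inputs) to (inputs x outputs)."""
    return [list(row) for row in zip(*matrix)]


def matvec(matrix, vector):
    """Multiply a matrix, stored as a list of rows, by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


# Hypothetical filter with 2 outputs and 3 inputs.
W = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]

x = [1.0, 0.0, -1.0]
y = matvec(W, x)                  # forward propagation uses W as stored

dy = [1.0, 0.5]                   # gradient arriving from the next layer
dx = matvec(transpose(W), dy)     # backward propagation uses the rearranged W
```

The same weight values serve both passes; only their arrangement changes between the forward and backward computations.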

Regarding claim 18, claim 18 effectively mirrors claim 8 and is therefore rejected under a similar interpretation.


Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure: EL-YANIV (US 2017/0286830 A1).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SIDNEY VINCENT BOSTWICK whose telephone number is (571)272-4720.  The examiner can normally be reached on M-F 7:30am-5:00pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda Huang can be reached on (571)270-7092.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/SB/ Examiner, Art Unit 2124

/MIRANDA M HUANG/Supervisory Patent Examiner, Art Unit 2124