Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Remarks
This Office Action is responsive to Applicants' Amendment filed on October 14, 2022, in which claims 1, 7, 10, and 17 are currently amended. Claims 1-20 are currently pending.

Response to Arguments
The objection to claim 7 is hereby withdrawn, as necessitated by applicant's amendments and remarks made to the rejections.
Applicant’s arguments with respect to the interpretation of claims 1, 7, 10-11 and 17 under 35 U.S.C. § 112(f) based on amendment have been considered but are not persuasive.
With respect to Applicant's arguments that a neural processing unit has well-known and inherent structure and therefore should not be interpreted under 112(f), Examiner respectfully disagrees.  Examiner asserts that the recitation of a 'processing unit' would be considered a generic placeholder and that adding the modifier 'neural' does not overcome this interpretation.  The element amounts to a generic placeholder (a unit) coupled to a function.
The rejection of claims 11-12 under 35 U.S.C. § 112(b) is hereby withdrawn, as necessitated by applicant's amendments and remarks made to the rejection.

Applicant’s arguments with respect to rejection of claims 1-20 under 35 U.S.C. 101 based on amendment have been considered and are persuasive. The rejection is hereby withdrawn, as necessitated by applicant’s amendments and remarks made to the rejection.
Applicant’s arguments with respect to rejection of claims 1-20 under 35 U.S.C. 102/103 based on amendment have been considered but are not persuasive.
With respect to Applicant's argument that Song does not fairly teach "a stochastic rounding operation based on an output value of the approximate multiplication operation", Examiner respectfully disagrees.  Song explicitly teaches that the weight parameters are trained ([¶0003] "The strengths of the connections, or the weight parameter of each synapse, can be adjusted through a learning process as a trainable parameter.") and one of ordinary skill in the art would recognize that neural network training is necessarily and by definition based on an output value.  Examiner asserts that in view of the broad language "based on an output value of the approximate multiplication operation" it would be extremely reasonable to interpret Song as teaching this limitation.
With respect to Applicant's arguments regarding the combination of Song and Lie, in order to clarify the combination which appears to have been misinterpreted, Examiner notes that Lie is introduced to reinforce the use of a stochastic rounding operation as a known form of quantization.  While Examiner asserts that the amendments are obvious and anticipated in view of Song alone, Lie has been relied upon to further reinforce the obviousness as necessitated by amendment.

Claim Objections
Claim 16 is objected to because of the following informalities:  “based the output” should read “based on the output”.  Appropriate correction is required.

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.  Such claim limitation(s) is/are: 
“a controller configured to” in claims 7 and 11.
“neural processing units” in claims 1, 10 and 17. 
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.  Support is found at least in ¶0054 of the instant specification wherein it is described as being implemented by a processor.  
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: 
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1-20 are rejected under 35 U.S.C. § 103 as being unpatentable over the combination of Song and Lie (US20200380370A1).

	 Regarding claim 1, Song teaches A neural network processing unit configured to perform a computation based on one or more instances of input data and a plurality of weights, the neural network processing unit comprising:([¶0002] "The present invention relates to neural network circuits, and more particularly, to neural network circuits having non-volatile synapse arrays using analog values.")
	a plurality of neural processing units, wherein at least one neural processing unit of the plurality of neural processing units is configured to([¶0002]"The present invention relates to neural network circuits, and more particularly, to neural network circuits having non-volatile synapse arrays using analog values." See also FIG. 2.  Synapse interpreted as synonymous with neural processing unit.)
	receive a first value and a second value and perform an approximate multiplication operation based on the first value and the second value; and([¶0062] "When the synapse input signal is applied on WL 265, the synapse output current may approximate the multiplication of the weight (represented by 1/R) by input value Ain from the previous neuron, where Ain may be represented by a voltage on WL 265.")
	perform a [stochastic] rounding operation based on an output value of the approximate multiplication operation.([¶0063] " Trained weight parameters may be quantized and programmed into the resistive changing elements without much accuracy degradation of neural network computation. " Performing a rounding operation interpreted as synonymous with quantizing.  Song explicitly teaches quantizing the trained parameters which are based on an output value of the approximate multiplication operation.).
	Although the training features are implicit in the disclosure of Song, Song does not explicitly teach perform a stochastic rounding operation based on an output value of the approximate multiplication operation;
	determine a loss of the output value based on a result of the stochastic rounding operation;
	and train the at least one neural processing unit by tuning a parameter of the at least one neural processing unit based on the determined loss.

	Lie, in the same field of endeavor, teaches perform a stochastic rounding operation based on an output value of the approximate multiplication operation.([¶0756] " use of stochastic rounding of FP results reduces the systematic bias, thereby improving accuracy, decreasing training time, decreasing inference latency, and/or increasing energy efficiency. In some scenarios and/or embodiments, rounding is performed on results of dependent FP operations (e.g. FP multiply-accumulate operations), and the rounded results are then fed back into a subsequent dependent FP operation")
	determine a loss of the output value based on a result of the stochastic rounding operation([¶0378] "During Delta 402, deltas (e.g., differences between the forward result and the training output data) are propagated in the backward direction" [¶0757] "If the average magnitude of the parameter updates is small (e.g., 10% of updates are represented by an N+1-bit mantissa, and the remainder are even smaller), then without stochastic rounding the parameter updates would be rounded to zero and no learning would occur. With stochastic rounding, approximately 10% of the weights would be updated and learning would occur" In Lie the loss is taught as being synonymous with a delta value.)
	and train the at least one neural processing unit by tuning a parameter of the at least one neural processing unit based on the determined loss.([¶0343] "The PEs process the training data (e.g., via forward, delta, and chain passes) and update weights until the training is complete." [¶0378] "data flow during training is illustrated conceptually as dashed-arrows Forward 401, Delta 402, and Chain 403. During Forward 401, stimuli is applied to the input layer and activations from the input layer flow to subsequent layers, eventually reaching the output layer and producing a forward result. During Delta 402, deltas (e.g., differences between the forward result and the training output data) are propagated in the backward direction. During Chain 403, gradients are calculated based on the deltas (e.g., with respect to the weights in the neurons) as they are generated during Delta 402. In some embodiments and/or usage scenarios, processing for Delta 402 is substantially overlapped with processing for 403.").

Song and Lie are both directed towards neural network accelerators.  Therefore, Song and Lie are analogous art in the same field of endeavor.  It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the teachings of Song with the teachings of Lie by using stochastic rounding for the quantization. Lie teaches as a motivation for combination ([¶0756] “use of stochastic rounding of FP results reduces the systematic bias, thereby improving accuracy, decreasing training time, decreasing inference latency, and/or increasing energy efficiency.”).  This motivation for combination also applies to the remaining claims depending on this combination.
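For illustration only, the contrast drawn in Lie ¶0757 between rounding small parameter updates to zero and rounding them stochastically can be sketched as follows. The sketch is not taken from Song, Lie, or the instant specification; the function names, the grid step of 1.0, the update size of 0.1, and the sample count are all assumptions chosen to show the general, well-known technique of rounding up with probability equal to the fractional remainder.

```python
import math
import random


def stochastic_round(x: float, step: float, rng: random.Random) -> float:
    """Round x to a multiple of step (hypothetical helper).

    The value is rounded up with probability equal to its fractional
    remainder, so the expected result equals x.
    """
    scaled = x / step
    lower = math.floor(scaled)
    frac = scaled - lower
    round_up = 1 if rng.random() < frac else 0
    return (lower + round_up) * step


def round_to_nearest(x: float, step: float) -> float:
    """Conventional round-to-nearest on the same grid, for comparison."""
    return round(x / step) * step


rng = random.Random(0)   # fixed seed so the run is repeatable
step = 1.0               # assumed quantization grid
update = 0.1             # assumed small update, one tenth of a grid step
n = 10_000

# Round-to-nearest discards every small update; stochastic rounding
# keeps roughly one in ten, so the accumulated total stays near n * update.
nearest_total = sum(round_to_nearest(update, step) for _ in range(n))
stochastic_total = sum(stochastic_round(update, step, rng) for _ in range(n))
```

Under these assumptions the round-to-nearest total is exactly zero, while the stochastic total lands near n × update, which is the behavior Lie describes as allowing learning to occur.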

	 Regarding claim 2, the combination of Song and Lie teaches The neural network processing unit of claim 1, wherein the at least one neural processing unit is further configured to alternatively select one element of the one or more instances of input data and an output value of one neural processing unit of the plurality of neural processing units, and output the selected one element as the first value.(Song [¶0051] "the electrical current value (IBL) 261 of the positive current port 241 may be the value on the positive output current BL 266 that receives a column selection signal on its respective column selection transistor 263. Likewise, the electrical current value (IBL-bar) 262 of the negative current input 242 may be the negative output current line BL-Bar 267 that receives a column selection signal on its respective column selection transistor 268").
	
	 Regarding claim 3, the combination of Song and Lie teaches The neural network processing unit of claim 1, wherein the second value includes at least one weight of the plurality of weights.(Song [¶0062] "When the synapse input signal is applied on WL 265, the synapse output current may approximate the multiplication of the weight (represented by 1/R) by input value Ain from the previous neuron, where Ain may be represented by a voltage on WL 265.").
	
	 Regarding claim 4, the combination of Song and Lie teaches The neural network processing unit of claim 1, wherein the at least one neural processing unit is further configured to accumulate one or more output values of the approximate multiplication operation; and(Song [¶0058] "each synapse layer (e.g., 120) in the neural network 100 may have electrical components (not shown in FIG. 2) that may be electrically coupled to BL 266 and BLB 267 and electrically process the output currents on the BL and BLB lines. For instance, the electrical components may provide differential sensing, convert the output current signals to voltage signals, further convert to digital signals and summate the digital signals in an accumulator" [¶0063] "When the resistance value, R, of the resistor R_p 313 (or R_n 314) is programmed in the training phase and a scaled synapse input signal WLs is applied through WL 265, the synapse output current, IC, on BL 266 (or BLB 267) may be described by equations (4) and (5)...where w and Ain may produce their multiplication result IC approximately." See also Eqn. 4 and 5.  Song explicitly teaches that the value accumulated from the bit lines may be an approximate multiplication with the word line inputs.)
	perform an addition operation based on the output value of the approximate multiplication operation and an output value of the accumulating.(Song [¶0039] "the relationship between input neuron signals (Ain) and output neuron signals (Aout) may be described by an activation function with the following equation: Aout=f(W×Ain+Bias). " [¶0058] "the electrical components may perform other various processing operations, such as normalization and activation, to the accumulated value, to thereby implement the activation function for Aout of equation (1)" Song explicitly teaches that the activation function output applied to the accumulated value involves an addition operation of a bias term.).
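For illustration only, the relationship quoted from Song (¶0039, Aout=f(W×Ain+Bias)) together with the weight-as-1/R description (¶0062) can be sketched as below. This is an idealized model, not Song's circuit: it treats each synapse current as exactly V/R, whereas Song describes the current as only approximating the product. The function names, the ReLU activation, and the numeric values are assumptions.

```python
def synapse_current(a_in_voltage: float, resistance: float) -> float:
    """Idealized synapse: current = voltage / resistance, i.e. Ain * (1/R)."""
    return a_in_voltage / resistance


def relu(x: float) -> float:
    """Assumed activation; Song names only a generic function f."""
    return max(0.0, x)


def neuron_output(inputs, resistances, bias, activation):
    """Accumulate the per-synapse products, add the bias, apply f."""
    accumulated = 0.0
    for voltage, resistance in zip(inputs, resistances):
        accumulated += synapse_current(voltage, resistance)  # accumulate
    return activation(accumulated + bias)                    # addition, then f


# Two synapses with weights 1/2 and 1/4 (assumed values).
a_out = neuron_output([0.5, 1.0], [2.0, 4.0], bias=0.25, activation=relu)
```

With these assumed values the two products are 0.25 each, the accumulated value is 0.5, and adding the bias gives 0.75 before and after the activation.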
	
	 Regarding claim 5, the combination of Song and Lie teaches The neural network processing unit of claim 4, wherein the at least one neural processing unit is configured to perform the stochastic rounding operation on the output value of the accumulating.(Lie [¶0756] " use of stochastic rounding of FP results reduces the systematic bias, thereby improving accuracy, decreasing training time, decreasing inference latency, and/or increasing energy efficiency. In some scenarios and/or embodiments, rounding is performed on results of dependent FP operations (e.g. FP multiply-accumulate operations), and the rounded results are then fed back into a subsequent dependent FP operation").
	
	 Regarding claim 6, the combination of Song and Lie teaches The neural network processing unit of claim 1, wherein the at least one neural processing unit is configured to perform the approximate multiplication operation in response to the neural network processing unit operating in a training mode.(Song [¶0054] "In embodiments, the weight values that may be determined during the training phase may not change during the inference stage." [¶0062] "the resistance value R (=R_p or R_n) may be programmed into the resistive changing element in a training phase. When the synapse input signal is applied on WL 265, the synapse output current may approximate the multiplication of the weight (represented by 1/R) by input value Ain from the previous neuron, where Ain may be represented by a voltage on WL 265." Song explicitly teaches that the approximate multiplication occurs during the training mode.).
	
	 Regarding claim 7, the combination of Song and Lie teaches The neural network processing unit of claim 6, further comprising: a controller configured to output a control signal to control a mode at least one neural processing unit, wherein the at least one neural processing unit is configured to, based on the control signal, operates in one mode of(Song [¶0108] " a row driver 1406 for selecting a row of synapses among the non-volatile synapse array" [¶0109] "The router/controller 1408 implements a finite state machine to control the row selection sequences by the row driver 1406.")
	a first mode in which the approximate multiplication operation is performed, and(Song [¶0062] "When the synapse input signal is applied on WL 265, the synapse output current may approximate the multiplication of the weight (represented by 1/R) by input value Ain from the previous neuron, where Ain may be represented by a voltage on WL 265.")
	a second mode in which a general multiplication operation is performed.(Lie [¶0075] "The long dependency chains of floating-point computations are performed, e.g., to train a neural network or to perform inference with respect to a trained neural network." [¶0138] "The method of EC4, wherein the floating-point operation comprises one of ... floating-point multiplication" floating point multiplication operation interpreted as synonymous with general multiplication operation.).
	
	 Regarding claim 8, the combination of Song and Lie teaches The neural network processing unit of claim 7, wherein the at least one neural processing unit is configured to, based on the control signal, operate in the second mode in an inference mode of the neural network processing unit.(Song [¶0054] " input signals, Ain, may be applied to the neural network 100 during the inference phase, where the pre-determined weights may be used to produce output values. In embodiments, the weight values that may be determined during the training phase may not change during the inference stage.").
	
	 Regarding claim 9, the combination of Song and Lie teaches The neural network processing unit of claim 1, wherein the at least one neural processing unit includes a fixed-point-type device.(Lie [¶0790] " Processor 2900 is enabled to optionally and/or selectively perform stochastic rounding for floating-point operations that produce integer results or fixed-point results.")
	determine a loss of the output value based on a result of the stochastic rounding operation(Lie [¶0378] "During Delta 402, deltas (e.g., differences between the forward result and the training output data) are propagated in the backward direction" [¶0757] "If the average magnitude of the parameter updates is small (e.g., 10% of updates are represented by an N+1-bit mantissa, and the remainder are even smaller), then without stochastic rounding the parameter updates would be rounded to zero and no learning would occur. With stochastic rounding, approximately 10% of the weights would be updated and learning would occur" In Lie the loss is taught as being synonymous with a delta value.)
	and train the at least one neural processing unit by tuning a parameter of the at least one neural processing unit based on the determined loss.(Lie [¶0343] "The PEs process the training data (e.g., via forward, delta, and chain passes) and update weights until the training is complete." [¶0378] "data flow during training is illustrated conceptually as dashed-arrows Forward 401, Delta 402, and Chain 403. During Forward 401, stimuli is applied to the input layer and activations from the input layer flow to subsequent layers, eventually reaching the output layer and producing a forward result. During Delta 402, deltas (e.g., differences between the forward result and the training output data) are propagated in the backward direction. During Chain 403, gradients are calculated based on the deltas (e.g., with respect to the weights in the neurons) as they are generated during Delta 402. In some embodiments and/or usage scenarios, processing for Delta 402 is substantially overlapped with processing for 403.").
	 Regarding claim 10, Song teaches one or more semiconductor intellectual property cores (IPs); and([¶0107] "As depicted, the chip 1300 may have a system-on-chip (SoC) structure and include: non-volatile neural network 1316; a CPU 1312 for controlling the elements on the chip 1300; a sensor 1314 for providing input signals to the non-volatile neural network 1316; and a memory 1318" CPU interpreted as synonymous with intellectual property core.)
	a neural network processing unit configured to receive input data from the one or more IPs, and perform a neural network computation based on the input data and a plurality of weights, the neural network processing unit including a plurality of neural processing units, wherein at least one neural processing unit of the plurality of neural processing units is configured to([¶0002]"The present invention relates to neural network circuits, and more particularly, to neural network circuits having non-volatile synapse arrays using analog values." See also FIG. 2.  Synapse interpreted as synonymous with neural processing unit.)
	receive a first value and a second value and perform an approximate multiplication operation on the first value and the second value, and([¶0062] "When the synapse input signal is applied on WL 265, the synapse output current may approximate the multiplication of the weight (represented by 1/R) by input value Ain from the previous neuron, where Ain may be represented by a voltage on WL 265.")
	perform a [stochastic] rounding operation based on an output value of the approximate multiplication operation to output a post activation regarding the output of the approximate multiplication operation.([¶0063] " Trained weight parameters may be quantized and programmed into the resistive changing elements without much accuracy degradation of neural network computation. " Performing a rounding operation interpreted as synonymous with quantizing.  Song explicitly teaches quantizing the trained parameters which are based on an output value of the approximate multiplication operation.).
	However, Song does not explicitly teach perform a stochastic rounding operation based on an output value of the approximate multiplication operation;
	determine a loss of the output value based on a result of the stochastic rounding operation;
	and train the at least one neural processing unit by tuning a parameter of the at least one neural processing unit based on the determined loss.

	Lie, in the same field of endeavor, teaches perform a stochastic rounding operation based on an output value of the approximate multiplication operation.([¶0756] " use of stochastic rounding of FP results reduces the systematic bias, thereby improving accuracy, decreasing training time, decreasing inference latency, and/or increasing energy efficiency. In some scenarios and/or embodiments, rounding is performed on results of dependent FP operations (e.g. FP multiply-accumulate operations), and the rounded results are then fed back into a subsequent dependent FP operation")
	determine a loss of the output value based on a result of the stochastic rounding operation([¶0378] "During Delta 402, deltas (e.g., differences between the forward result and the training output data) are propagated in the backward direction" [¶0757] "If the average magnitude of the parameter updates is small (e.g., 10% of updates are represented by an N+1-bit mantissa, and the remainder are even smaller), then without stochastic rounding the parameter updates would be rounded to zero and no learning would occur. With stochastic rounding, approximately 10% of the weights would be updated and learning would occur" In Lie the loss is taught as being synonymous with a delta value.)
	and train the at least one neural processing unit by tuning a parameter of the at least one neural processing unit based on the determined loss.([¶0343] "The PEs process the training data (e.g., via forward, delta, and chain passes) and update weights until the training is complete." [¶0378] "data flow during training is illustrated conceptually as dashed-arrows Forward 401, Delta 402, and Chain 403. During Forward 401, stimuli is applied to the input layer and activations from the input layer flow to subsequent layers, eventually reaching the output layer and producing a forward result. During Delta 402, deltas (e.g., differences between the forward result and the training output data) are propagated in the backward direction. During Chain 403, gradients are calculated based on the deltas (e.g., with respect to the weights in the neurons) as they are generated during Delta 402. In some embodiments and/or usage scenarios, processing for Delta 402 is substantially overlapped with processing for 403.").

	Song and Lie are both directed towards neural network accelerators.  Therefore, Song and Lie are analogous art in the same field of endeavor.  It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the teachings of Song with the teachings of Lie by using stochastic rounding for the quantization. Lie teaches as a motivation for combination ([¶0756] “use of stochastic rounding of FP results reduces the systematic bias, thereby improving accuracy, decreasing training time, decreasing inference latency, and/or increasing energy efficiency.”).  This motivation for combination also applies to the remaining claims depending on this combination.
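For illustration only, the forward, delta, and chain passes quoted from Lie (¶0343, ¶0378) can be sketched for a single weight as below. The sketch is not Lie's implementation: the single-weight model, the squared-error loss, the learning rate, and all names are assumptions. The delta follows Lie's description as the difference between the forward result and the training output data.

```python
def train_step(weight: float, x: float, target: float, lr: float):
    """One hypothetical training step for a single-weight neuron."""
    forward = weight * x        # Forward: apply the stimulus, get a result
    delta = forward - target    # Delta: forward result minus training output
    gradient = delta * x        # Chain: gradient of 0.5 * delta**2 w.r.t. weight
    new_weight = weight - lr * gradient  # tune the parameter from the loss
    return new_weight, delta


weight = 0.0
for _ in range(50):
    weight, delta = train_step(weight, x=2.0, target=4.0, lr=0.1)
```

With these assumed values the weight converges toward 2.0, the value at which the forward result matches the training output and the delta vanishes.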

	 Regarding claim 11, the combination of Song and Lie teaches The system on chip of claim 10, wherein the neural network processing unit further includes a controller configured to control the approximate multiplication operation, and(Song [¶0108] " a row driver 1406 for selecting a row of synapses among the non-volatile synapse array" [¶0109] "The router/controller 1408 implements a finite state machine to control the row selection sequences by the row driver 1406." [¶0062] "When the synapse input signal is applied on WL 265, the synapse output current may approximate the multiplication of the weight (represented by 1/R) by input value Ain from the previous neuron, where Ain may be represented by a voltage on WL 265.")
	the at least one neural processing unit is configured to perform, based on the controlling of the controller, the approximate multiplication operation in a training mode of the neural network processing unit.(Song [¶0054] "In embodiments, the weight values that may be determined during the training phase may not change during the inference stage." [¶0062] "the resistance value R (=R_p or R_n) may be programmed into the resistive changing element in a training phase. When the synapse input signal is applied on WL 265, the synapse output current may approximate the multiplication of the weight (represented by 1/R) by input value Ain from the previous neuron, where Ain may be represented by a voltage on WL 265." Song explicitly teaches that the approximate multiplication occurs during the training mode.).
	
	 Regarding claim 12, the combination of Song and Lie teaches The system on chip of claim 11, wherein the at least one neural processing unit is configured to perform, based on the controlling of the controller, a general multiplication operation in an inference mode of the neural network processing unit.(Lie [¶0075] "The long dependency chains of floating-point computations are performed, e.g., to train a neural network or to perform inference with respect to a trained neural network." [¶0138] "The method of EC4, wherein the floating-point operation comprises one of ... floating-point multiplication" floating point multiplication operation interpreted as synonymous with general multiplication operation.).
	
	 Regarding claim 13, the combination of Song and Lie teaches The system on chip of claim 10, wherein the neural network processing unit further includes data random access memory (data RAM) configured to receive training data from the one or more IPs in a training mode and store the training data.(Song [¶0004] "Other existing application-specific integrated circuit (ASIC) or field-programmable gate array (FPGA) approaches that accelerate computation of neural network with dedicated complementary metal-oxide-semiconductor (CMOS) logic can be power efficient compared to such generic CPU and GPU based approaches, but still wastes unnecessary power and latency to move data to and from the separate off-chip non-volatile memory (NVM) where the trained weight parameters are stored." [¶0127] "The external neural network accelerator device can contain its own CPU and high-density memory (DRAM, Flash Memory, SCM, etc.)").
	
	 Regarding claim 14, the combination of Song and Lie teaches The system on chip of claim 13, wherein the at least one neural processing unit is configured to receive training data output from the data RAM and an output value of one of the plurality of neural processing units, (Song [¶0061] "the logic friendly non-volatile resistive changing element, R_p 313 (or R_n 314) may be associated with the positive (or negative) weight parameter that the synapse 300 may remember/store. In embodiments, each resistor may be electrically coupled to the source terminal of the input transistor (e.g., 311) and the reference signal line 264 may apply a reference signal to the resistor." [¶0062] "the resistance value R (=R_p or R_n) may be programmed into the resistive changing element in a training phase. When the synapse input signal is applied on WL 265, the synapse output current may approximate the multiplication of the weight (represented by 1/R) by input value Ain from the previous neuron, where Ain may be represented by a voltage on WL 265." [¶0068] "the resistor 313 (or 314) may be implemented with various circuits (or memories), such as non-volatile MRAM, RRAM, or PRAM or single-poly embedded flash memory, where the circuit may be programmed to remember (store) an associate parameter that may be represented by a reciprocal of resistance." Song explicitly teaches that the RAM may be programmed in the training phase and that the values of said RAM may be received with an output value of the neural processing units.)
	select one of the training data and the output value, and output the selected one of the training data and the output value as the first value.(Song [¶0051] "the electrical current value (IBL) 261 of the positive current port 241 may be the value on the positive output current BL 266 that receives a column selection signal on its respective column selection transistor 263. Likewise, the electrical current value (IBL-bar) 262 of the negative current input 242 may be the negative output current line BL-Bar 267 that receives a column selection signal on its respective column selection transistor 268" Song teaches selecting and outputting the output value.).
	
	 Regarding claim 15, the combination of Song and Lie teaches The system on chip of claim 10, wherein the second value includes at least one weight of the plurality of weights. (Song [¶0062] "When the synapse input signal is applied on WL 265, the synapse output current may approximate the multiplication of the weight (represented by 1/R) by input value Ain from the previous neuron, where Ain may be represented by a voltage on WL 265.").
	
	 Regarding claim 16, the combination of Song and Lie teaches The system on chip of claim 10, wherein the at least one neural processing unit is configured to accumulate one or more output values of the approximate multiplication operation, (Song [¶0058] "each synapse layer (e.g., 120) in the neural network 100 may have electrical components (not shown in FIG. 2) that may be electrically coupled to BL 266 and BLB 267 and electrically process the output currents on the BL and BLB lines. For instance, the electrical components may provide differential sensing, convert the output current signals to voltage signals, further convert to digital signals and summate the digital signals in an accumulator" [¶0063] "When the resistance value, R, of the resistor R_p 313 (or R_n 314) is programmed in the training phase and a scaled synapse input signal WLs is applied through WL 265, the synapse output current, IC, on BL 266 (or BLB 267) may be described by equations (4) and (5)...where w and Ain may produce their multiplication result IC approximately." See also Eqn. 4 and 5.  Song explicitly teaches that the value accumulated from the bit lines may be an approximate multiplication with the word line inputs.)
	perform an addition operation based on the output value of the approximate multiplication operation and an output value of the accumulating, and (Song [¶0052] "one or more of the rows of the synapses 210 may have a fixed input signal voltage on the WLs 265 and the synapses on such rows may store bias values for their columns. In embodiments, the array of synapses may implement the matrix multiplication in equation (1) W×Ain +Bias" Song explicitly teaches performing the activation after the accumulating [¶0058] "the electrical components may perform other various processing operations, such as normalization and activation, to the accumulated value, to thereby implement the activation function for Aout of equation (1).")
	perform the stochastic rounding operation on the output value of the accumulating.(Lie [¶0756] " use of stochastic rounding of FP results reduces the systematic bias, thereby improving accuracy, decreasing training time, decreasing inference latency, and/or increasing energy efficiency. In some scenarios and/or embodiments, rounding is performed on results of dependent FP operations (e.g. FP multiply-accumulate operations), and the rounded results are then fed back into a subsequent dependent FP operation").
	
	 Regarding claim 17, Song teaches A neural network processing unit configured to perform a training operation based on one or more instances of training data and a plurality of weights in a training mode, the neural network processing unit comprising([¶0004] "Other existing application-specific integrated circuit (ASIC) or field-programmable gate array (FPGA) approaches that accelerate computation of neural network with dedicated complementary metal-oxide-semiconductor (CMOS) logic can be power efficient compared to such generic CPU and GPU based approaches, but still wastes unnecessary power and latency to move data to and from the separate off-chip non-volatile memory (NVM) where the trained weight parameters are stored." [¶0127] "The external neural network accelerator device can contain its own CPU and high-density memory (DRAM, Flash Memory, SCM, etc.)")
	a plurality of neural processing units, at least one neural processing unit of the plurality of neural processing units configured to ([¶0002] "The present invention relates to neural network circuits, and more particularly, to neural network circuits having non-volatile synapse arrays using analog values." See also FIG. 2.  A synapse is interpreted as synonymous with a neural processing unit.)
	receive a first value and a second value and perform an approximate multiplication operation on the first value and the second value in the training mode,([¶0054] "In embodiments, the weight values that may be determined during the training phase may not change during the inference stage." [¶0062] "the resistance value R (=R_p or R_n) may be programmed into the resistive changing element in a training phase. When the synapse input signal is applied on WL 265, the synapse output current may approximate the multiplication of the weight (represented by 1/R) by input value Ain from the previous neuron, where Ain may be represented by a voltage on WL 265." Song explicitly teaches that the approximate multiplication occurs during the training mode.)
	perform an addition operation based on an output value of the approximate multiplication operation and a third value,([¶0052] " one or more of the rows of the synapses 210 may have a fixed input signal voltage on the WLs 265 and the synapses on such rows may store bias values for their columns. In embodiments, the array of synapses may implement the matrix multiplication in equation (1) W×Ain +Bias" Song explicitly teaches performing the activation after the accumulating [¶0058] "the electrical components may perform other various processing operations, such as normalization and activation, to the accumulated value, to thereby implement the activation function for Aout of equation (1).")
	accumulate an output value of the approximate multiplication operation, and([¶0058] "each synapse layer (e.g., 120) in the neural network 100 may have electrical components (not shown in FIG. 2) that may be electrically coupled to BL 266 and BLB 267 and electrically process the output currents on the BL and BLB lines. For instance, the electrical components may provide differential sensing, convert the output current signals to voltage signals, further convert to digital signals and summate the digital signals in an accumulator" [¶0063] "When the resistance value, R, of the resistor R_p 313 (or R_n 314) is programmed in the training phase and a scaled synapse input signal WLs is applied through WL 265, the synapse output current, IC, on BL 266 (or BLB 267) may be described by equations (4) and (5)...where w and Ain may produce their multiplication result IC approximately." See also Eqn. 4 and 5.  Song explicitly teaches that the value accumulated from the bit lines may be an approximate multiplication with the word line inputs.)
	perform a [stochastic] rounding operation on an accumulation value output based on the accumulating to output a post activation regarding the accumulation value.([¶0063] " Trained weight parameters may be quantized and programmed into the resistive changing elements without much accuracy degradation of neural network computation. " Performing a rounding operation interpreted as synonymous with quantizing.  Song explicitly teaches quantizing the trained parameters which are based on an output value of the approximate multiplication operation.).
	However, Song does not explicitly teach perform a stochastic rounding operation on an accumulation value output based on the accumulating,
	determine a loss of the output value based on a result of the stochastic rounding operation,
	and train the at least one neural processing unit by tuning a parameter of the at least one neural processing unit based on the determined loss.

	Lie, in the same field of endeavor, teaches perform a stochastic rounding operation on an accumulation value output based on the accumulating([¶0756] " use of stochastic rounding of FP results reduces the systematic bias, thereby improving accuracy, decreasing training time, decreasing inference latency, and/or increasing energy efficiency. In some scenarios and/or embodiments, rounding is performed on results of dependent FP operations (e.g. FP multiply-accumulate operations), and the rounded results are then fed back into a subsequent dependent FP operation")
	determine a loss of the output value based on a result of the stochastic rounding operation([¶0378] "During Delta 402, deltas (e.g., differences between the forward result and the training output data) are propagated in the backward direction" [¶0757] "If the average magnitude of the parameter updates is small (e.g., 10% of updates are represented by an N+1-bit mantissa, and the remainder are even smaller), then without stochastic rounding the parameter updates would be rounded to zero and no learning would occur. With stochastic rounding, approximately 10% of the weights would be updated and learning would occur" In Lie the loss is taught as being synonymous with a delta value.)
	and train the at least one neural processing unit by tuning a parameter of the at least one neural processing unit based on the determined loss.([¶0343] "The PEs process the training data (e.g., via forward, delta, and chain passes) and update weights until the training is complete." [¶0378] "data flow during training is illustrated conceptually as dashed-arrows Forward 401, Delta 402, and Chain 403. During Forward 401, stimuli is applied to the input layer and activations from the input layer flow to subsequent layers, eventually reaching the output layer and producing a forward result. During Delta 402, deltas (e.g., differences between the forward result and the training output data) are propagated in the backward direction. During Chain 403, gradients are calculated based on the deltas (e.g., with respect to the weights in the neurons) as they are generated during Delta 402. In some embodiments and/or usage scenarios, processing for Delta 402 is substantially overlapped with processing for 403.").

	Song and Lie are both directed towards neural network accelerators.  Therefore, Song and Lie are analogous art in the same field of endeavor.  It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the teachings of Song with the teachings of Lie by using stochastic rounding for the quantization. Lie teaches as a motivation for combination ([¶0756] “use of stochastic rounding of FP results reduces the systematic bias, thereby improving accuracy, decreasing training time, decreasing inference latency, and/or increasing energy efficiency.”).  This motivation for combination also applies to the remaining claims depending on this combination.
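
	The effect relied upon from Lie [¶0756]-[¶0757] can be illustrated with a minimal sketch. The sketch is a simplified illustration only and is not taken from Song or Lie; the function names, the grid step of 1.0, the update magnitude of 0.1, and the trial count are all assumptions chosen for illustration. It compares round-to-nearest with stochastic rounding for an update that is much smaller than one representable step.

```python
import math
import random


def round_to_nearest(x, step):
    """Round x to the nearest multiple of step (illustrative helper)."""
    return math.floor(x / step + 0.5) * step


def stochastic_round(x, step, rng):
    """Round x to a multiple of step, rounding up with probability
    equal to the fractional remainder (illustrative helper)."""
    scaled = x / step
    lower = math.floor(scaled)
    frac = scaled - lower
    return (lower + (1 if rng.random() < frac else 0)) * step


rng = random.Random(0)   # fixed seed so the sketch is repeatable
step = 1.0               # assumed spacing of representable values
update = 0.1             # assumed update, much smaller than one step
trials = 10000           # assumed number of repeated updates

# Count how many updates survive rounding as a nonzero value.
nearest_hits = sum(
    1 for _ in range(trials) if round_to_nearest(update, step) != 0.0
)
stochastic_hits = sum(
    1 for _ in range(trials) if stochastic_round(update, step, rng) != 0.0
)

print("round-to-nearest nonzero updates:", nearest_hits)
print("stochastic rounding nonzero updates:", stochastic_hits)
```

	Under these assumptions, round-to-nearest discards every update, whereas stochastic rounding retains roughly one update in ten, so the expected value of the rounded update matches the unrounded update.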

	 Regarding claim 18, the combination of Song and Lie teaches The neural network processing unit of claim 17, wherein the first value includes one of the one or more instances of training data and an output value of one of the plurality of neural processing units, and the second value includes at least one weight of the plurality of weights. (Song [¶0063] "Trained weight parameters may be quantized and programmed into the resistive changing elements...When the resistance value, R, of the resistor R_p 313 (or R_n 314) is programmed in the training phase and a scaled synapse input signal WLs is applied through WL 265, the synapse output current, IC, on BL 266 (or BLB 267) may be described by equations (4) and (5)...where w and Ain may produce their multiplication result IC approximately.").
	
	 Regarding claim 19, the combination of Song and Lie teaches The neural network processing unit of claim 17, wherein the third value includes the accumulation value output based on the accumulating. (Song [¶0039] "the relationship between input neuron signals (Ain) and output neuron signals (Aout) may be described by an activation function with the following equation: Aout=f(W×Ain+Bias)." [¶0058] "the electrical components may perform other various processing operations, such as normalization and activation, to the accumulated value, to thereby implement the activation function for Aout of equation (1)" Song explicitly teaches that the activation function output applied to the accumulated value involves an addition operation of a bias term.).
	
	 Regarding claim 20, the combination of Song and Lie teaches The neural network processing unit of claim 17, wherein the training operation includes a fixed-point-type training operation. (Lie [¶0353] "all or any portions of weight information determined via a deep learning accelerator is post-processed outside of the accelerator before inference usage... an example of post-processing comprises quantizing Weights 114 and/or Weights 115 (e.g., converting from a floating-point number format to a fixed-point number format)." Lie teaches exactly two modes, a training mode and an inference mode, such that post-processing prior to inference is interpreted as synonymous with a training operation.).
	
Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SIDNEY VINCENT BOSTWICK whose telephone number is (571)272-4720. The examiner can normally be reached M-F 7:30am-5:00pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda Huang can be reached on (571)270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/SB/Examiner, Art Unit 2124
/MIRANDA M HUANG/Supervisory Patent Examiner, Art Unit 2124