DETAILED ACTION

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

This action is responsive to the original application filed on 5/8/2018 and the Remarks and Amendments filed on 2/8/2022.

Claim Objections

Claim 1 and its dependents are objected to because of the following informalities: Claim 1 recites the limitation “the forward pass including computation of a dot product of input vector and weight vector”, which should read as “the forward pass including computation of a dot product of an input vector and a weight vector”. Appropriate correction is required.

Claim 8 and its dependents are objected to because of the following informalities: Claim 8 recites the limitation “the one or more training phases including a forward pass that includes computation of a dot product of input vector and weight vector”, which should read as “the one or more training phases including a forward pass that includes computation of a dot product of an input vector and a weight vector”. Appropriate correction is required.


Claim 16 and its dependents are objected to because of the following informalities: Claim 16 recites the limitation “the forward pass including computation of a dot product of input vector and weight vector”, which should read as “the forward pass including computation of a dot product of an input vector and a weight vector”. Appropriate correction is required.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1, 6, 8, 9, 15, and 16 are rejected under 35 U.S.C. 103 as being obvious over Narang et al. (Narang et al., “Mixed Precision Training”, Feb. 15, 2018, ICLR 2018, pp. 1-12, hereinafter “Narang”) in view of Mellempudi et al. (US 20180322607 A1, hereinafter “Mellempudi”) and Liu et al. (US 20190138922 A1, hereinafter “Liu”).

	Regarding claim 1, Narang discloses [a] system for training a neural network, the system comprising: (Abstract; “We introduce methodology for training deep neural networks using half-precision floating point numbers, without losing model accuracy or having to modify hyperparameters”)
perform forward pass and back propagation calculations in a training operation for the neural network using the received training data, the forward pass and back propagation calculations using a first precision (Page 2, §2; “First, all tensors and arithmetic for forward and backward passes use reduced precision, FP16 in our case. Second, no hyper-parameters (such as layer width) are adjusted. Lastly, models trained with these techniques do not incur accuracy loss when compared to single-precision baseline”, which discloses that forward pass and back propagation calculations are performed in a training operation for the neural network using training data; and Page 2, §3.1; “In each iteration an FP16 copy of the master weights is used in the forward and backward pass, halving the storage and bandwidth needed by FP32 training”, which again discloses the forward pass and backpropagation; and Page 3, Figure 1; the figure discloses “FWD” (forward pass), “BWD-Activ” (backpropagation), and “BWD-weight”, wherein the input/output format is F16, i.e., a first precision of 16-bit floating point; and Page 5, §3.3; “By and large neural network arithmetic falls into three categories: vector dot-products, reductions, and point-wise operations”, which discloses that the forward and backward calculations include dot product calculations)
update weights in the neural network as part of the training operation, the weights updated using a second precision, the first precision being less precise than the second precision; (Page 2, §3.1; “In order to match the accuracy of the FP32 networks, an FP32 master copy of weights is maintained and updated with the weight gradient during the optimizer step”, which discloses the updating of the weights in the NN as part of the training operation, the weights being updated using a second precision (FP32), the first precision (FP16) being less precise than the second precision (FP32); and Page 3, Figure 1; the figure discloses the box “weight update” wherein the format of the input/output weights is F32, the second precision that is more precise than the first precision (F16))
modify the neural network with the updated weights (Page 2, §3.1; “In order to match the accuracy of the FP32 networks, an FP32 master copy of weights is maintained and updated with the weight gradient during the optimizer step”, which discloses the modifying of the NN with the updated weights; and Page 3, Figure 1; the figure discloses the box “weight update”, and the NN is modified based on the updated weights).
	Narang fails to explicitly disclose but Mellempudi discloses at least one processor; and at least one memory comprising computer program code, the at least one memory and the computer program code configured to, with the at least one processor, cause the at least one processor to: (Figure 13, Elements 1308 and 1305; and Figure 1, Elements 102 and 104)
receive training data; (Figure 11, Elements 1102 and 1104; the figure discloses, under a broadest reasonable interpretation of the claim language, the receiving of training data into a neural network for training purposes).
Narang and Mellempudi are analogous because both are concerned with neural network processing. It would have been obvious to one of ordinary skill in the art of neural network computing before the effective filing date of the claimed invention to combine the processor, memory, and receiving of training data of Mellempudi with the neural network training system of Narang to yield the predictable result of at least one processor and at least one memory comprising computer program code, the at least one memory and the computer program code configured to, with the at least one processor, cause the at least one processor to receive training data. The motivation for doing so would be to provide for the training and deployment of a deep neural network (Mellempudi; [0168]).
Narang fails to explicitly disclose but Liu discloses the forward pass including computation of a dot product of input vector and weight vector ([0026]; “The forward propagation computation of multilayer artificial neural networks according to embodiments of the present disclosure comprises operations in two or more layers. Each layer may refer to a group of operations. With respect to each layer, a dot product operation may be performed to an input vector and a weight vector. An output neuron may be obtained based on the result of the dot product operation by applying an activation function”, which discloses the forward pass operation that computes a dot product between an input vector and a weight vector).
Narang, Mellempudi, and Liu are analogous because all are concerned with neural network processing.  It would have been obvious to one of ordinary skill in the art of neural network computing before the effective filing date of the claimed invention to combine the dot product computation of Liu with the neural network training system of Narang and Mellempudi to yield the predictable result of the forward pass including computation of a dot product of input vector and weight vector.  The motivation for doing so would be to obtain an output neuron based on applying an activation function to the result of the dot product operation (Liu; [0026]).
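For illustration only, the mixed precision scheme mapped above (FP16 forward and backward passes, a forward-pass dot product of an input vector and a weight vector, and an update of FP32 master weights) may be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a reproduction of any reference: the single linear neuron, the squared-error loss, FP32 accumulation of the FP16 products, and all function names are hypothetical, and FP16/FP32 rounding is emulated with the standard struct module.

```python
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision (FP16) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round x to the nearest IEEE 754 single-precision (FP32) value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def fp16_dot(x16, w16):
    """Forward pass: dot product of an FP16 input vector and an FP16 weight vector.

    Assumption of this sketch: each product is rounded to FP16, the products are
    accumulated in FP32, and the result is stored back as FP16.
    """
    acc = 0.0
    for a, b in zip(x16, w16):
        acc = to_fp32(acc + to_fp16(a * b))
    return to_fp16(acc)


def train_step(master_w, x, target, lr):
    """One hypothetical mixed precision step for a single linear neuron,
    using the loss 0.5 * (y - target) ** 2."""
    # FP16 copies of the FP32 master weights and of the input feed both passes.
    w16 = [to_fp16(w) for w in master_w]
    x16 = [to_fp16(v) for v in x]
    y = fp16_dot(x16, w16)                     # forward pass, first precision
    err = to_fp16(y - to_fp16(target))         # dLoss/dy in FP16
    grads16 = [to_fp16(err * v) for v in x16]  # backward pass, FP16 weight gradients
    # Weight update applied to the FP32 master copy (second precision).
    return [to_fp32(w - to_fp32(lr * g)) for w, g in zip(master_w, grads16)]
```

For example, train_step([0.5, -0.25], [1.0, 2.0], 1.0, 0.1) yields FP32 master weights of approximately [0.6, -0.05]; only the master copy is held at the higher precision.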


	Regarding claim 8, Narang discloses [a] computerized method for training a neural network, the computerized method comprising: (Abstract; “We introduce methodology for training deep neural networks using half-precision floating point numbers, without losing model accuracy or having to modify hyperparameters”)
performing lower precision format training calculations using lower precision format data at one or more training phases of the neural network (Page 2, §2; “First, all tensors and arithmetic for forward and backward passes use reduced precision, FP16 in our case. Second, no hyper-parameters (such as layer width) are adjusted. Lastly, models trained with these techniques do not incur accuracy loss when compared to single-precision baseline”, which discloses performing lower precision format training calculations using lower precision format data at one or more training phases; and Page 2, §3.1; “In each iteration an FP16 copy of the master weights is used in the forward and backward pass, halving the storage and bandwidth needed by FP32 training”)
converting one or more results from the lower precision format training calculations to higher precision format data (Page 2, §3.1; “In order to match the accuracy of the FP32 networks, an FP32 master copy of weights is maintained and updated with the weight gradient during the optimizer step”; and Page 3, Figure 1; the figure discloses converting one or more results from the lower precision format training calculations (F16) to higher precision format data (F32))
performing higher precision format training calculations using the higher precision format data at one or more additional training phases (Page 2, §3.1; “In each iteration an FP16 copy of the master weights is used in the forward and backward pass, halving the storage and bandwidth needed by FP32 training”, which discloses the higher (FP32) precision format training calculation at an additional training phase; and Page 3, Figure 1; the figure discloses the higher precision (F32) training; and Page 3, §3.1; “As shown in 2a, we match FP32 training results when updating an FP32 master copy of weights after FP16 forward and backward passes,”)
modifying the neural network using the results from the one or more additional training phases (Page 2, §3.1; “In order to match the accuracy of the FP32 networks, an FP32 master copy of weights is maintained and updated with the weight gradient during the optimizer step”, which discloses the modifying of the NN with the updated weights, which are the results from the one or more additional training phases; and Page 3, Figure 1; the figure discloses the box “weight update”, and the NN is modified based on the updated weights).
	Narang fails to explicitly disclose receiving training data.
	Mellempudi discloses receiving training data; (Figure 11, Elements 1102 and 1104; the figure discloses, under a broadest reasonable interpretation of the claim language, the receiving of training data into a neural network for training purposes).
The motivation to combine Narang and Mellempudi is the same as discussed above with respect to claim 1.
Narang fails to explicitly disclose but Liu discloses the one or more training phases including a forward pass that includes computation of a dot product of input vector and weight vector ([0026]; “The forward propagation computation of multilayer artificial neural networks according to embodiments of the present disclosure comprises operations in two or more layers. Each layer may refer to a group of operations. With respect to each layer, a dot product operation may be performed to an input vector and a weight vector. An output neuron may be obtained based on the result of the dot product operation by applying an activation function”, which discloses the forward pass operation that computes a dot product between an input vector and a weight vector).
The motivation to combine Narang, Mellempudi, and Liu is the same as discussed above with respect to claim 1.

Regarding claim 16, it is a computer storage media claim corresponding to the steps of claim 1, and is rejected for the same reasons as claim 1.

Regarding claim 6, the rejection of claim 1 is incorporated and Narang further discloses wherein the first precision is defined by a first bit format and the second precision is defined by a second bit format, the first bit format being different than the second bit format (Page 3, Figure 1; the figure discloses the first bit format (F16) and the second bit format (F32), which differ from each other).


Regarding claim 9, the rejection of claim 8 is incorporated and Narang further discloses wherein the one or more training phases comprise performing back propagation calculations (Page 3, Figure 1; the figure discloses “FWD” (forward pass) and “BWD-Activ” (backpropagation)), and the one or more additional training phases comprise performing updating weights in the neural network (Page 2, §3.1; “In order to match the accuracy of the FP32 networks, an FP32 master copy of weights is maintained and updated with the weight gradient during the optimizer step”, which discloses the updating of the weights in the NN as part of the training operation, the weights being updated using a second precision (FP32), the first precision (FP16) being less precise than the second precision (FP32); and Page 3, Figure 1; the figure discloses the box “weight update” wherein the format of the input/output weights is F32, the second precision that is more precise than the first precision (F16); and Page 5, §3.3; “By and large neural network arithmetic falls into three categories: vector dot-products, reductions, and point-wise operations”, which discloses that the forward and backward calculations include dot product calculations).

Regarding claim 15, the rejection of claim 8 is incorporated and Narang further discloses wherein the lower precision format data is defined by a first bit format and the higher precision format data is defined by a second bit format, the first bit format being different than the second bit format (Page 3, Figure 1; the figure discloses the first bit format (F16) and the second bit format (F32), which differ from each other).

Claims 2, 4, 10, 13, 17, and 19 are rejected under 35 U.S.C. 103 as being obvious over Narang in view of Mellempudi and Liu and further in view of Das et al. (Das et al., “MIXED PRECISION TRAINING OF CONVOLUTIONAL NEURAL NETWORKS USING INTEGER OPERATIONS”, Feb. 23, 2018, ICLR 2018, pp. 1-11, hereinafter “Das”).

Regarding claims 2, 10, and 17, the rejection of claims 1, 8, 9, and 16 is incorporated but Narang fails to explicitly disclose calculate an integer gradient value in the back propagation calculations and convert the integer gradient value to a floating point value before performing the updating of the weights, wherein the floating point value is at least a 16-bit value.
Das discloses calculate an integer gradient value in the back propagation calculations and convert the integer gradient value to a floating point value before performing the updating of the weights, wherein the floating point value is at least a 16-bit value (Page 6, §4.3; “Here we first convert the INT32 result to FP32 using the VCVTINTFP32 instruction, followed by a scale and accumulate into the final FP32 result using the VFP32MADD instruction”, which discloses converting the integer gradient value to an FP value of at least 16 bits (FP32 has 32 bits, which is at least 16); and Page 7, Algorithm 2, Lines 26-31; the algorithm discloses the integer to floating point conversion; and Figure 2; the figure discloses the back propagation calculations; and §4; “BPROP” is the backpropagation operation referenced throughout the section).
Narang, Mellempudi, Liu, and Das are analogous because all are concerned with neural network processing. It would have been obvious to one of ordinary skill in the art of neural network computing before the effective filing date of the claimed invention to combine the integer to floating point conversion of Das with the neural network training system of Narang, Mellempudi, and Liu to yield the predictable result of calculating an integer gradient value in the back propagation calculations and converting the integer gradient value to a floating point value before performing the updating of the weights, wherein the floating point value is at least a 16-bit value. The motivation for doing so would be to prevent neural network computation overflows (Das; Page 6, §4.3).
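As a hedged illustration of the INT32-to-FP32 conversion quoted from Das §4.3, the sketch below accumulates INT16 products into INT32 partial sums, converts each partial sum to FP32, and then scales and accumulates it into an FP32 result, so that the integer accumulator never has to hold the full-length sum. The shared scale factors, the chunk size, and the helper names (quantize_int16, int16_dot_fp32) are assumptions of this sketch and are not taken from Das.

```python
import struct


def to_fp32(x: float) -> float:
    """Round x to the nearest IEEE 754 single-precision (FP32) value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def quantize_int16(values, scale):
    """Hypothetical helper: map floats to INT16 with one shared scale (value ~= q * scale)."""
    return [max(-32768, min(32767, round(v / scale))) for v in values]


def int16_dot_fp32(qa, qb, scale_a, scale_b, chunk=2):
    """INT16 dot product whose INT32 partial sums are converted to FP32 before accumulation."""
    scale = to_fp32(scale_a * scale_b)
    result = 0.0  # FP32 accumulator
    for start in range(0, len(qa), chunk):
        # Integer multiply-accumulate over one chunk, held as an INT32 value.
        partial = sum(a * b for a, b in zip(qa[start:start + chunk], qb[start:start + chunk]))
        if not -2**31 <= partial < 2**31:
            raise OverflowError("INT32 partial sum overflowed; use a smaller chunk")
        # INT32 -> FP32 conversion, followed by a scale and accumulate in FP32.
        result = to_fp32(result + to_fp32(to_fp32(float(partial)) * scale))
    return result
```

With a shared scale of 2**-8, the vectors [0.5, -0.25, 0.125, 1.0] and [1.0, 2.0, 4.0, 0.5] quantize to [128, -64, 32, 256] and [256, 512, 1024, 128], and int16_dot_fp32 recovers their dot product of 1.0 exactly.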

Regarding claims 4 and 19, the rejection of claims 1 and 16 is incorporated but Narang fails to explicitly disclose wherein the calculations at the first precision comprise using integer values and the calculations at the second precision comprise non-block floating point values.
Das discloses wherein the calculations at the first precision comprise using integer values and the calculations at the second precision comprise non-block floating point values (Page 6, §4.3; “Here we first convert the INT32 result to FP32 using the VCVTINTFP32 instruction, followed by a scale and accumulate into the final FP32 result using the VFP32MADD instruction”, which discloses using integer values at the first precision (INT16) and non-block floating point values at the second precision (FP32); and Page 7, Algorithm 2; the algorithm discloses the integers at the first precision and the FPs at the second precision).
The motivation to combine Narang, Mellempudi, Liu, and Das is the same as discussed above with respect to claim 2.

Regarding claim 13, the rejection of claim 8 is incorporated but Narang fails to explicitly disclose wherein performing the lower precision format training calculations using the lower precision format data comprises using integer values and the calculations using the higher precision format data comprise using non-block floating point values.
Das discloses wherein performing the lower precision format training calculations using the lower precision format data comprises using integer values and the calculations using the higher precision format data comprise using non-block floating point values (Page 6, §4.3; “Here we first convert the INT32 result to FP32 using the VCVTINTFP32 instruction, followed by a scale and accumulate into the final FP32 result using the VFP32MADD instruction”, which discloses using integer values at the first precision (INT16) and non-block floating point values at the second precision (FP32); and Page 7, Algorithm 2; the algorithm discloses the integers at the first precision and the FPs at the second precision).
The motivation to combine Narang, Mellempudi, Liu, and Das is the same as discussed above with respect to claim 2.

Claims 3, 11, 12, and 18 are rejected under 35 U.S.C. 103 as being obvious over Narang in view of Mellempudi and Liu and further in view of Nurvitadhi et al. (US 20190205746 A1, hereinafter “Nurvitadhi”).

Regarding claims 3 and 18, the rejection of claims 1 and 16 is incorporated and Narang further discloses the calculations at the second precision comprise using floating point values, and the floating point values comprise at least sixteen bits (Page 3, Figure 1; the figure discloses the calculations at the second precision (F32) comprise floating point values that are at least 16 bits (F32)).
the calculations at the second precision including updating the weights (Page 2, §3.1; “In mixed precision training, weights, activations and gradients are stored as FP16. In order to match the accuracy of the FP32 networks, an FP32 master copy of weights is maintained and updated with the weight gradient during the optimizer step.”, the weights being updated at FP32).
Narang fails to explicitly disclose but Nurvitadhi discloses wherein the calculations at the first precision comprise using block floating point values ([0253]; “The present design provides arithmetic compute architecture to accommodate above trends of NNs including efficient support for dynamic adjustments in precisions (i.e., variable precision), efficient support for mix precisions (i.e., operands have different precision), efficient support for very low precisions (less than 8 bits), efficient support for fix point as well as dynamic floating point (or block floating point), and efficient support for sparsity” (emphasis added), which discloses the use of a block floating point representation for a first precision; and [0454]; “a block floating point (FP) management unit 3120 to support block FP operations” (emphasis added)).
Nurvitadhi additionally discloses the calculations at the second precision including updating the weights ([0199]; “The network can then learn from those errors using an algorithm, such as the stochastic gradient descent algorithm, to update the weights of the of the neural network”).
Narang, Mellempudi, Liu, and Nurvitadhi are analogous because all are concerned with neural network processing. It would have been obvious to one of ordinary skill in the art of neural network computing before the effective filing date of the claimed invention to combine the block floating point representation of Nurvitadhi with the neural network training system of Narang, Mellempudi, and Liu to yield the predictable result of wherein the calculations at the first precision comprise using block floating point values and the calculations at the second precision comprise using floating point values, and the floating point values comprise at least sixteen bits. The motivation for doing so would be to accommodate trends of NNs including efficient support for dynamic adjustments in precisions (Nurvitadhi; [0253]).
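For illustration only, block floating point, as that term is used above, represents a block of values with one shared exponent and a separate signed integer mantissa per value. The following Python sketch assumes a signed 8-bit mantissa and chooses the shared exponent from the largest magnitude in the block; the bit width, the rounding, and the function names are hypothetical and are not drawn from Nurvitadhi.

```python
import math


def to_block_fp(values, mantissa_bits=8):
    """Quantize a block of floats to one shared exponent plus signed integer mantissas."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return 0, [0] * len(values)
    # math.frexp gives max_abs = m * 2**e with 0.5 <= m < 1, so max_abs < 2**e.
    _, e = math.frexp(max_abs)
    # Shared exponent chosen so the largest magnitude fits in the signed mantissa range.
    shared_exp = e - (mantissa_bits - 1)
    limit = 2 ** (mantissa_bits - 1) - 1
    mantissas = [max(-limit - 1, min(limit, round(v / 2.0 ** shared_exp))) for v in values]
    return shared_exp, mantissas


def from_block_fp(shared_exp, mantissas):
    """Reconstruct approximate floats from a shared exponent and integer mantissas."""
    return [m * 2.0 ** shared_exp for m in mantissas]
```

For example, the block [1.0, 0.5, -0.25, 0.1] receives shared exponent -6 and mantissas [64, 32, -16, 6], so 0.1 is reconstructed as 6 * 2**-6 = 0.09375; small values in a block lose precision relative to the block's largest value.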

Regarding claim 11, the rejection of claim 8 is incorporated and Narang further discloses the calculations using the higher precision format data comprise using floating point values (Page 3, Figure 1; the figure discloses the calculations at the second precision (F32) comprise floating point values that are at least 16 bits (F32))
the calculations using the higher precision format data including updating weights in the neural network (Page 2, §3.1; “In mixed precision training, weights, activations and gradients are stored as FP16. In order to match the accuracy of the FP32 networks, an FP32 master copy of weights is maintained and updated with the weight gradient during the optimizer step”).
Narang fails to explicitly disclose wherein performing the lower precision format training calculations using the lower precision format data comprises using block floating point values.
Nurvitadhi discloses wherein performing the lower precision format training calculations using the lower precision format data comprises using block floating point values ([0253]; “The present design provides arithmetic compute architecture to accommodate above trends of NNs including efficient support for dynamic adjustments in precisions (i.e., variable precision), efficient support for mix precisions (i.e., operands have different precision), efficient support for very low precisions (less than 8 bits), efficient support for fix point as well as dynamic floating point (or block floating point), and efficient support for sparsity” (emphasis added), which discloses the use of a block floating point representation for a first precision; and [0454]; “a block floating point (FP) management unit 3120 to support block FP operations” (emphasis added)).
The motivation to combine Narang, Mellempudi, Liu, and Nurvitadhi is the same as discussed above with respect to claim 3.

Regarding claim 12, the rejection of claims 8 and 11 is incorporated and Narang further discloses wherein the floating point values comprise at least sixteen bits (Page 3, Figure 1; the figure discloses the calculations at the second precision (F32) comprise floating point values that are at least 16 bits (F32)).

Claims 5, 14, and 20 are rejected under 35 U.S.C. 103 as being obvious over Narang in view of Mellempudi and Liu and further in view of Langhammer et al. (US 20190155575 A1, hereinafter “Langhammer”).

Regarding claims 5 and 20, the rejection of claims 1 and 16 is incorporated but Narang fails to explicitly disclose wherein multiplication is performed at the first precision and accumulation is performed at the second precision.
Langhammer discloses wherein multiplication is performed at the first precision and accumulation is performed at the second precision (Figure 5, Elements 506 and 508; the figure discloses multiplication at the first precision followed by an accumulation or sum at a second precision; and [0056]-[0057]).
Narang, Mellempudi, Liu, and Langhammer are analogous because all are concerned with neural network processing. It would have been obvious to one of ordinary skill in the art of neural network computing before the effective filing date of the claimed invention to combine the multiplication and accumulation of Langhammer with the neural network training system of Narang, Mellempudi, and Liu to yield the predictable result of wherein multiplication is performed at the first precision and accumulation is performed at the second precision. The motivation for doing so would be to increase the functional density of machine learning algorithms (Langhammer; [0012]).
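For illustration only, the distinction between multiplication at the first precision and accumulation at the second precision can be shown with emulated FP16 multiplies feeding either an FP16 or an FP32 accumulator. This sketch does not reproduce Langhammer's Figure 5; the function names and the test values are assumptions, and FP16/FP32 rounding is emulated with the standard struct module.

```python
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision (FP16) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round x to the nearest IEEE 754 single-precision (FP32) value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def multiply_accumulate(xs, ws, acc_round):
    """Multiply in FP16, then accumulate with the rounding function acc_round."""
    acc = 0.0
    for x, w in zip(xs, ws):
        product = to_fp16(to_fp16(x) * to_fp16(w))  # multiplication at the first precision
        acc = acc_round(acc + product)              # accumulation at the chosen precision
    return acc
```

With 4096 unit products, the FP16 accumulator stops at 2048 because 2049 is not representable in FP16 and rounds back to 2048, while the FP32 accumulator reaches 4096.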

Regarding claim 14, the rejection of claim 8 is incorporated but Narang fails to explicitly disclose wherein multiplication is performed using the lower precision format data and accumulation is performed using the higher precision format data.
Langhammer discloses wherein multiplication is performed using the lower precision format data and accumulation is performed using the higher precision format data (Figure 5, Elements 506 and 508; the figure discloses multiplication at the first precision followed by an accumulation or sum at a second precision; and [0056]-[0057]).
The motivation to combine Narang, Mellempudi, Liu, and Langhammer is the same as discussed above with respect to claim 5.

Claim 7 is rejected under 35 U.S.C. 103 as being obvious over Narang in view of Mellempudi and Liu and further in view of Kim et al. (Kim et al., “A HIGHLY SCALABLE RESTRICTED BOLTZMANN MACHINE FPGA IMPLEMENTATION”, Sep. 2, 2009, 2009 International Conference on Field Programmable Logic and Applications, pp. 367-372, hereinafter “Kim”).

Regarding claim 7, the rejection of claim 1 is incorporated but Narang fails to explicitly disclose one or more Field-programmable Gate Arrays (FPGAs) and wherein the calculations at the second precision are performed by the one or more FPGAs.
Kim discloses one or more Field-programmable Gate Arrays (FPGAs) and wherein the calculations at the second precision are performed by the one or more FPGAs (Abstract; “we describe a novel architecture and FPGA implementation that accelerates the training of general RBMs in a scalable manner, with the goal of producing a system that machine learning researchers can use to investigate ever-larger networks . . . We show that only 16-bit arithmetic precision is necessary, and we consequently use embedded hardware multiply-and-add (MADD) units”, which discloses the use of an FPGA at a given precision to train a neural network; and Page 368, §4.1; the section discloses the use of the FPGA to perform the second training operations at 16-bit precision).
Narang, Mellempudi, Liu, and Kim are analogous because all are concerned with neural network processing. It would have been obvious to one of ordinary skill in the art of neural network computing before the effective filing date of the claimed invention to combine the FPGA of Kim with the neural network training system of Narang, Mellempudi, and Liu to yield the predictable result of one or more Field-programmable Gate Arrays (FPGAs) and wherein the calculations at the second precision are performed by the one or more FPGAs. The motivation for doing so would be to accelerate the training of general RBMs in a scalable manner (Kim; Abstract).



Response to Arguments

Applicant’s arguments and amendments, filed on 2/8/2022, with respect to the 35 USC § 103 rejection of claims 1-20 have been considered but are moot because the arguments do not apply to any of the references being used in the current rejection of amended independent claims 1, 8, and 16. Narang, Mellempudi, and Liu are now being used to render amended claims 1, 8, and 16 obvious under 35 USC § 103.

Conclusion


Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 


Any inquiry concerning this communication or earlier communications from the examiner should be directed to Brent Hoover whose telephone number is (303)297-4403. The examiner can normally be reached Monday - Friday 9-5 MST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Abdullah al Kawsar can be reached on 517-270-3169. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/BRENT JOHNSTON HOOVER/Examiner, Art Unit 2127