Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Status of Claims
This action is in response to the amendments filed 10/19/2021. Claims 6, 10, 14, and 17-20 have been amended. Claims 1-20 are currently pending.
	
Information Disclosure Statement
The information disclosure statements (IDS) submitted on 05/07/2021 and 11/07/2021 were filed before the mailing date of this office action. The submissions are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.

Response to Arguments
Applicant’s remarks state that claim 18 has been cancelled; however, the amended claims submitted on 10/19/2021 include an amended claim 18. Therefore, for purposes of examination, claim 18 will continue to be treated on its merits.
In light of Applicant’s amendment, the objection to claims 18-20 has been withdrawn.
In light of Applicant’s amendments, the objection to figure 6 has been withdrawn.
In light of Applicant’s amendment, the 112(b) rejections of claims 6 and 14 have been withdrawn.
In light of Applicant’s amendment, the 101 rejection of claims 17-20 has been withdrawn.
Applicant’s amendments and arguments regarding the prior art rejection have been fully considered, but they are not persuasive. Applicant argues on page 10 that neither Mellempudi 733 nor Mellempudi 607 teaches converting already quantized block floating-point values to a lower-precision block floating-point format; however, Examiner respectfully disagrees. As written, claim 1 does not require the first floating point format to be quantized, only that the first floating point format has a “first numerical precision”. Claim 1 then requires a second floating point format with “a second numerical precision less than the first numerical precision”. While the broadest reasonable interpretation of this claim includes moving from a quantized floating point format to a lower-precision quantized floating point format, as Applicant has stated, it also includes moving from a normal floating point format to a quantized floating point format, as supported by figures 3 and 6-8 and at least paragraphs [0026], [0054], [0072]-[0073], [0089], and [0099] of Applicant’s own specification.
Applicant also argues on page 12 that, because the examples in Mellempudi 607 are directed to converting between a floating point format and a dynamic fixed point format, Mellempudi 607 does not teach converting between two floating point formats. However, paragraph [0212] of Mellempudi 607 states that “While dynamic precision is described herein with respect to fixed-point computations, dynamic precision operations can be also extended to floating-point, low precision floating-point, and custom defined floating-point data types. In particular, blocked dynamic precision operations, as described further in FIG. 21A-21D, can be applied generally to low-precision data types to enable computations on data having a larger 
Finally, Applicant argues on pages 13-14 that neither Mellempudi 733 nor Mellempudi 607 teaches claim 10 and its dependent claims, specifically the limitations regarding “storing the compressed activation values in a bulk memory” and “delaying access to the compressed activation values in the bulk memory until forward propagation through all layers of the neural network has completed.” However, these arguments are moot in view of the new grounds of rejection necessitated by Applicant’s amendment to claim 10. The prior art rejections have been updated to include the amended limitations and to clarify the reasoning given for the limitations that were not amended.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-9 and 17-20 are rejected under 35 U.S.C. 103 as being unpatentable over Mellempudi et al (US 20180285733 A1, hereinafter Mellempudi 733), in view of Mellempudi et al (US 20180322607 A1, hereinafter Mellempudi 607).
Regarding claim 1, Mellempudi 733 teaches a computing system (Mellempudi 733 fig. 1 and para. [0013] recite that each computing node 102 may be embodied as any type of computation or computer device (i.e. a computing system)) comprising:
one or more processors; bulk memory comprising computer-readable storage devices and/or memory (Mellempudi 733 fig. 1 and para. [0013] recite the computing node 102 illustratively includes a processor 120 (i.e. one or more processors), an input/output ("I/O") subsystem 126, a memory 128 (i.e. bulk memory), a data storage device 130, and communication circuitry 132);
a block floating-point compressor (Mellempudi 733 fig. 2 block 210 and para. [0019] recite sender computing node 102a, which includes an application 202, a quantization library 204, a quantization controller 206, a quantizer 208, and a compressor 210 (i.e. a block floating-point compressor)) formed from at least one of the processors, the block floating-point compressor being in communication with the bulk memory; and the computing system being configured to:
produce first activation values in a first block floating-point format, the first block floating-point format having a first numerical precision (Mellempudi 733 para. [0021] recites that the message may include one or more artificial neural network training algorithm values, and each artificial neural network training algorithm value may be embodied as an activation value, a weight value, or a weight update value (i.e. first activation values). Para. [0024] recites that the quantizer 208 is configured to operate with integer and floating-point real numbers (i.e. a first numerical precision for a first floating-point format)),
with the block floating-point compressor, for at least one of the activation values, convert the at least one of the activation values to a second block floating-point format to produce compressed activation values (Mellempudi 733 fig. 3 and para. [0036] recite that in block 312, the host fabric interface 124 determines a quantization level requested by the application 202. Para. [0037] recites that in block 314, the host fabric interface 124 quantizes the message based on the determined quantization level (i.e. converts an activation value to a second block floating-point format to produce compressed activation values)), the second block floating-point format having a second numerical precision less than the first numerical precision (Mellempudi 733 para. [0032] recites that the quantization level may be any level of quantization that reduces the precision of the value, thereby reducing the number of bits required to represent the message (i.e. after quantization, the second block floating-point format is less precise than the first block floating-point format)).
However, Mellempudi 733 does not explicitly teach with at least one of the processors, performing forward propagation for a layer of a neural network; and with at least one of the processors, storing the compressed activation values in the bulk memory.
Mellempudi 607 teaches: 
with at least one of the processors, perform forward propagation (Mellempudi 607 para. [0139] recites that data received at the nodes of an input layer of a feedforward network are propagated (i.e. "fed forward" or forward propagated) to the nodes of the output layer via an activation function that calculates the states of the nodes of each successive layer in the network based on coefficients ("weights") respectively associated with each of the edges connecting the layers) for a layer of a neural network; and
with at least one of the processors, storing the compressed activation values in the bulk memory (Mellempudi 607 para. [0188] recites a method and apparatus are described to perform quantization and data representation of low-precision tensors (i.e. activation values) in deep learning applications. Each low-precision tensor may contain a data buffer and associated metadata represented as a data structure. The metadata may contain information pertaining to data type (integer, fixed-point, float or any other custom data type), precision and shared exponent(s)/scaling factor(s) necessary for performing data conversions and arithmetic operations. The data buffer may be stored as one contiguous block or many smaller blocks with as many exponents/scaling factors corresponding to each block (i.e. storing activation values in bulk memory)).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine these teachings by adapting the quantization and compression methods from Mellempudi 733 to be used with the infrastructure of the graphics processing unit from Mellempudi 607. The combination would allow one of ordinary skill to expand the quantization and compression methods between nodes of a neural network from Mellempudi 733 and use them during the forward and backward propagations between layers of a neural network, which would save computation time and resources while training the neural network.   
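For illustration only, the following Python sketch shows one way the conversion recited in claim 1 can be carried out: a block of activation values is expressed with one shared exponent and 8-bit signed mantissas, and is then re-expressed with 4-bit mantissas, which lowers the numerical precision. The function names (to_bfp, from_bfp, convert_bfp), the mantissa widths, and the round-half-to-even rounding are assumptions chosen for this sketch; they are not taken from the Applicant's specification, Mellempudi 733, or Mellempudi 607.

```python
import math
from typing import List, Tuple


def to_bfp(values: List[float], mantissa_bits: int) -> Tuple[int, List[int]]:
    """Quantize a block of floats to one shared exponent plus signed integer mantissas.

    Each value is approximated as mantissa * 2**(shared_exp - (mantissa_bits - 1)).
    """
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return 0, [0] * len(values)
    # math.frexp(x) returns (m, e) with 0.5 <= |m| < 1 and x == m * 2**e,
    # so every value in the block satisfies |v| < 2**shared_exp.
    _, shared_exp = math.frexp(max_abs)
    scale = 2.0 ** (shared_exp - (mantissa_bits - 1))
    limit = 2 ** (mantissa_bits - 1) - 1  # symmetric signed mantissa range
    # Python's round() rounds halves to even; results are clamped to the range.
    mantissas = [max(-limit, min(limit, round(v / scale))) for v in values]
    return shared_exp, mantissas


def from_bfp(shared_exp: int, mantissas: List[int], mantissa_bits: int) -> List[float]:
    """Expand a block back to ordinary (normal-precision) floats."""
    scale = 2.0 ** (shared_exp - (mantissa_bits - 1))
    return [m * scale for m in mantissas]


def convert_bfp(shared_exp: int, mantissas: List[int],
                src_bits: int, dst_bits: int) -> Tuple[int, List[int]]:
    """Re-express a block with dst_bits mantissa bits (dst_bits < src_bits lowers precision)."""
    return to_bfp(from_bfp(shared_exp, mantissas, src_bits), dst_bits)


activations = [0.5, -1.25, 3.0, 0.1]
first = to_bfp(activations, mantissa_bits=8)           # (2, [16, -40, 96, 3])
second = convert_bfp(*first, src_bits=8, dst_bits=4)   # (2, [1, -2, 6, 0])
print(from_bfp(*second, mantissa_bits=4))              # [0.5, -1.0, 3.0, 0.0]
```

In this sketch the converted block keeps the same shared exponent but loses resolution: -1.25 becomes -1.0 and 0.1 becomes 0.0 once only 4 mantissa bits remain.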
Regarding claim 2, the combination of Mellempudi 733 and Mellempudi 607 teaches the computing system according to claim 1, wherein the second block floating-point format has a lower-precision mantissa and/or a lower-precision exponent than the first block floating-point format (Mellempudi 607 para. [0234] recites that a higher dynamic range requires more precision bits to represent the full range using integers. One solution is to split the tensor into smaller blocks with independent shared exponent while maintaining the integer data at lower precision (i.e. using a lower-precision exponent for the lower precision second block floating-point format)).
Regarding claim 3, the combination of Mellempudi 733 and Mellempudi 607 teaches the computing system according to claim 1, wherein the computing system is further configured to:
convert the first activation values to a normal-precision format, producing converted normal-precision values (Examiner notes that the conversion of first activation values to a normal precision format corresponds to dequantizing the first activation values based on para. [0101] from the specification “the activation values are dequantized 724 to a normal precision format prior to converting to the second block-floating-point format and storing in the bulk memory 770”. Mellempudi 733 fig. 4 and para. [0046] recite that in block 420, the host fabric interface 124 dequantizes the received message to reconstruct the dequantized message based on the determined quantized level of the quantized message (i.e. converts first activation values to a normal precision format)); and
convert the at least one of the activation values to the second block floating-point format by converting the converted normal-precision values to the second block floating-point format to produce compressed activation values (Mellempudi 733 fig. 3 and para. [0039] recite that in block 318, the host fabric interface 124 compresses the message or quantized message as requested by the application 202 (i.e. converts values from a normal precision format to a second precision format via compression)).
Regarding claim 4, the combination of Mellempudi 733 and Mellempudi 607 teaches the computing system according to claim 1, wherein the second block floating-point format has a different sharing format of a common exponent than the first block floating-point format, the sharing format being different based on per-row, per-column, or per-tile sharing of a common exponent for the compressed activation values (Mellempudi 607 fig. 21B and para. [0237] recite a blocked multi-dimensional fine-grained quantized tensor 2110, according to an embodiment. The tensor 2110 can be partitioned into blocks along one or more dimensions, with each block having a separate shared exponent (i.e. the sharing format being different based on the partition, which could be based on the row, the column, or the tile). The shared exponent data for the quantized tensor 2110 can be stored in metadata for the tensor. The metadata can maintain the exponent scaling factor for each block, as well as the block size for each block).
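For illustration only, the following sketch assumes the activation values form a small two-dimensional matrix and computes one shared exponent per row, per column, or per 2x2 tile, showing how the choice of sharing granularity changes the exponents that must be stored. The helper names (shared_exponent, per_row, per_column, per_tile) and the tile size are hypothetical; the sketch is not Mellempudi 607's implementation of fig. 21B.

```python
import math
from typing import List

Matrix = List[List[float]]


def shared_exponent(block: List[float]) -> int:
    """Smallest integer e with |x| < 2**e for every x in the block (0 for an all-zero block)."""
    max_abs = max(abs(x) for x in block)
    return math.frexp(max_abs)[1] if max_abs else 0


def per_row(a: Matrix) -> List[int]:
    # One exponent shared by all values in each row.
    return [shared_exponent(row) for row in a]


def per_column(a: Matrix) -> List[int]:
    # One exponent shared by all values in each column.
    return [shared_exponent(list(col)) for col in zip(*a)]


def per_tile(a: Matrix, tile: int) -> List[List[int]]:
    # One exponent shared by each tile x tile sub-block (edge tiles may be smaller).
    rows, cols = len(a), len(a[0])
    return [
        [
            shared_exponent([a[r][c]
                             for r in range(r0, min(r0 + tile, rows))
                             for c in range(c0, min(c0 + tile, cols))])
            for c0 in range(0, cols, tile)
        ]
        for r0 in range(0, rows, tile)
    ]


activations = [
    [0.5, 8.0, 0.25, 0.125],
    [1.0, 0.5, 3.0, 0.75],
    [0.1, 0.2, 0.3, 0.4],
    [6.0, 0.0, 0.0, 0.0],
]
print(per_row(activations))      # [4, 2, -1, 3]
print(per_column(activations))   # [3, 4, 2, 0]
print(per_tile(activations, 2))  # [[4, 2], [3, -1]]
```

The same activation values thus carry four row exponents, four column exponents, or four tile exponents with different values, which is the kind of difference in sharing format that claim 4 recites.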
Regarding claim 5, the combination of Mellempudi 733 and Mellempudi 607 teaches the computing system according to claim 1, wherein the compressor is further configured to further compress the compressed activation values (Mellempudi 733 fig. 3 and para. [0039] recite that in block 318, the host fabric interface 124 compresses the message or quantized message as requested by the application 202. The host fabric interface 124 may use any appropriate technique to compress the message (i.e. further compress the compressed activation values)) prior to the storing by performing at least one or more of the following: entropy compression, zero compression, run length encoding (Mellempudi 733 para. [0039] recites that the host fabric interface 124 may perform run-length encoding or another lossless compression algorithm).
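For illustration only, the following sketch shows two of the lossless techniques listed in claim 5, run-length encoding and zero compression, applied to a list of integer mantissas such as those of a block floating-point tensor. The representation (Python lists of ints) and the function names are assumptions for this sketch; it is not intended to reproduce the compression performed by the host fabric interface 124 of Mellempudi 733.

```python
from itertools import groupby
from typing import List, Tuple


def rle_encode(values: List[int]) -> List[Tuple[int, int]]:
    """Run-length encoding: collapse each run of equal values into (value, run_length)."""
    return [(v, len(list(run))) for v, run in groupby(values)]


def rle_decode(pairs: List[Tuple[int, int]]) -> List[int]:
    out: List[int] = []
    for v, n in pairs:
        out.extend([v] * n)
    return out


def zero_compress(values: List[int]) -> Tuple[List[bool], List[int]]:
    """Zero compression: a presence mask plus only the nonzero values."""
    mask = [v != 0 for v in values]
    return mask, [v for v in values if v != 0]


def zero_decompress(mask: List[bool], nonzeros: List[int]) -> List[int]:
    it = iter(nonzeros)
    return [next(it) if present else 0 for present in mask]


mantissas = [0, 0, 0, 5, 5, -3, 0, 0]
print(rle_encode(mantissas))        # [(0, 3), (5, 2), (-3, 1), (0, 2)]
print(zero_compress(mantissas)[1])  # [5, 5, -3]
```

Both encodings are lossless: decoding returns exactly the original mantissas, so this further compression does not reduce precision beyond the block floating-point conversion itself.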
Regarding claim 6, the combination of Mellempudi 733 and Mellempudi 607 teaches the computing system according to claim 1, wherein the computing system is further configured to:
convert the stored, compressed activation values to activation values in the first block floating-point format (Mellempudi 733 fig. 4 and para. [0043] recite that in block 410, the host fabric interface 124 decompresses the received message (i.e. converts compressed activation values to a quantized first block floating-point format)) to uncompressed activation values (Mellempudi 733 fig. 4 and para. [0046] recite that in block 420, the host fabric interface 124 dequantizes the received message to reconstruct the dequantized message based on the determined quantized level of the quantized message (i.e. converts quantized first block floating-point activation values to an uncompressed format)); and
perform a gradient operation with the uncompressed activation values (Mellempudi 607 para. [0158] recites that after back-propagation, the network can then learn from those errors (that have been back-propagated and converted into an uncompressed format) using an algorithm, such as the stochastic gradient descent algorithm, to update the weights of the neural network).
Regarding claim 7, the combination of Mellempudi 733 and Mellempudi 607 teaches the computing system according to claim 1, wherein the layer is a first layer, the compressed activation values are first compressed activation values, and wherein the computing system is further configured to:
with at least one of the processors, perform forward propagation (Mellempudi 607 para. [0139] recites that data received at the nodes of an input layer of a feedforward network are propagated (i.e. "fed forward" or forward propagated) to the nodes of the output layer via an activation function that calculates the states of the nodes of each successive layer in the network based on coefficients ("weights") respectively associated with each of the edges connecting the layers) for a different, second layer of a neural network to produce second activation values in the first block floating-point format (Mellempudi 607 fig. 9A and para. [0165] recite that the first convolutional layer 904 of fig. 9A can output to the second convolutional layer 906 (i.e. a different, second layer));
with the block floating-point compressor, for at least one of the second activation values, convert the at least one of the second activation values to a third block floating-point format to produce second compressed activation values, the third block floating-point format having a numerical precision different than the second numerical precision (Mellempudi 733 fig. 3 and para. [0039] recite that in block 318, the host fabric interface 124 compresses the message or quantized message as requested by the application 202 (i.e. converts values from a first block floating-point precision format to a third block floating-point precision format via compression)); and
with at least one of the processors, storing the second compressed activation values in the bulk memory (Mellempudi 607 para. [0188] recites a method and apparatus are described to perform quantization and data representation of low-precision tensors (i.e. activation values) in deep learning applications. Each low-precision tensor may contain a data buffer and associated metadata represented as a data structure. The metadata may contain information pertaining to data type (integer, fixed-point, float or any other custom data type), precision and shared exponent(s)/scaling factor(s) necessary for performing data conversions and arithmetic operations. The data buffer may be stored as one contiguous block or many smaller blocks with as many exponents/scaling factors corresponding to each block (i.e. storing activation values in bulk memory)).
Regarding claim 8, the combination of Mellempudi 733 and Mellempudi 607 teaches the computing system according to claim 1, wherein the processors comprise at least one of the following: a tensor processing unit, a neural network accelerator, a graphics processing unit (Mellempudi 607 para. [0043] recites a graphics processing unit (GPU), communicatively coupled to host/processor cores to accelerate graphics operations, machine-learning operations, pattern analysis operations, and various general-purpose GPU (GPGPU) functions), or a processor implemented in a reconfigurable logic array; and
the bulk memory is situated on a different integrated circuit than the processors (Mellempudi 607 fig. 1 and para. [0045] recite that the computing system 100 includes a processing subsystem 101 having one or more processor(s) 102 and a system memory 104 communicating via an interconnection path that may include a memory hub 105).
Regarding claim 9, the combination of Mellempudi 733 and Mellempudi 607 teaches the computing system according to claim 1, wherein the bulk memory includes dynamic random access memory (DRAM) or embedded DRAM (Mellempudi 607 fig. 22 and para. [0249] recite that memory device 2220 can be a dynamic random access memory (DRAM) device, a static random-access memory (SRAM) device, flash memory device, phase-change memory device, or some other memory device having suitable performance to serve as process memory) and the system further comprises a hardware accelerator including a memory temporarily storing the first activation values for at least a portion of only one layer of the neural network, the hardware accelerator memory including static RAM (SRAM) or a register file (Mellempudi 607 fig. 4B and para. [0095] recite an accelerator integration circuit 436 (i.e. a hardware accelerator), which provides cache management, memory access, context management, and interrupt management services on behalf of a plurality of graphics processing engines 431, 432, N of the graphics acceleration module 446. Para. [0097] recites within the accelerator integration circuit 436, a set of registers 449 that store context data for threads executed by the graphics processing engines 431-432, N and a context management circuit 448 manages the thread contexts).
Regarding claim 17, Mellempudi 733 teaches one or more computer-readable storage devices or media, excluding modulated data signals, storing computer-executable instructions (Mellempudi 733 para. [0010] recites that the disclosed embodiments may also be implemented as instructions carried by or stored on one or more transitory or non-transitory machine-readable (e.g., computer-readable) storage media, which may be read and executed by one or more processors), which when executed by a computer, cause the computer to perform a method of configuring a computer system to implement an artificial neural network (Mellempudi 733 fig. 1 and para. [0012] recite an illustrative embodiment, a system 100 for scaling multilayered artificial neural network training algorithms includes several computing nodes 102 in communication over a network 104 (i.e. an artificial neural network)), the instructions comprising:
instructions that cause the computer system to implement a first layer of neural network using first weights and/or first activation values (Mellempudi 733 para. [0021] recites that the message may include one or more artificial neural network training algorithm values, and each artificial neural network training algorithm value may be embodied as an activation value (i.e. first activation values), a weight value, or a weight update value) expressed in a first block floating-point format (Mellempudi 733 para. [0024] recites that the quantizer 208 is configured to operate with integer and floating-point real numbers (i.e. expressed in a first block floating-point format)).
However, Mellempudi 733 does not explicitly teach instructions that cause the computer system to forward propagate values from the first layer of the neural network to a second layer of the neural network, and instructions that cause the computer system to, prior to performing back propagation for the neural network, store the second activation values in a second, different block floating-point format in a bulk memory in communication with the computer system.
Mellempudi 607 teaches instructions that cause the computer system to forward propagate values from the first layer of the neural network to a second layer of the neural network (Mellempudi 607 para. [0139] recites that data received at the nodes of an input layer of a feedforward network are propagated (i.e. "fed forward" or forward propagated) to the nodes of the output layer via an activation function that calculates the states of the nodes of each successive layer in the network based on coefficients ("weights") respectively associated with each of the edges connecting the layers), and 
instructions that cause the computer system to, prior to performing back propagation for the neural network, store the second activation values in a second, different block floating-point format in a bulk memory in communication with the computer system (Mellempudi 607 para. [0188] recites a method and apparatus are described to perform quantization and data representation of low-precision tensors (i.e. activation values) in deep learning applications. Each low-precision tensor may contain a data buffer and associated metadata represented as a data structure. The metadata may contain information pertaining to data type (integer, fixed-point, float or any other custom data type), precision and shared exponent(s)/scaling factor(s) necessary for performing data conversions and arithmetic operations. The data buffer may be stored as one contiguous block or many smaller blocks with as many exponents/scaling factors corresponding to each block (i.e. storing activation values in bulk memory)).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine these teachings by adapting the quantization and compression methods from Mellempudi 733 to be used with the infrastructure of the graphics processing unit from Mellempudi 607. The combination would allow one of ordinary skill to expand the quantization and compression methods between nodes of a neural network from Mellempudi 733 and use them during the forward and backward propagations between layers of a neural network, which would save computation time and resources while training the neural network.
Regarding claim 18, the combination of Mellempudi 733 and Mellempudi 607 teaches the computer-readable storage devices or media according to claim 17, further comprising:
instructions that cause the computer system temporarily store the first weights and/or the first activation values in a different memory than the bulk memory or storage device (Mellempudi 607 para. [0077] recites the register file 258 provides a set of registers for the functional units of the graphics multiprocessor 234. The register file 258 provides temporary storage for operands connected to the data paths of the functional units (e.g. GPGPU cores 262, load/store units 266) of the graphics multiprocessor 234).
Regarding claim 19, the combination of Mellempudi 733 and Mellempudi 607 teaches the computer-readable storage devices or media according to claim 17, further comprising:
instructions that cause the computer system to further compress the second activation values prior to storing the further compressed values in the bulk memory or storage device (Mellempudi 733 fig. 3 and para. [0039] recite that in block 318, the host fabric interface 124 compresses the message or quantized message as requested by the application 202. The host fabric interface 124 may use any appropriate technique to compress the message (i.e. further compress the compressed activation values)).
Regarding claim 20, the combination of Mellempudi 733 and Mellempudi 607 teaches the computer-readable storage devices or media according to claim 17, further comprising:
instructions that cause the computer system to select the second block floating-point format based on an aspect of the second layer (Mellempudi 607 para. [0160] recites that the convolutional layers are sparsely connected, which differs from traditional neural network configuration found in the fully connected layers. The dimensionality reduction performed within the convolutional layers is one aspect that enables the CNN to scale to process large images (i.e. selecting the less precise second block floating-point format based on the sparsely connected convolution layer)).

Claims 10-16 are rejected under 35 U.S.C. 103 as being unpatentable over Mellempudi et al (US 20180285733 A1, hereinafter Mellempudi 733), in view of Mellempudi et al (US .
*Gao et al relies on the provisional application priority date of 9/12/2018. Examiner has noted the sections of the provisional application that support the cited sections of the non-provisional application below.
Regarding claim 10, Mellempudi 733 teaches a method of operating a computing system implementing a neural network (Mellempudi 733 para. [0032] recites that the host fabric interface 124 of the sender computing node 102a may execute a method 300 for sending training algorithm messages), the method comprising:
with the computing system:
generate activation values in a first floating-point format (Mellempudi 733 para. [0021] recites that the message may include one or more artificial neural network training algorithm values, and each artificial neural network training algorithm value may be embodied as an activation value, a weight value, or a weight update value (i.e. first activation values). Para. [0024] recites that the quantizer 208 is configured to operate with integer and floating-point real numbers (i.e. a first numerical precision for a first floating-point format));
converting at least one of the activation values to a second floating-point format different than the first floating-point format, generating compressed activation values (Mellempudi 733 fig. 3 and para. [0036] recite that in block 312, the host fabric interface 124 determines a quantization level requested by the application 202. Para. [0037] recites that in block 314, the host fabric interface 124 quantizes the message based on the determined quantization level (i.e. converts an activation value to a second floating-point format to produce compressed activation values)).
However, Mellempudi 733 does not explicitly teach forward propagating a layer of the neural network, a block floating point format, and storing the compressed activation values in a computer-readable memory or storage device.
Mellempudi 607 teaches forward propagating a layer of the neural network (Mellempudi 607 para. [0139] recites that data received at the nodes of an input layer of a feedforward network are propagated (i.e. "fed forward" or forward propagated) to the nodes of the output layer via an activation function that calculates the states of the nodes of each successive layer in the network based on coefficients ("weights") respectively associated with each of the edges connecting the layers),
a block floating point format (Mellempudi 607 fig. 21C and para. [0238] recite hardware logic can be configured to perform low-precision computations using blocked, dynamic precision data. In one embodiment, the filter data can be quantized to a blocked, dynamic precision format that requires a reduced number of bits to store the data, while shifting the scale factor in a fine-grained manner to avoid data loss. The activations at each layer can be quantized to a low-precision format, such as a dynamic fixed-point or blocked low-precision floating point format (i.e. a block floating point format)), 
and storing the compressed activation values in a bulk memory (Mellempudi 607 para. [0188] recites a method and apparatus are described to perform quantization and data representation of low-precision tensors (i.e. activation values) in deep learning applications. Each low-precision tensor may contain a data buffer and associated metadata represented as a data structure. The metadata may contain information pertaining to data type (integer, fixed-point, float or any other custom data type), precision and shared exponent(s)/scaling factor(s) necessary for performing data conversions and arithmetic operations. The data buffer may be stored as one contiguous block or many smaller blocks with as many exponents/scaling factors corresponding to each block (i.e. storing activation values in bulk memory)).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine these teachings by adapting the quantization and compression methods from Mellempudi 733 to be used with the infrastructure of the graphics processing unit from Mellempudi 607, as well as using the block floating point formats from Mellempudi 607 in the quantization and compression methods from Mellempudi 733. The combination would allow one of ordinary skill to expand the quantization and compression methods between nodes of a neural network from Mellempudi 733 and use them during the forward and backward propagations between layers of a neural network, which would save computation time and resources while training the neural network. Using the block floating point format from Mellempudi 607 would also allow one of ordinary skill to process quantization and compression calculations from Mellempudi 733 faster, which would improve the overall performance.
However, the combination of Mellempudi 733 and Mellempudi 607 does not teach delaying access to the compressed activation values stored in the bulk memory until forward propagation through all layers of the neural network has been completed.
Gao teaches delaying access to the compressed activation values stored in the bulk memory until forward propagation through all layers of the neural network has been completed (Gao fig. 4 and para. [0050] recite that training engine 201 generates 402 a first one or more quantized activation outputs of a first one or more layers of a neural network. For example, training engine 201 may add an activation quantization layer to each layer and/or convolutional block in the first one or more layers that generates an activation output. The activation quantization layer may convert floating point activation outputs from the preceding layer into values that are represented using fewer bits than the floating point activation outputs. Para. [0051] recites training engine 201 freezes 404 weights in the first one or more layers. Para. [0052] recites training engine 201 then fine-tunes 406 weights in a second one or more layers of the neural network following the first one or more layers based at least on the first one or more quantized activation outputs. Para. [0053] recites training engine 201 may continue generating quantized activation outputs of certain layers of the neural network, freezing weights in the layers, and fine-tuning weights in subsequent layers of the neural network until activation quantization in the neural network is complete 408 (i.e. delaying access to activation values until forward propagation through all layers is complete). Examiner’s Note: this citation from the non-provisional application is supported by at least figures 2-3 and paragraphs [0027]-[0031] and [0038] of the provisional application).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine these teachings by using the method of freezing completed layers of the neural network from Gao to delay access to compressed activation values stored in the bulk memory from Mellempudi 607 until the forward propagation through all layers of the neural network has been completed.
Regarding claim 11, the combination of Mellempudi 733, Mellempudi 607 and Gao teaches the method according to claim 10, wherein the second block floating-point format differs from the first block floating-point format in at least one of the following ways: a different mantissa format, a different exponent format, or a different exponent sharing scheme (Mellempudi 607 para. [0234] recites that a higher dynamic range requires more precision bits to represent the full range using integers. One solution is to split the tensor into smaller blocks with independent shared exponent while maintaining the integer data at lower precision (i.e. using a lower precision exponent for the second block floating-point format, a lower precision being one way in which the two formats differ)).
Regarding claim 12, the combination of Mellempudi 733, Mellempudi 607 and Gao teaches the method according to claim 10, wherein the second block floating-point format has a lower numerical precision than the first block floating-point format (Mellempudi 733 para. [0032] recites that the quantization level may be any level of quantization that reduces the precision of the value, thereby reducing the number of bits required to represent the message (i.e. after quantization, the second format has a lower precision than the first format used before quantization)).
Regarding claim 13, the combination of Mellempudi 733, Mellempudi 607 and Gao teaches the method according to claim 10, further comprising:
prior to the storing, further compressing the compressed activation values (Mellempudi 733 fig. 3 and para. [0039] recite that in block 318, the host fabric interface 124 compresses the message or quantized message as requested by the application 202. The host fabric interface 124 may use any appropriate technique to compress the message (i.e. further compress the compressed activation values)) stored in the computer-readable memory or storage device by one or more of the following techniques:
entropy compression, zero compression, run length encoding (Mellempudi 733 para. [0039] recites that the host fabric interface 124 may perform run-length encoding or another lossless compression algorithm), compressed sparse row compression, or compressed sparse column compression.
Regarding claim 14, the combination of Mellempudi 733, Mellempudi 607 and Gao teaches the method according to claim 10, further comprising:
with the computing system, converting the stored, compressed activation values to activation values in the first block floating-point format (Mellempudi 733 fig. 4 and para. [0043] recite that in block 410, the host fabric interface 124 decompresses the received message (i.e. converts compressed activation values to a quantized first block floating-point format)) to uncompressed activation values (Mellempudi 733 fig. 4 and para. [0046] recite that in block 420, the host fabric interface 124 dequantizes the received message to reconstruct the dequantized message based on the determined quantized level of the quantized message (i.e. converts quantized first block floating-point activation values to an uncompressed format)); and
with the computing system, performing a gradient operation with the uncompressed activation values; and with the computing system, updating weights for at least one node of the neural network based on the uncompressed activation values (Mellempudi 607 para. [0158] recites that after backpropagation, the network can then learn from those errors (that have been backpropagated and converted into an uncompressed format) using an algorithm, such as the stochastic gradient descent algorithm (i.e. performing a gradient operation with the uncompressed activation values), to update the weights of the neural network (i.e. updating weights for at least one node of the neural network based on the uncompressed activation values)).
Regarding claim 15, the combination of Mellempudi 733, Mellempudi 607 and Gao teaches the method according to claim 10, further comprising:
with the computing system, performing backward propagation (Mellempudi 607 para. [0158] recites that backpropagation of errors is a common method used to train neural networks. An input vector is presented to the network for processing. The output of the network is compared to the desired output using a loss function and an error value is calculated for each of the neurons in the output layer. The error values are then propagated backwards until each neuron has an associated error value which roughly represents its contribution to the original output) for a layer of the neural network by converting the stored, compressed activation values to activation values in the first block floating-point format (Mellempudi 733 fig. 4 and para. [0043] recite that in block 410, the host fabric interface 124 decompresses the received message (i.e. converts compressed activation values to a quantized first block floating-point format)) to uncompressed activation values (Mellempudi 733 fig. 4 and para. [0046] recite that in block 420, the host fabric interface 124 dequantizes the received message to reconstruct the dequantized message based on the determined quantized level of the quantized message (i.e. converts quantized first block floating-point activation values to an uncompressed format)); and
with the computing system, performing a gradient operation with the uncompressed activation values; and with the computing system, updating weights for a portion of at least one node of the neural network based on the uncompressed activation values (Mellempudi 607 para. [0158] recites that after backpropagation, the network can then learn from those errors (that have been backpropagated and converted into an uncompressed format) using an algorithm, such as the stochastic gradient descent algorithm (i.e. performing a gradient operation with the uncompressed activation values), to update the weights of the neural network (i.e. updating weights for at least one node of the neural network based on the uncompressed activation values)), wherein the at least one node is one of the following: a long-short term memory node (LSTM) (Mellempudi 607 para. [0242] recites that while CNN training is illustrated, the techniques described herein can also be applied to other types of neural networks, such as RNNs, LSTM, and GANs (generative adversarial networks)), a gated recurrent unit (GRU).
Regarding claim 16, the combination of Mellempudi 733, Mellempudi 607 and Gao teaches the method according to claim 10, further comprising:
(Mellempudi 607 para. [0160] recites that the convolutional layers are sparsely connected, which differs from traditional neural network configuration found in the fully connected layers. The dimensionality reduction performed within the convolutional layers is one aspect that enables the CNN to scale to process large images (i.e. selecting the less precise second block floating-point format based on the sparsely connected convolution layer)), the layer comprising a long-short term memory node (LSTM), the layer comprising a gated recurrent unit (GRU), the layer being fully-connected to another layer, the layer being sparsely-connected to another layer, the layer being an attention layer, the layer being a normalization layer.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to LEAH M FEITL whose telephone number is (571)272-8350. The examiner can normally be reached on M-F 0800-1700.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li B. Zhen can be reached on (571) 272-3768. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
	/L.M.F./             Examiner, Art Unit 2121