Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 1-12 and 14-17 are pending. 
Response to Arguments
In view of the amendment filed October 13, 2020 to the title, the claim interpretation under 35 U.S.C. § 112(f) is withdrawn; however, the correct claim interpretation under 35 U.S.C. § 112(f) is issued below.
Applicant presents the following arguments in the May 19, 2022 amendment:
A. Lo does not discuss a metadata generation module for generating metadata associated with at least one of the plurality of blocks of uncompressed neural network activation data (page 8, lines 1-26 and page 9, lines 1-4).
Examiner presents the following responses to Applicant's arguments:
With respect to applicant's argument A, Examiner respectfully disagrees with applicant's arguments. Regarding applicant's remarks concerning “a metadata generation module for generating metadata associated with at least one of the plurality of blocks of uncompressed neural network activation data,” Lo discloses that the neural network data 200 can include source code, executable code, metadata, configuration data, data structures and/or files for representing the neural network model. The compiler 132 can generate metadata that can be used to identify subgraphs, edge groupings, training data, and various other information about the neural network model during runtime (for example, portions of the neural network model), including a plurality of layers of a neural network that can be executed. For example, the metadata can include information for interfacing between the different subgraphs or other portions of the neural network model. The system may also include a neural network based on the uncompressed activation values and perform backward propagation for a layer of the neural network (see Lo: Para. 0045-0052, 0071-0087, 0124 and 0137-0149). Therefore, the applicant's claim concept is similar to what Lo discloses. Furthermore, for clarification purposes, figure 2 of Lo shows a neural network structure or diagram, which includes inputs, hidden layers, and outputs. Applicant points out that the metadata relates to source code/executable code; however, on close review of paragraph 0049, the neural network data specifically includes source code, executable code and metadata, and the metadata is not merely related to them. Furthermore, in paragraph 0052 the metadata can include information for interfacing between the different subgraphs or other portions (a subgraph, an individual layer, or a plurality of layers of a neural network) of the neural network models. An activation function (activation value / activation data) in a neural network defines how the weighted sum of the input is transformed into an output.
The input layer takes the input data/information, and once it is summed with the weights in the hidden layers, it is passed through other layers and portions (blocks); depending on the weights, the portion data is estimated using the portion data that has the stronger weight. In paragraphs 0124, 0137, and 0140, Lo discloses uncompressed activation values that can be selected. This discloses the current claim language. Further clarification through amendments to the claim language may aid in differentiating from the current prior art citations.
Applicant presents the following arguments in the May 19, 2022 amendment:
B. Lo does not discuss that the input module obtains uncompressed neural network activation data, and that said data is split into a plurality of blocks of uncompressed neural network activation data (see page 8, lines 27-28 and page 9, lines 1-4).
Examiner presents the following responses to Applicant's arguments:
With respect to applicant's argument B, Examiner respectfully disagrees with applicant's arguments. Regarding applicant's remarks concerning “the input module obtains uncompressed neural network activation data, and that said data is split into a plurality of blocks of uncompressed neural network activation data and the selection and subsequent application of a compression scheme based on the metadata associated with the given block,” Lo discloses that the quantization accelerator 186 can be programmed to perform operations for all or a portion of a layer of a NN (see Lo: Para. 0040-0045, 0050-0051, 0070 and 0137). For example, once activation values for a layer have been stored in the bulk memory, forward propagation can continue for a number of different layers in the neural network. Neural network training, such as temporary storage of activation values, can be improved by compressing a portion of these values (e.g., for an input, hidden, or output layer of a neural network). The quantization accelerator 186 can access a local memory used for storing weights, biases, input values, output values, forget values, state values, and so forth. The quantization accelerator 186 can have many inputs, where each input can be weighted by a different weight value. For example, input tensors for a neural network represented as normal floating-point numbers (for example, in a 32-bit or 16-bit floating-point format) can be converted to the illustrated block floating-point format. In the block floating-point format numbers, there is one exponent value that is shared by all of the numbers of the illustrated set. Lo also shows three alternative block floating-point formats that can be used in certain examples of the disclosed technology. These formats may be useful for two-dimensional convolutions, but the formats can be generalized to higher-dimensional convolutions as well (see Lo: Para. 0026-0040, 0041-0045, 0073-0080, 0085-0095 and FIG. 2-8). The values can be further compressed, for example using entropy compression or another suitable compression scheme.
The format for the uncompressed activation values can be selected based on a hardware accelerator used to perform neural network operations at process. The formats can differ by having a different exponent format or a different exponent sharing scheme (forward propagation and backward propagation) (see Lo: Para. 0090 and 0115). Therefore, the applicant's claim concept is similar to what Lo discloses. Neural networks are a series of algorithms that operate to recognize relationships between amounts of data. Respectfully, the examiner asks the applicant to consider the following: Lo discloses that neural networks are a set of algorithms that are designed to recognize data. Neural network algorithms are structured to include inputs, hidden layers and outputs. The input data is uncompressed (in its natural or existing form without changes made to it). The weights of the input data are summed within the hidden layers, which is when the switches are turned on or off as the layers are connected with the different nodes; this relates to the activation data. The plurality of uncompressed data, or the input data, is going to be split. In Figure 2, Lo discloses the different layers and the splitting used to connect to the hidden layers or to reach the output layer. Once the weights are summed in the hidden layer, the data is compressed based on the selections of the portions of data to get to the output layer. The scheme is the structure or plan within the algorithm by which the input data can be sorted or selected according to the different portions or blocks. The metadata has to do with the neural network model as discussed above in argument A. This discloses the current claim language. Further clarification through amendments to the claim language may aid in differentiating from the current prior art citations.
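For illustration only, the block floating-point format cited from Lo above, in which one exponent value is shared by all of the numbers of a set, can be sketched as follows. This is a minimal sketch and not Lo's implementation: the function names, the 8-bit mantissa width, and the rounding and clamping rules are assumptions made for the example. The choice of the largest exponent in the block as the shared exponent follows the description of Lo cited in this action.

```python
import math
from typing import List, Tuple


def to_block_floating_point(values: List[float],
                            mantissa_bits: int = 8) -> Tuple[int, List[int]]:
    """Quantize a block of floats to one shared exponent plus integer mantissas.

    mantissa_bits is a hypothetical width chosen for this sketch.
    """
    # math.frexp(v) returns (m, e) with v == m * 2**e and 0.5 <= |m| < 1.
    exponents = [math.frexp(v)[1] for v in values if v != 0.0]
    # Shared exponent: the largest exponent among the values in the block.
    shared_exp = max(exponents) if exponents else 0
    scale = 2 ** (mantissa_bits - 1)
    limit = scale - 1
    mantissas = []
    for v in values:
        m = round(v / 2 ** shared_exp * scale)
        # Clamp so the mantissa fits the signed mantissa range.
        mantissas.append(max(-limit, min(limit, m)))
    return shared_exp, mantissas


def from_block_floating_point(shared_exp: int, mantissas: List[int],
                              mantissa_bits: int = 8) -> List[float]:
    """Reconstruct approximate floats from the shared-exponent block."""
    scale = 2 ** (mantissa_bits - 1)
    return [m / scale * 2 ** shared_exp for m in mantissas]


shared_exp, mantissas = to_block_floating_point([1.0, 0.5, -0.25, 0.0])
restored = from_block_floating_point(shared_exp, mantissas)
```

Values much smaller than the largest value in the block lose precision under this scheme, because every mantissa is scaled by the same exponent.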
Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.  
Such claim limitation(s) is/are:
"metadata generation module" in claim 1
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
Claim 1: ‘metadata generation module’ refers to the metadata generation module 140, which is arranged to generate metadata for each block of input data 110 using hardware logic, such as AND, OR, NOR, NAND, and NOT gates, along with other hardware such as registers and flip-flops (Application ¶ [0020]).
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1, 3-6, 8-10, 12 and 15-17 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Lo et al. (US 2020/0210838 A1, hereinafter Lo).
Regarding independent claim(s) 1, Lo discloses a processor arranged to compress neural network activation data comprising: an input module for obtaining uncompressed neural network activation data from memory (Lo discloses a processor (also referred to as a central processing unit (CPU)) (see Lo: Para. 0037-0045). The computing system may also include a compressor that is further configured to further compress the compressed activation values prior to storing. In the disclosed technology, a neural network accelerator is configured to perform training operations for layers of a neural network (see Lo: Para. 0040-0042 and 0137-0140). The interconnect can be used to transmit input/output data to and from the quantization accelerators (for storing weights, biases, input values and output values) (see Lo: Para. 0045-0046). A gradient operation is performed with the uncompressed activation values. Such gradient operations are used as part of a neural network training process. The neural network is based on the uncompressed activation values, where the at least one node is one of the following: a long-short term memory node (see Lo: Para. 0125 and 0137-0140). This reads on the claim concepts of a processor arranged to compress neural network activation data comprising: an input module for obtaining uncompressed neural network activation data from memory);
a block creation module arranged to split the uncompressed neural network activation data into a plurality of blocks of uncompressed neural network activation data (Lo discloses that the computing system may also perform a gradient operation with the uncompressed activation values (layers of a neural network/split) for a plurality of layers of a neural network. For example, the quantization accelerator 186 can be programmed to perform operations for all or a portion of a layer of a NN (see Lo: Para. 0040-0045, 0050-0051, 0070 and 0137). For example, once activation values for a layer have been stored in the bulk memory, forward propagation can continue for a number of different layers in the neural network. Neural network training, such as temporary storage of activation values, can be improved by compressing a portion of these values (e.g., for an input, hidden, or output layer of a neural network). The quantization accelerator 186 can access a local memory used for storing weights, biases, input values, output values, forget values, state values, and so forth. The quantization accelerator 186 can have many inputs, where each input can be weighted by a different weight value. For example, input tensors for a neural network represented as normal floating-point numbers (for example, in a 32-bit or 16-bit floating-point format) can be converted to the illustrated block floating-point format. In the block floating-point format numbers, there is one exponent value that is shared by all of the numbers of the illustrated set. Lo also shows three alternative block floating-point formats that can be used in certain examples of the disclosed technology. These formats may be useful for two-dimensional convolutions, but the formats can be generalized to higher-dimensional convolutions as well (see Lo: Para. 0026-0040, 0041-0045, 0073-0080, 0085-0095 and FIG. 2-8). This reads on the claim concepts of a block creation module arranged to split the uncompressed neural network activation data into a plurality of blocks of uncompressed neural network activation data);
a metadata generation module for generating metadata associated with at least one of the plurality of blocks of uncompressed neural network activation data (Lo discloses that the neural network data 200 can include source code, executable code, metadata, configuration data, data structures and/or files for representing the neural network model. The compiler 132 can generate metadata that can be used to identify subgraphs, edge groupings, training data, and various other information about the neural network model during runtime (for example, portions of the neural network model), including a plurality of layers of a neural network that can be executed. For example, the metadata can include information for interfacing between the different subgraphs or other portions of the neural network model. The system may also include a neural network based on the uncompressed activation values and perform backward propagation for a layer of the neural network (see Lo: Para. 0045-0052, 0071-0087, 0124 and 0137-0149). This reads on the claim concepts of a metadata generation module for generating metadata associated with at least one of the plurality of blocks of uncompressed neural network activation data);
a selection module for selecting a compression scheme for each of the plurality of blocks of uncompressed neural network activation data, wherein the compression scheme is based on the metadata associated with the block of uncompressed neural network activation data (Lo discloses that the neural network data 200 can include a description of nodes, edges, groupings, weights, biases, activation functions, and/or tensor values. The shared exponent 330 is selected to be the largest exponent from among the original normal-precision numbers in the neural network model. The values can be further compressed, for example using entropy compression or another suitable compression scheme. The format for the uncompressed activation values can be selected based on a hardware accelerator used to perform neural network operations at process. The formats can differ by having a different exponent format or a different exponent sharing scheme (forward propagation and backward propagation). For example, the metadata can include information for interfacing between the different subgraphs or other portions of the neural network model. The system may also include a neural network based on the uncompressed activation values and perform backward propagation for a layer of the neural network. The compiler 132 can generate metadata that can be used to identify subgraphs, edge groupings, training data, and various other information about the neural network model during runtime (see Lo: Para. 0052, 0076, 0091, 0094, 0115, 0124 and 0140). This reads on the claim concepts of a selection module for selecting a compression scheme for each of the plurality of blocks of uncompressed neural network activation data, wherein the compression scheme is based on the metadata associated with the block of uncompressed neural network activation data);
a compression module for applying the selected compression scheme to the corresponding block of uncompressed neural network activation data to produce compressed neural network activation data (Lo discloses that the neural network module 130 can further provide utilities to allow for training and retraining of a neural network implemented with the module (plurality of layers of a neural network). Lo performs backward propagation for a layer of the neural network by converting the stored, compressed activation values in the second block floating-point format to uncompressed activation values. For example, when performing forward propagation for a layer of a neural network, first activation values are produced in a first block floating-point format. These first activation values can be converted to a second block floating-point format to produce compressed activation values in the second block floating-point format. The values can be further compressed, for example using entropy compression or another suitable compression scheme (see Lo: Para. 0042-0045, 0091-0097, 0115 and FIG. 2-8). This reads on the claim concepts of a compression module for applying the selected compression scheme to the corresponding block of uncompressed neural network activation data to produce compressed neural network activation data); and
an output module for outputting the compressed neural network activation data (Lo discloses that the output value can be accessed and sent to a different NN processor core and/or to the neural network module. The training data includes a set of input data for applying to the neural network model 200 and a desired output from the neural network model for each respective dataset of the input data. The modelling framework 131 can be used to train the neural network model with the training data. An output of the training is the weights and biases that are associated with each node of the neural network model. Each node produces an output by applying a weight to each input generated from the preceding node and collecting the weights to produce an output value. In some examples, each individual node can have an activation function (e.g., for an input, hidden, or output layer of a neural network) (see Lo: Para. 0041-0051, 0065-0070, 0090-0095 and 0137). This reads on the claim concepts of an output module for outputting the compressed neural network activation data).
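For reference, the sequence of elements recited in claim 1 (obtain the data, split it into blocks, generate metadata per block, select a compression scheme from that metadata, apply the scheme, and output the result) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the applicant's or Lo's implementation: the block size, the zero-fraction metadata, the two schemes ("raw" and "zero_rle"), and all function names are hypothetical choices made only to show the data flow.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

BLOCK_SIZE = 8  # hypothetical block length; the claim does not fix one


@dataclass
class BlockMetadata:
    zero_fraction: float  # hypothetical metadata: share of zero activations


def split_into_blocks(data: List[int], size: int = BLOCK_SIZE) -> List[List[int]]:
    """Block creation: split the activation data into fixed-size blocks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def generate_metadata(block: List[int]) -> BlockMetadata:
    """Metadata generation: describe one block of uncompressed data."""
    zeros = sum(1 for v in block if v == 0)
    return BlockMetadata(zero_fraction=zeros / len(block))


def select_scheme(meta: BlockMetadata) -> str:
    """Selection: choose a scheme for the block based on its metadata."""
    return "zero_rle" if meta.zero_fraction >= 0.5 else "raw"


def compress_block(block: List[int], scheme: str) -> Tuple[str, list]:
    """Compression: apply the selected scheme to the corresponding block."""
    if scheme == "raw":
        return ("raw", list(block))
    # Zero run-length: (zeros_before, value); trailing zeros as (count, None).
    out: List[Tuple[int, Optional[int]]] = []
    run = 0
    for v in block:
        if v == 0:
            run += 1
        else:
            out.append((run, v))
            run = 0
    if run:
        out.append((run, None))
    return ("zero_rle", out)


def decompress_block(tagged: Tuple[str, list]) -> List[int]:
    """Inverse of compress_block, used here to check the round trip."""
    scheme, payload = tagged
    if scheme == "raw":
        return list(payload)
    out: List[int] = []
    for run, v in payload:
        out.extend([0] * run)
        if v is not None:
            out.append(v)
    return out


def compress_activations(data: List[int]) -> List[Tuple[str, list]]:
    """Output: the per-block compressed results, each tagged with its scheme."""
    return [compress_block(b, select_scheme(generate_metadata(b)))
            for b in split_into_blocks(data)]


activations = [0, 0, 0, 5, 0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 0, 0]
compressed = compress_activations(activations)
restored = [v for blk in compressed for v in decompress_block(blk)]
```

In this sketch the scheme is chosen independently for each block, so sparse and dense regions of the same activation data can be encoded differently.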
Regarding dependent claim(s) 3, Lo discloses the processor arranged to compress neural network activation data according to claim 1. Lo further discloses a combination module for combining a plurality of outputs of the compression module (Lo discloses that a layer can include nodes that have a subset of common inputs with the other nodes of the layer and/or provide outputs to a subset of common destinations of the other nodes of the layer. The inputs, outputs, and parameters of the layers are tensors. The output function of the layer can be the de-quantized representation of f(), or alternatively, the output function can include additional terms, such as an activation function or the addition of a bias, that are performed using normal-precision floating point (after de-quantization) or using quantized floating point (before de-quantization). An output layer is formed from a fourth set 240 of nodes (including node 245) (see Lo: Para. 0042-0065 and 0090). This reads on the claim concepts of further comprising a combination module for combining a plurality of outputs of the compression module).
Regarding dependent claim(s) 4, Lo discloses the processor arranged to compress neural network activation data according to claim 1. Lo further discloses wherein the processor is a neural processing unit (Lo discloses that the neural network accelerator 180 can include a tensor processing unit 182, reconfigurable logic devices 184, and/or one or more neural processing cores (such as the quantization accelerator 186). Models are typically executed on a general-purpose processor (also referred to as a central processing unit (CPU)). The processing unit 1110 executes computer executable instructions and may be a real or a virtual processor (see Lo: Para. 0044-0059 and 0129-0137). This reads on the claim concepts of wherein the processor is a neural processing unit).
Regarding dependent claim(s) 5, Lo discloses the processor arranged to compress neural network activation data according to claim 1. Lo further discloses wherein the processor is any of: an image processor; a central processing unit; and a graphics processing unit (Lo discloses a deep neural network (DNN) 200 that can be used to perform enhanced image processing using disclosed BFP implementations. The modelling framework 131 can use the CPU 120 and the special purpose processors (e.g., the GPU 122 and/or the neural network accelerator 180) to execute the neural network model with increased performance as compared with using only the CPU 120 (see Lo: Para. 0042-0062 and 0129-0137). This reads on the claim concepts of wherein the processor is any of: an image processor; a central processing unit; and a graphics processing unit).
Regarding dependent claim(s) 6, Lo discloses the processor arranged to compress neural network activation data according to claim 5. Lo further discloses wherein the output module is arranged to output the compressed neural network activation data to at least one neural processing unit (Lo discloses that the neural network accelerator 180 can include a tensor processing unit 182, reconfigurable logic devices 184, and/or one or more neural processing cores (such as the quantization accelerator 186). The quantization accelerator 186 can be configured in hardware, software, or a combination of hardware and software. The quantization accelerator 186 can access a local memory used for storing weights, biases, input values, output values, forget values, state values, and so forth (the activation values can be later retrieved) (see Lo: Para. 0040-0045). This reads on the claim concepts of wherein the output module is arranged to output the compressed neural network activation data to at least one neural processing unit).
Regarding dependent claim(s) 8, Lo discloses the processor arranged to compress neural network activation data according to claim 1. Lo further discloses a plurality of metadata generation modules, selection modules and compression modules, and wherein the output module is arranged to combine the compressed neural network activation data of each of the compression modules into a single compressed output (Lo discloses that the neural network data 200 can include source code, executable code, metadata, configuration data, data structures and/or files for representing the neural network model. The compiler 132 can generate metadata that can be used to identify subgraphs, edge groupings, training data, and various other information about the neural network model during runtime (for example, portions of the neural network model), including a plurality of layers of a neural network that can be executed (see Lo: Para. 0045-0052 and 0071-0087). This reads on the claim concepts of a plurality of metadata generation modules. The compressed activation values are expressed in a second block floating-point format that can differ from a first block floating-point format used to perform forward propagation calculations in at least one of the following ways: having a different mantissa format, having a different exponent format, or having a different exponent sharing scheme. The values can be further compressed, for example using entropy compression or another suitable compression scheme. The first activation values are converted to a normal-precision floating-point format prior to being converted to the second block floating-point format. The neural network data 200 can include source code, executable code, metadata, configuration data, data structures and/or files for representing the neural network model. The neural network module 130 can further provide utilities to allow for training and retraining of a neural network implemented with the module.
A single exponent 524 and the spatial pixel values 526 share a single exponent 528. For example, the functions of the different tools (131, 132, and 133) can be combined into a single modelling and execution environment. A first set 210 of nodes (including nodes 215 and 216) forms an input layer. Each node of the set 210 is connected to each node in a first hidden layer formed from a second set 220 of nodes (including nodes 225 and 226). A second hidden layer is formed from a third set 230 of nodes, including node 235. An output layer is formed from a fourth set 240 of nodes (including node 245) (see Lo: Para. 0030-0048, 0062-0065, 0080-0088, 0091-0097, 0107-0115, 0122, 0139 and FIG. 2-8). This reads on the claim concepts of selection modules and compression modules, and wherein the output module is arranged to combine the compressed neural network activation data of each of the compression modules into a single compressed output).
Regarding dependent claim(s) 9, Lo discloses the processor arranged to compress neural network activation data according to claim 1. Lo further discloses wherein the output module comprises a memory arranged to store the compressed output associated with each block, and wherein the output module is arranged to combine the compressed neural network activation data associated with each of the plurality of blocks (Lo discloses storing activation values from a neural network in a compressed format for use during forward and backward propagation training of the neural network. The compressed activation values are stored in the bulk memory for use during backward propagation. NN weights and activation values can be represented in a lower-precision quantized format with an acceptable level of error introduced. Neural network training, such as temporary storage of activation values, can be improved by compressing a portion of these values (e.g., for an input, hidden, or output layer of a neural network). The quantization accelerator 186 can access a local memory used for storing weights, biases, input values, output values, forget values, state values, and so forth. The quantization accelerator 186 can have many inputs, where each input can be weighted by a different weight value. For example, input tensors for a neural network represented as normal floating-point numbers (for example, in a 32-bit or 16-bit floating-point format) can be converted to the illustrated block floating-point format. In the block floating-point format numbers, there is one exponent value that is shared by all of the numbers of the illustrated set. Lo also shows three alternative block floating-point formats that can be used in certain examples of the disclosed technology. These formats may be useful for two-dimensional convolutions, but the formats can be generalized to higher-dimensional convolutions as well. The neural network module 130 can further provide utilities to allow for training and retraining of a neural network implemented with the module (plurality of layers of a neural network). The neural network can be compressed and stored in memory (see Lo: Para. 0026-0040, 0041-0045, 0073-0080, 0085-0095 and FIG. 2-8). This reads on the claim concepts of wherein the output module comprises a memory arranged to store the compressed output associated with each block, and wherein the output module is arranged to combine the compressed neural network activation data associated with each of the plurality of blocks).
Regarding independent claim(s) 10, Lo discloses a method for compressing neural network activation data, the method comprising the steps of: obtaining uncompressed neural network activation data from memory; splitting the uncompressed neural network activation data into a plurality of blocks (Lo discloses the computing system may also perform a gradient operation with the uncompressed activation values and perform a gradient operation with the uncompressed activation values (layers of a neural network/split). A plurality of layers of a neural network. For example, the quantization accelerator 186 can be programmed to perform operations for all or a portion of a layer of a NN, (see Lo: Para. 0040-0045, 0050-0051, 0070 and 0137). For example, once activation values for a layer have been stored of the bulk memory, forward propagation can continue for a number of different layers in the neural network. Neural network training, such as temporary storage of activation values, can be improved by compressing a portion of these values (e.g., for an input, hidden, or output layer of a neural network). The quantization accelerator 186 can access a local memory used for storing weights, biases, input values, output values, forget values, state values, and so forth. The quantization accelerator 186 can have many inputs, where each input can be weighted by a different weight value. For example, input tensors for a neural network represented as normal floating- point numbers (for example, in a 32-bit or 16-bit floating point format) can be converted to the illustrated block floating point format. The block floating point format numbers, there is one exponent value that is shared by all of the numbers of the illustrated set. Three alternative block floating-point formats, as can be used in certain examples of the disclosed technology. 
These formats may be useful for two-dimensional convolutions, but the formats can be generalized to higher-dimensional convolutions as well, (see Lo: Para. 0026-0040, 0041-0045, 0073-0080, 0085-0095 and FIG. 2-8). This reads on the claim concepts of a method for compressing neural network activation data, the method comprising the steps of: obtaining uncompressed neural network activation data from memory; splitting the uncompressed neural network activation data into a plurality of blocks);
	generating metadata for at least one of the plurality of blocks (Lo discloses the neural network data 200 can include source code, executable code, metadata, configuration data, data structures and/or files for representing the neural network model. The compiler 132 can generate metadata that can be used to identify subgraphs, edge groupings, training data, and various other information about the neural network model during runtime (for example, portions of the neural network model). A plurality of layers of a neural network that can be executed on the neural network, (see Lo: Para. 0045-0052 and 0071-0087). This reads on the claim concepts of generating metadata for at least one of the plurality of blocks);
	selecting a compression scheme for each of the plurality of blocks, wherein the compression scheme is based on the metadata associated with the block; applying the selected compression schemes to the corresponding block, to produce compressed neural network activation data (Lo discloses the compressed activation values are expressed in a second block floating-point format that can differ from a first block floating-point format used to perform forward propagation calculations in at least one of the following ways: having a different mantissa format, having a different exponent format, or having a different exponent sharing scheme. The values can be further compressed by, for example, using entropy compression or another suitable compression scheme. The first activation values are converted to a normal precision floating-point format prior to being converted to the second block floating-point format. The neural network data 200 can include source code, executable code, metadata, configuration data, data structures and/or files for representing the neural network model. The neural network module 130 can further provide utilities to allow for training and retraining of a neural network implemented with the module (plurality of layers of a neural network), (see Lo: Para. 0042-0045, 0091-0097, 0107-0115, 0122, 0139 and FIG. 2-8). This reads on the claim concepts of selecting a compression scheme for each of the plurality of blocks, wherein the compression scheme is based on the metadata associated with the block; applying the selected compression schemes to the corresponding block, to produce compressed neural network activation data); and
	outputting the compressed neural network activation data (Lo discloses the output value can be accessed and sent to a different NN processor core and/or to the neural network module. The training data includes a set of input data for applying to the neural network model 200 and a desired output from the neural network model for each respective dataset of the input data. The modelling framework 131 can be used to train the neural network model with the training data. An output of the training is the weights and biases that are associated with each node of the neural network model. Each node produces an output by applying a weight to each input generated from the preceding node and collecting the weights to produce an output value. In some examples, each individual node can have an activation function (e.g., for an input, hidden, or output layer of a neural network), (see Lo: Para. 0041-0051, 0065-0070, 0090-0095 and 0137). This reads on the claim concepts of outputting the compressed neural network activation data).
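For illustration only, the shared-exponent block floating-point arrangement cited from Lo above (one exponent value shared by all of the numbers of a set) can be sketched as follows. This is a minimal sketch under stated assumptions and is not Lo's implementation: the function names, the 8-bit signed mantissa width, and the round-to-nearest rule are hypothetical choices made for the example.

```python
import math


def to_block_floating_point(values, mantissa_bits=8):
    """Quantize a block of floats to signed integer mantissas plus ONE shared exponent.

    Hypothetical helper: the mantissa width and rounding rule are assumptions.
    """
    max_abs = max((abs(v) for v in values), default=0.0)
    if max_abs == 0.0:
        # An all-zero (or empty) block needs no scaling.
        return [0] * len(values), 0

    # frexp returns (m, e) with max_abs == m * 2**e and 0.5 <= m < 1.
    _, e = math.frexp(max_abs)
    # Choose the shared exponent so the largest magnitude fits in the mantissa.
    shared_exponent = e - (mantissa_bits - 1)
    limit = 2 ** (mantissa_bits - 1) - 1

    mantissas = []
    for v in values:
        q = round(v / 2.0 ** shared_exponent)
        # Clamp in case rounding pushes the largest value just past the limit.
        mantissas.append(max(-limit, min(limit, q)))
    return mantissas, shared_exponent


def from_block_floating_point(mantissas, shared_exponent):
    """Reconstruct approximate floats from mantissas and the shared exponent."""
    return [m * 2.0 ** shared_exponent for m in mantissas]
```

Under these assumptions, the block [0.5, -0.25, 0.125, 0.0] maps to mantissas [64, -32, 16, 0] with a shared exponent of -7, so only one exponent is stored for the whole block.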
Regarding dependent claim(s) 12, Lo discloses the method for compressing neural network
activation data according to claim 10. Lo further discloses wherein each block is an 8 x 8 block of data
(Lo discloses that numbers in a 16-bit floating-point format, a 32-bit floating-point format, a 64-bit
floating-point (8 x 8 block of data) format, or an 80-bit floating-point format can be converted to
quantized-precision format numbers, which may allow for performance benefits in performing
operations, (see Lo: Para. 0026-0037 and 0055). This reads on the claim concepts of wherein each block is an 8 x 8 block of data).
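For illustration only, splitting two-dimensional activation data into 8 x 8 blocks, as recited in claim 12, can be sketched as follows. The function name split_into_blocks, the row-major tile order, and the handling of partial edge tiles are assumptions made for the example; they are not taken from Lo or from the application.

```python
def split_into_blocks(activations, block_h=8, block_w=8):
    """Split a 2-D list of activation values into block_h x block_w tiles.

    Tiles are returned in row-major order. Edge tiles are smaller when the
    dimensions are not multiples of the block size (an assumed convention).
    """
    rows = len(activations)
    cols = len(activations[0]) if rows else 0
    blocks = []
    for r in range(0, rows, block_h):
        for c in range(0, cols, block_w):
            # Slice the rows of this tile, then the columns within each row.
            tile = [row[c:c + block_w] for row in activations[r:r + block_h]]
            blocks.append(tile)
    return blocks
```

A 16 x 16 activation map, for example, yields four 8 x 8 blocks under this sketch.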
Regarding independent claim(s) 15, Lo discloses a system for compressing neural network
activation data, the system comprising a processor arranged to compress neural network activation data
according to claim 1 (Lo discloses storing activation values from a neural network in a compressed
format for use during forward and backward propagation training of the neural network. Neural
network training, such as temporary storage of activation values, can be improved by compressing a
portion of these values (e.g., for an input, hidden, or output layer of a neural network). The
quantization accelerator 186 can access a local memory used for storing weights, biases, input values,
output values, forget values, state values, and so forth. In the block floating-point format numbers,
there is one exponent value that is shared by all of the numbers of the illustrated set. The computing
system may also include a compressor that is configured to further compress the compressed
activation values, (see Lo: Para. 0026-0040, 0041-0045, 0073-0080, 0085-0095, 0137 and FIG. 2-8). This
reads on the claim concepts of a system for compressing neural network activation data, the system
comprising a processor arranged to compress neural network activation data according to claim 1).
	Regarding dependent claim(s) 16, Lo discloses the system for compressing neural network
activation data according to claim 15. Lo further discloses wherein the processor is a neural processing
unit (Lo discloses the neural network accelerator 180 can include a tensor processing unit 182, reconfigurable logic devices 184, and/or one or more neural processing cores (such as the quantization accelerator 186). Neural network models are typically executed on a general-purpose processor (also
referred to as a central processing unit (CPU)). The processing unit 1110 executes computer executable instructions and may be a real or a virtual processor, (see Lo: Para. 0044-0059 and 0129-0137).
This reads on the claim concepts of wherein the processor is a neural processing unit).
	Regarding dependent claim(s) 17, Lo discloses the system for compressing neural network
activation data according to claim 15. Lo further discloses wherein the processor is any of an image
processor; a central processing unit; and a graphics processing unit, and wherein the system further
comprises a neural processing unit for receiving compressed neural network activation data output by
the processor (Lo discloses a deep neural network (DNN) 200 that can be used to perform enhanced
image processing using disclosed BFP implementations. The modelling framework 131 can use the
CPU 120 and the special purpose processors (e.g., the GPU 122 and/or the neural network accelerator
180) to execute the neural network model with increased performance as compared with using only
the CPU 120. The neural network module 130 with a message including an identifier of a receiving
node at the server and a payload that includes values such as weights, biases, and/or tensors that are
sent back to the overall neural network model. The dequantizer 724 can receive the activation values
from local memory used to implement the quantized layer 710. The system can receive the activation
values and use additional functions, such as an error function or an objective function, to calculate the
output and including multi-processor systems, (see Lo: Para. 0042-0062, 0092-0110 and 0128-0137).
This reads on the claim concepts of wherein the processor is any of an image processor; a central
processing unit; and a graphics processing unit, and wherein the system further comprises a neural
processing unit for receiving compressed neural network activation data output by the processor). 
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 2 and 11 are rejected under 35 U.S.C. 103 as being unpatentable over Lo et al. (US 2020/0210838 A1, hereinafter Lo), in view of Desai et al. (US 2019/0041961 A1, hereinafter Desai). 
Regarding dependent claim(s) 2, Lo discloses the processor arranged to compress neural network activation data according to claim 1. However, Lo does not appear to specifically disclose wherein the metadata generation module is arranged to calculate at least one of: a zero-counter arranged to calculate a number of zero point values of the block of neural network activation data; a unique non-zero counter arranged to calculate a number of unique non-zero point values of the block of neural network activation data; and a non-zero counter arranged to calculate a number of non-zero point values of the block of neural network activation data.
In the same field of endeavor, Desai discloses wherein the metadata generation module is arranged to calculate at least one of: a zero-counter arranged to calculate a number of zero-point values of the block of neural network activation data (Desai discloses the hidden layer transforms input received by the input layer into a representation that is useful for generating output in the output layer (neural network). The metadata (auxiliary surface) indicating a zero-value on all activation nodes. A processor 2210 to compare activation node metadata values to the output of activation. A neural network by updating metadata to indicate zero values, the neural network including a plurality of layers, and wherein the apparatus is to compare outputs for the neural network to the metadata values, (see Desai: Para. 0161-0170, 0202, 0220-0229, 0232 and FIG. 16-21). This reads on the claim concepts of wherein the metadata generation module is arranged to calculate at least one of: a zero-counter arranged to calculate a number of zero-point values of the block of neural network activation data);
a unique non-zero counter arranged to calculate a number of unique non-zero point values of the block of neural network activation data; and a non-zero counter arranged to calculate a number of non-zero point values of the block of neural network activation data (Desai discloses a processor comparing activation node metadata values to the output of activation, with the value being written to memory only if the value is non-zero (non-zero point values), utilizing a Write if Not Zero (unique non-zero) operation to limit writing to non-zero values. The comparator 2125 is to compare values and is only to write non-zero values back to memory 2150. The neural network includes a plurality of layers (block of neural network); outputs for the neural network are compared to the metadata values, and an output is written to memory only if the output is non-zero. In a CNN layer 2305, activations 2310 and weights 2350 are utilized in calculations, implementing matrix multiplication. Also shown is the receipt of bias 0 values 2320 and the generation of the bias 1 values (activations of blocks to be zero or low values), (see Desai: Para. 0204-0214, 0216-0224, 0229-0242 and 0255-0264). This reads on the claim concepts of a unique non-zero counter arranged to calculate a number of unique non-zero point values of the block of neural network activation data; and a non-zero counter arranged to calculate a number of non-zero point values of the block of neural network activation data).
Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the neural network activation compression of Lo in order to have incorporated the unique non-zero point values and non-zero point values, as disclosed by Desai, since both of these mechanisms are directed to neural networks, in which an activation function defines how the weighted sum of the input is transformed into an output from a node or nodes in a layer of the network. The choice of activation function has a large impact on the capability and performance of the neural network, and different activation functions may be used in different parts of the model. The activation function is used within or after the internal processing of each node in the network, although networks are designed to use the same activation function for all nodes in a layer. A network may have three types of layers: input layers that take raw input from the domain, hidden layers that take input from another layer and pass output to another layer, and output layers that make a prediction. All hidden layers typically use the same activation function. The output layer will typically use a different activation function from the hidden layers and is dependent upon the type of prediction required by the model. A hidden layer in a neural network is a layer that receives input from another layer (such as another hidden layer or an input layer) and provides output to another layer (such as another hidden layer or an output layer). A neural network may have zero or more hidden layers. Activation values from a neural network can be stored in a compressed format for use during forward and backward propagation training of the neural network. The compressed activation values are stored in the bulk memory for use during backward propagation. Compressing a model, or reducing its size and/or latency, means the model has fewer and smaller parameters and requires less RAM.
compression methods (both classical and learned codecs), with the original data between them. They are asked to pick the data that is closer to the original. Split this data into training and test sets and trained a network to predict which of each pair of reconstructed data human annotators preferred. These compressed data for other tasks, such as instance segmentation (finding the boundaries of objects) and object recognition. Performance and power consumption of neural networks becomes more crucial in compact devices that have constraints in processing capability and power. Neural networks are applied in processing in numerous technologies, with the areas of applications continuing to grow. Incorporating the teachings of Desai into Lo would produce a neural network processing to perform a fast clear operation to initialize activation buffers for a neural network by updating metadata to indicate zero values, the neural network including a plurality of layers, wherein the apparatus is to compare outputs for the neural network to the metadata values and to write an output to memory only if the output is non-zero, as disclosed by Desai, (see Abstract).
Regarding claim 11 (drawn to a method): claim 11 is a method claim that corresponds to the processor of claim 2. Therefore, claim 11 is rejected for at least the same reasons as the processor of claim 2.
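For illustration only, the three per-block counts recited in claims 2 and 11 (zero values, non-zero values, and unique non-zero values) can be sketched as follows. This is a minimal sketch and is not taken from Lo, Desai, or the application: the function name block_metadata and the dictionary keys are hypothetical.

```python
def block_metadata(block):
    """Compute simple per-block metadata for a 2-D block of activation values.

    Hypothetical helper: returns the number of zero values, the number of
    non-zero values, and the number of distinct non-zero values in the block.
    """
    flat = [v for row in block for v in row]
    non_zero = [v for v in flat if v != 0]
    return {
        "zero_count": len(flat) - len(non_zero),
        "non_zero_count": len(non_zero),
        "unique_non_zero_count": len(set(non_zero)),
    }
```

Under this sketch, the block [[0, 3, 3], [0, 0, 7]] has three zero values, three non-zero values, and two unique non-zero values.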
Claims 7 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Lo et al. (US 2020/0210838 A1, hereinafter Lo), in view of Fishel et al. (US 2019/0340488 A1, hereinafter Fishel). 
Regarding dependent claim(s) 7, Lo discloses the processor arranged to compress neural network activation data according to claim 1. However, Lo does not appear to specifically disclose wherein the compression module comprises at least one of: a masking compression unit for applying a masking compression technique; a look-up table compression unit for applying a look-up table compression technique; a value packing compression unit for applying a value packing compression technique; and a position packing compression unit for applying a position packing compression technique.
In the same field of endeavor, Fishel discloses wherein the compression module comprises at least one of: a masking compression unit for applying a masking compression technique; a look-up table compression unit for applying a look-up table compression technique (Fishel discloses the compression scheme such as representative values in the look up table, and patterns of zero or non-zero coefficients to be represented by masks may be determined by a compiler during a compilation process of a neural network into a series of tasks for execution by the neural processor circuit. The neural network operations using kernel data that is decompressed from compressed kernel data received from a memory external to the neural processor circuit. A mask to reconstruct a kernel from compressed kernel data. each mask may represent a specific number of kernel coefficients. Compressed Kernel Data with a Look-Up Table (LUT) to represent actual values in a look-up table corresponding to a representative kernel coefficient, (see Fishel: Para. 0022-0048, 0055-0084 and 0087-0094). This reads on the claim concepts of wherein the compression module comprises at least one of: a masking compression unit for applying a masking compression technique; a look-up table compression unit for applying a look-up table compression technique);
a value packing compression unit for applying a value packing compression technique; and a position packing compression unit for applying a position packing compression technique (Fishel discloses the input value and the corresponding kernel coefficient are multiplied in each of MAD circuits to generate a processed value 412. The original kernel of -0.044 can be compressed to index value 0 of the LUT corresponding to representative kernel coefficient -0.040. The number and locations of non-zero coefficients can be represented by a mask. The mask indicates positions of the sparse kernel with a coefficient that is zero, (see Fishel: Para. 0085-0088 and 0090-0095). This reads on the claim concepts of a value packing compression unit for applying a value packing compression technique; and a position packing compression unit for applying a position packing compression technique).
Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the neural network activation compression of Lo in order to have incorporated the look-up table and masking compression technique, as disclosed by Fishel, since both of these mechanisms are directed to machine learning systems or models that, depending on the types of input data and operations to be performed, can be configured differently. Such varying configuration would include, for example, pre-processing operations, number of channels in input data, kernel data to be used, non-linear function to be applied to convolution result, and applying of various post processing operations. Using a central processing unit (CPU) and its main memory to instantiate and execute machine learning systems or models of various configuration is relatively easy because such systems or models can be instantiated with mere updates to code. However, relying solely on the CPU for various operations of these machine learning systems or models would consume significant bandwidth of the CPU as well as increase the overall power consumption. A convolutional neural network (CNN) is a class of machine learning technique that primarily uses convolution between input data and kernel data, which can be decomposed into multiplication and accumulation operations. Masking is a way to tell sequence-processing layers that certain timesteps in an input are missing, and thus should be skipped when processing the data. A mask is a binary image consisting of zero and non-zero values. If a mask is applied to another binary image or to a grayscale image of the same size, all pixels which are zero in the mask are set to zero in the output image. A lookup table, also known as a LUT, is an array that holds values which would otherwise need to be calculated. The table may be manually populated when the program is written, or the program may populate the table with values as it calculates them.
An artificial neural network (ANN) is a computing system or model that uses a collection of connected nodes to process input data. The ANN is typically organized into layers where different layers perform different types of transformation on their input. Extensions or variants of ANN such as convolution neural network (CNN), recurrent neural networks (RNN) and deep belief networks (DBN) have come to receive much attention. Incorporating the teachings of Fishel into Lo would produce a neural processor circuit that includes a kernel access circuit and multiple neural engine circuits. The kernel access circuit reads compressed kernel data from memory external to the neural processor circuit. Each neural engine circuit receives compressed kernel data from the kernel access circuit, as disclosed by Fishel, (see Abstract). 
Regarding claim 14 (drawn to a method): claim 14 is a method claim that corresponds to the processor of claim 7. Therefore, claim 14 is rejected for at least the same reasons as the processor of claim 7.
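For illustration only, the combination of a mask and a look-up table cited from Fishel above can be sketched as follows. This is a minimal sketch and is not Fishel's circuit: the function names and the nearest-value matching rule are assumptions. The mapping of -0.044 to index 0 with representative value -0.040 follows the passage cited above; the remaining table entries in the usage example are hypothetical.

```python
def compress_with_mask_and_lut(values, lut):
    """Compress values into (mask, indices).

    mask[i] is 1 where values[i] is non-zero and 0 where it is zero.
    indices holds, for each non-zero value in order, the index of the
    nearest representative value in lut (assumed matching rule).
    """
    mask = [0 if v == 0 else 1 for v in values]
    indices = [
        min(range(len(lut)), key=lambda i: abs(lut[i] - v))
        for v in values
        if v != 0
    ]
    return mask, indices


def decompress_with_mask_and_lut(mask, indices, lut):
    """Rebuild approximate values: zeros where the mask is 0, LUT entries elsewhere."""
    next_index = iter(indices)
    return [lut[next(next_index)] if bit else 0.0 for bit in mask]


# Usage: index 0 -> -0.040 follows the cited passage; other entries are made up.
lut = [-0.040, 0.0125, 0.5]
mask, indices = compress_with_mask_and_lut([-0.044, 0.0, 0.0, 0.48], lut)
restored = decompress_with_mask_and_lut(mask, indices, lut)
```

The mask records where the zeros are, so only the table indices of the non-zero values need to be stored alongside it; the reconstruction is lossy because each value is replaced by its representative table entry.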
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action.
Contact Information
Any inquiry concerning this communication or earlier communications from the examiner should be directed to YOHANES Demiss KELEMEWORK whose telephone number is (571)272-8772. The examiner can normally be reached Monday-Friday 8:00 am-5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ashish Thomas can be reached on 571-272-0631. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. 
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/YOHANES D KELEMEWORK/               Examiner, Art Unit 2164

/ASHISH THOMAS/               Supervisory Patent Examiner, Art Unit 2164