Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Status of Claims
This action is in response to the amendments filed 07/22/2022. Claims 1, 3-9, 11-17, 19 and 20 have been amended, and claims 2, 10, and 18 have been cancelled. Claims 1, 3-9, 11-17, 19 and 20 are currently pending.

Response to Arguments
Claims 2, 10, and 18 have been cancelled; therefore, the rejections of claims 2, 10, and 18 no longer stand.
The terminal disclaimer filed on 07/22/2022 disclaiming the terminal portion of any patent granted on this application which would extend beyond the expiration date of US 20200234130 A1 has been reviewed and is accepted.  The terminal disclaimer has been recorded.
Applicant’s arguments regarding the 101 rejection have been fully considered but they are not persuasive. Applicant argues on page 10 that the claimed subject matter provides an improvement in the functioning of a computing-type device in that it reduces the memory requirements associated with a neural network. Examiner notes that the use of encoding to reduce memory requirements is a well-understood technology, as shown by Kodama (US 20130077881 A1), which teaches in paragraph [0003] that the “technology of encoding data to be stored in a memory in a predetermined coding unit (for example, a block size unit) is well known in order to reduce a use amount of the memory”. One of ordinary skill would recognize that the neural network in this claim is merely the field of use or technological environment in which the well-known process of encoding data is occurring in order to reduce memory use – see MPEP 2106.05(h). Applicant further describes a three-part process on page 10: sparsification, quantization, and entropy coding. These steps are not claimed in a way that precludes them from being interpreted as mathematical calculations used to reorganize and store data. The 101 rejections have been updated to include the amended limitations and to clarify the reasoning given for the limitations that were not amended.
Applicant’s arguments regarding the prior art rejection have been fully considered but are moot because of the new grounds of rejection. Applicant argues on page 12 that none of the previously cited references teach a lossless compression mode selected from a group including Sparse-Exponential-Golomb encoding, Sparse-Exponential-Golomb-RemoveMin encoding, and Sparse fixed length encoding. The Marpe reference was previously relied upon to teach a fixed length encoding method; it has been replaced by Yang et al (“Supervised Translation-Invariant Sparse Coding”) to teach a sparse fixed length encoding technique. The prior art rejections have been updated to include the amended limitations and to clarify the reasoning given for the limitations that were not amended.

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

This application includes one or more claim limitations that use the word “means” or “step” but are nonetheless not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph because the claim limitations recite sufficient structure, materials, or acts to entirely perform the recited function. Such claim limitations are:
“…a processor configured to: sparsify a number of non-zero values of the activation map…” in claim 1.
“…wherein the processor is further configured to encode the at least one block of values…” in claim 4.
“…wherein the processor is further configured to output the at least one block of values…” in claim 6.
“…wherein the processor is further configured to decode the at least one block of values…” in claim 7.
“…wherein the processor is further configured to quantize the floating-point values…” in claim 8.
Because these claim limitations are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, they are not being interpreted to cover only the corresponding structure, material, or acts described in the specification as performing the claimed function, and equivalents thereof.
If applicant intends to have these limitations interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to remove the structure, materials, or acts that perform the claimed function; or (2) present a sufficient showing that the claim limitation(s) does/do not recite sufficient structure, materials, or acts to perform the claimed function.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1, 3-9, 11-17, and 19-20 are rejected under 35 U.S.C. 101. Claims 1 and 3-8 are directed to a system, claims 9 and 11-16 are directed to a method, and claims 17 and 19-20 are directed to a separate method from claim 9 and its dependent claims; therefore, claims 1, 3-9, 11-17, and 19-20 fall within one of the four statutory categories (i.e., process, machine, manufacture, or composition of matter). However, claims 1, 3-9, 11-17, and 19-20 fall within the judicial exception of an abstract idea, specifically the abstract ideas of “Mental Processes” (including observation, evaluation, and opinion) and “Mathematical Concepts (including mathematical calculations and relationships)”.
Claim 1: 
Step 1: Claim 1 is directed to an apparatus; therefore, the claim falls into one of the four statutory categories.
Step 2A, Prong 1: Claim 1 recites the following abstract ideas:
sparsify a number of non-zero values of the activation map (sparsifying a number of non-zero values is interpreted as a mathematical process of adjusting numbers to zero through a modified regularization equation);
configure the activation map as a tensor having a tensor size of H x W x C in which H represents a height of the tensor, W represents a width of the tensor, and C represents a number of channels of the tensor (configuring a tensor to be a certain size is interpreted as a mathematical relationship; storing the sets of values in a tensor could also be interpreted as storing data in memory – see MPEP 2106.05(d)(II));
format the tensor into at least one block of values (formatting a tensor is interpreted as a mathematical process of organizing data for encoding – see MPEP 2106.04(a)(2)(A)(ii));
and encode the at least one block of values independently from other blocks of values of the tensor using at least one lossless compression mode selected from a group including Sparse-Exponential-Golomb encoding, Sparse-Exponential-Golomb-RemoveMin encoding, and Sparse fixed length encoding to reduce a memory size used by the neural network (encoding data is interpreted as a mathematical process – see MPEP 2106.04(a)(2)(A)(ii); compressing blocks of data independently and selecting a specific lossless compression mode are interpreted as selecting a particular data source or type of data to be manipulated – see MPEP 2106.05(g). For purposes of compact prosecution, Examiner notes that Kodama (US 20130077881 A1) teaches in paragraph [0003] that the “technology of encoding data to be stored in a memory in a predetermined coding unit (for example, a block size unit) is well known in order to reduce a use amount of the memory”. One of ordinary skill would recognize that the neural network in this claim is merely the field of use or technological environment in which the well-known process of encoding data is occurring in order to reduce memory – see MPEP 2106.05(h)).
Step 2A, Prong 2: Claim 1 recites the following additional elements: using a processor. The processor is recited at a high degree of generality and is interpreted as a generic computer component; using a processor is interpreted as mere instructions for implementing the abstract idea on a computer and does not integrate the judicial exception into a practical application.
Step 2B: Claim 1 recites the following additional elements: using a processor. The processor is recited at a high degree of generality and is interpreted as a generic computer component; using a processor is interpreted as mere instructions for implementing the abstract idea on a computer and does not amount to significantly more than the judicial exception (see MPEP 2106.05(d) and MPEP 2106.05(f)).
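For illustration only, the following sketch shows what encoding a block of values with an exponential-Golomb code can look like. It implements the standard order-0 exponential-Golomb code. The claimed Sparse-Exponential-Golomb and Sparse-Exponential-Golomb-RemoveMin modes are not defined in this action, so the sketch should not be read as representing them. The function names and the sample block are hypothetical.

```python
def exp_golomb_encode(value):
    """Return the order-0 exponential-Golomb codeword of a non-negative integer as a bit string."""
    if value < 0:
        raise ValueError("value must be non-negative")
    binary = bin(value + 1)[2:]               # binary form of value + 1
    return "0" * (len(binary) - 1) + binary   # zero prefix, then the binary form


def exp_golomb_decode(bits):
    """Decode a concatenation of order-0 exponential-Golomb codewords."""
    values, pos = [], 0
    while pos < len(bits):
        zeros = 0
        while bits[pos] == "0":               # count the zero prefix
            zeros += 1
            pos += 1
        values.append(int(bits[pos:pos + zeros + 1], 2) - 1)
        pos += zeros + 1
    return values


# Hypothetical block of eight values, six of which are zero.
block = [0, 0, 3, 0, 1, 0, 0, 0]
bitstream = "".join(exp_golomb_encode(v) for v in block)
decoded = exp_golomb_decode(bitstream)
```

Because a zero is coded with a single bit, this sample block occupies 14 bits, and decoding recovers the block exactly, which is the sense in which such a code is lossless.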

Claim 9 is a method claim and its limitation is included in claim 1. The only difference is that claim 9 requires a method. Therefore, claim 9 is rejected for the same reasons as claim 1.

Claim 17:
Step 1: Claim 17 is directed to a method; therefore, the claim falls into one of the four statutory categories.
Step 2A, Prong 1: Claim 17 recites the following abstract ideas:
decompressing a compressed block of values of a bitstream representing values of the sparsified activation map to form at least one decompressed block of values, the decompressed block of values being independently decompressed from other blocks of the activation map using at least one decompression mode corresponding to at least one lossless compression mode used to compress the at least one block of values selected from a group including Sparse-Exponential-Golomb encoding, Sparse-Exponential-Golomb-RemoveMin encoding, and Sparse fixed length encoding to reduce a memory size used by the neural network (decompressing data is interpreted as a mathematical process – see MPEP 2106.04(a)(2)(A)(ii); decompressing blocks of data independently and selecting a specific lossless compression mode are interpreted as selecting a particular data source or type of data to be manipulated – see MPEP 2106.05(g). For purposes of compact prosecution, Examiner notes that Kodama (US 20130077881 A1) teaches in paragraph [0003] that the “technology of encoding data to be stored in a memory in a predetermined coding unit (for example, a block size unit) is well known in order to reduce a use amount of the memory”. One of ordinary skill would recognize that the neural network in this claim is merely the field of use or technological environment in which the well-known process of encoding data is occurring in order to reduce memory – see MPEP 2106.05(h));
and deformatting the decompressed block to be part of a tensor having a size of H x W x C in which H represents a height of the tensor, W represents a width of the tensor, and C represents a number of channels of the tensor, the tensor being the decompressed activation map (deformatting data is interpreted as a mathematical process of organizing data – see MPEP 2106.04(a)(2)(A)(ii)).
Step 2A, Prong 2: Claim 17 recites the following additional elements: using a processor. The processor is recited at a high degree of generality and is interpreted as a generic computer component; using a processor is interpreted as mere instructions for implementing the abstract idea on a computer and does not integrate the judicial exception into a practical application.
Step 2B: Claim 17 recites the following additional elements: using a processor. The processor is recited at a high degree of generality and is interpreted as a generic computer component; using a processor is interpreted as mere instructions for implementing the abstract idea on a computer and does not amount to significantly more than the judicial exception (see MPEP 2106.05(f)).
The independent claims are not patent eligible.

Dependent claims 3-8, 11-16, and 19-20 when analyzed as a whole are held to be patent ineligible under 35 U.S.C. 101 because the additional recited limitations fail to establish that the claims are not directed to an abstract idea, as they recite further embellishment of the judicial exception.
Claim 3:
Step 1: Claim 3 is directed to an apparatus; therefore, the claim falls into one of the four statutory categories.
Step 2A, Prong 1: Claim 3 recites the abstract ideas of claim 1 on which it depends. 
Step 2A, Prong 2: Claim 3 recites the following additional elements: wherein the at least one lossless compression mode selected to encode the at least one block of values is different from a lossless compression mode selected to encode another block of values of the tensor. Selecting different lossless compression modes is interpreted as selecting a particular data source or type of data to be manipulated and does not integrate the abstract idea into a practical application. 
Step 2B: Claim 3 recites the following additional elements: wherein the at least one lossless compression mode selected to encode the at least one block of values is different from a lossless compression mode selected to encode another block of values of the tensor. Selecting different lossless compression modes is interpreted as selecting a particular data source or type of data to be manipulated and does not amount to significantly more than the judicial exception (see MPEP 2106.05(g)).
Claim 4: 
Step 1: Claim 4 is directed to an apparatus; therefore, the claim falls into one of the four statutory categories.
Step 2A, Prong 1: Claim 4 recites the abstract ideas of claim 1 on which it depends. 
Step 2A, Prong 2: Claim 4 recites the following additional elements: using a processor and encode the at least one block of values encoded independently from other blocks of values of the tensor using a plurality of the lossless compression modes. The processor is recited at a high degree of generality and is interpreted as a generic computer component; using a processor is interpreted as mere instructions for implementing the abstract idea on a computer; encoding data is interpreted as a mathematical process – see MPEP 2106.04(a)(2)(A)(ii); encoding blocks of data independently is interpreted as selecting a particular data source or type of data to be manipulated; these elements do not integrate the abstract idea into a practical application.
Step 2B: Claim 4 recites the following additional elements: using a processor and encode the at least one block of values encoded independently from other blocks of values of the tensor using a plurality of the lossless compression modes. The processor is recited at a high degree of generality and is interpreted as a generic computer component; using a processor is interpreted as mere instructions for implementing the abstract idea on a computer; encoding data is interpreted as a mathematical process – see MPEP 2106.04(a)(2)(A)(ii); encoding blocks of data independently is interpreted as selecting a particular data source or type of data to be manipulated; these elements do not amount to significantly more than the judicial exception – see MPEP 2106.05(d) and MPEP 2106.05(g).
Claim 5:
Step 1: Claim 5 is directed to an apparatus; therefore, the claim falls into one of the four statutory categories.
Step 2A, Prong 1: Claim 5 recites the abstract ideas of claim 1 on which it depends. 
Step 2A, Prong 2: Claim 5 recites the following additional elements: wherein the at least one block of values comprises 48 bits. Further describing the data used in the encoding process is interpreted as selecting a particular data source or type of data to be manipulated and does not integrate the abstract idea into a practical application. 
Step 2B: Claim 5 recites the following additional elements: wherein the at least one block of values comprises 48 bits. Further describing the data used in the encoding process is interpreted as selecting a particular data source or type of data to be manipulated and does not amount to significantly more than the judicial exception (see MPEP 2106.05(g)).
Claim 6:
Step 1: Claim 6 is directed to an apparatus; therefore, the claim falls into one of the four statutory categories.
Step 2A, Prong 1: Claim 6 recites the abstract ideas of claim 1 on which it depends. 
Step 2A, Prong 2: Claim 6 recites the following additional elements: output at least one block of values encoded as a bitstream. Outputting encoded data is interpreted as sending and receiving data and does not integrate the abstract idea into a practical application. 
Step 2B: Claim 6 recites the following additional elements: output at least one block of values encoded as a bitstream. Outputting encoded data is interpreted as sending and receiving data and does not amount to significantly more than the judicial exception (see MPEP 2106.05(d)(II)).
Claim 7:
Step 1: Claim 7 is directed to an apparatus; therefore, the claim falls into one of the four statutory categories.
Step 2A, Prong 1: Claim 7 recites the following abstract ideas: decode the at least one block of values independently from other blocks of values of the tensor using at least one decompression mode corresponding to the at least one compression mode used to compress the at least one block of values; (decoding data is interpreted as a mathematical process – see MPEP 2106.04(a)(2)(A)(ii); decoding blocks of data independently is interpreted as selecting a particular data source or type of data to be manipulated – see MPEP 2106.05(g));
and de-format the at least one block into a tensor having the size of H x W x C. (deformatting data is interpreted as a mathematical process of organizing data – see MPEP 2106.04(a)(2)(A)(ii)).
Step 2A, Prong 2: Claim 7 recites the following additional elements: using a processor. The processor is recited at a high degree of generality and is interpreted as a generic computer component; using a processor is interpreted as mere instructions for implementing the abstract idea on a computer and does not integrate the abstract idea into a practical application.
Step 2B: Claim 7 recites the following additional elements: using a processor. The processor is recited at a high degree of generality and is interpreted as a generic computer component; using a processor is interpreted as mere instructions for implementing the abstract idea on a computer and does not amount to significantly more than the judicial exception (see MPEP 2106.05(d) and 2106.05(f)).
Claim 8: 
Step 1: Claim 8 is directed to an apparatus; therefore, the claim falls into one of the four statutory categories.
Step 2A, Prong 1: Claim 8 recites the following abstract ideas: wherein the processor is further configured to quantize the floating-point values of the activation map to be integer values (quantization is interpreted as a mathematical calculation or process of rounding).
Step 2A, Prong 2: Claim 8 recites the following additional elements: using a processor and wherein the sparsified activation map includes floating-point values. The processor is interpreted as a generic computer component and the type of values included in the sparsified activation map is interpreted as selecting a particular data source or type of data to be manipulated and does not integrate the abstract idea into a practical application. 
Step 2B: Claim 8 recites the following additional elements: using a processor and wherein the sparsified activation map includes floating-point values. The processor is interpreted as a generic computer component and the type of values included in the sparsified activation map is interpreted as selecting a particular data source or type of data to be manipulated and does not amount to significantly more than the judicial exception – see MPEP 2106.05(d) and MPEP 2106.05(g).

Claim 11 is a method claim and its limitations are included in claim 3. Claim 11 is rejected for the same reasons as claim 3.
Claim 12 is a method claim and its limitation is included in claim 4. Claim 12 is rejected for the same reasons as claim 4.
Claim 13 is a method claim and its limitation is included in claim 5. Claim 13 is rejected for the same reasons as claim 5. 
Claim 14 is a method claim and its limitation is included in claim 6. Claim 14 is rejected for the same reasons as claim 6.
Claim 15 is a method claim and its limitations are included in claim 7. Claim 15 is rejected for the same reasons as claim 7.
Claim 16 is a method claim and its limitations are included in claim 8. Claim 16 is rejected for the same reasons as claim 8.

Claim 19:
Step 1: Claim 19 is directed to a method; therefore, the claim falls into one of the four statutory categories. 
Step 2A, Prong 1: Claim 19 recites the following abstract ideas:
sparsifying, using the processor, a number of non-zero values of the activation map (sparsifying a number of non-zero values is interpreted as a mathematical process of adjusting numbers to zero through a modified regularization equation – see analysis of claim 1);
configuring the activation map as a tensor having a tensor size of H x W x C (configuring a tensor to be a certain size is interpreted as a mathematical relationship; storing the sets of values in a tensor could also be interpreted as storing data in memory – see analysis of claim 1); 
formatting the tensor into at least one block of values (formatting a tensor is interpreted as a mathematical process of organizing data for encoding – see the analysis of claim 1); 
and encoding the at least one block independently from other blocks of the tensor using at least one lossless compression mode (encoding data is interpreted as a mathematical process – see the analysis of claim 1). 
Step 2A, Prong 2: Claim 19 recites the following additional elements: using a processor. The processor is recited at a high degree of generality and is interpreted as a generic computer component; using a processor is interpreted as mere instructions for implementing the abstract idea on a computer and does not integrate the abstract idea into a practical application.
Step 2B: Claim 19 recites the following additional elements: using a processor. The processor is recited at a high degree of generality and is interpreted as a generic computer component; using a processor is interpreted as mere instructions for implementing the abstract idea on a computer and does not amount to significantly more than the judicial exception (see MPEP 2106.05(d) and 2106.05(f)).

Claim 20: 
Step 1: Claim 20 is directed to a method; therefore, the claim falls into one of the four statutory categories. 
Step 2A, Prong 1: Claim 20 recites the abstract ideas from claim 19 on which it depends. 
Step 2A, Prong 2: Claim 20 recites the following additional elements: wherein the at least one lossless compression mode selected to compress the at least one block of values is different from a lossless compression mode selected to compress another block of values of the tensor of the received at least one activation map; and wherein compressing the at least one block further comprises compressing the at least one block independently from other blocks of the tensor of the received at least one activation map using a plurality of the lossless compression modes. Selecting different lossless compression modes for different blocks of values and compressing blocks of values independently are interpreted as selecting a particular data source or type of data to be manipulated; these elements do not integrate the abstract idea into a practical application.  
Step 2B: Claim 20 recites the following additional elements: wherein the at least one lossless compression mode selected to compress the at least one block of values is different from a lossless compression mode selected to compress another block of values of the tensor of the received at least one activation map; and wherein compressing the at least one block further comprises compressing the at least one block independently from other blocks of the tensor of the received at least one activation map using a plurality of the lossless compression modes. Selecting different lossless compression modes for different blocks of values and compressing blocks of values independently are interpreted as selecting a particular data source or type of data to be manipulated; these elements do not amount to significantly more than the judicial exception (see MPEP 2106.05(a)(II), example vii and MPEP 2106.05(g)).
Viewed as a whole, these additional claim elements do not provide meaningful limitations to transform the abstract idea into a patent eligible application of the abstract idea such that the claims amount to significantly more than the abstract idea itself. Therefore, the claims are rejected under 35 U.S.C. 101 as being directed to non-statutory subject matter.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1, 3-6, 9, and 11-14 are rejected under 35 U.S.C. 103 as being unpatentable over Choi et al ("Near-Lossless Deep Feature Compression for Collaborative Intelligence", herein Choi), in view of Yan et al (US 20200234130 A1, herein Yan), in further view of Yang et al ("Supervised Translation-Invariant Sparse Coding", herein Yang).
Regarding claim 1, Choi teaches a system to compress an activation map of a layer of a neural network, the system comprising: (the abstract recites "Collaborative intelligence is a new paradigm for efficient deployment of deep neural networks across the mobile-cloud infrastructure. By dividing the network between the mobile and the cloud, it is possible to distribute the computational workload such that the overall energy and/or latency of the system is minimized. However, this necessitates sending deep feature data from the mobile to the cloud in order to perform inference. In this work, we examine the differences between the deep feature data and natural image data, and propose a simple and effective near-lossless deep feature compressor. The proposed method achieves up to 5% bit rate reduction compared to HEVC-Intra and even more against other popular image codecs. Finally, we suggest an approach for reconstructing the input image from compressed deep features that could serve to supplement the inference performed by the deep model");
configure the activation map as a tensor having a tensor size of H x W x C in which H represents a height of the tensor, W represents a width of the tensor, and C represents a number of channels of the tensor (section II recites "Feature values are typically quantized using an n-bit uniform quantizer (Q-layer in Fig. 1) prior to lossless [3] or lossy [4] compression.
$\tilde{V} = \mathrm{round}\left( \frac{V - \min(V)}{\max(V) - \min(V)} \cdot (2^{n} - 1) \right)$
where $V \in \mathbb{R}^{N \times M \times C}$ is the feature tensor with N rows, M columns, and C channels at the point of split, $\tilde{V}$ is the quantized feature tensor, and min(V) and max(V) are the minimum and maximum value in V, respectively. In the studies performed so far [3], [4], [7], this uniform n-bit quantization was shown to have negligible effect on image classification and object detection accuracy, for $n \geq 6$. For this reason, when such uniform quantizer is followed up by a lossless encoder, we refer to the resulting approach as near-lossless compression. In this work, the Q-layer performs uniform 8-bit quantization. Note that min(V) and max(V) need to be transferred to the cloud for the inverse Q process. The quantized features $\tilde{V}$ are rearranged in a tiled image, as shown in Fig. 2" (i.e. H x W x C and N x M x C both represent the three dimensions of a tensor. H and N both denote the height, i.e. the number of rows. W and M both denote the width, i.e. the number of columns. C represents the number of channels in both expressions));
format the tensor into at least one block of values (section II recites "Feature values are typically quantized using an n-bit uniform quantizer (Q-layer in Fig. 1) prior to lossless [3] or lossy [4] compression.
$\tilde{V} = \mathrm{round}\left( \frac{V - \min(V)}{\max(V) - \min(V)} \cdot (2^{n} - 1) \right)$
where $V \in \mathbb{R}^{N \times M \times C}$ is the feature tensor with N rows, M columns, and C channels at the point of split, $\tilde{V}$ is the quantized feature tensor, and min(V) and max(V) are the minimum and maximum value in V, respectively. In the studies performed so far [3], [4], [7], this uniform n-bit quantization was shown to have negligible effect on image classification and object detection accuracy, for $n \geq 6$. For this reason, when such uniform quantizer is followed up by a lossless encoder, we refer to the resulting approach as near-lossless compression. In this work, the Q-layer performs uniform 8-bit quantization. Note that min(V) and max(V) need to be transferred to the cloud for the inverse Q process. The quantized features $\tilde{V}$ are rearranged in a tiled image, as shown in Fig. 2" (i.e. rearranging features $\tilde{V}$ into tiled images is a process of formatting the tensor into at least one block of values));
and encode the at least one block of values independently from other blocks of values of the tensor using at least one lossless compression mode to reduce a memory size used by the neural network (section III recites "Before coding the quantized feature data, the following parameters are encoded directly using fixed-length coding: dimensions of the feature tensor, min(V) and max(V) (32-bit each) and the eight most frequent feature values, mi for i = 0, 1, ..., 7. The set of {mi} is obtained over the entire quantized feature tensor. A vector of these values, p = (p0, p1, ..., p7), is referred to as the palette vector."; "initially, the palette vector is sorted according to the frequency of these values in the first tile, so that p0 is the most frequent of the mi's in the first tile, p1 is the next most frequent, etc. As we move to other tiles, the palette vector p = (p0, p1, ..., p7) is re-sorted according to the frequency of occurrence of mi's up to the previously coded tile, so that p0 is the most frequent mi up to that point, and so on. At the tile boundary, once p is updated, one element of p is chosen to minimize the mean absolute difference (MAD) from the feature values in the to-be-coded tile."; "The most frequently used mode among them is considered the mpm. If the current block's mode is the same as mpm, bit 1 is coded by CABAC to indicate it. Otherwise, bit 0 is coded, followed by two bits to indicate the model").
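For illustration only, the quantization formula quoted from section II of Choi and the rearrangement of channels into a tiled image can be sketched as follows. The sketch assumes a tensor held as nested lists indexed [row][column][channel]. The tile layout (channels placed left to right, then top to bottom) is an assumption, since Fig. 2 of Choi is not reproduced in this action, and the function names are hypothetical.

```python
def uniform_quantize(tensor, n_bits=8):
    """Quantize an N x M x C nested list to integers in [0, 2**n_bits - 1]."""
    flat = [v for row in tensor for col in row for v in col]
    lo, hi = min(flat), max(flat)
    # Guard against a constant tensor, where max(V) - min(V) is zero.
    scale = (2 ** n_bits - 1) / (hi - lo) if hi > lo else 0.0
    # Note: Python's round() rounds exact halves to the nearest even integer.
    quantized = [[[round((v - lo) * scale) for v in col] for col in row]
                 for row in tensor]
    return quantized, lo, hi   # lo and hi are needed for the inverse step


def tile_channels(tensor, tiles_per_row):
    """Place each N x M channel of the tensor as one tile of a 2-D image."""
    n, m, c = len(tensor), len(tensor[0]), len(tensor[0][0])
    tile_rows = -(-c // tiles_per_row)   # ceiling division
    image = [[0] * (m * tiles_per_row) for _ in range(n * tile_rows)]
    for ch in range(c):
        top = (ch // tiles_per_row) * n
        left = (ch % tiles_per_row) * m
        for i in range(n):
            for j in range(m):
                image[top + i][left + j] = tensor[i][j][ch]
    return image


# Hypothetical 2 x 2 x 2 feature tensor.
features = [[[0.0, 1.0], [0.2, 0.6]],
            [[1.0, 0.0], [0.6, 0.2]]]
quantized, v_min, v_max = uniform_quantize(features, n_bits=8)
tiled = tile_channels(quantized, tiles_per_row=2)
```

With two tiles per row, the two 2 x 2 channels of the sample tensor end up side by side in a single 2 x 4 image.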
However, Choi does not explicitly teach a processor configured to sparsify a number of non-zero values of the activation map.
Yan teaches a processor (para. [0043] recites "In some embodiments, the one or more processors 102 each include one or more processor cores 107 to process instructions which, when executed, perform operations for system and user software");
and to sparsify a number of non-zero values of the activation map (para. [0248] recites "Further, as discussed with reference to FIG. 21, this technique provides for a learning scheme, such as where sparse regularization is added to and at channel-level that is suitable for both training new neural network and/or fine-tuning existing neural networks, while reducing neural network parameters, runtime memory consumption and demand, FLOPs, etc." (i.e. sparse regularization reduces neural network parameters, including values of the activation map. Para. [0027] of Applicant's specification indicates the step of sparsifying is similarly achieved through regularization, "The activation maps 106 of the neural network 105 are sparsified by the sparsifier 107 to form sparsified activation maps 111 that have an increased number of values that are equal to zero so that the lossless compression performed by the encoder 110 will be more effective. The sparsifier stage 107 fine tunes a pre-trained neural network using an additional regularization in a cost function. Typically, when neural networks are trained, a cost function L(w) is minimized with respect to the weights w. The cost function L(w) contains two terms: a data term and a regularization term. The data term is usually a cross-entropy loss, while the regularization term is typically an L2 norm on the network weights. During fine-tuning of the pre-trained network, the cost function L(w) is modified by adding a new regularization term")).
It would have been obvious to an artisan of ordinary skill before the effective filing date of the claimed invention to combine the system of Choi with the processor programmed to initiate executable operations comprising: sparsifying, using the processor, a number of non-zero values of the activation map in order to facilitate slimming of neural networks (Yan, Abstract).
However, the combination of Choi and Yan does not teach a lossless compression mode selected from a group including Sparse-Exponential-Golomb encoding, Sparse-Exponential-Golomb-RemoveMin encoding, and Sparse fixed length encoding.
Yang teaches a lossless compression mode selected from a group including Sparse-Exponential-Golomb encoding, Sparse-Exponential-Golomb-RemoveMin encoding, and Sparse fixed length encoding (section 3.1 recites “In our hierarchical model, an image is represented by a local descriptor set X = [x1, x2, . . . , xN] ∈ R^(d×N), where xi denotes the i-th local descriptor of the image in column vector. Suppose we are given a dictionary B ∈ R^(d×K) that can sparsely represent these local descriptors, where K is the size of the dictionary and is typically greater than 2d. The sparse representations of a descriptor set are computed as

[Equation image: media_image1.png]

where Ẑ ∈ R^(K×N) contains the sparse representations in columns for the descriptors in X. In order for classification, where we need fixed length feature vectors, we define the image level feature over the sparse representation matrix Ẑ by max pooling β = ξmax(Ẑ), where ξmax is defined on each row of Ẑ ∈ R^(K×N), returning a vector β ∈ R^K with the i-th element being βi = max{|Ẑi1|, |Ẑi2|, . . ., |ẐiN|}” (i.e. sparse fixed length coding of feature vectors)).
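For illustration only, the max pooling step quoted from Yang can be sketched as follows. The function name max_pool_abs and the example matrix are hypothetical and do not appear in Yang; the sketch assumes the sparse representation matrix is already available as a list of K rows with N entries each.

```python
def max_pool_abs(z_hat):
    """Row-wise max of absolute values: beta[i] = max_j |z_hat[i][j]|."""
    return [max(abs(v) for v in row) for row in z_hat]


# Hypothetical 3 x 4 sparse code matrix (K = 3 dictionary atoms, N = 4 descriptors).
z_hat = [
    [0.0, -0.7, 0.0, 0.2],
    [0.0, 0.0, 0.0, 0.0],
    [1.5, 0.0, -2.0, 0.0],
]

beta = max_pool_abs(z_hat)
print(beta)  # [0.7, 0.0, 2.0]
```

The result has length K regardless of the number of descriptors N, which is the fixed length property the quoted passage relies on.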
It would have been obvious to an artisan of ordinary skill before the effective filing date of the claimed invention to improve the known system of Choi (as modified by Yan) with the known sparse fixed length encoding method from Yang. As Choi and Yang are both directed to improving neural network image analysis systems, one of ordinary skill in the art would benefit from using Yang’s encoding method to improve the performance of Choi’s lossless compression system (Yang, Abstract).
With respect to claim 3, the combination of Choi, Yan, and Yang teaches the system of claim 1, and Choi also teaches wherein the at least one lossless compression mode selected to encode the at least one block is different from a lossless compression mode selected to encode another block of the tensor (Choi section III recites "Prediction residuals for each 4 x 4 block are coded by CABAC. The first bit is the SKIP indicator. If the residual is all-zero, the SKIP indicator is set to 1 and the encoder moves to the next block. Otherwise, the SKIP indicator is set to 0 and residuals are coded using one of three scan orders: horizontal, vertical, and zig-zag. For the Ver (Hor) prediction mode, vertical (horizontal) scan order is used. Other modes use the zig-zag scan order. Locations of non-zero residuals are first indicated by binarizing the scanned block, with 1's placed at the locations of non-zero residuals and 0's placed elsewhere. This binary vector is coded using CABAC. Finally, the non-zero residual values are coded in a manner similar to HEVC [15]: values larger than 1 or 2 are flagged, the flags are CABAC-coded, and the non-flagged values are binarized using exponential Golomb-Rice coding, then coded by CABAC" (i.e. blocks are encoded individually and may be assigned different compression modes, such as CABAC coding or Golomb-Rice coding)).
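For illustration only, the SKIP indicator and the binarization of non-zero residual locations described in the quoted passage of Choi can be sketched as follows. The function names and the example block are hypothetical, only the horizontal scan order is shown, and the CABAC stage that would code the resulting bits is not implemented.

```python
def horizontal_scan(block):
    """Flatten a 2-D block row by row (the horizontal scan order)."""
    return [value for row in block for value in row]


def significance_map(residuals):
    """Return (skip, bits) for one scanned block of prediction residuals.

    skip is 1 when every residual is zero, in which case no map is produced.
    Otherwise skip is 0 and bits holds 1 at non-zero positions and 0 elsewhere.
    """
    if all(r == 0 for r in residuals):
        return 1, []
    return 0, [1 if r != 0 else 0 for r in residuals]


# Hypothetical 4 x 4 residual block.
block = [
    [0, 3, 0, 0],
    [0, 0, -1, 0],
    [0, 0, 0, 0],
    [2, 0, 0, 0],
]

skip, bits = significance_map(horizontal_scan(block))
print(skip, bits)
```

In this sketch an all-zero block yields only the SKIP indicator, while any other block yields a 16-entry binary vector that a later entropy coder would consume.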
With respect to claim 4, the combination of Choi, Yan, and Yang teaches the system of claim 1, wherein encoding the at least one block comprises encoding the at least one block encoded independently from other blocks of the tensor using a plurality of the lossless compression modes (Choi section III recites "Prediction residuals for each 4 x 4 block are coded by CABAC. The first bit is the SKIP indicator. If the residual is all-zero, the SKIP indicator is set to 1 and the encoder moves to the next block. Otherwise, the SKIP indicator is set to 0 and residuals are coded using one of three scan orders: horizontal, vertical, and zig-zag. For the Ver (Hor) prediction mode, vertical (horizontal) scan order is used. Other modes use the zig-zag scan order. Locations of non-zero residuals are first indicated by binarizing the scanned block, with 1's placed at the locations of non-zero residuals and 0's placed elsewhere. This binary vector is coded using CABAC. Finally, the non-zero residual values are coded in a manner similar to HEVC: values larger than 1 or 2 are flagged, the flags are CABAC-coded, and the non-flagged values are binarized using exponential Golomb-Rice coding, then coded by CABAC." Blocks are encoded individually and may be assigned different compression modes, such as CABAC coding or Golomb-Rice coding).
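For illustration only, the general idea of an exponential-Golomb binarization can be sketched with the standard order-0 code for non-negative integers. This is an assumption chosen for simplicity: it is not asserted to be the exact binarization used by Choi or HEVC, nor the Sparse-Exponential-Golomb mode recited in the claims, and the function names are hypothetical.

```python
def exp_golomb_encode(n):
    """Order-0 exponential-Golomb code of a non-negative integer, as a bit string."""
    if n < 0:
        raise ValueError("n must be non-negative")
    binary = bin(n + 1)[2:]                   # binary form of n + 1
    return "0" * (len(binary) - 1) + binary   # zero prefix gives the length


def exp_golomb_decode(bits):
    """Decode one codeword from the front of bits; return (value, remaining bits)."""
    zeros = 0
    while bits[zeros] == "0":
        zeros += 1
    end = 2 * zeros + 1
    return int(bits[zeros:end], 2) - 1, bits[end:]


codes = [exp_golomb_encode(n) for n in range(5)]
print(codes)  # ['1', '010', '011', '00100', '00101']
```

Small values receive short codewords, which is why codes of this family suit data dominated by zeros and small magnitudes.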
With respect to claim 5, the combination of Choi, Yan, and Yang teaches the system of claim 1, wherein the at least one block comprises 48 bits (Choi section II recites "Feature values are typically quantized using an n-bit uniform quantizer (Q-layer in Fig. 1) prior to lossless [3] or lossy [4] compression.
Ṽ = round( (V - min(V)) / (max(V) - min(V)) · (2^n - 1) )
where V ∈ R^(N×M×C) is the feature tensor with N rows, M columns, and C channels at the point of split, Ṽ is the quantized feature tensor, and min(V) and max(V) are the minimum and maximum value in V, respectively. In the studies performed so far [3], [4], [7], this uniform n-bit quantization was shown to have negligible effect on image classification and object detection accuracy, for n >= 6." (i.e. the tensor is quantized to n-bits, where n >= 6 includes n = 48, without affecting performance)).
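For illustration only, the n-bit uniform quantizer quoted from Choi can be sketched as follows. The function name quantize and the sample values are hypothetical, the feature tensor is assumed to be flattened into a list, and Python's built-in round (which rounds exact halves to the nearest even integer) stands in for the rounding operator.

```python
def quantize(values, n_bits):
    """Map values onto the integers 0 .. 2**n_bits - 1 by min-max scaling."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Degenerate case (assumption): a constant tensor maps to all zeros.
        return [0] * len(values)
    scale = (2 ** n_bits - 1) / (hi - lo)
    return [round((v - lo) * scale) for v in values]


# Hypothetical flattened feature values, quantized to 8 bits.
features = [-1.0, 0.0, 0.5, 3.0]
print(quantize(features, 8))  # [0, 64, 96, 255]
```

The minimum maps to 0 and the maximum to 2^n - 1, so each quantized value occupies n bits.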
Regarding claim 6, the combination of Choi, Yan, and Yang teaches the system of claim 1, wherein the processor is configured to output the at least one block of values encoded as a bit stream (Choi section III recites "If the current block's mode is the same as mpm, bit 1 is coded by CABAC to indicate it. Otherwise, bit 0 is coded, followed by two bits to indicate the mode. Prediction residuals for each 4 x 4 block are coded by CABAC. The first bit is the SKIP indicator. If the residual is all-zero, the SKIP indicator is set to 1 and the encoder moves to the next block. Otherwise, the SKIP indicator is set to 0 and residuals are coded using one of three scan orders: horizontal, vertical, and zig-zag. For the Ver (Hor) prediction mode, vertical (horizontal) scan order is used. Other modes use the zig-zag scan order. Locations of non-zero residuals are first indicated by binarizing the scanned block, with 1's placed at the locations of non-zero residuals and 0's placed elsewhere. This binary vector is coded using CABAC. Finally, the non-zero residual values are coded in a manner similar to HEVC: values larger than 1 or 2 are flagged, the flags are CABAC-coded, and the non-flagged values are binarized using exponential Golomb-Rice coding, then coded by CABAC" (i.e. Choi indicates the process by which blocks are presented as a bitstream)).
Claim 9 is a method claim and its limitations are included in claim 1. The only difference is that claim 9 requires a method (the abstract of Choi recites "In this work, we examine the differences between the deep feature data and natural image data, and propose a simple and effective near-lossless deep feature compressor. The proposed method achieves up to 5% bit rate reduction compared to HEVC-Intra and even more against other popular image codecs"). Therefore, claim 9 is rejected for the same reasons as claim 1.
With respect to claim 11, it is substantially similar to claim 3 and is rejected in the same manner, the same art and reasoning applying.
With respect to claim 12, it is substantially similar to claim 4 and is rejected in the same manner, the same art and reasoning applying.
With respect to claim 13, it is substantially similar to claim 5 and is rejected in the same manner, the same art and reasoning applying.
Claim 14 is a method claim and its limitation is included in claim 6. Claim 14 is rejected for the same reasons as claim 6.

Claims 7-8, 15-17, and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Choi (Choi et al., "Near-Lossless Deep Feature Compression for Collaborative Intelligence", August 2018, IEEE, 2018 IEEE 20th International Workshop on Multimedia Signal Processing (MMSP), pages 1-6), further in view of Yan (US 20200234130 A1), in further view of Luo et al ("DeepSIC: Deep Semantic Image Compression", herein Luo), in further view of Yang et al ("Supervised Translation-Invariant Sparse Coding", herein Yang).
With respect to claim 7, the combination of Choi, Yan, and Yang teaches the system of claim 6, and deformatting the at least one block into a tensor having the size of H x W x C (Choi section IV recites "To demonstrate this, we construct a mirror model, indicated in the bottom right of Fig. 4, based on the network in the mobile. Specifically, given the network in the mobile, the mirror model consists of the same number of layers, but in reverse order: convolutional layers from the mobile network are mapped to the convolutional layers with the same kernel size in the mirror model, while max-pooling layers from mobile network are mapped to up-sampling layers. The goal of the mirror model is to reconstruct the input image from the deep features transmitted to the cloud" (i.e. the decoder/decompression mirrors the encoder/compression to form a model of the neural network using the decoded feature maps, deformatted to a tensor of corresponding size H x W x C)).
However, the combination of Choi, Yan, and Yang does not teach decoding the at least one block independently from other blocks of the tensor using at least one decompression mode corresponding to the at least one compression mode used to compress the at least one block.
Luo teaches decoding the at least one block independently from other blocks of the tensor using at least one decompression mode corresponding to the at least one compression mode used to compress the at least one block (Section 3B recites "We exploit this low entropy by lossless compression via entropy coding, to be specific, we implement an entropy coding based on the context-adaptive binary arithmetic coding (CABAC) framework proposed by [20]. Arithmetic entropy codes are designed to compress discrete-valued data to bit rates closely approaching the entropy of the representation, assuming that the probability model used to design the code approximates the data well. We associate each bit location in Q(f(x)) with a context, which comprises a set of features indicating the bit value. These features are based on the position of the bit as well as the values of neighboring bits. We train a classifier to predict the value of each bit from its context feature, and then use the resulting belief distribution to compress b. Given y = Q(f(x)) denotes the quantized code, after entropy encoding y into its binary representation Ay, we retrieve the compression code sequence. During decoding, we decompress the code by performing the inverse operation. Namely, we interleave between computing the context of a particular bit using the values of previously decoded bits. The obtained context is employed to retrieve the activation probability of the bit to be decoded. Note that this constrains the context of each bit to only involve features composed of bits already decoded."; Section 3C recites "Although arithmetic entropy encoding is lossless, the quantization will bring in some loss in accuracy, the result of Q^-1(Q(f(x))) is not exactly the same as the output of feature extraction. It is an approximation of f(x)" (i.e. Luo's decoding uses a decompression mode that matches the compression mode used on a particular block)).
It would have been obvious to an artisan of ordinary skill before the effective filing date of the claimed invention to combine the system of Choi (as modified by Yan and Yang) with decoding the at least one block independently from other blocks of the tensor using at least one decompression mode corresponding to the at least one compression mode used to compress the at least one block in order to decompress the code by performing the inverse operation (Luo, Section 3B).
Regarding claim 8, the combination of Choi, Yan, and Yang teaches the system of claim 1.
However, the combination of Choi, Yan, and Yang does not teach wherein the sparsified activation map includes floating-point values and wherein the executable operations further comprise quantizing the floating-point values of the activation map to be integer values.
Luo teaches wherein the sparsified activation map includes floating-point values (Luo Section 3B recites "Given the extracted tensor f(x) ∈ R^(C×H×W), before entropy coding the tensor, we first perform quantization. The feature tensor is optimally quantized to a lower bit precision B: The quantization bin B we use here is 6 bit."; Section 3D recites "The input feature maps of semantic analysis module in pre-semantic DeepSIC are under floating point precision.")
and wherein the executable operations further comprise quantizing the floating-point values of the activation map to be integer values (Luo Section 3B recites "Given the extracted tensor f(x) ∈ R^(C×H×W), before entropy coding the tensor, we first perform quantization. The feature tensor is optimally quantized to a lower bit precision B: The quantization bin B we use here is 6 bit."; Section 3D recites "The input feature maps of semantic analysis module in pre-semantic DeepSIC are under floating point precision.").
It would have been obvious to an artisan of ordinary skill before the effective filing date of the claimed invention to combine the system of modified Choi with wherein the sparsified activation map includes floating-point values, and wherein the executable operations further comprise quantizing the floating-point values of the activation map to be integer values in order to quantize the feature output to a lower bit precision. (Luo, Section 3)
With respect to claim 15, it is substantially similar to claim 7 and is rejected in the same manner, the same art and reasoning applying.
With respect to claim 16, it is substantially similar to claim 8 and is rejected in the same manner, the same art and reasoning applying.
Regarding claim 17, Choi teaches a method to decompress a [sparsified] activation map of a neural network, the method comprising: (the abstract recites "Collaborative intelligence is a new paradigm for efficient deployment of deep neural networks across the mobile-cloud infrastructure. By dividing the network between the mobile and the cloud, it is possible to distribute the computational workload such that the overall energy and/or latency of the system is minimized. However, this necessitates sending deep feature data from the mobile to the cloud in order to perform inference. In this work, we examine the differences between the deep feature data and natural image data, and propose a simple and effective near-lossless deep feature compressor. The proposed method achieves up to 5% bit rate reduction compared to HEVC-Intra and even more against other popular image codecs. Finally, we suggest an approach for reconstructing the input image from compressed deep features that could serve to supplement the inference performed by the deep model."; Section 4 recites "To demonstrate this, we construct a mirror model, indicated in the bottom right of Fig. 4, based on the network in the mobile. Specifically, given the network in the mobile, the mirror model consists of the same number of layers, but in reverse order: convolutional layers from the mobile network are mapped to the convolutional layers with the same kernel size in the mirror model, while max-pooling layers from mobile network are mapped to up-sampling layers. The goal of the mirror model is to reconstruct the input image from the deep features transmitted to the cloud");
decompressing, by a processor, a compressed block of values of a bitstream representing values of the [sparsified] activation map to form at least one decompressed block of values (the abstract recites "Collaborative intelligence is a new paradigm for efficient deployment of deep neural networks across the mobile-cloud infrastructure. By dividing the network between the mobile and the cloud, it is possible to distribute the computational workload such that the overall energy and/or latency of the system is minimized. However, this necessitates sending deep feature data from the mobile to the cloud in order to perform inference. In this work, we examine the differences between the deep feature data and natural image data, and propose a simple and effective near-lossless deep feature compressor. The proposed method achieves up to 5% bit rate reduction compared to HEVC-Intra and even more against other popular image codecs. Finally, we suggest an approach for reconstructing the input image from compressed deep features that could serve to supplement the inference performed by the deep model."; Section 4 recites "To demonstrate this, we construct a mirror model, indicated in the bottom right of Fig. 4, based on the network in the mobile. Specifically, given the network in the mobile, the mirror model consists of the same number of layers, but in reverse order: convolutional layers from the mobile network are mapped to the convolutional layers with the same kernel size in the mirror model, while max-pooling layers from mobile network are mapped to up-sampling layers. The goal of the mirror model is to reconstruct the input image from the deep features transmitted to the cloud");
and deformatting, by the processor, the decompressed block of values to be part of a tensor having a size of H x W x C in which H represents a height of the tensor, W represents a width of the tensor, and C represents a number of channels of the tensor, the tensor being the decompressed activation map (Choi section IV recites "To demonstrate this, we construct a mirror model, indicated in the bottom right of Fig. 4, based on the network in the mobile. Specifically, given the network in the mobile, the mirror model consists of the same number of layers, but in reverse order: convolutional layers from the mobile network are mapped to the convolutional layers with the same kernel size in the mirror model, while max-pooling layers from mobile network are mapped to up-sampling layers. The goal of the mirror model is to reconstruct the input image from the deep features transmitted to the cloud" (i.e. the decoder/decompression mirrors the encoder/compression to form a model of the neural network using the decoded feature maps, deformatted to a tensor of corresponding size H x W x C)).
However, Choi does not explicitly teach a sparsified activation map.
Yan teaches a sparsified activation map (Yan para. [0248] recites "Further, as discussed with reference to FIG. 21, this technique provides for a learning scheme, such as where sparse regularization is added to and at channel-level that is suitable for both training new neural network and/or fine-tuning existing neural networks, while reducing neural network parameters, runtime memory consumption and demand, FLOPs, etc." (i.e. sparse regularization reduces neural network parameters, including values of the activation map. Para. [0027] of Applicant's specification indicates the step of sparsifying is similarly achieved through regularization, "The activation maps 106 of the neural network 105 are sparsified by the sparsifier 107 to form sparsified activation maps 111 that have an increased number of values that are equal to zero so that the lossless compression performed by the encoder 110 will be more effective. The sparsifier stage 107 fine tunes a pre-trained neural network using an additional regularization in a cost function. Typically, when neural networks are trained, a cost function L(w) is minimized with respect to the weights w. The cost function L(w) contains two terms: a data term and a regularization term. The data term is usually a cross-entropy loss, while the regularization term is typically an L2 norm on the network weights. During fine-tuning of the pre-trained network, the cost function L(w) is modified by adding a new regularization term").
It would have been obvious to an artisan of ordinary skill before the effective filing date of the claimed invention to combine the system of Choi with sparsifying, using the processor, a number of non-zero values of the activation map in order to facilitate slimming of neural networks (Yan, Abstract).
However, the combination of Choi and Yan does not explicitly teach the decompressed block of values being independently decompressed from other blocks of the activation map using at least one decompression mode corresponding to at least one lossless compression mode used to compress the at least one block.
Luo teaches the decompressed block of values being independently decompressed from other blocks of the activation map using at least one decompression mode corresponding to at least one lossless compression mode used to compress the at least one block (Section 3B recites "We exploit this low entropy by lossless compression via entropy coding, to be specific, we implement an entropy coding based on the context-adaptive binary arithmetic coding (CABAC) framework proposed by [20]. Arithmetic entropy codes are designed to compress discrete-valued data to bit rates closely approaching the entropy of the representation, assuming that the probability model used to design the code approximates the data well. We associate each bit location in Q(f(x)) with a context, which comprises a set of features indicating the bit value. These features are based on the position of the bit as well as the values of neighboring bits. We train a classifier to predict the value of each bit from its context feature, and then use the resulting belief distribution to compress b. Given y = Q(f(x)) denotes the quantized code, after entropy encoding y into its binary representation Ay, we retrieve the compression code sequence. During decoding, we decompress the code by performing the inverse operation. Namely, we interleave between computing the context of a particular bit using the values of previously decoded bits. The obtained context is employed to retrieve the activation probability of the bit to be decoded. Note that this constrains the context of each bit to only involve features composed of bits already decoded."; Section 3C recites "Although arithmetic entropy encoding is lossless, the quantization will bring in some loss in accuracy, the result of Q^-1(Q(f(x))) is not exactly the same as the output of feature extraction. It is an approximation of f(x)" (i.e. Luo's decoding uses a decompression mode that matches the compression mode used on a particular block)).
It would have been obvious to an artisan of ordinary skill before the effective filing date of the claimed invention to combine the method of Choi (as modified by Yan) with the decompressed block of values being independently decompressed from other blocks of the activation map using at least one decompression mode corresponding to at least one lossless compression mode used to compress the at least one block in order to decompress the code by performing the inverse operation (Luo, Section 3B).
However, the combination of Choi, Yan, and Luo does not teach wherein the at least one lossless compression mode is selected from a group including Sparse-Exponential-Golomb encoding, Sparse-Exponential-Golomb-RemoveMin encoding, and Sparse fixed length encoding.
Yang teaches wherein the at least one lossless compression mode is selected from a group including Sparse-Exponential-Golomb encoding, Sparse-Exponential-Golomb-RemoveMin encoding, and Sparse fixed length encoding (section 3.1 recites “In our hierarchical model, an image is represented by a local descriptor set X = [x1, x2, . . . , xN] ∈ R^(d×N), where xi denotes the i-th local descriptor of the image in column vector. Suppose we are given a dictionary B ∈ R^(d×K) that can sparsely represent these local descriptors, where K is the size of the dictionary and is typically greater than 2d. The sparse representations of a descriptor set are computed as

[Equation image: media_image1.png]

where Ẑ ∈ R^(K×N) contains the sparse representations in columns for the descriptors in X. In order for classification, where we need fixed length feature vectors, we define the image level feature over the sparse representation matrix Ẑ by max pooling β = ξmax(Ẑ), where ξmax is defined on each row of Ẑ ∈ R^(K×N), returning a vector β ∈ R^K with the i-th element being βi = max{|Ẑi1|, |Ẑi2|, . . ., |ẐiN|}” (i.e. sparse fixed length coding of feature vectors)).
See claim 1 for motivation to combine.
With respect to claim 19, the combination of Choi, Yan, Luo, and Yang teaches the method of claim 17, further comprising:
sparsifying, using the processor, a number of non-zero values of the activation map (Yan para. [0248] recites "Further, as discussed with reference to FIG. 21, this technique provides for a learning scheme, such as where sparse regularization is added to and at channel-level that is suitable for both training new neural network and/or fine-tuning existing neural networks, while reducing neural network parameters, runtime memory consumption and demand, FLOPs, etc." (i.e. sparse regularization reduces neural network parameters, including values of the activation map. Para. [0027] of Applicant's specification indicates the step of sparsifying is similarly achieved through regularization, "The activation maps 106 of the neural network 105 are sparsified by the sparsifier 107 to form sparsified activation maps 111 that have an increased number of values that are equal to zero so that the lossless compression performed by the encoder 110 will be more effective. The sparsifier stage 107 fine tunes a pre-trained neural network using an additional regularization in a cost function. Typically, when neural networks are trained, a cost function L(w) is minimized with respect to the weights w. The cost function L(w) contains two terms: a data term and a regularization term. The data term is usually a cross-entropy loss, while the regularization term is typically an L2 norm on the network weights. During fine-tuning of the pre-trained network, the cost function L(w) is modified by adding a new regularization term");
configuring the activation map as a tensor having a tensor size of H x W x C (Choi section II recites "Feature values are typically quantized using an n-bit uniform quantizer (Q-layer in Fig. 1) prior to lossless [3] or lossy [4] compression.
Ṽ = round( (V - min(V)) / (max(V) - min(V)) · (2^n - 1) )
where V ∈ R^(N×M×C) is the feature tensor with N rows, M columns, and C channels at the point of split, Ṽ is the quantized feature tensor, and min(V) and max(V) are the minimum and maximum value in V, respectively. In the studies performed so far [3], [4], [7], this uniform n-bit quantization was shown to have negligible effect on image classification and object detection accuracy, for n >= 6. For this reason, when such uniform quantizer is followed up by a lossless encoder, we refer to the resulting approach as near-lossless compression. In this work, the Q-layer performs uniform 8-bit quantization. Note that min(V) and max(V) need to be transferred to the cloud for the inverse Q process. The quantized features Ṽ are rearranged in a tiled image, as shown in Fig. 2" (i.e. H x W x C and N x M x C both represent three dimensions of a tensor. H and N are analogous as rows and height. W and M are analogous as width and columns. C is used to represent channels in both expressions));
formatting the tensor into at least one block of values (Choi section II recites "Feature values are typically quantized using an n-bit uniform quantizer (Q-layer in Fig. 1) prior to lossless [3] or lossy [4] compression.
Ṽ = round( (V - min(V)) / (max(V) - min(V)) · (2^n - 1) )
where V ∈ R^(N×M×C) is the feature tensor with N rows, M columns, and C channels at the point of split, Ṽ is the quantized feature tensor, and min(V) and max(V) are the minimum and maximum value in V, respectively. In the studies performed so far [3], [4], [7], this uniform n-bit quantization was shown to have negligible effect on image classification and object detection accuracy, for n >= 6. For this reason, when such uniform quantizer is followed up by a lossless encoder, we refer to the resulting approach as near-lossless compression. In this work, the Q-layer performs uniform 8-bit quantization. Note that min(V) and max(V) need to be transferred to the cloud for the inverse Q process. The quantized features Ṽ are rearranged in a tiled image, as shown in Fig. 2" (i.e. rearranging features Ṽ into tiled images is a process of formatting the tensor into at least one block of values));
and encoding, by the processor, the at least one block of values independently from other blocks of the tensor using at least one lossless compression mode (Choi section 3 recites "Before coding the quantized feature data, the following parameters are encoded directly using fixed-length coding: dimensions of the feature tensor, min(V) and max(V) (32-bit each) and the eight most frequent feature values, mi for i = 0, 1, ..., 7. The set of {mi} is obtained over the entire quantized feature tensor. A vector of these values, p = (p0, p1, ..., p7), is referred to as the palette vector."; "initially, the palette vector is sorted according to the frequency of these values in the first tile, so that p0 is the most frequent of the mi's in the first tile, p1 is the next most frequent, etc. As we move to other tiles, the palette vector p = (p0, p1, ..., p7) is re-sorted according to the frequency of occurrence of mi's up to the previously coded tile, so that p0 is the most frequent mi up to that point, and so on. At the tile boundary, once p is updated, one element of p is chosen to minimize the mean absolute difference (MAD) from the feature values in the to-be-coded tile."; "The most frequently used mode among them is considered the mpm. If the current block's mode is the same as mpm, bit 1 is coded by CABAC to indicate it. Otherwise, bit 0 is coded, followed by two bits to indicate the mode" (i.e. independently encoding blocks using at least one lossless compression mode)).
Regarding claim 20, the combination of Choi, Yan, Luo, and Yang teaches the method of claim 19, wherein the at least one lossless compression mode selected to compress the at least one block of values is different from a lossless compression mode selected to compress another block of values of the tensor of the received at least one activation map (Choi section 3 recites "Prediction residuals for each 4 x 4 block are coded by CABAC. The first bit is the SKIP indicator. If the residual is all-zero, the SKIP indicator is set to 1 and the encoder moves to the next block. Otherwise, the SKIP indicator is set to 0 and residuals are coded using one of three scan orders: horizontal, vertical, and zig-zag. For the Ver (Hor) prediction mode, vertical (horizontal) scan order is used. Other modes use the zig-zag scan order. Locations of non-zero residuals are first indicated by binarizing the scanned block, with 1's placed at the locations of non-zero residuals and 0's placed elsewhere. This binary vector is coded using CABAC. Finally, the non-zero residual values are coded in a manner similar to HEVC: values larger than 1 or 2 are flagged, the flags are CABAC-coded, and the non-flagged values are binarized using exponential Golomb-Rice coding, then coded by CABAC" (i.e. blocks are encoded individually and may be assigned different compression modes, such as CABAC coding or Golomb-Rice coding));
and wherein compressing the at least one block of values further comprises compressing, by the processor, the at least one block of values independently from other blocks of values of the tensor of the received at least one activation map using a plurality of the lossless compression modes (Choi section 3 recites "Prediction residuals for each 4 x 4 block are coded by CABAC. The first bit is the SKIP indicator. If the residual is all-zero, the SKIP indicator is set to 1 and the encoder moves to the next block. Otherwise, the SKIP indicator is set to 0 and residuals are coded using one of three scan orders: horizontal, vertical, and zig-zag. For the Ver (Hor) prediction mode, vertical (horizontal) scan order is used. Other modes use the zig-zag scan order. Locations of non-zero residuals are first indicated by binarizing the scanned block, with 1's placed at the locations of non-zero residuals and 0's placed elsewhere. This binary vector is coded using CABAC. Finally, the non-zero residual values are coded in a manner similar to HEVC [15]: values larger than 1 or 2 are flagged, the flags are CABAC-coded, and the non-flagged values are binarized using exponential Golomb-Rice coding, then coded by CABAC" (i.e. blocks are encoded independently and may be assigned different compression modes, such as CABAC coding or Golomb-Rice coding)).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
US 20150030238 A1 (Yang et al) teaches an image classification system that utilizes a sparse fixed length encoding technique.
“Comparison of encoding techniques for transmission of image data obtained using compressed sensing in wireless sensor networks” (Loganathan et al) compares several sparse encoding techniques, including fixed length and exponential-Golomb encoding. 
“An Efficient Deep Quantized Compressed Sensing Coding Framework of Natural Images” (Cui et al) teaches a Deep Quantization Block-based Compressed Sensing (DQBCS) framework for compressing deep neural networks used in image analysis.
Applicant's amendment necessitated the new grounds of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to LEAH M FEITL whose telephone number is (571)272-8350. The examiner can normally be reached on M-F 0800-1700.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li B. Zhen, can be reached at (571) 272-3768. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
	/L.M.F./             Examiner, Art Unit 2121                                                                                                                                                                                           
	/Li B. Zhen/             Supervisory Patent Examiner, Art Unit 2121