DETAILED ACTION
This Office Action is in response to the RCE entered on 5/24/2022. Claims 1-20 were amended. Claims 21-22 were added. Claims 1-22 are pending.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 5/24/2022 has been entered.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-22 stand rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception without significantly more.
Step 1 analysis:
In the instant case, the claims are directed to a method (claims 1-5, 21-22), apparatus (claims 6-15) and medium (claims 16-20). Thus, each of the claims falls within one of the four statutory categories (i.e., process, machine, manufacture, or composition of matter).
Step 2A analysis:
Because the claims have been determined to fall within one of the four statutory categories (Step 1), it must be determined whether the claims are directed to a judicial exception (i.e., a law of nature, a natural phenomenon, or an abstract idea). In this case, the claims fall within the judicial exception of an abstract idea, specifically the abstract ideas of Mental Processes (“Concepts performed in the human mind (including an observation, evaluation, judgment, opinion)”) and Mathematical Concepts (including mathematical relationships, formulas, and/or calculations).
Step 2A: Prong 1 analysis:
The claim(s) recite(s):
Claim 1:
“determining a first compression ratio for a first layer of the neural network based on a first threshold associated with a first system event” - this limitation amounts to determining (analyzing or evaluating which is a mental process) a compression ratio (mathematical relationship which is a mathematical concept) based on a threshold (mathematical relationship);
“compressing at least one of activations or weights in the first layer of the neural network based at least in part on the first compression ratio and detection of the first system event to produce at least one of compressed activations or compressed weights” - this limitation amounts to compressing activation or weights (mathematical calculation) based on a compression ratio (which is a mathematical relationship);
“…compute an inference based on the at least one of the compressed activations or the compressed weights” – this limitation amounts to computing (mathematical calculation) based on the previous calculation (compressed activation or weights).
Step 2A: Prong 2 analysis:
This judicial exception is not integrated into a practical application because the claim recites only the following additional element:
“operating the neural network …” - this limitation recites the use of a neural network; however, it is recited at a high level of generality such that it amounts to utilizing the neural network to perform the calculation, which is no more than mere instructions to apply the exception on a computer (see MPEP 2106.05(f)).
Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claims are directed to an abstract idea.
Step 2B analysis:
The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements explained above amount to mere instructions to implement an abstract idea or other exception on a computer (see MPEP 2106.05(f)).
The claims are not patent eligible. 
Independent claims 6, 11 and 16 are analogous claims; therefore, the same rejection and rationale apply to them.
In addition, independent claim 6 recites the additional elements analyzed under Step 2A: prong 2 and Step 2B:
“a memory; and at least one processor coupled to the memory”- this limitation amounts to generic computer components (see MPEP 2106.05(b)).
In addition, independent claim 16 recites the additional elements analyzed under Step 2A: prong 2 and Step 2B:
“computer readable medium storing computer executable code”- this limitation amounts to generic computer components (see MPEP 2106.05(b)).
Dependent claim(s) 2-5, 7-10, 12-15, and 17-22 when analyzed as a whole are held to be patent ineligible under 35 U.S.C. 101 because the additional recited limitation(s) fail(s) to establish that the claim(s) is/are not directed to an abstract idea. The claims are reciting further embellishment of the judicial exception.   
Claim 2: this claim recites further embellishment about the computation (calculation). Claims 7, 12 and 17 are analogous to claim 2. 
Claim 3: this claim recites further embellishment about the compression (mathematical calculation) and the condition for it to happen, and this limitation does no more than generally link a judicial exception to a particular technological environment, such as power, bandwidth, debug or thermal events (see MPEP 2106.05(h)- Field of Use). Claims 8, 13 and 18 are analogous to claim 3.
Claim 4: this limitation does no more than generally link a judicial exception to a particular technological environment, such as power, bandwidth, debug or thermal events (see MPEP 2106.05(h)- Field of Use). Claims 9, 14 and 19 are analogous to claim 4.
Claim 5: this claim recites further embellishment about the compression ratio and a compression map, which amounts to mathematical relationships. Claims 10, 15 and 20 are analogous to claim 5.
Claim 21: this claim recites second and third compression ratios, which are further mathematical concepts (calculations/relationships) and therefore also an abstract idea.
Claim 22: this claim recites a compression map indicating different ratios, which are further mathematical relationships and therefore also an abstract idea.
Viewed as a whole, these additional claim element(s) do not provide meaningful limitation(s) to transform the abstract idea into a patent eligible application of the abstract idea such that the claim(s) amounts to significantly more than the abstract idea itself.  Therefore, the claim(s) are rejected under 35 U.S.C. 101 as being directed to non-statutory subject matter.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Agrawal et al (US Pub. No. 2019/0171935- hereinafter Agrawal) in view of Heaton et al (US Pub. No. 2018/0000385- hereinafter Heaton).
Referring to Claim 1, Agrawal teaches a method of operating a neural network, comprising: 
determining a first compression ratio for a first layer of the neural network (see [0055]; Agrawal teaches “providing a new compression technique that assists in minimizing the amount of data exchanged among accelerators. In particular, an adaptive residual gradient compression scheme is provided that utilizes localized selection of gradient residues, in which the residual gradient compression scheme is able to automatically tune the compression rate based on local activity”. Therefore, Agrawal’s adaptive residual gradient compression scheme corresponds to the claimed ‘compression ratio’);
compressing at least one of activations or weights in the first layer of the neural network based at least in part on the first compression ratio and detection of the first system event to produce at least one of compressed activations or compressed weights (see [0055]; Agrawal teaches “providing a new compression technique that assists in minimizing the amount of data exchanged among accelerators. In particular, an adaptive residual gradient compression scheme is provided that utilizes localized selection of gradient residues, in which the residual gradient compression scheme is able to automatically tune the compression rate based on local activity”. Therefore, Agrawal’s adaptive residual gradient compression scheme corresponds to the claimed ‘compression of activations or weights’; and being able to automatically tune the compression rate based on local activity corresponds to the claimed ‘based at least in part on a compression ratio and a system event’. In addition, see [0056] “[t]he compression scheme can be applied to every layer separately at each learner” and “by exploiting both sparsity and quantization, end-to-end compression rates of about 200× for fully-connected and recurrent layers and 40× for convolution layers may be achieved without noticeable degradation in model accuracy (e.g., <1% degradation)”; therefore, since the compression is applied to every layer, at least one layer is included; and these rates (200x or 40x) between layers correspond to the “compression ratio”. Moreover, at [0068], Agrawal further teaches “[t]o save communication bandwidth, each worker sends partial value of gradients and keep the reminding residues locally. System 600 provides a compression technique that assists in minimizing the amount of data exchanged among accelerators. In particular, system 600 employs an adaptive residual gradient compression scheme”. Therefore, it is interpreted that the local activity (system event) would be bandwidth); and 
operating the neural network to compute an inference based on the at least one of the compressed activations or the compressed weights (see [0067]; Agrawal teaches “the CNN 500 is being trained to identify handwriting, the input maps 510 are combined with a filter bank that includes convolution kernels representing a vertical line. The resulting output map 530 identifies vertical lines which may be present in the input maps”. Therefore, the output being the identification of a letter (see also Fig. 4 which shows the letter “w”), this corresponds to claimed “inference”).
However, Agrawal fails to explicitly teach determining a first compression ratio for a first layer of the neural network based on a first threshold associated with a first system event.
Heaton teaches, in an analogous system, determining a first compression ratio for a first layer of the neural network based on a first threshold associated with a first system event (see Heaton at [0048]: “a threshold number of sequential outputs of the compressed fall detection model indicate that the resident was falling during a corresponding contiguous sequence of sampling periods”. Further, at [0064]: “[i]n particular, the resident's mobility can be inversely correlated with risk of falling, and the wearable device can therefore adjust sensitivity of the compressed fall detection model to a fall event by reducing the local threshold score as a function of decreasing mobility of the resident”. Therefore, the compression model depends directly on the threshold to detect a fall (which corresponds to the first system event)).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Agrawal with the above teachings of Heaton by compressing a neural network, as taught by Agrawal, and compressing the neural network based on a threshold associated with an event, as taught by Heaton. The modification would have been obvious because one of ordinary skill in the art would be motivated to increase the sensitivity of fall detection by tuning weights of connections between neurons in a neural network (see Heaton at [0023]).
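
For illustration only, the three limitations of claim 1 as mapped above may be sketched as follows. This is a minimal sketch under stated assumptions: the function names, the bandwidth threshold, the two candidate ratios, and the use of magnitude-based pruning as the compression step are hypothetical and are not taken from Agrawal, Heaton, or the instant application.

```python
def select_compression_ratio(available_bandwidth, bandwidth_threshold,
                             low_ratio=1.0, high_ratio=4.0):
    """Pick a ratio for one layer from a threshold tied to a system event.

    The "system event" is assumed here to be available bandwidth falling
    below the threshold (hypothetical choice for illustration).
    """
    event_detected = available_bandwidth < bandwidth_threshold
    ratio = high_ratio if event_detected else low_ratio
    return ratio, event_detected


def compress_weights(weights, ratio):
    """Keep roughly len(weights)/ratio largest-magnitude weights, zero the rest."""
    keep = max(1, int(len(weights) / ratio))
    ranked = sorted(range(len(weights)),
                    key=lambda i: abs(weights[i]), reverse=True)
    kept = set(ranked[:keep])
    return [w if i in kept else 0.0 for i, w in enumerate(weights)]


def infer(inputs, weights):
    """Single-neuron 'inference': dot product of inputs and weights."""
    return sum(x * w for x, w in zip(inputs, weights))


# Usage: bandwidth 5.0 is below the threshold 10.0, so the event is detected.
ratio, detected = select_compression_ratio(5.0, 10.0)
layer_weights = [0.1, -0.9, 0.3, 0.05]
if detected:
    layer_weights = compress_weights(layer_weights, ratio)
output = infer([1.0, 1.0, 1.0, 1.0], layer_weights)
```

The sketch only shows how a threshold, a detected event, a ratio, and an inference can be chained; it takes no position on whether either reference discloses such a chain.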

Referring to Claim 2, Agrawal teaches the method of claim 1, wherein the compression occurs during computation of an inference (see [0055]; Agrawal teaches “providing a new compression technique that assists in minimizing the amount of data exchanged among accelerators. In particular, an adaptive residual gradient compression scheme is provided that utilizes localized selection of gradient residues, in which the residual gradient compression scheme is able to automatically tune the compression rate based on local activity”. Therefore, since Agrawal teaches an adaptive (emphasis added) residual gradient compression scheme, wherein it automatically tunes (emphasis added) based on local activity, it is interpreted that the compression is performed adaptively/dynamically during execution of the neural network (inferencing)).

Referring to Claim 3, Agrawal teaches the method of claim 1, wherein the first compression ratio is adapted mid-layer in response to a change in at least one of a power condition, a bandwidth condition, a debug condition, or a thermal condition (see [0055]; Agrawal teaches “providing a new compression technique that assists in minimizing the amount of data exchanged among accelerators. In particular, an adaptive residual gradient compression scheme is provided that utilizes localized selection of gradient residues, in which the residual gradient compression scheme is able to automatically tune the compression rate based on local activity”. Moreover, at [0068], Agrawal further teaches “[t]o save communication bandwidth, each worker sends partial value of gradients and keep the reminding residues locally. System 600 provides a compression technique that assists in minimizing the amount of data exchanged among accelerators. In particular, system 600 employs an adaptive residual gradient compression scheme”. Therefore, it is interpreted that the local activity (system event) would be bandwidth, and since Agrawal teaches an adaptive residual gradient compression scheme that automatically tunes (emphasis added), this is interpreted as being in response to a change in bandwidth. In addition, see [0056] “[t]he compression scheme can be applied to every layer separately at each learner”, therefore since the compression is applied to every layer, the mid layer is included).

Referring to Claim 4, Agrawal teaches the method of claim 1, wherein the first system event comprises at least one of a bandwidth condition, a power condition, a debug condition, or a thermal condition (see [0055]; Agrawal teaches “providing a new compression technique that assists in minimizing the amount of data exchanged among accelerators. In particular, an adaptive residual gradient compression scheme is provided that utilizes localized selection of gradient residues, in which the residual gradient compression scheme is able to automatically tune the compression rate based on local activity”. Moreover, at [0068], Agrawal further teaches “[t]o save communication bandwidth, each worker sends partial value of gradients and keep the reminding residues locally. System 600 provides a compression technique that assists in minimizing the amount of data exchanged among accelerators. In particular, system 600 employs an adaptive residual gradient compression scheme”. Therefore, it is interpreted that the local activity (system event) would be bandwidth, and since Agrawal teaches an adaptive residual gradient compression scheme, this is interpreted as being in response to a change in bandwidth).

Referring to Claim 5, Agrawal teaches the method of claim 4, wherein the first compression ratio is determined using a compression map, the compression map being specified based at least in part on at least one of sparsity estimates or a loss threshold for a specified bandwidth, power or thermal profile (see [0083-0084]; Agrawal teaches “[a]lgorithm 2 selects up to 10 and 100 elements respectively within each bin through sparsity for bin sizes (L.sub.T) between 50 and 500 elements. In some embodiments of the present invention, a sparse-index representation of 8-bits is used for L.sub.T sizes that are less than 40 elements. In some embodiments of the present invention, a 16-bit representation is used for large L.sub.T sizes (e.g., greater than 500 elements and/or up to 10K elements)”. In addition it teaches “[a]lgorithm 2 applies a compression scheme that sends additional residual gradients that are close to the local maximum in each bin, and can therefore automatically adapt based on the number of important gradients in a mini-batch”. Therefore, the compression is based on sparsity).

Referring to Claim 6, Agrawal teaches an apparatus of operating a neural network, comprising: 
a memory (see [0058]: “[c]omputer system 300 also includes a main memory 310”); and 
at least one processor coupled to the memory (see [0058]: “[c]omputer system 300 includes one or more processors”), the at least one processor being configured to: 
determine a first compression ratio for a first layer of the neural network (see [0055]; Agrawal teaches “providing a new compression technique that assists in minimizing the amount of data exchanged among accelerators. In particular, an adaptive residual gradient compression scheme is provided that utilizes localized selection of gradient residues, in which the residual gradient compression scheme is able to automatically tune the compression rate based on local activity”. Therefore, Agrawal’s adaptive residual gradient compression scheme corresponds to the claimed ‘compression ratio’);
compress at least one of activations or weights in the first layer of the neural network based at least in part on the first compression ratio and detection of the first system event to produce at least one of compressed activations or compressed weights (see [0055]; Agrawal teaches “providing a new compression technique that assists in minimizing the amount of data exchanged among accelerators. In particular, an adaptive residual gradient compression scheme is provided that utilizes localized selection of gradient residues, in which the residual gradient compression scheme is able to automatically tune the compression rate based on local activity”. Therefore, Agrawal’s adaptive residual gradient compression scheme corresponds to the claimed ‘compression of activations or weights’; and being able to automatically tune the compression rate based on local activity corresponds to the claimed ‘based at least in part on a compression ratio and a system event’. In addition, see [0056] “[t]he compression scheme can be applied to every layer separately at each learner” and “by exploiting both sparsity and quantization, end-to-end compression rates of about 200× for fully-connected and recurrent layers and 40× for convolution layers may be achieved without noticeable degradation in model accuracy (e.g., <1% degradation)”; therefore, since the compression is applied to every layer, at least one layer is included; and these rates (200x or 40x) between layers correspond to the “compression ratio”. Moreover, at [0068], Agrawal further teaches “[t]o save communication bandwidth, each worker sends partial value of gradients and keep the reminding residues locally. System 600 provides a compression technique that assists in minimizing the amount of data exchanged among accelerators. In particular, system 600 employs an adaptive residual gradient compression scheme”. Therefore, it is interpreted that the local activity (system event) would be bandwidth); and 
operate the neural network to compute an inference based on the at least one of the compressed activations or the compressed weights (see [0067]; Agrawal teaches “the CNN 500 is being trained to identify handwriting, the input maps 510 are combined with a filter bank that includes convolution kernels representing a vertical line. The resulting output map 530 identifies vertical lines which may be present in the input maps”. Therefore, the output being the identification of a letter (see also Fig. 4 which shows the letter “w”), this corresponds to claimed “inference”).

However, Agrawal fails to explicitly teach determining a first compression ratio for a first layer of the neural network based on a first threshold associated with a first system event.
Heaton teaches, in an analogous system, determining a first compression ratio for a first layer of the neural network based on a first threshold associated with a first system event (see Heaton at [0048]: “a threshold number of sequential outputs of the compressed fall detection model indicate that the resident was falling during a corresponding contiguous sequence of sampling periods”. Further, at [0064]: “[i]n particular, the resident's mobility can be inversely correlated with risk of falling, and the wearable device can therefore adjust sensitivity of the compressed fall detection model to a fall event by reducing the local threshold score as a function of decreasing mobility of the resident”. Therefore, the compression model depends directly on the threshold to detect a fall (which corresponds to the first system event)).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Agrawal with the above teachings of Heaton by compressing a neural network, as taught by Agrawal, and compressing the neural network based on a threshold associated with an event, as taught by Heaton. The modification would have been obvious because one of ordinary skill in the art would be motivated to increase the sensitivity of fall detection by tuning weights of connections between neurons in a neural network (see Heaton at [0023]).


Referring to Claim 7, Agrawal teaches the apparatus of claim 6, wherein the at least one of the activations or the weights in the first layer of the neural network is compressed during computation of an inference (see [0055]; Agrawal teaches “providing a new compression technique that assists in minimizing the amount of data exchanged among accelerators. In particular, an adaptive residual gradient compression scheme is provided that utilizes localized selection of gradient residues, in which the residual gradient compression scheme is able to automatically tune the compression rate based on local activity”. Therefore, since Agrawal teaches an adaptive (emphasis added) residual gradient compression scheme, wherein it automatically tunes (emphasis added) based on local activity, it is interpreted that the compression is performed adaptively/dynamically during execution of the neural network (inferencing)).

Referring to Claim 8, Agrawal teaches the apparatus of claim 6, wherein the first system event comprises at least one of a bandwidth condition, a power condition, a debug condition, or a thermal condition (see [0055]; Agrawal teaches “providing a new compression technique that assists in minimizing the amount of data exchanged among accelerators. In particular, an adaptive residual gradient compression scheme is provided that utilizes localized selection of gradient residues, in which the residual gradient compression scheme is able to automatically tune the compression rate based on local activity”. Moreover, at [0068], Agrawal further teaches “[t]o save communication bandwidth, each worker sends partial value of gradients and keep the reminding residues locally. System 600 provides a compression technique that assists in minimizing the amount of data exchanged among accelerators. In particular, system 600 employs an adaptive residual gradient compression scheme”. Therefore, it is interpreted that the local activity (system event) would be bandwidth, and since Agrawal teaches an adaptive residual gradient compression scheme, this is interpreted as being in response to a change in bandwidth).

Referring to Claim 9, Agrawal teaches the apparatus of claim 8, wherein the at least one processor is further configured to determine the first compression ratio using a compression map, the compression map being specified based at least in part on at least one of sparsity estimates or a loss threshold for a specified bandwidth, power or thermal profile (see [0083-0084]; Agrawal teaches “[a]lgorithm 2 selects up to 10 and 100 elements respectively within each bin through sparsity for bin sizes (L.sub.T) between 50 and 500 elements. In some embodiments of the present invention, a sparse-index representation of 8-bits is used for L.sub.T sizes that are less than 40 elements. In some embodiments of the present invention, a 16-bit representation is used for large L.sub.T sizes (e.g., greater than 500 elements and/or up to 10K elements)”. In addition it teaches “[a]lgorithm 2 applies a compression scheme that sends additional residual gradients that are close to the local maximum in each bin, and can therefore automatically adapt based on the number of important gradients in a mini-batch”. Therefore, the compression is based on sparsity).

Referring to Claim 10, Agrawal teaches the apparatus of claim 6, wherein the at least one processor is further configured to adapt the first compression ratio at mid-layer in response to a change in at least one of a power condition, a debug condition, a bandwidth condition or a thermal condition (see [0055]; Agrawal teaches “providing a new compression technique that assists in minimizing the amount of data exchanged among accelerators. In particular, an adaptive residual gradient compression scheme is provided that utilizes localized selection of gradient residues, in which the residual gradient compression scheme is able to automatically tune the compression rate based on local activity”. Moreover, at [0068], Agrawal further teaches “[t]o save communication bandwidth, each worker sends partial value of gradients and keep the reminding residues locally. System 600 provides a compression technique that assists in minimizing the amount of data exchanged among accelerators. In particular, system 600 employs an adaptive residual gradient compression scheme”. Therefore, it is interpreted that the local activity (system event) would be bandwidth, and since Agrawal teaches an adaptive residual gradient compression scheme that automatically tunes (emphasis added), this is interpreted as being in response to a change in bandwidth. In addition, see [0056] “[t]he compression scheme can be applied to every layer separately at each learner”, therefore since the compression is applied to every layer, the mid layer is included).

Referring to independent Claim 11 and Claim 16, they are rejected on the same basis as independent claims 1 and 6, respectively, since they are analogous claims.

Referring to dependent Claims 12 and 17, they are rejected on the same basis as dependent claims 2 and 7, respectively, since they are analogous claims.

Referring to dependent Claims 13 and 18, they are rejected on the same basis as dependent claims 3 and 8, respectively, since they are analogous claims.

Referring to dependent Claims 14 and 19, they are rejected on the same basis as dependent claims 4 and 9, respectively, since they are analogous claims.

Referring to dependent Claims 15 and 20, they are rejected on the same basis as dependent claims 5 and 10, respectively, since they are analogous claims.

Claims 21-22 are rejected under 35 U.S.C. 103 as being unpatentable over Agrawal et al (US Pub. No. 2019/0171935- hereinafter Agrawal) in view of Heaton et al (US Pub. No. 2018/0000385- hereinafter Heaton) and further in view of Appu et al. (US Pub. No. 2018/0299841- hereinafter Appu).
Referring to Claim 21, the combination of Agrawal and Heaton teaches the method of claim 1 but fails to teach the method further comprising: 
determining a second compression ratio for a second layer of the neural network different from the first layer based on a second threshold associated with the first system event, the second compression ratio being different from the first compression ratio; and 
determining a third compression ratio for a third layer of the neural network different from the first layer and the second layer based on a third threshold associated with the first system event, the third compression ratio being different from the first compression ratio and the second compression ratio.
Appu teaches, in an analogous system,
determining a second compression ratio for a second layer of the neural network different from the first layer based on a second threshold associated with the first system event, the second compression ratio being different from the first compression ratio (see Appu at Fig. 7A and [0140]: “different layers of neural networks generally have variable content or context which will exhibit different data characteristics (e.g., as discussed herein, for example with reference to FIGS. 6A or 6B). To reduce memory pressure/traffic, variable compression algorithms (e.g., having different compression ratio(s)) may be applied to different layers of a CNN”. Therefore, it can be seen that layer 1 has a compression algorithm having one compression ratio, layer 2 has a different one, and so on up to layer N. Furthermore, see [0285]: “cause application of a first compression algorithm to a first layer of a neural network and to cause application of a second compression algorithm to a second layer of the neural network, wherein the logic is to determine whether to apply different compression algorithms to the first layer and the second layer based at least in part on one or more characteristics of the first layer and the second layer” and “wherein the logic is to determine whether to apply the different compression algorithms based at least in part on comparison of the one or more characteristics of the first layer and the second layer against one or more threshold values”. Therefore, each layer has a different compression ratio based on threshold values indicating characteristics, and this is interpreted as the claimed different thresholds); and 
determining a third compression ratio for a third layer of the neural network different from the first layer and the second layer based on a third threshold associated with the first system event, the third compression ratio being different from the first compression ratio and the second compression ratio (see Appu at Fig. 7A and [0140]: “different layers of neural networks generally have variable content or context which will exhibit different data characteristics (e.g., as discussed herein, for example with reference to FIGS. 6A or 6B). To reduce memory pressure/traffic, variable compression algorithms (e.g., having different compression ratio(s)) may be applied to different layers of a CNN”. Therefore, it can be seen that layer 1 has a compression algorithm having one compression ratio, layer 2 has a different one, and so on up to layer N, which is interpreted as the third layer. Furthermore, see [0285]: “cause application of a first compression algorithm to a first layer of a neural network and to cause application of a second compression algorithm to a second layer of the neural network, wherein the logic is to determine whether to apply different compression algorithms to the first layer and the second layer based at least in part on one or more characteristics of the first layer and the second layer” and “wherein the logic is to determine whether to apply the different compression algorithms based at least in part on comparison of the one or more characteristics of the first layer and the second layer against one or more threshold values”. Therefore, each layer (layer 1, 2…N, where layer N is interpreted as the third layer) has a different compression ratio based on threshold values indicating characteristics, and this is interpreted as the claimed different thresholds).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Agrawal and Heaton with the above teachings of Appu by compressing a neural network based on a threshold associated with an event, as taught by Agrawal and Heaton, and having different compression ratios for each layer, as taught by Appu. The modification would have been obvious because one of ordinary skill in the art would have been motivated to reduce memory pressure and traffic by implementing variable compression ratios per layer (see Appu at [0140]).

Referring to Claim 22, the combination of Agrawal and Heaton teaches the method of claim 1; however, the combination fails to teach the method further comprising determining a compression map indicating different compression ratios between multiple layers of the neural network based on different thresholds associated with one or more system events of the neural network.
Appu teaches, in an analogous system, determining a compression map indicating different compression ratios between multiple layers of the neural network based on different thresholds associated with one or more system events of the neural network (see Appu at Fig. 7A and [0140]: “different layers of neural networks generally have variable content or context which will exhibit different data characteristics (e.g., as discussed herein, for example with reference to FIGS. 6A or 6B). To reduce memory pressure/traffic, variable compression algorithms (e.g., having different compression ratio(s)) may be applied to different layers of a CNN”. Therefore, it can be seen that layer 1 has a compression algorithm having one compression ratio, layer 2 has a different one, and so on up to layer N having a different one. Furthermore, see [0285]: “cause application of a first compression algorithm to a first layer of a neural network and to cause application of a second compression algorithm to a second layer of the neural network, wherein the logic is to determine whether to apply different compression algorithms to the first layer and the second layer based at least in part on one or more characteristics of the first layer and the second layer” and “wherein the logic is to determine whether to apply the different compression algorithms based at least in part on comparison of the one or more characteristics of the first layer and the second layer against one or more threshold values”. Therefore, each layer has a different compression ratio based on threshold values indicating characteristics, and this is interpreted as the claimed different thresholds).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Agrawal and Heaton with the above teachings of Appu by compressing a neural network based on a threshold associated with an event, as taught by Agrawal and Heaton, and having different compression ratios for each layer, as taught by Appu. The modification would have been obvious because one of ordinary skill in the art would have been motivated to reduce memory pressure and traffic by implementing variable compression ratios per layer (see Appu at [0140]).
Response to Arguments
The Applicant’s arguments regarding the rejection of above-mentioned claims have been fully considered.
In reference to Applicant’s arguments about:
Claim rejections under 35 USC 103.
Examiner’s response:
The remarks regarding the amendments to the independent claims are mainly directed to the newly added limitation “determining a first compression ratio for a first layer of the neural network based on a first threshold associated with a first system event”. This newly added limitation necessitated new grounds of rejection; therefore, the corresponding arguments are moot in view of the new grounds of rejection.
The rejections of claims 1-20 are maintained.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to LUIS A SITIRICHE whose telephone number is (571)270-1316. The examiner can normally be reached M-F 9am-6pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ann Lo, can be reached at (571) 272-9767. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/LUIS A SITIRICHE/Primary Examiner, Art Unit 2126