DETAILED ACTION
This Office Action is in response to the remarks entered on 11/17/2021. Claims 1, 6, 11 and 16 were amended. No claims were added or cancelled. Claims 1-20 are pending.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
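3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.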

Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Agrawal et al. (US Pub. No. 2019/0171935, hereinafter “Agrawal”).
Referring to Claim 1, Agrawal teaches a method of operating a neural network, comprising: 
compressing at least one of activations or weights in at least one layer of the neural network based at least in part on a compression ratio and detection of a system event to produce at least one of compressed activations or compressed weights (see [0055]; Agrawal teaches “providing a new compression technique that assists in minimizing the amount of data exchanged among accelerators. In particular, an adaptive residual gradient compression scheme is provided that utilizes localized selection of gradient residues, in which the residual gradient compression scheme is able to automatically tune the compression rate based on local activity”. Therefore, Agrawal’s adaptive residual gradient compression scheme corresponds to the claimed ‘compression of activations or weights’, and the ability to automatically tune the compression rate based on local activity corresponds to the claimed ‘based at least in part on a compression ratio and a system event’. In addition, see [0056]: “[t]he compression scheme can be applied to every layer separately at each learner” and “by exploiting both sparsity and quantization, end-to-end compression rates of about 200× for fully-connected and recurrent layers and 40× for convolution layers may be achieved without noticeable degradation in model accuracy (e.g., <1% degradation)”. Therefore, since the compression is applied to every layer, at least one layer is included, and these rates (200× or 40×), which differ between layer types, correspond to the claimed “compression ratio”. Moreover, at [0068], Agrawal further teaches “[t]o save communication bandwidth, each worker sends partial value of gradients and keep the reminding residues locally. System 600 provides a compression technique that assists in minimizing the amount of data exchanged among accelerators. In particular, system 600 employs an adaptive residual gradient compression scheme”. Therefore, the local activity (system event) is interpreted as bandwidth); and 
operating the neural network to compute an inference based on the at least one of the compressed activations or the compressed weights (see [0067]; Agrawal teaches “the CNN 500 is being trained to identify handwriting, the input maps 510 are combined with a filter bank that includes convolution kernels representing a vertical line. The resulting output map 530 identifies vertical lines which may be present in the input maps”. Therefore, the output, i.e., the identification of a letter (see also Fig. 4, which shows the letter “w”), corresponds to the claimed “inference”).

Referring to Claim 2, Agrawal teaches the method of claim 1, wherein the compression is performed during computation of an inference (see [0055]; Agrawal teaches “providing a new compression technique that assists in minimizing the amount of data exchanged among accelerators. In particular, an adaptive residual gradient compression scheme is provided that utilizes localized selection of gradient residues, in which the residual gradient compression scheme is able to automatically tune the compression rate based on local activity”. Therefore, since Agrawal teaches an adaptive (emphasis added) residual gradient compression scheme that automatically tunes (emphasis added) the compression rate based on local activity, the compression is interpreted as being performed adaptively/dynamically during execution of the neural network (inferencing)).

Referring to Claim 3, Agrawal teaches the method of claim 1, wherein the compressing ratio is adapted mid-layer in response to a change in at least one of a power condition, a bandwidth condition, a debug condition, or a thermal condition (see [0055]; Agrawal teaches “providing a new compression technique that assists in minimizing the amount of data exchanged among accelerators. In particular, an adaptive residual gradient compression scheme is provided that utilizes localized selection of gradient residues, in which the residual gradient compression scheme is able to automatically tune the compression rate based on local activity”. Moreover, at [0068], Agrawal further teaches “[t]o save communication bandwidth, each worker sends partial value of gradients and keep the reminding residues locally. System 600 provides a compression technique that assists in minimizing the amount of data exchanged among accelerators. In particular, system 600 employs an adaptive residual gradient compression scheme”. Therefore, the local activity (system event) is interpreted as bandwidth, and since Agrawal teaches an adaptive residual gradient compression scheme that automatically tunes (emphasis added) the compression rate, the tuning is interpreted as being in response to a change in bandwidth. In addition, see [0056]: “[t]he compression scheme can be applied to every layer separately at each learner”. Therefore, since the compression is applied to every layer, the mid layer is included).

Referring to Claim 4, Agrawal teaches the method of claim 1, wherein the system event comprises at least one of a bandwidth condition, a power condition, a debug condition, or a thermal condition (see [0055]; Agrawal teaches “providing a new compression technique that assists in minimizing the amount of data exchanged among accelerators. In particular, an adaptive residual gradient compression scheme is provided that utilizes localized selection of gradient residues, in which the residual gradient compression scheme is able to automatically tune the compression rate based on local activity”. Moreover, at [0068], Agrawal further teaches “[t]o save communication bandwidth, each worker sends partial value of gradients and keep the reminding residues locally. System 600 provides a compression technique that assists in minimizing the amount of data exchanged among accelerators. In particular, system 600 employs an adaptive residual gradient compression scheme”. Therefore, the local activity (system event) is interpreted as bandwidth, and since Agrawal teaches an adaptive residual gradient compression scheme, the adaptation is interpreted as being in response to a change in bandwidth).

Referring to Claim 5, Agrawal teaches the method of claim 4, wherein the compression ratio is determined using a compression map, the compression map being specified based at least in part on at least one of sparsity estimates or a loss threshold for a specified bandwidth, power or thermal profile (see [0083-0084]; Agrawal teaches “[a]lgorithm 2 selects up to 10 and 100 elements respectively within each bin through sparsity for bin sizes (L.sub.T) between 50 and 500 elements. In some embodiments of the present invention, a sparse-index representation of 8-bits is used for L.sub.T sizes that are less than 40 elements. In some embodiments of the present invention, a 16-bit representation is used for large L.sub.T sizes (e.g., greater than 500 elements and/or up to 10K elements)”. In addition, it teaches “[a]lgorithm 2 applies a compression scheme that sends additional residual gradients that are close to the local maximum in each bin, and can therefore automatically adapt based on the number of important gradients in a mini-batch”. Therefore, the compression is based on sparsity).

Referring to Claim 6, Agrawal teaches an apparatus of operating a neural network, comprising: 
a memory (see [0058]: “[c]omputer system 300 also includes a main memory 310”); and 
at least one processor coupled to the memory (see [0058]: “[c]omputer system 300 includes one or more processors”), the at least one processor being configured to: 
compress at least one of activations or weights in at least one layer of the neural network based at least in part on a compression ratio and detection of a system event to produce at least one of compressed activations or compressed weights (see [0055]; Agrawal teaches “providing a new compression technique that assists in minimizing the amount of data exchanged among accelerators. In particular, an adaptive residual gradient compression scheme is provided that utilizes localized selection of gradient residues, in which the residual gradient compression scheme is able to automatically tune the compression rate based on local activity”. Therefore, Agrawal’s adaptive residual gradient compression scheme corresponds to the claimed ‘compression of activations or weights’, and the ability to automatically tune the compression rate based on local activity corresponds to the claimed ‘based at least in part on a compression ratio and a system event’. In addition, see [0056]: “[t]he compression scheme can be applied to every layer separately at each learner” and “by exploiting both sparsity and quantization, end-to-end compression rates of about 200× for fully-connected and recurrent layers and 40× for convolution layers may be achieved without noticeable degradation in model accuracy (e.g., <1% degradation)”. Therefore, since the compression is applied to every layer, at least one layer is included, and these rates (200× or 40×), which differ between layer types, correspond to the claimed “compression ratio”. Moreover, at [0068], Agrawal further teaches “[t]o save communication bandwidth, each worker sends partial value of gradients and keep the reminding residues locally. System 600 provides a compression technique that assists in minimizing the amount of data exchanged among accelerators. In particular, system 600 employs an adaptive residual gradient compression scheme”. Therefore, the local activity (system event) is interpreted as bandwidth); and 
operate the neural network to compute an inference based on the at least one of the compressed activations or the compressed weights (see [0067]; Agrawal teaches “the CNN 500 is being trained to identify handwriting, the input maps 510 are combined with a filter bank that includes convolution kernels representing a vertical line. The resulting output map 530 identifies vertical lines which may be present in the input maps”. Therefore, the output, i.e., the identification of a letter (see also Fig. 4, which shows the letter “w”), corresponds to the claimed “inference”).

Referring to Claim 7, Agrawal teaches the apparatus of claim 6, wherein the at least one processor is further configured to perform the compression during computation (see [0055]; Agrawal teaches “providing a new compression technique that assists in minimizing the amount of data exchanged among accelerators. In particular, an adaptive residual gradient compression scheme is provided that utilizes localized selection of gradient residues, in which the residual gradient compression scheme is able to automatically tune the compression rate based on local activity”. Therefore, since Agrawal teaches an adaptive (emphasis added) residual gradient compression scheme that automatically tunes (emphasis added) the compression rate based on local activity, the compression is interpreted as being performed adaptively/dynamically during execution of the neural network (inferencing)).

Referring to Claim 8, Agrawal teaches the apparatus of claim 6, wherein the system event comprises at least one of a bandwidth condition, a power condition, a debug condition, or a thermal condition (see [0055]; Agrawal teaches “providing a new compression technique that assists in minimizing the amount of data exchanged among accelerators. In particular, an adaptive residual gradient compression scheme is provided that utilizes localized selection of gradient residues, in which the residual gradient compression scheme is able to automatically tune the compression rate based on local activity”. Moreover, at [0068], Agrawal further teaches “[t]o save communication bandwidth, each worker sends partial value of gradients and keep the reminding residues locally. System 600 provides a compression technique that assists in minimizing the amount of data exchanged among accelerators. In particular, system 600 employs an adaptive residual gradient compression scheme”. Therefore, the local activity (system event) is interpreted as bandwidth, and since Agrawal teaches an adaptive residual gradient compression scheme, the adaptation is interpreted as being in response to a change in bandwidth).

Referring to Claim 9, Agrawal teaches the apparatus of claim 8, wherein the at least one processor is further configured to determine the compression ratio using a compression map, the compression map being specified based at least in part on at least one of sparsity estimates or a loss threshold for a specified bandwidth, power or thermal profile (see [0083-0084]; Agrawal teaches “[a]lgorithm 2 selects up to 10 and 100 elements respectively within each bin through sparsity for bin sizes (L.sub.T) between 50 and 500 elements. In some embodiments of the present invention, a sparse-index representation of 8-bits is used for L.sub.T sizes that are less than 40 elements. In some embodiments of the present invention, a 16-bit representation is used for large L.sub.T sizes (e.g., greater than 500 elements and/or up to 10K elements)”. In addition, it teaches “[a]lgorithm 2 applies a compression scheme that sends additional residual gradients that are close to the local maximum in each bin, and can therefore automatically adapt based on the number of important gradients in a mini-batch”. Therefore, the compression is based on sparsity).

Referring to Claim 10, Agrawal teaches the apparatus of claim 6, wherein the at least one processor is further configured to adapt the compression ratio mid-layer in response to a change in at least one of a power condition, a debug condition, a bandwidth condition or a thermal condition (see [0055]; Agrawal teaches “providing a new compression technique that assists in minimizing the amount of data exchanged among accelerators. In particular, an adaptive residual gradient compression scheme is provided that utilizes localized selection of gradient residues, in which the residual gradient compression scheme is able to automatically tune the compression rate based on local activity”. Moreover, at [0068], Agrawal further teaches “[t]o save communication bandwidth, each worker sends partial value of gradients and keep the reminding residues locally. System 600 provides a compression technique that assists in minimizing the amount of data exchanged among accelerators. In particular, system 600 employs an adaptive residual gradient compression scheme”. Therefore, the local activity (system event) is interpreted as bandwidth, and since Agrawal teaches an adaptive residual gradient compression scheme that automatically tunes (emphasis added) the compression rate, the tuning is interpreted as being in response to a change in bandwidth. In addition, see [0056]: “[t]he compression scheme can be applied to every layer separately at each learner”. Therefore, since the compression is applied to every layer, the mid layer is included).

Referring to independent Claim 11 and Claim 16, they are rejected on the same basis as independent claims 1 and 6, respectively, since they are analogous claims.

Referring to dependent Claim 12 and Claim 17, they are rejected on the same basis as dependent claims 2 and 7, respectively, since they are analogous claims.

Referring to dependent Claim 13 and Claim 18, they are rejected on the same basis as dependent claims 3 and 8, respectively, since they are analogous claims.

Referring to dependent Claim 14 and Claim 19, they are rejected on the same basis as dependent claims 4 and 9, respectively, since they are analogous claims.

Referring to dependent Claim 15 and Claim 20, they are rejected on the same basis as dependent claims 5 and 10, respectively, since they are analogous claims.
Response to Arguments
The Applicant’s arguments regarding the rejection of above-mentioned claims have been fully considered.
In reference to Applicant’s arguments about:
Claim rejections under 35 USC 103.
Examiner’s response:
In regard to the argument that Agrawal fails to teach or suggest the features of claim 1, the examiner respectfully disagrees. After further consideration and analysis, the examiner finds that there is no patentable distinction (emphasis added) between the broadest reasonable interpretation (BRI) of the amended claim limitations and the prior art.
Agrawal is directed to a computer-implemented method for adaptive residual gradient compression for training of a deep learning neural network (DNN). The instant application is directed to machine learning and, more particularly, to improving systems and methods of lossy layer compression for dynamic scaling of neural network processing. Therefore, the instant application and the prior art are analogous. In particular, Agrawal’s “adaptive residual gradient compression” is interpreted as being analogous to the instant application’s “compression for dynamic scaling”. Agrawal further teaches that the compression scheme is able to automatically tune the compression rate based on (emphasis added) local activity (see [0055]). Therefore, Agrawal’s automatic compression depends on both the compression rate and the local activity, and this dependence on local activity is interpreted as the claimed “detection”. In view of this teaching, the examiner interprets the automatic tuning of the compression rate as equivalent to the claimed compression of at least one of activations or weights. Furthermore, Agrawal discloses that end-to-end compression rates of about 200× for fully-connected and recurrent layers and 40× for convolution layers may be achieved without noticeable degradation in model accuracy, which reinforces the interpretation of the compression ratio. Agrawal’s goal is to save communication bandwidth and to minimize the amount of data exchanged among accelerators by applying this adaptive residual gradient compression scheme. Therefore, because the scheme is adaptive, it is interpreted as changing based on the compression ratio and the bandwidth needed, so as to optimize resource use by leveraging a metering capability appropriate to the type of service (e.g., storage, processing, bandwidth) (see Agrawal’s [0033]). Metering this capability is reasonably interpreted as detecting the communication bandwidth, which is the system event. A similar rationale applies to claims 6, 11 and 16, as they are similar claims.
For the reasons explained above, the rejections of claims 1-20 are maintained.



Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to LUIS A SITIRICHE whose telephone number is (571)270-1316.  The examiner can normally be reached on M-F 9am-6pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ann Lo can be reached on (571) 272-9767.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published 
/LUIS A SITIRICHE/           Primary Examiner, Art Unit 2126