DETAILED ACTION
Notice of Pre-AIA  or AIA  Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Objections
Claim 9 is objected to because of the following informalities:
Claim 9 reads “used to update the compress parameters” but should read “used to update the compressed parameters”.
Appropriate correction is required.
Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.

This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.  Such claim limitation(s) is/are: the processing unit in claims 10 and 12, and the compression subsystem in claim 13.
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 2, 3, 10-15, 17, and 18 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.
Claims 2 and 17 recite the limitation, "a static memory cache that is local relative to the hardware processing unit". It is unclear what constitutes local relative to the hardware processing unit. The limitation as recited covers both a static memory cache that is on the same device, in which the hardware processing unit is a unit of the device, and a static memory cache located within the hardware processing unit of the device. Examiner interprets this claim as a static memory cache that is either within the hardware processing unit, or within the same device that the hardware processing unit is a unit of. Therefore, Claims 2 and 17 are indefinite.
Claims 3, 12 and 18 recite the limitation, “a dynamic memory device that is remote relative to the special-purpose hardware processing unit”. It is unclear what constitutes remote relative to the special-purpose hardware processing unit. The limitation as recited covers both a dynamic memory device that is not located within the special-purpose hardware processing unit, but on the same device that the special-purpose hardware processing unit is a unit of, and a dynamic memory device that is not located on the same device that the special-purpose hardware processing unit is a unit of. Examiner interprets this claim as a dynamic memory device that is either within the same device that the hardware processing unit is a unit of, but located off-chip from the hardware processing unit, or a dynamic memory device that is located in another device that the hardware processing unit is not a unit of. Therefore, Claims 3, 12 and 18 are indefinite.
Claim 10 recites the limitation "a cache that stores the parameters locally on the special-purpose hardware accelerator" in 10. There is insufficient antecedent basis for this limitation in the claim. The claim recites three different sets of parameters: the parameters first received by the special-purpose hardware accelerator, the decompressed parameters after they have been decompressed by the special-purpose hardware accelerator, and the decompressed parameters after they have been applied in an arithmetic operation. It is unclear whether the cache is supposed to store the original parameters received by the special-purpose hardware accelerator or the decompressed parameters. It is also unclear whether the cache is supposed to store the decompressed parameters before or after the decompressed parameters are applied in an arithmetic operation. Examiner interprets this claim as the cache storing the decompressed parameters after they have been applied in an arithmetic operation. Because claims 11-15 depend on claim 10, claims 11-15 are similarly rejected.
Claim 12 recites the limitation “receive the parameters from a memory device that is remote relative to the special-purpose hardware accelerator”. As with claim 3 (see the explanation above), it is unclear what constitutes a memory device that is remote relative to the special-purpose hardware accelerator. The limitation as recited covers both receiving parameters from a memory device that is not located within the special-purpose hardware processing unit, but on the same device that the special-purpose hardware processing unit is a unit of, and receiving parameters from a memory device that is not located on the same device that the special-purpose hardware processing unit is a unit of. Therefore, claim 12 is indefinite.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1 and 10 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Claim 1 recites the limitation “apply the decompressed parameters in an arithmetic operation”. This limitation is a mathematical process. “Applying”, in the context of this claim, encompasses a user using the parameters, which are numbers or values, in an algorithm performed either entirely in the mind or with the aid of pen and paper. If a claim limitation, under its broadest reasonable interpretation, covers mathematical concepts, relationships, formulas, equations, or calculations, then it falls within the “Mathematical Processes” grouping of abstract ideas. Accordingly, the claim recites an abstract idea.
This judicial exception is not integrated into a practical application. In particular, claim 1 recites the additional elements “receive the compressed parameters”, “decompress parameters”, and “updated in a layer of a neural network”. Receiving data is considered insignificant extra-solution activity. The decompressing of the parameters and the updating in a layer of a neural network are recited at a high level of generality such that they amount to no more than merely linking the judicial exception to a particular technological environment or field of use. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limit on practicing the abstract idea. The claim is directed to an abstract idea.
Finally, claim 1 does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of decompressing the parameters and updating the parameters in a layer of a neural network amount to no more than mere instructions to apply the exception using a generic computer component, which cannot provide an inventive concept. Further, the additional elements were considered to be insignificant extra-solution activity and generally linking the judicial exception to a particular technological environment or field of use in Step 2A Prong 2, and thus they are re-evaluated in Step 2B to determine whether they are more than insignificant extra-solution activity and generally linking the judicial exception to a particular technological environment or field of use. The court decisions cited in MPEP 2106.05(g) indicate that mere data gathering does not amount to an inventive concept. The court decisions cited in MPEP 2106.05(h) indicate that generally linking the use of the judicial exception to a particular technological environment or field of use does not amount to an inventive concept. Therefore, Claim 1 is not patent eligible.
Claim 10 recites the limitation “apply the decompressed parameters in an arithmetic operation”. This limitation is similarly recited in claim 1. The same reasoning given for claim 1 applies equally to claim 10.
This judicial exception is not integrated into a practical application. In particular, claim 10 recites the additional elements “receive parameters”, “decompress parameters”, and “updated in a layer of a neural network”. Receiving data is considered insignificant extra-solution activity. The decompressing of the parameters and the updating in a layer of a neural network are recited at a high level of generality such that they amount to no more than merely linking the judicial exception to a particular technological environment or field of use. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limit on practicing the abstract idea. The claim is directed to an abstract idea.
Finally, claim 10 does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of decompressing the parameters and updating the parameters in a layer of a neural network amount to no more than mere instructions to apply the exception using a generic computer component, which cannot provide an inventive concept. Further, the additional elements were considered to be insignificant extra-solution activity and generally linking the judicial exception to a particular technological environment or field of use in Step 2A Prong 2, and thus they are re-evaluated in Step 2B to determine whether they are more than insignificant extra-solution activity and generally linking the judicial exception to a particular technological environment or field of use. The court decisions cited in MPEP 2106.05(g) indicate that mere data gathering does not amount to an inventive concept. The court decisions cited in MPEP 2106.05(h) indicate that generally linking the use of the judicial exception to a particular technological environment or field of use does not amount to an inventive concept. Therefore, Claim 10 is not patent eligible.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claim(s) 1-4, 10-13, and 16-18 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Maaninen (US 20150199963 A1).
Regarding claim 1, Maaninen teaches a computing system comprising (Maaninen [Claim 1] “A mobile computing device, comprising”): 
a memory device that stores parameters of a layer of a neural network that have been compressed (Maaninen [0006; 0044; 0046; Figure 3; Figure 4] “The speech recognition software application provides configuration settings for the hardware accelerator as well as provides bitstreams of compressed or uncompressed weights and bias terms for the neural network calculations to the hardware accelerator…In arrangement 400, mobile device 310 may include a control software layer (e.g., control SW 20) to control operation of hardware accelerator 312 via a system bus 21. System bus 21 may be a standard bus interface such as the AMBA, AHB or AXI. System bus 21 may, for example, contain a DMA engine for accessing external memory, and memory-mapped registers that host CPU 311 may use to control and enable hardware accelerator 312…Application 320 may also supply matrix weight and bias terms 24 for the neural network computations to hardware accelerator 312 via control SW 20. Weight and bias terms 24 may be supplied in compressed form to reduce bus loading (e.g., of system bus 21).”, where application 320 on mobile device 310, which contains memory 313 (Fig. 3), can be configured to supply the compressed parameters, weights and bias terms 24, stored on mobile device 310 in memory 313, where memory 313 is the memory device for mobile device 310. Memory 313 is the memory device comprising both a local component (local to the device) and a remote component (remote to the hardware accelerator 312), on which mobile speech recognition application 320 is stored and from which it is run by mobile device 310, where weights and bias terms 24 are stored/cached, as shown in Fig. 4, from application 320 running on mobile device 310.); and
a special-purpose hardware processing unit programmed to, for the layer of the neural network: receive the compressed parameters from the memory device (Maaninen [0006; 0046; 0048; Figure 3; Figure 4] “The speech recognition software application provides configuration settings for the hardware accelerator as well as provides bitstreams of compressed or uncompressed weights and bias terms for the neural network calculations to the hardware accelerator… Application 320 may also supply matrix weight and bias terms 24 for the neural network computations to hardware accelerator 312 via control SW 20… With reference to FIGS. 2 and 3, in operation, hardware accelerator 312 may start calculations to classify an input frame (e.g., input matrix 26) through a layer of the neural network by caching activation function coefficients 25 in memory (e.g., in RAM 13)”, where weights and bias terms 24 are stored in compressed form in memory 313, the memory device, on mobile device 310. Hardware accelerator 312 receives the compressed parameters from weight and bias terms 24 via system bus 21 shown in Fig. 4.);
decompress the compressed parameters (Maaninen [0048; Figure 4] “and decoding or decompressing a weight matrix and bias terms for the layer received from external memory (e.g., weight and bias terms 24) into separate internal memory buffers (e.g., RAMs 14, 15 and 16).”, where the weights and bias terms 24 contain the compressed parameters, which pass through data transceiver/decompressor 19 to be decompressed before going into RAMs 14-16, as shown in Fig. 4.); and
 apply the decompressed parameters in an arithmetic operation of the layer of the neural network (Maaninen [0043; Claim 1] “In arrangement 400, hardware accelerator 312 may include a MAC unit 10 to carry out matrix multiply and add operations, an activation function unit 12 to apply an activation…function to the output of MAC unit 10, and a data transceiver/decompressor unit 19 configured to receive and decode compressed or uncompressed weights and bias terms data (e.g., weights and bias terms 24) streamed to it, For calculations that are performed weight matrix mw-by-mw as A*X+B. by MAC unit 10, at least one row of the weight matrix and one column of the bias matrix may be buffered at a time…a multiplier-accumulator (MAC) unit configured to multiply the received matrix data representing one or more frames of the audio signal with a weight matrix add a bias matrix to the multiplication results; and accumulate the addition results; circuitry configured to pass the accumulated results through an activation function to generate an output matrix representing an output of the first layer of the neural network for the frame”).
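
For illustration only, the receive, decompress, and apply sequence mapped above for claim 1 can be sketched as follows. This is a minimal sketch under stated assumptions and is not Maaninen's implementation: zlib stands in for the unspecified compression scheme, ReLU stands in for the unspecified activation function, and the names compress_params, decompress_params, and layer_forward are hypothetical.

```python
import struct
import zlib


def compress_params(values):
    """Pack floats as little-endian float32 and compress them.

    zlib is an assumed stand-in; the reference does not name a codec here.
    """
    return zlib.compress(struct.pack(f"<{len(values)}f", *values))


def decompress_params(blob, count):
    """Inverse of compress_params: decompress and unpack `count` float32 values."""
    return list(struct.unpack(f"<{count}f", zlib.decompress(blob)))


def layer_forward(weight_blob, bias_blob, x, rows, cols):
    """Decompress the weights and bias, then compute activation(A*x + b)."""
    a = decompress_params(weight_blob, rows * cols)  # row-major weight matrix A
    b = decompress_params(bias_blob, rows)           # bias vector b
    out = []
    for r in range(rows):
        acc = b[r]
        for c in range(cols):
            acc += a[r * cols + c] * x[c]            # multiply-accumulate step
        out.append(max(0.0, acc))                    # ReLU as a placeholder activation
    return out


# Usage: compress a 2x2 weight matrix and a bias vector, then run one layer.
weights = compress_params([1.0, 2.0, 3.0, 4.0])
bias = compress_params([0.5, -20.0])
result = layer_forward(weights, bias, [1.0, 1.0], rows=2, cols=2)
```

The sketch keeps the parameters compressed until the point of use, which mirrors the bus-loading rationale quoted from Maaninen [0046].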

Regarding claim 2, Maaninen teaches the computing system wherein: the memory device comprises a static memory cache that is local relative to the hardware processing unit (Maaninen [0006; 0044; 0046; Figure 4] “The speech recognition software application provides configuration settings for the hardware accelerator as well as provides bitstreams of compressed or uncompressed weights and bias terms for the neural network calculations to the hardware accelerator… System bus 21 may, for example, contain a DMA engine for accessing external memory, and memory-mapped registers that host CPU 311 may use to control and enable hardware accelerator 312… In arrangement 400, hardware accelerator 312 may include…a data transceiver/decompressor unit 19 configured to receive and decode compressed or uncompressed weights and bias terms data (e.g., weights and bias terms 24) streamed to it.”, where weights and bias terms 24 are located in memory 313 in Fig. 3 on mobile device 310 for use by the hardware accelerator 312, and where weights and bias terms 24 in Fig. 4 are streamed to data transceiver/decompressor unit 19, which is located on-chip, within the hardware accelerator 312. Data transceiver/decompressor unit 19 is a static memory cache because it is only allocated to receive the parameters, weights and bias terms 24. Data transceiver/decompressor unit 19 is local relative to the hardware processing unit, hardware accelerator 312, because it is located within hardware accelerator 312, i.e., on-chip or local relative to the hardware processing unit, as seen in Fig. 4. The term relative is not clearly defined in the claim or in the specification, so the broadest reasonable interpretation stated in the rejection under 35 U.S.C. 112(b) for claim 2 was applied here.)
; and the memory device retains the compressed parameters in the static memory cache while the layer of the neural network is being processed (Maaninen [Figure 3; Figure 4], where the weights and bias terms 24 contain the compressed parameters stored in memory 313 on mobile device 310, shown in Fig. 3, and are held in the static memory cache, weights and bias terms 24, shown in Fig. 4. The data in the static memory cache, weights and bias terms 24, is retained while the neural network is being processed by the hardware accelerator 312.)

Regarding claim 3, Maaninen teaches the computing system wherein the memory device comprises a dynamic memory device that is remote relative to the special-purpose hardware processing unit (Maaninen [0048; Figure 3 and Figure 4] “With reference to FIGS. 2 and 3, in operation, hardware accelerator 312 may start calculations to classify an input frame (e.g., input matrix 26) through a layer of the neural network by caching activation function coefficients 25 in memory (e.g., in RAM 13), and decoding or decompressing a weight matrix and bias terms for the layer received from external memory (e.g., weight and bias terms 24)”, where memory 313 is the memory device of mobile device 310 (Fig. 3). Memory 313 is a dynamic memory because all types of data for mobile device 310 are stored on it, including the weights and bias terms 24, and it is not specifically partitioned for any one purpose. Hardware accelerator 312 receives the parameters through system bus 21 from weights and bias terms 24, which are stored on memory 313 external to hardware accelerator 312. System bus 21 contains a DMA engine for accessing memory external or remote to the hardware accelerator 312, the special-purpose hardware processing unit, as seen in Fig. 4. This memory device is remote to the special-purpose hardware processing unit because it is located off-chip from the hardware accelerator 312. The term relative is not clearly defined in the claim or in the specification, so the broadest reasonable interpretation stated in the rejection under 35 U.S.C. 112(b) for claim 3 was applied here.)

Regarding claim 4, Maaninen teaches the computing system further comprising a compression subsystem that is communicatively coupled to the memory device and configured to compress parameters (Maaninen [0045; 0056; Figure 5] “In operation, application 320 may supply configuration settings (e.g., configuration settings 23) to control SW 20 to prepare hardware accelerator 312. In arrangement 500, application 320 may be configured to compress weight matrix coefficients, bias vector values and approximation function coefficients into a single bitstream 32 sent to hardware accelerator 312.”) and store the compressed parameters in the memory device (Maaninen [0046; Figure 4] “Application 320 may also supply matrix weight and bias terms 24 for the neural network computations to hardware accelerator 312 via control SW 20. Weight and bias terms 24 may be supplied in compressed form to reduce bus loading (e.g., of system bus 21). The compressed weight and bias terms may be decompressed by decompressor 19”, where weights and bias terms 24 is a memory device shown in Fig. 4. When the parameters of a neural network are supplied to the system in compressed form, they are stored in weights and bias terms 24 before they are optionally decompressed by decompressor 19.)

Regarding claim 10, Maaninen teaches a special-purpose hardware accelerator comprising a processing unit configured to, for a layer of a neural network (Maaninen [Claim 1] “A mobile computing device, comprising: a processor configured to…a hardware accelerator comprising”): receive parameters for the layer of the neural network from a memory device (Maaninen [0006; 0046; 0048; Figure 3; Figure 4] “The speech recognition software application provides configuration settings for the hardware accelerator as well as provides bitstreams of compressed or uncompressed weights and bias terms for the neural network calculations to the hardware accelerator… Application 320 may also supply matrix weight and bias terms 24 for the neural network computations to hardware accelerator 312 via control SW 20… With reference to FIGS. 2 and 3, in operation, hardware accelerator 312 may start calculations to classify an input frame (e.g., input matrix 26) through a layer of the neural network by caching activation function coefficients 25 in memory (e.g., in RAM 13)”, where weights and bias terms 24 are stored in compressed form in memory 313, the memory device, on mobile device 310. Hardware accelerator 312 receives the compressed parameters from weight and bias terms 24 via system bus 21 shown in Fig. 4.)
; decompress the parameters (Maaninen [0048] “and decoding or decompressing a weight matrix and bias terms for the layer received from external memory (e.g., weight and bias terms 24) into separate internal memory buffers (e.g., RAMs 14, 15 and 16).”, the weight matrix containing the compressed parameters to be decompressed.); and apply the decompressed parameters in an arithmetic operation (Maaninen [0043; 0048; Claim 1] “In arrangement 400, hardware accelerator 312 may include a MAC unit 10 to carry out matrix multiply and add operations, an activation function unit 12 to apply an activation…function to the output of MAC unit 10, and a data transceiver/decompressor unit 19 configured to receive and decode compressed or uncompressed weights and bias terms data (e.g., weights and bias terms 24) streamed to it, For calculations that are performed weight matrix mw-by-mw as A*X+B. by MAC unit 10, at least one row of the weight matrix and one column of the bias matrix may be buffered at a time…a multiplier-accumulator (MAC) unit configured to multiply the received matrix data representing one or more frames of the audio signal with a weight matrix add a bias matrix to the multiplication results; and accumulate the addition results; circuitry configured to pass the accumulated results through an activation function to generate an output matrix representing an output of the first layer of the neural network for the frame”); and a cache that stores the parameters locally on the special-purpose hardware accelerator (Maaninen [0043; Figure 4] “Hardware accelerator 312 may also include various buffers or registers (e.g., RAMs 13-18) to store, for example, weights, bias terms, activation function co-efficients, input data, and intermediate output data, etc. 
The weights may be double buffered, for example, to allow parallel data decompression and MAC operations by decompressor unit 19 and MAC unit 10.”, where the decompressed parameters are stored locally in RAMs 14-16 located within the hardware accelerator 312 shown in Fig. 4, and are stored in RAMs 17-18 after the arithmetic operation is applied to the decompressed parameters.).

Regarding claim 11, Maaninen teaches the special-purpose hardware accelerator wherein the cache stores the parameters by retaining the parameters in the cache while the layer of the neural network is being processed (Maaninen [0056; Claim 1; Figure 5] “FIG. 5 shows an example arrangement 500 of various components and sub-components of system 300 that utilizes a FIFO scheme for transmitting weight coefficients and bias vectors inside hardware accelerator 312. In arrangement 500, a FIFO register 30 is disposed in hardware accelerator 312 between decompressor 19 and MAC 10. In arrangement 500, application 320 may be configured to compress weight matrix coefficients, bias vector values and approximation function coefficients into a single bitstream 32 sent to hardware accelerator 312… Data transceiver/decompressor 19 may be further configured to push the decoded weight matrix coefficients and bias vector terms for a first layer (e.g., layer 1) into FIFO register 30 from where they can be read or fetched by MAC 10 processes…a multiplier-accumulator (MAC) unit configured to: multiply the received matrix data representing one or more frames of the audio signal with a weight matrix add a bias matrix to the multiplication results; and accumulate the addition results; circuitry configured to pass the accumulated results through an activation function to generate an output matrix representing an output of the first layer of the neural network for the frame”, where the FIFO register is a static memory cache).
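
For illustration only, the FIFO arrangement cited for claim 11, in which a decoder pushes decoded weight rows and bias terms into a FIFO and a MAC stage fetches them in order, can be sketched as follows. This is a minimal software sketch under stated assumptions, not Maaninen's hardware: the FIFO depth of 2, the row-at-a-time granularity, and the name run_layer_with_fifo are hypothetical.

```python
from collections import deque


def run_layer_with_fifo(weight_rows, biases, x, depth=2):
    """Compute A*x + b while passing each (row, bias) pair through a bounded FIFO.

    `depth` is an assumed FIFO capacity; the producer refills the FIFO only
    when there is room, and the consumer always takes the oldest entry.
    """
    fifo = deque()
    pending = iter(zip(weight_rows, biases))  # stands in for the decoder output
    outputs = []
    exhausted = False
    while not exhausted or fifo:
        # Producer side: push decoded entries until the FIFO is full.
        while not exhausted and len(fifo) < depth:
            try:
                fifo.append(next(pending))
            except StopIteration:
                exhausted = True
        # Consumer side: the MAC stage fetches the oldest entry (first in, first out).
        if fifo:
            row, bias = fifo.popleft()
            outputs.append(sum(w * xi for w, xi in zip(row, x)) + bias)
    return outputs


# Usage: three weight rows flow through a FIFO that holds at most two entries.
y = run_layer_with_fifo([[1, 2], [3, 4], [5, 6]], [0, 1, 2], [1, 1])
```

In this sketch each entry leaves the FIFO once consumed, so the parameters are held only until the MAC stage reads them.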

Regarding claim 12, Maaninen teaches the special-purpose hardware accelerator wherein the processing unit is configured to receive the parameters from a memory device that is remote relative to the special-purpose hardware accelerator (Maaninen [0048; Figure 3 and Figure 4] “With reference to FIGS. 2 and 3, in operation, hardware accelerator 312 may start calculations to classify an input frame (e.g., input matrix 26) through a layer of the neural network by caching activation function coefficients 25 in memory (e.g., in RAM 13), and decoding or decompressing a weight matrix and bias terms for the layer received from external memory (e.g., weight and bias terms 24)”, where memory 313 is the memory device of mobile device 310 (Fig. 3). Memory 313 is a dynamic memory because all types of data for mobile device 310 are stored on it, including the weights and bias terms 24, and it is not specifically partitioned for any one purpose. Hardware accelerator 312 receives the parameters through system bus 21 from weights and bias terms 24, which are stored on memory 313 external to hardware accelerator 312. System bus 21 contains a DMA engine for accessing memory external or remote to the hardware accelerator 312, the special-purpose hardware processing unit, as seen in Fig. 4. This memory device is remote to the special-purpose hardware processing unit because it is located off-chip from the hardware accelerator 312. The term relative is not clearly defined in the claim or in the specification, so the broadest reasonable interpretation stated in the rejection under 35 U.S.C. 112(b) for claim 12 was applied here.)

Regarding claim 13, Maaninen teaches the special-purpose hardware accelerator further comprising a compression subsystem that is configured to compress the parameters before the parameters are stored in the cache (Maaninen [0044; 0045; 0056; Claim 1; Figure 5] “In arrangement 400, mobile device 310 may include a control software layer (e.g., control SW 20) to control operation of hardware accelerator 312 via a system bus 21. System bus 21 may be a standard bus interface such as the AMBA, AHB or AXI. System bus 21 may, for example, contain a DMA engine for accessing external memory, and memory mapped registers that host CPU 311 may use to control and enable hardware accelerator 312…In operation, application 320 may supply configuration settings (e.g., configuration settings 23) to control SW 20 to prepare hardware accelerator 312. In arrangement 500, application 320 may be configured to compress weight matrix coefficients, bias vector values and approximation function coefficients into a single bitstream 32 sent to hardware accelerator 312…Data transceiver/decompressor 19 may be further configured to push the decoded weight matrix coefficients and bias vector terms for a first layer (e.g., layer 1) into FIFO register 30 from where they can be read or fetched by MAC 10 processes…a multiplier-accumulator (MAC) unit configured to: multiply the received matrix data representing one or more frames of the audio signal with a weight matrix add a bias matrix to the multiplication results; and accumulate the addition results; circuitry configured to pass the accumulated results through an activation function to generate an output matrix representing an output of the first layer of the neural network for the frame”, where the FIFO register is a cache).

Regarding claim 16, Maaninen teaches a method comprising (Maaninen [Abstract] “A method for executing a mobile speech recognition software application based on a multi-layer neural network model.”):
compressing parameters of a layer of a neural network (Maaninen [0044; 0045] “In arrangement 400, mobile device 310 may include a control software layer (e.g., control SW 20) to control operation of hardware accelerator 312 via a system bus 21. System bus 21 may be a standard bus interface such as the AMBA, AHB or AXI. System bus 21 may, for example, contain a DMA engine for accessing external memory, and memory mapped registers that host CPU 311 may use to control and enable hardware accelerator 312…In operation, application 320 may supply configuration settings (e.g., configuration settings 23) to control SW 20 to prepare hardware accelerator 312. In arrangement 500, application 320 may be configured to compress weight matrix coefficients, bias vector values and approximation function coefficients into a single bitstream 32 sent to hardware accelerator 312.”);
storing the compressed parameters in a memory device (Maaninen [0046; Figure 4] “Application 320 may also supply matrix weight and bias terms 24 for the neural network computations to hardware accelerator 312 via control SW 20. Weight and bias terms 24 may be supplied in compressed form to reduce bus loading (e.g., of system bus 21). The compressed weight and bias terms may be decompressed by decompressor 19”, where weight and bias terms 24 is a memory device shown in Fig. 4. When the parameters of a neural network are supplied to the system in compressed form, they are stored in weight and bias terms 24 before they are optionally decompressed by decompressor 19.);
receiving, at a special-purpose hardware accelerator, the compressed parameters from the memory device (Maaninen [0006; 0046; 0048; Figure 3; Figure 4] “The speech recognition software application provides configuration settings for the hardware accelerator as well as provides bitstreams of compressed or uncompressed weights and bias terms for the neural network calculations to the hardware accelerator… Application 320 may also supply matrix weight and bias terms 24 for the neural network computations to hardware accelerator 312 via control SW 20… With reference to FIGS. 2 and 3, in operation, hardware accelerator 312 may start calculations to classify an input frame (e.g., input matrix 26) through a layer of the neural network by caching activation function coefficients 25 in memory (e.g., in RAM 13)”, where weights and bias terms 24 are stored in compressed form in memory 313, the memory device, on mobile device 310. Hardware accelerator 312 receives the compressed parameters from weight and bias terms 24 via system bus 21 shown in Fig. 4.);
decompressing, at the special-purpose hardware accelerator, the compressed parameters (Maaninen [0048] “and decoding or decompressing a weight matrix and bias terms for the layer received from external memory (e.g., weight and bias terms 24) into separate internal memory buffers (e.g., RAMs 14, 15 and 16).”, the weight matrix containing the compressed parameters to be decompressed by the hardware accelerator); 
and applying, at the special-purpose hardware accelerator, the decompressed parameters in an arithmetic operation (Maaninen [0043 and Claim 1] “In arrangement 400, hardware accelerator 312 may include a MAC unit 10 to carry out matrix multiply and add operations, an activation function unit 12 to apply an activation…function to the output of MAC unit 10, and a data transceiver/decompressor unit 19 configured to receive and decode compressed or uncompressed weights and bias terms data (e.g., weights and bias terms 24) streamed to it, For calculations that are performed weight matrix mw-by-mw as A*X+B. by MAC unit 10, at least one row of the weight matrix and one column of the bias matrix may be buffered at a time…a multiplier-accumulator (MAC) unit configured to multiply the received matrix data representing one or more frames of the audio signal with a weight matrix add a bias matrix to the multiplication results; and accumulate the addition results; circuitry configured to pass the accumulated results through an activation function to generate an output matrix representing an output of the first layer of the neural network for the frame”).
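For illustration only, the sequence of steps mapped above for claim 16 (compressing, storing, receiving, decompressing, and applying the parameters in an arithmetic operation of the form A*X+B) can be sketched as follows. This sketch is not taken from Maaninen or from the application: zlib stands in for an unspecified compression scheme, a dictionary stands in for the memory device, and every function and variable name is hypothetical.

```python
import struct
import zlib


def compress_params(values):
    """Pack floats as little-endian float32 and compress them.

    zlib is an assumed stand-in; no particular scheme is named in the reference.
    """
    raw = struct.pack(f"<{len(values)}f", *values)
    return zlib.compress(raw)


def decompress_params(blob):
    """Inverse of compress_params: decompress, then unpack float32 values."""
    raw = zlib.decompress(blob)
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))


def mac_layer(weights, bias, x, rows, cols):
    """Multiply-accumulate y = A*x + b, with A stored row-major in `weights`."""
    out = []
    for r in range(rows):
        acc = bias[r]
        for c in range(cols):
            acc += weights[r * cols + c] * x[c]
        out.append(acc)
    return out


# Hypothetical "memory device": holds only the compressed form of the layer.
memory_device = {}
weights = [1.0, 2.0, 3.0, 4.0]   # 2x2 weight matrix, row-major
bias = [0.5, -0.5]
memory_device["layer1"] = (compress_params(weights), compress_params(bias))

# "Accelerator" side: receive the compressed parameters, decompress, apply.
w_blob, b_blob = memory_device["layer1"]
w = decompress_params(w_blob)
b = decompress_params(b_blob)
y = mac_layer(w, b, [1.0, 1.0], rows=2, cols=2)
```

The example values are exactly representable in float32, so the round trip through compression returns them unchanged.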

Regarding claim 17, Maaninen teaches the method wherein: the memory device comprises a static memory cache that is local relative to the special-purpose hardware accelerator (Maaninen [0006; 0044; 0046; Figure 4] “The speech recognition software application provides configuration settings for the hardware accelerator as well as provides bitstreams of compressed or uncompressed weights and bias terms for the neural network calculations to the hardware accelerator… System bus 21 may, for example, contain a DMA engine for accessing external memory, and memory-mapped registers that host CPU 311 may use to control and enable hardware accelerator 312…Application 320 may also supply matrix weight and bias terms 24 for the neural network computations to hardware accelerator 312 via control SW 20. Weight and bias terms 24 may be supplied in compressed form to reduce bus loading (e.g., of system bus 21).”, where weights and bias terms 24 is located in memory 313 in Fig. 3 on mobile device 310 for use by the hardware accelerator 312, and where weights and bias terms 24 in Fig. 4 is a static memory cache of memory 313 in Fig. 3 that is local to, or located within, the mobile device 310 shown in Fig. 3.); and the static memory cache retains the compressed parameters in the static memory cache while the layer of the neural network is being processed (Maaninen [Figure 3; Figure 4], where weights and bias terms 24, shown in Fig. 4 as the static memory cache, contains the compressed parameters stored in memory 313 on mobile device 310, shown in Fig. 3. While the neural network is being processed by the hardware accelerator 312, the data is retained in the static memory cache, weights and bias terms 24.).

Regarding claim 18, Maaninen teaches the method wherein the memory device comprises a dynamic memory device that is remote relative to the special-purpose hardware accelerator (Maaninen [0041; 0044; Figure 3 and Figure 4] “In system 300, a neural-network based mobile speech recognition application 320 may be hosted on a mobile device 310 (e.g., a cell phone or smart phone) that includes a CPU 311 coupled to a hardware accelerator 312, a memory 313 and an I/O unit 314. Hardware accelerator 312 may be configured to perform the processor-intensive calculations (e.g., algorithm 200) that are required for neural network computations. Hardware accelerator 312 may be implemented, for example, in silicon as an ASIC IP core or as a FPGA…In arrangement 400, mobile device 310 may include a control software layer (e.g., control SW 20) to control operation of hardware accelerator 312 via a system bus 21. System bus 21 may be a standard bus interface such as the AMBA, AHB or AXI. System bus 21 may, for example, contain a DMA engine for accessing external memory, and memory-mapped registers that host CPU 311 may use to control and enable hardware accelerator 312.”, where memory 313 is the memory device of mobile device 310. System bus 21 is a DMA engine for accessing memory external or remote to the hardware accelerator 312, the special-purpose hardware processing unit, in Fig. 4. Mobile devices also require RAM to operate, for example to run application 320, and RAM is dynamic.).

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 5 and 6 are rejected under 35 U.S.C. 103 as being unpatentable over Maaninen (US 20150199963 A1), in view of Huang (US 20170293659 A1).
Regarding claim 5, Maaninen teaches all elements of the claim except wherein the special-purpose hardware processing unit comprises the compression subsystem. Maaninen does teach compressing the parameters but does not explicitly teach the special-purpose hardware processing unit comprising a compression subsystem. 
Huang et al. teaches wherein the special-purpose hardware processing unit comprises the compression subsystem (Huang [0021; Figure 10] “FIG. 10 illustrates an exemplary processor structure, in accordance with an embodiment of the present invention… FIG. 10 illustrates an exemplary processor structure, in accordance with an embodiment of the present invention. The present embodiment includes a NPU 1021, a compression and packing unit 1023, and a L1/L2 buffer or cache 1027. The present embodiment illustrates that NPU 1021 may compress feature maps through training and prediction process and weights data through training process. The data may store in L1/L2 Cache or Buffer 1027. The features map and weight data may store in the same buffer or different buffer.”, NPU 1021 is a special-purpose hardware processing unit, and within NPU 1021 is compression and packing unit 1023, the compression subsystem.). 
Before the effective filing date of the invention, it would have been obvious to a person of ordinary skill in the art, having the teachings of Maaninen and Huang, to use the compression and packing unit of Huang in the hardware accelerator of Maaninen to compress parameters of a neural network. The suggestion and/or motivation for doing so is to reduce the size of model parameters for storage, especially in large scale neural networks with many parameters. 

Regarding claim 6, Maaninen teaches the computing system wherein the compression subsystem is configured to compress the parameters (Maaninen [0044; 0045; 0056; Figure 5] “In arrangement 400, mobile device 310 may include a control software layer (e.g., control SW 20) to control operation of hardware accelerator 312 via a system bus 21. System bus 21 may be a standard bus interface such as the AMBA, AHB or AXI. System bus 21 may, for example, contain a DMA engine for accessing external memory, and memory mapped registers that host CPU 311 may use to control and enable hardware accelerator 312…In operation, application 320 may supply configuration settings (e.g., configuration settings 23) to control SW 20 to prepare hardware accelerator 312…In arrangement 500, application 320 may be configured to compress weight matrix coefficients, bias vector values and approximation function coefficients into a single bitstream 32 sent to hardware accelerator 312.”) 
Maaninen does not teach wherein the compression subsystem is configured to compress the parameters by distinguishing between sparse and non-sparse data in the parameters; and apply a compression algorithm to the parameters based on the distinguishing between the sparse and the non-sparse data in the parameters.
Huang teaches wherein the compression subsystem is configured to compress the parameters by (Huang [Abstract; 0006; 0008] “A method, system and program product includes examining elements of a first matrix in a sequential fashion… One or more embodiments of the invention generally relate to data compression. More particularly, certain embodiments of the invention relates to mask based compression scheme… A typical neural network may use input data and weight to classify an object. The weight and feature map typically may be large. In numerical analysis, a sparse matrix is a matrix in which most of the elements are zero.”, where the matrix contains the parameters to be compressed by the system.): distinguishing between sparse and non-sparse data in the parameters (Huang [0008; Claim 1] “The weight and feature map typically may be large. In numerical analysis, a sparse matrix is a matrix in which most of the elements are zero. By contrast, if most of the elements are nonzero, then the matrix is considered dense. The number of zero-valued elements divided by the total number of elements is called the sparsity of the matrix…A method comprising the steps of: examining elements of a first matrix in a sequential fashion; determining values of the examined elements; setting a corresponding bit of a first mask to a first value if a determined value is zero; setting a corresponding bit of a first mask to a second value if a determined value is non-zero; and packing the non-zero values in a first vector, wherein bits of at least the first mask determine operations on packed values.”, where the non-zero values, the non-sparse data, are distinguished from the sparse data, values that are zero, by being placed in a first vector.) and apply a compression algorithm to the parameters based on the distinguishing between the sparse and the non-sparse data in the parameters (Huang [0008; 0140; Fig. 19] “Sparse data is by nature more easily compressed and thus require significantly less storage… FIG. 19 illustrates an exemplary method for compression, in accordance with an embodiment of the present invention. In the present embodiment, a process 1900 starts where elements of matrix 405 may be examined in a step 1902. In a step 1906 it may be determined if all the elements have been examined. If they have not all been examined, then in a step 1910 it may be determined if the examined element is non-zero. If the value is non-zero, then in a step 1914 the value of the element may be stored in vector 410 and in a step 1918, a corresponding bit of first mask 425 may be assigned a value of 1. The process may then return to step 1902. If the value of the element is zero, the corresponding bit of first mask 425 may be assigned a value of 0. In other embodiments, the values of the mask bits may be reversed where, without limitation, a 1 corresponds to a zero element and a 0 corresponds to a non-zero element.”). The same motivation utilized to combine Maaninen and Huang in claim 5 equally applies to claim 6.
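For illustration only, the mask-based scheme described in the quoted passage of Huang (examine each element in sequence, store non-zero values in a packed vector, and set the corresponding mask bit to 1 for a non-zero element or 0 for a zero element) can be sketched as follows. The function names are hypothetical, and the decompression routine is an assumption added to show that the mask and the packed vector together recover the original matrix; it is not quoted from Huang.

```python
def mask_compress(matrix):
    """Return (mask, packed): one mask bit per element, non-zero values packed in order."""
    mask = []
    packed = []
    for row in matrix:
        for value in row:
            if value != 0:
                packed.append(value)  # non-zero element goes into the packed vector
                mask.append(1)        # mask bit 1 marks a non-zero element
            else:
                mask.append(0)        # mask bit 0 marks a zero element
    return mask, packed


def mask_decompress(mask, packed, cols):
    """Assumed inverse: rebuild the matrix from the mask and the packed values."""
    values = iter(packed)
    flat = [next(values) if bit else 0 for bit in mask]
    return [flat[i:i + cols] for i in range(0, len(flat), cols)]


def sparsity(matrix):
    """Zero-valued elements divided by the total number of elements."""
    flat = [v for row in matrix for v in row]
    return sum(1 for v in flat if v == 0) / len(flat)


example = [[0, 5, 0],
           [7, 0, 0]]
mask, packed = mask_compress(example)
restored = mask_decompress(mask, packed, cols=3)
```

For the example matrix, four of the six elements are zero, so only two values are stored alongside a six-bit mask.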

Claims 7, 15, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Maaninen (US 20150199963 A1), in view of Zhang et al. (“Application of Artificial Neural Network in Video Compression Coding”).
Regarding claim 7, Maaninen teaches all elements of the claim except wherein compressing the parameters comprises compressing the parameters via a lossy compression algorithm.
Zhang et al. teaches wherein compressing the parameters comprises compressing the parameters via a lossy compression algorithm (Zhang et al. [Section 2, Page 208, Col. 1, lines 7-13] “In this paper, we mainly discuss this problem: making up degradation in image quality to lossy image compression in the receiver by using rate-distortion model. The method of making up is to estimate quantization parameter estimation model leading to the decrease of quality in the compression process by the RBF artificial neural networks”, where the image contains the parameters to be compressed by a lossy compression algorithm for the neural network.)
Before the effective filing date of the invention, it would have been obvious to a person of ordinary skill in the art, having the teachings of Maaninen and Zhang, to implement a lossy compression algorithm to compress the parameters of the neural network. The suggestion and/or motivation for doing so is that the compression ratio for lossy compression is high, meaning the parameters can be compressed to a size that meets the low data requirements of certain memory types, as suggested by Zhang.
	Regarding claim 15, Maaninen teaches all the elements of the claim except wherein the parameters are compressed via a lossy compression scheme. 
Zhang et al. teaches wherein compressing the parameters comprises compressing the parameters via a lossy compression scheme (Zhang et al. [Section 2, Page 208, Col. 1, lines 7-13] “In this paper, we mainly discuss this problem: making up degradation in image quality to lossy image compression in the receiver by using rate-distortion model. The method of making up is to estimate quantization parameter estimation model leading to the decrease of quality in the compression process by the RBF artificial neural networks”, where the image contains the parameters to be compressed by a lossy compression scheme for the neural network.). The same motivation utilized to combine Maaninen and Zhang in claim 7, equally applies to claim 15. 
Regarding claim 19, Maaninen teaches all the elements of the claim except wherein the parameters are compressed via a lossy compression scheme. 
Zhang et al. teaches wherein compressing the parameters comprises compressing the parameters via a lossy compression algorithm (Zhang et al. [Section 2, Page 208, Col. 1, lines 7-13] “In this paper, we mainly discuss this problem: making up degradation in image quality to lossy image compression in the receiver by using rate-distortion model. The method of making up is to estimate quantization parameter estimation model leading to the decrease of quality in the compression process by the RBF artificial neural networks”, where the image contains the parameters to be compressed by a lossy compression algorithm for the neural network.). The same motivation utilized to combine Maaninen and Zhang in claim 7, equally applies to claim 19. 

Claims 8 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Maaninen (US 20150199963 A1) in view of Chen et al. (“BenchNN: On the Broad Potential Application Scope of Hardware Neural Network Accelerators”).
Regarding claim 8, Maaninen teaches all the elements of the claim except wherein the special-purpose hardware processing unit is further programmed to: update the parameters of the layer; compress the updated parameters and store the compressed, updated parameters in the memory device. 
Chen et al. teaches wherein the hardware processing unit is further programmed to (Chen et al. [Abstract] “In this paper, we want to highlight that a hardware neural network accelerator is indeed compatible with many of the emerging high-performance workloads, currently accepted as benchmarks for high-performance micro-architectures. For that purpose, we develop and evaluate software neural network implementations of 5 (out of 12) RMS applications from the PARSEC Benchmark Suite.”): update the parameters of the layer (Chen et al. [Section 2, Subsection A, Paragraph 3: NN Algorithm] “The most traditional form of artificial neural network is a Multi-Layer Perceptron, which typically contains one input layer, one output layer, and one or several hidden layers. The optimal number of layers and the number of neurons per layer are typically explored during a training phase… MLPs are feed-forward networks, where information flows from the input layer (l = 0) to the output layer (l = 2). Each neuron performs the following computations…We train the network using back-propagation, the most popular training algorithm.”, where training the neural network is updating the parameters contained in the layers of the neural network.); compress the updated parameters (Chen et al. [Section 2, Subsection E, Paragraph 1: Problem Description; Section 2, Subsection E, Paragraph 2: From PARSEC Code to NN] “To compress a file, dedup processes it through different pipelined stages. In the first stage, the program breaks the input file into coarse-grained chunks that can be processed in parallel. In the second stage, each of the coarse-grained chunks is divided into fragments. Third, each of the unique smaller fragments is assigned a unique hash value. The fourth stage builds a global database of fragments indexed via the hash value. If a fragment has not been encountered before, it is compressed using Ziv-Lempel algorithm and is added to the database. The final stage generates the output file that consists of compressed fragments and hash values such that each of the compressed fragments occurs exactly once in the output file… To compress a file by chunks of 16 bytes, we use an unsupervised neural network with 16 inputs and 256 outputs The 16 inputs correspond to 16 bytes of the file, while the ID of the 256 output neurons serves as signatures to be written to the output file… If the network does not recognize the current 16-byte pattern in the queue, it is trained on the current pattern. The oldest byte in the queue is popped out of the queue is placed in the compression buffer… It should be noted that in the compressed buffer (the input to the compression stage)… Once the entire input file is processed, the contents in the compression buffer are compressed using a neural network based algorithm.”, where after the neural network is trained and the parameters are updated, the parameters of the neural network are then compressed.) and store the compressed, updated parameters in the memory device (Chen et al. [Section 2, Subsection C, Paragraph 2: From PARSEC Code to NN] “These vectors are then “compressed” into a compact bit vector (the sketch), which is then either stored in the database (construction) or compared against database elements (query), depending on the phase.”, where database implies a memory device.).

Maaninen and Chen et al. are analogous art because they are from the same field of hardware accelerators for compressing or decompressing a neural network. Before the effective filing date of the invention, it would have been obvious to a person of ordinary skill in the art, having the teachings of Maaninen and Chen et al., to compress the updated parameters and store the compressed, updated parameters in a memory device of the hardware accelerator, and to incorporate this into the hardware accelerator of Maaninen. The suggestion and/or motivation for doing so is to reduce the size of parameters for storage within the hardware accelerator unit, especially for larger neural networks that contain a large number of parameters to be processed and stored.

Regarding claim 20, Maaninen teaches all elements of the claim except the method further comprising updating the parameters of the layer; compressing the updated parameters; and storing the compressed updated parameters in the memory device.
Chen et al. teaches the method further comprising updating the parameters of the layer (Chen et al. [Section 2, Subsection A, Paragraph 3: NN Algorithm] “The most traditional form of artificial neural network is a Multi-Layer Perceptron, which typically contains one input layer, one output layer, and one or several hidden layers. The optimal number of layers and the number of neurons per layer are typically explored during a training phase… MLPs are feed-forward networks, where information flows from the input layer (l = 0) to the output layer (l = 2). Each neuron performs the following computations…We train the network using back-propagation, the most popular training algorithm.”, where training the neural network is updating the parameters contained in the layers of the neural network.); compressing the updated parameters (Chen et al. [Section 2, Subsection E, Paragraph 1: Problem Description; Section 2, Subsection E, Paragraph 2: From PARSEC Code to NN] “To compress a file, dedup processes it through different pipelined stages. In the first stage, the program breaks the input file into coarse-grained chunks that can be processed in parallel. In the second stage, each of the coarse-grained chunks is divided into fragments. Third, each of the unique smaller fragments is assigned a unique hash value. The fourth stage builds a global database of fragments indexed via the hash value. If a fragment has not been encountered before, it is compressed using Ziv-Lempel algorithm and is added to the database. The final stage generates the output file that consists of compressed fragments and hash values such that each of the compressed fragments occurs exactly once in the output file… To compress a file by chunks of 16 bytes, we use an unsupervised neural network with 16 inputs and 256 outputs The 16 inputs correspond to 16 bytes of the file, while the ID of the 256 output neurons serves as signatures to be written to the output file… If the network does not recognize the current 16-byte pattern in the queue, it is trained on the current pattern. The oldest byte in the queue is popped out of the queue is placed in the compression buffer… It should be noted that in the compressed buffer (the input to the compression stage)… Once the entire input file is processed, the contents in the compression buffer are compressed using a neural network based algorithm.”, where after the neural network is trained and the parameters are updated, the parameters of the neural network are then compressed.) and storing the compressed updated parameters in the memory device (Chen et al. [Section 2, Subsection C, Paragraph 2: From PARSEC Code to NN] “These vectors are then “compressed” into a compact bit vector (the sketch), which is then either stored in the database (construction) or compared against database elements (query), depending on the phase.”, where database implies a memory device.). The same motivation utilized to combine Maaninen and Chen et al. in claim 8 is equally applicable to claim 20.

Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Maaninen (US 20150199963 A1) in view of Chen et al. (“BenchNN: On the Broad Potential Application Scope of Hardware Neural Network Accelerators”), and further in view of Huang (US 20170293659 A1).
Regarding claim 9, Maaninen in view of Chen et al. teaches all elements of the claim except wherein the special-purpose hardware processing unit updates the parameters based on a compression scheme that will be used to update the compress parameters. 
Huang teaches the special-purpose hardware processing unit updates the parameters based on a compression scheme that will be used to update the compress parameters (Huang [Abstract; 0006; 0008; 0140; Fig. 9] “A method, system and program product includes examining elements of a first matrix in a sequential fashion… One or more embodiments of the invention generally relate to data compression. More particularly, certain embodiments of the invention relates to mask based compression scheme… A typical neural network may use input data and weight to classify an object. The weight and feature map typically may be large. In numerical analysis, a sparse matrix is a matrix in which most of the elements are zero… FIG. 19 illustrates an exemplary method for compression, in accordance with an embodiment of the present invention.”, where the matrix contains the parameters that are updated once the parameters are compressed by a mask based compression scheme.). 
Maaninen, Chen et al., and Huang are analogous art because they are from the same field of data compression. Before the effective filing date of the invention, it would have been obvious to a person of ordinary skill in the art, having the teachings of Maaninen, Chen et al., and Huang, to update the compressed parameters based on a compression scheme. The suggestion and/or motivation for doing so is to enable the special-purpose hardware processing unit to compress the parameters.

Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over Maaninen (US 20150199963 A1), in view of Yang et al. (US 20180129939 A1).
Regarding claim 14, Maaninen teaches all elements of the claim except wherein compression of the parameters for storage in the cache is less complex than compression of the parameters for storage in a remote memory device.
Yang teaches compression of the parameters for storage in the cache is less complex than compression of the parameters for storage in a remote memory device (Yang [0055] “The memory copy operation “Memcopy” may be performed by a resource “SDMA” (e.g., a smart DMA in a local device) based on an arithmetic algorithm of “perf_opt” and an implementation of “lossy”, by a resource “MDMA” (e.g., a memory DMA in a local device) based on an arithmetic algorithm of “pwr_opt” and an implementation of “lossless”, or by a resource “RDMA” (e.g., a DMA in a remote device) based on an arithmetic algorithm of “lowLatency” and an implementation of “P2P”.”, where lossy compression is for implementation for storage in a cache “MDMA”, and lossless is used for a remote memory device, “RDMA”. Lossy compression is less complex than lossless compression.)
Maaninen and Yang are analogous art because they are both in the same field of data compression for neural networks. Before the effective filing date of the invention, it would have been obvious to a person of ordinary skill in the art, having the teachings of Yang and Maaninen, to use a less complex, lossy compression of the parameters for storage in a cache than for storage in a remote memory device. The suggestion and/or motivation for doing so is that cache storage is usually relatively small, which makes it cost-effective and enables efficient use of data. Therefore, the compressed data stored in the cache must be smaller in size than the data stored in a remote memory device, which can be any size and even larger than cache memory. Cache memory also implies on-chip storage, whether within the hardware accelerator unit or within a device that includes the hardware accelerator unit, so frequently used parameters can be stored in the cache where they can be quickly accessed; these parameters need to be smaller in size so that they can be compressed or decompressed rapidly. Therefore, one would be motivated to implement a lossy compression technique to remove un-required bits of information, making the data or parameters smaller in size, and therefore less complex, for storage in the cache, and to gain the advantage of raising the storage ability of the cache device. Remote memory devices may have a larger memory with more memory bandwidth, so one would be motivated to implement a lossless compression technique for storage in a remote memory device because, when the data or parameters are compressed, the data will be an exact replica of the actual data, including the bits of information that otherwise would be removed through lossy compression. Lossless compression is therefore a more complex compression.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Kletter (US 20160099723 A1) teaches a method and system for compressing sparse data recorded in double precision floating point format. 
Annapureddy (US 20160217369 A1) teaches a method of compressing a neural network by replacing one layer in the neural network with compressed layers to produce a compressed neural network. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to IAN K ALLEYNE whose telephone number is (571)272-1327. The examiner can normally be reached 8:30 - 5:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Abdullah Kawsar, can be reached on 571-270-3169. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/IAN K ALLEYNE/
Examiner, Art Unit 2127
/Jue Louie/
Primary Examiner, Art Unit 2121