DETAILED ACTION
Status of Claims
This is a non-final office action on the merits in response to the arguments and amendments filed on 25 July 2022 and the request for continued examination filed on 25 July 2022.
Claim 22 is new. Claims 1, 2, 4, 5, 7, 9, 10, 12, 13, and 15-21 were amended. Claims 1-5 and 7-22 are currently pending and have been examined.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 25 July 2022 has been entered.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 30 June 2022 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.

Claim Objections
Claims 9 and 16 are objected to because of the following informalities:
Claim 9 recites “segmenting the weight matrix into a plurality original sub-components” which appears to include a typographical error of omission, and should recite “segmenting the weight matrix into a plurality of original sub-components.”
Claim 9 recites “wherein respective original sub components comprising a subset of the weights in the weight matrix”, which appears to contain a typographical error, and should recite “wherein respective original sub components comprise a subset of the weights in the weight matrix.”  
Claim 16 recites “the program instructions executable by processor to cause the processor to” which appears to include a typographical error of omission, and should recite “the program instructions executable by a processor to cause the processor to.” 
Appropriate correction is required.

Claim Rejections - 35 USC § 112(a)
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claim 5 is rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claim contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.

Amended claim 5 recites the non-original limitation “wherein the segmentation component employs interpolation to generate the plurality of original sub-components.” The specification does not appear to support the identified limitation. Applicant’s remarks assert that no new matter is added, but they do not identify support for the limitation. The most relevant portion of the original disclosure states:
[0026] In various implementations, it is to be appreciated that spatial locality enforcement, and interpolation are done at beginning of training, when the weights are initialized. Instead of initializing the entire weight matrix at random, corners of each sub-block are initialized at random. Then interpolation is done to fill up the other values within each sub-component. The training process involves tweaking the weights over many examples and epochs. When weights are required to be transmitted, e.g., from a sender to a receiver, where the sender is a CPU/parameter server/learner and the receiver is memory/learner, compression is performed at the sender. Padding and decompression is performed at the receiver, such that data transmitted on the channel is minimized. However, initialization and interpolation steps are subsequently not performed.

The above disclosure specifically describes using interpolation as part of a random initialization of the matrix: “corners of each sub-block are initialized at random. Then interpolation is done to fill up the other values.” This interpolation during random initialization does not appear to be part of a process of receiving weight matrix data. Thus the interpolation disclosed in [0026] does not support interpolation as part of a segmentation of received weight data. The remainder of the original disclosure similarly fails to support the amended limitation. Because claim 5 includes a non-original limitation that is not supported by the original disclosure, one of ordinary skill in the art would not recognize applicant as possessing the claimed invention at the time of filing. Thus claim 5 is rejected based on the written description requirement.
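For illustration only, the initialization described in [0026] can be sketched as follows. The sketch assumes square sub-blocks, uniformly random corner values, and bilinear interpolation (the disclosure does not name an interpolation scheme), and the function name init_by_corner_interpolation is hypothetical. The routine takes only a matrix size and a random number generator as input; no received weight data enters the interpolation step.

```python
import numpy as np


def init_by_corner_interpolation(rows: int, cols: int, block: int,
                                 rng: np.random.Generator) -> np.ndarray:
    """Hypothetical sketch of [0026]: random corners per sub-block, interior
    filled by bilinear interpolation (the interpolation scheme is assumed)."""
    if block < 2 or rows % block or cols % block:
        raise ValueError("block must be >= 2 and divide both dimensions")
    weights = np.empty((rows, cols))
    t = np.linspace(0.0, 1.0, block)  # normalized position inside a sub-block
    for r0 in range(0, rows, block):
        for c0 in range(0, cols, block):
            # Only the four corners of each sub-block are drawn at random.
            tl, tr, bl, br = rng.uniform(-1.0, 1.0, size=4)
            top = tl + (tr - tl) * t        # linear along the top edge
            bottom = bl + (br - bl) * t     # linear along the bottom edge
            # Blend the top and bottom edges row by row to fill the interior.
            sub = top[None, :] + (bottom - top)[None, :] * t[:, None]
            weights[r0:r0 + block, c0:c0 + block] = sub
    return weights


w = init_by_corner_interpolation(16, 16, 8, np.random.default_rng(0))
```

Because every interior value is a convex combination of its sub-block's four random corners, the interpolation here only completes a randomly generated starting matrix; it does not segment or otherwise process incoming weight data.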

Claim Rejections - 35 USC § 112(b)
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 1-5 and 7-22 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention. Claims not specifically addressed below are rejected based on their dependency from a rejected claim.

Claim 1 recites “a transform component that applies a transform to the respective original sub-components to generate a distribution of spatial weights as a function of the frequency of respective spatial weights in the respective original sub-component.” One of ordinary skill in the art would not understand the identified limitation, rendering the scope of the claim indefinite. 
First, it is not clear what it means to generate a “distribution of spatial weights.” One of ordinary skill in the art would not know whether this describes a rearrangement of a set of spatial weights or something analogous to a DCT. The plain reading of the limitation suggests that the product of the transform is “a distribution of spatial weights”, which in the context of the present claims would be some arrangement of the prior spatial weights of a given sub-component. However, such an operation apparently finds no support in the original disclosure; there appears to be no discussion at all of a transformation which produces spatial domain data from spatial domain data. The specification, but not the claim, would suggest to one of ordinary skill in the art that the process applies a Discrete Cosine Transform (DCT) to transform the spatial weights of a sub-component into a distribution of frequency domain weights. Such a process is suggested by disclosures such as “The transform component 112 can apply a transform (e.g., DCT) to the sub-block of spatial weights to generate the weight with pertinent data in a low frequency section” [0024] and “At 304, a transform (e.g., DCT transform) is performed on a sub-block of spatial weights to produce a compressed sub-data block with the data concentrated in the low frequency zone (e.g., using transform component 112)” [0033]. But a DCT does not produce a “distribution of spatial weights”, as the output of the DCT is in the frequency domain. Given the conflict, one of ordinary skill in the art would not understand the meaning of the claim, and would not be able to determine the boundaries of the claim, rendering the claim indefinite.
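The distinction between a rearrangement of spatial weights and a DCT output can be illustrated with a short sketch. It assumes an orthonormal 2-D DCT-II and a hypothetical 8×8 sub-component whose weight at position (x, y) is x + y; the helper names are illustrative and are not taken from the specification.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis: row k holds the k-th cosine frequency."""
    k = np.arange(n)[:, None]   # frequency index
    x = np.arange(n)[None, :]   # spatial position
    c = np.cos(np.pi * (2 * x + 1) * k / (2 * n))
    c[0, :] *= np.sqrt(1.0 / n)
    c[1:, :] *= np.sqrt(2.0 / n)
    return c


def dct2(block: np.ndarray) -> np.ndarray:
    """2-D DCT of a square block: transform the columns, then the rows."""
    c = dct_matrix(block.shape[0])
    return c @ block @ c.T


# Hypothetical spatial sub-component: the weight at (x, y) is x + y.
spatial = np.add.outer(np.arange(8.0), np.arange(8.0))
coeffs = dct2(spatial)

# DC (zero-frequency) term: for an orthonormal 8x8 DCT it is 8 * mean(spatial).
dc = coeffs[0, 0]
# The output values are not a permutation of the input weights.
is_rearrangement = np.allclose(np.sort(coeffs.ravel()), np.sort(spatial.ravel()))
```

Here the DC coefficient is 56, a value that does not occur among the spatial weights (which range from 0 to 14), so the transform output is frequency domain data rather than a redistribution of the original spatial weights.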
Second, it is not clear what is meant by “the frequency of respective spatial weights.” The plain reading of this limitation suggests that the frequencies are associated with specific spatial weights, e.g., the weight “3” shows up 4 times in the sub-component, and thus the frequency of weight “3” is “4”. However, such a reading uses “frequency” in a sense entirely different from that of the specification. The specification discusses transforming spatial data into a frequency domain, where the “frequency” of the spatial weights refers to the frequency components of a frequency domain representation of the prior spatial weight values. Given the conflict, one of ordinary skill in the art would not understand the meaning of the claim, and would not be able to determine the boundaries of the claim, rendering the claim indefinite.
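The plain reading described above, under which the “frequency” of a weight is simply its number of occurrences, corresponds to a value count such as the following sketch. The 4×4 sub-component is hypothetical, with values chosen only to reproduce the example in which weight “3” appears four times; nothing in this computation involves a frequency domain.

```python
from collections import Counter

# Hypothetical 4x4 sub-component in which the weight 3 occurs four times.
sub_component = [
    [3, 1, 3, 0],
    [2, 3, 5, 1],
    [3, 7, 2, 2],
    [0, 1, 4, 6],
]

# Plain-reading "frequency": how many times each weight value appears.
occurrences = Counter(w for row in sub_component for w in row)
print(occurrences[3])  # 4
```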
Claim 9 is similarly rejected for “applying a transform to the respective sub-components generating a distribution of spatial weights as a function of the frequency of respective spatial weights in the respective original sub-components” and claim 16 is similarly rejected for “generate a spatial weight distribution for the spatial weights, wherein low-frequency spatial weights are located…”
For the purposes of examination, the limitation will be interpreted as requiring the application of a transform to the respective original sub-components to generate a matrix of frequency domain weights determined based on the spatial weights of the respective original sub-component. 

Claim 7 recites “the plurality of low-frequency initialized sub-components”. There is no antecedent basis for “initialized” sub-components, making the scope of the claim unclear and indefinite. Claim 9 is similarly rejected based on “the respective transformed initialized sub-components”.
For the purposes of examination, the limitations will be interpreted without regard to any initialization.   

Claim 9 recites “applying a generalized weight distribution to the respective original sub-components to generate a spatial weight distribution for the weights the respective original sub-components comprise.” One of ordinary skill in the art would not understand what it means to apply a “generalized weight distribution” to “generate a spatial weight distribution.” The specification provides no details, examples, or clarity regarding this operation. Further, one of ordinary skill in the art would understand the “original sub-components” to already have a spatial weight distribution. That understanding is supported at least by the abstract, which states “A segmentation component segments the initial weight matrix into original sub-components, wherein respective original sub-components have spatial weights.” Because one of ordinary skill in the art would not know the meaning of “applying a generalized weight distribution”, and because the claim apparently describes generating a “spatial weight distribution for the weights” even though the weights already have a spatial distribution, one of ordinary skill in the art would not be able to determine the boundaries of the claim, rendering the claim indefinite.
For the purposes of examination, the limitation will be interpreted as any sort of operation which modifies the existing spatial weight distribution of the weights to another spatial weight distribution. 

Claim 9 recites “applying a transform to the respective sub-components”. There is no clear antecedent basis for “the respective sub-components” in the claims. Note that the claim previously recites “segmenting the weight matrix into a plurality original sub-components, wherein respective original sub-components comprising a subset of the spatial weights in the weight matrix.” From this it would appear that “the respective sub-components” refers to the segmented “respective original sub-components.” The claim then recites “applying a generalized weight distribution to the respective original sub-components to generate a spatial weight distribution,” followed by the limitation at issue, applying a transform to the respective sub-components. The structure of the claim would indicate to one of ordinary skill in the art that the transform is applied to the sub-components as altered by the generalized weight distribution, but the plainest reading of the claim could be taken to suggest that the generalized weight distribution and the transform are parallel processes both operating on “original sub-components.” One of ordinary skill in the art would not know how to interpret the claim, rendering the claim indefinite.
For the purposes of examination, the former interpretation will be used. 

Claim 22 recites “a sampling component that, for respective sub-components, generates a spatial weight matrix for the subset of the weights in the respective original sub-components.” One of ordinary skill in the art would understand the weights of the respective original sub-components to be a spatial weight matrix. From this, one of ordinary skill in the art would not understand what, if anything, is achieved by the claimed “sampling component.” Thus one of ordinary skill in the art would not be able to determine the scope of the claim, rendering the claim indefinite. 


Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-5 and 7-22 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. Claim 9, which is representative of claims 1 and 16, recites in part segmenting the weight matrix into a plurality original sub-components, wherein respective original sub-components comprising a subset of the weights in the weight matrix; applying a generalized weight distribution to the respective original sub-components to generate a spatial weight distribution for the weights the respective original sub-components comprise; applying a transform to the respective sub-components generating a distribution of spatial weights as a function of the frequency of respective spatial weights in the respective original sub-components; and cropping high-frequency weights of the respective transformed initialized sub-components while retaining the low frequency spatial weights generating a set of sub-components comprising the low-frequency spatial weights. These limitations describe mathematical calculations. Note that the October 2019 Update states that “[a] mathematical calculation is a mathematical operation (such as multiplication) or an act of calculating using mathematical methods.” Because the claims describe a mathematical calculation, the claims are determined to recite a “mathematical concept” for the purposes of the analysis set forth by the 2019 PEG. Thus the claims are determined to recite an abstract idea.
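As an illustration of why the transform step is treated as a mathematical calculation, the DCT that the specification ([0024], [0033]) and Chalfin ([0126]) identify as the transform can be written in closed form. For an N×N sub-component f(x, y), the orthonormal 2-D DCT-II is shown below (the normalization convention is an assumption; for N = 8 it coincides with the JPEG scaling). One way to express the cropping step under the interpretation adopted above is to retain only the coefficients with u, v < K for some cutoff K.

```latex
F(u,v) = \alpha(u)\,\alpha(v)\sum_{x=0}^{N-1}\sum_{y=0}^{N-1} f(x,y)\,
         \cos\!\left[\frac{(2x+1)u\pi}{2N}\right]
         \cos\!\left[\frac{(2y+1)v\pi}{2N}\right],
\qquad
\alpha(k) = \begin{cases} \sqrt{1/N}, & k = 0,\\ \sqrt{2/N}, & k > 0, \end{cases}

\hat{F}(u,v) = \begin{cases} F(u,v), & u < K \text{ and } v < K,\\
                              \text{discarded}, & \text{otherwise.} \end{cases}
```

Each output value is a weighted sum of the input weights with fixed cosine coefficients, which is an act of calculating using mathematical methods.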
Under the 2019 PEG, the additional elements of the claims are considered for whether they integrate an abstract idea into a practical application. Claim 9 recites employing a processor and memory to execute computer executable components to perform the limitations of the abstract idea. Claim 1 recites a system comprising a memory and a processor that executes the limitations of the abstract idea. Claim 16 recites a computer program product comprising a computer readable storage medium having program instructions to cause a processor to perform the limitations of the abstract idea. These additional elements are all described at a high level of generality, and may be interpreted as generic computing devices used to implement the abstract idea. However, under the 2019 PEG, the use of a generic computing device to implement an abstract idea does not integrate that abstract idea into a practical application. Thus these additional elements do not integrate the abstract idea into a practical application. The claims further recite the additional element of receiving neural network data in the form of a weight matrix. This additional element amounts to necessary data gathering in conjunction with the abstract idea identified above, and as such is interpreted as insignificant extra-solution activity. Per MPEP 2106, adding insignificant extra-solution activity to a judicial exception is not enough to integrate a judicial exception into a practical application. Thus this additional element does not integrate the abstract idea into a practical application. There are no further additional elements. When considered as a combination, the additional elements do not reflect any improvement to technology, do not require the use of a particular machine, do not effect a transformation of an article, and do not meaningfully limit the abstract idea. Instead, the combination of additional elements only generally links the abstract idea to a computing environment.
As such, the combination of additional elements does not integrate the abstract idea into a practical application. Thus the claims are determined to be directed to an abstract idea.
In Step 2B of the Mayo/Alice analysis, the additional elements of the claims are considered for whether they amount to significantly more than the abstract idea. As previously noted, the claims recite additional elements which may be interpreted as generic computing devices used to implement the abstract idea. However, implementing an abstract idea on a generic computer does not add significantly more, similar to how the recitation of the computer in the claim in Alice amounted to mere instructions to apply the abstract idea of intermediated settlement on a generic computer. As such, these elements do not provide an inventive concept and do not constitute significantly more. As previously noted, the claims recite an additional element which amounts to extra-solution activity. However, per MPEP 2106, the courts have found adding insignificant extra-solution activity such as mere data gathering to be insufficient to qualify as “significantly more.” As such, this additional element does not amount to significantly more. There are no further additional elements. As previously noted, when considered as a combination, the additional elements generally link the abstract idea to a computing environment. However, per MPEP 2106, generally linking the use of a judicial exception to a particular technological environment has been found by the courts to be insufficient to amount to significantly more. Therefore, when considered individually and as an ordered combination, the additional elements of the independent claims do not amount to significantly more than the judicial exception. Thus the independent claims are not patent eligible.
	Dependent claims 2-5, 7, 10-15, 17, and 22 only further narrow the identified abstract idea. However, the claims continue to recite an abstract idea. The previously identified additional elements fail to integrate the narrowed abstract idea into a practical application or amount to significantly more than the narrowed abstract idea. Dependent claims 8 and 21 recite the additional element of transmitting data. This additional element may be interpreted as extra-solution activity as it generally describes necessary data outputting. Thus this additional element, individually and in combination with the prior additional elements, does not integrate the abstract idea into a practical application. Further, the courts have recognized transmission over a network as a conventional computing functionality. Thus this additional element, individually and in combination with the prior additional elements, does not amount to significantly more than the abstract idea. Thus, as the dependent claims remain directed to a judicial exception, and as the additional elements of the claims do not amount to significantly more, the dependent claims are not patent eligible.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.


Claims 1-4, 7-11, 13, 15-18, and 20-22 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Chalfin et al. (US 2018/0239992 A1). 

Regarding Claim 1: Chalfin discloses a system for compressing data during neural network training, comprising: a memory that stores computer executable components and neural network data (See at least [0089]); a processor that executes computer executable components stored in the memory (See at least [0089]), wherein the computer executable components comprise:
a receiving component that receives neural network data in the form of a weight matrix, wherein the weight matrix comprises respective weights to be applied to the neural network (In step 402, a set of weight values for the artificial neural network is represented in the form of an array of weight values. Then, in step 404, the GPU 106 uses an image compression scheme to compress the array of weight values to provide compressed weight data for the artificial neural network. See at least [0121]);
a segmentation component that segments the weight matrix into a plurality of original sub-components, wherein respective subcomponents in the plurality of original sub-components comprise a subset of the respective weights in the weight matrix (the array of weight values is divided into blocks of 8×8 weight values 702 that are compressed on a block-by-block basis. See at least [0126]);
a transform component that applies a transform to the respective original sub-components to generate a distribution of spatial weights as a function of the frequency of respective spatial weights in the respective original sub-components (Then, a discrete cosine transform (DCT) 704 is performed to generate a set of coefficients for a block. See at least [0126]. Examiner’s note: One of ordinary skill in the art would recognize that the application of a DCT transforms a set of values from a spatial domain to a frequency domain); and
a cropping component that crops high-frequency spatial weights from respective transformed sub-components to generate, for every transformed sub-component, a low-frequency sub-component comprising the low-frequency spatial weights to generate a compressed representation of the original sub-components (In embodiments, the compression scheme may comprise JPEG compression. … Using the compression scheme to compress the array of weight values may comprise quantisation of the coefficients to generate quantised coefficients. Using the compression scheme to compress the array of weight values may comprise (entropy (e.g. zigzag)) encoding of the (quantised) coefficients to generate encoded (quantised) coefficients. Thus, in embodiments, the compressed weight data may comprise encoded (quantised) coefficients. See at least [0051]. Also: quantisation (Q) 706 is performed to generate a set of quantised coefficients. Then, zigzag entropy encoding 708 is performed to generate compressed weight data in the form of a set of encoded quantised coefficients 710 for the block. See at least [0126]. Examiner’s note: One of ordinary skill in the art would understand and recognize that quantization crops high-frequency weights from the coefficients) during training of the neural network (FIG. 10 shows a method of training weights for an artificial neural network according to an embodiment of the technology described herein. See at least [0015] and Fig. 10). 
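The block-by-block segmentation that Chalfin describes (“the array of weight values is divided into blocks of 8×8 weight values”, [0126]) can be sketched as follows. This is a minimal illustration, not Chalfin's implementation; it assumes both matrix dimensions are multiples of the block size (no padding), and the function name segment is hypothetical.

```python
import numpy as np


def segment(weights: np.ndarray, block: int = 8) -> np.ndarray:
    """Split a 2-D weight matrix into non-overlapping block x block sub-components.

    Returns an array of shape (rows // block, cols // block, block, block) in
    which entry [i, j] is the sub-component starting at row i * block and
    column j * block. Assumes both dimensions are multiples of `block`.
    """
    rows, cols = weights.shape
    if rows % block or cols % block:
        raise ValueError("matrix dimensions must be multiples of the block size")
    # Group rows and columns into blocks, then bring the two block indices first.
    return (weights.reshape(rows // block, block, cols // block, block)
                   .swapaxes(1, 2))


w = np.arange(256.0).reshape(16, 16)   # hypothetical 16x16 weight matrix
blocks = segment(w)                    # four 8x8 original sub-components
```

Each resulting sub-component holds a subset of the weights in the weight matrix, which is how the segmentation limitation is read on Chalfin.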

Regarding Claim 9: Chalfin discloses a computer-implemented method, comprising employing a processor and a memory to execute computer executable components (See at least [0092]) to perform the following acts: 
receiving neural network data in the form of a weight matrix, wherein the weight matrix comprising respective weights to be applied to the neural network during training of the neural network (In step 402, a set of weight values for the artificial neural network is represented in the form of an array of weight values. Then, in step 404, the GPU 106 uses an image compression scheme to compress the array of weight values to provide compressed weight data for the artificial neural network. See at least [0121]. Also: FIG. 10 shows a method of training weights for an artificial neural network according to an embodiment of the technology described herein. See at least [0015] and Fig. 10);
segmenting the weight matrix into a plurality of original sub-components, wherein respective original sub-components comprising a subset of the weights in the weight matrix (the array of weight values is divided into blocks of 8×8 weight values 702 that are compressed on a block-by-block basis. See at least [0126]);
applying a generalized weight distribution to the respective original sub-components to generate a spatial weight distribution for the weights the respective original sub-components comprise (the compression scheme may comprise JPEG compression. See at least [0051]. Examiner’s note: One of ordinary skill in the art would recognize JPEG compression as including level shifting that reads on the identified limitation. For example, Katz (“Baseline JPEG compression juggles image quality and size”) states “the DCT coder usually requires that the expected average value for all pixels is zero. Therefore, before the DCT is performed, a value of 128 may be subtracted from each pixel (normally ranging from 0 to 255) to shift it to a range of –127 to 127.” See at least page 2);
applying a transform to the respective sub-components generating a distribution of spatial weights as a function of the frequency of respective spatial weights in the respective original sub-components (Then, a discrete cosine transform (DCT) 704 is performed to generate a set of coefficients for a block. See at least [0126]. Examiner’s note: One of ordinary skill in the art would recognize that the application of a DCT transforms a set of values from a spatial domain to a frequency domain analog); and
cropping high-frequency weights of the respective transformed initialized sub-components while retaining the low frequency spatial weights generating a set of sub-components comprising the low-frequency spatial weights (In embodiments, the compression scheme may comprise JPEG compression. … Using the compression scheme to compress the array of weight values may comprise quantisation of the coefficients to generate quantised coefficients. Using the compression scheme to compress the array of weight values may comprise (entropy (e.g. zigzag)) encoding of the (quantised) coefficients to generate encoded (quantised) coefficients. Thus, in embodiments, the compressed weight data may comprise encoded (quantised) coefficients. See at least [0051]. Also: quantisation (Q) 706 is performed to generate a set of quantised coefficients. Then, zigzag entropy encoding 708 is performed to generate compressed weight data in the form of a set of encoded quantised coefficients 710 for the block. See at least [0126]. Examiner’s note: One of ordinary skill in the art would understand and recognize that quantization crops high-frequency weights from the coefficients). 
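The level shift described by Katz, on which the reading of the “generalized weight distribution” limitation relies, amounts to subtracting a constant from every value in a block before the DCT. A minimal sketch, assuming 8-bit input values and Katz's offset of 128 (the function name level_shift is hypothetical):

```python
import numpy as np


def level_shift(block: np.ndarray, offset: float = 128.0) -> np.ndarray:
    """Subtract a constant offset from every value so the block is centred
    near zero before the DCT. Values keep their positions; only the
    distribution of their magnitudes changes."""
    # Convert to float first so integer (e.g. uint8) inputs cannot wrap around.
    return block.astype(np.float64) - offset


pixels = np.array([[0, 128], [255, 64]], dtype=np.uint8)
shifted = level_shift(pixels)
print(shifted.tolist())  # [[-128.0, 0.0], [127.0, -64.0]]
```

The shift maps one spatial weight distribution to another without moving any value, which corresponds to the interpretation of the limitation adopted in the rejection under 35 U.S.C. 112(b).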

Regarding Claim 16: Chalfin discloses a computer program product for compressing training data, the computer program product comprising a computer readable storage medium having program instructions embodied therewith (See at least [0101]), the program instructions executable by processor to cause the processor to: 
receive neural network data in the form of a weight matrix, wherein the weight matrix comprising respective weights to be applied to the neural network during training of the neural network (In step 402, a set of weight values for the artificial neural network is represented in the form of an array of weight values. Then, in step 404, the GPU 106 uses an image compression scheme to compress the array of weight values to provide compressed weight data for the artificial neural network. See at least [0121]. Also: FIG. 10 shows a method of training weights for an artificial neural network according to an embodiment of the technology described herein. See at least [0015] and Fig. 10);
segment the weight matrix into original sub-components, wherein respective original sub-components comprise a subset of the weights in the weight matrix (the array of weight values is divided into blocks of 8×8 weight values 702 that are compressed on a block-by-block basis. See at least [0126]);
determine spatial weights for respective subset of weights, wherein the spatial weights have a frequency distribution (the array of weight values is divided into blocks of 8×8 weight values 702 that are compressed on a block-by-block basis. See at least [0126]); 
generate a spatial weight distribution for the spatial weights, wherein low-frequency spatial weights are located in a first region of the distribution and high frequency spatial weights are located in a second region of the distribution (In embodiments, the compression scheme may comprise JPEG compression. Using the compression scheme to compress the array of weight values may comprise applying a transformation (such as a discrete cosine transform (DCT)) to (a or each block of) the array of weight values to generate coefficients. See at least [0051]. Examiner’s note: One of ordinary skill in the art would understand and recognize that the DCT produces a set of frequency domain values with the values corresponding to the lowest frequencies in the upper left corner of the matrix, and with the values corresponding to the highest frequencies in the lower right corner of the matrix); and
crop high-frequency weights from the respective spatial weight distributions to generate a set of sub-components comprising low-frequency spatial weights (In embodiments, the compression scheme may comprise JPEG compression. … Using the compression scheme to compress the array of weight values may comprise quantisation of the coefficients to generate quantised coefficients. Using the compression scheme to compress the array of weight values may comprise (entropy (e.g. zigzag)) encoding of the (quantised) coefficients to generate encoded (quantised) coefficients. Thus, in embodiments, the compressed weight data may comprise encoded (quantised) coefficients. See at least [0051]. Also: quantisation (Q) 706 is performed to generate a set of quantised coefficients. Then, zigzag entropy encoding 708 is performed to generate compressed weight data in the form of a set of encoded quantised coefficients 710 for the block. See at least [0126]. Examiner’s note: One of ordinary skill in the art would understand and recognize that quantization crops high-frequency weights from the coefficients) during training of the neural network (FIG. 10 shows a method of training weights for an artificial neural network according to an embodiment of the technology described herein. See at least [0015] and Fig. 10).
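The reading applied to claim 16 (low-frequency values in a first region, the upper-left corner, and high-frequency values in a second region that is cropped) can be illustrated with the sketch below. It assumes an orthonormal DCT-II, a hypothetical smooth 8×8 block, and cropping to the upper-left 4×4 coefficients. Chalfin itself describes quantisation and zigzag encoding rather than this exact crop, and the helper names are illustrative.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis: row k holds the k-th cosine frequency."""
    k = np.arange(n)[:, None]
    x = np.arange(n)[None, :]
    c = np.cos(np.pi * (2 * x + 1) * k / (2 * n))
    c[0, :] *= np.sqrt(1.0 / n)
    c[1:, :] *= np.sqrt(2.0 / n)
    return c


def dct2(block: np.ndarray) -> np.ndarray:
    """2-D DCT of a square block."""
    c = dct_matrix(block.shape[0])
    return c @ block @ c.T


def crop_low_frequency(coeffs: np.ndarray, keep: int) -> np.ndarray:
    """Keep the upper-left keep x keep coefficients (lowest frequencies)."""
    return coeffs[:keep, :keep].copy()


block = np.add.outer(np.arange(8.0), np.arange(8.0))  # smooth hypothetical block
coeffs = dct2(block)
low = crop_low_frequency(coeffs, keep=4)

# Share of total energy kept by the crop (an orthonormal DCT preserves energy).
retained = np.sum(low ** 2) / np.sum(coeffs ** 2)
```

For a smooth block such as this one, nearly all of the energy sits in the upper-left region, so discarding the remaining coefficients loses very little information; this is the behavior the Examiner’s notes above attribute to the DCT-based JPEG scheme.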

Regarding Claims 2, 10, and 17: Chalfin discloses the above limitations. Additionally, Chalfin discloses an inverse transform component that applies an inverse transform to the plurality of transformed sub-components to recover a modified version of the original sub-components (The decompression scheme can take any desired and suitable form that corresponds to (e.g. is the inverse of) the compression scheme, as discussed above. For example, the decompression scheme may allow selected decompressed weight values to be derived from the compressed weight data without decompressing all of the compressed weight data. For another example, the decompression scheme may be block-based. In these embodiments, using the decompression scheme to derive the decompressed weight values may comprise decompressing weight values for one or more blocks separately and/or in parallel using the decompression scheme. For another example, using the image decompression scheme to derive the decompressed weight values from the compressed weight data may comprise using different sets of compression parameters for deriving respective subsets (partitions) of decompressed weight values. See at least [0070]. Also: FIG. 8 shows a method 800 of decompressing weight values. In step 802, compressed weight data is retrieved from the memory 114. Then, in step 804, the GPU 106 uses an image decompression scheme to decompress weight values for the artificial neural network. The image decompression is the inverse of an image compression scheme as discussed above. Then, in step 806, the decompressed weight values are applied in the artificial neural network. Applying the weight values comprises the GPU 106 generating and then executing threads for applying the weight values. See at least [0128]. Examiner’s note: One of ordinary skill in the art would understand and recognize that the inverse of the JPEG compression scheme involves the application of an inverse discrete cosine transformation). 
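The following Python sketch illustrates the Examiner’s note that the inverse of JPEG compression applies an inverse DCT. It assumes an orthonormal DCT-II written as a matrix C, so the forward transform is Y = C X C^T and the inverse is X = C^T Y C. All helper names are hypothetical and are not drawn from Chalfin. This includes crop_high_frequencies, a crude stand-in for quantisation that zeroes every coefficient with u + v >= keep.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II basis matrix; row k is the k-th cosine basis vector."""
    return [[(math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
             * math.cos((2 * i + 1) * k * math.pi / (2 * n))
             for i in range(n)]
            for k in range(n)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def dct_2d(block):
    c = dct_matrix(len(block))
    return matmul(matmul(c, block), transpose(c))       # Y = C X C^T


def idct_2d(coeffs):
    c = dct_matrix(len(coeffs))
    return matmul(matmul(transpose(c), coeffs), c)      # X = C^T Y C, since C is orthogonal


def crop_high_frequencies(coeffs, keep):
    """Hypothetical stand-in for quantisation: zero every coefficient with u + v >= keep."""
    n = len(coeffs)
    return [[coeffs[u][v] if u + v < keep else 0.0 for v in range(n)] for u in range(n)]


weights = [[math.sin(0.3 * x) + 0.1 * y for y in range(8)] for x in range(8)]  # assumed data
exact = idct_2d(dct_2d(weights))                                       # lossless round trip
modified = idct_2d(crop_high_frequencies(dct_2d(weights), keep=4))     # lossy reconstruction
```

Without cropping, the round trip reproduces the weights to within floating-point error. After cropping, the inverse transform returns a modified version of the original block. This is analogous to the "modified version of the original sub-components" recited in Claims 2, 10, and 17.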

Regarding Claims 3, 11, and 18: Chalfin discloses the above limitations. Additionally, Chalfin discloses wherein the transform component applies a discrete cosine transform (In embodiments, the compression scheme may comprise JPEG compression. Using the compression scheme to compress the array of weight values may comprise applying a transformation (such as a discrete cosine transform (DCT)) to (a or each block of) the array of weight values to generate coefficients. See at least [0051]). 

Regarding Claim 4: Chalfin discloses the above limitations. Additionally, Chalfin discloses wherein the low-frequency spatial weights are located in a first region of the distribution in the original sub-component and high-frequency spatial weights are located in a second region of the distribution in the original sub-component, where the first region is located in a corner of the respective transformed sub-components (In embodiments, the compression scheme may comprise JPEG compression. Using the compression scheme to compress the array of weight values may comprise applying a transformation (such as a discrete cosine transform (DCT)) to (a or each block of) the array of weight values to generate coefficients. See at least [0051]. Examiner’s Note: One of ordinary skill in the art would understand and recognize that the DCT produces a set of frequency domain values with the values corresponding to the lowest frequencies in the upper left corner of the matrix, and with the values corresponding to the highest frequencies in the lower right corner of the matrix). 
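The following Python sketch can be used to check the Examiner’s Note that the DCT places the lowest frequencies in the upper left corner and the highest frequencies in the lower right. It is an illustration under stated assumptions, not code from Chalfin or the DataGenetics reference. The sketch builds the orthonormal 8×8 DCT-II basis image for a coefficient index (u, v) and counts sign changes. Along a row the count equals v, and down a column it equals u, so frequency increases with distance from index (0, 0). The function names dct_basis and sign_changes are hypothetical.

```python
import math


def dct_basis(u, v, n=8):
    """Orthonormal 2-D DCT-II basis image for coefficient index (u, v)."""
    au = math.sqrt(1.0 / n) if u == 0 else math.sqrt(2.0 / n)
    av = math.sqrt(1.0 / n) if v == 0 else math.sqrt(2.0 / n)
    return [[au * av
             * math.cos((2 * x + 1) * u * math.pi / (2 * n))
             * math.cos((2 * y + 1) * v * math.pi / (2 * n))
             for y in range(n)]
            for x in range(n)]


def sign_changes(values):
    """Count how often consecutive samples change sign (a proxy for frequency)."""
    return sum(1 for a, b in zip(values, values[1:]) if a * b < 0)


# Upper-left index: flat basis. Lower-right index: fastest oscillation in both directions.
for u, v in [(0, 0), (0, 7), (7, 0), (7, 7)]:
    basis = dct_basis(u, v)
    row_changes = sign_changes(basis[0])                 # horizontal oscillation, equals v
    col_changes = sign_changes([r[0] for r in basis])    # vertical oscillation, equals u
    print((u, v), row_changes, col_changes)
```

The loop reports 0 and 0 sign changes for (0, 0), 7 and 0 for (0, 7), 0 and 7 for (7, 0), and 7 and 7 for (7, 7).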

Regarding Claims 7, 13, and 20: Chalfin discloses the above limitations. Additionally, Chalfin discloses wherein the inverse transform component applies an inverse discrete cosine transform function to transform the plurality of low-frequency initialized sub-components to a spatial domain (The decompression scheme can take any desired and suitable form that corresponds to (e.g. is the inverse of) the compression scheme, as discussed above. For example, the decompression scheme may allow selected decompressed weight values to be derived from the compressed weight data without decompressing all of the compressed weight data. For another example, the decompression scheme may be block-based. In these embodiments, using the decompression scheme to derive the decompressed weight values may comprise decompressing weight values for one or more blocks separately and/or in parallel using the decompression scheme. For another example, using the image decompression scheme to derive the decompressed weight values from the compressed weight data may comprise using different sets of compression parameters for deriving respective subsets (partitions) of decompressed weight values. See at least [0070]. Also: FIG. 8 shows a method 800 of decompressing weight values. In step 802, compressed weight data is retrieved from the memory 114. Then, in step 804, the GPU 106 uses an image decompression scheme to decompress weight values for the artificial neural network. The image decompression is the inverse of an image compression scheme as discussed above. Then, in step 806, the decompressed weight values are applied in the artificial neural network. Applying the weight values comprises the GPU 106 generating and then executing threads for applying the weight values. See at least [0128]. Examiner’s note: One of ordinary skill in the art would understand and recognize that the inverse of the JPEG compression scheme involves the application of an inverse discrete cosine transformation).

Regarding Claims 8 and 21: Chalfin discloses the above limitations. Additionally, Chalfin discloses a communication component that transmits the compressed representation of the original sub-components (Compressing the weight values using the image compression scheme can reduce the amount of bandwidth and storage needed when transferring and storing the weight values for use later use in the artificial neural network. For example, in some embodiments, the weight data can be reduced from around 600 Mb to less than 100 Mb. See at least [0121]).

Regarding Claim 15: Chalfin discloses the above limitations. Additionally, Chalfin discloses low frequency sub-components as a compressed representation of the weight matrix (In embodiments, the compression scheme may comprise JPEG compression. … Using the compression scheme to compress the array of weight values may comprise quantisation of the coefficients to generate quantised coefficients. Using the compression scheme to compress the array of weight values may comprise (entropy (e.g. zigzag)) encoding of the (quantised) coefficients to generate encoded (quantised) coefficients. Thus, in embodiments, the compressed weight data may comprise encoded (quantised) coefficients. See at least [0051]. Also: quantisation (Q) 706 is performed to generate a set of quantised coefficients. Then, zigzag entropy encoding 708 is performed to generate compressed weight data in the form of a set of encoded quantised coefficients 710 for the block. See at least [0126]. Examiner’s note: One of ordinary skill in the art would understand and recognize that quantization crops high-frequency weights from the coefficients) during training of the neural network (FIG. 10 shows a method of training weights for an artificial neural network according to an embodiment of the technology described herein. See at least [0015] and Fig. 10).

Regarding Claim 22: Chalfin discloses the above limitations. Additionally, Chalfin discloses a sampling component that, for respective sub-components, generates a spatial weight matrix for the subset of the weights in the respective original sub-components (the array of weight values is divided into blocks of 8×8 weight values 702 that are compressed on a block-by-block basis. See at least [0126]).
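The following Python sketch illustrates the block-by-block division quoted from Chalfin [0126]. The function split_into_blocks is hypothetical. The sketch assumes that both dimensions of the weight array are multiples of the block size. Selvi’s zero padding, discussed under Claim 14, is one way to satisfy that assumption.

```python
def split_into_blocks(matrix, size=8):
    """Split a 2-D weight array into size x size blocks, returned in row-major block order."""
    rows, cols = len(matrix), len(matrix[0])
    if rows % size or cols % size:
        raise ValueError("array dimensions must be multiples of the block size")
    blocks = []
    for r0 in range(0, rows, size):          # top edge of each block row
        for c0 in range(0, cols, size):      # left edge of each block column
            blocks.append([row[c0:c0 + size] for row in matrix[r0:r0 + size]])
    return blocks


weights = [[100 * r + c for c in range(24)] for r in range(16)]   # assumed 16 x 24 array
blocks = split_into_blocks(weights)          # 2 block rows x 3 block columns = 6 blocks
```

Each block can then be transformed and quantised independently, consistent with the block-by-block compression described in the quoted passage.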

Claims 5, 12, 14, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Chalfin et al (US 2018/0239992 A1) in view of G. Uma Vetri Selvi (DICOM Image compression using Bilinear Interpolation) [hereafter referenced as “Selvi”]. 

Regarding Claim 5: Chalfin discloses the above limitations. Chalfin does not appear to disclose wherein the segmentation component employs interpolation to generate the plurality of original sub-components. However, Selvi teaches employing interpolation to generate a plurality of original sub-components (“DICOM images are compressed using bilinear interpolation. This method presents a technique for classification of the image blocks on the basis of threshold value of variance. The image is divided into blocks. The blocks are classified as significant or insignificant depending on their variance. The corner pixels (bilinear coefficients) of the blocks are stored and the remaining pixels are obtained by bilinear interpolation. The difference between original and the interpolated image is calculated and the two data are individually quantized and encoded.” See at least Page 1. Also: “The corner pixels of the blocks are stored. Bilinear interpolation is applied to those pixels to find the other pixels, the difference between the original and the interpolated image is calculated, from the block classification data insignificant blocks are identified and the difference value corresponding to it are made zero. For Significant blocks the difference value is taken and quantised. The Bilinear coefficients Bc(i,j) are quantized.” See at least Page 3). 
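Selvi’s corner-based scheme, as quoted above, can be sketched in Python in three steps. First, keep the four corner values of a block. Second, rebuild every other entry by bilinear interpolation. Third, keep the difference (residual) for quantisation. This is an illustrative sketch under stated assumptions. Selvi’s variance-based significant/insignificant classification and the quantisation and encoding steps are omitted, and the function names are hypothetical.

```python
def bilinear_from_corners(block):
    """Rebuild a block (at least 2 x 2) from its four corner values by bilinear interpolation."""
    n, m = len(block), len(block[0])
    tl, tr = block[0][0], block[0][m - 1]
    bl, br = block[n - 1][0], block[n - 1][m - 1]
    approx = []
    for i in range(n):
        a = i / (n - 1)                      # vertical position: 0 at top, 1 at bottom
        row = []
        for j in range(m):
            b = j / (m - 1)                  # horizontal position: 0 at left, 1 at right
            row.append((1 - a) * (1 - b) * tl + (1 - a) * b * tr
                       + a * (1 - b) * bl + a * b * br)
        approx.append(row)
    return approx


def residual(block):
    """Difference between the original block and its corner-based interpolation."""
    approx = bilinear_from_corners(block)
    return [[block[i][j] - approx[i][j] for j in range(len(block[0]))]
            for i in range(len(block))]
```

If a block's values already vary bilinearly, such as 2i + 3j + 0.5ij, the residual is zero, so only the four corners need to be stored. Any other structure shows up in the residual.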
	Chalfin provides a system which compresses neural network information by applying JPEG image compression techniques, upon which the claimed invention’s interpolation of data points can be seen as an improvement. However, Selvi demonstrates that the prior art already knew of using corner interpolation to compress image data. One of ordinary skill in the art could have trivially applied the techniques of Selvi to the neural network compression system of Chalfin. Further, one of ordinary skill in the art would have recognized that such an application of Selvi would have resulted in an improved system which would produce a superior compression of neural network data. As such, the application of Selvi and the claimed invention would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention in view of the disclosures of Chalfin and the teachings of Selvi. 

Regarding Claims 12 and 19: Chalfin discloses the above limitations. Chalfin does not appear to disclose wherein applying the generalized weight distribution further comprises employing at least one of bilinear interpolation, exponential interpolation or spline interpolation. 
However, Selvi teaches employing at least one of bilinear interpolation, exponential interpolation or spline interpolation (“DICOM images are compressed using bilinear interpolation. This method presents a technique for classification of the image blocks on the basis of threshold value of variance. The image is divided into blocks. The blocks are classified as significant or insignificant depending on their variance. The corner pixels (bilinear coefficients) of the blocks are stored and the remaining pixels are obtained by bilinear interpolation. The difference between original and the interpolated image is calculated and the two data are individually quantized and encoded.” See at least Page 1. Also: “The corner pixels of the blocks are stored. Bilinear interpolation is applied to those pixels to find the other pixels, the difference between the original and the interpolated image is calculated, from the block classification data insignificant blocks are identified and the difference value corresponding to it are made zero. For Significant blocks the difference value is taken and quantised. The Bilinear coefficients Bc(i,j) are quantized.” See at least Page 3). 
	Chalfin provides a system which compresses neural network information by applying JPEG image compression techniques, upon which the claimed invention’s use of interpolation can be seen as an improvement. However, Selvi demonstrates that the prior art already knew of using corner interpolation to compress image data. One of ordinary skill in the art could have trivially applied the techniques of Selvi to the neural network compression system of Chalfin. Further, one of ordinary skill in the art would have recognized that such an application of Selvi would have resulted in an improved system which would produce a superior compression of neural network data. As such, the application of Selvi and the claimed invention would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention in view of the disclosures of Chalfin and the teachings of Selvi. 

Regarding Claim 14: Chalfin discloses the above limitations. Chalfin does not appear to explicitly disclose padding zeros of the set. However, Selvi teaches padding zeros of the set (“If the size of the image is not a multiple of m and n round rows to ‘m’ and columns to ‘n’ by adding zeros at the bottom and right and then perform block division.” See at least Page 2). 
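Selvi’s zero-padding step, as quoted above, can be sketched in Python as follows. The function pad_to_multiple is hypothetical. Here m and n play the roles of Selvi’s block height and width, and zeros are appended at the bottom and right, as the quoted passage describes.

```python
def pad_to_multiple(matrix, m, n):
    """Zero-pad at the bottom and right so the row count is a multiple of m and the column count a multiple of n."""
    rows, cols = len(matrix), len(matrix[0])
    new_rows = -(-rows // m) * m             # round rows up to a multiple of m
    new_cols = -(-cols // n) * n             # round columns up to a multiple of n
    padded = [list(row) + [0] * (new_cols - cols) for row in matrix]   # pad on the right
    padded += [[0] * new_cols for _ in range(new_rows - rows)]          # pad at the bottom
    return padded


image = [[1] * 13 for _ in range(10)]        # assumed 10 x 13 array, not a multiple of 8 x 8
padded = pad_to_multiple(image, 8, 8)        # becomes 16 x 16
```

The original values are left unchanged in the upper-left corner, and the array can then be divided into whole blocks.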
Chalfin provides a system which compresses neural network information by applying JPEG image compression techniques, upon which the claimed invention’s zero padding can be seen as an improvement. However, Selvi demonstrates that the prior art already knew of using zero padding to compress irregularly sized image data. One of ordinary skill in the art could have trivially applied the techniques of Selvi to the neural network compression system of Chalfin. Further, one of ordinary skill in the art would have recognized that such an application of Selvi would have resulted in an improved system which could compress neural network data that doesn’t evenly divide into data blocks. As such, the application of Selvi and the claimed invention would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention in view of the disclosures of Chalfin and the teachings of Selvi.

Response to Arguments
Applicant’s Argument Regarding 112(a) Rejections of claims 1-21: Claims 1, 9, and 16 have been amended. 
Examiner’s Response: Applicant's amendments filed 19 April 2022 have been fully considered. The rejections that were not overcome by the amendments have been updated, and the remaining rejections are withdrawn. 

Applicant’s Argument Regarding 112(b) Rejections of claims 1-21: Claims 1, 9, and 16 have been amended. 
Examiner’s Response: Applicant's amendments filed 19 April 2022 have been fully considered. The previously identified issue has been resolved, and as such the prior rejections are withdrawn. 

Applicant’s Argument Regarding 101 Rejections of claims 1-21: 
The claim does not recite a mathematical relationship, formula, or calculation. 
As discussed in Example 38, limitations that may be based on mathematical concepts, where the mathematical concepts are not recited in the claims, do not recite a judicial exception. 
The potentially vast scope of the neural network, and the weight matrix applied thereto, squarely places the concept of reducing the weight matrix to be beyond that of a mental process. 
Examiner’s Response: Applicant's arguments filed 25 July 2022 have been fully considered but they are not persuasive.
The claims appear to the examiner to clearly recite a mathematical calculation. For example, the limitation “a transformation that applies a transform to the respective original sub-component to generate a distribution of spatial weights as a function of the frequency of respective spatial weights in the respective original sub-components”, as best interpreted as explained above, is clearly focused on the application of a specific mathematical calculation. Thus, the examiner finds the assertion that the claims do not recite a mathematical calculation unpersuasive. 
The present claims do not appear to be “based on mathematical concepts, where the mathematical concepts are not recited in the claims”, as the limitations do not merely involve or relate to a calculation, but describe a calculation. 
The examiner notes that a mental process is not ruled out merely because the scope of a claim encompasses steps that cannot practically be performed in the human mind; rather, the claim must require such steps. Thus, a “potentially vast scope” of calculation would not, by itself, exclude a claim from being found to recite a mental process. However, the issue is moot, as the claims have not been identified as setting forth a mental process. 

Additional Considerations
The prior art made of record and not relied upon that is considered pertinent to applicant’s disclosure can be found in the PTO-892 Notice of References Cited. 
	Bar-On et al. (US 2018/02983758 A1) is noted again for its particular relevance in discussing compressing a convolutional neural network by transforming it into the frequency domain and quantizing the frequency domain values. 
DataGenetics (Discrete Cosine Transformations) illustrates the understanding of one of ordinary skill in the art regarding the discrete cosine transform. This includes how the DCT produces a matrix of values in which the values corresponding to the lowest frequencies are in the upper left corner of the matrix and the values corresponding to the highest frequencies are in the lower right corner. DataGenetics further explains how quantization positionally scales the DCT values in a way that “biases the top left corner where we know the most ‘important’ coefficients reside.”


E. Roberts (Data Compression) illustrates the understanding of one of ordinary skill in the art regarding the JPEG compression algorithm, including how quantization works. For example, Roberts notes that “one can notice the smaller DCT coefficients of high-frequency elements divided by the larger quantum values will most often result in the high-frequency coefficients being rounded down to zero” and that “The high-frequency areas of the matrix have, for the most part, been reduced to zero, eliminating their effect on the decompressed image.”

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Bion A Shelden whose telephone number is (571)270-0515. The examiner can normally be reached M-F, 12pm-10pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Hajime S Rojas can be reached on (571)270-5491. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/Bion A Shelden/
Examiner, Art Unit 3681
2022-08-07