DETAILED ACTION
Status of Claims
This is a first office action on the merits in response to the application filed on 30 November 2017. 
Claims 1-21 are currently pending and have been examined. 

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statements (IDS(s)) submitted on 30 November 2017, 12 June 2019, and 5 November 2020 are in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statements are being considered by the examiner.

Claim Rejections - 35 USC § 112(a)
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claims 1-21 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.

Claim 1 recites the original limitation “applies a generalized weight distribution to the respective original sub-components to generate respective normalized sub-components.” One of ordinary skill in the art would not recognize applicant as in possession of the identified limitation at the time of filing. 
Note MPEP 2163, which states “An original claim may lack written description support when (1) the claim defines the invention in functional language specifying a desired result but the disclosure fails to sufficiently identify how the function is performed or the result is achieved.”
The identified limitation uses highly functional language to specify a desired result. What appears to be the most relevant portion of the original disclosure states: 
[0024] In accordance with the system 100, the memory 104 can store computer executable components executable by the processor 102. The receiving component 107 can receive input weight matrices and transport data to a respective destination. The segmentation component 108 can segment the initial weight matrices into the original sub-components. The sampling component 110 can apply a general weight distribution to the original sub-components. The transform component 112 can apply a transform (e.g., DCT) to the sub-block of spatial weights to generate the weight with pertinent data in a low frequency section. The cropping component 114 can clip off high frequency components so that a low frequency data set remains with concentrated information. The inverse transform component 116 can transform the compressed data set from the frequency domain back to the spatial domain. It is to be appreciated that the subject innovation is not limited to use of DCT transforms, and any suitable frequency transform (e.g., Fourier transform, LaPlace transform, Wavelet transform, Z-transform . . . ) can be employed.

The above disclosure repeats the functional language of the claim, but provides no explanation or details as to how the generalized weight distribution is used to generate sub-component data. Based on the lack of disclosure, one of ordinary skill in the art would not recognize applicant as possessing the claimed invention at the time of filing. As such, the claim is rejected for lack of written description support. Claims 9 and 16 are similarly rejected.

Claim 5 recites the original limitation “employs interpolation to generate the respective values for the respective normalized sub-components.” One of ordinary skill in the art would not recognize applicant as in possession of the identified limitation at the time of filing. 

The identified limitation uses highly functional language to specify a desired result. What appears to be the most relevant portion of the original disclosure states: 
[0032] Fig. 2 illustrates an embodiment of a process of enforcing spatial locality in frequency component / recurrent layers at initialization. An initial weight matrix 202 is broken up into smaller regions (sub blocks) 204 where parameters within that region have some degree of spatial correlation. The corner weight values of each sub-block can be sampled from a distribution of random numbers. In one embodiment, this distribution could be the same distribution as used for random initialization of the weight matrix were the technique not employed. Bilinear interpolation 206 can be used to fill up remaining values. At this point, the server contains unified copies of the weights, and it transforms respective sub-components into the frequency domain using DCT or another transform. An exemplar sub block 208 contains significant relevant data in the low frequency segment with the higher frequency segment containing data with little value or zeroes. Subsequently the 208 data block high frequency component region is clipped off leaving the low frequency portion only region (e.g., triangle) 210. This is the compressed frequency representation of the weights that the server sends out to each receiver. The reduced size of the compressed frequency domain representation of the weights facilitates improved efficiency and performance, by reducing size of data transmitted. At the receiver(s), the empty section of the region 210 is padded with zeroes and an inverse transform (e.g., inverse DCT transform) is performed to yield a data block of spatial weights 212 that are an approximate representation of the original data block 204. It is to be appreciated that the subject innovations are not limited to corners or certain shapes of regions of relevance. 
Rather, the innovation is intended to encompass any suitable technique for imposing spatial locality (e.g., weights in a particular region are similar to each other) on weight matrices to facilitate frequency compression and reduce size of data transmitted across a distributed neural network.

The above disclosure repeats the functional language of the claim, but provides no explanation or details as to how interpolation is employed to generate sub-component data. Based on the lack of disclosure, one of ordinary skill in the art would not recognize applicant as possessing the claimed invention at the time of filing. As such, the claim is rejected for lack of written description support. Claims 12 and 19 are similarly rejected. 
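For illustration only, the bilinear interpolation step named in quoted paragraph [0032] (corner values of a sub-block are fixed and the remaining values are filled in) can be sketched generically as follows. This is a minimal sketch of textbook bilinear interpolation and is not drawn from the application; the function name `bilinear_block` and its parameters are hypothetical.

```python
def bilinear_block(c00, c01, c10, c11, n):
    """Fill an n x n block from four corner values (hypothetical helper).

    c00 = top-left, c01 = top-right, c10 = bottom-left, c11 = bottom-right.
    """
    if n < 2:
        raise ValueError("block size must be at least 2")
    block = []
    for i in range(n):
        t = i / (n - 1)  # vertical position in [0, 1]
        row = []
        for j in range(n):
            s = j / (n - 1)  # horizontal position in [0, 1]
            top = (1 - s) * c00 + s * c01        # interpolate along the top edge
            bottom = (1 - s) * c10 + s * c11     # interpolate along the bottom edge
            row.append((1 - t) * top + t * bottom)  # blend top and bottom
        block.append(row)
    return block
```

In this sketch the four corner arguments would be supplied by the caller, for example as draws from whatever random distribution is used for weight initialization; that choice is an assumption of the example.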

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-21 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. Claim 9, which is representative of claims 1 and 16, recites in part segmenting the weight matrix into original sub-components, wherein respective original sub-components have spatial weights; applying a generalized weight distribution to the respective original sub-components to generate respective normalized sub-components; applying a transform to the respective normalized sub-components; and cropping high-frequency weights of the respective transformed normalized sub-components to generate a set of low-frequency normalized sub-components. These limitations describe a mathematical calculation. Note that the October 2019 Update states that “[a] mathematical calculation is a mathematical operation (such as multiplication) or an act of calculating using mathematical methods.” Because the claims describe a mathematical calculation the claims are determined to recite a “mathematical concept” for the purposes of the analysis set forth by the 2019 PEG. Thus the claims are determined to recite an abstract idea.
Under the 2019 PEG, the additional elements of the claims are considered for whether they integrate an abstract idea into a practical application. Claim 9 recites employing a processor and memory to execute computer executable components to perform the limitations of the abstract idea. Claim 1 recites a system comprising a memory and a processor which executes the limitations of the abstract idea. Claim 16 recites a computer program product comprising a computer readable storage medium having program instructions to cause a processor to perform the limitations of the abstract idea. These additional elements are all described at a high level of generality, and may be interpreted as generic computing devices used to implement the abstract idea. However, under the 2019 PEG, the use of a generic computing device to implement an abstract idea does not integrate that abstract idea into a practical application. Thus these additional elements do not integrate the abstract idea into a practical application. The claims further recite the additional element of receiving neural network data in the form of a weight matrix. This additional element amounts to necessary data gathering in conjunction with the abstract idea identified above, and as such is interpreted as insignificant extra-solution activity. Per MPEP 2106, adding insignificant extra-solution activity to a judicial exception is not enough to integrate a judicial exception into a practical application. Thus this additional element does not integrate the abstract idea into a practical application. There are no further additional elements. When considered as a combination, the
In Step 2B of the Mayo/Alice analysis, the additional elements of the claims are considered for whether they amount to significantly more than the abstract idea. As previously noted, the claims recite additional elements which may be interpreted as generic computing devices used to implement the abstract idea. However, implementing an abstract idea on a generic computer does not add significantly more, similar to how the recitation of the computer in the claim in Alice amounted to mere instructions to apply the abstract idea of intermediated settlement on a generic computer. As such, these elements do not provide an inventive concept and do not constitute significantly more. As previously noted, the claims recite an additional element which amounts to extra-solution activity. However, per MPEP 2106, the courts have found adding insignificant extra-solution activity such as mere data gathering to be insufficient to qualify as “significantly more.” As such, this additional element does not amount to significantly more. There are no further additional elements. As previously noted, when considered as a combination, the additional elements generally link the abstract idea to a computing environment. However, per MPEP 2106, generally linking the use of a judicial exception to a particular technological environment has been found by the courts as insufficient to amount to significantly more. Therefore, when considered individually and as an ordered combination, the additional elements of the independent claims do not amount to significantly more than the judicial exception. Thus the independent claims are not patent eligible.
Dependent claims 2-7, 10-14, and 17-20 only further narrow the identified abstract idea. However, the claims continue to recite an abstract idea. The previously identified additional elements fail to integrate the narrowed abstract idea into a practical application or amount to significantly more than the narrowed abstract idea. Dependent claims 8, 15, and 21 recite the additional element of transmitting data. This additional element may be interpreted as extra-solution activity as it generally describes necessary data outputting. Thus this additional element, individually and in combination with the prior additional transmission over a network as a conventional computing functionality. Thus this additional element, individually and in combination with the prior additional elements, does not amount to significantly more than the abstract idea. Thus, as the dependent claims remain directed to a judicial exception, and as the additional elements of the claims do not amount to significantly more, the dependent claims are not patent eligible.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.


Claims 1-4, 7-11, 13, 15-18, 20, and 21 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Chalfin et al (US 2018/0239992 A1).

Regarding Claims 1, 9, and 16: Chalfin discloses a system for compressing data during neural network training, comprising: a memory that stores computer executable components and neural network data (See at least [0089]); a processor that executes computer executable components stored in the memory (See at least [0089]), wherein the computer executable components comprise:
a receiving component that receives neural network data in the form of a weight matrix (In step 402, a set of weight values for the artificial neural network is represented in the form of an array of weight values. Then, in step 404, the GPU 106 uses an image compression scheme to compress the array of weight values to provide compressed weight data for the artificial neural network. See at least [0121]). 
a segmentation component that segments the weight matrix into original sub-components, wherein respective original sub-components have spatial weights (the array of weight values is 
a sampling component that applies a generalized weight distribution to the respective original sub-components to generate respective normalized sub-components (the compression scheme may comprise JPEG compression. See at least [0051]. Examiner’s note: One of ordinary skill in the art would understand JPEG compression as including level shifting that reads on the identified limitation. For example, Katz (“Baseline JPEG compression juggles image quality and size”) states “the DCT coder usually requires that the expected average value for all pixels is zero. Therefore, before the DCT is performed, a value of 128 may be subtracted from each pixel (normally ranging from 0 to 255) to shift it to a range of –127 to 127.” See at least Page 2). 
a transform component that applies a transform to the respective normalized sub-components (Then, a discrete cosine transform (DCT) 704 is performed to generate a set of coefficients for a block. See at least [0126]); and 
a cropping component that crops high-frequency weights of the respective transformed normalized sub-components to generate a set of low-frequency normalized sub-components to generate a compressed representation of the original sub-components (Then, quantisation (Q) 706 is performed to generate a set of quantised coefficients. See at least [0126]). 
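For illustration only, the level shift quoted from Katz, a block DCT, and one possible reading of "cropping" high-frequency coefficients can be sketched as follows. This is a minimal sketch using a textbook orthonormal DCT-II; it is not taken from Chalfin, Katz, or the application. The helper names are hypothetical, and `crop_high_frequencies` (which zeroes coefficients outside a low-frequency triangle) is an assumption of the example rather than a description of Chalfin's quantisation step.

```python
import math

def level_shift(block, offset=128):
    """Subtract a fixed offset from every value (e.g. 128 for 8-bit pixels)."""
    return [[v - offset for v in row] for row in block]

def dct_1d(x):
    """Orthonormal DCT-II of a sequence."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt((1 if k == 0 else 2) / n)
        total = sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                    for i in range(n))
        out.append(scale * total)
    return out

def dct_2d(block):
    """Apply the 1-D DCT to every row, then to every column."""
    rows = [dct_1d(list(r)) for r in block]
    cols = [dct_1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]  # transpose back to row-major

def crop_high_frequencies(coeffs, keep):
    """Zero every coefficient whose row index + column index is >= keep."""
    return [[c if u + v < keep else 0.0 for v, c in enumerate(row)]
            for u, row in enumerate(coeffs)]
```

A caller might chain these as `crop_high_frequencies(dct_2d(level_shift(block)), keep)`, where the value of `keep` is a free parameter of the sketch.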

Regarding Claims 2, 10, and 17: Chalfin discloses the above limitations. Additionally, Chalfin discloses an inverse transform component that applies an inverse transform to the set of low-frequency normalized sub-components to recover a modified version of the original sub-components (FIG. 8 shows a method 800 of decompressing weight values. In step 802, compressed weight data is retrieved from the memory 114. Then, in step 804, the GPU 106 uses an image decompression scheme to decompress weight values for the artificial neural network. The image decompression is the inverse of an image compression scheme as discussed above. See at least [0128]).

Regarding Claims 3, 11, and 18: Chalfin discloses the above limitations. Additionally, Chalfin discloses wherein the transform component applies a discrete cosine transform (Using the compression scheme to

Regarding Claim 4: Chalfin discloses the above limitations. Additionally, Chalfin discloses wherein the segmentation component samples corner values of the original sub-components (the compression scheme may comprise JPEG compression. See at least [0051]. Examiner’s note: One of ordinary skill in the art would understand JPEG compression as including level shifting that reads on the identified limitation. For example, Katz (“Baseline JPEG compression juggles image quality and size”) states “the DCT coder usually requires that the expected average value for all pixels is zero. Therefore, before the DCT is performed, a value of 128 may be subtracted from each pixel (normally ranging from 0 to 255) to shift it to a range of –127 to 127.” See at least Page 2).

Regarding Claims 7, 13, and 20: Chalfin discloses the above limitations. Additionally, Chalfin discloses wherein the inverse transform component applies an inverse discrete cosine transform function to transform the set of low-frequency normalized sub-components to a spatial domain (FIG. 8 shows a method 800 of decompressing weight values. In step 802, compressed weight data is retrieved from the memory 114. Then, in step 804, the GPU 106 uses an image decompression scheme to decompress weight values for the artificial neural network. The image decompression is the inverse of an image compression scheme as discussed above. See at least [0128]. Also: Using the compression scheme to compress the array of weight values may comprise applying a transformation (such as a discrete cosine transform (DCT)) to (a or each block of) the array of weight values to generate coefficients. See at least [0051]).
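For illustration only, the relationship between a forward DCT, retention of low-frequency coefficients, and an inverse DCT back to the spatial domain can be sketched as a round trip. This is a minimal sketch with a textbook orthonormal DCT pair; the function names and the `keep` parameter are hypothetical and are not taken from Chalfin or the application.

```python
import math

def _basis(n):
    """Orthonormal DCT-II basis: _basis(n)[k][i] for frequency k, sample i."""
    return [[math.sqrt((1 if k == 0 else 2) / n)
             * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
             for i in range(n)] for k in range(n)]

def dct_1d(x):
    b = _basis(len(x))
    return [sum(b[k][i] * x[i] for i in range(len(x))) for k in range(len(x))]

def idct_1d(c):
    # The basis is orthonormal, so the inverse is its transpose.
    b = _basis(len(c))
    return [sum(b[k][i] * c[k] for k in range(len(c))) for i in range(len(c))]

def apply_2d(fn, block):
    """Apply a 1-D transform to every row, then to every column."""
    rows = [fn(list(r)) for r in block]
    cols = [fn(list(col)) for col in zip(*rows)]
    return [list(r) for r in zip(*cols)]

def round_trip(block, keep):
    """Forward DCT, zero coefficients with u + v >= keep, inverse DCT."""
    coeffs = apply_2d(dct_1d, block)
    kept = [[c if u + v < keep else 0.0 for v, c in enumerate(row)]
            for u, row in enumerate(coeffs)]
    return apply_2d(idct_1d, kept)
```

When every coefficient is retained the round trip reproduces the input up to floating-point error; when only the lowest-frequency coefficient is retained, each output value equals the block mean, which shows why the recovered block is only an approximation of the original.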

Regarding Claims 8, 15, and 21: Chalfin discloses the above limitations. Additionally, Chalfin discloses a communication component that transmits the compressed representation of the original sub-components (Compressing the weight values using the image compression scheme can reduce the amount of bandwidth and storage needed when transferring and storing the weight values for use later use in the .

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 5, 6, 12, 14, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Chalfin et al (US 2018/0239992 A1) in view of G. Uma Vetri Selvi (DICOM Image compression using Bilinear Interpolation). 

Regarding Claim 5: Chalfin discloses the above limitations. Chalfin does not appear to disclose wherein the segmentation component employs interpolation to generate the respective values for the respective normalized sub-components. However, Selvi teaches wherein a segmentation component employs interpolation to generate the respective values for the respective normalized sub-components (“DICOM images are compressed using bilinear interpolation. This method presents a technique for classification of the image blocks on the basis of threshold value of variance. The image is divided into blocks. The blocks are classified as significant or insignificant depending on their variance. The corner pixels (bilinear
Chalfin provides a system which compresses neural network information by applying JPEG image compression techniques, upon which the claimed invention’s interpolation of data points can be seen as an improvement. However, Selvi demonstrates that the prior art already knew of using corner interpolation to compress image data. One of ordinary skill in the art could have trivially applied the techniques of Selvi to the neural network compression system of Chalfin. Further, one of ordinary skill in the art would have recognized that such an application of Selvi would have resulted in an improved system which would produce a superior compression of neural network data. As such, the application of Selvi and the claimed invention would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention in view of the disclosures of Chalfin and the teachings of Selvi.

Regarding Claim 6: Chalfin in view of Selvi teaches the above limitations. As previously noted in combination with Chalfin, Selvi teaches wherein the sampling component applies at least one of bilinear interpolation, exponential interpolation or spline interpolation (“DICOM images are compressed using bilinear interpolation. This method presents a technique for classification of the image blocks on the basis of threshold value of variance. The image is divided into blocks. The blocks are classified as significant or insignificant depending on their variance. The corner pixels (bilinear coefficients) of the blocks are stored and the remaining pixels are obtained by bilinear interpolation. The difference between original and the interpolated image is calculated and the two data are individually quantized and encoded.” See at least Page 1. Also: “The corner pixels of the blocks are stored. Bilinear interpolation is applied to those pixels to find the other pixels, the difference between the original and the interpolated image is calculated, from

Regarding Claims 12 and 19: Chalfin discloses the above limitations. Additionally, Chalfin discloses sampling corner values of the original sub-components (the compression scheme may comprise JPEG compression. See at least [0051]. Examiner’s note: One of ordinary skill in the art would understand JPEG compression as including level shifting that reads on the identified limitation. For example, Katz (“Baseline JPEG compression juggles image quality and size”) states “the DCT coder usually requires that the expected average value for all pixels is zero. Therefore, before the DCT is performed, a value of 128 may be subtracted from each pixel (normally ranging from 0 to 255) to shift it to a range of –127 to 127.” See at least Page 2). Chalfin does not appear to disclose employing at least one of bilinear interpolation, exponential interpolation or spline interpolation to generate the respective values for the respective normalized sub-components. However, Selvi teaches employing at least one of bilinear interpolation, exponential interpolation or spline interpolation to generate the respective values for the respective normalized sub-components (“DICOM images are compressed using bilinear interpolation. This method presents a technique for classification of the image blocks on the basis of threshold value of variance. The image is divided into blocks. The blocks are classified as significant or insignificant depending on their variance. The corner pixels (bilinear coefficients) of the blocks are stored and the remaining pixels are obtained by bilinear interpolation. The difference between original and the interpolated image is calculated and the two data are individually quantized and encoded.” See at least Page 1. Also: “The corner pixels of the blocks are stored.
Bilinear interpolation is applied to those pixels to find the other pixels, the difference between the original and the interpolated image is calculated, from the block classification data insignificant blocks are identified and the difference value corresponding to it are made zero. For Significant blocks the difference value is taken and quantised. The Bilinear coefficients Bc(i,j) are quantized.” See at least Page 3).
Chalfin provides a system which compresses neural network information by applying JPEG image compression techniques, upon which the claimed invention’s interpolation of data points can be

Regarding Claim 14: Chalfin discloses the above limitations. Chalfin does not appear to explicitly disclose padding zeros of the set. However, Selvi teaches padding zeros of the set (“If the size of the image is not a multiple of m and n round rows to ‘m’ and columns to ‘n’ by adding zeros at the bottom and right and then perform block division.” See at least Page 2). 
Chalfin provides a system which compresses neural network information by applying JPEG image compression techniques, upon which the claimed invention’s zero padding can be seen as an improvement. However, Selvi demonstrates that the prior art already knew of using zero padding to compress irregularly sized image data. One of ordinary skill in the art could have trivially applied the techniques of Selvi to the neural network compression system of Chalfin. Further, one of ordinary skill in the art would have recognized that such an application of Selvi would have resulted in an improved system which could compress neural network data that doesn’t evenly divide into data blocks. As such, the application of Selvi and the claimed invention would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention in view of the disclosures of Chalfin and the teachings of Selvi.
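For illustration only, the padding described in the quoted Selvi passage (rounding rows up to a multiple of m and columns up to a multiple of n by adding zeros at the bottom and right) can be sketched as follows. This is a minimal sketch; the function name `pad_to_block_multiple` is hypothetical and does not appear in Selvi, Chalfin, or the application.

```python
def pad_to_block_multiple(matrix, m, n):
    """Pad with zeros at the bottom and right so that the row count is a
    multiple of m and the column count is a multiple of n."""
    rows = len(matrix)
    cols = len(matrix[0]) if rows else 0
    padded_rows = -(-rows // m) * m  # ceiling division, then scale back up
    padded_cols = -(-cols // n) * n
    # Extend each existing row on the right, then append all-zero rows below.
    out = [list(r) + [0] * (padded_cols - cols) for r in matrix]
    out += [[0] * padded_cols for _ in range(padded_rows - rows)]
    return out
```

For example, a 3 x 3 matrix padded with m = n = 2 becomes a 4 x 4 matrix whose last row and last column are zeros, so it divides evenly into 2 x 2 blocks.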

Additional Considerations
The prior art made of record and not relied upon that is considered pertinent to applicant’s disclosure can be found in the PTO-892 Notice of References Cited. 
Wang et al. (CNNpack: Packing Convolutional Neural Networks in the Frequency Domain) recognizes that convolutional filters may be treated as image data. 
Jong Hwan Ko et al. (Adaptive Weight Compression for Memory-Efficient Neural Networks) discusses applying JPEG encoding to compress the weights of a neural network “by exploiting the spatial locality and smoothness of the weight matrix.”
Bar-On et al. (US 2018/02983758 A1) discusses compressing a convolutional neural network by transforming it into the frequency domain and quantizing the frequency domain values. 

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Bion A Shelden whose telephone number is (571)270-0515. The examiner can normally be reached M-F, 12pm-10pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Hajime S Rojas can be reached on (571)270-5491. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance 

/Bion A Shelden/
Examiner, Art Unit 3681
2022-01-12