Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments
1.  Applicant’s arguments, filed November 30th, 2021, with respect to the drawing objections, specification objections, claim objections, and 35 USC 112(b) rejections have been fully considered and are persuasive.  The above objections and rejections have been withdrawn.

2.  Applicant’s arguments, filed November 30th, 2021, with respect to the 35 USC 103 rejections have been fully considered and are persuasive in light of the claim amendments.  Therefore, the rejections have been withdrawn.  However, upon further consideration, new grounds of rejection are made in view of Cho (US 2020/0311539, cited in previous action).

Claim Objections
3.  Claims 1, 6-8, 11-12, and 17-31 are objected to because of the following informalities:
In claims 1, 12, and 19, the following changes should be made for clarity: replacing “from an upstream GPU of the GPU” with “from an upstream GPU of the GPUs”, replacing “combing, by the GPU” with “combining, by the GPU”, replacing “a first respective DP result,” with “a first respective DP result, and”, and replacing “a downstream GPU of the GPU” with “a downstream GPU of the GPUs”.
In claims 25, 27, and 29, the phrase “GPUs in the logic ring” should be replaced with “GPUs in the logical ring”.
The remaining claims are objected to for their dependence upon claims 1, 12, and 19.
Appropriate correction is required.


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

4.  Claims are rejected under 35 U.S.C. 103 as being unpatentable over Gibiansky (“Bringing HPC Techniques to Deep Learning”), Sridharan et al (US 2019/0205745, herein Sridharan), Duenner et al (US 2020/0184368, herein Duenner), and Cho et al (US 2020/0311539, herein Cho).

Regarding claim 12, Gibiansky teaches a data processing system, comprising:
a plurality of GPUs, wherein each of the GPUs is configured to perform artificial intelligence (AI) data processing (DP) operations (Page 1, summary), the operations including training an AI model based on a set of training data (Page 5 scatter-reduce & Page 1, summary) distributed across a plurality of GPUs that are arranged in a logical ring to train the AI model, wherein each of the plurality of GPUs includes a subset of the training data (Pages 2 & 4-5),
generating, from the subset of the set of training data on the GPU, a plurality of data blocks representing gradients for updating parameters of the AI model (Gibiansky conclusion, gradients are a derivative that must be generated for the training),
perform, by each of the plurality of GPUs concurrently, a plurality of DP iterations (Pages 2, 9-11, parallel operation), including for each of the DP iterations:
performing an operation on one of the plurality of data blocks on the GPU (Pages 9-11, predetermined operations),
receiving, by the GPU, a data block from an upstream GPU of the GPU in the logical ring (Pages 1-5, 9-13, each GPU receives a result and transmits data blocks between GPUs),

transmitting, by the GPU, the first respective result to a downstream GPU of the GPU in the logical ring via an inter-processor link (Pages 10-11, 13, transmitting data blocks to downstream GPUs).
Gibiansky does not explicitly teach that at least one central processing unit (CPU) is coupled to the GPUs and that operations are distributed from the CPU, the operations including receiving a request from a CPU for training the AI model and that the plurality of data blocks are distributed from the CPU, or wherein the GPUs are general purpose processing units, or wherein each of the plurality of GPUs includes a hardware compression module, wherein the GPUs perform a compression operation using the hardware compression module to generate a first compressed data block.

Sridharan teaches general purpose GPUs for processing ([0229], [0258]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine the teachings of Gibiansky and Sridharan to implement general purpose GPUs for processing in the AI model training system.  This would merely be the simple substitution of one known element for another to obtain predictable results, and thus would have been obvious to one of ordinary skill in the art.

Gibiansky and Sridharan fail to teach that at least one central processing unit (CPU) is coupled to the GPUs and that operations are distributed from the CPU, the operations including receiving a request from a CPU for training the AI model and that the plurality of data blocks are distributed from the CPU, or wherein each of the plurality of GPUs includes a hardware compression module, wherein the GPUs perform a compression operation using the hardware compression module to generate a first compressed data block.

It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine the teachings of Gibiansky and Sridharan with those of Duenner to utilize a host CPU connected to the GPUs.  The GPUs would receive the request for training the AI model and the training data would be distributed from the CPU.  One of ordinary skill in the art would be motivated to do so to provide a host for managing and scheduling the workloads for the GPUs.

Gibiansky, Sridharan, and Duenner fail to teach wherein each of the plurality of GPUs includes a hardware compression module, wherein the GPUs perform a compression operation using the hardware compression module to generate a first compressed data block.
Cho teaches a system comprising a hardware compression module (Fig 5, [0067-0071], compression component 570) for compressing gradient data and performing an addition operation using the compressed data (Fig 9, steps 912-914, [0092], Fig 7).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine the teachings of Gibiansky, Sridharan, and Duenner with those of Cho to implement each GPU compressing gradient data and utilizing the compressed data.  Doing so would reduce the transmission and communication bandwidth between the GPUs of the logical ring.  This would merely be a combination of known prior art elements to achieve predictable results, and thus would have been obvious to one of ordinary skill in the art.
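For reference only, the ring data flow discussed above (each GPU operating on one data block per iteration, receiving from its upstream neighbor, combining, and transmitting downstream) can be illustrated with a minimal single-process sketch. This is an assumption-laden illustration, not code from Gibiansky, Cho, or the application: the function name ring_allreduce, the use of Python lists to stand in for GPUs, and the omission of compression are all hypothetical simplifications.

```python
def ring_allreduce(grads):
    """Simulate a ring all-reduce over n workers.

    grads[i][c] is data block c (a list of floats) held by worker i;
    each worker must hold exactly n blocks, one per worker in the ring.
    Returns a new structure in which every worker holds the elementwise sum.
    """
    n = len(grads)
    blocks = [[list(b) for b in worker] for worker in grads]

    # Scatter-reduce: n - 1 iterations. In iteration t, worker i sends
    # block (i - t) mod n downstream, and the receiver adds it to its own copy.
    for t in range(n - 1):
        # Snapshot all payloads first so the sends behave as if concurrent.
        sends = [(i, (i - t) % n, list(blocks[i][(i - t) % n])) for i in range(n)]
        for i, c, payload in sends:
            dst = (i + 1) % n  # downstream neighbor in the logical ring
            blocks[dst][c] = [a + b for a, b in zip(blocks[dst][c], payload)]

    # All-gather: n - 1 iterations. Worker i now owns the fully reduced
    # block (i + 1) mod n and passes completed blocks around the ring.
    for t in range(n - 1):
        sends = [(i, (i + 1 - t) % n, list(blocks[i][(i + 1 - t) % n])) for i in range(n)]
        for i, c, payload in sends:
            blocks[(i + 1) % n][c] = payload

    return blocks


if __name__ == "__main__":
    # Three simulated workers, three blocks each, two values per block.
    grads = [
        [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
        [[10.0, 20.0], [30.0, 40.0], [50.0, 60.0]],
        [[100.0, 200.0], [300.0, 400.0], [500.0, 600.0]],
    ]
    for worker_blocks in ring_allreduce(grads):
        print(worker_blocks)
```

In the combination as proposed, each payload would additionally be compressed before transmission; that step is left out here to keep the ring pattern itself visible.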

Regarding claim 17, the combination of Gibiansky, Sridharan, Duenner, and Cho teaches the system of claim 12, wherein upon a completion of the plurality of DP iterations, the GPU includes a compressed copy of at least one data block of the plurality of data blocks from each of the plurality of GPUs (Gibiansky Pages 10-13, combining data blocks from GPUs of the ring & Cho Figs 5, 7, 9, [0067-0071], compressing data).


Regarding claim 29, the combination of Gibiansky, Sridharan, Duenner, and Cho teaches the system of claim 12, wherein each subset of the set of training data is divided into a plurality of data chunks, wherein a number of the plurality of data chunks is equal to a number of the plurality of GPUs in the logic ring, and wherein each of the plurality of data blocks representing the gradients is generated from one of the plurality of data chunks (Gibiansky pages 5-6, one data chunk per GPU on the ring).

Regarding claim 31, the combination of Gibiansky, Sridharan, Duenner, and Cho teaches the system of claim 12, wherein the compressing of the first data block includes using a zero-value compression algorithm, which compresses the first data blocks into a data structure having a bitmask section and a compressed data section, wherein the bitmask includes bits indicating positions in the data blocks having non-zero values (Cho Fig 6, [0090], bits indicate positions with non-zero values).
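For illustration only, the zero-value compression format recited in claim 31 (a bitmask section marking non-zero positions, followed by a compressed data section holding only the non-zero values) can be sketched as follows. The function names zvc_compress and zvc_decompress and the list-based layout are hypothetical; neither Cho nor the application is asserted to use this exact representation.

```python
def zvc_compress(block):
    """Split a data block into (bitmask, values).

    bitmask[k] is 1 where block[k] is non-zero and 0 otherwise;
    values holds only the non-zero entries, in their original order.
    """
    bitmask = [1 if v != 0 else 0 for v in block]
    values = [v for v in block if v != 0]
    return bitmask, values


def zvc_decompress(bitmask, values):
    """Rebuild the original block from its bitmask and non-zero values."""
    remaining = iter(values)
    return [next(remaining) if bit else 0 for bit in bitmask]


if __name__ == "__main__":
    block = [0, 1.5, 0, 0, -2.0]
    mask, vals = zvc_compress(block)
    print(mask, vals)
    print(zvc_decompress(mask, vals))
```

The saving grows with sparsity: a block that is mostly zeros is reduced to one bit per position plus the few non-zero values, which is why such a scheme lowers the volume of gradient data sent between GPUs.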

Claims 1, 6, 7, 11, and 27 refer to a method embodiment of the system embodiment of claims 12, 17, 18, 31, and 29, respectively.  Therefore, the above rejections for claims 12, 17, 18, 31, and 29, are applicable to claims 1, 6, 7, 11, and 27, respectively.

Regarding claim 28, the combination of Gibiansky, Sridharan, Duenner, and Cho teaches the method of claim 1, wherein each of the plurality of GPUs has a complete copy of the AI model (Gibiansky p2, “each GPU has a complete copy of the entire neural network model”).


5.  Claims 8, 23, and 30 are rejected under 35 U.S.C. 103 as being unpatentable over Gibiansky, Sridharan, Duenner, and Cho in view of Koenen et al (“CCIX: a new coherent multichip interconnect for accelerated use cases”).

Regarding claim 8, the combination of Gibiansky, Sridharan, Duenner, and Cho teaches the method of claim 1, but fails to teach that the inter-processor link comprises a Cache Coherent Interconnect for Accelerators (CCIX) connection.
Koenen teaches CCIX connection between processors (page 8).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine the teachings of Gibiansky, Sridharan, Duenner, and Cho with those of Koenen to implement the inter-processor link as a CCIX connection.  This would merely be the simple substitution of one known element for another (interprocessor link with CCIX interconnect) to obtain predictable results, and thus would have been obvious to one of ordinary skill in the art.

Claims 23 and 30 refer to a computer readable medium and system embodiment, respectively, of the method embodiment of claim 8.  Therefore, the above rejection for claim 8 is applicable to claims 23 and 30.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Yosinski (US 2019/0130272) discloses training an AI model utilizing a neural network compression module.
Harik (US 2018/0026649) discloses a data compression system for training a machine learning model.
Jin (US 2016/0321777) discloses a training model using GPUs to transmit data between GPUs up and downstream of each other.
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL J METZGER whose telephone number is (571)272-3105. The examiner can normally be reached Monday-Friday 7:30-4.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jyoti Mehta, can be reached at 571-270-3995. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/MICHAEL J METZGER/             Primary Examiner, Art Unit 2182