Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
Status of the Application
This Office Action is in response to Applicant’s Amendment filed on 12/13/2019 and subsequent preliminary amendment filed 12/13/2019.
Claims 1-20 are pending for this examination.
Claims 4, 9-10, 14, and 19-20 were amended.

Amendment to Specifications
The amendments to the specification received on 12/13/2019 are acceptable.

Information Disclosure Statement
The information disclosure statements (IDSes) submitted on 9/01/2020; 10/12/2021; and 6/14/2022 are in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statements are being considered by the examiner.

Claim Rejections - 35 U.S.C. § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claims 1-4, 9, 11-14, and 19 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Zhao et al. (US 2019/0312772), herein referred to as Zhao ‘772.
Referring to claim 1, Zhao ‘772 teaches a data processing system (see Fig. 1, system 100), comprising: 
a central processing unit (CPU) (see Fig. 1, server cluster 160 with GPU server nodes 160-n, which comprises elements including but not limited to CPUs, see Paragraph 0015; see Paragraph 0025, wherein topology information for any given server includes the types and numbers of hardware processor resources including CPUs, GPUs, and other accelerator devices, and intra-node connection topologies including CPU-GPU and GPU-GPU communication; also see Fig. 3A-3E, CPU 303 and 304); and 
a plurality of accelerator cards coupled to the CPU over a bus (see Fig. 1, plurality of GPU server nodes 160-n; see Figs. 3A-3E, wherein the system includes a plurality of GPUs coupled to the CPUs either through PCIe switches 305 or through a direct connection; see Paragraphs 0052-0056), each of the accelerator cards having a plurality of data processing (DP) accelerators to receive DP tasks from the CPU and to perform the received DP tasks (see Fig. 1, request queue 144 that stores the requests before issuing them for execution; see Figs. 2A-2B, wherein each GPU server node includes a plurality of GPUs; see Paragraph 0057, wherein requested jobs needing GPU resources can have the resources provisioned from multiple GPU server nodes), wherein at least two of the accelerator cards are coupled to each other via an inter-card connection (see Fig. 2B, wherein server node 0 and server node 1 have their GPUs connected to each other through inter-node connection links 230 and 240), wherein at least two of the DP accelerators are coupled to each other via an inter-chip connection (see Figs. 2A-2B, wherein the GPUs are connected to each other through intra-node connection links 202 and 212), 
wherein each of the inter-card connection and the inter-chip connection is capable of being dynamically activated or deactivated, such that in response to a request received from the CPU, any one of the accelerator cards or any one of the DP accelerators within any one of the accelerator cards can be enabled or disabled to process any one of the DP tasks received from the CPU (see Fig. 1, request queue 144; see Paragraphs 0053-0057, wherein requested jobs may specify specific types/models of GPU devices to service the request, i.e. dynamically activating / deactivating specific GPUs, and may need GPU resources to be provisioned from multiple GPU server nodes, i.e. requesting multiple DP accelerators to service the job request).
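For illustration only, the limitation of inter-chip and inter-card connections that are "dynamically activated or deactivated" may be sketched as follows. This is a minimal model under stated assumptions: the names Link, Topology, set_active, and reachable, and the "card0/dp0" style identifiers, are hypothetical and are not taken from Zhao ‘772 or from the instant application.

```python
from dataclasses import dataclass, field


@dataclass
class Link:
    """A connection between two DP accelerators (hypothetical model)."""
    a: str
    b: str
    kind: str            # "inter-chip" or "inter-card"
    active: bool = False  # links start deactivated in this sketch


@dataclass
class Topology:
    links: list = field(default_factory=list)

    def connect(self, a, b, kind):
        """Register a physical link; it stays inactive until requested."""
        link = Link(a, b, kind)
        self.links.append(link)
        return link

    def set_active(self, a, b, active):
        """Activate or deactivate the link between a and b.

        Returns True if such a link exists, False otherwise.
        """
        for link in self.links:
            if {link.a, link.b} == {a, b}:
                link.active = active
                return True
        return False

    def reachable(self, start):
        """Return the accelerators reachable from start over active links only."""
        seen = {start}
        stack = [start]
        while stack:
            node = stack.pop()
            for link in self.links:
                if not link.active or node not in (link.a, link.b):
                    continue
                other = link.b if node == link.a else link.a
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
        return seen


# Two accelerators on card 0 and one on card 1 (assumed layout).
topo = Topology()
topo.connect("card0/dp0", "card0/dp1", "inter-chip")
topo.connect("card0/dp1", "card1/dp1", "inter-card")

# A host request enables only the inter-chip link.
topo.set_active("card0/dp0", "card0/dp1", True)
```

In this sketch, an accelerator participates in a task only if it is reachable over active links, so toggling a link in response to a host request effectively enables or disables the accelerators behind it.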
As to claim 2, Zhao ‘772 teaches the system of claim 1, wherein each of the DP accelerators of each of the accelerator cards includes a plurality of inter-chip interfaces, which can be utilized to interconnect with another one of the DP accelerators of the accelerator card via a respective inter-chip connection (see Figs. 2A-2B, wherein the GPUs are connected to each other through intra-node connection links 202 and 212, wherein the interfaces are implied to exist in each GPU to allow for the connection links 202 and 212 to exist between GPUs).
As to claim 3, Zhao ‘772 teaches the system of claim 1, wherein each of the DP accelerators of each of the accelerator cards includes a plurality of inter-chip interfaces, which can be utilized to interconnect with another one DP accelerator of another accelerator card via a respective inter-card connection (see Fig. 2B, wherein server node 0 and server node 1 have their GPUs connected to each other through inter-node connection links 230 and 240, wherein the interfaces are implied to exist in each GPU to allow for the connection links to exist).
As to claim 4, Zhao ‘772 teaches the system of claim 1, wherein the DP accelerators in each of the accelerator cards are arranged in a plurality of rows and columns coupled to each other via one or more inter-chip connections (see Fig. 2B, wherein server node 0 and server node 1 have their GPUs connected to each other through inter-node connection links 230 and 240, wherein the interfaces are implied to exist in each GPU to allow for the connection links to exist; Examiner points out that the mere arrangement of devices into rows and columns is not, by itself, patentably distinct, as it is only an arrangement of parts with nothing unique about it, i.e. the ring arrangement of GPUs as seen in Figs. 2A-2B and the CPU to GPU arrangements of Figs. 3A-3E could be arranged into rows and columns if desired, as the physical arrangement does not alter the functionality).
As to claim 9, Zhao ‘772 teaches the system of claim 1, wherein each of the DP accelerators comprises an artificial intelligence (AI) accelerator chip (see Paragraph 0081, wherein the control server node accesses resource usage database 148 to evaluate GPU devices and determines an optimal communication order of selected GPU devices, i.e. implementing distributed deep learning (DL) training jobs as part of a distributed DL training process; see Paragraph 0019, wherein the GPU processing services are implemented for deep learning applications, to which Examiner points out that deep learning and machine learning are both terms associated with AI, thereby making the GPU server nodes used for AI training / tasks).

Referring to claim 11, Zhao ‘772 teaches an accelerator card (see Fig. 1, GPU server nodes 160-n), comprising: 
a host interface to be coupled to a central processing unit (CPU) over a bus (see Fig. 1, server cluster 160 with GPU server nodes 160-n, which comprises elements including but not limited to CPUs, see Paragraph 0015; see Paragraph 0025, wherein topology information for any given server includes the types and numbers of hardware processor resources including CPUs, GPUs, and other accelerator devices, and intra-node connection topologies including CPU-GPU and GPU-GPU communication; also see Fig. 3A-3E, CPU 303 and 304, wherein Examiner points out that the CPUs would inherently have an interface for connecting to other devices, such as other CPUs, GPUs, etc., which is not specifically shown in Figs. 3A-3E but is implied to exist as the connection links exist between the CPU and other devices); and 
a plurality of data processing (DP) accelerators to receive DP tasks from the CPU and to perform the received DP tasks (see Fig. 1, request queue 144 that stores the requests before issuing them for execution; see Figs. 2A-2B, wherein each GPU server node includes a plurality of GPUs; see Paragraph 0057, wherein requested jobs needing GPU resources can have the resources provisioned from multiple GPU server nodes), wherein at least two of the DP accelerators are coupled to each other via an inter-chip connection (see Figs. 2A-2B, wherein the GPUs are connected to each other through intra-node connection links 202 and 212), 
wherein each inter-chip connection is capable of being dynamically activated or deactivated, such that in response to a request received from the CPU via the host interface, any one of the DP accelerators can be enabled or disabled to process any one of the DP tasks received from the CPU (see Fig. 1, request queue 144; see Paragraphs 0053-0057, wherein requested jobs may specify specific types/models of GPU devices to service the request, i.e. dynamically activating / deactivating specific GPUs, and may need GPU resources to be provisioned from multiple GPU server nodes, i.e. requesting multiple DP accelerators to service the job request).
As to claim 12, Zhao ‘772 teaches the accelerator card of claim 11, wherein each of the DP accelerators includes a plurality of inter-chip interfaces, which can be utilized to interconnect with another one of the DP accelerators via a respective inter-chip connection (see Figs. 2A-2B, wherein the GPUs are connected to each other through intra-node connection links 202 and 212, wherein the interfaces are implied to exist in each GPU to allow for the connection links 202 and 212 to exist between GPUs).
As to claim 13, Zhao ‘772 teaches the accelerator card of claim 11, wherein each of the DP accelerators includes a plurality of inter-chip interfaces, which can be utilized to interconnect with another one DP accelerator of another accelerator card via a respective inter-card connection (see Fig. 2B, wherein server node 0 and server node 1 have their GPUs connected to each other through inter-node connection links 230 and 240, wherein the interfaces are implied to exist in each GPU to allow for the connection links to exist).
As to claim 14, Zhao ‘772 teaches the accelerator card of claim 11, wherein the DP accelerators are arranged in a plurality of rows and columns coupled to each other via one or more inter-chip connections (see Fig. 2B, wherein server node 0 and server node 1 have their GPUs connected to each other through inter-node connection links 230 and 240, wherein the interfaces are implied to exist in each GPU to allow for the connection links to exist; Examiner points out that the mere arrangement of devices into rows and columns is not, by itself, patentably distinct, as it is only an arrangement of parts with nothing unique about it, i.e. the ring arrangement of GPUs as seen in Figs. 2A-2B and the CPU to GPU arrangements of Figs. 3A-3E could be arranged into rows and columns if desired, as the physical arrangement does not alter the functionality).
As to claim 19, Zhao ‘772 teaches the accelerator card of claim 11, wherein each of the DP accelerators comprises an artificial intelligence (AI) accelerator chip (see Paragraph 0081, wherein the control server node accesses resource usage database 148 to evaluate GPU devices and determines an optimal communication order of selected GPU devices, i.e. implementing distributed deep learning (DL) training jobs as part of a distributed DL training process; see Paragraph 0019, wherein the GPU processing services are implemented for deep learning applications, to which Examiner points out that deep learning and machine learning are both terms associated with AI, thereby making the GPU server nodes used for AI training / tasks).

Claim Rejections - 35 U.S.C. § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 10 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Zhao ‘772 in view of Marolia et al. (US 2019/0297015), herein referred to as Marolia ‘015.
As to claim 10, Zhao ‘772 teaches the system of claim 1, wherein the bus comprises a peripheral component interconnect express (PCIe) link or an Ethernet connection (see Paragraphs 0025-0032, wherein the connections can be of PCIe protocol and switches can be PCIe switches).
However, Zhao ‘772 does not teach that the inter-chip connection or the inter-card connection comprises a cache coherent interconnect for accelerators (CCIX) link.
Marolia ‘015 teaches a communication system for heterogeneous computing elements (see Abstract), wherein an accelerator fabric can be used to connect GPUs or accelerators together including the usage of PCIe switching and CCIX (see Paragraph 0017).
Zhao ‘772 and Marolia ‘015 are analogous prior art, as both pertain to the same field of endeavor, namely networked computer systems with accelerators that communicate between connected devices.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Zhao ‘772 as set forth above to use the CCIX protocol for connecting accelerators / GPUs together, as taught by Marolia ‘015. A person of ordinary skill in the art would have recognized that CCIX is a communication protocol, sometimes associated with PCIe, that implements cache coherency between two connected devices. Such a person would have been motivated to utilize CCIX for connecting accelerators and GPUs together because implementing cache coherence means that the memories of the connected devices hold the same data in the same areas, thereby allowing for easy checks for errors that may occur when data is replicated or moved.
As to claim 20, Zhao ‘772 teaches the accelerator card of claim 11, wherein the bus comprises a peripheral component interconnect express (PCIe) link or an Ethernet connection (see Paragraphs 0025-0032, wherein the connections can be of PCIe protocol and switches can be PCIe switches).
However, Zhao ‘772 does not teach that the inter-chip connection or the inter-card connection comprises a cache coherent interconnect for accelerators (CCIX) link.
Marolia ‘015 teaches a communication system for heterogeneous computing elements (see Abstract), wherein an accelerator fabric can be used to connect GPUs or accelerators together including the usage of PCIe switching and CCIX (see Paragraph 0017).
Zhao ‘772 and Marolia ‘015 are analogous prior art, as both pertain to the same field of endeavor, namely networked computer systems with accelerators that communicate between connected devices.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Zhao ‘772 as set forth above to use the CCIX protocol for connecting accelerators / GPUs together, as taught by Marolia ‘015. A person of ordinary skill in the art would have recognized that CCIX is a communication protocol, sometimes associated with PCIe, that implements cache coherency between two connected devices. Such a person would have been motivated to utilize CCIX for connecting accelerators and GPUs together because implementing cache coherence means that the memories of the connected devices hold the same data in the same areas, thereby allowing for easy checks for errors that may occur when data is replicated or moved.

Allowable Subject Matter
Claims 5-8 and 15-18 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
As to claims 5 and 15, Examiner finds that the prior art teaches accelerator rings / GPU rings, but does not specifically teach that each row of the DP accelerators of a first of the accelerator cards are coupled in series via respective horizontal inter-chip connections, forming a horizontal accelerator ring.
As to claims 6 and 16, Examiner finds that the prior art teaches accelerator rings / GPU rings, but does not specifically teach that each column of the first accelerator card is coupled to a corresponding column of a second of the accelerator cards via one or more respective inter-card connections, forming a vertical accelerator ring.
As to claims 7 and 17, Examiner finds that the prior art teaches accelerator rings / GPU rings, but does not specifically teach that each column of the DP accelerators of a first accelerator card are coupled in series via respective vertical inter-chip connections, forming a vertical accelerator ring.
As to claims 8 and 18, Examiner finds that the prior art teaches accelerator rings / GPU rings, but does not specifically teach that each row of the first accelerator card is coupled to a corresponding row of a second of the accelerator cards via one or more respective inter-card connections, forming a horizontal accelerator ring.
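For illustration only, the row and column ring arrangements recited in claims 5-8 and 15-18 may be sketched as follows. The helper names ring, row_nodes, and column_nodes, the (card, row, column) tuple identifiers, and the card dimensions are hypothetical assumptions chosen for this sketch and are not taken from the instant application or from the prior art of record.

```python
def ring(nodes):
    """Couple nodes in series and close the loop.

    Returns a list of (source, destination) edges; fewer than two
    nodes cannot form a ring, so an empty list is returned.
    """
    n = len(nodes)
    if n < 2:
        return []
    return [(nodes[i], nodes[(i + 1) % n]) for i in range(n)]


def row_nodes(card, row, cols):
    """Identifiers (card, row, col) for every accelerator in one row."""
    return [(card, row, c) for c in range(cols)]


def column_nodes(card, col, rows):
    """Identifiers (card, row, col) for every accelerator in one column."""
    return [(card, r, col) for r in range(rows)]


# Horizontal ring within one card (cf. claims 5 and 15):
# row 0 of card 0, assumed to have four columns.
h = ring(row_nodes(0, 0, 4))

# Vertical ring spanning two cards (cf. claims 6 and 16):
# column 0 of card 0 joined to column 0 of card 1, two rows each.
# The second column is reversed so the path goes down one card and
# back up the other before closing.
v = ring(column_nodes(0, 0, 2) + list(reversed(column_nodes(1, 0, 2))))

# Edges whose endpoints sit on different cards are inter-card connections.
inter_card_edges = [(a, b) for a, b in v if a[0] != b[0]]
```

In this sketch the horizontal ring uses only inter-chip edges within a single card, whereas the two-card vertical ring necessarily contains edges that cross from one card to the other.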

Relevant Prior Art
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Singh et al. (US 2020/0081850) teaches a system with multiple hardware accelerators connected to a host, wherein the hardware accelerators connect to each other using PCIe in a ring topology and the system implements a unified address space.

Contact Information
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL SUN whose telephone number is (571)270-1724.  The examiner can normally be reached on Monday-Friday 8am-4pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Aimee Li, can be reached at 571-272-4169.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/MICHAEL SUN/Primary Examiner, Art Unit 2183