Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Election/Restrictions
Applicant’s election without traverse of Species I in the reply filed on 10/18/2021 is acknowledged.  Because applicant did not distinctly and specifically point out the supposed errors in the restriction requirement, the election has been treated as an election without traverse.  Claims 1-10 and 16-20 have been identified by the Applicant as corresponding to the elected species while claims 11-15 are withdrawn from further consideration at this time for being directed towards non-elected Species II.

Specification
The disclosure is objected to because of the following informalities: 
The title of the invention is not descriptive. A new title is required that is clearly indicative of the invention to which the claims are directed.

The use of the terms NVlink, OpenCL, 3D XPoint, Nano-Ram, gRPC, zeroMQ, TensorFlow, Infiniband, Vulkan, and OpenGL, each of which is a trade name or a mark used in commerce, has been noted in this application. Each term should be accompanied by the generic terminology; furthermore, each term should be capitalized wherever it appears or, where appropriate, include a proper symbol indicating use in commerce such as ™, SM, or ® following the term.

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
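(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 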
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.  Such claim limitation(s) is/are: 
“a general-purpose processor to” in claim 1.
“a network interface to” in claim 1.
“a graphics processor to” in claim 1.
“The machine learning framework workflow to” in claim 1.

Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claim 9 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.

Regarding claim 9: claim 9 is indefinite for its use of the registered trademark NVLink as a limitation.  In the present case, the trademark/trade name is used to identify/describe conformance with a continuously changing communications protocol set forth by the NVLink standard and, accordingly, the identification/description is indefinite.  For examination purposes, the claim was construed to refer to any of the various NVLink communication variants.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1 and 16 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Zou (US 2016/0321776 A1).

 Regarding claim 1, Zou teaches A system to configure distributed training of a neural network, the system comprising: ([¶0066] "FIG. 1 The memory 102 may be configured to store a software program and module, such as a program" [¶0038] "FIG. 8 is a schematic diagram of gradient and parameter updating during training of a convolutional neural network (CNN) model;" )
 memory to store a library to facilitate transmission of data during distributed training of the neural network, the data associated with trainable parameters of the neural network; ([¶0083] "when the training begins, each worker group replicates the data on the main memory to a video memory of a corresponding GPU (e.g., through cudaMemcpyHostToDevice calling in NVIDIA's CUDA)" ) 
a network interface to transmit and receive gradient data associated with the trainable parameters; ([¶0065-0067] "FIG. 2 the server 100 includes a memory 102, one or more processors 104, a memory controller 106, a peripheral interface 108 and one or more GPUs 110. It may be understood that FIG. 2 merely shows a schematic structure, but does not limit the structure of the server 100." Gradient data is described in [¶0061] "computing an updated gradient on each hidden layer through an updated error value, and finally feeding back the updated gradient to the input." Updating the gradient and feeding it back to the input is interpreted as synonymous with receiving and sending.) 
a general-purpose processor to execute instructions provided by the library ([¶0066] "FIG. 1 and the processors 104 execute different functional applications and perform data processing by running the software program and module stored in the memory 102"), the instructions to cause the general-purpose processor to configure the network interface to transmit and receive the gradient data associated with the trainable parameters during a workflow of a machine learning framework; and ([¶0038] "FIGS. 9-11 are schematic diagrams of parameter exchange in a data processing method according to one embodiment of the present invention"  [¶0067]  "FIG. 1 the memory 102 may further include memories remotely disposed relative to the processor 106, and these remote memories may be connected to an electronic terminal 100 through a network" Gradient data is described in [¶0061] (explanation of DNN) "computing an updated gradient on each hidden layer through an updated error value, and finally feeding back the updated gradient to the input." Updating the gradient and feeding it back to the input is interpreted as synonymous with receiving and sending.) 
a graphics processor to perform compute operations associated with machine learning framework workflow ([¶0063] "multi-GPU technology makes effective use of characteristics of parallelism, which can speed up the training process of the DNN.") to generate the gradient data associated with the trainable parameters ([¶0061] "computing an updated gradient on each hidden layer through an updated error value, and finally feeding back the updated gradient to the input"), wherein, based on the machine learning framework workflow, the library is to interleave the compute operations on the graphics processor with transmission and receipt of gradient data via the network interface. ([¶0089] "It should be noted that, the parameter server herein may be a server configured to update parameters connected with the server 100 through a network, and may also be the server 100 per se, that is to say, the server 100 has a synchronization module configured to synchronize parameters between different GPUs." See also FIG. 9. Interleaving the compute operations on the graphics processor is interpreted as synonymous with the characteristics of parallelism. Updating the gradient and feeding it back to the input is interpreted as synonymous with receiving and sending.)

Regarding claim 16, claim 16 effectively mirrors claim 1 and is therefore rejected under a similar interpretation.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 2-6 and 17-20 are rejected under 35 U.S.C. 103 as being unpatentable over Zou in view of the Message Passing Interface Forum (“MPI: A Message-Passing Interface Standard Version 3.1”, 2015).

Regarding claim 2, Zou teaches the system as in claim 1.  However, Zou does not explicitly teach wherein a compute operation is configured to overlap with a communication operation to send or receive gradient data via the network interface.

The Message Passing Interface Forum, which teaches in the related art of parallelizing compute operations, teaches wherein a compute operation is configured to overlap with a communication operation to send or receive gradient data via the network interface. ([Chapter 5 section 12] "performance of many applications can be improved by overlapping communication and computation, and many systems enable this" See FIG. 9 in Zou for sending and receiving gradient data.) 

Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to overlap communication and computation in the parallelized neural network system of Zou. The combination would have been obvious because a person of ordinary skill in the art would be able to determine from the Message Passing Interface Forum that overlapping communication and computation can improve application performance ([Chapter 5 section 12] "performance of many applications can be improved by overlapping communication and computation, and many systems enable this").
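
For illustration only, the overlap of communication and computation discussed for claim 2 can be sketched as follows. This is a minimal Python sketch under stated assumptions and is drawn from neither Zou nor the MPI standard: the names compute_gradient, send_gradient, and backward_with_overlap are hypothetical, a single worker thread stands in for a nonblocking network transfer, and a short sleep models transfer latency.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def compute_gradient(layer):
    # Hypothetical stand-in for the backward-pass compute operation of one layer.
    return [float(layer)] * 4


def send_gradient(layer, grad, log):
    # Hypothetical stand-in for a network transfer; the sleep models latency.
    time.sleep(0.01)
    log.append(layer)
    return sum(grad)


def backward_with_overlap(num_layers):
    sent = []
    pending = []
    # One worker thread plays the role of the communication engine.
    with ThreadPoolExecutor(max_workers=1) as comm:
        # Backward propagation visits layers from last to first.
        for layer in reversed(range(num_layers)):
            grad = compute_gradient(layer)
            # submit() returns immediately, so the next layer's compute
            # proceeds while this layer's transfer is still in flight.
            pending.append(comm.submit(send_gradient, layer, grad, sent))
        # Wait for every outstanding transfer before returning.
        totals = [f.result() for f in pending]
    return sent, totals
```

In this sketch the transfer for a given layer is started as soon as that layer's gradient exists, rather than after the whole backward pass has finished, which is the sense of "overlap" assumed here.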

Regarding claim 3, the combination of Zou and the Message Passing Interface Forum teaches The system as in claim 2, the machine learning framework workflow to cause the graphics processor to perform a compute operation associated with a first portion of a first layer of the neural network. (Zou [¶0061] "the backward propagation means back-propagating output errors in a certain form layer by layer through each hidden layer, computing an updated gradient on each hidden layer through an updated error value, and finally feeding back the updated gradient to the input layer" [¶0091] "In order to further enhance the parameter exchange efficiency, this embodiment of the present invention designs a linear topological manner of parameter exchange for a multi-GPU scenario: a storage model parameter matrix and a storage gradient matrix are equally divided into partitions spatially, the number of partitions depends on the number of data parallel groups." Zou explicitly teaches that gradients are computed for each layer, and similarly teaches a method of splitting up parameters and gradients such that a gradient should be considered synonymous with a portion of a layer.)
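
For illustration only, the equal spatial division of a gradient matrix quoted from Zou [¶0091] can be sketched as follows. This is a minimal Python sketch, not Zou's implementation: the name partition_gradient is hypothetical, the gradient is modeled as a flat list, and the sketch assumes one partition per data parallel group, whereas Zou states only that the number of partitions depends on the number of groups.

```python
def partition_gradient(gradient, num_groups):
    # Split a flat gradient into equally sized contiguous partitions.
    # Assumption: one partition per data parallel group.
    size, remainder = divmod(len(gradient), num_groups)
    if remainder:
        raise ValueError("gradient length must divide evenly among groups")
    return [gradient[i * size:(i + 1) * size] for i in range(num_groups)]
```

Each returned partition covers a distinct slice of one layer's gradient, which is the sense in which a partition is read here as a portion of a layer.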

Regarding claim 4, the combination of Zou and the Message Passing Interface Forum teaches The system as in claim 3, wherein in response to a notification of completion of the compute operation associated with the first portion of the first layer of the neural network, the library is to cause the network interface to transmit a result of the compute operation. (the Message Passing Interface Forum [p. 484 Chapter 12 Section 4] "In a thread-compliant implementation, an MPI process is a process that may be multithreaded. Each thread can issue MPI calls; however, threads are not separately addressable: a rank in a send or receive call identifies a process, not a thread. A message sent to a process can be received by any thread in this process." See also the MPI_GREQUEST_COMPLETE function callback.  The claim limitation is interpreted as synonymous with thread callbacks, which are well known in the art and further described in the MPI standard.)
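
For illustration only, the completion-callback reading applied to claim 4 can be sketched as follows. This is a minimal Python sketch using the standard concurrent.futures module rather than MPI generalized requests, and it is drawn from neither reference: the name run_with_completion_callback is hypothetical, the doubling step stands in for a compute operation on a portion of a layer, and appending to a list stands in for transmission.

```python
from concurrent.futures import ThreadPoolExecutor


def run_with_completion_callback(portion):
    transmitted = []

    def on_done(future):
        # Runs once the compute operation has completed; the append
        # stands in for handing the result to the network interface.
        transmitted.append(future.result())

    with ThreadPoolExecutor(max_workers=1) as pool:
        # Hypothetical compute operation on one portion of a layer.
        future = pool.submit(lambda values: [v * 2 for v in values], portion)
        future.add_done_callback(on_done)
    # Leaving the with-block waits for the worker, so the callback has run.
    return transmitted
```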

Regarding claim 5, the combination of Zou and the Message Passing Interface Forum teaches The system as in claim 4, the network interface to transmit the result according to a communication pattern for messages to be transmitted between worker nodes during distributed training of the neural network. (Zou [¶0078] "After the data is replicated to the video memory, the GPU takes out the mini-batch data each time, to perform mini-batch training, a gradient Δw is obtained according to a result of the mini-batch training…in this way, the plurality of GPUs in the parallel training all has the latest model parameter.")

Regarding claim 6, the combination of Zou and the Message Passing Interface Forum teaches The system as in claim 5, wherein the communication pattern is a gather, scatter, allgather, alltoall, reduce, reduce scatter, or all-reduce. (See the Message Passing Interface Forum, Chapter 5 Section 1, for a description of each of these patterns as known communication patterns.)
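
For illustration only, one of the listed patterns, all-reduce, can be sketched as a reduce-scatter followed by an allgather. This is a minimal single-process Python simulation under stated assumptions, not the MPI API: each inner list models one worker's gradient buffer, the function names are hypothetical, the reduction is a sum, and the buffer length is assumed to be divisible by the number of workers.

```python
def reduce_scatter(buffers):
    # Worker r ends up holding the elementwise sum of chunk r across workers.
    n = len(buffers)
    if len(buffers[0]) % n:
        raise ValueError("buffer length must be divisible by worker count")
    chunk = len(buffers[0]) // n
    return [
        [sum(b[r * chunk + j] for b in buffers) for j in range(chunk)]
        for r in range(n)
    ]


def allgather(chunks):
    # Every worker receives the concatenation of all workers' chunks.
    full = [value for chunk in chunks for value in chunk]
    return [list(full) for _ in chunks]


def all_reduce(buffers):
    # All-reduce composed from the two patterns above.
    return allgather(reduce_scatter(buffers))
```

After all_reduce, every simulated worker holds the same summed gradient, which is the property that distributed training relies on when workers apply identical parameter updates.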

Regarding claim 17, claim 17 effectively mirrors claim 2 and is therefore rejected under a similar interpretation.

Regarding claim 18, claim 18 effectively mirrors claim 3 and is therefore rejected under a similar interpretation.

Regarding claim 19, claim 19 effectively mirrors claim 4 and is therefore rejected under a similar interpretation.

Regarding claim 20, claim 20 effectively mirrors claim 5 and is therefore rejected under a similar interpretation.

Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Zou in view of Kancherla.

Regarding claim 7, Zou teaches The system as in claim 1.  However, Zou does not explicitly teach wherein the network interface is a fabric interface to enable a connection to a communication fabric, the communication fabric to interconnect worker nodes of a distributed training network.

Kancherla, which teaches in the related art of accessing data over a network fabric, teaches wherein the network interface is a fabric interface to enable a connection to a communication fabric, the communication fabric to interconnect worker nodes of a distributed training network. ([¶0029] FIG. 2 "Switch fabric module 206 and I/O module 208 are also part of the data plane of network switch 200. Switch fabric module 206 is configured to interconnect the other modules of network switch 200".)

Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the network fabric in Kancherla with the parallelized GPU network in Zou. The combination would have been obvious because a person of ordinary skill in the art would be able to determine a similar motivation from Kancherla ([¶0002] “By ensuring that each core is assigned a proportional share of the incoming traffic, processing bottlenecks can be avoided and the overall throughput/performance of the network device can be increased.”).

Claims 8-10 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Zou and Kancherla, and further in view of Raikin (US 2016/0077976 A1).

Regarding claim 8, the combination of Zou and Kancherla teaches The system as in claim 7.  However, the combination of Zou and Kancherla does not explicitly teach wherein the fabric interface is a peripheral component interconnect express interface.

Raikin, which teaches in the related art of accessing data over a network fabric, teaches wherein the fabric interface is a peripheral component interconnect express interface. ([¶0003] "In various computer systems, peripheral devices communicate over a network fabric such as a PCI or PCI-Express (PCIe) bus".) 

Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the parallelized GPU in Zou and Kancherla with the network fabric in Raikin. The combination would have been obvious because a person of ordinary skill in the art would be able to determine a motivation from Raikin ([¶0004] “As another example, U.S. Patent Application Publication 2014/0055467, whose disclosure is incorporated herein by reference, describes a system that may include a Graphics Processing Unit (GPU) and a Field Programmable Gate Array (FPGA). The system may further include a bus interface that is external to the FPGA, and that is configured to transfer data directly between the GPU and the FPGA without storing the data in a memory of a central processing unit (CPU) as an intermediary operation.”).

Regarding claim 9, the combination of Zou and Kancherla teaches the system as in claim 7.  However, the combination of Zou and Kancherla does not explicitly teach wherein the fabric interface is an NVLink interface.

Raikin, which teaches in the related art of accessing data over a network fabric, teaches the system as in claim 7, wherein the fabric interface is an NVLink interface. ([¶0025] "The description that follows refers mainly to a PCIe bus, which is a type of a switched fabric, but the disclosed techniques apply to other interconnect configurations such as PCI, AXI, NVLINK, AMBA, HyperTransport and QPI.")

Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the parallelized GPU in Zou and Kancherla with the network fabric in Raikin. The combination would have been obvious because a person of ordinary skill in the art would be able to determine a motivation from Raikin ([¶0004] “As another example, U.S. Patent Application Publication 2014/0055467, whose disclosure is incorporated herein by reference, describes a system that may include a Graphics Processing Unit (GPU) and a Field Programmable Gate Array (FPGA). The system may further include a bus interface that is external to the FPGA, and that is configured to transfer data directly between the GPU and the FPGA without storing the data in a memory of a central processing unit (CPU) as an intermediary operation.”).

Regarding claim 10, the combination of Zou and Kancherla teaches The system as in claim 7.  However, the combination of Zou and Kancherla does not explicitly teach wherein the graphics processor includes at least a portion of the fabric interface.

Raikin, which teaches in the related art of accessing data over a network fabric, teaches The system as in claim 7, wherein the graphics processor includes at least a portion of the fabric interface. ([¶0047] FIG. 1 "Each GPU 44 in FIG. 1 comprises… the internal address space that the GPU uses for accessing GPU memory 60, and respective physical addresses of fabric 40.")

Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the parallelized GPU in Zou and Kancherla with the network fabric in Raikin. The combination would have been obvious because a person of ordinary skill in the art would be able to determine a motivation from Raikin ([¶0004] “As another example, U.S. Patent Application Publication 2014/0055467, whose disclosure is incorporated herein by reference, describes a system that may include a Graphics Processing Unit (GPU) and a Field Programmable Gate Array (FPGA). The system may further include a bus interface that is external to the FPGA, and that is configured to transfer data directly between the GPU and the FPGA without storing the data in a memory of a central processing unit (CPU) as an intermediary operation.”).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure: Tournavitis (“Towards a Holistic Approach to Auto-Parallelization Integrating Profile-Driven Parallelism Detection and Machine-Learning Based Mapping”, 2009).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SIDNEY VINCENT BOSTWICK whose telephone number is (571)272-4720.  The examiner can normally be reached on M-F 7:30am-5:00pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda Huang can be reached on (571)270-7092.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/SB/Examiner, Art Unit 2124



/MIRANDA M HUANG/Supervisory Patent Examiner, Art Unit 2124