DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This application, filed on 09/12/2018, claims foreign priority to Application No. 2017-241778 filed in Japan on 12/18/2017 and Application No. 2018-159500 filed in Japan on 08/28/2018. Claims 1-9 are pending and have been examined.

Information Disclosure Statement
The information disclosure statement (IDS) was submitted on 09/12/2018.  The submission is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.

Specification
The title of the invention is not descriptive.  A new title is required that is clearly indicative of the invention to which the claims are directed. The following title is suggested: System for Distributed Processing of Nodes.

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.  Such claim limitation(s) are: 
Specification paragraph numbers cited below refer to PGPUB US20190188563A1.
Claim 1:
the first node is configured to calculate a first gradient to update a first weight of objective function to a second weight and the second node is configured to calculate a second gradient to update the first weight of the objective function to the second weight (Specification [0020])
the third node is configured to calculate a third gradient to update a third weight of the objective function to a fourth weight and the fourth node is configured to calculate a fourth gradient to update the third weight of the objective function to the fourth weight (Specification [0020])

Claim 2:
the first node is configured to calculate a first gradient to update a first weight of objective function to a second weight and the second node is configured to calculate a second gradient to update the first weight of the objective function to the second weight (Specification [0020])
the third node is configured to calculate a third gradient to update a third weight of the objective function to a fourth weight and the fourth node is configured to calculate a fourth gradient to update the third weight of the objective function to the fourth weight (Specification [0020])
Claim 3:
the first node is configured to calculate a first gradient to update a first weight of objective function to a second weight and the second node is configured to calculate a second gradient to update the first weight of the objective function to the second weight (Specification [0020])
the third node is configured to calculate a third gradient to update a third weight of the objective function to a fourth weight and the fourth node is configured to calculate a fourth gradient to update the third weight of the objective function to the fourth weight (Specification [0020])
Claim 5:
wherein the server node is configured to calculate the second weight and the fourth weight (Specification [0042]: “FIG. 3 shows an example of the system structure of the server node 20 of FIG. 2. The server node 20 includes, for example, CPU 201, system controller 202, main memory 203, BIOS-ROM 204, nonvolatile memory 205, communication device 206, and embedded controller (EC) 207”)
the first node is configured to transmit the second weight transmitted from the server node to the second node (Specification [0020]) 
the third node is configured to transmit the fourth weight transmitted from the server node to the fourth node (Specification [0020])
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claims 1-9 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claims contain subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.
The claim limitations in claims 1, 2, 3, and 5 identified in the Claim Interpretation section invoke 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. However, the written description fails to disclose the corresponding structure, material, or acts for performing the entire claimed function and to clearly link the structure, material, or acts to the function. In particular, Specification [0020] merely reiterates the claim language and does not describe the corresponding structure for the claimed “first node”, “second node”, “third node”, and “fourth node” in claims 1, 2, 3, and 5. Therefore, claims 1, 2, 3, and 5 lack written description and are rejected under 35 U.S.C. 112(a). See MPEP 2181 IV (“Merely restating a function associated with a means-plus-function limitation is insufficient to provide the corresponding structure for definiteness. See, e.g., Noah, 675 F.3d at 1317, 102 USPQ2d at 1419; Blackboard, 574 F.3d at 1384; Aristocrat, 521 F.3d at 1334, 86 USPQ2d at 1239. It follows therefore that such a mere restatement of function in the specification without more description of the means that accomplish the function would also likely fail to provide adequate written description under section 112(a) or pre-AIA section 112, first paragraph”).
Dependent claims 4-5 and 7-9 are rejected based on the same rationale as claim 1. Dependent claim 6 is rejected based on the same rationale as claim 3.

The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 1-9 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
The claim limitations in claims 1, 2, 3, and 5 identified in the Claim Interpretation section invoke 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. However, the written description fails to disclose the corresponding structure, material, or acts for performing the entire claimed function and to clearly link the structure, material, or acts to the function. In particular, Specification [0020] merely reiterates the claim language and does not describe the corresponding structure for the claimed “first node”, “second node”, “third node”, and “fourth node” in claims 1, 2, 3, and 5. Moreover, regarding the “server node” of claim 5, Specification [0042]: “FIG. 3 shows an example of the system structure of the server node 20 of FIG. 2. The server node 20 includes, for example, CPU 201, system controller 202, main memory 203, BIOS-ROM 204, nonvolatile memory 205, communication device 206, and embedded controller (EC) 207” (emphasis added) does not clearly and definitively establish the corresponding structure for the “server node” because the Specification specifically identifies that the description is provided as an example. Therefore, claims 1, 2, 3, and 5 are indefinite and are rejected under 35 U.S.C. 112(b) or pre-AIA 35 U.S.C. 112, second paragraph. For examination purposes, “first node”, “second node”, “third node”, “fourth node”, and “server node” are interpreted as being implemented by a computer or a processor.
Applicant may:
(a) Amend the claim so that the claim limitation will no longer be interpreted as a limitation under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph;
(b) Amend the written description of the specification such that it expressly recites what structure, material, or acts perform the entire claimed function, without introducing any new matter (35 U.S.C. 132(a)); or
(c) Amend the written description of the specification such that it clearly links the structure, material, or acts disclosed therein to the function recited in the claim, without introducing any new matter (35 U.S.C. 132(a)).
If applicant is of the opinion that the written description of the specification already implicitly or inherently discloses the corresponding structure, material, or acts and clearly links them to the function so that one of ordinary skill in the art would recognize what structure, material, or acts perform the claimed function, applicant should clarify the record by either: 
(a) Amending the written description of the specification such that it expressly recites the corresponding structure, material, or acts for performing the claimed function and clearly links or associates the structure, material, or acts to the claimed function, without introducing any new matter (35 U.S.C. 132(a)); or
(b) Stating on the record what the corresponding structure, material, or acts, which are implicitly or inherently set forth in the written description of the specification, perform the claimed function. For more information, see 37 CFR 1.75(d) and MPEP §§ 608.01(o) and 2181.

Claims 1, 2, and 3 recite “perform nth (n is a natural number)”, which lacks clarity because it is unclear if the recitation inside of the parentheses is a limitation of the claim. For examination purposes, “perform nth (n is a natural number)” has been interpreted as “perform nth, wherein n is a natural number” (emphasis added).
Claims 1, 2, and 3 recite “perform mth (m is a natural number)”, which lacks clarity because it is unclear if the recitation inside of the parentheses is a limitation of the claim. For examination purposes, “perform mth (m is a natural number)” has been interpreted as “perform mth, wherein m is a natural number” (emphasis added).
Claim 8 recites the limitation "the nodes" in lines 7-8. There is insufficient antecedent basis for this limitation in the claim. For examination purposes, "the nodes" has been interpreted as "the plurality of nodes".
Dependent claims 4-5 and 7-9 are rejected based on the same rationale as claim 1. Dependent claim 6 is rejected based on the same rationale as claim 3.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-9 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Regarding Claim 1,
Claim 1 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Step 1 Analysis: Claim 1 is directed to a system, which falls within the machine category, one of the statutory categories.
Step 2A Prong One Analysis: Each of the following limitations:
calculate a first gradient to update a first weight of objective function to a second weight
calculate a second gradient to update the first weight of the objective function to the second weight
calculate a third gradient to update a third weight of the objective function to a fourth weight
calculate a fourth gradient to update the third weight of the objective function to the fourth weight
the second weight updated from the first weight is further updated using the first and second gradients
the fourth weight updated from the third weight is further updated using the first to fourth gradients
as drafted, under the broadest reasonable interpretation, covers mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations), apart from the recitation of language that amounts to mere instructions to apply an exception and language that generally links the use of a judicial exception to a particular technological environment. In particular, in the context of this claim, each of the four gradient-calculating limitations recites calculating a gradient to update a weight, which corresponds to a mathematical calculation (see Specification [0024]), and each of the two "further updated using" limitations recites updating a weight using the calculated gradients, which likewise corresponds to a mathematical calculation (see Specification [0024]).
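For illustration only, and without limiting the interpretation of the claim, the arithmetic character of the above limitations can be sketched in Python. The sketch assumes a plain gradient-descent rule in which the summed gradients, scaled by a learning rate, are subtracted from the weight. That rule, the learning rate, the function name apply_gradients, and all numeric values are hypothetical and are not taken from the Specification.

```python
from typing import Sequence


def apply_gradients(weight: float, gradients: Sequence[float], lr: float = 0.5) -> float:
    """Return the weight after one descent step over the given gradients.

    Assumed rule (not from the Specification): w_next = w - lr * sum(gradients).
    """
    return weight - lr * sum(gradients)


# Hypothetical gradient values standing in for the first to fourth gradients.
g1, g2, g3, g4 = 0.5, 0.25, 1.0, 0.25

# Hypothetical current values of the "second weight" and the "fourth weight".
second_weight = 2.0
fourth_weight = 2.0

# Claim 1 scenario: the first group finishes its gradient calculation first.
# The second weight is further updated using the first and second gradients.
next_first_group_weight = apply_gradients(second_weight, [g1, g2])

# The fourth weight is further updated using the first to fourth gradients.
next_second_group_weight = apply_gradients(fourth_weight, [g1, g2, g3, g4])

print(next_first_group_weight, next_second_group_weight)
```

Under these assumed values, each update reduces to a sum, a multiplication, and a subtraction, with the only difference between the two groups being which gradients enter the sum.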
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim recites additional elements that amount to a recitation of the words "apply it" (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer, which do not integrate a judicial exception into a practical application. See MPEP 2106.05(f). The recitations of “a first node and a second node of a first group; and a third node and a fourth node of a second group, wherein...”, “the first node is configured to”, “the second node is configured to”, “the third node is configured to”, and “the fourth node is configured to” amount to mere instructions to implement the abstract idea of calculating a gradient to update a weight. Moreover, the recitations of “in a case where the first node and the second node perform nth (n is a natural number) parallel distributed processing”, “in a case where the third node and the fourth node perform mth (m is a natural number) parallel distributed processing”, “if the calculation of gradient by the first node and the second node is faster than the calculation of gradient by the third node and the fourth node, in n+1th parallel distributed processing performed by the first node and the second node”, and “in m+1th parallel distributed processing performed by the third node and the fourth node” amount to generally linking the use of a judicial exception to a particular technological environment (namely the technological environment as described by these limitations), which do not integrate the judicial exception into a practical application. See MPEP 2106.05(h). Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. See MPEP 2106.04(d).
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of “a first node and a second node of a first group; and a third node and a fourth node of a second group, wherein...”, “the first node is configured to”, “the second node is configured to”, “the third node is configured to”, and “the fourth node is configured to” amount to recitation of the words "apply it" (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer, which do not amount to significantly more. See MPEP 2106.05(f). Moreover, the additional elements of “in a case where the first node and the second node perform nth (n is a natural number) parallel distributed processing”, “in a case where the third node and the fourth node perform mth (m is a natural number) parallel distributed processing”, “if the calculation of gradient by the first node and the second node is faster than the calculation of gradient by the third node and the fourth node, in n+1th parallel distributed processing performed by the first node and the second node”, and “in m+1th parallel distributed processing performed by the third node and the fourth node” amount to generally linking the use of a judicial exception to a particular technological environment, which do not amount to significantly more. See MPEP 2106.05(h). Therefore, the claim is not patent eligible.
Regarding Claim 2,
Claim 2 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Step 1 Analysis: Claim 2 is directed to a system, which falls within the machine category, one of the statutory categories.
Step 2A Prong One Analysis: Each of the following limitations:
calculate a first gradient to update a first weight of objective function to a second weight
calculate a second gradient to update the first weight of the objective function to the second weight,
calculate a third gradient to update a third weight of the objective function to a fourth weight
calculate a fourth gradient to update the third weight of the objective function to the fourth weight,
the second weight updated from the first weight is further updated using the first to fourth gradients
the fourth weight updated from the third weight is further updated using the third and fourth gradients.
as drafted, under the broadest reasonable interpretation, covers mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations), apart from the recitation of language that amounts to mere instructions to apply an exception and language that generally links the use of a judicial exception to a particular technological environment. In particular, in the context of this claim, each of the four gradient-calculating limitations recites calculating a gradient to update a weight, which corresponds to a mathematical calculation (see Specification [0024]), and each of the two "further updated using" limitations recites updating a weight using the calculated gradients, which likewise corresponds to a mathematical calculation (see Specification [0024]).
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim recites additional elements that amount to a recitation of the words "apply it" (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer, which do not integrate a judicial exception into a practical application. See MPEP 2106.05(f). The recitations of “a first node and a second node of a first group; and a third node and a fourth node of a second group, wherein...”, “the first node is configured to”, “the second node is configured to”, “the third node is configured to”, and “the fourth node is configured to” amount to mere instructions to implement the abstract idea of calculating a gradient to update a weight. Moreover, the recitations of “in a case where the first node and the second node perform nth (n is a natural number) parallel distributed processing”, “in a case where the third node and the fourth node perform mth (m is a natural number) parallel distributed processing”, “if the calculation of gradient by the third node and the fourth node is faster than the calculation of gradient by the first node and the second node, in n+1th parallel distributed processing performed by the first node and the second node”, and “in m+1th parallel distributed processing performed by the third node and the fourth node” amount to generally linking the use of a judicial exception to a particular technological environment (namely the technological environment as described by these limitations), which do not integrate the judicial exception into a practical application. See MPEP 2106.05(h). Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. See MPEP 2106.04(d).
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of “a first node and a second node of a first group; and a third node and a fourth node of a second group, wherein...”, “the first node is configured to”, “the second node is configured to”, “the third node is configured to”, and “the fourth node is configured to” amount to recitation of the words "apply it" (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer, which do not amount to significantly more. See MPEP 2106.05(f). Moreover, the additional elements of “in a case where the first node and the second node perform nth (n is a natural number) parallel distributed processing”, “in a case where the third node and the fourth node perform mth (m is a natural number) parallel distributed processing”, “if the calculation of gradient by the third node and the fourth node is faster than the calculation of gradient by the first node and the second node, in n+1th parallel distributed processing performed by the first node and the second node”, and “in m+1th parallel distributed processing performed by the third node and the fourth node” amount to generally linking the use of a judicial exception to a particular technological environment, which do not amount to significantly more. See MPEP 2106.05(h). Therefore, the claim is not patent eligible.
Regarding Claim 3,
Claim 3 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Step 1 Analysis: Claim 3 is directed to a system, which falls within the machine category, one of the statutory categories.
Step 2A Prong One Analysis: Each of the following limitations:
calculate a first gradient to update a first weight of objective function to a second weight
calculate a second gradient to update the first weight of the objective function to the second weight
calculate a third gradient to update a third weight of the objective function to a fourth weight
calculate a fourth gradient to update the third weight of the objective function to the fourth weight
the second weight updated from the first weight is further updated using the first and second gradients
the fourth weight updated from the third weight is further updated using the first to fourth gradients
the second weight updated from the first weight is further updated using the first to fourth gradients
the fourth weight updated from the third weight is further updated using the third and fourth gradients
as drafted, under the broadest reasonable interpretation, covers mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations), apart from the recitation of language that amounts to mere instructions to apply an exception and language that generally links the use of a judicial exception to a particular technological environment. In particular, in the context of this claim, each of the four gradient-calculating limitations recites calculating a gradient to update a weight, which corresponds to a mathematical calculation (see Specification [0024]), and each of the four "further updated using" limitations recites updating a weight using the calculated gradients, which likewise corresponds to a mathematical calculation (see Specification [0024]).
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim recites additional elements that amount to a recitation of the words "apply it" (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer, which do not integrate a judicial exception into a practical application. See MPEP 2106.05(f). The recitations of “a first node and a second node of a first group; and a third node and a fourth node of a second group, wherein...”, “the first node is configured to”, “the second node is configured to”, “the third node is configured to”, and “the fourth node is configured to” amount to mere instruction to implement an abstract idea of calculating gradient to update weight. Moreover, the recitations of “in a case where the first node and the second node perform nth (n is a natural number) parallel distributed processing”, “in a case where the third node and the fourth node perform mth (m is a natural number) parallel distributed processing”, “if the calculation of gradient by the first node and the second node is faster than the calculation of gradient by the third node and the fourth node, in n+1th parallel distributed processing performed by the first node and the second node”, “in m+1th parallel distributed processing performed by the third node and the fourth node”, “if the calculation of gradient by the third node and the fourth node is faster than the calculation of gradient by the first node and the second node, in n+1th parallel distributed processing performed by the first node and the second node”, and “in m+1th parallel distributed processing performed by the third node and the fourth node” amount to generally linking the use of a judicial exception to a particular technological environment (namely the technological environment as described by these limitations), which do not integrate the judicial exception into a practical application. 
See MPEP 2106.05(h). Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. See MPEP 2106.04(d).
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of “a first node and a second node of a first group; and a third node and a fourth node of a second group, wherein...”, “the first node is configured to”, “the second node is configured to”, “the third node is configured to”, and “the fourth node is configured to” amount to recitation of the words "apply it" (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer, which do not amount to significantly more. See MPEP 2106.05(f). Moreover, the additional elements of “in a case where the first node and the second node perform nth (n is a natural number) parallel distributed processing”, “in a case where the third node and the fourth node perform mth (m is a natural number) parallel distributed processing”, “if the calculation of gradient by the first node and the second node is faster than the calculation of gradient by the third node and the fourth node, in n+1th parallel distributed processing performed by the first node and the second node”, “in m+1th parallel distributed processing performed by the third node and the fourth node”, “if the calculation of gradient by the third node and the fourth node is faster than the calculation of gradient by the first node and the second node, in n+1th parallel distributed processing performed by the first node and the second node”, and “in m+1th parallel distributed processing performed by the third node and the fourth node” amount to generally linking the use of a judicial exception to a particular technological environment, which do not amount to significantly more. See MPEP 2106.05(h). Therefore, the claim is not patent eligible.
Regarding Claim 4,
Claim 4 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Step 1 Analysis: Claim 4 is directed to a system, which is directed to a machine, one of the statutory categories.
Step 2A Prong One Analysis: Each of the following limitations:
wherein the second weight is updated using a fifth gradient calculated from the first and second gradients
the fourth weight is updated using a sixth gradient calculated from the third and fourth gradients
as drafted, under the broadest reasonable interpretation, covers mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations) but for the recitation of mere instructions to apply an exception language and generally linking the use of a judicial exception to a particular technological environment language. In particular, the above limitations in the context of this claim encompass wherein the second weight is updated using a fifth gradient calculated from the first and second gradients (calculation of gradient to update a weight corresponds to mathematical calculation, see Specification [0024]); the fourth weight is updated using a sixth gradient calculated from the third and fourth gradients (calculation of gradient to update a weight corresponds to mathematical calculation, see Specification [0024]).
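For illustration only, the mathematical character of the limitations identified above can be sketched as follows. The sketch assumes that the fifth gradient is the element-wise mean of the first and second gradients and that the weight update is a plain gradient-descent step. The combination rule, the learning rate, and the function names are hypothetical and are not taken from the claims or the specification.

```python
def combine_gradients(gradient_a, gradient_b):
    """Element-wise mean of two gradients (assumed combination rule)."""
    return [(a + b) / 2.0 for a, b in zip(gradient_a, gradient_b)]


def update_weight(weight, gradient, learning_rate=0.5):
    """One gradient-descent step (assumed update rule, hypothetical learning rate)."""
    return [w - learning_rate * g for w, g in zip(weight, gradient)]


# Hypothetical values standing in for the first weight and the gradients
# calculated by the first node and the second node.
first_weight = [1.0, 2.0]
first_gradient = [0.5, -1.0]
second_gradient = [1.5, 1.0]

# "Fifth gradient calculated from the first and second gradients".
fifth_gradient = combine_gradients(first_gradient, second_gradient)

# "Second weight is updated using a fifth gradient".
second_weight = update_weight(first_weight, fifth_gradient)
```

Under these assumptions each step is ordinary arithmetic on numeric vectors; the sixth gradient and the fourth weight would follow the same pattern with the third and fourth gradients.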
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim recites additional elements that amount to a recitation of the words "apply it" (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer, which do not integrate a judicial exception into a practical application. See MPEP 2106.05(f). The recitations of “a first node and a second node of a first group; and a third node and a fourth node of a second group, wherein...”, “the first node is configured to”, “the second node is configured to”, “the third node is configured to”, and “the fourth node is configured to” amount to mere instruction to implement an abstract idea of calculating gradient to update weight. Moreover, the recitations of “in a case where the first node and the second node perform nth (n is a natural number) parallel distributed processing”, “in a case where the third node and the fourth node perform mth (m is a natural number) parallel distributed processing”, “if the calculation of gradient by the first node and the second node is faster than the calculation of gradient by the third node and the fourth node, in n+1th parallel distributed processing performed by the first node and the second node”, and “in m+1th parallel distributed processing performed by the third node and the fourth node” amount to generally linking the use of a judicial exception to a particular technological environment (namely the technological environment as described by these limitations), which do not integrate the judicial exception into a practical application. See MPEP 2106.05(h). Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. See MPEP 2106.04(d).
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of “a first node and a second node of a first group; and a third node and a fourth node of a second group, wherein...”, “the first node is configured to”, “the second node is configured to”, “the third node is configured to”, and “the fourth node is configured to” amount to recitation of the words "apply it" (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer, which do not amount to significantly more. See MPEP 2106.05(f). Moreover, the additional elements of “in a case where the first node and the second node perform nth (n is a natural number) parallel distributed processing”, “in a case where the third node and the fourth node perform mth (m is a natural number) parallel distributed processing”, “if the calculation of gradient by the first node and the second node is faster than the calculation of gradient by the third node and the fourth node, in n+1th parallel distributed processing performed by the first node and the second node”, and “in m+1th parallel distributed processing performed by the third node and the fourth node” amount to generally linking the use of a judicial exception to a particular technological environment, which do not amount to significantly more. See MPEP 2106.05(h). Therefore, the claim is not patent eligible.
Regarding Claim 5,
Claim 5 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Step 1 Analysis: Claim 5 is directed to a system, which is directed to a machine, one of the statutory categories.
Step 2A Prong One Analysis: Each of the following limitations:
calculate the second weight and the fourth weight
as drafted, under the broadest reasonable interpretation, covers mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations) but for the recitation of mere instructions to apply an exception language, generally linking the use of a judicial exception to a particular technological environment language, and insignificant extra-solution activity language. In particular, the above limitations in the context of this claim encompass calculate the second weight and the fourth weight (calculation of gradient to update a weight corresponds to mathematical calculation, see Specification [0024]).
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim recites additional elements that amount to a recitation of the words "apply it" (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer, which do not integrate a judicial exception into a practical application. See MPEP 2106.05(f). The recitations of “a first node and a second node of a first group; and a third node and a fourth node of a second group, wherein...”, “the first node is configured to”, “the second node is configured to”, “the third node is configured to”, “the fourth node is configured to” and “a server node communicatively connected to the first node and the third node, wherein the server node” amount to mere instruction to implement an abstract idea of calculating gradient to update weight. Moreover, the recitations of “in a case where the first node and the second node perform nth (n is a natural number) parallel distributed processing”, “in a case where the third node and the fourth node perform mth (m is a natural number) parallel distributed processing”, “if the calculation of gradient by the first node and the second node is faster than the calculation of gradient by the third node and the fourth node, in n+1th parallel distributed processing performed by the first node and the second node”, and “in m+1th parallel distributed processing performed by the third node and the fourth node” amount to generally linking the use of a judicial exception to a particular technological environment (namely the technological environment as described by these limitations), which do not integrate the judicial exception into a practical application. See MPEP 2106.05(h). 
Furthermore, the additional elements “the first node is configured to transmit the second weight transmitted from the server node to the second node” and “the third node is configured to transmit the fourth weight transmitted from the server node to the fourth node” amount to mere data gathering by receiving or transmitting data over a network, which is an insignificant extra-solution activity that does not integrate the judicial exception into a practical application. See MPEP 2106.05(g). Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. See MPEP 2106.04(d).
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of “a first node and a second node of a first group; and a third node and a fourth node of a second group, wherein...”, “the first node is configured to”, “the second node is configured to”, “the third node is configured to”, “the fourth node is configured to” and “a server node communicatively connected to the first node and the third node, wherein the server node” amount to recitation of the words "apply it" (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer, which do not amount to significantly more. See MPEP 2106.05(f). Moreover, the additional elements of “in a case where the first node and the second node perform nth (n is a natural number) parallel distributed processing”, “in a case where the third node and the fourth node perform mth (m is a natural number) parallel distributed processing”, “if the calculation of gradient by the first node and the second node is faster than the calculation of gradient by the third node and the fourth node, in n+1th parallel distributed processing performed by the first node and the second node”, and “in m+1th parallel distributed processing performed by the third node and the fourth node” amount to generally linking the use of a judicial exception to a particular technological environment, which do not amount to significantly more. See MPEP 2106.05(h).
Furthermore, the additional elements “the first node is configured to transmit the second weight transmitted from the server node to the second node” and “the third node is configured to transmit the fourth weight transmitted from the server node to the fourth node” amount to receiving or transmitting data over a network, which is an insignificant extra-solution activity that is well-understood, routine, and conventional. See MPEP 2106.05(d) (“The courts have recognized the following computer functions as well‐understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity. i. Receiving or transmitting data over a network”). Therefore, the claim is not patent eligible.
Regarding Claim 6,
Claim 6 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Step 1 Analysis: Claim 6 is directed to a system, which is directed to a machine, one of the statutory categories.
Step 2A Prong One Analysis: Each of the following limitations:
the fourth weight is further updated using the first to fourth gradients 
the second weight is further updated using the first to fourth gradients
as drafted, under the broadest reasonable interpretation, covers mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations) but for the recitation of mere instructions to apply an exception language, generally linking the use of a judicial exception to a particular technological environment language, and insignificant extra-solution activity language. In particular, the above limitations in the context of this claim encompass the fourth weight is further updated using the first to fourth gradients (calculation of gradient to update a weight corresponds to mathematical calculation, see Specification [0024]); the second weight is further updated using the first to fourth gradients (calculation of gradient to update a weight corresponds to mathematical calculation, see Specification [0024]).
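For illustration only, the limitations identified above can be sketched as a weight update that uses gradients from both groups. The sketch assumes that the four gradients are combined by an element-wise mean and applied in a plain gradient-descent step. The combination rule, the learning rate, and the function names are hypothetical and are not taken from the claims or the specification.

```python
def mean_gradient(gradients):
    """Element-wise mean over a list of gradients (assumed combination rule)."""
    count = len(gradients)
    return [sum(components) / count for components in zip(*gradients)]


def update_weight(weight, gradient, learning_rate=0.5):
    """One gradient-descent step (assumed update rule, hypothetical learning rate)."""
    return [w - learning_rate * g for w, g in zip(weight, gradient)]


# Hypothetical gradients: the first group finished earlier, so its gradients
# are treated as already received by the second group.
received_from_first_group = [[1.0, 0.0], [3.0, 2.0]]  # first and second gradients
own_second_group = [[2.0, 2.0], [2.0, 4.0]]           # third and fourth gradients

# Hypothetical fourth weight, already updated once from the third weight.
fourth_weight = [1.0, 1.0]

# "The fourth weight is further updated using the first to fourth gradients".
combined = mean_gradient(received_from_first_group + own_second_group)
further_updated_fourth_weight = update_weight(fourth_weight, combined)
```

The symmetric case, in which the second weight is further updated using the first to fourth gradients, would swap the roles of the two groups.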
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim recites additional elements that amount to a recitation of the words "apply it" (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer, which do not integrate a judicial exception into a practical application. See MPEP 2106.05(f). The recitations of “a first node and a second node of a first group; and a third node and a fourth node of a second group, wherein...”, “the first node is configured to”, “the second node is configured to”, “the third node is configured to”, and “the fourth node is configured to” amount to mere instruction to implement an abstract idea of calculating gradient to update weight. Moreover, the recitations of “in a case where the first node and the second node perform nth (n is a natural number) parallel distributed processing”, “in a case where the third node and the fourth node perform mth (m is a natural number) parallel distributed processing”, “if the calculation of gradient by the first node and the second node is faster than the calculation of gradient by the third node and the fourth node, in n+1th parallel distributed processing performed by the first node and the second node”, “in m+1th parallel distributed processing performed by the third node and the fourth node”, “if the calculation of gradient by the third node and the fourth node is faster than the calculation of gradient by the first node and the second node, in n+1th parallel distributed processing performed by the first node and the second node”, “in m+1th parallel distributed processing performed by the third node and the fourth node”, “in m+1th parallel distributed processing performed by the third node and the fourth node”, and “in n+1th parallel distributed processing performed by the first node and the second node” amount to generally linking the use of a judicial exception to a particular technological environment (namely the technological environment as described by these limitations), which do not integrate the judicial exception into a practical application. See MPEP 2106.05(h). Furthermore, the additional elements “wherein, if the calculation of gradient by the first node and the second node is faster than the calculation of gradient by the third node and the fourth node, the first and second gradients are transmitted to the third node and the fourth node” and “if the calculation of gradient by the third node and the fourth node is faster than the calculation of gradient by the first node and the second node, the third and fourth gradients are transmitted to the first node and the second node” amount to mere data gathering by receiving or transmitting data over a network, which is an insignificant extra-solution activity that does not integrate the judicial exception into a practical application. See MPEP 2106.05(g). Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. See MPEP 2106.04(d).
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of “a first node and a second node of a first group; and a third node and a fourth node of a second group, wherein...”, “the first node is configured to”, “the second node is configured to”, “the third node is configured to”, and “the fourth node is configured to” amount to recitation of the words "apply it" (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer, which do not amount to significantly more. See MPEP 2106.05(f). Moreover, the additional elements of “in a case where the first node and the second node perform nth (n is a natural number) parallel distributed processing”, “in a case where the third node and the fourth node perform mth (m is a natural number) parallel distributed processing”, “if the calculation of gradient by the first node and the second node is faster than the calculation of gradient by the third node and the fourth node, in n+1th parallel distributed processing performed by the first node and the second node”, “in m+1th parallel distributed processing performed by the third node and the fourth node”, “if the calculation of gradient by the third node and the fourth node is faster than the calculation of gradient by the first node and the second node, in n+1th parallel distributed processing performed by the first node and the second node”, “in m+1th parallel distributed processing performed by the third node and the fourth node”, “in m+1th parallel distributed processing performed by the third node and the fourth node”, and “in n+1th parallel distributed processing performed by the first node and the second node” amount to generally linking the use of a judicial exception to a particular technological environment, which do not amount to significantly more. See MPEP 2106.05(h). Furthermore, the additional elements “wherein, if the calculation of gradient by the first node and the second node is faster than the calculation of gradient by the third node and the fourth node, the first and second gradients are transmitted to the third node and the fourth node” and “if the calculation of gradient by the third node and the fourth node is faster than the calculation of gradient by the first node and the second node, the third and fourth gradients are transmitted to the first node and the second node” amount to receiving or transmitting data over a network, which is an insignificant extra-solution activity that is well-understood, routine, and conventional. See MPEP 2106.05(d) (“The courts have recognized the following computer functions as well‐understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity. i. Receiving or transmitting data over a network”). Therefore, the claim is not patent eligible.
Regarding Claim 7,
Claim 7 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Step 1 Analysis: Claim 7 is directed to a system, which is directed to a machine, one of the statutory categories.
Step 2A Prong One Analysis: Please see analysis of claim 1.
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim recites additional elements that amount to a recitation of the words "apply it" (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer, which do not integrate a judicial exception into a practical application. See MPEP 2106.05(f). The recitations of “a first node and a second node of a first group; and a third node and a fourth node of a second group, wherein...”, “the first node is configured to”, “the second node is configured to”, “the third node is configured to”, and “the fourth node is configured to” amount to mere instruction to implement an abstract idea of calculating gradient to update weight. Moreover, the recitations of “in a case where the first node and the second node perform nth (n is a natural number) parallel distributed processing”, “in a case where the third node and the fourth node perform mth (m is a natural number) parallel distributed processing”, “if the calculation of gradient by the first node and the second node is faster than the calculation of gradient by the third node and the fourth node, in n+1th parallel distributed processing performed by the first node and the second node”, “in m+1th parallel distributed processing performed by the third node and the fourth node”, and “wherein a difference of processing speed between the first node and the second node of the first group is a first threshold value or less, and a difference of processing speed between the third node and the fourth node of the second group is a second threshold value or less” amount to generally linking the use of a judicial exception to a particular technological environment (namely the technological environment as described by these limitations), which do not integrate the judicial exception into a practical application. See MPEP 2106.05(h). 
Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. See MPEP 2106.04(d).
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of “a first node and a second node of a first group; and a third node and a fourth node of a second group, wherein...”, “the first node is configured to”, “the second node is configured to”, “the third node is configured to”, and “the fourth node is configured to” amount to recitation of the words "apply it" (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer, which do not amount to significantly more. See MPEP 2106.05(f). Moreover, the additional elements of “in a case where the first node and the second node perform nth (n is a natural number) parallel distributed processing”, “in a case where the third node and the fourth node perform mth (m is a natural number) parallel distributed processing”, “if the calculation of gradient by the first node and the second node is faster than the calculation of gradient by the third node and the fourth node, in n+1th parallel distributed processing performed by the first node and the second node”, “in m+1th parallel distributed processing performed by the third node and the fourth node” and “wherein a difference of processing speed between the first node and the second node of the first group is a first threshold value or less, and a difference of processing speed between the third node and the fourth node of the second group is a second threshold value or less” amount to generally linking the use of a judicial exception to a particular technological environment, which do not amount to significantly more. See MPEP 2106.05(h). Therefore, the claim is not patent eligible.
Regarding Claim 8,
Claim 8 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Step 1 Analysis: Claim 8 is directed to a system, which is directed to a machine, one of the statutory categories.
Step 2A Prong One Analysis: Please see analysis of claim 7.
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim recites additional elements that amount to a recitation of the words "apply it" (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer, which do not integrate a judicial exception into a practical application. See MPEP 2106.05(f). The recitations of “a first node and a second node of a first group; and a third node and a fourth node of a second group, wherein...”, “the first node is configured to”, “the second node is configured to”, “the third node is configured to”, “the fourth node is configured to” and “wherein a plurality of nodes including the first node and the second node are in the first group, a plurality of nodes including the third node and the fourth node are in the second group” amount to mere instruction to implement an abstract idea of calculating gradient to update weight. Moreover, the recitations of “in a case where the first node and the second node perform nth (n is a natural number) parallel distributed processing”, “in a case where the third node and the fourth node perform mth (m is a natural number) parallel distributed processing”, “if the calculation of gradient by the first node and the second node is faster than the calculation of gradient by the third node and the fourth node, in n+1th parallel distributed processing performed by the first node and the second node”, “in m+1th parallel distributed processing performed by the third node and the fourth node”, “wherein a difference of processing speed between the first node and the second node of the first group is a first threshold value or less, and a difference of processing speed between the third node and the fourth node of the second group is a second threshold value or less”, and “if the processing speed of the third node and the fourth node is slower than the processing speed of the first node and the second node, a first number of the nodes of the first group is less than a second number of the nodes of the second group” amount to generally linking the use of a judicial exception to a particular technological environment (namely the technological environment as described by these limitations), which do not integrate the judicial exception into a practical application. See MPEP 2106.05(h). Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. See MPEP 2106.04(d).
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of “a first node and a second node of a first group; and a third node and a fourth node of a second group, wherein...”, “the first node is configured to”, “the second node is configured to”, “the third node is configured to”, “the fourth node is configured to”, and “wherein a plurality of nodes including the first node and the second node are in the first group, a plurality of nodes including the third node and the fourth node are in the second group” amount to recitation of the words "apply it" (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer, which do not amount to significantly more. See MPEP 2106.05(f).
Moreover, the additional elements of “in a case where the first node and the second node perform nth (n is a natural number) parallel distributed processing”, “in a case where the third node and the fourth node perform mth (m is a natural number) parallel distributed processing”, “if the calculation of gradient by the first node and the second node is faster than the calculation of gradient by the third node and the fourth node, in n+1th parallel distributed processing performed by the first node and the second node”, “in m+1th parallel distributed processing performed by the third node and the fourth node”, “wherein a difference of processing speed between the first node and the second node of the first group is a first threshold value or less, and a difference of processing speed between the third node and the fourth node of the second group is a second threshold value or less”, and “if the processing speed of the third node and the fourth node is slower than the processing speed of the first node and the second node, a first number of the nodes of the first group is less than a second number of the nodes of the second group” amount to generally linking the use of a judicial exception to a particular technological environment, which do not amount to significantly more. See MPEP 2106.05(h). Therefore, the claim is not patent eligible.
Regarding Claim 9,
Claim 9 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Step 1 Analysis: Claim 9 is directed to a system, which is a machine, one of the statutory categories.
Step 2A Prong One Analysis: Please see analysis of claim 7.
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim recites additional elements that amount to a recitation of the words "apply it" (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer, which do not integrate a judicial exception into a practical application. See MPEP 2106.05(f). The recitations of “a first node and a second node of a first group; and a third node and a fourth node of a second group, wherein...”, “the first node is configured to”, “the second node is configured to”, “the third node is configured to”, and “the fourth node is configured to” amount to mere instruction to implement an abstract idea of calculating gradient to update weight. Moreover, the recitations of “in a case where the first node and the second node perform nth (n is a natural number) parallel distributed processing”, “in a case where the third node and the fourth node perform mth (m is a natural number) parallel distributed processing”, “if the calculation of gradient by the first node and the second node is faster than the calculation of gradient by the third node and the fourth node, in n+1th parallel distributed processing performed by the first node and the second node”, “in m+1th parallel distributed processing performed by the third node and the fourth node”, “wherein a difference of processing speed between the first node and the second node of the first group is a first threshold value or less, and a difference of processing speed between the third node and the fourth node of the second group is a second threshold value or less”, and “wherein, if the processing speed of the first node and the second node is slower than the processing speed of the third node and the fourth node, an amount of processing of each of the third node and the fourth node of the second group in the parallel distributed processing is less than the amount of processing of the 
first node and the second node of the first group” amount to generally linking the use of a judicial exception to a particular technological environment (namely the technological environment as described by these limitations), which do not integrate the judicial exception into a practical application. See MPEP 2106.05(h). Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. See MPEP 2106.04(d).
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of “a first node and a second node of a first group; and a third node and a fourth node of a second group, wherein...”, “the first node is configured to”, “the second node is configured to”, “the third node is configured to”, and “the fourth node is configured to” amount to recitation of the words "apply it" (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer, which do not amount to significantly more. See MPEP 2106.05(f). Moreover, the additional elements of “in a case where the first node and the second node perform nth (n is a natural number) parallel distributed processing”, “in a case where the third node and the fourth node perform mth (m is a natural number) parallel distributed processing”, “if the calculation of gradient by the first node and the second node is faster than the calculation of gradient by the third node and the fourth node, in n+1th parallel distributed processing performed by the first node and the second node”, “in m+1th parallel distributed processing performed by the third node and the fourth node”, “wherein a difference of processing speed between the first node and the second node of the first group is a first threshold value or less, and a difference of processing speed between the third node and the fourth node of the second group is a second threshold value or less” and “wherein, if the processing speed of the first node and the second node is slower than the processing speed of the third node and the fourth node, an amount of processing of each of the third node and the fourth node of the second group in the parallel distributed processing is less than the amount of processing of the first node and the second node of the 
first group” amount to generally linking the use of a judicial exception to a particular technological environment, which do not amount to significantly more. See MPEP 2106.05(h). Therefore, the claim is not patent eligible.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-4 and 6 are rejected under 35 U.S.C. 103 as being unpatentable over Chilimbi et al. (US 2015/0324690 A1) in view of Chapelle et al. (US 2013/0290223 A1).
Regarding Claim 1,
Chilimbi et al. teaches A system comprising: a first node and a second node of a first group; and a third node and a fourth node of a second group, wherein (Fig. 7 teaches a system comprising at least a Machine 1 (first node) and Machine 2 (second node) of the Replica 704A group (a first group), and Machine 1 (third node) and Machine 2 (fourth node) of the Replica 704N group (a second group); pg. 5 [0052]: “Machines 1-M may be any of the machines 610 in FIG. 6” and pg. 4 [0045] teach each machine can be a computer),
in a case where the first node and the second node perform nth (n is a natural number) parallel distributed processing (Fig. 5 and Fig. 7 and pg. 1 [0008]: “large-scale distributed systems comprised of tens of thousands of CPU cores for training large deep neural networks, as shown in FIG. 5. The system architecture 500 shown in FIG. 5 leverages model and data parallelism. Model worker machines are arranged into model replicas such as 502A, 502B, and 502C. Large models are partitioned across the multiple model worker machines in each model replica (e.g., 502A-C) enabling the model computation to proceed in parallel” teach first machine (first node) and second machine (second node) in each model replica perform parallel distributed processing of data continuously), 
the first node is configured to calculate a first gradient to update a first weight of objective function to a second weight and the second node is configured to calculate a second gradient to update the first weight of the objective function to the second weight (pg. 6 [0065]: “In such an embodiment, rather than directly sending the weight updates, the activation and error gradient vectors may be sent to the global parameter server(s) 706, as shown by arrows 712 in FIG. 7, where the matrix multiply can be performed locally to compute and apply the weight updates” and pg. 6 [0066]: “The global parameter server(s) 706 may be in constant communication with the model training machines (e.g., Machine 1, Machine 2, etc.) receiving updates to model parameters and sending the current weight values. These communications are illustrated by arrows 712 and 714. Each of the replicas 704A-704N compute weight updates locally from the error and activation terms” teach each machine calculates activation and error gradient vectors that are sent to the global parameter server, which are then used to calculate weight updates in that Machine 1 of replica 704A (first node) calculates a first gradient to update first weight to a second weight, and Machine 2 of replica 704A (second node) calculates a second gradient to update first weight to second weight; pg. 2 [0029] teaches weights are associated with an objective function),
and in a case where the third node and the fourth node perform mth (m is a natural number) parallel distributed processing (Fig. 5 and Fig. 7 and pg. 1 [0008]: “large-scale distributed systems comprised of tens of thousands of CPU cores for training large deep neural networks, as shown in FIG. 5. The system architecture 500 shown in FIG. 5 leverages model and data parallelism. Model worker machines are arranged into model replicas such as 502A, 502B, and 502C. Large models are partitioned across the multiple model worker machines in each model replica (e.g., 502A-C) enabling the model computation to proceed in parallel” teach first and second machines in another replica (correspond to third and fourth node) perform parallel distributed processing of data continuously), 
the third node is configured to calculate a third gradient to update a third weight of the objective function to a fourth weight and the fourth node is configured to calculate a fourth gradient to update the third weight of the objective function to the fourth weight (Fig. 7 and pg. 6 [0065]: “In such an embodiment, rather than directly sending the weight updates, the activation and error gradient vectors may be sent to the global parameter server(s) 706, as shown by arrows 712 in FIG. 7, where the matrix multiply can be performed locally to compute and apply the weight updates” and pg. 6 [0066]: “The global parameter server(s) 706 may be in constant communication with the model training machines (e.g., Machine 1, Machine 2, etc.) receiving updates to model parameters and sending the current weight values. These communications are illustrated by arrows 712 and 714. Each of the replicas 704A-704N compute weight updates locally from the error and activation terms” teach each machine calculates activation and error gradient vectors that are sent to the global parameter server, which are then used to calculate weight updates in that Machine 1 of replica 704N (third node) calculates a third gradient to update a third weight to a fourth weight, and Machine 2 of replica 704N  (fourth node) calculates a fourth gradient to update third weight to fourth weight; pg. 2 [0029] teaches weights are associated with an objective function).
Chilimbi et al. does not appear to explicitly teach if the calculation of gradient by the first node and the second node is faster than the calculation of gradient by the third node and the fourth node, in n+1th parallel distributed processing performed by the first node and the second node, the second weight updated from the first weight is further updated using the first and second gradients, and, in m+1th parallel distributed processing performed by the third node and the fourth node, the fourth weight updated from the third weight is further updated using the first to fourth gradients.
However, Chapelle et al. teaches if the calculation of gradient by the first node and the second node is faster than the calculation of gradient by the third node and the fourth node, in n+1th parallel distributed processing performed by the first node and the second node, the second weight updated from the first weight is further updated using the first and second gradients (Fig. 1 and pg. 4 [0043]: “The default replication factor in HADOOP is 3. However, it is understood that a different replication factor, i.e., different number of competing nodes, may be applied in other examples. The same machine learning process is then performed by each competing node 500, 502, 504 on the replicated data...the coordination node 108 may determine an operation node from the competing nodes with the replicated training data based on a processing speed of each competing node. In another example, instead of waiting for one competing node to finish the job, the coordination node 108 may inquire the status from each competing node after a certain time period. As shown in FIG. 5, the competing node 2 502 reports a "failed" status, which may be caused by any machine failure; the competing node 3 504 reports a "delayed" status, which indicates that the competing node 3 is busy handling other jobs. Nevertheless, once an operation node is determined from the competing nodes, the coordination node 108 then sends a connection instruction to the operation node, as described above. It is understood that, in case all the competing nodes are failed or delayed, the coordination node 108 may transfer the replicated data to an available node in the cluster 104 where the job can be executed” teach determining if the processing speed of a first and second nodes is faster than the processing speed of a third and fourth nodes (for example, if third and fourth nodes have “delayed” or “failed” status), and the data processing will be performed by the first and second node; pg. 
3 [0033]: “The present disclosure describes method, system, and programming aspects of efficient and reliable large scale distributed machine learning on a cluster. The method and system as disclosed herein aim at efficiently and effectively parallel learning very large datasets, including for example, trillions of features, billions of training samples, and millions of parameters, with a good predictive accuracy” teaches parallel distributed processing; pg. 6 [0052]: “For example, a stochastic gradient descent process, or any online optimization algorithm, is performed in each operation node for calculating the initial local parameter in the first iteration. At block 1002, the initial aggregated parameter is transmitted to each operation node in accordance with the network topology. That is, in the first iteration, a reduce operation is performed to sum up all local parameters calculated based on a rapid initial optimization algorithm by all operation nodes, followed by a broadcast operation that provides the initial aggregated parameter to each operation node...For example, after the first iteration, a batch gradient descent process, or any batch optimization algorithm, is performed in each operation node for calculating the updated local parameter based on the initial aggregated parameter obtained from the first iteration and the local training data” and pg. 6 [0058]: “The first algorithm starts with each node making one online pass over its local data according to adaptive gradient updates modified for loss nonlinearity. AllReduce operation is used to average these weights non-uniformly using the local gradients (local parameters)” teach data processing includes updating weights using gradients in multiple iterations and updating a second weight from gradients calculated by multiple nodes; Fig. 1 teaches at least a first to fourth node), 
and, in m+1th parallel distributed processing performed by the third node and the fourth node, the fourth weight updated from the third weight is further updated using the first to fourth gradients (pg. 6 [0052]: “For example, a stochastic gradient descent process, or any online optimization algorithm, is performed in each operation node for calculating the initial local parameter in the first iteration. At block 1002, the initial aggregated parameter is transmitted to each operation node in accordance with the network topology. That is, in the first iteration, a reduce operation is performed to sum up all local parameters calculated based on a rapid initial optimization algorithm by all operation nodes, followed by a broadcast operation that provides the initial aggregated parameter to each operation node...For example, after the first iteration, a batch gradient descent process, or any batch optimization algorithm, is performed in each operation node for calculating the updated local parameter based on the initial aggregated parameter obtained from the first iteration and the local training data” and pg. 6 [0058]: “The first algorithm starts with each node making one online pass over its local data according to adaptive gradient updates modified for loss nonlinearity. AllReduce operation is used to average these weights non-uniformly using the local gradients (local parameters)” teach data processing includes updating weights using gradients in multiple iterations and updating a fourth weight from gradients calculated by multiple nodes; Fig. 1 teaches at least a first to fourth node).
Chilimbi et al. and Chapelle et al. are analogous art to the claimed invention because they are directed to distributed data processing. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the above limitation(s) as taught by Chapelle et al. into the disclosed invention of Chilimbi et al.
One of ordinary skill in the art would have been motivated to make this modification to leverage the “method, system, and programming aspects of efficient and reliable large scale distributed machine learning on a cluster. The method and system as disclosed herein aim at efficiently and effectively parallel learning very large datasets, including for example, trillions of features, billions of training samples, and millions of parameters, with a good predictive accuracy” (Chapelle et al. pg. 3 [0033]).
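For illustration only, the asymmetric update recited in the final limitation of claim 1 may be sketched as follows. This is a minimal sketch and is not taken from the application or from either reference: the function names (sgd_step, next_round), the plain gradient-descent rule, the learning rate, and the summing of gradients are all assumptions, because the claim does not specify how the gradients are combined.

```python
def sgd_step(weight, gradients, lr=0.1):
    """Apply one plain gradient-descent step using the sum of the gradients.

    Summation and the learning rate are assumptions made for illustration.
    """
    return weight - lr * sum(gradients)


def next_round(second_weight, fourth_weight, first_group_grads, second_group_grads, lr=0.1):
    """Model the (n+1)th and (m+1)th rounds when the first group is faster.

    The faster first group further updates its weight with only its own
    (first and second) gradients, while the slower second group further
    updates its weight with the first to fourth gradients.
    """
    new_first_group_weight = sgd_step(second_weight, first_group_grads, lr)
    new_second_group_weight = sgd_step(
        fourth_weight, first_group_grads + second_group_grads, lr
    )
    return new_first_group_weight, new_second_group_weight


if __name__ == "__main__":
    # Hypothetical values: gradients 0.5 and 0.5 from the first group,
    # gradients 1.0 and 1.0 from the second group.
    print(next_round(1.0, 1.0, [0.5, 0.5], [1.0, 1.0]))
```

Under these assumptions the second group's weight moves further than the first group's weight, because it also absorbs the gradients already produced by the faster group.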
Regarding Claim 2,
Chilimbi et al. teaches A system comprising; a first node and a second node of a first group; and a third node and a fourth node of a second group, wherein (Fig. 7 teaches a system comprising at least a Machine 1 (first node) and Machine 2 (second node) of the Replica 704A group (a first group), and Machine 1 (third node) and Machine 2 (fourth node) of the Replica 704N group (a second group); pg. 5 [0052]: “Machines 1-M may be any of the machines 610 in FIG. 6” and pg. 4 [0045] teach each machine can be a computer),
in a case where the first node and the second node perform nth (n is a natural number) parallel distributed processing (Fig. 5 and Fig. 7 and pg. 1 [0008]: “large-scale distributed systems comprised of tens of thousands of CPU cores for training large deep neural networks, as shown in FIG. 5. The system architecture 500 shown in FIG. 5 leverages model and data parallelism. Model worker machines are arranged into model replicas such as 502A, 502B, and 502C. Large models are partitioned across the multiple model worker machines in each model replica (e.g., 502A-C) enabling the model computation to proceed in parallel” teach first machine (first node) and second machine (second node) in each model replica perform parallel distributed processing of data continuously), 
the first node is configured to calculate a first gradient to update a first weight of objective function to a second weight and the second node is configured to calculate a second gradient to update the first weight of the objective function to the second weight (pg. 6 [0065]: “In such an embodiment, rather than directly sending the weight updates, the activation and error gradient vectors may be sent to the global parameter server(s) 706, as shown by arrows 712 in FIG. 7, where the matrix multiply can be performed locally to compute and apply the weight updates” and pg. 6 [0066]: “The global parameter server(s) 706 may be in constant communication with the model training machines (e.g., Machine 1, Machine 2, etc.) receiving updates to model parameters and sending the current weight values. These communications are illustrated by arrows 712 and 714. Each of the replicas 704A-704N compute weight updates locally from the error and activation terms” teach each machine calculates activation and error gradient vectors that are sent to the global parameter server, which are then used to calculate weight updates in that Machine 1 of replica 704A (first node) calculates a first gradient to update first weight to a second weight, and Machine 2 of replica 704A (second node) calculates a second gradient to update first weight to second weight; pg. 2 [0029] teaches weights are associated with an objective function),
and in a case where the third node and the fourth node perform mth (m is a natural number) parallel distributed processing (Fig. 5 and Fig. 7 and pg. 1 [0008]: “large-scale distributed systems comprised of tens of thousands of CPU cores for training large deep neural networks, as shown in FIG. 5. The system architecture 500 shown in FIG. 5 leverages model and data parallelism. Model worker machines are arranged into model replicas such as 502A, 502B, and 502C. Large models are partitioned across the multiple model worker machines in each model replica (e.g., 502A-C) enabling the model computation to proceed in parallel” teach first and second machines in another replica (correspond to third and fourth node) perform parallel distributed processing of data continuously), 
the third node is configured to calculate a third gradient to update a third weight of the objective function to a fourth weight and the fourth node is configured to calculate a fourth gradient to update the third weight of the objective function to the fourth weight (Fig. 7 and pg. 6 [0065]: “In such an embodiment, rather than directly sending the weight updates, the activation and error gradient vectors may be sent to the global parameter server(s) 706, as shown by arrows 712 in FIG. 7, where the matrix multiply can be performed locally to compute and apply the weight updates” and pg. 6 [0066]: “The global parameter server(s) 706 may be in constant communication with the model training machines (e.g., Machine 1, Machine 2, etc.) receiving updates to model parameters and sending the current weight values. These communications are illustrated by arrows 712 and 714. Each of the replicas 704A-704N compute weight updates locally from the error and activation terms” teach each machine calculates activation and error gradient vectors that are sent to the global parameter server, which are then used to calculate weight updates in that Machine 1 of replica 704N (third node) calculates a third gradient to update a third weight to a fourth weight, and Machine 2 of replica 704N  (fourth node) calculates a fourth gradient to update third weight to fourth weight; pg. 2 [0029] teaches weights are associated with an objective function).
Chilimbi et al. does not appear to explicitly teach if the calculation of gradient by the third node and the fourth node is faster than the calculation of gradient by the first node and the second node, in n+1th parallel distributed processing performed by the first node and the second node, the second weight updated from the first weight is further updated using the first to fourth gradients, and, in m+1th parallel distributed processing performed by the third node and the fourth node, the fourth weight updated from the third weight is further updated using the third and fourth gradients.
However, Chapelle et al. teaches if the calculation of gradient by the third node and the fourth node is faster than the calculation of gradient by the first node and the second node, in n+1th parallel distributed processing performed by the first node and the second node, the second weight updated from the first weight is further updated using the first to fourth gradients (Fig. 1 and pg. 4 [0043]: “The default replication factor in HADOOP is 3. However, it is understood that a different replication factor, i.e., different number of competing nodes, may be applied in other examples. The same machine learning process is then performed by each competing node 500, 502, 504 on the replicated data...the coordination node 108 may determine an operation node from the competing nodes with the replicated training data based on a processing speed of each competing node. In another example, instead of waiting for one competing node to finish the job, the coordination node 108 may inquire the status from each competing node after a certain time period. As shown in FIG. 5, the competing node 2 502 reports a "failed" status, which may be caused by any machine failure; the competing node 3 504 reports a "delayed" status, which indicates that the competing node 3 is busy handling other jobs. Nevertheless, once an operation node is determined from the competing nodes, the coordination node 108 then sends a connection instruction to the operation node, as described above. It is understood that, in case all the competing nodes are failed or delayed, the coordination node 108 may transfer the replicated data to an available node in the cluster 104 where the job can be executed” teach determining if the processing speed of a third node and the fourth node is faster than the processing speed of a first and second nodes (for example, if first and second nodes have “delayed” or “failed” status), and the data processing will be performed by the first and second node; pg. 
3 [0033]: “The present disclosure describes method, system, and programming aspects of efficient and reliable large scale distributed machine learning on a cluster. The method and system as disclosed herein aim at efficiently and effectively parallel learning very large datasets, including for example, trillions of features, billions of training samples, and millions of parameters, with a good predictive accuracy” teaches parallel distributed processing; pg. 6 [0052]: “For example, a stochastic gradient descent process, or any online optimization algorithm, is performed in each operation node for calculating the initial local parameter in the first iteration. At block 1002, the initial aggregated parameter is transmitted to each operation node in accordance with the network topology. That is, in the first iteration, a reduce operation is performed to sum up all local parameters calculated based on a rapid initial optimization algorithm by all operation nodes, followed by a broadcast operation that provides the initial aggregated parameter to each operation node...For example, after the first iteration, a batch gradient descent process, or any batch optimization algorithm, is performed in each operation node for calculating the updated local parameter based on the initial aggregated parameter obtained from the first iteration and the local training data” and pg. 6 [0058]: “The first algorithm starts with each node making one online pass over its local data according to adaptive gradient updates modified for loss nonlinearity. AllReduce operation is used to average these weights non-uniformly using the local gradients (local parameters)” teach data processing includes updating weights using gradients in multiple iterations and updating a second weight from gradients calculated by multiple nodes; Fig. 1 teaches at least a first to fourth node), 
and, in m+1th parallel distributed processing performed by the third node and the fourth node, the fourth weight updated from the third weight is further updated using the third and fourth gradients (pg. 6 [0052]: “For example, a stochastic gradient descent process, or any online optimization algorithm, is performed in each operation node for calculating the initial local parameter in the first iteration. At block 1002, the initial aggregated parameter is transmitted to each operation node in accordance with the network topology. That is, in the first iteration, a reduce operation is performed to sum up all local parameters calculated based on a rapid initial optimization algorithm by all operation nodes, followed by a broadcast operation that provides the initial aggregated parameter to each operation node...For example, after the first iteration, a batch gradient descent process, or any batch optimization algorithm, is performed in each operation node for calculating the updated local parameter based on the initial aggregated parameter obtained from the first iteration and the local training data” and pg. 6 [0058]: “The first algorithm starts with each node making one online pass over its local data according to adaptive gradient updates modified for loss nonlinearity. AllReduce operation is used to average these weights non-uniformly using the local gradients (local parameters)” teach data processing includes updating weights using gradients in multiple iterations and updating a fourth weight from gradients calculated by multiple nodes; Fig. 1 teaches at least a first to fourth node).
Chilimbi et al. and Chapelle et al. are analogous art to the claimed invention because they are directed to distributed data processing. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the above limitation(s) as taught by Chapelle et al. into the disclosed invention of Chilimbi et al.
One of ordinary skill in the art would have been motivated to make this modification to leverage the “method, system, and programming aspects of efficient and reliable large scale distributed machine learning on a cluster. The method and system as disclosed herein aim at efficiently and effectively parallel learning very large datasets, including for example, trillions of features, billions of training samples, and millions of parameters, with a good predictive accuracy” (Chapelle et al. pg. 3 [0033]).
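For illustration only, the reduce operation followed by a broadcast operation quoted above from Chapelle et al. pg. 6 [0052] may be sketched as follows. This is a minimal single-process sketch under stated assumptions: the function name all_reduce_sum and the list-of-lists representation of each node's local parameters are hypothetical, and no actual network communication or HADOOP interface is modeled.

```python
def all_reduce_sum(local_params):
    """Sum the local parameters of all nodes, then give every node the result.

    local_params is a list with one parameter vector (a list of floats) per
    node; all vectors are assumed to have the same length.
    """
    # Reduce: element-wise sum across all nodes.
    aggregated = [sum(values) for values in zip(*local_params)]
    # Broadcast: each node receives its own copy of the aggregated parameter.
    return [list(aggregated) for _ in local_params]


if __name__ == "__main__":
    # Hypothetical local parameters computed by three operation nodes.
    per_node = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    print(all_reduce_sum(per_node))
```

After the call, every node holds the same aggregated parameter vector, which corresponds to the quoted description of providing the aggregated parameter to each operation node before the next iteration.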
Regarding Claim 3,
Chilimbi et al. teaches A system comprising: a first node and a second node of a first group; and a third node and a fourth node of a second group, wherein (Fig. 7 teaches a system comprising at least a Machine 1 (first node) and Machine 2 (second node) of the Replica 704A group (a first group), and Machine 1 (third node) and Machine 2 (fourth node) of the Replica 704N group (a second group); pg. 5 [0052]: “Machines 1-M may be any of the machines 610 in FIG. 6” and pg. 4 [0045] teach each machine can be a computer),
in a case where the first node and the second node perform nth (n is a natural number) parallel distributed processing (Fig. 5 and Fig. 7 and pg. 1 [0008]: “large-scale distributed systems comprised of tens of thousands of CPU cores for training large deep neural networks, as shown in FIG. 5. The system architecture 500 shown in FIG. 5 leverages model and data parallelism. Model worker machines are arranged into model replicas such as 502A, 502B, and 502C. Large models are partitioned across the multiple model worker machines in each model replica (e.g., 502A-C) enabling the model computation to proceed in parallel” teach first machine (first node) and second machine (second node) in each model replica perform parallel distributed processing of data continuously), 
the first node is configured to calculate a first gradient to update a first weight of objective function to a second weight and the second node is configured to calculate a second gradient to update the first weight of the objective function to the second weight (pg. 6 [0065]: “In such an embodiment, rather than directly sending the weight updates, the activation and error gradient vectors may be sent to the global parameter server(s) 706, as shown by arrows 712 in FIG. 7, where the matrix multiply can be performed locally to compute and apply the weight updates” and pg. 6 [0066]: “The global parameter server(s) 706 may be in constant communication with the model training machines (e.g., Machine 1, Machine 2, etc.) receiving updates to model parameters and sending the current weight values. These communications are illustrated by arrows 712 and 714. Each of the replicas 704A-704N compute weight updates locally from the error and activation terms” teach each machine calculates activation and error gradient vectors that are sent to the global parameter server, which are then used to calculate weight updates in that Machine 1 of replica 704A (first node) calculates a first gradient to update first weight to a second weight, and Machine 2 of replica 704A (second node) calculates a second gradient to update first weight to second weight; pg. 2 [0029] teaches weights are associated with an objective function),
and in a case where the third node and the fourth node perform mth (m is a natural number) parallel distributed processing (Fig. 5 and Fig. 7 and pg. 1 [0008]: “large-scale distributed systems comprised of tens of thousands of CPU cores for training large deep neural networks, as shown in FIG. 5. The system architecture 500 shown in FIG. 5 leverages model and data parallelism. Model worker machines are arranged into model replicas such as 502A, 502B, and 502C. Large models are partitioned across the multiple model worker machines in each model replica (e.g., 502A-C) enabling the model computation to proceed in parallel” teach first and second machines in another replica (correspond to third and fourth node) perform parallel distributed processing of data continuously), 
the third node is configured to calculate a third gradient to update a third weight of the objective function to a fourth weight and the fourth node is configured to calculate a fourth gradient to update the third weight of the objective function to the fourth weight (Fig. 7 and pg. 6 [0065]: “In such an embodiment, rather than directly sending the weight updates, the activation and error gradient vectors may be sent to the global parameter server(s) 706, as shown by arrows 712 in FIG. 7, where the matrix multiply can be performed locally to compute and apply the weight updates” and pg. 6 [0066]: “The global parameter server(s) 706 may be in constant communication with the model training machines (e.g., Machine 1, Machine 2, etc.) receiving updates to model parameters and sending the current weight values. These communications are illustrated by arrows 712 and 714. Each of the replicas 704A-704N compute weight updates locally from the error and activation terms” teach each machine calculates activation and error gradient vectors that are sent to the global parameter server, which are then used to calculate weight updates in that Machine 1 of replica 704N (third node) calculates a third gradient to update a third weight to a fourth weight, and Machine 2 of replica 704N  (fourth node) calculates a fourth gradient to update third weight to fourth weight; pg. 2 [0029] teaches weights are associated with an objective function).
Chilimbi et al. does not appear to explicitly teach if the calculation of gradient by the first node and the second node is faster than the calculation of gradient by the third node and the fourth node, in n+1th parallel distributed processing performed by the first node and the second node, the second weight updated from the first weight is further updated using the first and second gradients, and, in m+1th parallel distributed processing performed by the third node and the fourth node, the fourth weight updated from the third weight is further updated using the first to fourth gradients, and
if the calculation of gradient by the third node and the fourth node is faster than the calculation of gradient by the first node and the second node, in n+1th parallel distributed processing performed by the first node and the second node, the second weight updated from the first weight is further updated using the first to fourth gradients, and, in m+1th parallel distributed processing performed by the third node and the fourth node, the fourth weight updated from the third weight is further updated using the third and fourth gradients.
However, Chapelle et al. teaches if the calculation of gradient by the first node and the second node is faster than the calculation of gradient by the third node and the fourth node, in n+1th parallel distributed processing performed by the first node and the second node, the second weight updated from the first weight is further updated using the first and second gradients (Fig. 1 and pg. 4 [0043]: “The default replication factor in HADOOP is 3. However, it is understood that a different replication factor, i.e., different number of competing nodes, may be applied in other examples. The same machine learning process is then performed by each competing node 500, 502, 504 on the replicated data...the coordination node 108 may determine an operation node from the competing nodes with the replicated training data based on a processing speed of each competing node. In another example, instead of waiting for one competing node to finish the job, the coordination node 108 may inquire the status from each competing node after a certain time period. As shown in FIG. 5, the competing node 2 502 reports a "failed" status, which may be caused by any machine failure; the competing node 3 504 reports a "delayed" status, which indicates that the competing node 3 is busy handling other jobs. Nevertheless, once an operation node is determined from the competing nodes, the coordination node 108 then sends a connection instruction to the operation node, as described above. It is understood that, in case all the competing nodes are failed or delayed, the coordination node 108 may transfer the replicated data to an available node in the cluster 104 where the job can be executed” teach determining if the processing speed of a first and second nodes is faster than the processing speed of a third and fourth nodes (for example, if third and fourth nodes have “delayed” or “failed” status), and the data processing will be performed by the first and second node; pg. 3 [0033]: “The present disclosure describes method, system, and programming aspects of efficient and reliable large scale distributed machine learning on a cluster. The method and system as disclosed herein aim at efficiently and effectively parallel learning very large datasets, including for example, trillions of features, billions of training samples, and millions of parameters, with a good predictive accuracy” teaches parallel distributed processing; pg. 6 [0052]: “For example, a stochastic gradient descent process, or any online optimization algorithm, is performed in each operation node for calculating the initial local parameter in the first iteration. At block 1002, the initial aggregated parameter is transmitted to each operation node in accordance with the network topology. That is, in the first iteration, a reduce operation is performed to sum up all local parameters calculated based on a rapid initial optimization algorithm by all operation nodes, followed by a broadcast operation that provides the initial aggregated parameter to each operation node...For example, after the first iteration, a batch gradient descent process, or any batch optimization algorithm, is performed in each operation node for calculating the updated local parameter based on the initial aggregated parameter obtained from the first iteration and the local training data” and pg. 6 [0058]: “The first algorithm starts with each node making one online pass over its local data according to adaptive gradient updates modified for loss nonlinearity. AllReduce operation is used to average these weights non-uniformly using the local gradients (local parameters)” teach data processing includes updating weights using gradients in multiple iterations and updating a second weight from gradients calculated by multiple nodes; Fig. 1 teaches at least a first to fourth node),
and, in m+1th parallel distributed processing performed by the third node and the fourth node, the fourth weight updated from the third weight is further updated using the first to fourth gradients (pg. 6 [0052]: “For example, a stochastic gradient descent process, or any online optimization algorithm, is performed in each operation node for calculating the initial local parameter in the first iteration. At block 1002, the initial aggregated parameter is transmitted to each operation node in accordance with the network topology. That is, in the first iteration, a reduce operation is performed to sum up all local parameters calculated based on a rapid initial optimization algorithm by all operation nodes, followed by a broadcast operation that provides the initial aggregated parameter to each operation node...For example, after the first iteration, a batch gradient descent process, or any batch optimization algorithm, is performed in each operation node for calculating the updated local parameter based on the initial aggregated parameter obtained from the first iteration and the local training data” and pg. 6 [0058]: “The first algorithm starts with each node making one online pass over its local data according to adaptive gradient updates modified for loss nonlinearity. AllReduce operation is used to average these weights non-uniformly using the local gradients (local parameters)” teach data processing includes updating weights using gradients in multiple iterations and updating a fourth weight from gradients calculated by multiple nodes; Fig. 1 teaches at least a first to fourth node),
and if the calculation of gradient by the third node and the fourth node is faster than the calculation of gradient by the first node and the second node, in n+1th parallel distributed processing performed by the first node and the second node, the second weight updated from the first weight is further updated using the first to fourth gradients (Fig. 1 and pg. 4 [0043]: “The default replication factor in HADOOP is 3. However, it is understood that a different replication factor, i.e., different number of competing nodes, may be applied in other examples. The same machine learning process is then performed by each competing node 500, 502, 504 on the replicated data...the coordination node 108 may determine an operation node from the competing nodes with the replicated training data based on a processing speed of each competing node. In another example, instead of waiting for one competing node to finish the job, the coordination node 108 may inquire the status from each competing node after a certain time period. As shown in FIG. 5, the competing node 2 502 reports a "failed" status, which may be caused by any machine failure; the competing node 3 504 reports a "delayed" status, which indicates that the competing node 3 is busy handling other jobs. Nevertheless, once an operation node is determined from the competing nodes, the coordination node 108 then sends a connection instruction to the operation node, as described above. It is understood that, in case all the competing nodes are failed or delayed, the coordination node 108 may transfer the replicated data to an available node in the cluster 104 where the job can be executed” teach determining if the processing speed of a third node and the fourth node is faster than a first and second nodes (for example, if first and second nodes have “delayed” or “failed” status), and the data processing will be performed by the first and second node; pg. 3 [0033]: “The present disclosure describes method, system, and programming aspects of efficient and reliable large scale distributed machine learning on a cluster. The method and system as disclosed herein aim at efficiently and effectively parallel learning very large datasets, including for example, trillions of features, billions of training samples, and millions of parameters, with a good predictive accuracy” teaches parallel distributed processing; pg. 6 [0052]: “For example, a stochastic gradient descent process, or any online optimization algorithm, is performed in each operation node for calculating the initial local parameter in the first iteration. At block 1002, the initial aggregated parameter is transmitted to each operation node in accordance with the network topology. That is, in the first iteration, a reduce operation is performed to sum up all local parameters calculated based on a rapid initial optimization algorithm by all operation nodes, followed by a broadcast operation that provides the initial aggregated parameter to each operation node...For example, after the first iteration, a batch gradient descent process, or any batch optimization algorithm, is performed in each operation node for calculating the updated local parameter based on the initial aggregated parameter obtained from the first iteration and the local training data” and pg. 6 [0058]: “The first algorithm starts with each node making one online pass over its local data according to adaptive gradient updates modified for loss nonlinearity. AllReduce operation is used to average these weights non-uniformly using the local gradients (local parameters)” teach data processing includes updating weights using gradients in multiple iterations and updating a second weight from gradients calculated by multiple nodes; Fig. 1 teaches at least a first to fourth node),
and, in m+1th parallel distributed processing performed by the third node and the fourth node, the fourth weight updated from the third weight is further updated using the third and fourth gradients (pg. 6 [0052]: “For example, a stochastic gradient descent process, or any online optimization algorithm, is performed in each operation node for calculating the initial local parameter in the first iteration. At block 1002, the initial aggregated parameter is transmitted to each operation node in accordance with the network topology. That is, in the first iteration, a reduce operation is performed to sum up all local parameters calculated based on a rapid initial optimization algorithm by all operation nodes, followed by a broadcast operation that provides the initial aggregated parameter to each operation node...For example, after the first iteration, a batch gradient descent process, or any batch optimization algorithm, is performed in each operation node for calculating the updated local parameter based on the initial aggregated parameter obtained from the first iteration and the local training data” and pg. 6 [0058]: “The first algorithm starts with each node making one online pass over its local data according to adaptive gradient updates modified for loss nonlinearity. AllReduce operation is used to average these weights non-uniformly using the local gradients (local parameters)” teach data processing includes updating weights using gradients in multiple iterations and updating a fourth weight from gradients calculated by multiple nodes; Fig. 1 teaches at least a first to fourth node).
Chilimbi et al. and Chapelle et al. are analogous art to the claimed invention because they are directed to distributed data processing. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the above limitation(s) as taught by Chapelle et al. into the disclosed invention of Chilimbi et al.
One of ordinary skill in the art would have been motivated to make this modification to leverage the “method, system, and programming aspects of efficient and reliable large scale distributed machine learning on a cluster. The method and system as disclosed herein aim at efficiently and effectively parallel learning very large datasets, including for example, trillions of features, billions of training samples, and millions of parameters, with a good predictive accuracy” (Chapelle et al. pg. 3 [0033]).
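For illustration only, the speed-dependent update recited in the limitation discussed above can be sketched in Python as follows. The sketch is not taken from Chilimbi et al., Chapelle et al., or the application: the function name `next_round_weights`, the learning rate `lr`, the plain gradient-descent step, and the summing of gradients are all assumptions, since the claim language does not specify how the gradients are combined.

```python
from typing import Sequence, Tuple


def next_round_weights(
    w2: float,
    w4: float,
    group1_grads: Sequence[float],  # first and second gradients
    group2_grads: Sequence[float],  # third and fourth gradients
    group1_faster: bool,
    lr: float = 0.1,                # hypothetical learning rate
) -> Tuple[float, float]:
    """Return the weights used after the n+1th / m+1th round of each group.

    Assumption: a weight is updated by subtracting lr times the sum of the
    gradients available to the group.
    """
    own1 = sum(group1_grads)
    own2 = sum(group2_grads)
    if group1_faster:
        # The faster first group uses only its own gradients;
        # the slower second group uses all four gradients.
        return w2 - lr * own1, w4 - lr * (own1 + own2)
    # The faster second group uses only its own gradients;
    # the slower first group uses all four gradients.
    return w2 - lr * (own1 + own2), w4 - lr * own2
```

Under these assumptions, the slower group is the only one whose update depends on gradients from both groups, which is the asymmetry the limitation describes.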
Regarding Claim 4,
Chilimbi et al. in view of Chapelle et al. teaches the system of Claim 1.
Chilimbi et al. further teaches wherein the second weight is updated using a fifth gradient calculated from the first and second gradients (Fig. 7 and pg. 6 [0065]: “In such an embodiment, rather than directly sending the weight updates, the activation and error gradient vectors may be sent to the global parameter server(s) 706, as shown by arrows 712 in FIG. 7, where the matrix multiply can be performed locally to compute and apply the weight updates” and pg. 6 [0066]: “The global parameter server(s) 706 may be in constant communication with the model training machines (e.g., Machine 1, Machine 2, etc.) receiving updates to model parameters and sending the current weight values. These communications are illustrated by arrows 712 and 714. Each of the replicas 704A-704N compute weight updates locally from the error and activation terms” teach each machine calculates activation and error gradient vectors that are sent to the global parameter server, which are then used to calculate weight updates in a continuous and iterative manner, which corresponds to second weight is updated using a fifth gradient calculated from the first and second gradients; Fig. 7 teaches multiple machines providing multiple gradients), 
and the fourth weight is updated using a sixth gradient calculated from the third and fourth gradients (Fig. 7 and pg. 6 [0065]: “In such an embodiment, rather than directly sending the weight updates, the activation and error gradient vectors may be sent to the global parameter server(s) 706, as shown by arrows 712 in FIG. 7, where the matrix multiply can be performed locally to compute and apply the weight updates” and pg. 6 [0066]: “The global parameter server(s) 706 may be in constant communication with the model training machines (e.g., Machine 1, Machine 2, etc.) receiving updates to model parameters and sending the current weight values. These communications are illustrated by arrows 712 and 714. Each of the replicas 704A-704N compute weight updates locally from the error and activation terms” teach each machine calculates activation and error gradient vectors that are sent to the global parameter server, which are then used to calculate weight updates in a continuous and iterative manner, which corresponds to fourth weight is updated using a sixth gradient calculated from the third and fourth gradients; Fig. 7 teaches multiple machines providing multiple gradients).
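As a hedged illustration of the "fifth gradient" and "sixth gradient" limitations, the following Python sketch derives one combined gradient per group and applies it to that group's weight. The arithmetic mean, the learning rate `lr`, and the helper names `combined_gradient` and `apply_update` are assumptions for illustration; neither the claim nor the cited passages of Chilimbi et al. specify how the combined gradient is calculated.

```python
from typing import Sequence


def combined_gradient(grads: Sequence[float]) -> float:
    """Combine per-node gradients into one group gradient (assumed: mean)."""
    return sum(grads) / len(grads)


def apply_update(weight: float, grad: float, lr: float = 0.1) -> float:
    """One gradient-descent step; lr is a hypothetical learning rate."""
    return weight - lr * grad


# Fifth gradient, calculated from the first and second gradients.
fifth = combined_gradient([0.5, 1.5])
# Sixth gradient, calculated from the third and fourth gradients.
sixth = combined_gradient([2.0, 4.0])
```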
Regarding Claim 6,
Chilimbi et al. in view of Chapelle et al. teaches the system of Claim 3.
Chapelle et al. further teaches wherein, if the calculation of gradient by the first node and the second node is faster than the calculation of gradient by the third node and the fourth node, the first and second gradients are transmitted to the third node and the fourth node (Fig. 1 and pg. 4 [0043]: “The default replication factor in HADOOP is 3. However, it is understood that a different replication factor, i.e., different number of competing nodes, may be applied in other examples. The same machine learning process is then performed by each competing node 500, 502, 504 on the replicated data...the coordination node 108 may determine an operation node from the competing nodes with the replicated training data based on a processing speed of each competing node. In another example, instead of waiting for one competing node to finish the job, the coordination node 108 may inquire the status from each competing node after a certain time period. As shown in FIG. 5, the competing node 2 502 reports a "failed" status, which may be caused by any machine failure; the competing node 3 504 reports a "delayed" status, which indicates that the competing node 3 is busy handling other jobs. Nevertheless, once an operation node is determined from the competing nodes, the coordination node 108 then sends a connection instruction to the operation node, as described above. It is understood that, in case all the competing nodes are failed or delayed, the coordination node 108 may transfer the replicated data to an available node in the cluster 104 where the job can be executed” teach determining if the processing speed of a first and second nodes is faster than a third and fourth nodes (for example, if third and fourth nodes have “delayed” or “failed” status); pg. 3 [0033]: “The present disclosure describes method, system, and programming aspects of efficient and reliable large scale distributed machine learning on a cluster. The method and system as disclosed herein aim at efficiently and effectively parallel learning very large datasets, including for example, trillions of features, billions of training samples, and millions of parameters, with a good predictive accuracy” teaches parallel distributed processing; pg. 6 [0052]: “For example, a stochastic gradient descent process, or any online optimization algorithm, is performed in each operation node for calculating the initial local parameter in the first iteration. At block 1002, the initial aggregated parameter is transmitted to each operation node in accordance with the network topology. That is, in the first iteration, a reduce operation is performed to sum up all local parameters calculated based on a rapid initial optimization algorithm by all operation nodes, followed by a broadcast operation that provides the initial aggregated parameter to each operation node...For example, after the first iteration, a batch gradient descent process, or any batch optimization algorithm, is performed in each operation node for calculating the updated local parameter based on the initial aggregated parameter obtained from the first iteration and the local training data” and pg. 6 [0058]: “The first algorithm starts with each node making one online pass over its local data according to adaptive gradient updates modified for loss nonlinearity. AllReduce operation is used to average these weights non-uniformly using the local gradients (local parameters)” teach data processing includes updating weights using gradients in multiple iterations and updating a second weight from gradients calculated and transmitted by multiple nodes; Fig. 1 teaches at least a first to fourth node),
and, in m+1th parallel distributed processing performed by the third node and the fourth node, the fourth weight is further updated using the first to fourth gradients (pg. 6 [0052]: “For example, a stochastic gradient descent process, or any online optimization algorithm, is performed in each operation node for calculating the initial local parameter in the first iteration. At block 1002, the initial aggregated parameter is transmitted to each operation node in accordance with the network topology. That is, in the first iteration, a reduce operation is performed to sum up all local parameters calculated based on a rapid initial optimization algorithm by all operation nodes, followed by a broadcast operation that provides the initial aggregated parameter to each operation node...For example, after the first iteration, a batch gradient descent process, or any batch optimization algorithm, is performed in each operation node for calculating the updated local parameter based on the initial aggregated parameter obtained from the first iteration and the local training data” and pg. 6 [0058]: “The first algorithm starts with each node making one online pass over its local data according to adaptive gradient updates modified for loss nonlinearity. AllReduce operation is used to average these weights non-uniformly using the local gradients (local parameters)” teach data processing includes updating weights using gradients in multiple iterations and updating a fourth weight from gradients calculated by multiple nodes; Fig. 1 teaches at least a first to fourth node),
and if the calculation of gradient by the third node and the fourth node is faster than the calculation of gradient by the first node and the second node, the third and fourth gradients are transmitted to the first node and the second node (Fig. 1 and pg. 4 [0043]: “The default replication factor in HADOOP is 3. However, it is understood that a different replication factor, i.e., different number of competing nodes, may be applied in other examples. The same machine learning process is then performed by each competing node 500, 502, 504 on the replicated data...the coordination node 108 may determine an operation node from the competing nodes with the replicated training data based on a processing speed of each competing node. In another example, instead of waiting for one competing node to finish the job, the coordination node 108 may inquire the status from each competing node after a certain time period. As shown in FIG. 5, the competing node 2 502 reports a "failed" status, which may be caused by any machine failure; the competing node 3 504 reports a "delayed" status, which indicates that the competing node 3 is busy handling other jobs. Nevertheless, once an operation node is determined from the competing nodes, the coordination node 108 then sends a connection instruction to the operation node, as described above. It is understood that, in case all the competing nodes are failed or delayed, the coordination node 108 may transfer the replicated data to an available node in the cluster 104 where the job can be executed” teach determining if the processing speed of a third node and the fourth node is faster than a first and second nodes (for example, if first and second nodes have “delayed” or “failed” status); pg. 3 [0033]: “The present disclosure describes method, system, and programming aspects of efficient and reliable large scale distributed machine learning on a cluster. The method and system as disclosed herein aim at efficiently and effectively parallel learning very large datasets, including for example, trillions of features, billions of training samples, and millions of parameters, with a good predictive accuracy” teaches parallel distributed processing; pg. 6 [0052]: “For example, a stochastic gradient descent process, or any online optimization algorithm, is performed in each operation node for calculating the initial local parameter in the first iteration. At block 1002, the initial aggregated parameter is transmitted to each operation node in accordance with the network topology. That is, in the first iteration, a reduce operation is performed to sum up all local parameters calculated based on a rapid initial optimization algorithm by all operation nodes, followed by a broadcast operation that provides the initial aggregated parameter to each operation node...For example, after the first iteration, a batch gradient descent process, or any batch optimization algorithm, is performed in each operation node for calculating the updated local parameter based on the initial aggregated parameter obtained from the first iteration and the local training data” and pg. 6 [0058]: “The first algorithm starts with each node making one online pass over its local data according to adaptive gradient updates modified for loss nonlinearity. AllReduce operation is used to average these weights non-uniformly using the local gradients (local parameters)” teach data processing includes updating weights using gradients in multiple iterations and updating a second weight from gradients calculated and transmitted by multiple nodes; Fig. 1 teaches at least a first to fourth node),
and, in n+1th parallel distributed processing performed by the first node and the second node, the second weight is further updated using the first to fourth gradients (pg. 6 [0052]: “For example, a stochastic gradient descent process, or any online optimization algorithm, is performed in each operation node for calculating the initial local parameter in the first iteration. At block 1002, the initial aggregated parameter is transmitted to each operation node in accordance with the network topology. That is, in the first iteration, a reduce operation is performed to sum up all local parameters calculated based on a rapid initial optimization algorithm by all operation nodes, followed by a broadcast operation that provides the initial aggregated parameter to each operation node...For example, after the first iteration, a batch gradient descent process, or any batch optimization algorithm, is performed in each operation node for calculating the updated local parameter based on the initial aggregated parameter obtained from the first iteration and the local training data” and pg. 6 [0058]: “The first algorithm starts with each node making one online pass over its local data according to adaptive gradient updates modified for loss nonlinearity. AllReduce operation is used to average these weights non-uniformly using the local gradients (local parameters)” teach data processing includes updating weights using gradients in multiple iterations and updating a second weight from gradients calculated by multiple nodes; Fig. 1 teaches at least a first to fourth node).
Chilimbi et al. and Chapelle et al. are analogous art to the claimed invention because they are directed to distributed data processing. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the above limitation(s) as taught by Chapelle et al. into the disclosed invention of Chilimbi et al.
One of ordinary skill in the art would have been motivated to make this modification to leverage the “method, system, and programming aspects of efficient and reliable large scale distributed machine learning on a cluster. The method and system as disclosed herein aim at efficiently and effectively parallel learning very large datasets, including for example, trillions of features, billions of training samples, and millions of parameters, with a good predictive accuracy” (Chapelle et al. pg. 3 [0033]).
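For illustration only, the reduce-then-broadcast pattern described in the passage of Chapelle et al. quoted above (pg. 6 [0052]: a reduce operation that sums the local parameters, followed by a broadcast of the aggregate to each operation node) can be simulated in a single Python process. This is a sketch under stated assumptions and not the reference's implementation: the function name `all_reduce_sum`, the dictionary layout, and the node labels are hypothetical, and no network communication is modeled.

```python
from typing import Dict, List


def all_reduce_sum(local_params: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Simulate a reduce (element-wise sum) followed by a broadcast.

    local_params maps a hypothetical node label to that node's local
    parameter vector; all vectors are assumed to have the same length.
    """
    vectors = list(local_params.values())
    # Reduce step: sum the local parameters element by element.
    aggregated = [sum(column) for column in zip(*vectors)]
    # Broadcast step: every node receives its own copy of the aggregate.
    return {node: list(aggregated) for node in local_params}


result = all_reduce_sum({"node1": [1.0, 2.0], "node2": [3.0, 4.0]})
```

After the call, every node in `result` holds the same aggregated vector, which is the state from which the next iteration in each node would proceed.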

Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Chilimbi et al. (US 2015/0324690 A1) in view of Chapelle et al. (US 2013/0290223 A1) and further in view of Li et al. (“Scaling Distributed Machine Learning with the Parameter Server”).
Regarding Claim 5,
Chilimbi et al. in view of Chapelle et al. teaches the system of Claim 1.
Chilimbi et al. further teaches a server node communicatively connected to the first node and the third node, wherein the server node is configured to calculate the second weight and the fourth weight (Fig. 7 and pg. 6 [0065]: “In such an embodiment, rather than directly sending the weight updates, the activation and error gradient vectors may be sent to the global parameter server(s) 706, as shown by arrows 712 in FIG. 7, where the matrix multiply can be performed locally to compute and apply the weight updates” and pg. 6 [0066]: “The global parameter server(s) 706 may be in constant communication with the model training machines (e.g., Machine 1, Machine 2, etc.) receiving updates to model parameters and sending the current weight values. These communications are illustrated by arrows 712 and 714. Each of the replicas 704A-704N compute weight updates locally from the error and activation terms” teach each machine calculates activation and error gradient vectors that are sent to the global parameter server (corresponds to server node), which are then used to calculate weight updates in a continuous and iterative manner; Fig. 7 teaches multiple machines providing multiple gradients are communicatively connected to the global parameter server (corresponds to server node)).
Chilimbi et al. in view of Chapelle et al. does not appear to explicitly teach the first node is configured to transmit the second weight transmitted from the server node to the second node, and the third node is configured to transmit the fourth weight transmitted from the server node to the fourth node.
However, Li et al. teaches the first node is configured to transmit the second weight transmitted from the server node to the second node, and the third node is configured to transmit the fourth weight transmitted from the server node to the fourth node (Figure 4 and pg. 588 Section 3: “An instance of the parameter server can run more than one algorithm simultaneously. Parameter server nodes are grouped into a server group and several worker groups as shown in Figure 4. A server node in the server group maintains a partition of the globally shared parameters. Server nodes communicate with each other to replicate and/or to migrate parameters for reliability and scaling” teach multiple server nodes (at least six server nodes) can communicate with each other to replicate and/or to migrate parameters such as weights (see Figure 2), wherein communicating (transmitting) weights from one server node to other server nodes corresponds to a first server node transmitting the second weight transmitted from another server node to a second server node, and a third server node transmitting the fourth weight transmitted from another server node to a fourth server node).
Chilimbi et al., Chapelle et al., and Li et al. are analogous art to the claimed invention because they are directed to distributed data processing. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the above limitation(s) as taught by Li et al. into the disclosed invention of Chilimbi et al. in view of Chapelle et al.
One of ordinary skill in the art would have been motivated to make this modification to leverage “a parameter server framework to solve distributed machine learning problems” that “is easy to use” wherein “[s]erver nodes communicate with each other to replicate and/or to migrate parameters for reliability and scaling” (Li et al. pg. 588 Section 3 and pg. 596 Section 6).

Claims 7-9 are rejected under 35 U.S.C. 103 as being unpatentable over Chilimbi et al. (US 2015/0324690 A1) in view of Chapelle et al. (US 2013/0290223 A1) and further in view of Yang et al. (US 2018/0069944 A1).
Regarding Claim 7,
Chilimbi et al. in view of Chapelle et al. teaches the system of Claim 1.
Chilimbi et al. in view of Chapelle et al. does not appear to explicitly teach wherein a difference of processing speed between the first node and the second node of the first group is a first threshold value or less, and a difference of processing speed between the third node and the fourth node of the second group is a second threshold value or less.
However, Yang et al. teaches wherein a difference of processing speed between the first node and the second node of the first group is a first threshold value or less (pg. 7 [0086]: “Accordingly, a parallel prefetching policy may be determined. In some embodiments of the present invention, a parallel fetching daemon may be used to trigger the parallel prefetching. The parallel fetching daemon may periodically check whether the access speed of the SSD of the replica node (including network delay) is approximately equal to the access speed of the local SSD of the primary node by comparing their difference with a preset threshold” teaches determining whether the difference between the access speed (corresponds to processing speed) of the SSD of the replica node (the SSD corresponds to first node) and the access speed of the local SSD of the primary node (the local SSD corresponds to second node) meets a threshold (indicating the speeds are approximately equal); Fig. 4A teaches a first group of SSD of the replica node and SSD of the primary node), 
and a difference of processing speed between the third node and the fourth node of the second group is a second threshold value or less (pg. 7 [0086]: “Accordingly, a parallel prefetching policy may be determined. In some embodiments of the present invention, a parallel fetching daemon may be used to trigger the parallel prefetching. The parallel fetching daemon may periodically check whether the access speed of the SSD of the replica node (including network delay) is approximately equal to the access speed of the local SSD of the primary node by comparing their difference with a preset threshold” teaches determining whether the difference of access speed (corresponds to processing speed) of the SSD of the replica node (the SSD corresponds to third node) and the access speed of the local SSD of the primary node (the local SSD corresponds to fourth node) meets a threshold (indicating the speeds are approximately equal); Fig. 4B teaches a second group of SSD of the replica node and SSD of the primary node).
Chilimbi et al., Chapelle et al., and Yang et al. are analogous art to the claimed invention because they are directed to data processing. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the above limitation(s) as taught by Yang et al. into the disclosed invention of Chilimbi et al. in view of Chapelle et al.
One of ordinary skill in the art would have been motivated to make this modification to leverage “the solutions provided by embodiments of the invention...for a data replica manager designed for distributed caching, and data processing systems using SSD-HDD tier storage systems. The embodiments improve the ability of fault tolerance by storing caches in replica nodes to effectively recover from disasters while enhancing performance in the SSD space” (Yang et al. pg. 7 [0088]).
Regarding Claim 8,
Chilimbi et al. in view of Chapelle et al. and further in view of Yang et al. teaches the system of Claim 7.
Chapelle et al. further teaches wherein a plurality of nodes including the first node and the second node are in the first group, a plurality of nodes including the third node and the fourth node are in the second group, and if the processing speed of the third node and the fourth node is slower than the processing speed of the first node and the second node, a first number of the nodes of the first group is less than a second number of the nodes of the second group (Fig. 1 and pg. 4 [0043]: “The default replication factor in HADOOP is 3. However, it is understood that a different replication factor, i.e., different number of competing nodes, may be applied in other examples. The same machine learning process is then performed by each competing node 500, 502, 504 on the replicated data...the coordination node 108 may determine an operation node from the competing nodes with the replicated training data based on a processing speed of each competing node. In another example, instead of waiting for one competing node to finish the job, the coordination node 108 may inquire the status from each competing node after a certain time period. As shown in FIG. 5, the competing node 2 502 reports a "failed" status, which may be caused by any machine failure; the competing node 3 504 reports a "delayed" status, which indicates that the competing node 3 is busy handling other jobs. Nevertheless, once an operation node is determined from the competing nodes, the coordination node 108 then sends a connection instruction to the operation node, as described above. 
It is understood that, in case all the competing nodes are failed or delayed, the coordination node 108 may transfer the replicated data to an available node in the cluster 104 where the job can be executed” teach determining whether the processing speed of third and fourth nodes in a second group is slower than the processing speed of first and second nodes in a first group (for example, if the third and fourth nodes have a “delayed” or “failed” status), wherein the number of nodes in each competing group can be different (meaning one group can have fewer nodes than another group); Fig. 1 teaches at least eight nodes).
Chilimbi et al. and Chapelle et al. are analogous art to the claimed invention because they are directed to distributed data processing. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the above limitation(s) as taught by Chapelle et al. into the disclosed invention of Chilimbi et al.
One of ordinary skill in the art would have been motivated to make this modification to leverage “method, system, and programming aspects of efficient and reliable large scale distributed machine learning on a cluster. The method and system as disclosed herein aim at efficiently and effectively parallel learning very large datasets, including for example, trillions of features, billions of training samples, and millions of parameters, with a good predictive accuracy” (Chapelle et al. pg. 3 [0033]).
Regarding Claim 9,
Chilimbi et al. in view of Chapelle et al. and further in view of Yang et al. teaches the system of Claim 7.
Chapelle et al. further teaches wherein, if the processing speed of the first node and the second node is slower than the processing speed of the third node and the fourth node, an amount of processing of each of the third node and the fourth node of the second group in the parallel distributed processing is less than the amount of processing of the first node and the second node of the first group (Fig. 1 and pg. 4 [0043]: “The default replication factor in HADOOP is 3. However, it is understood that a different replication factor, i.e., different number of competing nodes, may be applied in other examples. The same machine learning process is then performed by each competing node 500, 502, 504 on the replicated data...the coordination node 108 may determine an operation node from the competing nodes with the replicated training data based on a processing speed of each competing node. In another example, instead of waiting for one competing node to finish the job, the coordination node 108 may inquire the status from each competing node after a certain time period. As shown in FIG. 5, the competing node 2 502 reports a "failed" status, which may be caused by any machine failure; the competing node 3 504 reports a "delayed" status, which indicates that the competing node 3 is busy handling other jobs. Nevertheless, once an operation node is determined from the competing nodes, the coordination node 108 then sends a connection instruction to the operation node, as described above. 
It is understood that, in case all the competing nodes are failed or delayed, the coordination node 108 may transfer the replicated data to an available node in the cluster 104 where the job can be executed” teach determining whether the processing speed of first and second nodes is slower than the processing speed of third and fourth nodes; in that case, the first and second nodes may have a “delayed” or “failed” status, which means that the amount of processing of the first and second nodes is more than that of the third and fourth nodes because the first and second nodes are “busy handling other jobs”; in other words, the amount of processing of the third and fourth nodes is less than the amount of processing of the first and second nodes; pg. 3 [0033]: “The present disclosure describes method, system, and programming aspects of efficient and reliable large scale distributed machine learning on a cluster. The method and system as disclosed herein aim at efficiently and effectively parallel learning very large datasets, including for example, trillions of features, billions of training samples, and millions of parameters, with a good predictive accuracy” teaches parallel distributed processing; Fig. 1 teaches at least eight nodes).
Chilimbi et al. and Chapelle et al. are analogous art to the claimed invention because they are directed to distributed data processing. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the above limitation(s) as taught by Chapelle et al. into the disclosed invention of Chilimbi et al.
One of ordinary skill in the art would have been motivated to make this modification to leverage “method, system, and programming aspects of efficient and reliable large scale distributed machine learning on a cluster. The method and system as disclosed herein aim at efficiently and effectively parallel learning very large datasets, including for example, trillions of features, billions of training samples, and millions of parameters, with a good predictive accuracy” (Chapelle et al. pg. 3 [0033]).

Prior Art
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure: Wu et al. (US 2018/0295161 A1) teaches a service with a plurality of communication sessions relocated from a first server to a second server based on performance indicators associated with the communication sessions, which is relevant to Fig. 7 of the present application.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to YING YU CHEN whose telephone number is (571) 270-1484. The examiner can normally be reached Monday-Friday 7:30 am-5:00 pm (EST).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kamran Afshar, can be reached on (571) 272-7796. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/YING YU CHEN/
Examiner, Art Unit 2125