DETAILED ACTION
Notice of Pre-AIA or AIA Status
1.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination
2.	A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's “Request for Continued Examination” filed on 15 October 2020 [hereinafter Response] has been entered, where:
	Claims 1, 15, and 16 are amended.
	Claim 8 is cancelled.
Claims 1-7 and 9-20 are pending.
Claims 1-7 and 9-20 are rejected.
Information Disclosure Statement
3.	Information disclosure statements were submitted on 29 December 2020 and 30 December 2020. The submissions comply with the provisions of 37 CFR 1.97. Accordingly, the Examiner considered the information disclosure statements.
Claim Rejections - 35 U.S.C. § 103
4.	The following is a quotation of 35 U.S.C. § 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
5.	The factual inquiries for determining obviousness under 35 U.S.C. § 103 are summarized as follows:
1. 	Determining the scope and contents of the prior art.
2. 	Ascertaining the differences between the prior art and the claims at issue.
3. 	Resolving the level of ordinary skill in the pertinent art.
4. 	Considering objective evidence present in the application indicating obviousness or nonobviousness.
6.	This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. § 102(b)(2)(C) for any potential 35 U.S.C. § 102(a)(2) prior art against the later invention.
7.	Claims 1, 6, 7, 9-13, 15, 16, 19 and 20 are rejected under 35 U.S.C. § 103 as being unpatentable over US Published Application 20130290223 to Chapelle et al. [hereinafter Chapelle] in view of US Published Application 20150379424 to Dirac et al. [hereinafter Dirac], and further in view of Wang et al., “SINGA: Putting Deep Learning in the Hands of Multimedia Users,” MM ’15 (26 October 2015) [hereinafter Wang].
Regarding claim 1, Chapelle teaches [i]n a computer system comprising a job server communicating with a plurality of compute nodes over a network (Chapelle ¶ 0039 teaches a coordination node 108 (that is, a job server) may be, for example, the gateway node of a HADOOP cluster (a plurality of compute nodes over a network), which is a special node that serves as an entry point and/or proxy when a user accesses the HADOOP cluster (that is, a job server communicating with a plurality of compute nodes over a network)), a method for training a plurality of machine learning models, wherein each machine learning model comprises a set of parameters (Chapelle ¶ 0035 teaches a “machine learning process” . . . may include any process that tunes a number of parameters to be simultaneously optimal on training dataset using one or more machines), the method comprising:
the job server receiving . . . jobs for training the machine learning models (Chapelle ¶ 0035 teaches a user 102 . . . may send a request to the cluster 104 via the network 106 . . . to start the distributed machine learning process; Chapelle ¶ 0039 teaches a coordination node 108 . . . is a special node that serves as an entry point and/or proxy (the job server) when a user accesses the HADOOP cluster (that is, via the job server receiving . . . jobs for training the machine learning models));
the job server allocating the training jobs to at least two training groups, at least one training group of the at least two training groups having two or more compute nodes (Chapelle ¶ 0004 teaches a cluster having multiple nodes. . . . In distributed computing, a problem is divided into many tasks, each of which is solved by one or more computers (that is, at least two training groups); Chapelle ¶ 0038 teaches that [b]y the time of running the distributed machine learning process, the training datasets have already resided on the cluster 104, for example, in the central training database 110 of the cluster 104, as shown in FIG. 1, or have been partitioned (that is, allocating the training jobs) across the regular nodes 104-1, 104-2, ... 104-7, 104-8 of the cluster 104 (that is, at least one training group of the at least two training groups having two or more compute nodes)) based on current requirements of the training jobs (Chapelle ¶ 0086 teaches the computer functions relating to machine learning may be implemented in a distributed fashion (that is, the job server allocating training jobs) on a number of similar platforms (one or more compute nodes), to distribute the processing load (based on current requirements of the training jobs)) and current status of the compute nodes, including the job server determining which compute nodes are included in which training group of the at least two training groups (Chapelle ¶ 0043 teaches the coordination node 108 . . . is configured to determine a plurality of operation nodes from the plurality of regular nodes based on a status of the machine learning process performed in each regular node (that is, current status of the compute nodes, including the job server determining which compute nodes are included in which training group of the at least two training groups));
each training group of the at least two training groups executing its allocated training jobs, said execution comprising updating values for the parameters of the machine learning models (Chapelle ¶ 0052 teaches [e]ach updated local parameter (updating values for the parameters) is calculated (that is, the at least two training groups executing its allocated training jobs) based on the initial aggregated parameter and the subset of the training data in each operation node); and
. . . , wherein the job server allocates the training jobs to training groups based on computing capability and/or availability of the compute nodes, based on data storage capability and/or availability of the compute nodes, and/or based on communications capability and/or availability between the compute nodes (Chapelle ¶ 0051 teaches [i]f a slow or died operation node is detected, . . . the subset of training data and local parameter of the slow or died operation node is moved (the job server allocates the training jobs to training groups) to a backup node in the cluster 104 (based on . . . availability of the compute nodes, . . . or availability between them)).
Though Chapelle teaches the feature of distributed machine learning on a cluster including a plurality of nodes based on a job request from a user, and calculating node parameters relating to the distributed machine learning, Chapelle does not explicitly teach that the jobs are a plurality of jobs.
But Dirac teaches a plurality of jobs (Dirac, Abstract, teaches [a] machine learning service implements programmatic interfaces for a variety of operations on several entity types, such as . . . models; resulting from the plurality of jobs for training, Dirac ¶ 0033 teaches that the [r]esults of some jobs (that is, a plurality of jobs) may be stored as [machine learning service] artifacts within repository 120; Dirac ¶ 0055 teaches that the [machine learning service] artifacts 601 may include . . . modifiable or in-development models 630 (that is, the plurality of jobs for training the machine learning models); Examiner notes that the plurality of jobs “for training” is an intended use of the plurality of jobs, which is not positively recited in the claim. Accordingly, Examiner affords such language little patentable weight).
	Chapelle and Dirac are from the same or similar field of endeavor. Chapelle teaches distributed machine learning on a cluster including a plurality of nodes. Dirac teaches selecting workload distribution strategies for jobs at a machine learning service. Thus, it would have been obvious to a person having ordinary skill in the art as of the effective filing date of the Applicant’s invention to modify the teachings of Chapelle pertaining to distributed machine learning with the machine learning service executing jobs for training machine learning models of Dirac.
	The motivation for doing so is to provide a customizable, easy-to-use machine learning service (MLS) designed to support large numbers of users and a wide variety of algorithms and problem sizes. (Dirac ¶ 0024).
	Though Chapelle and Dirac teach the feature of a job server allocation of a plurality of training jobs in distributed machine learning environments, the combination of Chapelle and Dirac does not explicitly teach -
* * *
and for the at least one training group that comprises two or more compute nodes, communicating the updated values of the parameters only between the two or more compute nodes of a same training group and using the communicated updated values in furtherance of the training job, . . . .
But Wang teaches -
* * *
and for at least one of the training groups that comprises two or more compute nodes, communicating the updated values of the parameters only between the two or more compute nodes of a same training group and using the communicated updated values in furtherance of the training job (Wang, Fig. 10, teaches a logical architecture of SINGA:

    [Wang, Fig. 10: logical architecture of SINGA (image omitted)]

Wang, right column of p. 29, “6.1 System Architecture,” first paragraph, teaches that Figure 10 shows the logical architecture. The architecture consists of multiple server groups and worker groups, and each worker group communicates with only one server group. Each server group maintains a complete replica of the model parameters, and is responsible for handling requests (e.g., get or update parameters) from worker groups (that is, for the at least one training group that comprises two or more compute nodes, communicating the updated values of the parameters only between the two or more compute nodes of a same training group); see also Wang, right column of p. 30, “6.3 Parameter Sharing,” first paragraph), . . . .
	Chapelle, Dirac, and Wang are from the same or similar field of endeavor. Chapelle teaches distributed machine learning on a cluster including a plurality of nodes. Dirac teaches selecting workload distribution strategies for jobs at a machine learning service. Wang teaches a distributed deep learning platform that has an intuitive programming model and good scalability. Thus, it would have been obvious to a person having ordinary skill in the art as of the effective filing date of the Applicant’s invention to modify the teachings of the combination of Chapelle and Dirac pertaining to distributed machine learning with executing jobs for training machine learning models with the deep learning platform of Wang.
The motivation for doing so is, in distributed machine learning, to improve the convergence rate, to improve the efficiency of each iteration, and to find an optimal cluster setting (e.g., group size) that trades off between the convergence rate and efficiency to minimize the training time to reach certain accuracy. (Wang, left column of p. 26, “1. Introduction,” first full paragraph).
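For illustration only, the grouping relied upon from Wang (each worker group communicating with only one server group, which holds a complete replica of the model parameters and handles get and update requests) may be sketched as follows. The class names, the node labels, the one-server-group-per-worker-group pairing, and the gradient-descent update rule are assumptions made for this sketch and are not taken from Wang or from the claims.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ServerGroup:
    """Holds one complete replica of the model parameters (hypothetical type)."""
    params: Dict[str, float] = field(default_factory=dict)

    def get(self, name: str) -> float:
        # Serve a "get parameter" request from a worker group.
        return self.params.get(name, 0.0)

    def update(self, name: str, gradient: float, lr: float = 0.1) -> None:
        # Serve an "update parameter" request; the update rule is an assumption.
        self.params[name] = self.get(name) - lr * gradient


@dataclass
class WorkerGroup:
    """A group of compute nodes bound to exactly one server group."""
    workers: List[str]
    server: ServerGroup

    def train_step(self, gradients: Dict[str, float]) -> None:
        # Updated values are exchanged only with this group's own server group.
        for name, grad in gradients.items():
            self.server.update(name, grad)


# Two groups, each with its own parameter replica (assumed pairing).
group_a = WorkerGroup(["node-1", "node-2"], ServerGroup({"w": 1.0}))
group_b = WorkerGroup(["node-3", "node-4"], ServerGroup({"w": 1.0}))
group_a.train_step({"w": 0.5})
```

In this sketch an update made by one group changes only that group's replica; the other group's parameters are untouched, which is the behavior the mapping above attributes to communication within a same training group.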
Regarding claim 6, the combination of Chapelle, Dirac, and Wang teaches all of the limitations of claim 1, as described above. 
Chapelle also teaches further comprising:
the job server changing which compute nodes are included in which training group of the at least two training groups based on current requirements of the training jobs and current status of the compute nodes (Chapelle ¶ 0051 teaches [dynamically detecting] whether there is a slow (or died) operation node (based on current requirements of the training jobs and current status of the compute nodes) . . . based on the processing speed of each operation node. If a slow or died operation node is detected, . . . the subset of training data and local parameter of the slow or died operation node is moved to a backup node (changing which compute node) in the cluster 104 (that is, changing which compute nodes are included in the training group based on current requirements of the training jobs and current status of the compute nodes)).
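For illustration only, the reassignment relied upon from Chapelle ¶ 0051 (detecting a slow or died operation node from its processing speed and moving its work to a backup node) may be sketched as follows. The function name, the speed threshold, and the first-available choice of backup node are assumptions made for this sketch; the cited passage states only that detection is based on the processing speed of each operation node.

```python
from typing import Dict, List


def reassign_slow_nodes(
    speeds: Dict[str, float],
    backups: List[str],
    min_speed: float,
) -> Dict[str, str]:
    """Map each slow or dead operation node to a backup node (hypothetical helper).

    speeds:    observed processing speed per operation node (0.0 for a dead node)
    backups:   backup nodes available in the cluster
    min_speed: assumed threshold below which a node counts as slow
    """
    moves: Dict[str, str] = {}
    spare = list(backups)  # copy so the caller's list is not modified
    for node, speed in speeds.items():
        if speed < min_speed and spare:
            # The node's training-data subset and local parameter would move here.
            moves[node] = spare.pop(0)
    return moves


moves = reassign_slow_nodes({"n1": 10.0, "n2": 0.0, "n3": 2.0}, ["b1"], 5.0)
```

With a single backup node available, only the first slow node encountered is reassigned; a node below the threshold stays in place once the backups are exhausted.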
Regarding claim 7, the combination of Chapelle, Dirac, and Wang teaches all of the limitations of claim 1, as described above. 
Chapelle also teaches wherein allocating the training jobs to training groups based on current status of the compute nodes comprises allocating the training jobs to training groups based on current capability of the compute nodes (Chapelle ¶ 0050 teaches an operation node is determined from competing nodes based on processing speed of each competing node (that is, based on current capability of the compute nodes)) and on current availability of the compute nodes (Chapelle ¶ 0050 teaches distributed machine learning . . . [where] the same subset of the training data is allocated to competing nodes . . . from the training database 110 of the cluster 104 (that is, the competing nodes are allocated based on current availability of the compute nodes)).
Regarding claim 9, the combination of Chapelle, Dirac and Wang teaches all of the limitations of claim 1, as described above.
Chapelle teaches wherein, for the at least one training group, the job server specifies the communications of the updated values between compute nodes (Chapelle ¶ 0039 teaches the coordination node 108 (that is, the job server) then may send a connection instruction to each operation node (that is, the job server specifies the communications). The connection instruction includes information identifying all other operation nodes that are connected to a particular operation node in accordance with the network topology and any other suitable information for forming the network topology. . . . The nodes to be connected may be identified by, for example, domain name, IP address, alias, or any other suitable mechanism, in the connection instruction (that is, the job server specifies the communications of the updated values between compute nodes); see also Chapelle ¶ 0046, which teaches a machine learning module 706 and an AllReducing module 708 are executed and running on the processor 702 in each operation node; Chapelle ¶ 0048 further teaches [t]he AllReducing module 708 is also configured to transmit the local parameter to at least one connected node in accordance with the network topology (that is, communications of the updated values)).
Regarding claim 10, the combination of Chapelle, Dirac and Wang teaches all of the limitations of claim 1, as described above.
Chapelle teaches wherein the training jobs begin with initial values for the parameters, progress through interim values of the parameters and end with final values for the parameters (Chapelle ¶ 0047 teaches different optimization algorithms may be applied by the machine learning module 706 in the first iteration (that is, the training jobs begin with initial values for the parameters) and the following iterations (that is, progress through interim values of the parameters); Chapelle ¶ 0059 teaches that the [L-BFGS] algorithm benefits from . . . rapid convergence (that is, end with final values for the parameters)), and determination of the interim and final values of the parameters is performed by the compute nodes in the training groups rather than by the job server (Chapelle ¶ 0039 teaches [t]he coordination node 108 . . . is configured to determine a plurality of operation nodes 200 from the plurality of regular nodes based on a status of the machine learning process performed in each regular node. The status is received by the coordination node 108 from each regular node of the cluster 104. The coordination node 108 is further configured to connect the plurality of operation nodes 200 to form a network topology (that is, because the coordination node 108 pertains to forming the network topology, then the determination of the interim and final values of the parameters is performed by the compute nodes in the training groups rather than by the job server)).
Regarding claim 11, the combination of Chapelle, Dirac and Wang teaches all of the limitations of claim 10, as described above.
Chapelle teaches wherein, for at least one of the training jobs, the job server does not access the final values (Chapelle ¶ 0043 & FIG. 5 teach [t]he first competing node that finishes the [training] process may report a “completed” status to the coordination node 108 (that is, because the coordination node of Chapelle receives only a status of “completed,” “failed,” or “delayed” from the operation nodes, the job server, for at least one of the training jobs, does not access the final values)).
Regarding claim 12, the combination of Chapelle, Dirac and Wang teaches all of the limitations of claim 1, as described above. 
Chapelle teaches further comprising:
the job server monitoring the training groups' execution of their allocated jobs (Chapelle ¶ 0051 teaches whether there is a slow (or died) operation node is dynamically detected (that is, the dynamic detection is performed by the job server monitoring the training groups’ execution of their allocated jobs)).
Regarding claim 13, the combination of Chapelle, Dirac and Wang teaches all of the limitations of claim 1, as described above.
Chapelle also teaches further comprising:
for at least one of the training jobs, the job server providing a visual display of the parameters for the training job (Chapelle ¶ 0077 & FIG. 14 teach showing test auPRC, on both datasets, as a function of the number of iterations for four different strategies (visual display of the parameters for the training job)).
Regarding claim 15, Chapelle teaches [a] non-transitory computer-readable storage medium storing executable computer program instructions (Chapelle ¶ 0016 teaches a machine readable and non-transitory medium having information recorded thereon for distributed machine learning on a cluster) for training a plurality of machine learning models, wherein each machine learning model comprises a set of parameters, the instructions executable by a processor and causing the processor to perform a method (Chapelle ¶ 0016 teaches the information, when read by the machine (the instructions executable by a processor), causes the machine to perform a series of steps (causing the processor to perform a method)) comprising:
receiving a plurality of jobs for training the machine learning models (Chapelle ¶ 0035 teaches a user 102 . . . may send a request to the cluster 104 via the network 106 . . . to start the distributed machine learning process; Chapelle ¶ 0039 teaches a coordination node 108 . . . is a special node that serves as an entry point and/or proxy (the job server) when a user accesses the HADOOP cluster (that is, via the user, the coordination node 108 receives, which is the job server receiving . . . jobs for training the machine learning models)) . . . ;
allocating the training jobs to . . . at least one training group of the at least two training groups having two or more compute nodes (Chapelle ¶ 0038 teaches that [b]y the time of running the distributed machine learning process, the training datasets have already resided on the cluster 104, for example, in the central training database 110 of the cluster 104, as shown in FIG. 1, or have been partitioned (that is, allocating the training jobs) across the regular nodes 104-1, 104-2, ... 104-7, 104-8 of the cluster 104 (that is, at least one training group of the at least two training groups having two or more compute nodes)) based on current requirements of the training jobs (Chapelle ¶ 0086 teaches the computer functions relating to machine learning may be implemented in a distributed fashion (that is, the job server allocating training jobs) on a number of similar platforms (one or more compute nodes), to distribute the processing load (based on current requirements of the training jobs)) and current status of the compute nodes (Chapelle ¶ 0043 teaches the coordination node 108 . . . is configured to determine a plurality of operation nodes from the plurality of regular nodes based on a status of the machine learning process performed in each regular node (that is, current status of the compute nodes)),
wherein:
each training group of the at least two training groups execute its allocated training jobs, said execution comprising updating values for the parameters of the machine learning models (Chapelle ¶ 0052 teaches [e]ach updated local parameter (updating values for the parameters) is calculated (that is, each training group of the at least two training groups execute its allocated training jobs) based on the initial aggregated parameter and the subset of the training data in each operation node); and
 . . . , wherein the job server allocates the training jobs to training groups based on computing capability and/or availability of the compute nodes, based on data storage capability and/or availability of the compute nodes, and/or based on communications capability and/or availability between the compute nodes (Chapelle ¶ 0051 teaches [i]f a slow or died operation node is detected, . . . the subset of training data and local parameter of the slow or died operation node is moved (the job server allocates the training jobs to training groups) to a backup node in the cluster 104 (based on . . . availability of the compute nodes, . . . or availability between them)).
Though Chapelle teaches the feature of distributed machine learning on a cluster including a plurality of nodes based on a job request from a user, and calculating node parameters relating to the distributed machine learning, Chapelle does not explicitly teach that the jobs are a plurality of jobs.
Dirac teaches a plurality of jobs (Dirac, Abstract, teaches [a] machine learning service implements programmatic interfaces for a variety of operations on several entity types, such as . . . models; resulting from the plurality of jobs for training, Dirac ¶ 0033 teaches that the [r]esults of some jobs (that is, a plurality of jobs) may be stored as [machine learning service] artifacts within repository 120; Dirac ¶ 0055 teaches that the [machine learning service] artifacts 601 may include . . . modifiable or in-development models 630 (that is, the plurality of jobs for training the machine learning models); Examiner notes that the plurality of jobs “for training” is an intended use of the plurality of jobs, which is not positively recited in the claim. Accordingly, Examiner affords such language little patentable weight).
	Chapelle and Dirac are from the same or similar field of endeavor. Chapelle teaches distributed machine learning on a cluster including a plurality of nodes. Dirac teaches selecting workload distribution strategies for jobs at a machine learning service. Thus, it would have been obvious to a person having ordinary skill in the art as of the effective filing date of the Applicant’s invention to modify the teachings of Chapelle pertaining to distributed machine learning with the machine learning service executing jobs for training machine learning models of Dirac.
The motivation for doing so is to provide a customizable, easy-to-use machine learning service (MLS) designed to support large numbers of users and a wide variety of algorithms and problem sizes. (Dirac ¶ 0024).
Though Chapelle and Dirac teach the feature of a job server allocation of a plurality of training jobs in distributed machine learning environments, the combination of Chapelle and Dirac does not explicitly teach -
* * *
. . . wherein:
* * *
and for the at least one training group that comprises two or more compute nodes, the compute nodes of the training group communicate the updated values of the parameters only between the two or more compute nodes of a same training group and use the communicated updated values in furtherance of the training job, . . .
But Wang teaches -
* * *
. . . wherein:
* * *
and for at least one of the training groups that comprises two or more compute nodes, communicating the updated values of the parameters only between the two or more compute nodes of a same training group and using the communicated updated values in furtherance of the training job (Wang, Fig. 10, teaches a logical architecture of SINGA:

    [Wang, Fig. 10: logical architecture of SINGA (image omitted)]

Wang, right column of p. 29, “6.1 System Architecture,” first paragraph, teaches that Figure 10 shows the logical architecture. The architecture consists of multiple server groups and worker groups, and each worker group communicates with only one server group. Each server group maintains a complete replica of the model parameters, and is responsible for handling requests (e.g., get or update parameters) from worker groups (that is, for the at least one training group that comprises two or more compute nodes, communicating the updated values of the parameters only between the two or more compute nodes of a same training group); see also Wang, right column of p. 30, “6.3 Parameter Sharing,” first paragraph), . . . .
Chapelle, Dirac, and Wang are from the same or similar field of endeavor. Chapelle teaches distributed machine learning on a cluster including a plurality of nodes. Dirac teaches selecting workload distribution strategies for jobs at a machine learning service. Wang teaches a distributed deep learning platform that has an intuitive programming model and good scalability. Thus, it would have been obvious to a person having ordinary skill in the art as of the effective filing date of the Applicant’s invention to modify the teachings of the combination of Chapelle and Dirac pertaining to distributed machine learning with executing jobs for training machine learning models with the deep learning platform of Wang.
The motivation for doing so is, in distributed machine learning, to improve the convergence rate, to improve the efficiency of each iteration, and to find an optimal cluster setting (e.g., group size) that trades off between the convergence rate and efficiency to minimize the training time to reach certain accuracy. (Wang, left column of p. 26, “1. Introduction,” first full paragraph).
Regarding claim 16, Chapelle teaches [a] computer system for training a plurality of machine learning models, wherein each machine learning model comprises a set of parameters (Chapelle ¶ 0035 teaches a “machine learning process” . . . may include any process that tunes a number of parameters to be simultaneously optimal on training dataset using one or more machines), the computer system comprising:
a job server (Chapelle ¶ 0039 teaches a coordination node 108 . . . is a special node that serves as an entry point and/or proxy (the job server) when a user accesses the HADOOP cluster); and
a plurality of compute nodes in communication with the job server (Chapelle ¶ 0037 teaches [a] cluster 104 in which the distributed machine learning is performed includes a plurality of regular nodes 104-1, 104-2, ... 104-7, 104-8, (that is, a plurality of compute nodes) and at least one coordination node 108 (e.g., a gateway node in a HADOOP cluster), which communicate through the network 106 (in communication with the job server));
wherein the job server receives . . . jobs for training the machine learning models (Chapelle ¶ 0035 teaches a user 102 . . . may send a request to the cluster 104 via the network 106 . . . to start the distributed machine learning process; Chapelle ¶ 0039 teaches a coordination node 108 . . . is a special node that serves as an entry point and/or proxy (the job server) when a user accesses the HADOOP cluster (that is, via the user, the coordination node 108 receives, which is the job server receives)); 
the job server allocates the training jobs to at least two training groups, at least one training group of the at least two training groups having two or more compute nodes based on current requirements of the training jobs and current status of the compute nodes (Chapelle ¶ 0004 teaches a cluster having multiple nodes. . . . In distributed computing, a problem is divided into many tasks, each of which is solved by one or more computers (that is, at least two training groups); Chapelle ¶ 0038 teaches that [b]y the time of running the distributed machine learning process, the training datasets have already resided on the cluster 104, for example, in the central training database 110 of the cluster 104, as shown in FIG. 1, or have been partitioned (that is, the job server allocates the training jobs) across the regular nodes 104-1, 104-2, ... 104-7, 104-8 of the cluster 104 (to . . . training groups having . . . one or more compute nodes); Chapelle ¶ 0086 teaches the computer functions relating to machine learning may be implemented in a distributed fashion (that is, the job server allocating training jobs) on a number of similar platforms (one or more compute nodes), to distribute the processing load (based on current requirements of the training jobs));
and the job server determines which compute nodes are included in which training group of the at least two training groups (Chapelle ¶ 0043 teaches the coordination node 108 . . . is configured to determine a plurality of operation nodes from the plurality of regular nodes based on a status of the machine learning process performed in each regular node (that is, the job server determines which compute nodes are included in which training group)); and
wherein each training group of the at least two training groups executing its allocated training jobs, said execution comprising updating values for the parameters of the machine learning models (Chapelle ¶ 0052 teaches [e]ach updated local parameter (updating values for the parameters of the machine learning models) is calculated (that is, each training group of the . . . training groups executing its allocated training jobs) based on the initial aggregated parameter and the subset of the training data in each operation node); and
. . . , wherein the job server allocates the training jobs to training groups based on computing capability and/or availability of the compute nodes, based on data storage capability and/or availability of the compute nodes, and/or based on communications capability and/or availability between the compute nodes (Chapelle ¶ 0051 teaches [i]f a slow or died operation node is detected, . . . the subset of training data and local parameter of the slow or died operation node is moved (the job server allocates the training jobs to training groups) to a backup node in the cluster 104 (based on . . . availability of the compute nodes, . . . or availability between them)).
Chapelle teaches the feature of distributed machine learning on a cluster including a plurality of nodes based on a job request from a user, and calculating node parameters relating to the distributed machine learning. Chapelle, however, does not explicitly teach that the jobs are a plurality of jobs.
But Dirac teaches a plurality of jobs (Dirac, Abstract, teaches [a] machine learning service implements programmatic interfaces for a variety of operations on several entity types, such as . . . models; resulting from the plurality of jobs for training, Dirac ¶ 0033 teaches that the [r]esults of some jobs (that is, a plurality of jobs) may be stored as [machine learning service] artifacts within repository 120; Dirac ¶ 0055 teaches that the [machine learning service] artifacts 601 may include . . . modifiable or in-development models 630 (that is, the plurality of jobs for training the machine learning models); Examiner notes that the plurality of jobs “for training” is an intended use of the plurality of jobs, which is not positively recited in the claim. Accordingly, Examiner affords such language little patentable weight).
	Chapelle and Dirac are from the same or similar field of endeavor. Chapelle teaches distributed machine learning on a cluster including a plurality of nodes. Dirac teaches selecting workload distribution strategies for jobs at a machine learning service. Thus, it would have been obvious to a person having ordinary skill in the art as of the effective filing date of the Applicant’s invention to modify the teachings of Chapelle pertaining to distributed machine learning with the machine learning service executing jobs for training machine learning models of Dirac.
Dirac ¶ 0024).
Though Chapelle and Dirac teach the feature of job server allocation of a plurality of training jobs in distributed machine learning environments, the combination of Chapelle and Dirac does not explicitly teach -
* * *
. . . and, for the at least one training group that comprises two or more compute nodes, the compute nodes communicate the updated values of the parameters only between the two or more compute nodes of a same training group and use the communicated updated values in furtherance of the training job, . . . .
But Wang teaches -
* * *
. . . and, for the at least one training group that comprises two or more compute nodes, the compute nodes communicate the updated values of the parameters only between the two or more compute nodes of a same training group and use the communicated updated values in furtherance of the training job (Wang, Fig. 10, teaches a logical architecture of SINGA:

[Figure: Wang, Fig. 10 (media_image1.png), logical architecture of SINGA]

Wang, right column of p. 29, “6.1 System Architecture,” first paragraph, teaches that Figure 10 shows the logical architecture. The architecture consists of multiple server groups and worker groups, and each worker group communicates with only one server group. Each server group maintains a complete replica of the model parameters, and is responsible for handling requests (e.g., get or update parameters) from worker groups (that is, for the at least one training group that comprises two or more compute nodes, the compute nodes communicate the updated values of the parameters only between the two or more compute nodes of a same training group and use the communicated updated values in furtherance of the training job); see also Wang, right column of p. 30, “6.3 Parameter Sharing,” first paragraph), . . . .
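For illustration only, the grouping described in the cited passage of Wang can be sketched as follows. This is a minimal sketch under stated assumptions, not Wang's or SINGA's implementation: the names ServerGroup, WorkerGroup, push, and pull are hypothetical, and the parameter store is reduced to a dictionary. The sketch shows only that each worker group exchanges parameter updates with the single server group to which it is bound, so that updates made by one group are not visible to another.

```python
from dataclasses import dataclass, field


@dataclass
class ServerGroup:
    """Hypothetical server group holding one full replica of the parameters."""
    name: str
    params: dict = field(default_factory=dict)

    def update(self, key, delta):
        # Apply an update received from a worker group bound to this server group.
        self.params[key] = self.params.get(key, 0.0) + delta

    def get(self, key):
        # Return the current value of a parameter (0.0 if never updated).
        return self.params.get(key, 0.0)


@dataclass
class WorkerGroup:
    """Hypothetical worker group bound to exactly one server group."""
    name: str
    server: ServerGroup

    def push(self, key, delta):
        # Updates go only to this group's own server group.
        self.server.update(key, delta)

    def pull(self, key):
        # Reads come only from this group's own server group.
        return self.server.get(key)


# Two independent groups: each worker group talks to one server group only.
servers = [ServerGroup("sg0"), ServerGroup("sg1")]
workers = [WorkerGroup("wg0", servers[0]), WorkerGroup("wg1", servers[1])]
workers[0].push("w", 0.5)
workers[1].push("w", -0.25)
```

In this sketch, the value of "w" seen by wg0 reflects only wg0's own update, and likewise for wg1, because the two server groups hold separate replicas.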
	Chapelle, Dirac, and Wang are from the same or similar field of endeavor. Chapelle teaches distributed machine learning on a cluster including a plurality of nodes. Dirac teaches selecting workload distribution strategies for jobs at a machine learning service. Wang teaches a distributed deep learning platform that has an intuitive programming model and good scalability. Thus, it would have been obvious to a person having ordinary skill in the art as of the effective filing date of the Applicant’s invention to modify the teachings of the combination of Chapelle and Dirac pertaining to distributed machine learning with executing jobs for training machine learning models with the deep learning platform of Wang.
The motivation for doing so is, in distributed machine learning, to improve the convergence rate, to improve the efficiency of each iteration, and to find an optimal cluster setting (e.g., group size) that trades off between the convergence rate and efficiency to minimize the training time to reach certain accuracy. (Wang, left column of p. 26, “1. Introduction,” first full paragraph).
Regarding claim 19, the combination of Chapelle, Dirac, and Wang teaches all of the limitations of claim 16, as described above.
Chapelle also teaches further comprising:
a buffer node in communication with the compute nodes (Chapelle ¶ 0038 teaches [a] cluster 104 . . . includes a plurality of regular nodes (that is, compute nodes) . . . and at least one coordination node 108; Chapelle ¶ 0038 teaches the cluster 104 may also include a training database 110 (that is, in communication with the compute nodes)), the buffer node buffering data to be used in a next training job to be executed by the compute nodes (Chapelle ¶ 0038 teaches that the training database 110 . . . stores one or more very large training datasets (that is, buffer node), for example, each including trillions of features, billions of training samples, and millions of parameters, for distributed machine learning performed on the cluster 104 (the buffer node buffering data to be used in a next training job to be executed by the compute nodes)).
Regarding claim 20, the combination of Chapelle, Dirac, and Wang teaches all of the limitations of claim 16, as described above. 
Wang teaches wherein the two or more compute nodes in the at least one training group comprise a memory shared by the compute nodes, and the compute nodes communicate the updated values of the parameters by communicating locations of the updated values in the shared memory (Wang, left column of p. 30, “6.1 System Architecture,” first full paragraph, teaches [i]f two units manage the same parameter partition and they are in the same process, it is possible for them to leverage the shared memory (that is, a memory shared by the compute nodes) to reduce communication cost, as discussed later in Section 6.3; Wang, right column of p. 30, “6.3 Parameter Sharing,” first paragraph, teaches If workers and servers resident in the same process, their ParamShard (partitions) can be configured to share the same memory space. In this case, the messages transferred between different execution units just contain pointers to the data, which reduces the communication cost (that is, wherein the two or more compute nodes in the at least one training group comprise a memory shared by the compute nodes, and the compute nodes communicate the updated values of the parameters by communicating locations of the updated values in the shared memory)).
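For illustration only, the idea of communicating locations rather than values can be sketched as follows. This is a minimal sketch under stated assumptions, not Wang's implementation: shared_params, worker_update, and server_read are hypothetical names, and a Python list stands in for a memory region shared by two execution units in the same process. The message carries only an index into the shared region, and the receiver resolves that index against the same memory.

```python
# Hypothetical stand-in for a parameter partition held in memory shared by two units.
shared_params = [0.0, 0.0, 0.0]


def worker_update(index, new_value):
    """Write the updated value in place and return a message holding only its location."""
    shared_params[index] = new_value
    return {"location": index}  # no parameter value is copied into the message


def server_read(message):
    """Resolve the location carried by the message against the same shared memory."""
    return shared_params[message["location"]]


msg = worker_update(1, 0.75)
value_seen_by_server = server_read(msg)
```

The message in this sketch contains no parameter value, which is the sense in which the communication cost is reduced in the cited passage.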
Chapelle, Dirac, and Wang are from the same or similar field of endeavor. Chapelle teaches distributed machine learning on a cluster including a plurality of nodes. Dirac teaches selecting workload distribution strategies for jobs at a machine learning service. Wang teaches a distributed deep learning platform that has an intuitive programming model and good scalability. Thus, it would have been obvious to a person having ordinary skill in the art as of the effective filing date of the Applicant’s invention to modify the teachings of the combination of Chapelle and Dirac pertaining to distributed machine learning parameters with the machine learning service providing model pointers to storage devices for training machine learning model job execution results of Wang.
The motivation for doing so is, in distributed machine learning, to improve the convergence rate, to improve the efficiency of each iteration, and to find an optimal cluster setting (e.g., group size) that trades off between the convergence rate and efficiency to minimize the training time to reach certain accuracy. (Wang, left column of p. 26, “1. Introduction,” first full paragraph).
8.	Claims 2 and 5 are rejected under 35 U.S.C. § 103 as being unpatentable over US Published Application 20130290223 to Chapelle et al. [hereinafter Chapelle] in view of US Published Application 20150379424 to Dirac et al. [hereinafter Dirac] and Wang et al., “SINGA: Putting Deep Learning in the Hands of Multimedia Users,” MM ’15 (26 October 2015) [hereinafter Wang], and further in view of Padhy, “Big Data Processing with Hadoop-MapReduce in Cloud Systems,” Int’l Journal of Cloud Computing & Services Science (2013) [hereinafter Padhy].
Regarding claim 2, the combination of Chapelle, Dirac, and Wang teaches all of the limitations of claim 1, as described above. 
Though the combination of Chapelle, Dirac, and Wang teaches the feature of multiple server groups and multiple worker groups, where each worker group communicates with only one server group, the combination of Chapelle, Dirac, and Wang does not explicitly teach -
wherein the computer system has a master-worker architecture, wherein the job server operates as a master for each of the training groups and each training group operates as a worker for the job server.
But Padhy teaches - 
wherein the computer system has a master-worker architecture, wherein the job server operates as a master for each of the training groups and each training group operates as a worker for the job server (Padhy at p. 19, Section 3.1, first paragraph, teaches that Hadoop is a Map/Reduce framework that works on [Hadoop Distributed File System (HDFS)]; Padhy at p. 19, Section 3.1, paragraph 5, and FIG. 3, teaches [e]ach node in a Hadoop cluster is either a master or a slave (that is, the computer system has a master-worker architecture); Padhy at p. 19, Section 4, first paragraph, teaches that [a]n HDFS cluster has two types of node operating in a master-worker pattern: a NameNode (the master) and a number of DataNodes (workers). The [master] manages the file system namespace [and] maintains the file system tree and the metadata for all the files and directories in the tree. . . . [The slaves] store and retrieve blocks when they are told to (by clients or the [master]) (that is, the job server operates as a master for each of the training groups), and they report back to the [master] (that is, each training group operates as a worker for the job server)).
Chapelle, Dirac, Wang, and Padhy are from the same or similar field of endeavor. Chapelle teaches distributed machine learning based on a Hadoop cluster having a coordination node and plurality of operation nodes. Dirac teaches a machine learning service having a request handler to accept client requests and inserts corresponding job objects into a MLS job queue for distribution. Wang teaches a distributed deep learning platform that has an intuitive programming model and good scalability. Padhy teaches a master-worker architecture for large scale machine learning and data mining applications. Thus, it would have been obvious to a person having ordinary skill in the art as of the effective filing date of the Applicant’s invention to incorporate the teachings of the combination of Chapelle, Dirac, and Wang pertaining to distributed machine learning with the master-worker architecture of Padhy.
The motivation for doing so is that the exponential growth of data requires new strategies for processing and analyzing information. (Padhy, Abstract).
Regarding claim 5, the combination of Chapelle, Dirac, Wang, and Padhy teaches all of the limitations of claim 2, as described above.
Chapelle teaches wherein for at least one of the training groups with two or more compute nodes: 
the training job begins with initial values for the parameters (Chapelle ¶ 0011 teaches [a] stochastic gradient descent process is performed based on the subset of the training data to calculate an initial local parameter (that is, the training job begins with initial values for the parameters)) and ends with final values for the parameters (Chapelle ¶ 0059 teaches [t]he [L-BFGS] algorithm benefits from the fast reduction of error initially that an online algorithm provides, and rapid convergence (that is, final values for the parameters) in a good neighborhood), and updating of the parameters from the initial values to the final values is performed and stored by one of the compute nodes in the training group (Chapelle ¶ 0046 teaches that [e]ach operation node (one of the compute nodes in the training groups) includes . . . a storage 704 operatively coupled to each other. . . . The storage 704 includes . . . a parameter storage 712 for temporally or permanently storing local and aggregated parameters; with respect to parameter updating, Chapelle ¶ 0047 teaches an online optimization algorithm such as a stochastic gradient descent process may be applied to the initial iteration to generate initial local parameters, while a batch optimization algorithm such as a batch gradient descent process may be applied in the following iterations to generate updated local parameters (that is, updating of the parameters from the initial values to the final values is performed and stored)).
9.	Claim 3 is rejected under 35 U.S.C. § 103 as being unpatentable over US Published Application 20130290223 to Chapelle et al. [hereinafter Chapelle] in view of US Published Application 20150379424 to Dirac et al. [hereinafter Dirac] and Wang et al., “SINGA: Putting Deep Learning in the Hands of Multimedia Users,” MM ’15 (26 October 2015) [hereinafter Wang], and further in view of Padhy, “Big Data Processing with Hadoop-MapReduce in Cloud Systems,” Int’l Journal of Cloud Computing & Services Science (2013) [hereinafter Padhy] and US Patent 8027938 to Xu et al. [hereinafter Xu].
Regarding claim 3, the combination of Chapelle, Dirac, Wang, and Padhy teaches all of the limitations of claim 2, as described above. 
Though the combination of Chapelle, Dirac, Wang and Padhy teaches the feature of multiple server groups and multiple worker groups, where each worker group communicates with only one server group, the combination of Chapelle, Dirac, Wang and Padhy does not explicitly teach -
wherein at least one of the training groups with two or more compute nodes also has a master-worker architecture within the training group, wherein one of the compute nodes in the training group operates as a master for a remainder of the compute nodes in the training group and the remainder of the compute nodes operate as workers for the one compute node.
But Xu teaches -
wherein at least one of the training groups with two or more compute nodes also has a master-worker architecture within the training group (Xu 3:42-47 teaches [i]n the distributed architecture 100, a specified computation operation . . . is disturbed [sic] down from the master to the works level-by-level (that is, at least one of the training groups with two or more compute nodes also has a master-worker architecture within the training group)), wherein one of the compute nodes in the training group operates as a master for a remainder of the compute nodes in the training group and the remainder of the compute nodes operate as workers for the one compute node (the discriminative learning process of Xu 4:13-16 teaches [f]or example, a particular worker can perform a local combination of results from lower level workers depending on the level of the particular worker (that is, the remainder . . . operate as workers for the one compute node) in communication with the worker (this worker being one . . . in the training group operates as a master for a remainder of the compute nodes)).
Chapelle, Dirac, Wang, Padhy, and Xu are from the same or similar field of endeavor. Chapelle teaches distributed machine learning based on a Hadoop cluster having a coordination node and plurality of operation nodes. Dirac teaches a machine learning service having a request handler to accept client requests and inserts corresponding job objects into a MLS job queue for distribution. Wang teaches distributed deep learning platform that has an intuitive programming model and good scalability. Padhy teaches a master-worker architecture for large scale machine learning and data mining applications. Xu teaches a distributed architecture for processing of parts of training data by workers. Thus, it would have been obvious to a person having ordinary skill in the art as of the effective filing date of the Applicant’s invention to modify the teachings of the combination of Chapelle and Dirac pertaining to distributed machine learning, the shared workgroup parameter space of Wang, and the master-worker architecture of Padhy with the distributed machine learning architecture of Xu.
The motivation for doing so is to improve the robustness of a large margin online learning algorithm without sacrificing convergence rates. (Xu 3:3-5).
10.	Claim 4 is rejected under 35 U.S.C. § 103 as being unpatentable over US Published Application 20130290223 to Chapelle et al. [hereinafter Chapelle] in view of US Published Application 20150379424 to Dirac et al. [hereinafter Dirac] and Wang et al., “SINGA: Putting Deep Learning in the Hands of Multimedia Users,” MM ’15 (26 October 2015) [hereinafter Wang], and further in view of Padhy, “Big Data Processing with Hadoop-MapReduce in Cloud Systems,” Int’l Journal of Cloud Computing & Services Science (2013) [hereinafter Padhy] and US Published Application 20160103901 to Kadav et al. [hereinafter Kadav].
Regarding claim 4, the combination of Chapelle, Dirac, Wang, and Padhy teaches all of the limitations of claim 2, as described above. 
Though the combination of Chapelle, Dirac, Wang and Padhy teaches the feature of multiple server groups and multiple worker training groups, where each worker group communicates with only one server group, the combination of Chapelle, Dirac, Wang and Padhy does not explicitly teach -
wherein at least one of the training groups with two or more compute nodes has a peer-to-peer architecture within the training group.
But Kadav teaches - 
wherein at least one of the training groups with two or more compute nodes has a peer-to-peer architecture within the training group (Kadav ¶ 0020 teaches distributed machine learning over existing [machine learning] systems . . . [A machine learning toolset (MALT)] provides peer-to-peer learning by interleaving gradient (changes to parameters) updates with parameter values to limit network costs).
Chapelle, Dirac, Wang, Padhy, and Kadav are from the same or similar field of endeavor. Chapelle teaches distributed machine learning based on a Hadoop cluster having a coordination node and plurality of operation nodes. Dirac teaches a machine learning service having a request handler to accept client requests and inserts corresponding job objects into a machine learning service (MLS) job queue for distribution. Wang teaches a distributed deep learning platform that has an intuitive programming model and good scalability. Padhy teaches a master-worker architecture for large scale machine learning and data mining applications. Kadav teaches distributed machine learning over existing machine learning systems. Thus, it would have been obvious to a person having ordinary skill in the art as of the effective filing date of the Applicant’s invention to modify the teachings of the combination of Chapelle and Dirac pertaining to distributed machine learning, the shared workgroup parameter space of Wang, and the master-worker architecture of Padhy with the peer-to-peer learning of Kadav.
The motivation for doing so is to reduce the model training time, which leads to better models being produced at shorter intervals, as well as in parameter tuning. (Kadav ¶ 0007).
11.	Claims 14, 17, and 18 are rejected under 35 U.S.C. § 103 as being unpatentable over US Published Application 20130290223 to Chapelle et al. [hereinafter Chapelle] in view of US Published Application 20150379424 to Dirac et al. [hereinafter Dirac] and Wang et al., “SINGA: Putting Deep Learning in the Hands of Multimedia Users,” MM ’15 (26 October 2015) [hereinafter Wang], and further in view of Boulon et al., “Chukwa: A Large-Scale Monitoring System,” (2008) [hereinafter Boulon].
Regarding claim 14, the combination of Chapelle, Dirac, and Wang teaches all of the limitations of claim 1, as described above. 
Though the combination of Chapelle, Dirac, and Wang teaches the feature of displays in a machine learning context, the combination of Chapelle, Dirac, and Wang, however, does not explicitly teach further comprising:
the job server providing a visual display of the current status of the compute nodes and/or a current availability of the compute nodes.
But Boulon teaches further comprising:
the job server providing a visual display of the current status of the compute nodes and/or a current availability of the compute nodes (Boulon, right column of p. 1, Section 1, third paragraph, teaches Chukwa can scale to thousands of nodes in both collection and analysis capacities, while providing a standardized and familiar framework for processing the collected data; Boulon, left column of p. 2, Section 2, paragraphs 1-3, teaches Chukwa [is] to monitor multiple clusters . . . [including] what resources are available for future jobs; Boulon, left column of p. 4, Section 4, first paragraph, teaches [t]o ease analysis of collected data, we’ve built a flexible, configurable, “portal-style” web interface to Chukwa, termed the Hadoop Infrastructure Care Center (HICC); Boulon FIG. 2 teaches a graphical user interface:

[Figure: Boulon, FIG. 2 (media_image2.png), HICC graphical user interface]

FIG. 2 teaches DataNode (compute nodes) metric tabs including “Global Status”, “Cluster Status”, “Hod Job Viewer”, “DFS Status”, and “Event Viewer” (the job server providing a visual display of the current status of the compute nodes); see also, e.g., HICC User Guide at <https://chukwa.apache.org/docs/r0.5.0/hicc.html>).
Chapelle, Dirac, Wang, and Boulon are from the same or similar field of endeavor. Chapelle teaches distributed machine learning on a cluster including a plurality of nodes. Dirac teaches selecting workload distribution strategies for jobs at a machine learning service. Wang teaches distributed deep learning platform that has an intuitive programming model and good scalability. Boulon teaches a visual tool for monitoring of large distributed systems. Thus, it would have been obvious to a person having ordinary skill in the art as of the effective filing date of the Applicant’s invention to modify the teachings of the combination of Chapelle, Dirac, and Wang pertaining to distributed machine learning service executing jobs for training machine learning models with the distributed service visualization monitoring of Boulon.
	The motivation for doing so is to provide a tool built on top of a distributed file system implementation for displaying monitoring and analysis results, in order to make the best use of collected data. (Boulon, Abstract).
Regarding claim 17, the combination of Chapelle, Dirac, and Wang teaches all of the limitations of claim 16, as described above. 
Though the combination of Chapelle, Dirac, and Wang teaches the feature of distributed machine learning, the combination of Chapelle, Dirac, and Wang, however, does not explicitly teach -
wherein the job server and the plurality of compute nodes together include at least 1,000 processor units.
But Boulon teaches -
wherein the job server and the plurality of compute nodes together include at least 1,000 processor units (where processing power for the job server or a compute node is provided by a processor unit, Boulon, left column of p. 2, Section 2, eighth paragraph, teaches [o]ur initial goal was to be able to monitor Hadoop clusters (the job server and the plurality of compute nodes) of 2000 nodes (together at least 1,000 processor units), outputting 5 to 6 MB of data per second, and to have collected data available for processing within ten minutes).
Chapelle, Dirac, Wang and Boulon are from the same or similar field of endeavor. Chapelle teaches distributed machine learning on a cluster including a plurality of nodes. Dirac teaches selecting workload distribution strategies for jobs at a machine learning service. Wang teaches distributed deep learning platform that has an intuitive programming model and good scalability. Boulon teaches a visual tool for monitoring of large distributed systems. Thus, it would have been obvious to a person having ordinary skill in the art as of the effective filing date of the Applicant’s invention to modify the teachings of the combination of Chapelle, Dirac, and Wang pertaining to distributed machine learning service executing jobs for training machine learning models with the distributed service visualization monitoring of Boulon.
	The motivation for doing so is to provide a tool built on top of a distributed file system implementation for displaying monitoring and analysis results of large distributed systems in order to make the best use of collected data. (Boulon, Abstract).
Regarding claim 18, the combination of Chapelle, Dirac, and Wang teaches all of the limitations of claim 16, as described above.
Though the combination of Chapelle, Dirac, and Wang teaches the feature of display devices in a distributed machine learning apparatus and method, the combination of Chapelle, Dirac, and Wang, however, does not explicitly teach -
a display node in communication with the job server wherein, for at least one of the training jobs, the display node provides a visual display of the parameters for the training job.
But Boulon teaches -
a display node in communication with the job server (Boulon, Abstract, teaches the design and initial implementation of Chukwa, a data collection system for monitoring and analyzing large distributed systems. Chukwa is built on top of Hadoop (that is, a display node), an open source distributed file system and MapReduce implementation; Boulon teaches that Chukwa includes adaptors to collect Hadoop logs, application metrics, and system telemetry (that is, in communication with the job server)) wherein, for at least one of the training jobs, the display node provides a visual display of the parameters for the training job (Boulon, right column of p. 1, Section 1, third paragraph, teaches Chukwa can scale to thousands of nodes in both collection and analysis capacities, while providing a standardized and familiar framework for processing the collected data; Boulon, left column of p. 2, Section 2, paragraphs 1-3, teaches Chukwa [is] to monitor multiple clusters . . . [including] what resources are available for future jobs; Boulon, left column of p. 4, Section 4, first paragraph, teaches [t]o ease analysis of collected data, we’ve built a flexible, configurable, “portal-style” web interface to Chukwa, termed the Hadoop Infrastructure Care Center (HICC); Boulon FIG. 2 teaches a graphical user interface:

[Figure: Boulon, FIG. 2 (media_image2.png), HICC graphical user interface]

FIG. 2 teaches DataNode (compute nodes) metric tabs including “Global Status”, “Cluster Status”, “Hod Job Viewer”, “DFS Status”, and “Event Viewer” (the job server providing a visual display of the current status of the compute nodes); see also, e.g., HICC User Guide at <https://chukwa.apache.org/docs/r0.5.0/hicc.html>).
Chapelle, Dirac, Wang, and Boulon are from the same or similar field of endeavor. Chapelle teaches distributed machine learning on a cluster including a plurality of nodes. Dirac teaches selecting workload distribution strategies for jobs at a machine learning service. Wang teaches a distributed deep learning platform that has an intuitive programming model and good scalability. Boulon teaches a visual tool for monitoring of large distributed systems. Thus, it would have been obvious to a person having ordinary skill in the art as of the effective filing date of the Applicant’s invention to modify the teachings of the combination of Chapelle, Dirac, and Wang pertaining to distributed machine learning service executing jobs for training machine learning models with the distributed service monitoring of Boulon.
The motivation for doing so is to provide a tool built on top of a distributed file system implementation for displaying monitoring and analysis results, in order to make the best use of collected data. (Boulon, Abstract).
Response to Arguments
12.	Applicant’s arguments have been fully considered. Examiner responds below.
13.	Applicant argues that “Chapelle does not show the organization of compute nodes in different training groups and sharing of computed parameters only between compute nodes within a particular training group. At best, Fig. 7 shows only a single training group, and all Operation Nodes A-C exchanging information with one another.” (Response at p. 9). Further, with respect to the After Final Amendment, Applicant submits that “[t]o expedite prosecution, Applicant further amended claim 1 to specify "the job server allocating the training jobs to at least two training groups, at least one training group of the at least two training groups having two or more compute nodes," and "for the at least one training group that comprises two or more compute nodes, communicating the updated values of the parameters only between the two or more compute nodes of a same training group." None of the cited portions of the cited references disclosed such subject matter.” (Response at p. 9).
With regard to different training groups, and sharing of computed parameters only between compute nodes within a particular training group, Examiner does not rely on Chapelle. Examiner cites to Wang as teaching these features.
Examiner notes the claims recite language, by way of example, such as “allocating,” “current status,” “availability.” Under the BRI of the claims, Examiner construes such terms as preparatory or static in nature for the distribution of machine learning jobs. With regard to the Figure 3 and accompanying text of the specification, Examiner notes the specification describes features directed to “some level of dynamic reallocation” (See, e.g., PGPUB1 ¶¶ 0042-52). Although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims. See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993).
Examiner suggests clarifying the claim language in kind may aid in advancing the prosecution of the instant application; provided, however, that upon receipt of such a written response, Examiner would reconsider the cited references, and conduct a further search.
Conclusion 
14.	The prior art made of record and not relied upon is considered pertinent to Applicant’s disclosure:
(Abadi et al., “TensorFlow: A System for Large-Scale Machine Learning,” OSDI ’16 (2016)) teaches the feature of optimizing performance by manually placing operations to balance the computation, memory, and network requirements across multiple tasks and multiple devices within those tasks.
15.	Any inquiry concerning this communication or earlier communications from the Examiner should be directed to KEVIN L. SMITH whose telephone number is (571) 272-5964. Normally, the Examiner is available on Monday-Thursday 0730-1730. 
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the Examiner by telephone are unsuccessful, the Examiner’s supervisor, KAKALI CHAKI can be reached on 571-272-3719. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

Examiner, Art Unit 2122

/BABOUCARR FAAL/Primary Examiner, Art Unit 2184

        1 US Published Application 20180314971 entitled “TRAINING MACHINE LEARNING MODELS ON A LARGE-SCALE DISTRIBUTED SYSTEM USING A JOB SERVER” to Chen et al., filed 26 April 2017 [hereinafter PGPUB].