Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 03/07/22 has been entered.

Status of Claims
This action is in response to the amendments filed 03/07/22. Claims 1, 9, 12, and 13 have been amended. Claims 1-20 are currently pending.
	
Response to Arguments
Applicant’s arguments regarding the 112(b) rejection of claims 8 and 16 have been fully considered but they are not persuasive. Applicant argues on page 12 that the claims as filed in the original specification are part of the disclosure. While this is true, the claim itself is not clear and the specification does not provide clarification or further explain the limitations of claims 8 and 16. Stating that a probability is “corresponding” to a given tier does not imply that the probability is used for selecting that tier. As stated in the previous office action, neither claim 8, claim 16, nor the specification discloses that pi is a selection probability, and the method of using a discrete probability distribution to perform random selection of a tier, as described on page 12 of Applicant’s remarks, does not appear in the disclosure. Paragraphs [0033] and [0040] merely state that tiers are randomly selected but do not disclose how the process of randomly selecting tiers occurs.
Applicant argues on page 13 that one of ordinary skill in the art would readily understand that the formula in claims 8 and 16 uses a special variance reduction technique to predict missing replies. However, neither the claims nor the specification discuss variance reduction and the terms of the equation in claims 8 and 16 are not clearly defined. Claims 8 and 16 only recite applying a prediction step that aggregates responses from a selected tier with responses from non-selected tiers, but it is not clear how prediction occurs since the formula only appears to aggregate responses. Applicant argues on page 13 that “for a certain round, the aggregator first computes the average of the most recent replies for the current tier, denoted as "AVG(mostRecent_replies )". Then, once the aggregator receives the replies from the current round of the queried tier, it averages them "AVG(replies)". The results are combined recited in the formula in the claim”. Neither the claims nor the specification describe how to differentiate the “most recent” replies from the rest of the replies for a selected tier during a current round, so it is unclear at what point a reply stops being a “most recent” reply or if the “AVG(replies)” term also includes the most recent replies or not. 
For these reasons, the metes and bounds of claims 8 and 16 cannot be determined and no prior art can be applied.
Applicant’s amendments and arguments regarding the 112(b) rejection of claims 9-11 have been fully considered but they are not persuasive. Applicant argues on page 14 that claim 9 relates to the number of run epochs that have completed so far, and that the claimed subject matter checks whether the number of epochs that ran already is less than the synchronization epochs or not. It is not clear how the system could synchronize before the system runs, therefore it is not clear how the number of synchronization epochs could be greater than the number of run epochs. Applicant gives an example wherein the total number of epochs is 50 and the synchronization epochs are 5, but this does not explain if the fifth epoch is the first synchronization epoch, if some 5 out of 50 epochs are synchronization epochs, or if there is a synchronization epoch after every 5 run epochs, etc. Applicant’s example states that while running epochs 1-4, one action is taken, then after (i.e. at epoch 5 which Examiner is interpreting is meant to be the synchronization epoch) another action is taken. However, in this scenario the number of run epochs would be at least 4 by the time the number of synchronization epochs was 1. Under the broadest reasonable interpretation the number of run epochs could be less than or equal to the number of synchronization epochs, i.e. the system could synchronize at every run. From applicant’s example, it would appear that the “number of run epochs” and the “number of synchronization epochs” are meant to be predetermined labels given to the epochs before training starts, and that those labels determine how the system should react (i.e. the person who designs the training decides that four rounds of training should occur before any synchronization occurs). The claim as written does not make it clear that this is the case, as one of ordinary skill would interpret that a number of epochs refers to a count (i.e. at epoch 4, the number of run epochs would be 4 and the number of synchronization epochs would be 0 because no synchronization has occurred yet, then at epoch 5 the number of run epochs would be 5 and the number of synchronization epochs would be 1, etc.). For purposes of prior art examination, Examiner is interpreting that the intention of claim 9 is to let a certain number of training runs occur before synchronizing in order to prevent pre-emptively identifying stragglers or dropouts.
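For illustration only, the two readings of the “number of synchronization epochs” discussed above can be contrasted in a short sketch. The function names, the parameters `sync_every` and `n_syn`, and the reuse of the value 5 from Applicant's example are assumptions made for this sketch and do not appear in the claims or the specification.

```python
def count_reading(total_epochs, sync_every):
    """Treat both quantities as running counts (assumed reading)."""
    rows = []
    for epoch in range(1, total_epochs + 1):
        run_count = epoch                  # epochs run so far
        sync_count = epoch // sync_every   # synchronizations completed so far
        rows.append((epoch, run_count, sync_count, run_count < sync_count))
    return rows


def threshold_reading(total_epochs, n_syn):
    """Treat n_syn as a fixed threshold chosen before training (assumed reading)."""
    return [(epoch, epoch < n_syn) for epoch in range(1, total_epochs + 1)]


if __name__ == "__main__":
    counts = count_reading(total_epochs=10, sync_every=5)
    # Is the run count ever below the synchronization count?
    print(any(is_less for *_, is_less in counts))
    # Epochs at which the run count is below the fixed threshold.
    print([epoch for epoch, is_less in threshold_reading(10, 5) if is_less])
```

Under the count reading the comparison recited in claim 9 is never true, whereas under the threshold reading it holds for epochs 1-4, which is consistent with the interpretation adopted for purposes of prior art examination.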
Dependent claims 10-11 are rejected under 35 U.S.C. 112(b) because they fail to cure the deficiencies of independent claim 9 on which they depend.
Applicant's amendments and arguments regarding the 101 rejection have been fully considered and are considered persuasive, therefore the 101 rejection of claims 9-11 no longer stands.
Applicant's amendments and arguments regarding the prior art rejection of claim 1 have been fully considered but they are not persuasive. Applicant argues that the references fail to show certain features of applicant’s invention; however, it is noted that the features upon which applicant relies (i.e., that a predicted reply relates to the content of a response and therefore is different from a measure of performance from page 24 of Applicant’s arguments) are not recited in the rejected claims or in the specification.  Although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims, and in this case the specification does not disclose that a predicted reply relates to the content of a response.  See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993).
Since neither the claims nor the specification provide explicit definitions for a “predicted reply”, explicitly define the difference between a reply and a response, or define a particular kind of query that would require a particular kind of reply for a straggler, the broadest reasonable interpretation of a “predicted reply” includes a method of using an approximate or best guess of how the federated learning participant would have responded.
Regarding claim 9, Applicant argues on pages 26-27 that Prakash does not teach removing response times of drop outs, but merely teaches IoT devices entering or leaving a fog. This argument has been fully considered; Examiner notes that one of ordinary skill would understand that removing drop outs from the pool of federated learning participants would remove the corresponding response times, but Prakash does not specifically teach that devices leaving fog are considered to be dropouts. Applicant’s argument is considered persuasive; therefore, the rejection has been withdrawn.  However, upon further consideration, a new ground of rejection is made in view of Ouyang, in further view of Smith, in further view of McColl, in further view of Martin.
The prior rejections have been updated to include the amended limitations and to clarify the reasoning given for the limitations that were not amended.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 8, 9-11 and 16 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.

Claims 8 and 16 recite a limitation “applying a prediction step to aggregate responses from the federated learning participants of the selected tier that respond to the querying with information from the federated learning participants in non-selected tiers” and the following equation:
[Equation image: media_image1.png]
This equation is not clearly defined by the claims or the specification; it is unclear what the “corresponding probability pi” corresponds to or how it is used in the prediction step. The claims state only that pi corresponds to a queried tier ti, but do not define what the probability is for. The formula in the claims appears to aggregate responses from a current epoch, but it is unclear how this then becomes a prediction step. Additionally, neither the claims nor the specification describe how to differentiate the “most recent” replies from the rest of the replies for a selected tier during a current round, so it is unclear at what point a reply stops being a “most recent” reply. Therefore the metes and bounds of these claims cannot be determined and no prior art can be applied.
Claim 9 recites the limitation “determining that a number of run epochs is less than a number of synchronization epochs”. It is unclear how the number of run epochs could be less than the number of synchronization epochs given that the method appears to gather data (i.e. run) before it checks to see if all the participants have responded (i.e. synchronized). The claim as written does not make it clear what “a number of run epochs” or “a number of synchronization epochs” refers to, as one of ordinary skill would interpret that a number of epochs refers to a count (i.e. at epoch 4, the number of run epochs would be 4 and the number of synchronization epochs would be 0 because no synchronization has occurred yet, then at epoch 5 the number of run epochs would be 5 and the number of synchronization epochs would be 1, etc.), so there could not be a situation where the number of run epochs would be less than the number of synchronization epochs. For purposes of prior art examination, Examiner is interpreting that the intention of claim 9 is to let a certain number of training runs occur before synchronizing in order to prevent pre-emptively identifying stragglers or dropouts.
Dependent claims 10-11 are rejected under 35 U.S.C. 112(b) because they fail to cure the deficiencies of their independent claim.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-7, 12-15, and 17-20 are rejected under 35 U.S.C. 103 as being unpatentable over Ouyang et al (“ML-NA: A Machine Learning based Node Performance Analyzer Utilizing Straggler Statistics”, herein Ouyang) in view of Smith et al* (“Federated Multi-Task Learning”, herein Smith).
*a copy of this document was provided with the non-final action dated 06/14/2021, therefore a copy has not been provided with this action.
Regarding claim 1, Ouyang teaches a computer-implemented method (Ouyang pg. 73 section 1 para. 7 recites “Proposed ML-NA, a Machine Learning based Node performance Analyzer. This multi-stage framework can classify machine nodes into different categories depending on their performance through clustering” (i.e. a method to assign machine learning participants into tiers based on response time)) of communicating [in a federated learning environment], the method comprising:
monitoring a plurality of federated learning participants for one or more factors associated with stragglers (Ouyang fig. 1 and pg. 74 Section II C 1 recites “From task durations for different nodes shown in Figure 1, it is observable that some machines have a shorter average task processing time than the others, while some are either with a much longer average duration indicating a slower execution, or with a larger variation in time of processing tasks, showing an unstable performance.” Ouyang pg. 75 Section II C 2 recites “collecting all Dij from tasks assigned in each node (i.e. monitoring the execution time of each participant) to reflect the quickness or slowness derived from different node performance rather than job heterogeneity. Statistics of Dij values per node are calculated as the basic metrics to measure the node performance” (i.e. identifying factors associated with stragglers));
assigning the federated learning participants into tiers based on the monitoring of the one or more factors, each of the tiers having a designated wait time (Ouyang pg. 77, Section III B 1 recites that “the first step to label the nodes is to put the nodes with similar performance into the same group. In this scenario, clustering is the most well-known technique that can be used, and k-means is one of the simplest whilst very effective clustering algorithms” (i.e. assigning participants into groups based on their response time));
querying the federated learning participants in a selected tier (Ouyang pg. 75 Section II C 2 recites “Dij reveals the relative speed of tij compared to other tasks within Jj . A positive Dij value represents a slower execution because the duration of tij is larger than the job average, and the increment of the positive Dij indicates an aggravated straggler behavior tij exhibits. Vice versa, a negative Dij indicates a shorter response, and the smaller the negative value, the quicker tij performs. We then collect all Dij from tasks assigned in each node to reflect the quickness or slowness derived from different node performance rather than job heterogeneity (i.e. querying a specific set of participants from a given tier). Statistics of Dij values per node are calculated as the basic metrics to measure the node performance.” Ouyang pg. 77 Section III B 2 recites “after putting the nodes with similar performance into k groups, we then need to determine which cluster represents the weakest performance group”);
designating the federated learning participants that respond after a predetermined time within the designated wait time as stragglers (Ouyang pg. 75 Section II C 2 recites that “a positive Dij value represents a slower execution because the duration of tij is larger than the job average, and the increment of the positive Dij indicates an aggravated straggler behavior tij exhibits. Vice versa, a negative Dij indicates a shorter response, and the smaller the negative value, the quicker tij performs” (i.e. a node with a slower execution time is determined to be a straggler)).
However, Ouyang does not explicitly teach a federated learning environment, and updating a training of a federated learning model by applying a predicted reply to the query for each of the stragglers including collected participants' replies and computed predicted replies to the query associated with the stragglers.
Smith teaches a federated learning environment (section 3 para. 1 recites “we suggest a general MTL (i.e. multi-task learning) framework for the federated setting, and propose a novel method, MOCHA, to handle the systems challenges of federated MTL” (i.e. a federated learning environment));
and updating a training of a federated learning model by applying a predicted reply to the query for each of the stragglers including collected participants' replies and computed predicted replies to the query associated with the stragglers (section 3.4 para. 1 recites “During MOCHA’s federated update of W (Examiner’s Note: section 3.3 teaches that W is “a matrix whose t-th column is the weight vector for the t-th task”), the central node requires a response from all workers before performing a synchronous update. In the federated setting, a naive execution of this communication protocol could introduce dramatic straggler effects due to node heterogeneity. To avoid stragglers, MOCHA provides the t-th node with the flexibility to approximately solve its subproblem Gσt(.), where the quality of the approximation is controlled by a per-node parameter ϴht” (i.e. applying a predicted reply for each straggler). Para. 2 recites “ϴht ranges from zero to one, where ϴht = 0 indicates an exact solution to Gσt(.) and ϴht = 1 indicates that node t made no progress during iteration h (which we refer to as a dropped node)” (Examiner’s Note: a value between 0 and 1 would indicate a straggler node, therefore approximately solving a subproblem would be considered as updating a training by applying a predicted reply for each straggler; additionally, lines 7-12 of algorithm 1 show that each straggler is approximately solved and how the training is updated with these predicted replies)).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine these teachings by utilizing the methods from Ouyang in the federated learning environment from Smith. Ouyang and Smith are both directed to methods of straggler mitigation in distributed learning environments, but while Ouyang recites a heterogeneous machine learning model, it does not specifically recite a federated learning environment. It would be obvious to use the federated learning environment and the method of approximately solving for stragglers from Smith so as to save time and bandwidth of the overall model, as well as increase user privacy by keeping the training data on the local device and only passing the results of the machine learning training between the local and global models.
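As an illustration of the relative-duration metric quoted from Ouyang in the claim 1 analysis above, the following sketch computes a D value for each task and the per-node average and standard deviation. The exact formula for Dij is not reproduced in this action; the sketch assumes that Dij is the task duration minus the average duration of the tasks in the same job. The record layout and the function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def relative_durations(tasks):
    """tasks: list of (job_id, node_id, duration). Returns (node_id, D) pairs.

    Assumption: D = task duration - average duration of the task's job.
    """
    by_job = defaultdict(list)
    for job_id, _node, duration in tasks:
        by_job[job_id].append(duration)
    job_avg = {job: mean(durations) for job, durations in by_job.items()}
    return [(node, duration - job_avg[job]) for job, node, duration in tasks]


def node_statistics(tasks):
    """Average and (population) standard deviation of D per node."""
    per_node = defaultdict(list)
    for node, d in relative_durations(tasks):
        per_node[node].append(d)
    return {node: (mean(ds), pstdev(ds)) for node, ds in per_node.items()}


if __name__ == "__main__":
    # Hypothetical task log: two jobs, each with one task on node A and node B.
    tasks = [
        ("job1", "A", 10.0), ("job1", "B", 20.0),
        ("job2", "A", 4.0), ("job2", "B", 8.0),
    ]
    for node, (avg_d, std_d) in sorted(node_statistics(tasks).items()):
        label = "slower than job average" if avg_d > 0 else "not slower than job average"
        print(node, avg_d, std_d, label)
```

A positive per-node average corresponds to the slower execution that the quoted passage associates with straggler behavior; the grouping of nodes by these statistics (k-means in Ouyang) is not shown here.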

Regarding claim 2, the combination of Ouyang and Smith teaches the computer-implemented method according to claim 1, further comprising: 
updating the training of the federated learning model with collected participants' replies and computed predictions in response to identifying whether a quorum of federated learning participants has responded to the querying (Smith section 3.4 para. 1 recites “During MOCHA’s federated update of W, the central node requires a response from all workers before performing a synchronous update. In the federated setting, a naive execution of this communication protocol could introduce dramatic straggler effects due to node heterogeneity” (i.e. identifying a quorum of federated learning participant responses; lines 7-13 of Algorithm 1 show how the method collects responses from the federated learning participants and uses them to update the training model));
and identifying the federated learning participants that do not respond within the designated wait time as drop outs (Smith section 3.4 para. 2 recites “We define ϴht as a function of these factors, and assume that each node has a controller that may derive ϴht from the current clock cycle and statistical/systems setting. ϴht ranges from zero to one, where ϴht = 0 indicates an exact solution to Gσt(.) and ϴht = 1  indicates that node t made no progress during iteration h (which we refer to as a dropped node)” (i.e. a participant that does not respond is designated as a drop out)).
Regarding claim 3, the combination of Ouyang and Smith teaches the method according to claim 2, wherein for each round of updating the training of the federated learning model, updating the designated wait time per tier, and the method further comprising:
determining an accuracy of the training of the federated learning model according to one or more predetermined criteria (Ouyang pg. 78 Section IV B para. 3 recites that “figure 6(a) concludes the minimal, average and maximum accuracies when predicting each month’s node performance categories utilizing different training sizes with the optimal parameter settings” (i.e. determining an accuracy of the training model)), 
and terminating an asynchronized training stage of the federated learning model when the accuracy does not increase after a predetermined number of asynchronization time periods (Smith fig. 3 and section 5.4 para. 1 recite “Finally, we explore the effect of nodes dropping on the performance of MOCHA. We do not draw comparisons to other methods, as to the best of our knowledge, no other methods for distributed multi-task learning directly address fault tolerance. In MOCHA, we incorporate this setting by allowing ϴht := 1, as explored theoretically in Section 4. In Figure 3, we look at the performance of MOCHA, either for one fixed W update, or running the entire MOCHA method, as the probability that nodes drop at each iteration (pht in Assumption 2) increases. We see that the performance of MOCHA is robust to relatively high values of pht , both during a single update of W and in how this affects the performance of the overall method. However, as intuition would suggest, if one of the nodes never sends updates (i.e., ph1 := 1 for all h, green dotted line), the method does not converge to the correct solution” (Examiner’s note: convergence is well known in the art to represent the point at which the model is closest to a desired value (i.e. the accuracy will not increase further))).
Regarding claim 4, the combination of Ouyang and Smith teaches the method according to claim 1, wherein the selected tier for querying is selected by a randomizing procedure (Smith section 5.2 para. 2 recites “We use a cluster-regularized multi-task model, as described in Section 3.1. For each dataset from Section 5.1, we randomly split the data into 75% training and 25% testing, and learn multi-task, local, and global support vector machine models, selecting the best regularization parameter, λ ϵ {1e-5, 1e-4, 1e-3, 1e-2, 0.1, 1, 10}, for each model using 5-fold cross-validation” (i.e. nodes are selected randomly for straggler analysis)).
Regarding claim 5, the combination of Ouyang and Smith teaches the method according to claim 1, further comprising: periodically updating the training of the federated learning model with the collected participants' replies and computed predictions of the stragglers (lines 7-13 from Algorithm 1 in Smith show solving each subproblem Δαt and updating a training model with collected participants’ replies W (section 3.4 para. 2 teaches when ϴht = 0, an exact solution is computed, i.e. a collected reply) and computed predictions from stragglers (section 3.4 para. 2 teaches when ϴht = 1 a node could not compute a solution, therefore 0 < ϴht < 1 would indicate an approximately solved solution or a predicted reply for a straggler)).
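The three cases that the Examiner reads from Smith section 3.4 para. 2 can be summarized in a short sketch. The function name and the returned labels are hypothetical and serve only to illustrate the interpretation applied above; they are not taken from Smith.

```python
def classify_node(theta):
    """Map a per-node approximation parameter in [0, 1] to the reading applied above."""
    if not 0.0 <= theta <= 1.0:
        raise ValueError("theta must lie between zero and one")
    if theta == 0.0:
        # Exact solution to the subproblem: treated as a collected reply.
        return "exact solution (collected reply)"
    if theta == 1.0:
        # No progress during the iteration: a dropped node.
        return "dropped node"
    # Strictly between 0 and 1: approximately solved subproblem.
    return "approximate solution (predicted reply for a straggler)"


if __name__ == "__main__":
    for theta in (0.0, 0.4, 1.0):
        print(theta, classify_node(theta))
```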
Regarding claim 6, the combination of Ouyang and Smith teaches the method according to claim 1, further comprising:
updating the monitoring of the federated learning participants (Ouyang pg. 75 Section II C 2 and Figure 3 show how the monitoring of participants is updated over time); 
and determining whether to reassign the federated learning participants into different tiers, based on the updated monitoring for each synchronization time period of a plurality of synchronization time periods (Ouyang pg. 74 Section 1 para. 4 recites “through classifying nodes into different categories and predicting the corresponding performance category with high accuracy, the scheduler can select suitable nodes to launch latency-sensitive tasks, avoid assigning speculative tasks onto nodes that are likely to be in their weak performance state in the near future” (i.e. reassigning participants based on monitoring data)).
Regarding claim 7, the combination of Ouyang and Smith teaches the method according to claim 1, further comprising: dynamically rearranging the tiers based on updated monitoring of the federated learning participants (Ouyang pg. 73, the abstract recites “that by leveraging historical parallel tasks execution log data, ML-NA classifies cluster nodes into different categories and predicts their performance in the near future as a scheduling guide” (i.e. dynamically rearranges participants into different categories based on monitoring data) to improve speculation effectiveness and minimize task straggler generation).
Claim 12 is a computer readable storage medium claim and its limitation is included in claim 1. The only difference is that claim 12 requires a computer readable storage medium (Smith section 3.4 para. 1 recites “The following factors determine the quality of the t-th node’s solution to its subproblem: 1. Statistical challenges, such as the size of Xt and the intrinsic difficulty of subproblem Gσt(.). 2. Systems challenges, such as the node’s storage, computational, and communication capacities due to hardware (CPU, memory), network connection (3G, 4G, WiFi), and power (battery level) (i.e. the methods from Smith are implemented using a computer system that requires a storage medium or memory). 3. A global clock cycle imposed by the central node specifying a deadline for receiving updates”). Therefore, claim 12 is rejected for the same reasons as claim 1.
Claim 13 is a computer readable storage medium claim and its limitation is included in claim 2. Claim 13 is rejected for the same reasons as claim 2.
Regarding claim 14, the combination of Ouyang and Smith teaches the computer readable storage medium according to claim 13, wherein the monitoring of the plurality of federated learning participants further comprises capturing behavior patterns of the federated learning participants (Ouyang pg. 73, the abstract recites that “by leveraging historical parallel tasks execution log data, ML-NA classifies cluster nodes into different categories and predicts their performance in the near future as a scheduling guide to improve speculation effectiveness and minimize task straggler generation” (i.e. monitoring behavior patterns of participants)).
Regarding claim 15, the combination of Ouyang and Smith teaches the computer readable storage medium according to claim 14, further comprising identifying at least one of the drop outs or predicting at least one of the stragglers based on the captured behavior patterns of the federated learning participants (Ouyang pg. 73, the abstract recites that “by leveraging historical parallel tasks execution log data, ML-NA classifies cluster nodes into different categories and predicts their performance in the near future as a scheduling guide to improve speculation effectiveness and minimize task straggler generation” (i.e. predicting a straggler based on the known behavior of participants)).
Claim 17 is a computer readable storage medium claim and its limitation is included in claim 7. Claim 17 is rejected for the same reasons as claim 7.
Claim 18 is a computer readable storage medium claim and its limitation is included in claim 5. Claim 18 is rejected for the same reasons as claim 5.
Claim 19 is a computer readable storage medium claim and its limitation is included in claim 4. Claim 19 is rejected for the same reasons as claim 4.
Claim 20 is a computer readable storage medium claim and its limitation is included in claim 3. Claim 20 is rejected for the same reasons as claim 3.	

Claims 9-11 are rejected under 35 U.S.C. 103 as being unpatentable over Ouyang et al (“ML-NA: A Machine Learning based Node Performance Analyzer Utilizing Straggler Statistics”, herein Ouyang) in view of Smith et al (“Federated Multi-Task Learning”, herein Smith), in further view of McColl (WO 2019086120 A1, herein McColl) and Martin et al (US 9946465 B1, herein Martin).
Regarding claim 9, Ouyang teaches a computer-implemented method (Ouyang pg. 73 section 1 para. 7 recites “Proposed ML-NA, a Machine Learning based Node performance Analyzer. This multi-stage framework can classify machine nodes into different categories depending on their performance through clustering” (i.e. a method to assign machine learning participants into tiers based on response time)) of [training a federated learning model], the method comprising:
assigning, by the computing device (Ouyang pg. 73, the abstract recites “Current Cloud clusters often consist of heterogeneous machine nodes, which can trigger performance challenges such as the task straggler problem, whereby a small subset of parallel tasks running abnormally slower than the other sibling ones” and later the abstract recites “In this paper we develop ML-NA, a Machine Learning based Node performance Analyzer. By leveraging historical parallel tasks execution log data, ML-NA classifies cluster nodes into different categories and predicts their performance in the near future as a scheduling guide to improve speculation effectiveness and minimize task straggler generation. We consider MapReduce as a representative framework to perform our analysis, and use the published OpenCloud trace as a case study to train and to evaluate our model” (i.e. the methods of Ouyang are performed by a computing device)), an average reply time to each tier of a plurality of tiers having a predetermined number of federated learning participants per tier (Ouyang pg. 76 Section III A 2 recites that “the three basic meta-features selected to build up the node performance analysis model are the average and the standard deviation of all Dij from tasks per node (i.e. the average reply time is calculated for each cluster or tier), as well as the normalized task number”).
However, Ouyang does not explicitly teach a federated learning environment.
Smith teaches a federated learning environment (section 3 para. 1 recites “we suggest a general MTL (i.e. multi-task learning) framework for the federated setting, and propose a novel method, MOCHA, to handle the systems challenges of federated MTL” (i.e. a federated learning environment)).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine these teachings by utilizing the methods from Ouyang in the federated learning environment from Smith. Ouyang and Smith are both directed to methods of straggler mitigation in distributed learning environments, but while Ouyang recites a heterogeneous machine learning model, it does not specifically recite a federated learning environment. It would be obvious to use the federated learning environment and the method of approximately solving for stragglers from Smith so as to save time and bandwidth of the overall model, as well as increase user privacy by keeping the training data on the local device and only passing the results of the machine learning training between the local and global models.
The combination of Ouyang and Smith does not explicitly teach initializing a plurality of federated learning participants in training of the federated learning model; (a) in response to determining that a number of run epochs is less than a number of synchronization epochs (nsyn) training the federated learning model by: receiving responses from at least some of the plurality of federated learning participants; and updating a response time (RTi) until a maximum time (Tmax) elapses; (b) in response to determining that a number of run epochs is greater than a number of synchronization epochs training the federated learning model by: identifying a federated learning participant from the plurality of federated learning participants as a drop out for which RTi= nsyn * Tmax.
McColl teaches initializing, by a computing device (Ouyang pg. 73, the abstract recites “Current Cloud clusters often consist of heterogeneous machine nodes, which can trigger performance challenges such as the task straggler problem, whereby a small subset of parallel tasks running abnormally slower than the other sibling ones” and later the abstract recites “In this paper we develop ML-NA, a Machine Learning based Node performance Analyzer. By leveraging historical parallel tasks execution log data, ML-NA classifies cluster nodes into different categories and predicts their performance in the near future as a scheduling guide to improve speculation effectiveness and minimize task straggler generation. We consider MapReduce as a representative framework to perform our analysis, and use the published OpenCloud trace as a case study to train and to evaluate our model” (i.e. the methods of Ouyang are performed by a computing device)), a plurality of federated learning participants in training of a federated learning model (McColl fig. 6 and pg. 15, lines 25-27 recite “in a first step 601, a first process/sub-task is executed by a current computing node 201 and communications are initiated” (i.e. initializing a learning participant; one of ordinary skill would understand that these learning participants could be initialized in the federated learning system of Ouyang as modified by Smith))
(a) in response to determining that a number of run epochs is less than a number of synchronization epochs (McColl fig. 6 and pg. 15 lines 28-30 recite “in a further step 603, it is checked whether the minimum duration, i.e. MinTime has been set and/or whether the synchronization parameter has been set to "False" for the current computing node 201” (i.e. the nodes have not all synchronized yet, but they are not determined to be stragglers yet either)) training the federated learning model (Smith section 3 para. 1 recites “In federated learning, the aim is to learn a model over data that resides on, and has been generated by, m distributed nodes. As a running example, consider learning the activities of mobile phone users in a cell network based on their individual sensor, text, or image data. Each node (phone) may generate data via a distinct distribution, and so it is natural to fit separate models to the distributed data—one for each local dataset. However, structure between models frequently exists (e.g., people may behave similarly when using their phones), and modeling these relationships via multi-task learning is a natural strategy to improve performance and boost the effective sample size for each node. In this section, we suggest a general MTL framework for the federated setting, and propose a novel method, MOCHA, to handle the systems challenges of federated MTL” (i.e. training a federated learning model)) by: 
receiving, by the computing device (Ouyang pg. 73, the abstract recites “Current Cloud clusters often consist of heterogeneous machine nodes, which can trigger performance challenges such as the task straggler problem, whereby a small subset of parallel tasks running abnormally slower than the other sibling ones” and later the abstract recites “In this paper we develop ML-NA, a Machine Learning based Node performance Analyzer. By leveraging historical parallel tasks execution log data, ML-NA classifies cluster nodes into different categories and predicts their performance in the near future as a scheduling guide to improve speculation effectiveness and minimize task straggler generation. We consider MapReduce as a representative framework to perform our analysis, and use the published OpenCloud trace as a case study to train and to evaluate our model” (i.e. the methods of Ouyang are performed by a computing device)), responses from at least some of the plurality of federated learning participants (McColl fig. 6 and pg. 15 lines 30-31 recite “in a further step 603, it is checked whether the minimum duration, i.e. MinTime has been set and/or whether the synchronization parameter has been set to "False" for the current computing node 201. If this is not the case, the current computing node 201 tries to set the minimum duration, i.e. MinTime in a further step 605” (i.e. responses are being received because the nodes have not been instructed to synchronize yet));
and updating, by the computing device (Ouyang pg. 73, the abstract recites “Current Cloud clusters often consist of heterogeneous machine nodes, which can trigger performance challenges such as the task straggler problem, whereby a small subset of parallel tasks running abnormally slower than the other sibling ones” and later the abstract recites “In this paper we develop ML-NA, a Machine Learning based Node performance Analyzer. By leveraging historical parallel tasks execution log data, ML-NA classifies cluster nodes into different categories and predicts their performance in the near future as a scheduling guide to improve speculation effectiveness and minimize task straggler generation. We consider MapReduce as a representative framework to perform our analysis, and use the published OpenCloud trace as a case study to train and to evaluate our model” (i.e. the methods of Ouyang are performed by a computing device)), a response time (RTi) until a maximum time (Tmax) elapses (McColl fig. 6 and pg. 15 lines 31-34 recite “in a further step 607, the current computing node 201 notifies other clones, i.e. computing nodes 201 executing the same process/sub-task about the completion of the process by the current computing node 201” (i.e. the nodes can update their response times because the elapsed time is not larger than the maximum allowed time yet));
(b) in response to determining that a number of run epochs is greater than a number of synchronization epochs (McColl fig. 8 and pg. 16, lines 10-12 recite “in a first step 801, it is check whether the computing round is not complete yet and whether the elapsed time is larger than the minimum duration or a multiple thereof, e.g. T*MinTime” (i.e. the number of run epochs is greater than the number of synchronization epochs)) training the federated learning model (Smith section 3 para. 1 recites “In federated learning, the aim is to learn a model over data that resides on, and has been generated by, m distributed nodes. As a running example, consider learning the activities of mobile phone users in a cell network based on their individual sensor, text, or image data. Each node (phone) may generate data via a distinct distribution, and so it is natural to fit separate models to the distributed data—one for each local dataset. However, structure between models frequently exists (e.g., people may behave similarly when using their phones), and modeling these relationships via multi-task learning is a natural strategy to improve performance and boost the effective sample size for each node. In this section, we suggest a general MTL framework for the federated setting, and propose a novel method, MOCHA, to handle the systems challenges of federated MTL” (i.e. training a federated learning model)) by:
identifying, by the computing device (Ouyang pg. 73, the abstract recites “Current Cloud clusters often consist of heterogeneous machine nodes, which can trigger performance challenges such as the task straggler problem, whereby a small subset of parallel tasks running abnormally slower than the other sibling ones” and later the abstract recites “In this paper we develop ML-NA, a Machine Learning based Node performance Analyzer. By leveraging historical parallel tasks execution log data, ML-NA classifies cluster nodes into different categories and predicts their performance in the near future as a scheduling guide to improve speculation effectiveness and minimize task straggler generation. We consider MapReduce as a representative framework to perform our analysis, and use the published OpenCloud trace as a case study to train and to evaluate our model” (i.e. the methods of Ouyang are performed by a computing device)), a federated learning participant from the plurality of federated learning participants as a drop out for which RTi= nsyn * Tmax (McColl fig. 8 and pg. 16, lines 9-14 recite “if this is the case, the respective computing node 201 is a high-latency computing node 201 and will send Tail Limit interrupts in a step 803 to the other computing nodes 201” (i.e. the participant that has not responded is designated as a drop out));
and removing, by the computing device (Ouyang pg. 73, the abstract recites “Current Cloud clusters often consist of heterogeneous machine nodes, which can trigger performance challenges such as the task straggler problem, whereby a small subset of parallel tasks running abnormally slower than the other sibling ones” and later the abstract recites “In this paper we develop ML-NA, a Machine Learning based Node performance Analyzer. By leveraging historical parallel tasks execution log data, ML-NA classifies cluster nodes into different categories and predicts their performance in the near future as a scheduling guide to improve speculation effectiveness and minimize task straggler generation. We consider MapReduce as a representative framework to perform our analysis, and use the published OpenCloud trace as a case study to train and to evaluate our model” (i.e. the methods of Ouyang are performed by a computing device)), response times of the drop outs (Smith section 3.4 para. 2 recites “We define ϴht as a function of these factors, and assume that each node has a controller that may derive ϴht from the current clock cycle and statistical/systems setting. ϴht ranges from zero to one, where ϴht = 0 indicates an exact solution to Gσt(.) and ϴht = 1 indicates that node t made no progress during iteration h (which we refer to as a dropped node). For instance, a node may ‘drop’ if it runs out of battery, or if its network bandwidth deteriorates during iteration and it is thus unable to return its update within the current clock cycle”. Para. 3 recites “MOCHA mitigates stragglers by enabling the t-th node to define its own ϴht. On every iteration h, the local updates that a node performs and sends in a clock cycle will yield a specific value for ϴht. As discussed in Section 4, MOCHA is additionally robust to a small fraction of nodes periodically dropping and performing no local updates (i.e., ϴht := 1) under suitable conditions” (i.e. removing drop outs from the pool of federated learning participants; one of ordinary skill would understand that once dropouts have been identified in the previous limitation, removing them would remove the corresponding response times as well)).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine these teachings by utilizing the fault tolerance method from McColl with the straggler avoidance method from Ouyang in the federated learning environment from Smith. All three of these methods are designed to improve a distributed machine learning system, but neither Smith nor Ouyang explicitly recites a method to identify and remove drop outs based on the number of run epochs that have occurred. One of ordinary skill would benefit from including this capability from McColl to more accurately track the progress of the federated learning environment and prevent system failures from delaying or stopping the progress of the other distributed machines.
The combination of Ouyang, Smith, and McColl does not explicitly teach creating a histogram of remaining response times.
Martin teaches creating, by the computing device (Ouyang pg. 73, the abstract recites “Current Cloud clusters often consist of heterogeneous machine nodes, which can trigger performance challenges such as the task straggler problem, whereby a small subset of parallel tasks running abnormally slower than the other sibling ones” and later the abstract recites “In this paper we develop ML-NA, a Machine Learning based Node performance Analyzer. By leveraging historical parallel tasks execution log data, ML-NA classifies cluster nodes into different categories and predicts their performance in the near future as a scheduling guide to improve speculation effectiveness and minimize task straggler generation. We consider MapReduce as a representative framework to perform our analysis, and use the published OpenCloud trace as a case study to train and to evaluate our model” (i.e. the methods of Ouyang are performed by a computing device)), a histogram of remaining response times (Martin col. 2, lines 54-64 recite “determining one of a plurality of l/O workload classifications for each of the plurality of data sets in accordance with said set of values of said each data set; and for each of the plurality of I/O workload classifications including more than one of the plurality of data sets, combining said more than one of the plurality of data sets into a first aggregate data set including an aggregate set of values in accordance with said set of values of each of said more than one data set and including an aggregate response time histogram in accordance with said response time histogram of each of said more than one data set” (Examiner’s Note: a histogram can be created for the remaining response times from the previous limitation)).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine these teachings by creating histograms for the response times in each tier using the methods from Martin along with the fault tolerance method from McColl and the straggler avoidance method from Ouyang in the federated learning environment from Smith. Having used the model progress tracking methods from McColl to help identify and remove the dropped nodes from Smith, it would have been obvious to use the methods from Martin to create a histogram of remaining response times to more accurately reflect the state of the federated learning system. Martin, Ouyang, McColl, and Smith are all directed to tracking response times for distributed machine learning systems, so one of ordinary skill would benefit from the ability to create histograms in order to visualize the response times and determine, for example, whether a parameter change in the global model has increased the number of stragglers and/or dropouts.
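For illustration only, the sequence of limitations addressed in this rejection (accumulating a response time RTi over nsyn synchronization epochs, capping a missing or late reply at Tmax, identifying as drop outs the participants for which RTi = nsyn * Tmax, removing their response times, and creating a histogram of the remaining response times) may be sketched as follows. This sketch reflects one possible reading of the claim language and is not taken from the application or from any cited reference; the function names, the input layout (one dictionary of reply times per epoch, with None for a missing reply), and the bin width are assumptions made solely for this example.

```python
import math
from collections import Counter


def profile_response_times(epoch_replies, n_syn, t_max):
    """Accumulate a response time RT_i per participant over n_syn epochs.

    epoch_replies: list of dicts mapping participant id -> reply time,
    where None (or an absent id) means no reply arrived (hypothetical layout).
    Returns (remaining response times, set of drop outs).
    """
    epochs = epoch_replies[:n_syn]
    participants = set()
    for replies in epochs:
        participants.update(replies)

    rt = {pid: 0.0 for pid in participants}
    for replies in epochs:
        for pid in participants:
            t = replies.get(pid)
            # A missing or late reply is charged the maximum time T_max.
            rt[pid] += t_max if t is None or t > t_max else t

    # A participant charged T_max in every epoch has RT_i = n_syn * T_max.
    dropouts = {pid for pid, total in rt.items()
                if math.isclose(total, n_syn * t_max)}
    remaining = {pid: total for pid, total in rt.items()
                 if pid not in dropouts}
    return remaining, dropouts


def response_time_histogram(remaining, bin_width):
    """Count the remaining response times per bin of width bin_width."""
    return Counter(int(total // bin_width) for total in remaining.values())


if __name__ == "__main__":
    epochs = [
        {"a": 1.0, "b": None, "c": 4.0},
        {"a": 2.0, "b": None, "c": None},
    ]
    remaining, dropouts = profile_response_times(epochs, n_syn=2, t_max=5.0)
    print(remaining, dropouts)
    print(response_time_histogram(remaining, bin_width=5.0))
```

In this example participant "b" never replies, so its accumulated response time equals nsyn * Tmax and it is excluded before the histogram is built, while participant "c" misses only one epoch and is retained.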

Regarding claim 10, the combination of Ouyang, Smith, McColl, and Martin teaches the method according to claim 9, wherein when the number of run epochs is greater than the number of synchronization epochs, the method further comprising:
creating, by the computing device (Ouyang pg. 73, the abstract recites “Current Cloud clusters often consist of heterogeneous machine nodes, which can trigger performance challenges such as the task straggler problem, whereby a small subset of parallel tasks running abnormally slower than the other sibling ones” and later the abstract recites “In this paper we develop ML-NA, a Machine Learning based Node performance Analyzer. By leveraging historical parallel tasks execution log data, ML-NA classifies cluster nodes into different categories and predicts their performance in the near future as a scheduling guide to improve speculation effectiveness and minimize task straggler generation. We consider MapReduce as a representative framework to perform our analysis, and use the published OpenCloud trace as a case study to train and to evaluate our model” (i.e. the methods of Ouyang are performed by a computing device)), a histogram of remaining response times (Martin col. 2, lines 54-64 recite “determining one of a plurality of l/O workload classifications for each of the plurality of data sets in accordance with said set of values of said each data set; and for each of the plurality of I/O workload classifications including more than one of the plurality of data sets, combining said more than one of the plurality of data sets into a first aggregate data set including an aggregate set of values in accordance with said set of values of each of said more than one data set and including an aggregate response time histogram in accordance with said response time histogram of each of said more than one data set” (i.e. a histogram of response times)); 
and dividing the histogram into the plurality of tiers including the plurality of federated learning participants (Martin col. 2, lines 46-54 recite that “the method may include collecting a plurality of data sets for a plurality of time periods, wherein each of the plurality of data sets is collected during one of the plurality of time periods and said each data set includes a set of values for a plurality of parameters characterizing I/O workload for said one time period and a response time histogram characterizing response time for said one time period” (i.e. the aggregated histogram is created from histograms that correspond to the plurality of tiers)).
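For illustration only, dividing the remaining response times into tiers having a predetermined number of participants per tier, and assigning an average reply time to each tier, may be sketched as follows. This is one possible reading of the claim language and is not drawn from Martin or any other cited reference; the function name, the dictionary layout, and the choice to fill tiers in order of increasing response time are assumptions made solely for this example.

```python
def divide_into_tiers(response_times, tier_size):
    """Group participants into tiers of tier_size by ascending response time.

    response_times: dict mapping participant id -> accumulated response time
    (hypothetical layout). The last tier may hold fewer participants.
    Returns a list of dicts with the tier members and their average reply time.
    """
    ordered = sorted(response_times, key=response_times.get)
    tiers = []
    for start in range(0, len(ordered), tier_size):
        members = ordered[start:start + tier_size]
        average = sum(response_times[pid] for pid in members) / len(members)
        tiers.append({"members": members, "avg_reply_time": average})
    return tiers


if __name__ == "__main__":
    times = {"a": 3.0, "b": 9.0, "c": 1.0, "d": 7.0}
    for tier in divide_into_tiers(times, tier_size=2):
        print(tier)
```

With a tier size of two, the two fastest participants form the first tier and the two slowest form the second, each tier carrying the mean of its members' response times.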
Regarding claim 11, the combination of Ouyang, Smith, McColl, and Martin teaches the method according to claim 9, further comprising:
updating, by the computing device (Ouyang pg. 73, the abstract recites “Current Cloud clusters often consist of heterogeneous machine nodes, which can trigger performance challenges such as the task straggler problem, whereby a small subset of parallel tasks running abnormally slower than the other sibling ones” and later the abstract recites “In this paper we develop ML-NA, a Machine Learning based Node performance Analyzer. By leveraging historical parallel tasks execution log data, ML-NA classifies cluster nodes into different categories and predicts their performance in the near future as a scheduling guide to improve speculation effectiveness and minimize task straggler generation. We consider MapReduce as a representative framework to perform our analysis, and use the published OpenCloud trace as a case study to train and to evaluate our model” (i.e. the methods of Ouyang are performed by a computing device)), a response time to Tmax for the federated learning participants from which responses were not received by an aggregator when the number of run epochs is less than a number of synchronization epochs (McColl pg. 14, lines 26-28 recite that “each sub-task can have access not only to its own local data and state, but also to other information including: a copy of the minimum duration, i.e. MinTime for the computation round; its elapsed time for the round” (i.e. each participant has its own response time). Pg. 14, lines 34-36 recite that “the computation performed by the distributed computing system 200 has a TailLimit T (i.e. Tmax). In an embodiment, any sub-task that fails to complete before T*MinTime is marked as a fault/tail, others can be marked as live” (i.e. the participants that did not respond are assigned the maximum response time)).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
US 9886325 B2 (Malewicz et al) teaches straggler identification and mitigation for a large-scale data processing model.
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to LEAH M FEITL whose telephone number is (571)272-8350. The examiner can normally be reached on M-F 0800-1700.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li B. Zhen, can be reached on (571) 272-3768. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
	/L.M.F./             Examiner, Art Unit 2121                                                 


	/Li B. Zhen/             Supervisory Patent Examiner, Art Unit 2121