DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This action is in response to the application filed August 9, 2018. Claims 1-20 are pending and have been examined.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-6 and 14-19 are rejected under 35 U.S.C. 103 as being unpatentable over Biswas (US 20190102695 A1) in view of Qiu ("CrowdSelect: Increasing Accuracy of Crowdsourcing Tasks through Behavior Prediction and User Selection") in further view of Kopp (US 20190311298 A1).

In regard to claims 1 and 14, Biswas teaches: A computer-implemented method for distributed machine learning comprising: (Biswas, [0042] "2. System Overview... In the example of FIG. 1, worker servers 110, a client computing device 120, and a master machine learning server computer 130 [distributed system with machine learning server] are communicatively coupled to a data communications network 100.")
generating a specification of a machine learning model at a model requester node that is an (Biswas, Fig. 2 and Fig. 3, [0058] "At step 202, a first server computer stores one or more machine learning training datasets, each of the datasets comprising input data and verified output data. The training datasets are sets of input and output data that are used to train a machine learning system [a specification of a machine learning model]."; [0080] "At step 310, the client computing device [a model requester node] sends an input dataset to master machine learning server computer 130."; [0045] "Worker servers 110 may be physical server computers and/or virtual server instances stored in a data center, such as through cloud computing [cloud computing environment]."; The client computing device generates the machine learning training datasets, i.e. a requestor generates a specification of a ML model, and provides the data to the master server.)
distributing the specification from the model requester node to a plurality of other (Biswas, [0081] "At steps 314, 316, and 318, master machine learning server computer 130 sends the configuration file and the training dataset to first worker server computer 302, second worker server computer 304, and third worker server computer 306 [distributing data to other nodes] respectively with first parameters, second parameters, and third parameters respectively.")
receiving replies to the specification from the plurality of other (Biswas, [0084] "At step 322, each of first worker server computer 302, second worker server computer 304, and third worker server computer 306 send output datasets to master machine learning server computer 130."; [0090] "At step 326, the master machine learning server computer sends the most accurate output dataset to client computing device 120."; client device receiving replies from other nodes through the master server.)

Biswas does not teach, but Qiu teaches: in response to the replies, the model requester node identifying a set of participating (Qiu, p. 542 "... a set of workers, denoted by U will be asked to complete tasks. The task requester needs to choose some of the workers from U [identifying a set of participating nodes] and coordinate the answers from these workers using some criteria (or fusion method), such as the common majority voting, to estimate the correct answer of the task."; "Question: How to find a set of workers St to maximize the accuracy of majority voting and the total cost of St does not exceed the budget B"; the answers from workers are the replies, and whether an answer is correct reflects the ability of the worker/node.)
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Biswas to incorporate the teachings of Qiu by including the CrowdSelect method. Doing so would provide a theoretically proven algorithm to assign workers to tasks in a cost efficient manner, while ensuring high accuracy of the overall task. (Qiu, Abstract "Our proposed approach, CrowdSelect, offers a theoretically proven algorithm to assign workers to tasks in a cost efficient manner, while ensuring high accuracy of the overall task.")

Biswas and Qiu do not teach, but Kopp teaches: … edge node(s)… ([0036] "The system includes devices 122 (also referred to as edge devices or worker devices 122). ")
training the machine learning model, without exchanging training data among the model requester node and the participating edge nodes, by repeatedly: (Kopp, [0018] "Embodiments described herein provide systems and methods for training a machine learned model on a large number of devices. Each device acquires a local set of training data without sharing data sets across devices. The devices train the model on the respective device's set of training data."; [0072] "The process repeats for a number of iteration until the parameters converge or a predetermined number of iteration is reached. This process may be repeated hundreds or thousands of times.")
distributing most recent parameters of the machine learning model to the participating edge nodes; (Kopp, [0041] "The parameter servers 125 are configured to receive locally trained model parameters [most recent parameters] from a device 122, adjust centrally stored model parameters, and transmit the adjusted centrally model parameters back to the device. The parameter server 125 communicates with each device 122 of the plurality of devices 122 that are assigned to the parameter server 125."; [0043] "The parameter server 125 stores a central parameter vector that the parameter server 125 updates each time a device (worker unit) sends a parameter vector to the parameter server 125."; [0067] "at act A120, the worker device 122 transmits a second parameter from the trained model to the parameter server 125."; Parameter server can communicate with each of the devices (i.e. participating nodes) to distribute central parameters adjusted with most recent parameters/locally trained/2nd parameter (received from any worker) to the participating nodes.)
receiving updates to the most recent parameters from the participating edge nodes; and (Kopp, [0041] "The parameter servers 125 are configured to receive locally trained model parameters from a device 122, adjust centrally stored model parameters, and transmit the adjusted centrally model parameters back to the device [receiving updates]."; [0068] "At act A130, the worker device 122 receives a third parameter from the parameter server 125 [pull from parameter server]."; the device/worker receives the updates/3rd parameter/central parameter updated from the participating nodes, i.e. replacing the 2nd parameter with the 3rd parameter.)
establishing new parameters for the machine learning model by aggregating the updates from the participating edge nodes. (Kopp, [0026] "During the processes, the parameter server is constantly updating the central set of parameters and transmitting the updated set to the worker that transmitted the local parameters. As workers collect new data, the local models may be trained on the new data or a combination of the new and old data. Over time, the transmitted parameters back and forth between the workers and the parameter server eventually settles on a final set of parameters. [e.g. establishing new parameters]"; [0041] "The parameter servers 125 may be configured to aggregate parameters from one or more models that are trained on the devices 122."; [0055] "The parameter server 125 aggregates the parameter vectors from each of the three devices [aggregating updates from nodes] and generates a central parameter vector [e.g. establishing new parameters].")
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Biswas and Qiu to incorporate the teachings of Kopp by including distributed processing with edge devices. Doing so would maintain privacy and transmission concerns. (Kopp, [0034] "Embodiments provide for distributed processing of data while maintaining privacy and transmission concerns. In an embodiment, all the data remains on the edge devices to satisfy privacy concerns.")
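
For illustration only, the repeated exchange mapped above (distributing the most recent parameters, receiving updates, and aggregating them) can be sketched as follows. This is a minimal sketch and is not taken from Kopp or from the claims: the names `train`, `average`, and `make_node` are hypothetical, element-wise averaging is an assumed aggregation rule, and each node's local update is a toy gradient step toward the mean of its own private data. No training data leaves a node; only parameters are exchanged.

```python
from typing import Callable, List, Sequence

Params = List[float]


def average(updates: Sequence[Params]) -> Params:
    """Aggregate updates by element-wise mean (an assumed aggregation rule)."""
    n = len(updates)
    return [sum(vals) / n for vals in zip(*updates)]


def train(initial: Params,
          local_steps: Sequence[Callable[[Params], Params]],
          rounds: int) -> Params:
    """Repeat: distribute parameters, collect node updates, aggregate."""
    params = list(initial)
    for _ in range(rounds):
        # Distribute the most recent parameters; each node updates a copy
        # using only the data it holds locally.
        updates = [step(list(params)) for step in local_steps]
        # Establish new parameters by aggregating the returned updates.
        params = average(updates)
    return params


def make_node(local_data: Sequence[float],
              lr: float = 0.5) -> Callable[[Params], Params]:
    """Hypothetical node: one gradient step toward the mean of its private data."""
    target = sum(local_data) / len(local_data)

    def step(params: Params) -> Params:
        return [p - lr * (p - target) for p in params]

    return step


# Three hypothetical nodes; their raw data never appears in `train`.
nodes = [make_node([1.0, 3.0]), make_node([5.0]), make_node([8.0, 10.0])]
final = train([0.0], nodes, rounds=30)
```

In this toy setting the aggregated parameter settles near the mean of the three nodes' local targets (2, 5, and 9).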

Claim 14 recites substantially the same limitations as claim 1; therefore, the rejection applied to claim 1 also applies to claim 14. In addition, Biswas teaches: A non-transitory computer readable medium embodying computer executable instructions that when executed by a computer cause the computer to facilitate a method of: (Biswas, [0173] "For example, FIG. 12 is a block diagram that illustrates a computer system 1200 upon which embodiments may be implemented... Computer system 1200 also includes a main memory 1206, such as a random-access memory (RAM) or other dynamic storage device, coupled to bus 1202 for storing information and instructions to be executed by processor 1204.")

In regard to claims 2 and 15, Biswas, Qiu and Kopp teach: The method of claim 1 wherein the specification includes at least one of input data format and number of output classes. (Biswas, [0058] "For example, a training dataset for classification problems may include a column of descriptions and a column of classifications. A dataset may include multiple inputs and/or multiple outputs. For example, a single set of input data may have a classification output and a normalization output. As another example, multiple columns of inputs [e.g. input data format] may be used for a single classification output [# of output classes]."; [0060] "In an embodiment, the first server computer stores a plurality of training datasets... For example, the first server computer may store a first training dataset for classification of text data and a second training dataset for classification of image data."; training data may include text data and image data, which are also examples of input data format.)

In regard to claims 3 and 16, Biswas, Qiu and Kopp teach: The method of claim 1 wherein the specification, the replies, and the parameters of the machine learning model are communicated via a broker node distinct from the model requester node and the other edge nodes. (Biswas, Fig. 1 and 3, [0042] "2. System Overview... In the example of FIG. 1, worker servers 110 [other nodes], a client computing device [a requestor] 120, and a master machine learning server computer 130 [a broker node] are communicatively coupled to a data communications network 100."; [0091] "The master machine learning server computer may store the parameter values, configuration files, and training datasets [specification] locally and/or on a separate server computer, such as a cloud server."; [0048] [0052]; [0084] receiving output datasets / replies from worker servers)

In regard to claims 4 and 17, Biswas, Qiu and Kopp teach: The method of claim 3 wherein the broker node is at a server of the cloud computing network. (Biswas, [0091] "The master machine learning server computer may store the parameter values, configuration files, and training datasets locally and/or on a separate server computer, such as a cloud server."; a master server can be a server of the cloud computing network.)

In regard to claims 5 and 18, Biswas, Qiu and Kopp teach: The method of claim 1 wherein training the machine learning model further includes generating results at the model requester node by updating the most recent parameters based on training data available at the model requester node, and (Kopp, [0071] "At act A140, the worker device 122 retrains the model using the local training data [locally available data] and the third parameter. The worker device 122 may use the same local training data or may update the training data with newly collected sensor data... Additional data may be added to the training data set as the data is collected."; [0072] "If new data is added to the training data, the device may retrain the model and request a new central parameter..."; a worker is retrained using local training data, i.e. generating results (by updating the most recent parameters / by replacing 2nd parameter with 3rd parameter) based on training data available at the requestor node.)
aggregating the updates includes combining the results from the model requester node with the updates from the participating edge nodes. (Kopp, [0041] "The parameter servers 125 may be configured to aggregate parameters from one or more models that are trained on the devices 122."; [0055] "The parameter server 125 aggregates the parameter vectors from each of the three devices and generates a central parameter vector."; [0068] "In an embodiment, the parameter server 125 stores a central parameter vector that the parameter server 125 updates each time a worker unit sends it a local parameter or local parameter vector."; one of the workers is the model requester, and the others are participating nodes.)
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Biswas and Qiu to incorporate the teachings of Kopp by including distributed processing with edge devices. Doing so would maintain privacy and transmission concerns. 

In regard to claims 6 and 19, Biswas, Qiu and Kopp teach: The method of claim 1 wherein each of the participating edge nodes updates the model parameters based only on training data available at that participating edge node. (Kopp, [0018] "Each device acquires a local set of training data without sharing data sets across devices. The devices train the model on the respective device's set of training data.")
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Biswas and Qiu to incorporate the teachings of Kopp by including distributed processing with edge devices. Doing so would maintain privacy and transmission concerns.

Claims 7-12 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Biswas in view of Qiu in view of Kopp in further view of Boutsis ("On task assignment for real-time reliable crowdsourcing").

In regard to claims 7 and 20, Biswas and Qiu teach: The method of claim 1 wherein identifying a set of participating edge nodes comprises: (Qiu, p. 542 "... a set of workers, denoted by U will be asked to complete tasks. The task requester needs to choose some of the workers from U [identifying a set of participating edge nodes] and coordinate the answers from these workers using some criteria (or fusion method), such as the common majority voting, to estimate the correct answer of the task."; "Question: How to find a set of workers St to maximize the accuracy of majority voting and the total cost of St does not exceed the budget B")
… estimating learning utility of each of the plurality of other edge nodes, (Qiu, p. 540 Table 1 Notations "w: The utility of worker i to task t"; p. 541 "Intuitively, the smaller the error rate is, the larger the utility will be.") based on comparison of the external updated parameters to the internal updated parameters; (Qiu, p. 540 "The worker error rate prediction provides an estimate of a worker’s expected behavior, according to his responses compared to the answers obtained for the same task from other workers and the answers of gold tasks (i.e., tasks for which the true answers is known to the task requester)."; p. 541 "Next, we provide a discussion about a form of utility which offers a theoretically optimal value of utility based on the error rate.")
requesting cost estimates from each of the plurality of other edge nodes; (Qiu, p. 540 Table 1 Notations "c: The cost of worker i for task t"; p. 542 "where each worker i has … cost c...")

identifying a lowest-value edge node from the plurality of other edge nodes, based on a smallest value of a ratio of learning utility to cost estimate for each of the plurality of other edge nodes; (Qiu, p. 543 "Considering the above two factors, we define the following metric for each worker i, which is proportional to w(2p - 1) and inversely proportional to his cost c (line 1-2): a = w(2p - 1) / c (Eq. 14). Here, a is a relative measure of worker i’s error rate to its cost... After that, we sort the workers by decreasing a (line 3)."; w is the learning utility, c is the cost estimate, and a is the ratio of learning utility to cost; the last worker in the descending order is the lowest-value node.)

reducing the plurality of other edge nodes by excluding the low… (Qiu, p. 543 "let the sorted worker sequence be 1, 2, ..., i, ..., N. Finally, we select workers one by one from the sorted sequence."; because workers are selected from the highest a to the lowest a, workers with low a are excluded.)

… generating the set of participating edge nodes from the plurality of other edge nodes by repeating steps of identifying and reducing until a total of the cost estimates from the plurality of other edge nodes is within a cost budget of the model requester node. (Qiu, p. 543 "Finally, we select workers one by one from the sorted sequence. More specifically, we use B' to represent the remaining budget, where B' is initiated by B [cost budget]. In each iteration, we choose item i from the head of the sorted array. If c < B', we select it and B' = B' - c"; St is the set of participating nodes/workers generated from this algorithm.)

It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Biswas to incorporate the teachings of Qiu by including the CrowdSelect method. Doing so would provide a theoretically proven algorithm to assign workers to tasks in a cost efficient manner, while ensuring high accuracy of the overall task.
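
For illustration only, the selection procedure quoted from Qiu above (compute the metric a = w(2p - 1) / c for each worker, sort by decreasing a, then select from the head of the list while the remaining budget B' allows) can be sketched as follows. This is a minimal sketch, not Qiu's published pseudocode: the names `Worker`, `metric`, and `select_workers` are hypothetical, p is assumed to be the worker's probability of answering correctly (the quoted passage does not define it), and continuing past a worker that does not fit the remaining budget is also an assumption. The strict comparison follows the quoted condition "c < B'".

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Worker:
    name: str
    w: float  # utility of the worker
    p: float  # assumed: probability that the worker answers correctly
    c: float  # cost of the worker


def metric(worker: Worker) -> float:
    """a = w(2p - 1) / c, as in the quoted Eq. (14)."""
    return worker.w * (2 * worker.p - 1) / worker.c


def select_workers(workers: List[Worker], budget: float) -> List[Worker]:
    """Greedy selection in order of decreasing metric, within the budget."""
    remaining = budget  # B' is initialised to B
    selected = []
    for worker in sorted(workers, key=metric, reverse=True):
        if worker.c < remaining:  # quoted condition: c < B'
            selected.append(worker)
            remaining -= worker.c  # B' = B' - c
    return selected


# Hypothetical pool; metrics are roughly A: 0.4, B: 0.6, C: 0.1, D: 0.2.
pool = [
    Worker("A", w=2.0, p=0.9, c=4.0),
    Worker("B", w=1.0, p=0.8, c=1.0),
    Worker("C", w=1.5, p=0.6, c=3.0),
    Worker("D", w=1.0, p=0.7, c=2.0),
]
chosen = select_workers(pool, budget=6.0)
```

With a budget of 6, workers B and A are selected in that order; D and C have the lowest metric values and no longer fit the remaining budget, so they are left out.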

Biswas and Qiu do not teach, but Kopp teaches:  generating seed parameters of the machine learning model by performing preliminary training of the machine learning model at the model requester node; (Kopp, [0058] "At act A110, a worker device 122 trains a model [preliminary training] using local training data and a first parameter. The worker device 122 includes a model and local training data."; [0067] "at act A120, the worker device 122 transmits a second parameter from the trained model to the parameter server 125. The second parameter [seed parameters] may be parameter vector that is generated as a result of training the model using the training data.")
obtaining external updated parameters by facilitating one-step updates to the seed parameters by each of the plurality of other edge nodes;  (Kopp, [0068] "At act A130, the worker device 122 receives a third parameter from the parameter server 125 [pull from parameter server]. In an embodiment, the parameter server 125 stores a central parameter vector [external updated parameters from other workers] that the parameter server 125 updates each time a worker unit sends it a local parameter or local parameter vector."; Pull the updated central parameter/3rd parameter, i.e. obtaining the external updated parameters. Pulling is one-step pulling / one-step updating. )
obtaining internal updated parameters by performing a one-step update of the seed parameters at the model requester node;  (Kopp, [0071] "At act A140, the worker device 122 retrains the model using the local training data and the third parameter." [retraining to obtain internal updated parameters]; [0072] "At act A150, the worker device 122 transmits the fourth parameter of the updated trained model [internal updated parameters] to the parameter server 125"; retrain the model using the updated central parameter, i.e. replacing the seed parameters/2nd parameter with the central parameter/3rd parameter (one-step update of seed parameter) to retrain the model)
aggregating the external and internal updated parameters at the model requester node; (Kopp, [0026] "During the processes, the parameter server is constantly updating the central set of parameters and transmitting the updated set to the worker that transmitted the local parameters. As workers collect new data, the local models may be trained on the new data or a combination of the new and old data. Over time, the transmitted parameters back and forth between the workers and the parameter server eventually settles on a final set of parameters. The final set of parameters and the model may then be used by the worker or other devices... "; the worker/requestor uses the final set of parameters aggregating from other workers (external updated parameters) and itself (internal updated parameters).)
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Biswas and Qiu to incorporate the teachings of Kopp by including distributed processing with edge devices. Doing so would maintain privacy and transmission concerns.


Biswas, Qiu and Kopp do not teach, but Boutsis teaches: reducing the plurality of other edge nodes by excluding the lowest-value edge node from the plurality of other edge nodes; and (Boutsis, p. 5 "We continue iterating through the workers list L... We investigate if we can swap that worker with any of the workers wg assigned to the group groupj and estimate if this swap can increase the objective function that refers to the reliability... but we only evaluate the distance of each individual wg in the group with the current worker wi to determine the best swap decision. After determining the 'best' swap, i.e. the swap that increases the group reliability more than the other possible swaps, while remaining in the feasible region, we choose to make the swap. Thus, we remove wg from the set and we add wi."; p. 6 "we swap the wi with another worker with the smallest reliability from groupj that provides a solution in the feasible region..."; the best swap (removing wg) corresponds to excluding the lowest-value node.)

It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Biswas, Qiu and Kopp to incorporate the teachings of Boutsis by including the worker swap step. Doing so would provide a local optimum in every iteration. (Boutsis, p. 6 "Also note, that this swap produces a local optimum in every iteration as we show in Lemma 4.2. If wi does not increase the objective probabilities, it means that he is either an unreliable worker or will probably miss the deadline, compared to our selected workers, and we do not consider him for taskj"; p. 6 "Thus, taking Lemma 4.1 into consideration, we ensure that for the set of workers groupj ∪ {wi}, there is no other feasible solution with amountj workers and highest P(votegj ) and thus, it is a local optimum")
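
For illustration only, the claimed steps of identifying the lowest-value edge node, excluding it, and repeating until the total cost estimate is within the cost budget can be sketched as follows. This is a minimal sketch of one possible reading of the claim language, not an implementation taken from Qiu, Boutsis, or the application: the name `prune_to_budget` and the sample values are hypothetical, and the learning utility and cost estimate of each node are assumed to be already known.

```python
from typing import Dict, List, Tuple


def prune_to_budget(nodes: Dict[str, Tuple[float, float]],
                    budget: float) -> List[str]:
    """Exclude lowest-value nodes until the total cost fits the budget.

    `nodes` maps a node name to (learning_utility, cost_estimate).
    """
    remaining = dict(nodes)
    while remaining and sum(cost for _, cost in remaining.values()) > budget:
        # Identify the lowest-value node: smallest utility-to-cost ratio.
        lowest = min(remaining,
                     key=lambda n: remaining[n][0] / remaining[n][1])
        # Reduce the candidate set by excluding that node.
        del remaining[lowest]
    return sorted(remaining)


# Hypothetical candidates: name -> (learning utility, cost estimate).
candidates = {
    "n1": (0.9, 3.0),  # ratio 0.30
    "n2": (0.2, 2.0),  # ratio 0.10
    "n3": (0.8, 1.0),  # ratio 0.80
    "n4": (0.5, 4.0),  # ratio 0.125
}
participants = prune_to_budget(candidates, budget=5.0)
```

Here the total cost starts at 10; n2 and then n4 are excluded, after which the remaining total of 4 is within the budget of 5 and the loop stops.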

In regard to claim 8, Biswas teaches: A computer-implemented method for distributed machine learning comprising: (Biswas, [0042] "2. System Overview... In the example of FIG. 1, worker servers 110, a client computing device 120, and a master machine learning server computer 130 [distributed system with machine learning server] are communicatively coupled to a data communications network 100.")

Biswas does not teach, but Kopp teaches: generating seed parameters of a machine learning model by performing preliminary training of the machine learning model at a model requester node that is an edge node of a network of cloud computing nodes; (Kopp, [0058] "At act A110, a worker device 122 trains a model [preliminary training] using local training data and a first parameter. The worker device 122 includes a model and local training data."; [0067] "at act A120, the worker device 122 transmits a second parameter from the trained model to the parameter server 125. The second parameter [seed parameters] may be parameter vector that is generated as a result of training the model using the training data."; [0036] "The system includes devices 122 (also referred to as edge devices or worker devices 122). ")
obtaining external updated parameters by facilitating one-step updates to the seed parameters by each of a plurality of other edge nodes; (Kopp, [0068] "At act A130, the worker device 122 receives a third parameter from the parameter server 125 [pull from parameter server]. In an embodiment, the parameter server 125 stores a central parameter vector [external updated parameters from other workers] that the parameter server 125 updates each time a worker unit sends it a local parameter or local parameter vector."; Pull the updated central parameter/3rd parameter, i.e. obtaining the external updated parameters. Pulling is one-step pulling / one-step updating. )
obtaining internal updated parameters by performing a one-step update of the seed parameters at the model requester node; (Kopp, [0071] "At act A140, the worker device 122 retrains the model using the local training data and the third parameter." [retraining to obtain internal updated parameters]; [0072] "At act A150, the worker device 122 transmits the fourth parameter of the updated trained model [internal updated parameters] to the parameter server 125"; retrain the model using the updated central parameter, i.e. replacing the seed parameters/2nd parameter with the central parameter/3rd parameter (one-step update of seed parameter) to retrain the model)
aggregating the external and internal updated parameters at the model requester node; (Kopp, [0026] "During the processes, the parameter server is constantly updating the central set of parameters and transmitting the updated set to the worker that transmitted the local parameters. As workers collect new data, the local models may be trained on the new data or a combination of the new and old data. Over time, the transmitted parameters back and forth between the workers and the parameter server eventually settles on a final set of parameters. The final set of parameters and the model may then be used by the worker or other devices... "; the worker/requestor uses the final set of parameters aggregating from other workers (external updated parameters) and itself (internal updated parameters).)
… training the machine learning model by repeatedly: (Kopp, [0072] "The process repeats for a number of iteration until the parameters converge or a predetermined number of iteration is reached. This process may be repeated hundreds or thousands of times.")
distributing most recent parameters of the machine learning model to the participating edge nodes; (Kopp, [0041] "The parameter servers 125 are configured to receive locally trained model parameters [most recent parameters] from a device 122, adjust centrally stored model parameters, and transmit the adjusted centrally model parameters back to the device. The parameter server 125 communicates with each device 122 of the plurality of devices 122 that are assigned to the parameter server 125."; [0043] "The parameter server 125 stores a central parameter vector that the parameter server 125 updates each time a device (worker unit) sends a parameter vector to the parameter server 125."; [0067] "at act A120, the worker device 122 transmits a second parameter from the trained model to the parameter server 125."; Parameter server can communicate with each of the devices (i.e. participating nodes) to distribute central parameters adjusted with most recent parameters/locally trained/2nd parameter (received from any worker) to the participating nodes.)
receiving updates to the most recent parameters from the participating edge nodes; and (Kopp, [0041] "The parameter servers 125 are configured to receive locally trained model parameters from a device 122, adjust centrally stored model parameters, and transmit the adjusted centrally model parameters back to the device [receiving updates]."; [0068] "At act A130, the worker device 122 receives a third parameter from the parameter server 125 [pull from parameter server]."; the device/worker receives the updates/3rd parameter/central parameter updated from the participating nodes, i.e. replacing the 2nd parameter with the 3rd parameter.)
establishing new parameters for the machine learning model by aggregating the updates from the participating edge nodes. (Kopp, [0026] "During the processes, the parameter server is constantly updating the central set of parameters and transmitting the updated set to the worker that transmitted the local parameters. As workers collect new data, the local models may be trained on the new data or a combination of the new and old data. Over time, the transmitted parameters back and forth between the workers and the parameter server eventually settles on a final set of parameters. [e.g. establishing new parameters]"; [0041] "The parameter servers 125 may be configured to aggregate parameters from one or more models that are trained on the devices 122."; [0055] "The parameter server 125 aggregates the parameter vectors from each of the three devices [aggregating updates from nodes] and generates a central parameter vector [e.g. establishing new parameters].")
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Biswas to incorporate the teachings of Kopp by including distributed processing with edge devices. Doing so would maintain privacy and transmission concerns.

Biswas and Kopp do not teach, but Qiu teaches: estimating learning utility of each of the plurality of other edge nodes, based on comparison of the external updated parameters to the internal updated parameters; (Qiu, p. 540 Table 1 Notations "w: The utility of worker i to task t"; p. 541 "Intuitively, the smaller the error rate is, the larger the utility will be.") (Qiu, p. 540 "The worker error rate prediction provides an estimate of a worker’s expected behavior, according to his responses compared to the answers obtained for the same task from other workers and the answers of gold tasks (i.e., tasks for which the true answers is known to the task requester)."; p. 541 "Next, we provide a discussion about a form of utility which offers a theoretically optimal value of utility based on the error rate.")
requesting cost estimates from each of the plurality of other edge nodes; (Qiu, p. 540 Table 1 Notations "c: The cost of worker i for task t"; p. 542 "where each worker i has … cost c...")
identifying a lowest-value edge node from the plurality of other edge nodes, based on a smallest value of a ratio of learning utility to cost estimate for each of the plurality of other edge nodes; (Qiu, p. 543 "Considering the above two factors, we define the following metric for each worker i, which is proportional to w (2p -1) and inversely proportional to his cost c (line 1-2) a = w(2p-1) / c (14) Here, a is a relative measure of worker i’s error rate to its cost... After that, we sort the workers by decreasing a (line 3)."; w is the learning utility, c is the cost estimate, a is the ratio of learning utility to cost, the last one in the descending order is the lowest-value node.)
reducing the plurality of other edge nodes by excluding the low… (Qiu, p. 543 "let the sorted worker sequence be 1, 2, ..., i, ..., N. Finally, we select workers one by one from the sorted sequence."; because workers are selected from the highest a to the lowest a, workers with low a values are excluded.)
…generating a set of participating edge nodes from the plurality of other edge nodes by repeating steps of identifying and reducing until a total of the cost estimates from the plurality of other edge nodes is within a cost budget of the model requester node; and (Qiu, p. 543 "Finally, we select workers one by one from the sorted sequence. More specifically, we use B' to represent the remaining budget, where B' is initiated by B [cost budget]. In each iteration, we choose item i from the head of the sorted array. If c < B', we select it and B' = B' - c"; St is the set of participating nodes/workers generated from this algorithm.)
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Biswas and Kopp to incorporate the teachings of Qiu by including the CrowdSelect method. Doing so would provide a theoretically proven algorithm to assign workers to tasks in a cost-efficient manner, while ensuring high accuracy of the overall task.
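For illustration only, the selection procedure quoted from Qiu, p. 543 (compute a = w(2p-1) / c, sort workers by decreasing a, then select from the head of the list while the remaining budget B' allows) can be sketched as follows. The class name Worker, the function name select_workers, and the sample values are hypothetical; the meaning of p is not stated in the quoted passages, so it is treated simply as the quantity appearing in equation (14).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Worker:
    name: str
    w: float  # utility of the worker (Qiu, Table 1)
    p: float  # quantity p from the quoted metric (meaning assumed, not quoted)
    c: float  # cost of the worker (Qiu, Table 1); assumed positive

    @property
    def a(self) -> float:
        # Metric quoted as equation (14): a = w(2p - 1) / c
        return self.w * (2 * self.p - 1) / self.c


def select_workers(workers: List[Worker], budget: float) -> List[Worker]:
    """Greedy selection by decreasing a, subject to the budget B."""
    remaining = budget  # B' is initialized to B
    selected: List[Worker] = []
    for worker in sorted(workers, key=lambda item: item.a, reverse=True):
        if worker.c < remaining:  # quoted condition: "If c < B'"
            selected.append(worker)
            remaining -= worker.c  # B' = B' - c
    return selected


# Hypothetical sample data: three workers and a budget of 3.5.
pool = [
    Worker("A", w=1.0, p=0.9, c=2.0),
    Worker("B", w=1.0, p=0.8, c=1.0),
    Worker("C", w=0.5, p=0.6, c=3.0),
]
chosen = select_workers(pool, budget=3.5)
```

In this sample, worker C has the smallest ratio a and is the one left out, which corresponds to the examiner's reading that the last worker in the descending order is the lowest-value node.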

Biswas, Kopp and Qiu do not teach, but Boutsis teaches: …reducing the plurality of other edge nodes by excluding the lowest-value edge node from the plurality of other edge nodes; and (Boutsis, p. 5 "We continue iterating through the workers list L... We investigate if we can swap that worker with any of the workers wg assigned to the group groupj and estimate if this swap can increase the objective function that refers to the reliability... but we only evaluate the distance of each individual wg in the group with the current worker wi to determine the best swap decision. After determining the 'best' swap, i.e. the swap that increases the group reliability more than the other possible swaps, while remaining in the feasible region, we choose to make the swap. Thus, we remove wg from the set and we add wi."; p. 6 "we swap the wi with another worker with the smallest reliability from groupj that provides a solution in the feasible region..."; the best swap (removing wg) corresponds to excluding the lowest-value node.)
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Biswas, Kopp and Qiu to incorporate the teachings of Boutsis by including swap procedures. Doing so would provide a local optimum in every iteration.
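For illustration only, the swap step quoted from Boutsis, p. 5 (evaluate swapping the current worker wi with each group member wg, and make the swap that increases group reliability the most while remaining in the feasible region) can be sketched as follows. The quoted passages do not define the objective function or the feasible region, so the sum of individual reliabilities and the caller-supplied is_feasible predicate are assumptions, as are all names and sample values.

```python
from typing import Callable, Dict, List


def best_swap(
    group: List[str],
    candidate: str,
    reliability: Dict[str, float],
    is_feasible: Callable[[List[str]], bool],
) -> List[str]:
    """Return the group after the best feasible swap, or unchanged if none helps.

    Assumptions: group reliability is the sum of member reliabilities, and
    the candidate is not already a member of the group.
    """
    best_group = list(group)
    best_score = sum(reliability[worker] for worker in group)
    for member in group:
        # Trial group: remove one member (wg) and add the candidate (wi).
        trial = [worker for worker in group if worker != member] + [candidate]
        score = sum(reliability[worker] for worker in trial)
        if score > best_score and is_feasible(trial):
            best_group, best_score = trial, score
    return best_group


# Hypothetical sample data: w2 is the least reliable member of the group.
scores = {"w1": 0.9, "w2": 0.4, "w3": 0.7, "w4": 0.8}
new_group = best_swap(["w1", "w2", "w3"], "w4", scores, lambda members: True)
```

With these sample values the swap removes w2, the member with the smallest reliability, which mirrors the examiner's mapping of the best swap to excluding the lowest-value node.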

In regard to claim 9, Biswas, Kopp, Qiu and Boutsis teach: The method of claim 8 wherein the seed parameters and the external parameters of the machine learning model are communicated via a broker node distinct from the model requester node and the other edge nodes. (Biswas, Fig. 1 and 3, [0042] "2. System Overview... In the example of FIG. 1, worker servers 110 [other nodes], a client computing device [a requestor] 120, and a master machine learning server computer 130 [a broker node] are communicatively coupled to a data communications network 100."; [0091] "The master machine learning server computer may store the parameter values, configuration files, and training datasets [specification] locally and/or on a separate server computer, such as a cloud server."; [0048] [0052]; [0084] receiving output datasets / replies from worker servers; parameter values include the seed parameters and the external parameters.)

In regard to claim 10, Biswas, Kopp, Qiu and Boutsis teach: The method of claim 9 wherein the broker node is at a server of the cloud computing network. (Biswas, [0091] "The master machine learning server computer may store the parameter values, configuration files, and training datasets locally and/or on a separate server computer, such as a cloud server."; a master server / broker server can be a server of the cloud computing network.)

In regard to claim 11, Biswas, Kopp, Qiu and Boutsis teach: The method of claim 8 wherein training the machine learning model further includes generating results at the model requester node by updating the most recent parameters based on training data available at the model requester node, and (Kopp, [0071] "At act A140, the worker device 122 retrains the model using the local training data [locally available data] and the third parameter. The worker device 122 may use the same local training data or may update the training data with newly collected sensor data... Additional data may be added to the training data set as the data is collected."; [0072] "If new data is added to the training data, the device may retrain the model and request a new central parameter..."; a worker is retrained using local training data, i.e. generating results (by updating the most recent parameters / by replacing 2nd parameter with 3rd parameter) based on training data available at the requestor node.)
aggregating the updates includes combining the results from the model requester node with the updates from the participating edge nodes. (Kopp, [0041] "The parameter servers 125 may be configured to aggregate parameters from one or more models that are trained on the devices 122."; [0055] "The parameter server 125 aggregates the parameter vectors from each of the three devices and generates a central parameter vector."; [0068] "In an embodiment, the parameter server 125 stores a central parameter vector that the parameter server 125 updates each time a worker unit sends it a local parameter or local parameter vector."; one of the workers is the model requester, and the others are participating nodes.)
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Biswas to incorporate the teachings of Kopp by including distributed processing with edge devices. Doing so would maintain privacy and address transmission concerns.

In regard to claim 12, Biswas, Kopp, Qiu and Boutsis teach: The method of claim 8 wherein each of the participating edge nodes updates the model parameters based only on training data available at that participating edge node. (Kopp, [0018] "Each device acquires a local set of training data without sharing data sets across devices. The devices train the model on the respective device's set of training data.")
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Biswas to incorporate the teachings of Kopp by including distributed processing with edge devices. Doing so would maintain privacy and address transmission concerns.

Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over Biswas in view of Qiu in view of Kopp in view of Boutsis in further view of Eden (US 20150356488 A1).
In regard to claim 13, Biswas, Qiu, Kopp and Boutsis do not teach, but Eden teaches: The method of claim 8 wherein identifying the lowest-value edge node incorporates a measure of truthfulness determined by excluding from the participating edge nodes at least one edge node with relatively high learning utility, then comparing the training accuracy with and without the at least one edge node over time. (Eden, Abstract "A crowdsourcing environment is described herein which uses a single-stage or multi-stage approach to evaluate the quality of work performed by a worker [comparing/evaluating the accuracy/quality of the node/worker], with respect to an identified task."; [0063] "The reputation evaluation module 304 generates a reputation score, which reflects the propensity of the worker to perform desirable (e.g., accurate) [accuracy] work for the task (or task type) under consideration."; [0122] "In one implementation, the training system 126 can generate the reputation evaluation model 308 [a measure of truthfulness] (of FIG. 3) in a manner which parallels the two-stage processing described above. More particularly, the training system 126 can first remove training examples from the training set which correspond to the work perform by spam agents [without/excluding spam nodes], to produce a spam-removed training set. The training system 126 can then train the reputation evaluation model 308 based on the spam-removed training set. For a single-stage model, the training system 126 can dispense with the preliminary step of removing examples associated with spam agents [with spam nodes]."; [0091] "the analysis engine 114 may perform its analysis on a non-real-time basis, e.g., on a periodic basis."; [0074] "... over the course of the current day, etc... who answers a large number of tasks in a short period of time [over time] (relative to some specified norm), may correspond to a low-quality worker or a spam agent, justifying a low reputation score and a high spam score"; spec. [0064] nodes with high learning utility can include harmful training data, i.e. spam data.)
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Biswas, Qiu, Kopp and Boutsis to incorporate the teachings of Eden by including a multi-stage approach to evaluate the quality of work performed by a worker. Doing so would allow the system to distinguish low-quality workers from high-quality workers. (Eden, [0003] "Among other drawbacks, the presence of low-quality work can quickly deplete the allocated financial resources of a task owner"; [0004] "a crowdsourcing environment is described herein which uses a multi-stage approach to evaluate the quality of work performed by a worker, with respect to an identified task.")

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SU-TING CHUANG whose telephone number is (408)918-7519.  The examiner can normally be reached on Monday - Thursday 8-5 PT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki can be reached on (571)272-3719.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


/S.C./Examiner, Art Unit 2122


/KAKALI CHAKI/Supervisory Patent Examiner, Art Unit 2122