DETAILED ACTION
This action is responsive to Remarks and Claim amendments filed on September 23, 2022.
Claims 1-4 and 6-20 have been amended. Claim 5 has been canceled.
Claims 1-4 and 6-20 are pending and are presented for examination.
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Examiner Notes
Examiner cites particular columns, paragraphs, figures and line numbers in the references as applied to the claims below for the convenience of the applicant. Although the specified citations are representative of the teachings in the art and are applied to the specific limitations within the individual claim, other passages and figures may apply as well. It is respectfully requested that, in preparing responses, the applicant fully consider the references in their entirety as potentially teaching all or part of the claimed invention, as well as the context of the passage as taught by the prior art or disclosed by the examiner.

Response to amendments
The objection to the disclosure (Abstract) is withdrawn in view of applicant’s amendments.

Response to Arguments
Applicants have argued that Ma, along with the remaining art of record, does not teach the newly added limitations of independent claims 1, 15, 17 and 19 (Remarks, pages 11-13). Applicants' arguments have been fully considered and are persuasive. Therefore, the rejection is withdrawn. However, upon further consideration, a new ground of rejection is made as set forth in detail below. See Wesolowski et al. (US Pub. No. 2019/0114537), which is made of record and applied herein.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1-3, 6-9 and 11-20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Wesolowski et al. (US Pub. No. 2019/0114537 – hereinafter Wesolowski).
With respect to claim 1 (currently amended), Wesolowski teaches an apparatus, comprising processing circuitry (see figures 8 and 12 (and related text)) configured to:
obtain first trained parameters for a model, wherein the first trained parameters have been generated by training the model using data from a first data cohort (see paragraphs [0020], [0028], [0035], [0062], [0066], [0069], various hyper-parameters, or ML-training execution parameters (e.g., execution-settings for training a machine learning model that may be initialized prior to commencing the training of the machine learning model), that affect consistency of processing results (or intermediate processing results) across different computer architectures may be adjusted to assure consistency of training results (e.g., results having a similar accuracy within a predefined percentage value as if training were continued uninterrupted on a single system). The scheduler machine may also note the computer architecture of a first system (e.g., the number of computing machines or CPUs or GPUs) on which an ML model is being trained, and when transferring the training processing of the ML model to a second system having a computer architecture different than the first system, the scheduler machine may (automatically) adjust hyper-parameters for execution on the second system to assure consistency of training results with the first system. See paragraph [0056], computer processing of a trained (Sparse NN) ML model may be split between at least one local machine and at least one remote machine, over a computer network. The local machine, which may be a local ranking machine (e.g., a Facebook server), may be characterized by a computer architecture that emphasizes computational power over memory availability.
The remote machine (e.g., another Facebook server), which may be a back-end service such as remote predictor (or a parameter server), may be characterized by a computer architecture that emphasizes memory storage capacity over computational power. In addition to differences in computational resources, the local machine and the remote machine may have access to different data sets (e.g., the local machine may have access to (e.g., receive as input) user features and the remote machine may have access to (e.g., store) trained embedding matrices). Output results of the remote machine may then be sent to the local machine, where they may be merged with outputs from the local machine according to the trained (Sparse NN) ML model).
obtain second trained parameters for the model, wherein the second trained parameters have been generated by training the model on another apparatus using data from a second, different data cohort that is available to the other apparatus but is not available to the apparatus (see paragraphs [0020], [0028], [0035], [0062], [0066], [0069], various hyper-parameters, or ML-training execution parameters (e.g., execution-settings for training a machine learning model that may be initialized prior to commencing the training of the machine learning model), that affect consistency of processing results (or intermediate processing results) across different computer architectures may be adjusted to assure consistency of training results (e.g., results having a similar accuracy within a predefined percentage value as if training were continued uninterrupted on a single system).
The scheduler machine may also note the computer architecture of a first system (e.g., the number of computing machines or CPUs or GPUs) on which an ML model is being trained, and when transferring the training processing of the ML model to a second system having a computer architecture different than the first system, the scheduler machine may (automatically) adjust hyper-parameters for execution on the second system to assure consistency of training results with the first system. See paragraph [0056], computer processing of a trained (Sparse NN) ML model may be split between at least one local machine and at least one remote machine, over a computer network. The local machine, which may be a local ranking machine (e.g., a Facebook server), may be characterized by a computer architecture that emphasizes computational power over memory availability. The remote machine (e.g., another Facebook server), which may be a back-end service such as remote predictor (or a parameter server), may be characterized by a computer architecture that emphasizes memory storage capacity over computational power. In addition to differences in computational resources, the local machine and the remote machine may have access to different data sets (e.g., the local machine may have access to (e.g., receive as input) user features and the remote machine may have access to (e.g., store) trained embedding matrices). Output results of the remote machine may then be sent to the local machine, where they may be merged with outputs from the local machine according to the trained (Sparse NN) ML model).   	
determine a first evaluation value by inputting data from the first data cohort into a model having the first trained parameters, wherein the first evaluation value is representative of a performance of the model having the first trained parameters when applied to data from the first data cohort (see the rejection above, and paragraph [0026] and claim 2, the scheduler machine may distribute execution of a single machine learning model across multiple different computing machines, so that each computing machine trains a different portion (e.g., graph-segment) of the ML model and the different computing machines exchange processing data, as needed. In this case, the scheduler machine may monitor the performance of each computing machine, and if necessary, transfer execution of a portion of the machine learning model from one machine to a faster or slower machine, as necessary, to maintain optimal timing between the transferring of processing data between the machines (e.g., to minimize wait time by one machine waiting for another machine to reach a point where a check-point may be created or to complete transferring of processing data. See paragraphs [0062], [0068]-[0069], [0071]-[0072], [0074], it is noted that the transferring of training a neural network model from one machine (or training group) to another is not a straight forward matter. Firstly, the training of different neural networks for different tasks is not the same. Some neural networks may be trained to discern attributes (e.g., patterns, relations, similarities, etc.) of text data, others of video data, others of audio data, others of image data, others of metadata, etc. Some neural network models may be short, and some may be long, some may have a short latency (e.g., short training time) and others may have a long latency (e.g., long training time). Some neural network model may require faster machines, or more memory, and each may generally require a different profile machine. 
Additionally, as is explained above, machine learning is execution aware, meaning that if training of a neural network ML model is blindly transferred from a first machine to a second machine, and the second machine does not match the first machine in terms of computing power and characteristics, then correctness problems may arise. As an example, when moving between GPUs and CPUs, it may take many CPUs to get the same computing capability as a single GPU. To illustrate, it may take 8 CPUs to get the same throughput as a single GPU, which means that in order to get the same throughput when transferring training from a GPU to a set of CPUs, 8 CPUs may be needed. A GPU may run optimally with a mini-batch size of 32 while a CPU chip may run optimally with a mini-batch size of 16. With 8 CPUs, the aggregate mini-batch size would therefore be 128 (i.e., 8×16). The difference in the mini-batch size could lead to correctness problems, and as such algorithmic adjustments may be needed to yield the same accuracy. 
To avoid such problems, in particular embodiments master ML control system 21 may introduce check-point handshaking, wherein it compares the configuration of a first machine wherein a check-point is generated to the configuration of a target machine (or training group) to where a neural network model (or graph-segment of the neural network ML model) is to be transferred, identifies hyper-parameters of the training technique (e.g., distributed SGD, or other gradient descent-based technique) being used to train the neural network on the first machine, and adjusts at least part of these hyper-parameters in accordance with (hardware or performance) characteristics (e.g., type) of the target machine (or training group) so that the target machine (or training group) produces training results (e.g., weights or parameters) similar to, e.g., within a predefined percentage range, of training results achievable by the first machine if its training of the neural network ML model had not been interrupted (e.g., with its original hyper-parameter settings). 
For example, if the first machine were a GPU-based machine (e.g., having 8 GPUs) operating at a high processing speed, and the target machine (or training group) were CPU-based, then the master ML control system 21 may set a hyper-parameter for the target machine to define a larger batch size in order to obtain a similar speed as the first machine); and
determine a second evaluation value by inputting data from the first data cohort into a model having the second trained parameters, wherein the second evaluation value is representative of a performance of the model having the second trained parameters when applied to data from the first data cohort (see the rejection above, and paragraph [0026] and claim 2, the scheduler machine may distribute execution of a single machine learning model across multiple different computing machines, so that each computing machine trains a different portion (e.g., graph-segment) of the ML model and the different computing machines exchange processing data, as needed. In this case, the scheduler machine may monitor the performance of each computing machine, and if necessary, transfer execution of a portion of the machine learning model from one machine to a faster or slower machine, as necessary, to maintain optimal timing between the transferring of processing data between the machines (e.g., to minimize wait time by one machine waiting for another machine to reach a point where a check-point may be created or to complete transferring of processing data. See paragraphs [0062], [0068]-[0069], [0071]-[0072], [0074], it is noted that the transferring of training a neural network model from one machine (or training group) to another is not a straight forward matter. Firstly, the training of different neural networks for different tasks is not the same. Some neural networks may be trained to discern attributes (e.g., patterns, relations, similarities, etc.)
of text data, others of video data, others of audio data, others of image data, others of metadata, etc. Some neural network models may be short, and some may be long, some may have a short latency (e.g., short training time) and others may have a long latency (e.g., long training time). Some neural network model may require faster machines, or more memory, and each may generally require a different profile machine. Additionally, as is explained above, machine learning is execution aware, meaning that if training of a neural network ML model is blindly transferred from a first machine to a second machine, and the second machine does not match the first machine in terms of computing power and characteristics, then correctness problems may arise. As an example, when moving between GPUs and CPUs, it may take many CPUs to get the same computing capability as a single GPU. To illustrate, it may take 8 CPUs to get the same throughput as a single GPU, which means that in order to get the same throughput when transferring training from a GPU to a set of CPUs, 8 CPUs may be needed. A GPU may run optimally with a mini-batch size of 32 while a CPU chip may run optimally with a mini-batch size of 16. With 8 CPUs, the aggregate mini-batch size would therefore be 128 (i.e., 8×16). The difference in the mini-batch size could lead to correctness problems, and as such algorithmic adjustments may be needed to yield the same accuracy. 
To avoid such problems, in particular embodiments master ML control system 21 may introduce check-point handshaking, wherein it compares the configuration of a first machine wherein a check-point is generated to the configuration of a target machine (or training group) to where a neural network model (or graph-segment of the neural network ML model) is to be transferred, identifies hyper-parameters of the training technique (e.g., distributed SGD, or other gradient descent-based technique) being used to train the neural network on the first machine, and adjusts at least part of these hyper-parameters in accordance with (hardware or performance) characteristics (e.g., type) of the target machine (or training group) so that the target machine (or training group) produces training results (e.g., weights or parameters) similar to, e.g., within a predefined percentage range, of training results achievable by the first machine if its training of the neural network ML model had not been interrupted (e.g., with its original hyper-parameter settings). For example, if the first machine were a GPU-based machine (e.g., having 8 GPUs) operating at a high processing speed, and the target machine (or training group) were CPU-based, then the master ML control system 21 may set a hyper-parameter for the target machine to define a larger batch size in order to obtain a similar speed as the first machine).
With respect to claim 2 (currently amended), Wesolowski teaches wherein the processing circuitry is further configured to obtain an updated model by combining the first trained parameters and the second trained parameters (see figure 6 and paragraphs [0058], [0062], the operation nodal model 70 of FIG. 5 divided into multiple graph-segments (91 to 94). Optionally, the graph-segments may be configured to be sufficiently self-contained so that each may be processed (executed) independent of each other, as much as practical.
Individual graph-segments may be distributed (designated) for execution to specific machines (e.g., a computing system including multiple computing machines) that have the appropriate resources (e.g., high computational resources or high data storage resources or high memory bandwidth) for executing the individual graph-segments. For example, compute intensive graph-segments may be designated for processing within a first machine (as indicated by an “M1” node designation in FIG. 6), and memory-intensive graph-segments may be designated for processing on a second machine (as indicated by an “M2” node designation). Irrespective, the output results of executing graph segments on the first machine or second machine may be merged into a reconstruction of the graph representation of the original ML model 70, and a final result may be determined).
  	With respect to claim 3 (currently amended), Wesolowski teaches wherein the combination of the first trained parameters and second trained parameters is weighted in accordance with the first evaluation value and the second evaluation value (see paragraphs [0033]-[0034], [0037]-[0039], [0047], [0050] and figure 2, a simplified neural network consisting of an input layer InL′, a hidden layer HL1′, and an output layer OutL′. Input layer InL′ is shown having two input nodes i1 and i2 that respectively receive inputs Input_1 and Input_2 (e.g., the input nodes of layer InL′ receive an input vector of two dimensions). The input layer InL′ feeds forward to one hidden layer HL1′ having two nodes h1 and h2, which in turn are fed forward to an output layer OutL′ of two nodes o1 and o2. Interconnections, or links, between neurons (illustrative shown as solid arrows) may have weights (e.g., w1 to w8) associated with them. Typically except for the input layer, a node (neuron) may receive as input the outputs of nodes in its immediately preceding layer. Each node may calculate its output by, e.g., multiplying each of its inputs by each input's corresponding interconnection weight, summing the products of it inputs, adding (or multiplying by) a constant defined by another weight or bias that may be associated with that particular node (e.g., node weights w9, w10, w11, w12 respectively corresponding to nodes h1, h2, o1, and o2), and applying a function (e.g., non-linear or logarithmic) to the result. The non-linear function may be termed an activation function or transfer function. Multiple activation functions are known in the art, and selection of a specific activation function is not critical to the present discussion. It is noted, however, that operation of the ML model, or behavior of the neural net, is dependent upon weight values, which may be learned so that the neural network provides a desired output for a given input. 
Furthermore, see figure 5 and paragraphs [0057], [0062], [0068], [0070]).
With respect to claim 6 (currently amended), Wesolowski teaches wherein the performance represented by each evaluation value comprises an accuracy of the model as compared to random chance (see paragraph [0034], during a training, or learning, stage, the neural net learns (e.g., is trained to determine) appropriate weight values to achieve a desired output for a given input. Before the neural net is trained, the weights may be individually assigned an initial (e.g., random and optionally non-zero) value. See paragraph [0037], the entire training dataset may be randomized and submitted to the ML model, but the master parameter set is not updated until the entire training set has been processed. At this point, an aggregate of the weight updates produced by the entire training set may be used to update the master parameters set. This may constitute one epoch in the sequence of training cycles. The entire training set may then be re-randomized and re-submitted to the ML model of another epoch (training) cycle. After each epoch cycle, gradient descent of the weights in the master parameters set may move toward their final values. Given the large size of typical training datasets, however, the feasibility of using batch SGD may often be limited by available memory size. As another example, in basic SGD, the entire training set may be randomized and individual input-output training pairs from the training set may be submitted to the ML model, one-by-one, for training).
With respect to claim 7 (currently amended), Wesolowski teaches wherein the first data cohort is held by an entity that is local to the apparatus and/or wherein the first data cohort consists of data acquired at an entity that is local to the apparatus (see paragraph [0035], local parameters.
See paragraph [0056], computer processing of a trained (Sparse NN) ML model may be split between at least one local machine and at least one remote machine, over a computer network. The local machine, which may be a local ranking machine (e.g., a Facebook server), may be characterized by a computer architecture that emphasizes computational power over memory availability. The remote machine (e.g., another Facebook server), which may be a back-end service such as remote predictor (or a parameter server), may be characterized by a computer architecture that emphasizes memory storage capacity over computational power).
With respect to claim 8 (currently amended), Wesolowski teaches wherein the second data cohort is held by a further entity that is remote from the apparatus and/or wherein the second data cohort consists of data acquired at a further entity that is remote from the apparatus (see paragraph [0056], computer processing of a trained (Sparse NN) ML model may be split between at least one local machine and at least one remote machine, over a computer network. The local machine, which may be a local ranking machine (e.g., a Facebook server), may be characterized by a computer architecture that emphasizes computational power over memory availability. The remote machine (e.g., another Facebook server), which may be a back-end service such as remote predictor (or a parameter server), may be characterized by a computer architecture that emphasizes memory storage capacity over computational power).
With respect to claim 9 (currently amended), Wesolowski teaches wherein the entity that is local to the apparatus comprises or forms part of at least one of an institution, a hospital, a university, a company (see paragraph [0077], entity (e.g. enterprise)).
With respect to claim 11 (currently amended), Wesolowski teaches wherein the processing circuitry is further configured to: obtain at least one further set of trained parameters obtained by training the model on at least one further data cohort; and determine a respective evaluation value for each further set of trained parameters; wherein the obtaining of the updated model comprises combining the first trained parameters, the second trained parameters, and the at least one further set of trained parameters, and wherein the combination is weighted in accordance with the evaluation values (Examiner notes: the claim language merely introduces an additional data cohort, which is also combined to generate a weighting of the parameters. This concept is taught by Wesolowski (see the rejection of claims 1 and 3)).
With respect to claim 12 (currently amended), Wesolowski teaches wherein the apparatus is configured to repeat a training cycle iteratively, and the training cycle comprises: transmitting the updated model to at least one further apparatus; training the updated model to obtain further trained parameters; testing the further trained parameters and further trained parameters from the at least one further apparatus; and obtaining a further updated model by combining the further trained parameters (see paragraphs [0036], [0038], [0075], updating the weights based on how much effect each weight has on the overall error so that the output of the neural network moves closer to the desired training output. This cycle may then be repeated until the actual output of the neural network is within an acceptable error range of the desired training output. Furthermore, see paragraph [0062], tuning the parameters in order to be tested on a different machine).
With respect to claim 13 (currently amended), Wesolowski teaches wherein the model comprises a neural network, optionally a convolutional neural network (see paragraphs [0001], [0005], [0019], [0038], [0060] and figures 1 and 9, neural network model).
With respect to claim 14 (currently amended), Wesolowski teaches wherein the apparatus is further configured to use the updated model to process a set of data, wherein the set of data was not used in training the model (see paragraph [0055], after the ML model has been trained, the trained ML model may be submitted to a service server (e.g., a CPU-based machine or system used for day-to-day operational servicing of client users). Typically service servers may be of different types, which may refer to their specific generation, or primary use, or configuration, which may be characterized by different resource-emphases, such as computing capacity level or memory size. Service servers may be assigned tasks (running an ML model) based on their resources. Service servers may support different (client) products and services, such as ads, newsfeeds, searches, etc., and may also support internal services, such as database management. For example, a service server may be a ranking machine (e.g., a Facebook server), and execution of a trained ML model may identify candidate item(s) (among multiple available candidate items) that may be of most interest to a user. In particular embodiments, each trained ML model may consider a user input (or request) and one (or more) available candidate items as an information pair (more specifically, as a user/request-and-candidate item pair), and provide a prediction value (e.g., probability value) for this particular pair based on the ML model, which may then be compared with prediction values of other pairs to identify the optimal pair(s), which may be those having the highest prediction values.
Therefore multiple instances of the trained ML model may be executed to consider multiple user/request-and-candidate item pairs to consider multiple candidate items, or to consider multiple candidate items for multiple different users. Additionally as explained above, some inputs may need embedding, and although the embeddings will have already been defined (e.g., embedding matrices will have been trained), large memory capacities may be needed to store the trained embedding matrices. As is also explained above, the user features (e.g., dense inputs) may be large and require high data processing capacity. Thus, assignment of a specific trained ML model to a specific service server type may be dependent upon how well the configuration of a service server type meets the processing requirements of the specific ML model).
With respect to claim 15, the claim is directed to a method that corresponds to the apparatus recited in claim 1 (see the rejection of claim 1 above).
With respect to claim 16 (currently amended), Wesolowski teaches an apparatus comprising processing circuitry configured to: receive a data set for processing; and automatically process the data set to obtain a desired output, wherein the processing of the data set comprises using a model trained in accordance with the method of claim 15 (see paragraphs [0034]-[0035], [0056]).
With respect to claims 17-18, the claims are directed to a system that corresponds to the apparatus recited in claims 1-3. Examiner notes: claim 17 recites a broader version of claim 1 and is rejected under the same rationale as claim 1.
With respect to claim 19, the claim is directed to a method that corresponds to the system recited in claim 17 (see the rejection of claim 17 above).
With respect to claim 20, the claim is directed to a method that corresponds to the apparatus recited in claims 2-3 (see the rejection of claims 2-3 above).
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

  	The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Wesolowski et al. (US Pub. No. 2019/0114537) in view of Flunkert et al. (US Pat. No. 10,936,947 – hereinafter Flunkert).
With respect to claim 4 (currently amended), Wesolowski does not explicitly disclose wherein the processing circuitry is further configured to determine a degree of similarity between the first data cohort and the second data cohort based on the determined first evaluation value and second evaluation value.
However, in an analogous art, Flunkert teaches wherein the processing circuitry is further configured to determine a degree of similarity between the first data cohort and the second data cohort based on the determined first evaluation value and second evaluation value (see column 3 line 46 - column 4 line 12, if a sufficiently large training data set is used, the model may be general enough to be able to make predictions regarding demands for an item for which no (or very few) demand observations were available for training. For example, if information regarding the similarity of a new item I_new to some set of other items {I_old} along one or more dimensions such as item/product category, price, etc. is provided, where demand observations for {I_old} items were used to train the model while demand observations for I_new were not used to train the model, the model may still be able to provide useful predictions for I_new demand. See column 8 lines 13-20, forecasts 180 may be generated in at least some embodiments even for items which were not represented in the training data set, or for which very few observations were included in the training data—e.g., input comprising an indication of a similarity between a new item and a set of items represented in the training data may be sufficient to generate a probabilistic forecast for demand for the new item.
See column 16 line 46 – column 17 line 2, the forecaster may receive a request for a probabilistic demand forecast for one or more target items (some of which might not have had demand time series used for training the RNN model, or may have much shorter time series available than were used for training the RNN model) (element 716). The request may include, for example, information which can be used to determine the similarity of a target item to various subsets of the items whose demand observations were used for training. For example, in some embodiments, the model may be able to determine similarities between items based on item prices or price ranges, a product category associated with the item, item introduction timing information (e.g., when the item was added to an inventory represented by the input demand time series, which may also be referred to as the “age” of the item), or marketing information associated with the item (such as promotion periods, price reductions, etc.). Such information may be used to determine input features to be provided as input to the trained model for one or more executions in at least some embodiments. The request may also include a time series of demand observations for the target item itself, which may in some cases be shorter (i.e., have fewer elements than) than the time series for various other items used as part of the training set for the RNN model).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Wesolowski’s teaching, which sets forth a method that includes establishing access to first and second different computing systems. A machine learning model is assigned for training to the first computing system, and the first computing system creates a check-point during training in response to a first predefined triggering event. The check-point may be a record of an execution state in the training of the machine learning model by the first computing system. In response to a second predefined triggering event, the training of the machine learning model on the first computing system is halted, and in response to a third predefined triggering event, the training of the machine learning model is transferred to the second computing system, which continues training the machine learning model starting from the execution state recorded by the check-point, by determining a degree of similarity between the models as suggested by Flunkert, as Flunkert would be able to generate a probabilistic forecast for demand.
Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Wesolowski et al. (US Pub. No. 2019/0114537) in view of Micah J. Sheller et al. (Multi-Institutional Deep Learning Modeling Without Sharing Patient Data: A Feasibility Study on Brain Tumor Segmentation – hereinafter Sheller – IDS 09/27/2019).
With respect to claim 10 (currently amended), Wesolowski does not disclose wherein data restrictions are such that the data from the second data cohort is not permitted to be provided to the entity that is local to the apparatus and/or the apparatus is not capable of receiving data from the second data cohort.
However, in an analogous art, Sheller teaches wherein data restrictions are such that the data from the second data cohort is not permitted to be provided to the entity that is local to the apparatus and/or the apparatus is not capable of receiving data from the second data cohort (see page 3, second and third paragraphs “Differential Privacy”, noise can be added before sending an update, to obscure the presence of any collection of samples in the institution’s dataset, and an accounting can be made as to the likelihood that such a determination can be made from the resulting model. The model is then said to have a degree of ‘differential privacy’).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Wesolowski’s teaching, which sets forth a method that includes establishing access to first and second different computing systems. A machine learning model is assigned for training to the first computing system, and the first computing system creates a check-point during training in response to a first predefined triggering event. The check-point may be a record of an execution state in the training of the machine learning model by the first computing system. In response to a second predefined triggering event, the training of the machine learning model on the first computing system is halted, and in response to a third predefined triggering event, the training of the machine learning model is transferred to the second computing system, which continues training the machine learning model starting from the execution state recorded by the check-point, by applying data restrictions between the models as suggested by Sheller, as Sheller would provide differential privacy among the models and collected data.

Conclusion
Applicant’s amendments necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
 Any inquiry concerning this communication should be directed to examiner Anibal Rivera, whose telephone/fax numbers are (571) 270 1200 and (571) 270 2200, respectively. The examiner can normally be reached Monday-Friday from 8:30AM to 4:00PM.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Hyung S. Sough, can be reached at (571) 272 6799.
Any inquiry of a general nature or relating to the status of this application or proceeding should be directed to the TC 2100 Group receptionist whose telephone number is (571) 272 2100.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).

/ANIBAL RIVERA/Primary Examiner, Art Unit 2192