Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Detailed Action

1.	The Examiner acknowledges the applicant’s amendment filed October 12, 2020.  At this point claims 1-20 are pending in the instant application and ready for examination by the Examiner.


2.	A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on October 12, 2020 has been entered.


Claim Rejections - 35 USC § 103
3.	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
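A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.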

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

Claim 19 is rejected under 35 U.S.C. 103 as being unpatentable over Gibiansky in view of Fleisman and further in view of Markert (U.S. Patent Publication 20160110657, referred to as Gibiansky; U.S. Patent Publication 20160335790, referred to as Fleisman; U.S. Patent Publication 20140309754, referred to as Markert).

Claim 19
Gibiansky discloses a system, comprising: use input data for a set of local versions of a machine learning model for a set of entities (Gibiansky, 0096; Use input data for a set of local versions of a machine learning model for a set of entities of applicant maps to ‘The data management unit 260 includes logic executable by the processor 202 to manage the data used to perform the features and functionality herein, which may vary based on the implementation. For example, in one implementation, the data management unit 260 may manage chunking of one or more of input data (e.g. training data that is too large for a single selection and optimization server 102 to store and output data (e.g. partial machine learning models generated across a plurality of selection and optimization servers 102, and/or processors thereof, and combined to create a global machine learning model).’ of Gibiansky.) to calculate a batch of performance metrics for the set of local versions (Gibiansky, 0118, 0043; To calculate a batch of performance metrics for the set of local versions of applicant maps to ‘The data may in fact reside in the data store 112, and simply be accessed by different selection and optimization servers 102, or chunks of the data may be stored directly on the different selection and optimization servers 102. In either arrangement, the selection and optimization servers 102 may load appropriate portions of their assigned data into memory and begin to form partial machine learning models independently of one another.’ and ‘FIG. 1 shows an implementation of a system 100 for selecting between different machine learning methods and optimizing the parameters that control their behavior. In the depicted implementation, the system 100 includes a selection and optimization server 102, a plurality of client devices 114a . . . 114n, a production server 108, a data collector 110 and associated data store 112.’ of Gibiansky.), wherein the machine learning model includes a set of hyperparameters (Gibiansky, 0047; Wherein the machine learning model includes a set of hyperparameters of applicant maps to ‘In one implementation, a model includes a choice of a machine learning method (e.g. GBM or SVM), hyperparameter settings (e.g. SVM's regularization term) and parameter settings (e.g. SVM's alpha coefficients on each data point) and the system and method herein can determine any of these values which define a model.’ of Gibiansky.); apply an optimization technique to the batch of performance metrics to produce a set of updates to the set of hyperparameters (Gibiansky, 0093; Apply an optimization technique to the batch of performance metrics to produce a set of updates to the set of hyperparameters of applicant maps to ‘For example, the machine learning method unit 230 supports SVM and GBM, and in one implementation, implements a set of SVM parameters received from the parameter optimization unit 240 by scoring tuning data (e.g. label email as either spam or not spam) using SVM and the received SVM parameters.’ of Gibiansky. EC: The Examiner’s position is that a learning method is an algorithm which has the ability to alter parameters to fit (learn) the current data set. For example, an SVM is an algorithm which generates a hyperplane in n-dimensional space that yields a binary division of the data points. SEE ‘IntroSVM’ fig 2.1 and p10 for basic SVM introduction.)
Gibiansky does not disclose expressly when a new entity is added to the set of entities, update the updated set of hyperparameters by adding a new dimension for the new entity.
Fleisman discloses when a new entity is added to the set of entities, update the updated set of hyperparameters by adding a new dimension for the new entity. (Fleisman, 0033; Furthermore, it may enable the enforcement of additional constraints in the ICP formulation, such as kinematic physical constraints, repulsive points that push the model away, weighting parameters, or the like. The techniques discussed herein may be characterized as iterative closest point inverse kinematics (ICPIK) techniques or the like. For example, as discussed herein, an Iterative Closest Point (ICP) technique may find a transformation that aligns two point clouds or the like. At each iteration, the process may update the correspondence between the source and target point clouds, and determine the transformation that best aligns them until convergence is attained. EC: ‘New dimension’ of applicant maps to ‘additional constraints.’ ‘Hyperparameters’ maps to ‘parameters.’) It would have been obvious to one having ordinary skill in the art, having the teachings of Gibiansky and Fleisman before him before the effective filing date of the claimed invention, to modify Gibiansky to incorporate the ability to add additional parameters (dimensions) to a model of Fleisman. Given the advantage of the model being able to handle additional local tasks, one having ordinary skill in the art would have been motivated to make this obvious modification.
Gibiansky and Fleisman do not disclose expressly a training module comprising a non-transitory computer-readable medium storing instructions that, when executed, cause the system to: use the input data to train the set of local versions of the machine learning model; and update the set of hyperparameters with the set of updates for the set of trained local versions of the machine learning model, wherein updating the set of hyperparameters improves the performance of the set of trained local versions of the machine learning model.
Markert discloses a training module comprising a non-transitory computer-readable medium storing instructions that, when executed, cause the system to (Markert, claim 12; A non-transitory, computer-readable data storage medium storing a computer program having program codes which, when executed on a computer,….): use the input data to train the set of local versions of the machine learning model; and update the set of hyperparameters with the set of updates for the set of trained local versions of the machine learning model, wherein updating the set of hyperparameters improves the performance of the set of trained local versions of the machine learning model. (Markert, 0055-0060; ‘[0055] In step S1, a first data-based partial model f_first_partial_model(x) is provided based on hyperparameters and nodes, which is formed completely or partially from a first initially provided training data record. [0056] In step S2, a second training data record (x_i, y_i), i=1, . . . N is furthermore provided, where x_i represents the p-dimensional measuring points and y_i represents the scalar output values. [0057] In step S3, the deviations ỹ_i between the model predictions f_first_partial_model(x_i) (output values or function values) of the first data-based partial model and the measuring points y_i at the observed measuring points of the second training data record are then ascertained: ỹ_i = y_i - f_first_partial_model(x_i) [0058] In step S4, an additional, second data-based model f_second_partial_model(x) is trained on the obtained deviations ỹ_i, i.e., on the training data (x_i, ỹ_i), i=1, . . . , N. It is to be noted that the additional partial model is trained with the aid of an average value function which corresponds to a constant value of 0, thus following the zero function in the extrapolation range.’ of Markert. EC: Markert discloses that partial models are based on or associated with hyperparameters. Training data is supplied, and Markert discloses some detail of training by determining the ‘deviations’ between predictions and observed measuring points. Updating hyperparameters of applicant maps to the result of training the partial models. This updating uses the determined ‘deviations’, and training improves the performance. That is the function of ‘training.’ [0060] ‘This concept may be generalized in a simple manner by allowing any number of an additional training data record, which, for example, were associated with a certain local effect via a classification or clustering method, may be modeled in an additional data-based partial model.’ EC: Here ‘updating’ can be seen as using ‘additional training records’, since a hyperparameter is associated with a clustering method.) It would have been obvious to one having ordinary skill in the art, having the teachings of Gibiansky, Fleisman and Markert before him before the effective filing date of the claimed invention, to modify Gibiansky and Fleisman to incorporate breaking down a model into smaller sub models for training of Markert. Given the advantage of focused training and lowering computational costs, one having ordinary skill in the art would have been motivated to make this obvious modification.
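For illustration only, the four steps quoted from Markert (0055-0058) can be sketched as follows. This is a minimal sketch under stated assumptions, not Markert's implementation: an ordinary least-squares line stands in for each data-based partial model, the names fit_line and train_partial_models are hypothetical, and summing the two partial models into one prediction is an assumption that does not appear in the quoted passage.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = a*x + b; a stand-in for a data-based partial model."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx if sxx else 0.0
    b = mean_y - a * mean_x
    return lambda x: a * x + b


def train_partial_models(first_record, second_record):
    """Follow steps S1-S4 as quoted; records are (xs, ys) pairs of equal-length lists."""
    # S1: first partial model from the first training data record.
    first_model = fit_line(*first_record)
    # S2: second training data record (x_i, y_i).
    xs, ys = second_record
    # S3: deviations between observed values and first-model predictions.
    deviations = [y - first_model(x) for x, y in zip(xs, ys)]
    # S4: second partial model trained on (x_i, deviation_i).
    second_model = fit_line(xs, deviations)
    # Assumption of this sketch: the combined prediction is the sum of both models.
    combined = lambda x: first_model(x) + second_model(x)
    return first_model, second_model, combined
```

With a first record lying on y = x and a second record lying on y = 2x + 1, the deviations lie on y = x + 1, so the second partial model captures only what the first model missed.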


Claim(s) 1-2 and 8 is/are rejected under 35 U.S.C. 103 as being unpatentable over Gibiansky, Fleisman and Markert as applied to claim 19 above, and further in view of Yanase. (U. S. Patent Publication 20120016816, referred to as Yanase)

Claim 1
Gibiansky discloses a method, comprising: using input data for a set of local versions of a machine learning model for a set of entities (Gibiansky, 0096; Using input data for a set of local versions of a machine learning model for a set of entities of applicant maps to ‘The data management unit 260 includes logic executable by the processor 202 to manage the data used to perform the features and functionality herein, which may vary based on the implementation. For example, in one implementation, the data management unit 260 may manage chunking of one or more of input data (e.g. training data that is too large for a single selection and optimization server 102 to store and output data (e.g. partial machine learning models generated across a plurality of selection and optimization servers 102, and/or processors thereof, and combined to create a global machine learning model).’ of Gibiansky.) to calculate a batch of performance metrics for the set of local versions (Gibiansky, 0118, 0043; To calculate a batch of performance metrics for the set of local versions of applicant maps to ‘The data may in fact reside in the data store 112, and simply be accessed by different selection and optimization servers 102, or chunks of the data may be stored directly on the different selection and optimization servers 102. In either arrangement, the selection and optimization servers 102 may load appropriate portions of their assigned data into memory and begin to form partial machine learning models independently of one another.’ and ‘FIG. 1 shows an implementation of a system 100 for selecting between different machine learning methods and optimizing the parameters that control their behavior. In the depicted implementation, the system 100 includes a selection and optimization server 102, a plurality of client devices 114a . . . 114n, a production server 108, a data collector 110 and associated data store 112.’ of Gibiansky.), wherein the machine learning model includes a set of hyperparameters; (Gibiansky, 0047; Wherein the machine learning model includes a set of hyperparameters of applicant maps to ‘In one implementation, a model includes a choice of a machine learning method (e.g. GBM or SVM), hyperparameter settings (e.g. SVM's regularization term) and parameter settings (e.g. SVM's alpha coefficients on each data point) and the system and method herein can determine any of these values which define a model.’ of Gibiansky.) applying an optimization technique to the batch of performance metrics to produce a set of updates to the set of hyperparameters. (Gibiansky, 0093; Applying an optimization technique to the batch of performance metrics to produce a set of updates to the set of hyperparameters of applicant maps to ‘For example, the machine learning method unit 230 supports SVM and GBM, and in one implementation, implements a set of SVM parameters received from the parameter optimization unit 240 by scoring tuning data (e.g. label email as either spam or not spam) using SVM and the received SVM parameters.’ of Gibiansky. EC: The Examiner’s position is that a learning method is an algorithm which has the ability to alter parameters to fit (learn) the current data set. For example, an SVM is an algorithm which generates a hyperplane in n-dimensional space that yields a binary division of the data points. SEE ‘IntroSVM’ fig 2.1 and p10 for basic SVM introduction.)
Gibiansky does not disclose expressly when a new entity is added to the set of entities, updating the updated set of hyperparameters by adding a new dimension for the new entity. 
Fleisman discloses when a new entity is added to the set of entities, updating the updated set of hyperparameters by adding a new dimension for the new entity. (Fleisman, 0033; Furthermore, it may enable the enforcement of additional constraints in the ICP formulation, such as kinematic physical constraints, repulsive points that push the model away, weighting parameters, or the like. The techniques discussed herein may be characterized as iterative closest point inverse kinematics (ICPIK) techniques or the like. For example, as discussed herein, an Iterative Closest Point (ICP) technique may find a transformation that aligns two point clouds or the like. At each iteration, the process may update the correspondence between the source and target point clouds, and determine the transformation that best aligns them until convergence is attained. EC: ‘New dimension’ of applicant maps to ‘additional constraints.’ ‘Hyperparameters’ maps to ‘parameters.’) It would have been obvious to one having ordinary skill in the art, having the teachings of Gibiansky and Fleisman before him before the effective filing date of the claimed invention, to modify Gibiansky to incorporate the ability to add additional parameters (dimensions) to a model of Fleisman. Given the advantage of the model being able to handle additional local tasks, one having ordinary skill in the art would have been motivated to make this obvious modification.
Gibiansky and Fleisman do not disclose expressly updating the set of hyperparameters with the set of updates for the set of local versions of the machine learning model, wherein updating the set of hyperparameters improves the performance of the set of local versions of the machine learning model.
Markert discloses updating the set of hyperparameters with the set of updates for the set of local versions of the machine learning model, wherein updating the set of hyperparameters improves the performance of the set of local versions of the machine learning model. (Markert, 0055-0060; Updating the set of hyperparameters with the set of updates for the set of local versions of the machine learning model, wherein updating the set of hyperparameters improves the performance of the set of local versions of the machine learning model of applicant maps to ‘[0055] In step S1, a first data-based partial model f_first_partial_model(x) is provided based on hyperparameters and nodes, which is formed completely or partially from a first initially provided training data record. [0056] In step S2, a second training data record (x_i, y_i), i=1, . . . N is furthermore provided, where x_i represents the p-dimensional measuring points and y_i represents the scalar output values. [0057] In step S3, the deviations ỹ_i between the model predictions f_first_partial_model(x_i) (output values or function values) of the first data-based partial model and the measuring points y_i at the observed measuring points of the second training data record are then ascertained: ỹ_i = y_i - f_first_partial_model(x_i) [0058] In step S4, an additional, second data-based model f_second_partial_model(x) is trained on the obtained deviations ỹ_i, i.e., on the training data (x_i, ỹ_i), i=1, . . . , N. It is to be noted that the additional partial model is trained with the aid of an average value function which corresponds to a constant value of 0, thus following the zero function in the extrapolation range.’ of Markert. EC: Markert discloses that partial models are based on or associated with hyperparameters. Training data is supplied, and Markert discloses some detail of training by determining the ‘deviations’ between predictions and observed measuring points. Updating hyperparameters of applicant maps to the result of training the partial models. This updating uses the determined ‘deviations’, and training improves the performance. That is the function of ‘training.’ [0060] ‘This concept may be generalized in a simple manner by allowing any number of an additional training data record, which, for example, were associated with a certain local effect via a classification or clustering method, may be modeled in an additional data-based partial model.’ EC: Here ‘updating’ can be seen as using ‘additional training records’, since a hyperparameter is associated with a clustering method.) It would have been obvious to one having ordinary skill in the art, having the teachings of Gibiansky, Fleisman and Markert before him before the effective filing date of the claimed invention, to modify Gibiansky and Fleisman to incorporate breaking down a model into smaller sub models for training of Markert. Given the advantage of focused training and lowering computational costs, one having ordinary skill in the art would have been motivated to make this obvious modification.
Gibiansky, Fleisman and Markert do not disclose expressly wherein updating the updated set of hyperparameters with a new dimension for the new entity comprises at least one of: updating the updated set of hyperparameters with a default value for the new dimension: or updating the new dimension with a random value.
Yanase discloses wherein updating the updated set of hyperparameters with a new dimension for the new entity comprises at least one of: updating the updated set of hyperparameters with a default value for the new dimension: or updating the new dimension with a random value. (Yanase, 0077; Then, the model updater 240 initializes k centroid vectors C(i) at random. The model updater 240 sends the centroid vector C(i) to the respective data processors 210.) It would have been obvious to one having ordinary skill in the art, having the teachings of Gibiansky, Fleisman, Markert and Yanase before him before the effective filing date of the claimed invention, to modify 

Claim 2
Gibiansky and Fleisman do not disclose expressly using input data to train the set of local versions of the machine learning model.
Markert discloses using input data to train the set of local versions of the machine learning model. (Markert, 0055-0058; Updating the set of hyperparameters with the set of updates for the set of local versions of the machine learning model, wherein updating the set of hyperparameters improves the performance of the set of local versions of the machine learning model maps to…‘[0055] In step S1, a first data-based partial model f_first_partial_model(x) is provided based on hyperparameters and nodes, which is formed completely or partially from a first initially provided training data record. [0056] In step S2, a second training data record (x_i, y_i), i=1, . . . N is furthermore provided, where x_i represents the p-dimensional measuring points and y_i represents the scalar output values.’ of Markert. EC: Markert discloses that partial models are based on or associated with hyperparameters. Training data is supplied, and Markert discloses some detail of training by determining the ‘deviations’ between predictions and observed measuring points. Updating hyperparameters of applicant maps to the result of training the partial models. This updating uses the determined ‘deviations’, and training improves the performance.)


Claim(s) 3-5, 9, 11-15 and 18 is/are rejected under 35 U.S.C. 103 as being unpatentable over Gibiansky, Fleisman, Markert and Yanase as applied to claim 1, 2 and 8 above, and further in view of Eberhart. (‘A New Optimizer Using Particle Swarm Theory’, referred to as Eberhart)

Claim 3
Gibiansky, Fleisman, Markert and Yanase do not disclose expressly wherein using the input data to train the set of local versions of the machine learning model comprises: when a pre-specified amount of the input data is received for an entity in the set of entities, using the pre-specified amount to produce an update to a version of the machine learning model for the entity.
Eberhart discloses wherein using the input data to train the set of local versions of the machine learning model comprises: when a pre-specified amount of the input data is received for an entity in the set of entities, using the pre-specified amount to produce an update to a version of the machine learning model for the entity. (Eberhart, p39-p40; When a pre-specified amount of the input data is received for an entity in the set of entities, using the pre-specified amount to produce an update to a version of the statistical model for the entity of applicant maps to ‘For neural networks, it seems reasonable to initialize all positional coordinates (corresponding to connection weights) to within a range of (-4.0, +4.0), and velocities should not be so high as to fly particles out of the usable field. It is also necessary to clamp velocities to some maximum to prevent overflow. The test examples use a population of 20 particles. (The authors have used populations of 10-50 particles for other applications).’ and ‘Particle swarm optimization has also been demonstrated to perform well on genetic algorithm test functions, and it appears to be a promising approach for robot task learning.’ of Eberhart.) It would have been obvious to one having ordinary skill in the art, having the teachings of Gibiansky, Fleisman, Markert, Yanase and Eberhart before him before the effective filing date of the claimed invention, to modify Gibiansky, Fleisman, Markert and Yanase to incorporate being able to add another parameter (dimension) of Eberhart. Given the advantage that adding another parameter would aid in accuracy, one having ordinary skill in the art would have been motivated to make this obvious modification.

Claim 4
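Gibiansky, Fleisman, Markert and Yanase do not disclose expressly wherein using the input data to train the set of local versions of the machine learning model merging, into a global version of the machine learning, the update and other updates to other versions of the machine learning model for other entities in the set of entities asynchronously from generating the update and the other updates.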
Eberhart discloses wherein using the input data to train the set of local versions of the machine learning model merging, into a global version of the machine learning, the update and other updates to other versions of the machine learning model for other entities in the set of entities asynchronously from generating the update and the other updates. (Eberhart, p39-40; Merging, into a global version of the machine learning, the update and other updates to other versions of the machine learning model for other entities in the set of entities asynchronously from generating the update and the other updates of applicant maps to ‘The particle swarm optimization concept consists of, at each time step, changing the velocity (accelerating) each particle toward its pbest and gbest (global version).’ and ‘4. Compare evaluation with group's previous best (PBEST[GBEST]): If current value < PBEST[GBEST] then GBEST=particle's array index,…’ of Eberhart. EC: Here the global version is gbest and ‘other version’ is pbest.) It would have been obvious to one having ordinary skill in the art, having the teachings of Gibiansky, Fleisman, Markert, Yanase and Eberhart before him before the effective filing date of the claimed invention, to modify Gibiansky, Fleisman, Markert and Yanase to incorporate being able to add another parameter (dimension) of Eberhart. Given the advantage that adding another parameter would aid in accuracy, one having ordinary skill in the art would have been motivated to make this obvious modification.
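For illustration only, the pbest and gbest bookkeeping quoted from Eberhart can be sketched as follows. This is a minimal sketch under stated assumptions, not a reproduction of the reference: the function names, the sphere test objective, the acceleration constant of 2.0, the velocity limit and the step count are assumptions of the sketch, while the (-4.0, +4.0) initialization range and the population of 20 particles follow the passage quoted for claim 3.

```python
import random


def sphere(position):
    """Hypothetical test objective: sum of squares, minimized at the origin."""
    return sum(x * x for x in position)


def particle_swarm(objective, dims=2, particles=20, steps=50, vmax=1.0, seed=0):
    """Global-best particle swarm; returns (initial best value, final best value)."""
    rng = random.Random(seed)
    xs = [[rng.uniform(-4.0, 4.0) for _ in range(dims)] for _ in range(particles)]
    vs = [[0.0] * dims for _ in range(particles)]
    pbest_x = [list(x) for x in xs]            # best position seen by each particle
    pbest_val = [objective(x) for x in xs]     # value at that position
    gbest = min(range(particles), key=pbest_val.__getitem__)  # index of the group's best
    initial_best = pbest_val[gbest]
    for _ in range(steps):
        for i in range(particles):
            for d in range(dims):
                # Accelerate toward the particle's own best and the group's best.
                vs[i][d] += (2.0 * rng.random() * (pbest_x[i][d] - xs[i][d])
                             + 2.0 * rng.random() * (pbest_x[gbest][d] - xs[i][d]))
                vs[i][d] = max(-vmax, min(vmax, vs[i][d]))  # clamp the velocity
                xs[i][d] += vs[i][d]
            value = objective(xs[i])
            if value < pbest_val[i]:
                pbest_val[i] = value
                pbest_x[i] = list(xs[i])
                if value < pbest_val[gbest]:
                    gbest = i  # the group's best is now this particle
    return initial_best, pbest_val[gbest]
```

Because each pbest value can only decrease and gbest always indexes the smallest pbest value, the final best value returned is never worse than the initial one.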

Claim 5

Eberhart discloses wherein using the input data to calculate the batch of performance metrics associated with the set of local versions of the statistical machine learning comprises: using a batch of outputs generated from a subset of the input data for a local version of the machine learning model and a set of labels associated with the subset of input data to calculate a performance metric for the local version (Eberhart, p40; Using a set of outputs generated from a subset of the input data for a local version of the machine learning model and a set of labels associated with the subset of input data to calculate a performance metric for the local version of applicant maps to ‘6. Move to PresentX[ ][d] + v[ ][d]: Loop to step 2 and repeat until a criterion is met.’ and ‘This paper introduces a "local" version of the optimizer in which, in addition to pbest, each particle keeps track of the best solution, called lbest, attained with a local topological neighborhood of particles.’ of Eberhart. EC: Performance metric maps to the calculation of the criterion.); and discounting contributions of the outputs to the performance metric based on a set of ages associated with the subset of the input data. (Eberhart, p41; Median iterations required to meet a criterion or squared error per node < 0.02. Population = 20 particles. There were no trials with iterations > 2000.) It would have been obvious to one having ordinary skill in the art, having the teachings of Gibiansky, Fleisman, Markert, Yanase and Eberhart before him before the effective filing date of the claimed invention, to modify Gibiansky, Fleisman, Markert and Yanase to incorporate being able to add another parameter (dimension) of Eberhart. Given the advantage that adding another parameter would aid in accuracy, one having ordinary skill in the art would have been motivated to make this obvious modification.

Claim 9
Gibiansky, Fleisman, Markert and Yanase do not disclose expressly wherein the set of hyperparameters comprises at least one of: a regularization parameter; a clustering parameter; a convergence parameter; a feature complexity; a model training parameter; a model selection parameter; a decay parameter; a threshold; or a hyper-hyperparameter.
Eberhart discloses wherein the set of hyperparameters comprises at least one of: a regularization parameter; a clustering parameter; a convergence parameter; a feature complexity; a model training parameter; a model selection parameter; a decay parameter; a threshold; or a hyper-hyperparameter. (Eberhart, p40; Wherein the set of hyperparameters comprises at least one of: a regularization parameter; a clustering parameter; a convergence parameter; a feature complexity; a model training parameter; a model selection parameter; a decay parameter; a threshold; and a hyper-hyperparameter of applicant maps to ‘6. Move to PresentX[ ][d] + v[ ][d]: Loop to step 2 and repeat until a criterion is met.’ of Eberhart. EC: Performance metric maps to the calculation of the criterion. EC: The convergence parameter is that which is used by the ‘criterion’ decision engine.) It would have been obvious to one having ordinary skill in the art, having the teachings of Gibiansky, Fleisman, Markert, Yanase and Eberhart before him before the effective filing date of the claimed invention, to modify Gibiansky, Fleisman, Markert and Yanase to incorporate being able to add another parameter (dimension) of Eberhart. Given the advantage that adding another parameter would aid in accuracy, one having ordinary skill in the art would have been motivated to make this obvious modification.

Claim 11
Gibiansky discloses an apparatus, comprising: one or more processors; and memory storing instructions that, when executed by the one or more processors, cause the apparatus to (Gibiansky, 0010; One or more processors; and memory storing instructions that, when executed by the one or more processors, cause the apparatus to of applicant maps to ‘According to one innovative aspect of the subject matter described in this disclosure, a system comprises: one or more processors; and a memory storing instructions that, when executed by the one or more processors, cause the system to:…’ of Gibiansky): use input data for a set of local versions of a machine learning model for a set of entities (Gibiansky, 0096; Use input data for a set of local versions of a machine learning model for a set of entities of applicant maps to ‘The data management unit 260 includes logic executable by the processor 202 to manage the data used to perform the features and functionality herein, which may vary based on the implementation. For example, in one implementation, the data management unit 260 may manage chunking of one or more of input data (e.g. training data that is too large for a single selection and optimization server 102 to store and output data (e.g. partial machine learning models generated across a plurality of selection and optimization servers 102, and/or processors thereof, and combined to create a global machine learning model).’ of Gibiansky.) to calculate a batch of performance metrics for the set of local versions (Gibiansky, 0118, 0043; To calculate a batch of performance metrics for the set of local versions of applicant maps to ‘The data may in fact reside in the data store 112, and simply be accessed by different selection and optimization servers 102, or chunks of the data may be stored directly on the different selection and optimization servers 102. In either arrangement, the selection and optimization servers 102 may load appropriate portions of their assigned data into memory and begin to form partial machine learning models independently of one another.’ and ‘FIG. 1 shows an implementation of a system 100 for selecting between different machine learning methods and optimizing the parameters that control their behavior. In the depicted implementation, the system 100 includes a selection and optimization server 102, a plurality of client devices 114a . . . 114n, a production server 108, a data collector 110 and associated data store 112.’ of Gibiansky.), wherein the machine learning model includes a set of hyperparameters (Gibiansky, 0047; Wherein the machine learning model includes a set of hyperparameters of applicant maps to ‘In one implementation, a model includes a choice of a machine learning method (e.g. GBM or SVM), hyperparameter settings (e.g. SVM's regularization term) and parameter settings (e.g. SVM's alpha coefficients on each data point) and the system and method herein can determine any of these values which define a model.’ of Gibiansky.);…. apply an optimization technique to the batch of performance metrics to produce a set of updates to the set of hyperparameters. (Gibiansky, 0093; Apply an optimization technique to the batch of performance metrics to produce a set of updates to the set of hyperparameters of applicant maps to ‘For example, the machine learning method unit 230 supports SVM and GBM, and in one implementation, implements a set of SVM parameters received from the parameter optimization unit 240 by scoring tuning data (e.g. label email as either spam or not spam) using SVM and the received SVM parameters.’ of Gibiansky. EC: The Examiner’s position is that a learning method is an algorithm which has the ability to alter parameters to fit (learn) the current data set. For example, an SVM is an algorithm which generates a hyperplane in n-dimensional space that yields a binary division of the data points. SEE ‘IntroSVM’ fig 2.1 and p10 for basic SVM introduction.)
Gibiansky does not disclose expressly when a new entity is added to the set of entities, update the updated set of hyperparameters by adding a new dimension for the new entity. 
Fleisman discloses when a new entity is added to the set of entities, update the updated set of hyperparameters by adding a new dimension for the new entity. (Fleisman, 0033; Furthermore, it may enable the enforcement of additional constraints in the ICP formulation, such as kinematic physical constraints, repulsive points that weighting parameters, or the like. The techniques discussed herein may be characterized as iterative closest point inverse kinematics (ICPIK) techniques or the like. For example, as discussed herein, an Iterative Closest Point (ICP) technique may find a transformation that aligns two point clouds or the like. At each iteration, the process may update the correspondence between the source and target point clouds, and determine the transformation that best aligns them until convergence is attained. EC; ‘New dimension’ of applicant maps to ‘additional constraints.’ Hyperparameters’ maps to ‘parameters.’) It would have been obvious to one having ordinary skill in the art, having the teachings of Gibiansky and Fleisman before him before the effective filing date of the claimed invention, to modify Gibiansky to incorporate the ability to add additional parameters (dimensions) to a model of Fleisman. Given the advantage of the model being able to handle additional local tasks, one having ordinary skill in the art would have been motivated to make this obvious modification. 
Gibiansky and Fleisman do not disclose expressly update the set of hyperparameters with the set of updates for the set of local versions of the machine learning model, wherein updating the set of hyperparameters improves the performance of the set of local versions of the machine learning model.
Markert discloses update the set of hyperparameters with the set of updates for the set of local versions of the machine learning model, wherein updating the set of hyperparameters improves the performance of the set of local versions of the machine learning model. (Markert, 0055-0060; update the set of hyperparameters with the set of updates for the set of local versions of the machine learning model, wherein updating partial model f.sub.first.sub.--.sub.partial.sub.--.sub.model(x) is provided based on hyperparameters and nodes, which is formed completely or partially from a first initially provided training data record. [0056] In step S2, a second training data record (x.sub.i, y.sub.i), i=1, . . . N is furthermore provided, where x.sub.i represents the p-dimensional measuring points and y.sub.i represents the scalar output values. [0057] In step S3, the deviations [tilde over (y)].sub.i between the model predictions f.sub.first.sub.--.sub.partial.sub.--.sub.model(x.sub.i) (output values or function values) of the first data-based partial model and the measuring points y.sub.i at the observed measuring points of the second training data record are then ascertained:) [tilde over (y)].sub.i=y.sub.i-f.sub.first.sub.--.sub.partial.sub.--.sub.model(x.sub.- i) [0058] In step S4, an additional, second data-based model f.sub.second.sub.--.sub.partial.sub.--.sub.model(x) is trained on the obtained deviations) [tilde over (y)].sub.i, i.e., on the training data (x.sub.i, [tilde over (y)].sub.i), i=1, . . . , N. It is to be noted that the additional partial model is trained with the aid of an average value function which corresponds to a constant value of 0, thus following the zero function in the extrapolation range.’ of Markert. EC: Markert discloses partial models are based or associated with hyperparameters. Training data is supplied and Markert discloses a little detail of training with determining the ‘deviations’ between predictions and observed measuring point. 
Updating hyperparameters of applicant maps to the result of training the partial models. This updating uses the determined ‘deviations’, and training improve the performance. That is the function of ‘training.’ [0060] ‘This concept may be generalized in a simple manner by allowing any an additional training data record, which, for example, were associated with a certain local effect via a classification or clustering method, may be modeled in an additional data-based partial model.’  EC: Here ‘updating’ can be seen as using ‘additional training records’ and since a hyperparameter is associated with a clustering method.) It would have been obvious to one having ordinary skill in the art, having the teachings of Gibiansky, Fleisman and Markert before him before the effective filing date of the claimed invention, to modify Gibiansky and Fleisman to incorporate breaking down a model into smaller sub models for training of Markert. Given the advantage of focused training and lowering computational costs, one having ordinary skill in the art would have been motivated to make this obvious modification.
Gibiansky, Fleisman, Markert and Yanase do not disclose expressly wherein using the input data to calculate the batch of performance metrics comprises: using a set of outputs generated from a subset of the input data for a local version of the machine learning model and a set of labels associated with the subset of input data to calculate a performance metric for the local version; and discounting contributions of the outputs to the performance metric based on a set of ages associated with the subset of the input data.
Eberhart discloses wherein using the input data to calculate the batch of performance metrics comprises: using a set of outputs generated from a subset of the input data for a local version of the machine learning model and a set of labels associated with the subset of input data to calculate a performance metric for the local version (Eberhart, p40; ‘This paper introduces a "local" version of the optimizer in Eberhart, p41; Table 1. Local version , neighborhood= 2 . Median iterations required to meet a criterion or squared error per node < 0. 02. Population=20 particles . There were no trials with iterations> 2000.) It would have been obvious to one having ordinary skill in the art, having the teachings of Gibiansky, Fleisman, Markert , Yanase and Eberhart before him before the effective filing date of the claimed invention, to modify Gibiansky, Fleisman, Markert and Yanase to incorporate being able to add another parameter (dimension) of Eberhart. Given the advantage of adding another parameter would aid in accuracy, one having ordinary skill in the art would have been motivated to make this obvious modification. 

Claim 12
Gibiansky and Fleisman do not disclose expressly wherein the memory further stores instructions that, when executed by the one or more processors, cause the apparatus to: use the input data to train the set of local versions of the machine learning model.
Markert discloses wherein the memory further stores instructions that, when executed by the one or more processors, cause the apparatus to: use the input data to train the set of local versions of the machine learning model. (Markert, 0055-0058; Updating the set of hyperparameters with the set of updates for the set of local versions of the machine learning model, wherein updating the set of hyperparameters improves partial model f.sub.first.sub.--.sub.partial.sub.--.sub.model(x) is provided based on hyperparameters and nodes, which is formed completely or partially from a first initially provided training data record. [0056] In step S2, a second training data record (x.sub.i, y.sub.i), i=1, . . . N is furthermore provided, where x.sub.i represents the p-dimensional measuring points and y.sub.i represents the scalar output values.’ of Markert. EC: Markert discloses partial models are based or associated with hyperparameters. Training data is supplied and Markert discloses a little detail of training with determining the ‘deviations’ between predictions and observed measuring point. Updating hyperparameters of applicant maps to the result of training the partial models. This updating uses the determined ‘deviations’, and training improve the performance. That is the function of ‘training.’) It would have been obvious to one having ordinary skill in the art, having the teachings of Gibiansky, Fleisman and Markert before him before the effective filing date of the claimed invention, to modify Gibiansky and Fleisman to incorporate breaking down a model into smaller sub models for training of Markert. Given the advantage of focused training and lowering computational costs, one having ordinary skill in the art would have been motivated to make this obvious modification.

Claim 13
Gibiansky, Fleisman, Markert and Yanase do not disclose expressly wherein using the input data to train the set of local versions of the machine learning model comprises: when a pre-specified amount of the input data is received for an entity in the 
Eberhart discloses wherein using the input data to train the set of local versions of the machine learning model comprises: when a pre-specified amount of the input data is received for an entity in the set of entities, using the pre-specified amount to produce an update to a local version of the machine learning model for the entity. (Eberhart, p39-p40; When a pre-specified amount of the input data is received for an entity in the set of entities, using the pre-specified amount to produce an update to a local version of the machine learning model for the entity of applicant maps to ‘For neural networks, it seems reasonable to initialize all positional coordinates (corresponding to connection weights) to within a range of (-4.0, +4.0), and velocities should not be so high as to fly particles out of the usable field. It is also necessary to clamp velocities to some maximum to prevent overflow. The test examples use a population of 20 particles. (The authors have used populations of 10-50 particles for other applications).’ and ‘Particle swarm optimization has also been demonstrated to perform well on genetic algorithm test functions, and it appears to be a promising approach for robot task learning.’ of Eberhart. EC: Updating local versions has been addressed above.) It would have been obvious to one having ordinary skill in the art, having the teachings of Gibiansky, Fleisman, Markert, Yanase and Eberhart before him before the effective filing date of the claimed invention, to modify Gibiansky, Fleisman, Markert and Yanase to incorporate being able to add another parameter (dimension) of Eberhart. Given the advantage of adding another parameter would aid in accuracy, one 

Claim 14
Gibiansky, Fleisman, Markert and Yanase do not disclose expressly wherein using the input data to train the set of local versions of the machine learning model further comprises: merging, into a global version of the machine learning, the update and other updates to other versions of the machine learning model for other entities in the set of entities asynchronously from generating the update and the other updates.
Eberhart discloses wherein using the input data to train the set of local versions of the machine learning model further comprises: merging, into a global version of the machine learning, the update and other updates to other versions of the machine learning model for other entities in the set of entities asynchronously from generating the update and the other updates. (Eberhart, p39-40; Merging, into a global version of the machine learning, the update and other updates to other versions of the machine learning model for other entities in the set of entities asynchronously from generating the update and the other updates of applicant maps to ‘The particle swarm optimization concept consists of, at each time step, changing the velocity (accelerating) each particle toward its pbest and gbest (global version).’ and ‘4. Compare evaluation with groups previous best (PBEST[GBEST]): If current value< PBEST[GBEST] then GBEST=particle's array index,…’ of Eberhart. EC:  Here the global version is gbest and ‘other version’ is pbest.) It would have been obvious to one having ordinary skill in the art, having the teachings of Gibiansky, Fleisman, Markert , Yanase and Eberhart before 

Claim 15
Gibiansky, Fleisman, Markert and Yanase do not disclose expressly wherein using the input data to calculate the batch of performance metrics associated with the first batch of local versions of the machine learning model comprises: using a set of outputs generated from a subset of the input data for a local version of the machine learning and a set of labels associated with the subset of input data to calculate a performance metric for the local version; and discounting contributions of the outputs to the performance metric based on a set of ages associated with the subset of the input data.
Eberhart discloses wherein using the input data to calculate the batch of performance metrics associated with the first batch of local versions of the machine learning model comprises: using a set of outputs generated from a subset of the input data for a local version of the machine learning and a set of labels associated with the subset of input data to calculate a performance metric for the local version (Eberhart, p40; Using a set of outputs generated from a subset of the input data for a local version of the machine learning and a set of labels associated with the subset of input data to calculate a performance metric for the local version of applicant maps to ‘6. Move to until a criterion is met.’ of Eberhart.  EC: Performance metric maps to the calculation of the criterion.’ And ‘This paper introduces a "local" version of the optimizer in which, in addition to pbest, each particle keeps track of the best solution, called lbest, attained with a local topological neighborhood of particles.’ of Eberhart. EC: Performance metric maps to the calculation of the criterion.); and discounting contributions of the outputs to the performance metric based on a set of ages associated with the subset of the input data. (Eberhart, p41; Table 1. Local version , neighborhood= 2 . Median iterations required to meet a criterion or squared error per node < 0. 02. Population=20 particles . There were no trials with iterations> 2000.) It would have been obvious to one having ordinary skill in the art, having the teachings of Gibiansky, Fleisman, Markert , Yanase and Eberhart before him before the effective filing date of the claimed invention, to modify Gibiansky, Fleisman, Markert and Yanase to incorporate being able to add another parameter (dimension) of Eberhart. Given the advantage of adding another parameter would aid in accuracy, one having ordinary skill in the art would have been motivated to make this obvious modification. 


Claim(s) 6-7, 16-17 and 20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Gibiansky, Fleisman, Markert, Yanase and Eberhart as applied to claim 3-5, 9, 11-15 and 18 above, and further in view of Fucai. (‘Particle Swarm Optimization for Correlative Product Combinatorial Introduction Model’, referred to as Fucai)

Claim 6
Gibiansky, Fleisman, Markert and Yanase do not disclose expressly wherein applying the optimization technique to the batch of performance metrics to produce the set of updates to the set of hyperparameters for the machine learning model comprises: using a batch of particles to explore a search space for the hyperparameters.
Eberhart discloses wherein applying the optimization technique to the batch of performance metrics to produce the set of updates to the set of hyperparameters for the machine learning model comprises: using a batch of particles to explore a search space for the hyperparameters. (Eberhart, p39; Using a set of particles to explore a search space for the hyperparameters of applicant maps to ‘Particle swarm optimization is similar to a genetic algorithm [2] in that the system is initialized with a population of random solutions. It is unlike a genetic algorithm, however, in that each potential solution is also assigned a randomized velocity, and the potential solutions, called particles, are then "flown" through hyperspace.’ of Eberhart.) It would have been obvious to one having ordinary skill in the art, having the teachings of Gibiansky, Fleisman, Markert, Yanase and Eberhart before him before the effective filing date of the claimed invention, to modify Gibiansky, Fleisman, Markert and Yanase to incorporate being able to add another parameter (dimension) of Eberhart. Given the advantage of adding another parameter would aid in accuracy, one having ordinary skill in the art would have been motivated to make this obvious modification. 
Gibiansky, Fleisman, Markert, Yanase and Eberhart do not disclose expressly using the calculated performance metrics to update a set of average performance 
Fucai discloses using the calculated performance metrics to update a set of average performance metrics for a set of positions of the particles (Fucai, p5975; Using the calculated performance metrics to update a set of average performance metrics for a set of positions of the particles of applicant maps to ‘Where the i-th Particle is a D dimensional vector…Xi = { xi1, xi2, … xiD}, i=1, 2, …m, that is, Xi is the position of the ith Particle.’ of Fucai.); identifying, from the set of positions, a particle position with a highest average performance metric (Fucai, p5975; Identifying, from the set of positions, a particle position with a highest average performance metric of applicant maps to ‘One can calculate the evaluation of objective function with Xi. The motion speed of the i-th Particle can be also expressed as a D dimensional vector… Vi = {vi1, vi2, …viD}. Let Pi = {pi1, pi2, …piD}  represents the optimal position of i-th Particle, and Pg = {pg1, pg2, …pgD}  be the optimal position (the position giving the best fitness value) of whole Particle Swarm up to now.’ of Fucai.); updating the hyperparameters with values represented by the particle position (Fucai, p5975; Updating the hyperparameters with values represented by the particle position of applicant maps to ‘At each iteration the individual's motion speed and position in hyperspace is changed by… vid = vid + c1r1(pid – xid) + c2r2(pgd – xid)’ of Fucai.); and using the average performance metrics to update positions and velocities of the particles in the search space. (Fucai, p5975; Using the average performance metrics to update positions and velocities of the particles in the search space of applicant maps to… ‘xid = xid + vid’ of Fucai.) It would have been obvious to one having ordinary skill in the art, having the teachings of Gibiansky, Fleisman, Markert, Yanase, Eberhart and Fucai before him before the effective filing date of the claimed invention, to modify Gibiansky, Fleisman, Markert, Yanase and Eberhart to incorporate finer detail of a particle swarm optimization algorithm of Fucai. Given the advantage of covering the finer detail within the claimed elements, one having ordinary skill in the art would have been motivated to make this obvious modification. 
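
As a reading aid only, the two Fucai update rules quoted in the mapping above (vid = vid + c1r1(pid – xid) + c2r2(pgd – xid), followed by xid = xid + vid) can be sketched in a few lines of Python. The function name pso_step, the list-of-lists layout, the rng argument, and the default coefficients c1 = c2 = 2.0 are assumptions made for illustration; they are not taken from Fucai, Eberhart, or the claims.

```python
import random


def pso_step(positions, velocities, personal_best, global_best,
             c1=2.0, c2=2.0, rng=random):
    """Apply one iteration of the quoted update rules, in place.

    positions, velocities, personal_best: one list of D floats per particle.
    global_best: a single list of D floats (the swarm-wide best position).
    rng: any object with a random() method (hypothetical hook for testing).
    """
    for i in range(len(positions)):
        for d in range(len(positions[i])):
            r1 = rng.random()
            r2 = rng.random()
            # v_id = v_id + c1*r1*(p_id - x_id) + c2*r2*(p_gd - x_id)
            velocities[i][d] += (
                c1 * r1 * (personal_best[i][d] - positions[i][d])
                + c2 * r2 * (global_best[d] - positions[i][d])
            )
            # x_id = x_id + v_id
            positions[i][d] += velocities[i][d]
    return positions, velocities
```

Passing an object whose random() returns a fixed value makes a step deterministic, which is convenient when checking the arithmetic of the two rules by hand. The sketch omits the velocity clamping mentioned in the Eberhart passage quoted for claim 13.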

Claim 7
Gibiansky, Fleisman, Markert, Yanase and Eberhart do not disclose expressly wherein using the average performance metrics to update the positions and the velocities of the particles in the search space comprises: removing a first subset of the positions with average performance metrics that fall below a threshold; replacing the first subset with the position of the particle; and for each particle in the set of particles, deflecting the particle toward a global best position for the set of particles and a historic best position for the particle.
Fucai discloses wherein using the average performance metrics to update the positions and the velocities of the particles in the search space comprises: removing a first subset of the positions with average performance metrics that fall below a threshold; replacing the first subset with the position of the particle; and for each particle in the set of particles, deflecting the particle toward a global best position for the set of particles and a historic best position for the particle. (Fucai, p5975; Removing a first subset of the positions with average performance metrics that fall below a threshold; replacing the first subset with the position of the particle; and for each particle in the set of particles, deflecting the particle toward a global best position for the set of particles and a historic best position for the particle of applicant maps to the updating of xid in … ‘xid = xid + vid’ and the updating of vid in … ‘vid = vid + c1r1(pid – xid) + c2r2(pgd – xid)’ of Fucai.) It would have been obvious to one having ordinary skill in the art, having the teachings of Gibiansky, Fleisman, Markert, Yanase, Eberhart and Fucai before him before the effective filing date of the claimed invention, to modify Gibiansky, Fleisman, Markert, Yanase and Eberhart to incorporate finer detail of a particle swarm optimization algorithm of Fucai. Given the advantage of covering the finer detail within the claimed elements, one having ordinary skill in the art would have been motivated to make this obvious modification. 

Claim 16
Gibiansky, Fleisman, Markert and Yanase do not disclose expressly wherein applying the optimization technique to the batch of performance metrics to produce the updates to the batch of hyperparameters for the machine learning model comprises: using a set of particles to explore a search space for the hyperparameters.
Eberhart discloses wherein applying the optimization technique to the batch of performance metrics to produce the updates to the batch of hyperparameters for the machine learning model comprises: using a set of particles to explore a search space for the hyperparameters. (Eberhart, p39; Using a set of particles to explore a search space for the hyperparameters of applicant maps to ‘Particle swarm optimization is similar to a genetic algorithm [2] in that the system is initialized with a population of random solutions. It is unlike a genetic algorithm, however, in that each potential solution is also assigned a randomized velocity, and the potential solutions, called particles, are then "flown" through hyperspace.’ of Eberhart.) It would have been obvious to one having ordinary skill in the art, having the teachings of Gibiansky, Fleisman, Markert, Yanase and Eberhart before him before the effective filing date of the claimed invention, to modify Gibiansky, Fleisman, Markert and Yanase to incorporate being able to add another parameter (dimension) of Eberhart. Given the advantage of adding another parameter would aid in accuracy, one having ordinary skill in the art would have been motivated to make this obvious modification. 
Gibiansky, Fleisman, Markert, Yanase and Eberhart do not disclose expressly using the calculated performance metrics to update a set of average performance metrics for a set of positions of the particles; identifying, from the set of positions, a particle position with a highest average performance metric; updating the hyperparameters with values represented by the particle position; and using the average performance metrics to update positions and velocities of the particles in the search space.
Fucai discloses using the calculated performance metrics to update a set of average performance metrics for a set of positions of the particles (Fucai, p5975; Using the calculated performance metrics to update a set of average performance metrics for a set of positions of the particles of applicant maps to ‘Where the i-th Particle is a D dimensional vector…Xi = { xi1, xi2, … xiD}, i=1, 2, …m, that is, Xi is the position of the ith Particle.’ of Fucai.); identifying, from the set of positions, a particle position with a highest average performance metric (Fucai, p5975; Identifying, from the set of positions, a particle position with a highest average performance metric of applicant maps to ‘One can calculate the evaluation of objective function with Xi. The motion speed of the i-th Particle can be also expressed as a D dimensional vector… Vi = {vi1, vi2, …viD}. Let Pi = {pi1, pi2, …piD}  represents the optimal position of i-th Particle, and Pg = {pg1, pg2, …pgD}  be the optimal position (the position giving the best fitness value) of whole Particle Swarm up to now.’ of Fucai.); updating the hyperparameters with values represented by the particle position (Fucai, p5975; Updating the hyperparameters with values represented by the particle position of applicant maps to ‘At each iteration the individual's motion speed and position in hyperspace is changed by… vid = vid + c1r1(pid – xid) + c2r2(pgd – xid)’ of Fucai.); and using the average performance metrics to update positions and velocities of the particles in the search space. (Fucai, p5975; Using the average performance metrics to update positions and velocities of the particles in the search space of applicant maps to… ‘xid = xid + vid’ of Fucai.) It would have been obvious to one having ordinary skill in the art, having the teachings of Gibiansky, Fleisman, Markert, Yanase, Eberhart and Fucai before him before the effective filing date of the claimed invention, to modify Gibiansky, Fleisman, Markert, Yanase and Eberhart to incorporate finer detail of a particle swarm optimization algorithm of Fucai. Given the advantage of covering the finer detail within the claimed elements, one having ordinary skill in the art would have been motivated to make this obvious modification. 

Claim 17
Gibiansky, Fleisman, Markert, Yanase and Eberhart do not disclose expressly wherein using the average performance metrics to update the positions and the velocities of the particles in the search space comprises: removing a first subset of the positions with average performance metrics that fall below a threshold; replacing the first subset with the position of the particle; and for each particle in the set of particles, deflecting the particle toward a global best position for the set of particles and a historic best position for the particle.
Fucai discloses wherein using the average performance metrics to update the positions and the velocities of the particles in the search space comprises: removing a first subset of the positions with average performance metrics that fall below a threshold; replacing the first subset with the position of the particle; and for each particle in the set of particles, deflecting the particle toward a global best position for the set of particles and a historic best position for the particle. (Fucai, p5975; Removing a first subset of the positions with average performance metrics that fall below a threshold; replacing the first subset with the position of the particle; and for each particle in the set of particles, deflecting the particle toward a global best position for the set of particles and a historic best position for the particle of applicant maps to the updating of xid in … ‘xid = xid + vid’ and the updating of vid in … ‘vid = vid + c1r1(pid – xid) + c2r2(pgd – xid)’ of Fucai.) It would have been obvious to one having ordinary skill in the art, having the 

Claim 20
Gibiansky, Fleisman, Markert and Yanase do not disclose expressly wherein applying the optimization technique to the batch of performance metrics to produce the set of updates to the set of hyperparameters for the machine learning model comprises: using a set of particles to explore a search space for the hyperparameters.
Eberhart discloses wherein applying the optimization technique to the batch of performance metrics to produce the set of updates to the set of hyperparameters for the machine learning model comprises: using a set of particles to explore a search space for the hyperparameters. (Eberhart, p39; Using a set of particles to explore a search space for the hyperparameters of applicant maps to ‘Particle swarm optimization is similar to a genetic algorithm [2] in that the system is initialized with a population of random solutions. It is unlike a genetic algorithm, however, in that each potential solution is also assigned a randomized velocity, and the potential solutions, called particles, are then "flown" through hyperspace.’ of Eberhart.) It would have been obvious to one having ordinary skill in the art, having the teachings of Gibiansky, Fleisman, Markert, Yanase and Eberhart before him before the effective filing date of 
Gibiansky, Fleisman, Markert, Yanase and Eberhart do not disclose expressly using the calculated performance metrics to update a set of average performance metrics for a set of positions of the particles; identifying, from the set of positions, a particle position with a highest average performance metric; updating the hyperparameters with values represented by the particle position; and using the average performance metrics to update positions and velocities of the particles in the search space.
Fucai discloses using the calculated performance metrics to update a set of average performance metrics for a set of positions of the particles (Fucai, p5975; Using the calculated performance metrics to update a set of average performance metrics for a set of positions of the particles of applicant maps to ‘Where the i-th Particle is a D dimensional vector…Xi = { xi1, xi2, … xiD}, i=1, 2, …m, that is, Xi is the position of the ith Particle.’ of Fucai.); identifying, from the set of positions, a particle position with a highest average performance metric (Fucai, p5975; Identifying, from the set of positions, a particle position with a highest average performance metric of applicant maps to ‘One can calculate the evaluation of objective function with Xi. The motion speed of the i-th Particle can be also expressed as a D dimensional vector…Vi = {vi1, vi2, …viD}. Let Pi = {pi1, pi2, …piD}  represents the optimal position of i-th Particle, and Pg = {pg1, pg2, …pgD}  be the optimal position (the position giving the best fitness value) of whole Particle Swarm up to now.’ of Fucai.); updating the hyperparameters with values represented by the particle position (Fucai, p5975; Updating the hyperparameters with values represented by the particle position of applicant maps to ‘At each iteration the individual's motion speed and position in hyperspace is changed by… vid = vid + c1r1(pid – xid) + c2r2(pgd – xid)’ of Fucai.); and using the average performance metrics to update positions and velocities of the particles in the search space. (Fucai, p5975; Using the average performance metrics to update positions and velocities of the particles in the search space of applicant maps to…‘xid = xid + vid’ of Fucai.) It would have been obvious to one having ordinary skill in the art, having the teachings of Gibiansky, Fleisman, Markert, Yanase, Eberhart and Fucai before him before the effective filing date of the claimed invention, to modify Gibiansky, Fleisman, Markert, Yanase and Eberhart to incorporate finer detail of a particle swarm optimization algorithm of Fucai. Given the advantage of covering the finer detail within the claimed elements, one having ordinary skill in the art would have been motivated to make this obvious modification. 


Claim(s) 10 is/are rejected under 35 U.S.C. 103 as being unpatentable over Gibiansky, Fleisman, Markert, and Yanase as applied to claim 1-2 and 8 above, and further in view of Owechko. (U. S. Patent Publication 20050196047, referred to as Owechko)

Claim 10
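Gibiansky, Fleisman, Markert and Yanase do not disclose expressly wherein the set of entities comprises at least one of: a user; an advertisement; or a recommendation.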
Owechko discloses wherein the set of entities comprises at least one of: a user; an advertisement; or a recommendation. (Owechko, 0088; Wherein the set of entities comprises at least one of: a user; an advertisement; and a recommendation of applicant maps to ‘Additional dimensions can be used to represent other classifier parameters, non-limiting examples of which include internal classifier parameters, object rotation angle, and time.’ of Owechko.  EC: A neural network is a classifier and as such can be seen as a recommendation tool, job listing or content item.) It would have been obvious to one having ordinary skill in the art, having the teachings of Gibiansky, Fleisman, Markert , Yanase and Owechko before him before the effective filing date of the claimed invention, to modify Gibiansky, Fleisman, Markert and Yanase to incorporate being able to add additional dimensions of Owechko. Given the advantage of obtaining a better and refined solution to a problem, one having ordinary skill in the art would have been motivated to make this obvious modification. 


Response to Arguments
4.	Applicant’s arguments filed on 10/31/2020 for claims 1-20 have been fully considered but are not persuasive.

5.	Applicant’s argument



In step S1, a first data-based partial model $f_{\text{first\_partial\_model}}(x)$ is provided based on hyperparameters and nodes, which is formed completely or partially from a first initially provided training data record.

In step S2, a second training data record $(x_i, y_i)$, $i = 1, \ldots, N$, is furthermore provided, where $x_i$ represents the p-dimensional measuring points and $y_i$ represents the scalar output values.

In step S3, the deviations $\tilde{y}_i$ between the model predictions $f_{\text{first\_partial\_model}}(x_i)$ (output values or function values) of the first data-based partial model and the measuring points $y_i$ at the observed measuring points of the second training data record are then ascertained:

$\tilde{y}_i = y_i - f_{\text{first\_partial\_model}}(x_i)$

In step S4, an additional, second data-based model $f_{\text{second\_partial\_model}}(x)$ is trained on the obtained deviations $\tilde{y}_i$, i.e., on the training data $(x_i, \tilde{y}_i)$, $i = 1, \ldots, N$. It is to be noted that the additional partial model is trained with the aid of an average value function which corresponds to a constant value of 0, thus following the zero function in the extrapolation range.

(Emphasis added.) While paragraph 55 of Markert mentions hyperparameters (specifically that a model is provided based on hyperparameters), none of the cited paragraphs teach or suggest updating hyperparameters.

In response to these remarks, the Advisory Action states, “The question is what is meant by ‘updating parameters?’” The Advisory Action then points to Figure 4 and states that Figure 4 generally discloses a learning process. Based on this and the fact that Markert discloses a series of steps disclosing learning behavior, the Advisory Action somehow concludes that Markert also discloses updating hyperparameters. There are multiple issues with the Advisory Action’s observations/suggestions. First, the corresponding text of Figure 4 does not mention “learning.” Such text is about one way to update a hyperparameter. Second, the fact that both the specification and Markert mention machine learning does not mean that Markert also discloses updating hyperparameters, much less updating hyperparameters in the way claimed. There are virtually hundreds of machine learning techniques described in the literature that do not involve updating hyperparameters at all. Third, Claim 1 recites one way to update a hyperparameter.

Examiner’s answer:
Per the specification, the term ‘hyperparameter’ has a number of different examples: ‘higher level of properties’ (0028), ‘regularization’ (0029), ‘convergence’ (0030), ‘clustering’ (0031), ‘model selection’ (0031), ‘decay’ (0031) and ‘threshold’ (0031).
In 0060 of Markert, ‘This concept may be generalized in a simple manner by allowing any number of additional partial models. For example, measuring points of an additional training data record, which, for example, were associated with a certain local effect via a classification or clustering method, may be modeled in an additional data-based partial model.’ Here ‘updating’ can be seen as using ‘additional training records’, and since a hyperparameter is associated with a clustering method, this clarifies the cited art.

6.	Applicant’s argument
What’s more, paragraph 56 of Markert (cited for allegedly disclosing the second updating step of Claim 1) fails to even mention hyperparameters. Therefore, that paragraph necessarily fails to disclose updating hyperparameters, much less doing so by adding a new dimension.

Examiner’s answer:
0055 connects hyperparameters to sub-models and the associated data. 0056 continues by disclosing an additional training record (updating). 0057 discloses the deviations (differences) between the outputs and a measuring position. This is a common description of training. 0058 uses the difference between the actual outcome and the desired outcome to modify a weight. 0060 links these models to clustering methods, which parallels the specification’s use of a hyperparameter.
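
For illustration only, steps S1 through S4 of Markert, as quoted in the applicant's argument above, can be sketched in Python. This is a minimal sketch under stated assumptions and is not Markert's implementation: a one-dimensional least-squares line stands in for each data-based partial model, and the names fit_line and fit_residual_model are hypothetical. It shows only the flow of the steps: a first model is fitted, its deviations on a second training data record are computed, and a second model is trained on those deviations.

```python
def fit_line(xs, ys):
    """Least-squares line through (xs, ys); returns a predictor function."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    den = sum((x - mean_x) ** 2 for x in xs)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = num / den if den else 0.0
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def fit_residual_model(first_model, xs2, ys2):
    """Steps S3 and S4: train a second model on the first model's deviations."""
    # S3: deviation = measured output minus first-model prediction
    deviations = [y - first_model(x) for x, y in zip(xs2, ys2)]
    # S4: second partial model trained on (x_i, deviation_i)
    second_model = fit_line(xs2, deviations)
    # Combined prediction is the sum of the two partial models
    return lambda x: first_model(x) + second_model(x)


# S1: first partial model from the first training data record
first = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
# S2: second training data record, offset from the first by a constant 0.5
combined = fit_residual_model(first, [0.0, 2.0, 4.0], [1.5, 5.5, 9.5])
```

In this sketch the second model captures only what the first model misses on the second record (here, a constant offset), and the first model itself is left unchanged.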

7.	Applicant’s argument
Furthermore, paragraph 56 of Markert fails to disclose adding a new dimension. Instead, it merely mentions that xi represents p-dimensional measuring points.

Examiner’s answer:
In light of the applicant’s arguments, the examiner cites Fleisman. Parameters and/or constraints reflect an additional value which is seen as a ‘dimension.’ 

8.	Applicant’s argument
Eberhart necessarily fails to disclose “wherein updating the updated set of hyperparameters with a new dimension for the new entity comprises... updating the new dimension with a random value” as recited in present Claim 1.

Examiner’s answer:
The examiner does not agree with the applicant but uses the reference Yanase as in claim 1.

9.	Applicant’s argument
(Emphasis added.) In rejecting Claim 15, the Final Office Action cites paragraph 44 of Shotten for allegedly disclosing “discounting contributions of the outputs to the performance metric based on a set of ages associated with the subset of the input data” as now recited in 

Examiner’s answer:
The examiner uses an established reference in light of the amended claims. Table 1. Local version, neighborhood=2. Median iterations required to meet a criterion of squared error per node < 0.02. Population=20 particles. There were no trials with iterations > 2000. (Eberhart, p41)


10.	Claims 1-20 are rejected.


Correspondence Information
11.	Any inquiry concerning this information or related to the subject disclosure should be directed to the Examiner Mr. Peter Coughlan, whose telephone number is (571) 272-5990 (Fax 571-273-5990).  The Examiner can be reached on Monday through Friday from 7:15 a.m. to 3:45 p.m.
	If attempts to reach the Examiner by telephone are unsuccessful, the Examiner’s supervisor Mr. Li Zhen can be reached at (571) 272-3768.  Any response to this office action should be mailed to:
	Commissioner of Patents and Trademarks, 
	Washington, D. C. 20231;
Hand delivered to:
	Receptionist, 
	Customer Service Window, 
	Randolph Building, 
	401 Dulany Street,
	Alexandria, Virginia 22313,
	(located on the first floor of the south side of the Randolph Building);
or faxed to:

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.







/Li B. Zhen/Supervisory Patent Examiner, Art Unit 2121