DETAILED ACTION
Notice of Pre-AIA or AIA Status
1.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
	
Status of the Application
2.	Claims 1-20 are pending in this application (16/588,913), as Applicant has filed a Request for Reconsideration under 37 CFR 1.111 on 03/10/2022, following the Non-Final Rejection office action dated 12/10/2021.    
	No claims have been amended. 
	(Please see page 8 of Applicant Arguments/Remarks, filed on 03/10/2022)
	Applicant's submissions have been entered.


Statements Regarding 112(f): 6th Paragraph
3. 	The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. - An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
4. 	The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
5. 	Claim 1 has been interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because it uses generic placeholders coupled with functional language without reciting sufficient structure to achieve the function. Furthermore, the generic placeholders are not preceded by a structural modifier.
6. 	Since the claim limitations invoke 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, claim 1 has been interpreted to cover the corresponding structure described in the specification that achieves the claimed function, and equivalents thereof.
7. 	A review of the specification shows that for the 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph limitations/features of independent claim 1: 	
	“one or more computing devices configured to”, 
	“the machine learning training cluster is configured to:”
	“the machine learning analysis system is configured to”,
the corresponding structure to perform the functions is the processors 3010A-3010N as disclosed in Applicant's Specification, paragraph [0107], referring to FIG. 21:  “one or more processors 3010A-3010N coupled to a system memory 3020 via an input/output (I/O) interface 3030”.
8. 	It should be noted that when the claim limitation does not use the phrase "means for" or "step for," examiners should determine whether the claim limitation uses a nonstructural term (a term that is simply a substitute for the term "means for").  Examiners will apply § 112(f) to a claim limitation that uses a nonstructural term associated with functional language, unless the nonstructural term is (1) preceded by a structural modifier, defined in the specification as a particular structure or known by one skilled in the art, that denotes the type of structural device (e.g., "filters"), or (2) modified by sufficient structure or material for achieving the claimed function. The following is a list of non-structural terms that may invoke § 112, sixth paragraph:  "mechanism for," "module for," "device for," "unit for," "component for," "element for," "member for," "apparatus for," "machine for," or "system for." This list is not exhaustive, and other non-structural terms may invoke § 112(f). 
9. 	If applicant wishes to provide further explanation or dispute the examiner's interpretation of the corresponding structure, applicant must identify the corresponding structure with reference to the specification by page and line number, and to the drawing, if any, by reference characters in response to this Office action. 
10. 	If applicant does not intend to have the claim limitations treated under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may amend the claim so that it will clearly not invoke 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, or present a sufficient showing that the claim recites sufficient structure, material, or acts for performing the claimed function to preclude application of 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. 
11. 	For more information, see MPEP § 2173 et seq. and Supplementary Examination Guidelines for Determining Compliance With 35 U.S.C. 112 and for Treatment of Related Issues in Patent Applications, 76 FR 7162, 7167 (Feb. 9, 2011). 

Claim Rejections - 35 USC § 103
12.	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made. 

13.	Claims 1-20 are rejected under AIA 35 U.S.C. 103 as being unpatentable over FAULHABER, JR. et al. (US 2019/0156247 A1, Pub. Date: May 23, 2019; Filed: Mar. 13, 2018; hereinafter FAULHABER) [cited by Applicant as prior art in the IDS filed on 04/29/2021], in view of Sobol et al. (US 2019/0209022 A1; Pub. Date: Jul. 11, 2019; Filed: Dec. 27, 2018; hereinafter Sobol).

Regarding claim 1, FAULHABER teaches: 
(Original) A system (See, e.g., FAULHABER, FIG. 1; par [0025]:  “…FIG. 1 is a block diagram illustrating an environment for dynamic accuracy-based experimentation and deployment of machine learning models in provider networks according to some embodiments. …”  Examiner Note (EN):  FAULHABER discloses: FIG. 1 is a block diagram illustrating an environment for dynamic accuracy-based experimentation and deployment of machine learning models.), comprising: 

one or more computing devices configured to implement a machine learning training cluster (See, e.g., FAULHABER, FIG. 1; par [0030]:  “…multiple ML models 118A-118N can be configured as part of a group 116 of models--e.g., multiple models trained to perform a same "type" of inference…” EN:  FAULHABER discloses: multiple ML [machine learning] models 118A-118N can be configured as part of a group [cluster] 116 of models.), wherein the machine learning training cluster is configured to: 

train a machine learning model (See, e.g., FAULHABER, FIG. 1; par [0030]:  “…multiple ML models 118A-118N can be configured as part of a group 116 of models--e.g., multiple models trained to perform a same "type" of inference…” EN:  FAULHABER discloses: multiple ML models trained to perform a same "type" of inference.); and
one or more computing devices configured to implement a machine learning analysis system (See, e.g., FAULHABER, FIG. 1; par [0031]:  “The machine learning service 140, which may execute the group 116 of ML models 118A-118N (e.g., within a model hosting system), in some embodiments includes a dynamic router 108 and an analytics engine 122.…” EN:  FAULHABER discloses: The machine learning service 140 includes an analytics engine 122.), wherein the machine learning analysis system is configured to:
detect one or more problems associated with the training of the machine learning model based at least in part on the analysis of the aggregated data (See, e.g., FAULHABER, FIG. 1; par [0034]:  “The data 136 may include, for example, the input data 134 (e.g., provided in, or identified by, a request 132), the individual inference results 142 generated by the ML models 118, etc. The analytics engine 122 can determine, using such data 136, the quality of the inferences of the ML model(s) 118. …”  Also see, e.g., FAULHABER, par [0068]: “…the dashboard could be interactive and allow a user to change how traffic is passed to models, add models to groups, pull models out of groups, etc., and can also have alarming and alerting (e.g., to indicate that a model is not performing well). …”  EN:  FAULHABER teaches: determine the quality of the inferences of the ML model(s) that may indicate that a model is not performing well.); and
generate one or more alarms describing the one or more problems associated with the training of the machine learning model (See, e.g., FAULHABER, par [0068]:  “…the dashboard could be interactive and allow a user to change how traffic is passed to models, add models to groups, pull models out of groups, etc., and can also have alarming and alerting (e.g., to indicate that a model is not performing well).” EN:  FAULHABER teaches: alarming and alerting (e.g., to indicate that a model is not performing well).).


FAULHABER does not appear to explicitly teach: 
collect data associated with training of the machine learning model on the one or more computing devices, wherein the data associated with the training of the machine learning model is collected using agent software on the one or more computing devices, and wherein the data associated with the training of the machine learning model comprises tensor-level numerical values; and 
aggregate the data associated with the training of the machine learning model;
perform an analysis of aggregated data associated with the training of the machine learning model, wherein the analysis of the aggregated data comprises evaluation of one or more rules;

However, Sobol (US 2019/0209022 A1), in an analogous art of training machine learning models, teaches:
collect data associated with training of the machine learning model on the one or more computing devices, wherein the data associated with the training of the machine learning model is collected using agent software on the one or more computing devices, and wherein the data associated with the training of the machine learning model comprises tensor-level numerical values (See, e.g., Sobol, FIG. 2F, par [0243]:  “Within the machine learning context,… terms related to the data being acquired, analyzed and reported include "instance", "label", "feature" and "feature vector".  An instance is an example or observation of the data being collected, and may be further defined with an attribute (or input attribute) that is a specific numerical value of that particular instance, while a label is the output, target or answer that the machine learning algorithm is attempting to solve, the feature is a numerical value that corresponds to an input or input variable in the form of the sensed parameters, whereas a feature vector is a multidimensional representation (that is to say, vector, array or tensor) of the various features that are used to represent the object, phenomenon or thing that is being measured by the sensors 121 or other data-gathering components of the wearable electronic device 100. …” (emphasis added)  Also see, e.g., Sobol, FIG. 2F: MEMORY 173B, par [0249]: “Such a model, as well as its corresponding algorithmic tools, may include configuring processor 173A to execute programmed software instructions by using a predefined set of machine code 173E such that the neural network which may receive data from the various sensors 121 in the form of various nodes 2100A, 2100B, 2100C . . . 2100N of the input layer 2100, where each of these input nodes 2100A, 2100B, 2100C . . . 2100N can store its corresponding input data value within a particular location of memory 173B.”  EN:  Sobol teaches: Within the machine learning context, data being acquired, analyzed and reported include "instance", "label", "feature" and "feature vector", wherein an instance is an observation of the data being collected, and further defined with an attribute that is a specific numerical value of that particular instance, whereas a feature vector is a multidimensional representation (that is to say, vector, array or tensor) of the various features that are used to represent the object, phenomenon or thing that is being measured by the sensors 121 or other data-gathering components of the wearable electronic device 100, and stored within a particular location of memory 173B by using a predefined set of machine code 173E.); and  

aggregate the data associated with the training of the machine learning model (See, e.g., Sobol, FIG. 2F, par [0164]:  “…the sensors 121 may act in conjunction with one another--as well as with instructions that are stored on a machine-readable medium such as memory 173B--to aggregate (or fuse) the acquired data in order to infer certain activities, conditions or circumstances. …”  EN:  Sobol teaches: aggregate (or fuse) the acquired data.).

perform an analysis of aggregated data associated with the training of the machine learning model, wherein the analysis of the aggregated data comprises evaluation of one or more rules (See, e.g., Sobol, FIG. 6, par [0292]:  “…the system 1 may be configured to analyze the significance of the data either with intra-patient or inter-patient baseline data 1700, including for algorithmic training where a data set may be stored in local or remote memory 173B that contains classified or labeled examples with known instances of location, movement or other useful biometric measures of individual activity.  This training data set 16010 may be input into one or more of the machine learning algorithms discussed herein such that once the algorithm is optimized through validation and testing of respective data sets 1620, 1630, a suitable classification rule for use in the ensuing model is established.  This allows presently-acquired data unique to the individual being monitored to be input into the machine learning model in order to determine whether the current (that is to say, real-time) activity from the individual indicates whether the risk of a particular medical condition is heightened.”  EN:  Sobol teaches: the system 1 configured to analyze the significance of the data stored in local or remote memory 173B and input into one or more of the machine learning algorithms such that once the algorithm is optimized through validation and testing of respective data sets 1620, 1630, a suitable classification rule for use in the ensuing model is established.);

It would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, to modify FAULHABER’s machine learning model training by incorporating the teachings of Sobol, which teaches “collect data associated with training of the machine learning model on the one or more computing devices, wherein the data associated with the training of the machine learning model is collected using agent software on the one or more computing devices, and wherein the data associated with the training of the machine learning model comprises tensor-level numerical values; and”, “aggregate the data associated with the training of the machine learning model;”, and “perform an analysis of aggregated data associated with the training of the machine learning model, wherein the analysis of the aggregated data comprises evaluation of one or more rules;”.  A person having ordinary skill in the art would have been motivated to make such a combination in order to improve FAULHABER so as to supplant conventional data acquisition components and associated computer systems (see Sobol, par [0008]).  FAULHABER and Sobol are analogous art, both directed generally to training machine learning models.   
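
Illustrative sketch (editorial; not drawn from FAULHABER, Sobol, or Applicant's Specification): the claim 1 limitations discussed above recite collecting tensor-level numerical values, aggregating them, evaluating one or more rules over the aggregated data, and generating alarms. The following minimal Python sketch shows one way such a sequence could look. All names (TensorRecord, aggregate, all_zero_rule, nan_rule, evaluate_rules) are hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class TensorRecord:
    """Hypothetical record emitted by agent software on one training host."""
    device: str
    step: int
    tensor_name: str
    values: List[float]


def aggregate(records: List[TensorRecord]) -> Dict[str, List[float]]:
    """Aggregate tensor-level values by tensor name across devices and steps."""
    grouped: Dict[str, List[float]] = {}
    for record in records:
        grouped.setdefault(record.tensor_name, []).extend(record.values)
    return grouped


# A rule inspects one aggregated tensor and returns an alarm message or None.
Rule = Callable[[str, List[float]], Optional[str]]


def all_zero_rule(name: str, values: List[float]) -> Optional[str]:
    """Flag a tensor whose aggregated values are all exactly zero."""
    if values and all(v == 0.0 for v in values):
        return f"ALARM: tensor '{name}' is all zeros"
    return None


def nan_rule(name: str, values: List[float]) -> Optional[str]:
    """Flag a tensor that contains at least one NaN value."""
    if any(math.isnan(v) for v in values):
        return f"ALARM: tensor '{name}' contains NaN"
    return None


def evaluate_rules(aggregated: Dict[str, List[float]], rules: List[Rule]) -> List[str]:
    """Evaluate every rule against every aggregated tensor and collect alarms."""
    alarms: List[str] = []
    for name, values in sorted(aggregated.items()):
        for rule in rules:
            message = rule(name, values)
            if message is not None:
                alarms.append(message)
    return alarms
```

In this sketch the rules are plain functions, so additional rules could be added without changing the aggregation or alarm steps.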


Regarding claim 2, FAULHABER and Sobol teach: 
(Original) The system as recited in claim 1 (please see claim 1 rejection), 
wherein the one or more problems associated with the training of the machine learning model comprise a discrepancy in data distributions across a plurality of batches or across two or more of the computing devices of the machine learning training cluster (See, e.g., FAULHABER, par [0022]:  “…it can be useful to have several different ML models serving a same purpose.  For example, different ML models can be constructed using different training data, preprocessing operations, training parameters, model objectives, post-processing operations, or anything else that affects a final model.  However, deciding which model from multiple models is "better" (and therefore should be used) is not a straightforward task.  In many cases, a consistently "best" model may not exist, and a best model may depend on dynamic factors, such as spiky traffic and/or data distribution drifts.  …”  EN:  FAULHABER teaches: different ML models can be constructed using different training data, preprocessing operations, training parameters, model objectives, post-processing operations, or anything else that affects a final model, such as spiky traffic and/or data distribution drifts.).


Regarding claim 3, FAULHABER and Sobol teach: 
(Original) The system as recited in claim 1 (please see claim 1 rejection), 

FAULHABER does not appear to explicitly teach: 
wherein the one or more problems associated with the training of the machine learning model comprise a vanishing gradient or an exploding gradient detected in tensor-level data.

However, Sobol (US 2019/0209022 A1), in the analogous art of training machine learning models, further teaches:

wherein the one or more problems associated with the training of the machine learning model comprise a vanishing gradient or an exploding gradient detected in tensor-level data (See, e.g., Sobol, FIG. 2F, par [0243]:  “Within the machine learning context,… terms related to the data being acquired, analyzed and reported include "instance", "label", "feature" and "feature vector".  An instance is an example or observation of the data being collected, and may be further defined with an attribute (or input attribute) that is a specific numerical value of that particular instance, while a label is the output, target or answer that the machine learning algorithm is attempting to solve, the feature is a numerical value that corresponds to an input or input variable in the form of the sensed parameters, whereas a feature vector is a multidimensional representation (that is to say, vector, array or tensor) of the various features that are used to represent the object, phenomenon or thing that is being measured by the sensors 121 or other data-gathering components of the wearable electronic device 100.. …”(emphasis added) Also see, e.g., Sobol, par [0353]: “While the general trajectory for all of these conditions is a downward worsening in functional status that ultimately ends in death, it is the presence of various acute, crisis or unstable stages along the trajectory that are of most interest to the present disclosure and the LEAP data being acquired by the wearable electronic device 100 for analysis on it or the system 1.”  EN:  Sobol teaches: a feature vector is a multidimensional representation (that is to say, vector, array or tensor) of the various features that are used to represent the object, phenomenon or thing that is being measured by the sensors 121, where the general trajectory for all of these conditions is a downward worsening in functional status.).

It would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, to modify the FAULHABER and Sobol combination for machine learning model training by incorporating the additional teachings of Sobol, which teaches “wherein the one or more problems associated with the training of the machine learning model comprise a vanishing gradient or an exploding gradient detected in tensor-level data.”  A person having ordinary skill in the art would have been motivated to make such a combination to improve the FAULHABER and Sobol combination because certain acute events may signal transitions between subsequent ones of the identifiable phases PH along the downward trajectory (see Sobol, par [0353]).  FAULHABER and Sobol are analogous art, both directed generally to training machine learning models.   
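
Illustrative sketch (editorial; not drawn from FAULHABER, Sobol, or Applicant's Specification): the claim 3 limitation refers to a vanishing gradient or an exploding gradient detected in tensor-level data. One common way to check for either condition is to compare the L2 norm of a gradient tensor against a lower and an upper threshold. The function names and the threshold values below are assumptions made only for this example.

```python
import math
from typing import List, Optional


def gradient_l2_norm(gradient: List[float]) -> float:
    """Return the L2 (Euclidean) norm of a flattened gradient tensor."""
    return math.sqrt(sum(g * g for g in gradient))


def classify_gradient(
    gradient: List[float],
    vanish_threshold: float = 1e-7,   # assumed lower bound, for illustration
    explode_threshold: float = 1e3,   # assumed upper bound, for illustration
) -> Optional[str]:
    """Label a gradient as 'vanishing', 'exploding', or None (healthy)."""
    norm = gradient_l2_norm(gradient)
    # A NaN or infinite norm is treated as exploding in this sketch.
    if math.isnan(norm) or math.isinf(norm) or norm > explode_threshold:
        return "exploding"
    if norm < vanish_threshold:
        return "vanishing"
    return None
```

In practice, suitable thresholds depend on the model and the optimizer, so the defaults shown here should be read as placeholders.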

Regarding claim 4, FAULHABER and Sobol teach: 
(Original) The system as recited in claim 1 (please see claim 1 rejection),
wherein the training of the machine learning model is discontinued based at least in part on the one or more problems associated with the training of the machine learning model (See, e.g., FAULHABER, FIG. 8; par [0081]:  “…The model training system 820 can automatically scale up and down based on the volume of training requests received from user devices 802 via frontend 829, thereby relieving the user from the burden of having to worry about over-utilization (e.g., acquiring too little computing resources and suffering performance issues) or under-utilization (e.g., acquiring more computing resources than necessary to train the machine learning models, and thus overpaying).”  EN:  FAULHABER teaches: The model training system 820 can automatically scale up and down based on the volume of training requests received from user devices 802 via frontend 829.). 


Regarding claim 5, FAULHABER teaches: 
(Original) A computer-implemented method (See, e.g., FAULHABER, FIG. 7; par [0069]:  “FIG. 7 is a flow diagram illustrating operations 1000 of a method for dynamic accuracy-based deployment of machine learning models according to some embodiments. …”  EN:  FAULHABER discloses: FIG. 7 is a flow diagram illustrating operations 1000 of a method for dynamic accuracy-based deployment of machine learning models.), comprising:
detecting, by the machine learning analysis system, one or more conditions associated with the training of the machine learning model based at least in part on the analysis (See, e.g., FAULHABER, FIG. 1; par [0034]:  “The data 136 may include, for example, the input data 134 (e.g., provided in, or identified by, a request 132), the individual inference results 142 generated by the ML models 118, etc. The analytics engine 122 can determine, using such data 136, the quality of the inferences of the ML model(s) 118. …”  Also see, e.g., FAULHABER, par [0068]: “…the dashboard could be interactive and allow a user to change how traffic is passed to models, add models to groups, pull models out of groups, etc., and can also have alarming and alerting (e.g., to indicate that a model is not performing well). …”  EN:  FAULHABER teaches: determine the quality of the inferences of the ML model(s) that may indicate that a model is not performing well.); and 
generating, by the machine learning analysis system, one or more alarms describing the one or more conditions associated with the training of the machine learning model (See, e.g., FAULHABER, par [0068]:  “…the dashboard could be interactive and allow a user to change how traffic is passed to models, add models to groups, pull models out of groups, etc., and can also have alarming and alerting (e.g., to indicate that a model is not performing well).” EN:  FAULHABER teaches: alarming and alerting (e.g., to indicate that a model is not performing well).).


FAULHABER does not appear to explicitly teach: 
receiving, by a machine learning analysis system, data associated with training of a machine learning model, wherein the data associated with the training of the machine learning model is collected by one or more computing devices of a machine learning training cluster;
performing, by the machine learning analysis system, an analysis of the data associated with the training of the machine learning model;

However, Sobol (US 2019/0209022 A1), in an analogous art of training machine learning models, teaches:
receiving, by a machine learning analysis system, data associated with training of a machine learning model, wherein the data associated with the training of the machine learning model is collected by one or more computing devices of a machine learning training cluster (See, e.g., Sobol, FIG. 2F, par [0243]:  “Within the machine learning context,… terms related to the data being acquired, analyzed and reported include "instance", "label", "feature" and "feature vector".  An instance is an example or observation of the data being collected, and may be further defined with an attribute (or input attribute) that is a specific numerical value of that particular instance, while a label is the output, target or answer that the machine learning algorithm is attempting to solve, the feature is a numerical value that corresponds to an input or input variable in the form of the sensed parameters, whereas a feature vector is a multidimensional representation (that is to say, vector, array or tensor) of the various features that are used to represent the object, phenomenon or thing that is being measured by the sensors 121 or other data-gathering components of the wearable electronic device 100. …”  Also see, e.g., Sobol, FIG. 2F: MEMORY 173B, par [0249]: “Such a model, as well as its corresponding algorithmic tools, may include configuring processor 173A to execute programmed software instructions by using a predefined set of machine code 173E such that the neural network which may receive data from the various sensors 121 in the form of various nodes 2100A, 2100B, 2100C . . . 2100N of the input layer 2100, where each of these input nodes 2100A, 2100B, 2100C . . . 2100N can store its corresponding input data value within a particular location of memory 173B.”  EN:  Sobol teaches: Within the machine learning context, data being acquired, analyzed and reported include "instance", "label", "feature" and "feature vector", wherein an instance is an observation of the data being collected, and further defined with an attribute that is a specific numerical value of that particular instance, whereas a feature vector is a multidimensional representation (that is to say, vector, array or tensor) of the various features that are used to represent the object, phenomenon or thing that is being measured by the sensors 121 or other data-gathering components of the wearable electronic device 100, and stored within a particular location of memory 173B by using a predefined set of machine code 173E.); and  

performing, by the machine learning analysis system, an analysis of the data associated with the training of the machine learning model (See, e.g., Sobol, FIG. 6, par [0292]:  “…the system 1 may be configured to analyze the significance of the data either with intra-patient or inter-patient baseline data 1700, including for algorithmic training where a data set may be stored in local or remote memory 173B that contains classified or labeled examples with known instances of location, movement or other useful biometric measures of individual activity.  This training data set 16010 may be input into one or more of the machine learning algorithms discussed herein such that once the algorithm is optimized through validation and testing of respective data sets 1620, 1630, a suitable classification rule for use in the ensuing model is established.  This allows presently-acquired data unique to the individual being monitored to be input into the machine learning model in order to determine whether the current (that is to say, real-time) activity from the individual indicates whether the risk of a particular medical condition is heightened.”  EN:  Sobol teaches: the system 1 configured to analyze the significance of the data stored in local or remote memory 173B and input into one or more of the machine learning algorithms such that once the algorithm is optimized through validation and testing of respective data sets 1620, 1630, a suitable classification rule for use in the ensuing model is established.);

It would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, to modify FAULHABER’s machine learning model training by incorporating the teachings of Sobol, which teaches “receiving, by a machine learning analysis system, data associated with training of a machine learning model, wherein the data associated with the training of the machine learning model is collected by one or more computing devices of a machine learning training cluster;”, and “performing, by the machine learning analysis system, an analysis of the data associated with the training of the machine learning model;”.  A person having ordinary skill in the art would have been motivated to make such a combination in order to improve FAULHABER so as to supplant conventional data acquisition components and associated computer systems (see Sobol, par [0008]).  FAULHABER and Sobol are analogous art, both directed generally to training machine learning models.   


Regarding claim 6, FAULHABER and Sobol teach: 
(Original) The method as recited in claim 5 (please see claim 5 rejection), wherein the data associated with the training of the machine learning model comprises tensor data output (See, e.g., Sobol, FIG. 2F, par [0243]:  “Within the machine learning context,… terms related to the data being acquired, analyzed and reported include "instance", "label", "feature" and "feature vector".  An instance is an example or observation of the data being collected, and may be further defined with an attribute (or input attribute) that is a specific numerical value of that particular instance, while a label is the output, target or answer that the machine learning algorithm is attempting to solve, the feature is a numerical value that corresponds to an input or input variable in the form of the sensed parameters, whereas a feature vector is a multidimensional representation (that is to say, vector, array or tensor) of the various features that are used to represent the object, phenomenon or thing that is being measured by the sensors 121 or other data-gathering components of the wearable electronic device 100. …” (emphasis added)  EN:  Sobol teaches: a feature vector is a multidimensional representation (that is to say, vector, array or tensor) of the various features.) from one or more graphics processing units (GPUs) of the machine learning training cluster (See, e.g., FAULHABER, FIG. 8, par [0077]:  “…a user device 802 can provide a training request to the frontend 829 that includes a container image (or multiple container images, or an identifier of one or multiple locations where container images are stored), an indicator of input data (e.g., an address or location of input data), one or more hyperparameter values (e.g., values indicating how the algorithm will operate, how many algorithms to run in parallel, how many clusters into which to separate data, etc.), and/or information describing the computing machine on which to train a machine learning model (e.g., a graphical processing unit (GPU) instance type, a central processing unit (CPU) instance type, an amount of memory to allocate, a type of virtual machine instance to use for training, etc.).” EN:  FAULHABER teaches: information describing the computing machine on which to train a machine learning model (e.g., a graphical processing unit (GPU)).), and wherein the data associated with the training of the machine learning model is aggregated prior to the analysis of the data associated with the training of the machine learning model (See, e.g., Sobol, FIG. 2F, par [0164]:  “…the sensors 121 may act in conjunction with one another--as well as with instructions that are stored on a machine-readable medium such as memory 173B--to aggregate (or fuse) the acquired data in order to infer certain activities, conditions or circumstances. …”  EN:  Sobol teaches: aggregate (or fuse) the acquired data.).


Regarding claim 7, FAULHABER and Sobol teach: 
(Original) The method as recited in claim 5 (please see claim 5 rejection),
wherein the one or more conditions associated with the training of the machine learning model comprise a discrepancy in data distributions across a plurality of batches (See, e.g., FAULHABER, par [0022]:  “…it can be useful to have several different ML models serving a same purpose.  For example, different ML models can be constructed using different training data, preprocessing operations, training parameters, model objectives, post-processing operations, or anything else that affects a final model.  However, deciding which model from multiple models is "better" (and therefore should be used) is not a straightforward task.  In many cases, a consistently "best" model may not exist, and a best model may depend on dynamic factors, such as spiky traffic and/or data distribution drifts.  …”  EN:  FAULHABER teaches: different ML models can be constructed using different training data, preprocessing operations, training parameters, model objectives, post-processing operations, or anything else that affects a final model, such as spiky traffic and/or data distribution drifts.).


Regarding claim 8, FAULHABER and Sobol teach: 
(Original) The method as recited in claim 5 (please see claim 5 rejection),
wherein the one or more conditions associated with the training of the machine learning model comprise a discrepancy in data distributions across two or more of the computing devices of the machine learning training cluster (See, e.g., FAULHABER, par [0022]:  “…it can be useful to have several different ML models serving a same purpose.  For example, different ML models can be constructed using different training data, preprocessing operations, training parameters, model objectives, post-processing operations, or anything else that affects a final model.  However, deciding which model from multiple models is "better" (and therefore should be used) is not a straightforward task.  In many cases, a consistently "best" model may not exist, and a best model may depend on dynamic factors, such as spiky traffic and/or data distribution drifts.  …”  EN:  FAULHABER teaches: different ML models can be constructed using different training data, preprocessing operations, training parameters, model objectives, post-processing operations, or anything else that affects a final model, such as spiky traffic and/or data distribution drifts.).


Regarding claim 9, FAULHABER and Sobol teach: 
(Original) The method as recited in claim 5 (please see claim 5 rejection),
FAULHABER does not appear to explicitly teach: 
wherein the one or more conditions associated with the training of the machine learning model comprise a vanishing gradient or an exploding gradient detected in tensor-level data.

However, Sobol (US 2019/0209022 A1), in the analogous art of training machine learning models, further teaches:

wherein the one or more conditions associated with the training of the machine learning model comprise a vanishing gradient or an exploding gradient detected in tensor-level data (See, e.g., Sobol, FIG. 2F, par [0243]:  “Within the machine learning context,… terms related to the data being acquired, analyzed and reported include "instance", "label", "feature" and "feature vector".  An instance is an example or observation of the data being collected, and may be further defined with an attribute (or input attribute) that is a specific numerical value of that particular instance, while a label is the output, target or answer that the machine learning algorithm is attempting to solve, the feature is a numerical value that corresponds to an input or input variable in the form of the sensed parameters, whereas a feature vector is a multidimensional representation (that is to say, vector, array or tensor) of the various features that are used to represent the object, phenomenon or thing that is being measured by the sensors 121 or other data-gathering components of the wearable electronic device 100.. …”(emphasis added) Also see, e.g., Sobol, par [0353]: “While the general trajectory for all of these conditions is a downward worsening in functional status that ultimately ends in death, it is the presence of various acute, crisis or unstable stages along the trajectory that are of most interest to the present disclosure and the LEAP data being acquired by the wearable electronic device 100 for analysis on it or the system 1.”  EN:  Sobol teaches: a feature vector is a multidimensional representation (that is to say, vector, array or tensor) of the various features that are used to represent the object, phenomenon or thing that is being measured by the sensors 121, where the general trajectory for all of these conditions is a downward worsening in functional status.).


It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to modify the combination of FAULHABER and Sobol for machine learning model training by incorporating the additional teachings of Sobol, which teaches “wherein the one or more conditions associated with the training of the machine learning model comprise a vanishing gradient or an exploding gradient detected in tensor-level data.”  A person having ordinary skill in the art would have been motivated toward such a combination to improve FAULHABER and Sobol because certain acute events may signal transitions between subsequent ones of the identifiable phases PH along the downward trajectory (see Sobol, par [0353]).  FAULHABER and Sobol are analogous arts directed generally to training machine learning models.



Regarding claim 10, FAULHABER and Sobol teach: 
(Original) The method as recited in claim 5 (please see claim 5 rejection),
FAULHABER does not appear to explicitly teach: 
wherein the one or more conditions associated with the training of the machine learning model comprise an overflow or an underflow detected in tensor-level data.

However, Sobol (US 2019/0209022 A1), in the analogous art of training machine learning models, further teaches:

wherein the one or more conditions associated with the training of the machine learning model comprise an overflow or an underflow detected in tensor-level data (See, e.g., Sobol, FIG. 2F, par [0243]:  “Within the machine learning context,… terms related to the data being acquired, analyzed and reported include "instance", "label", "feature" and "feature vector".  An instance is an example or observation of the data being collected, and may be further defined with an attribute (or input attribute) that is a specific numerical value of that particular instance, while a label is the output, target or answer that the machine learning algorithm is attempting to solve, the feature is a numerical value that corresponds to an input or input variable in the form of the sensed parameters, whereas a feature vector is a multidimensional representation (that is to say, vector, array or tensor) of the various features that are used to represent the object, phenomenon or thing that is being measured by the sensors 121 or other data-gathering components of the wearable electronic device 100.. …”(emphasis added) Also see, e.g., Sobol, par [0353]: “While the general trajectory for all of these conditions is a downward worsening in functional status that ultimately ends in death, it is the presence of various acute, crisis or unstable stages along the trajectory that are of most interest to the present disclosure and the LEAP data being acquired by the wearable electronic device 100 for analysis on it or the system 1.”  EN:  Sobol teaches: a feature vector is a multidimensional representation (that is to say, vector, array or tensor) of the various features that are used to represent the object, phenomenon or thing that is being measured by the sensors 121, where the general trajectory for all of these conditions is a downward worsening in functional status.).


It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to modify the combination of FAULHABER and Sobol for machine learning model training by incorporating the additional teachings of Sobol, which teaches “wherein the one or more conditions associated with the training of the machine learning model comprise an overflow or an underflow detected in tensor-level data.”  A person having ordinary skill in the art would have been motivated toward such a combination to improve FAULHABER and Sobol because certain acute events may signal transitions between subsequent ones of the identifiable phases PH along the downward trajectory (see Sobol, par [0353]).  FAULHABER and Sobol are analogous arts directed generally to training machine learning models.


Regarding claim 11, FAULHABER and Sobol teach: 
(Original) The method as recited in claim 5 (please see claim 5 rejection),
wherein the data associated with the training of the machine learning model comprises data describing resource utilization of the machine learning training cluster, and wherein the one or more conditions associated  with the training of the machine learning model represent a violation of one or more resource utilization thresholds (See, e.g., FAULHABER, FIG. 8, par [0096]:  “…the model metrics can indicate that the machine learning model is performing poorly (e.g., has an error rate above a threshold value, has a statistical distribution that is not an expected or desired distribution (e.g., not a binomial distribution, a Poisson distribution, a geometric distribution, a normal distribution, Gaussian distribution, etc.), has an execution latency above a threshold value, has a confidence level below a threshold value)) and/or is performing progressively worse (e.g., the quality metric continues to worsen over time). …”  EN:  FAULHABER teaches: the model metrics can indicate that the machine learning model is performing poorly, e.g., has an error rate above a threshold value.).

Regarding claim 12, FAULHABER and Sobol teach: 
(Original) The method as recited in claim 5 (please see claim 5 rejection),
wherein the data associated with the training of the machine learning model comprises data describing resource utilization of the machine learning training cluster, and wherein the one or more conditions associated with the training of the machine learning model represent one or more performance bottlenecks identified in one or more resources of the machine learning training cluster (See, e.g., FAULHABER, FIG. 8, par [0096]:  “…the model metrics can indicate that the machine learning model is performing poorly (e.g., has an error rate above a threshold value, has a statistical distribution that is not an expected or desired distribution (e.g., not a binomial distribution, a Poisson distribution, a geometric distribution, a normal distribution, Gaussian distribution, etc.), has an execution latency above a threshold value, has a confidence level below a threshold value)) and/or is performing progressively worse (e.g., the quality metric continues to worsen over time). …”  EN:  FAULHABER teaches: the model metrics can indicate that the machine learning model is performing poorly (e.g., has an error rate above a threshold value, has a statistical distribution that is not an expected or desired distribution (e.g., not a binomial distribution, a Poisson distribution, a geometric distribution, a normal distribution, Gaussian distribution, etc.), has an execution latency above a threshold value, has a confidence level below a threshold value)).). 

Regarding claim 13, FAULHABER and Sobol teach: 
(Original) The method as recited in claim 5 (please see claim 5 rejection),
further comprising: 

discontinuing the training of the machine learning model based at least in part on the one or more conditions associated with the training of the machine learning model (See, e.g., FAULHABER, FIG. 8; par [0081]:  “…The model training system 820 can automatically scale up and down based on the volume of training requests received from user devices 802 via frontend 829, thereby relieving the user from the burden of having to worry about over-utilization (e.g., acquiring too little computing resources and suffering performance issues) or under-utilization (e.g., acquiring more computing resources than necessary to train the machine learning models, and thus overpaying).”  EN:  FAULHABER teaches: The model training system 820 can automatically scale up and down based on the volume of training requests received from user devices 802 via frontend 829.); 

modifying a configuration of the training of the machine learning model (See, e.g., FAULHABER, FIG. 8; par [0096]:  “…transmit a request to the model training system 820 to modify the machine learning model being trained (e.g., transmit a modification request).. …”  EN:  FAULHABER teaches: modify the machine learning model being trained.); and 

restarting the training of the machine learning model according to the configuration (See, e.g., FAULHABER, FIG. 8; par [0096]:  “…execute the code 836 stored in the new ML training container 830 to restart the machine learning model training process. …”  EN:  FAULHABER teaches: execute the code 836 stored in the new ML training container 830 to restart the machine learning model training process.).

Regarding claim 14, FAULHABER and Sobol teach: 
(Original) The method as recited in claim 5 (please see claim 5 rejection),
further comprising: 
discontinuing use of the machine learning training cluster based at least in part on the one or more conditions associated with the training of the machine learning model (See, e.g., FAULHABER, FIG. 8; par [0081]:  “…The model training system 820 can automatically scale up and down based on the volume of training requests received from user devices 802 via frontend 829, thereby relieving the user from the burden of having to worry about over-utilization (e.g., acquiring too little computing resources and suffering performance issues) or under-utilization (e.g., acquiring more computing resources than necessary to train the machine learning models, and thus overpaying).”  EN:  FAULHABER teaches: The model training system 820 can automatically scale up and down based on the volume of training requests received from user devices 802 via frontend 829.).

Claims 15-19:
	Media Claims 15-19 are similar to rejected method Claims 5, 7, 8, 9 and 11, respectively.  
As such, Claims 15-19 are rejected under AIA  35 U.S.C. 103 as being unpatentable over FAULHABER and Sobol for similar rationale.

Regarding claim 20, FAULHABER and Sobol teach: 
(Original) The one or more non-transitory computer-readable storage media as recited in claim 15 (please see claim 15 rejection), further comprising additional program instructions that, when executed on or across the one or more processors, perform:
discontinuing the training of the machine learning model based at least in part on the one or more problems associated with the training of the machine learning model (See, e.g., FAULHABER, FIG. 8; par [0081]:  “…The model training system 820 can automatically scale up and down based on the volume of training requests received from user devices 802 via frontend 829, thereby relieving the user from the burden of having to worry about over-utilization (e.g., acquiring too little computing resources and suffering performance issues) or under-utilization (e.g., acquiring more computing resources than necessary to train the machine learning models, and thus overpaying).”  EN:  FAULHABER discloses: The model training system 820 can automatically scale up and down based on the volume of training requests received from user devices 802 via frontend 829.).




			Response to Arguments/Remarks
14.	The Applicant Arguments/Remarks filed on 03/10/2022 under 37 CFR 1.111 have been fully considered by Examiner, but they are not persuasive to overcome the rejections.  
35 U.S.C. § 112(f) Interpretation 
Applicant makes the following remarks, on page 8:
“The 35 U.S.C. § 112(f) Interpretation Is Improper
The Office Action alleges that claim 1 invokes 35 U.S.C. § 112(f) “because it uses/they use generic placeholders coupled with functional language without reciting sufficient structure to achieve the function” and that “the generic placeholders are not preceded by a structural modifier.” See, Office Action, p. 3. However, nowhere do any of the claims recite either of the terms “means” or “step” or any other nonce term, and thus the analysis under 35 U.S.C. § 112(f) should end there. Further, contrary to Office Action’s allegation, claim 1 recites “one or more computing devices configured to implement a machine learning training cluster” and “one or more computing devices configured to implement a machine learning analysis system.” The claimed “one or more computing devices” are not generic placeholders or nonce terms. Instead, each of the claimed “one or more computing devices” provides a specific structural components used to implement the claimed systems. Thus, claim 1 should [not] invoke 35 U.S.C. § 112(f) and the Applicant respectfully disagrees with the characterizations of claim 1 alleged under 35 U.S.C. § 112(f) in the Office Action. Accordingly, Applicant respectfully requests that 35 U.S.C. § 112(f) invocation of claim 1 be withdrawn.”
Examiner’s response: 
Examiner respectfully disagrees. Examiner does not find Applicant’s above arguments to be persuasive. 
As such, the 35 U.S.C. § 112(f) interpretation of Claim 1 is maintained.

Claim Rejections under 35 U.S.C. § 103:
Applicant argues, on pages 8-10:
“The Claims Are Patentable Over The Cited References
The Office Action rejected claims 1-20 under 35 U.S.C. § 103 as allegedly being unpatentable over Faulhaber, JR. et al (US Publication No. 20190156247) (hereinafter “Faulhaber”) in view of Sobol et al (US Publication No. 20190209022) (hereinafter “Sobol”). The rejection of claims 1-20 is respectfully traversed.
For ease of reference, claim 1 (and similarly claims 5 and 15) recites:
A system, comprising: 
one or more computing devices configured to implement a machine learning training cluster, wherein the machine learning training cluster is configured to: 
train a machine learning model; and 
collect data associated with training of the machine learning model on the one or more computing devices, wherein the data associated with the training of the machine learning model is collected using agent software on the one or more computing devices, and wherein the data associated with the training of the machine learning model comprises tensor-level numerical values; and 
one or more computing devices configured to implement a machine learning analysis system, wherein the machine learning analysis system is configured to: 
aggregate the data associated with the training of the machine learning model; 
perform an analysis of aggregated data associated with the training of the machine learning model, wherein the analysis of the aggregated data comprises evaluation of one or more rules; 
detect one or more problems associated with the training of the machine learning model based at least in part on the analysis of the aggregated data; and 
generate one or more alarms describing the one or more problems associated with the training of the machine learning model. (Emphasis added.) 
The Office Action concedes that Faulhaber fails to describe the above-emphasized features. See, Office Action, p. 7. The Office Action alleges that Sobol cures the deficiencies of Faulhaber. Id. However, Sobol fails to describe or suggest a machine learning analysis system configured to “perform an analysis of aggregated data associated with the training of the machine learning model, wherein the analysis of the aggregated data comprises evaluation of one or more rules” as recited in claim 1 (and similarly claims 5 and 15). 
Sobol describes that a “training data set 16010 may be input into one or more of the machine learning algorithms discussed herein such that once the algorithm is optimized through validation and testing of respective data sets 1620, 1630, a suitable classification rule for use in the ensuing model is established.” See, Sobol, para. 0292. Thus, at best, Sobol describes establishing a classification rule for a model based on validated and tested data sets. However, establishing a classification rule based on analyzed data sets, as described by Sobol, fails to describe evaluating one or more rules as a component of performing analysis on data sets. Nowhere does Sobol describe that the establishment of a classification rule is a component of performing analysis on data sets. Accordingly, Sobol fails to describe or suggest a machine learning analysis system configured to “perform an analysis of aggregated data associated with the training of the machine learning model, wherein the analysis of the aggregated data comprises evaluation of one or more rules” as recited in claim 1 (and similarly claims 5 and 15).
For at least these reasons, claim 1 (and similarly claims 5 and 15) are not rendered obvious by the cited references. Applicant respectfully requests that the § 103 rejection of claims 1, 5 and 15 be withdrawn and that claims 1, 5, and 15 be allowed.
Claims 2-4 depend on claim 1. Claims 6-14 depend on claim 5. Claims 16-20 depend on claim 15. Thus, claims 2-4, 6-14, and 16-20 are also allowable over cited references at least because of their respective dependencies on claims 1, 5, and 15. Applicant respectfully requests that the § 103 rejection of claims 2-4, 6-14, and 16-20 be allowed.”
Examiner’s response: 
Examiner respectfully disagrees.  Examiner maintains that FAULHABER (US 2019/0156247 A1), in view of Sobol (US 2019/0209022 A1), teaches all limitations of independent Claim 1 (and similarly Claims 5 and 15), as evidenced by the citations and rationale presented in rejecting Claim 1 under AIA  35 U.S.C. 103, hereinabove, in this office action.  
Specifically, regarding Applicant’s above arguments with respect to the Claim 1 limitation “perform an analysis of aggregated data associated with the training of the machine learning model, wherein the analysis of the aggregated data comprises evaluation of one or more rules;” taught by Sobol, it is respectfully noted that:  Sobol (e.g., FIG. 6, par [0292]: “…the system 1 may be configured to analyze the significance of the data either with intra-patient or inter-patient baseline data 1700, including for algorithmic training where a data set may be stored in local or remote memory 173B that contains classified or labeled examples with known instances of location, movement or other useful biometric measures of individual activity.  This training data set 16010 may be input into one or more of the machine learning algorithms discussed herein such that once the algorithm is optimized through validation and testing of respective data sets 1620, 1630, a suitable classification rule for use in the ensuing model is established.  This allows presently-acquired data unique to the individual being monitored to be input into the machine learning model in order to determine whether the current (that is to say, real-time) activity from the individual indicates whether the risk of a particular medical condition is heightened.”) teaches: the system 1 configured to analyze the significance of the data stored in local or remote memory 173B [perform an analysis of aggregated data]  and input into one or more of the machine learning algorithms [associated with the training of the machine learning model] such that once the algorithm is optimized through validation and testing of respective data sets 1620, 1630, a suitable classification rule for use in the ensuing model is established [wherein the analysis of the aggregated data comprises evaluation of one or more rules].  Therefore, respectfully, Examiner does not find Applicant’s above arguments to be persuasive.
As such, Claim 1 is rejected, under AIA  35 U.S.C. 103, as being unpatentable over FAULHABER, in view of Sobol. 
Independent Claims 5 and 15 are also rejected, under AIA  35 U.S.C. 103, as being unpatentable over FAULHABER, in view of Sobol, based on the citations and rationale presented in rejecting these Claims under AIA  35 U.S.C. 103, hereinabove, in this office action.
Claims 2-4, 6-14, and 16-20, which depend on rejected independent Claims 1, 5, or 15, inherit the deficiencies of their respective parent Claims.  Examiner further maintains that FAULHABER, in view of Sobol, teaches all additional limitations of these dependent Claims as well, as evidenced by the citations and rationale presented in rejecting these Claims under AIA  35 U.S.C. 103, hereinabove, in this office action.
As such, Claims 2-4, 6-14, and 16-20 are rejected, under AIA  35 U.S.C. 103, as being unpatentable over FAULHABER, in view of Sobol.



Conclusion
15.	Claims 1-20 are rejected.
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MOHAMMED N HUDA whose telephone number is (571)270-7171. The examiner can normally be reached Reg. Hrs M-F: 9am-5:30pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Wei Zhen, can be reached at 571-272-3708. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/MOHAMMED N HUDA/           Examiner, Art Unit 2191       

/WEI Y ZHEN/           Supervisory Patent Examiner, Art Unit 2191