DETAILED ACTION
This action is responsive to Applicant’s amendment in application number 16/020,335, filed June 27, 2018.


Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendments
The amendment filed July 7, 2022 has been entered. Examiner acknowledges receipt of Amendments to Application 16/020,335, which include: Amendments to the Claims, Amendments to the Drawings, Amendments to the Specification, and Remarks containing Applicant’s amendments.
Regarding Applicant’s Remarks and Amendments to the Claims, Examiner acknowledges Applicant has amended Claims 1, 4, 7, 10, 11, 14, and 17. Claims 1-19 remain pending in the application. 
Regarding Applicant’s Remarks and Amendments to the Claims, Examiner acknowledges Applicant’s amendments have resolved the objections identified in Claims 1, 4, 7, 10-11, 14, and 17, and therefore the respective claim objections previously set forth in the Non-Final Office Action mailed April 12, 2022 are withdrawn. However, Examiner notes that Applicant’s amendments have introduced new claim objections, with the new claim objections identified in the relevant section below.
Regarding Applicant’s Remarks and Amendments to the Claims, Examiner acknowledges Applicant’s amendments have resolved the indefiniteness issues identified in Claims 4, 7, 14, and 17 (and inherited in their dependent claims), and therefore the respective 112(b) rejections previously set forth in the Non-Final Office Action mailed April 12, 2022 for Claims 4-7 and 14-17 are withdrawn. 
Regarding Applicant’s Remarks and Amendments to the Drawings, Examiner acknowledges Applicant’s latest submission for Figure 4 has resolved the identified drawing objection, and therefore the respective drawing objection previously set forth in the Non-Final Office Action mailed April 12, 2022 is withdrawn.
Regarding Applicant’s Remarks and Amendments to the Specification, Examiner acknowledges Applicant’s corrections to paragraphs [0013]-[0014] and [0015] have resolved the corresponding identified specification objections. Examiner also acknowledges Applicant’s latest submission for Figure 3A (removing element 340B) has resolved the identified specification objection in paragraph [0053] (where there is no description for element 340B in Figure 3A). Therefore, the respective specification objections previously set forth in the Non-Final Office Action mailed April 12, 2022 are withdrawn.

Response to Arguments
Examiner acknowledges receipt of Arguments to Application 16/020,335, which include: Remarks containing Applicant’s arguments.
Regarding Applicant’s Remarks for Claims 1-19 under 35 U.S.C. 101, Examiner has considered Applicant’s arguments and finds them persuasive. Applicant indicates that the invention represents an improvement in a technological field because it does not rely on a set of established or predetermined rules to identify anomalies and failures; instead, it analyzes the received sensor data using unsupervised machine learning to generate these rules, which are further used to select a machine behavioral model and build an optimal model to identify those anomalies, thereby improving the accuracy of failure detection. Accordingly, the rejection under 35 U.S.C. 101 previously set forth in the Non-Final Office Action mailed April 12, 2022 is withdrawn.
Regarding Applicant’s Remarks for Claims 1, 3-6, 8-11, 13-16, and 18-19 under 35 U.S.C. 103 as being unpatentable over Shumpert, James Michael, U.S. PGPUB 2016/0342903, filed 5/21/2015 [hereafter referred to as Shumpert] in view of Yu, Jianbo, Health Condition Monitoring of Machines Based on Hidden Markov Model and Contribution Analysis, IEEE August 2012 [hereafter referred to as Yu]; for Claims 2 and 12 under 35 U.S.C. 103 as being unpatentable over Shumpert in view of Yu as applied to Claims 1 and 11, in further view of Kouadri et al., An adaptive threshold estimation scheme for abrupt changes detection algorithm in a cement rotary kiln, Elsevier B.V. 2013 [hereafter referred to as Kouadri]; and for Claims 7 and 17 under 35 U.S.C. 103 as being unpatentable over Shumpert in view of Yu as applied to Claims 5 and 15, in further view of Mehta et al., U.S. PGPUB 2015/0149134, published 5/28/2015 [hereafter referred to as Mehta], Examiner has considered Applicant’s arguments and finds them not persuasive. Examiner also notes that Applicant has amended certain claims in a manner that necessitates further examination and re-evaluation of the amended and related original claims. The updated claim mappings according to Applicant’s amended claims are provided in the sections indicated below.
Regarding Applicant’s Remarks:
“… Even assuming that the model selected in Shumpert is comparable to the claimed machine behavioral models (which applicant does not necessarily agree with or admit), applicant respectfully submits that Shumpert is at best silent on how the models are selected such that Shumpert does not teach selecting its models based on normal behavior patterns of a machine as claimed.
Paragraph 34 of Shumpert teaches selecting a model as being appropriate for a machine, but is silent as to how such appropriateness is determined. Shumpert and, in particular, paragraph 34, does not suggest that the model is selected based on machine behavior at all, let alone based on normal behavior patterns for the machine. The only other instances in which the concept of selecting is mentioned in Shumpert occur in claims 1 and 28, which do not further explain how the selection is performed.
The office action appears to explain this discrepancy by suggesting that "the appropriateness is interpreted as being that the sensor data producing the data instances are related with the rest of the clustered normal and anomalous data instances representing the model." Office Action, p. 19. However, the office action cites no evidence supporting this interpretation. Shumpert does not teach or suggest this, and the office action does not explain why this interpretation would or should be used. Moreover, applicant disagrees that this interpretation is inherent to Shumpert. Appropriateness as mentioned in Shumpert could be defined using other criteria such as, but not limited to, based on known machine types and a known type of a given machine, which would not require analysis of machine behavior patterns.
Applicant respectfully requests that the examiner articulate their reasoning for this interpretation. In the absence of such articulation, applicant submits that the rejection is improper because it does not clearly articulate the reasons as to why the claimed invention is obvious.
Moreover, even assuming that this interpretation of "appropriateness" is applicable (which applicant disagrees with for the reasons noted above), sensor data being related to data instances representing a model still does not necessarily read on using normal behavior patterns of that sensor data, let alone normal behavior patterns output via unsupervised machine learning as claimed.”
Examiner has considered this argument and finds the argument to be not persuasive. Examiner notes that Applicant’s above argument contains several sub-arguments, each of which will be addressed in the following paragraphs.
Regarding Applicant’s sub-argument focusing on the use of the term “appropriate” in the Shumpert reference (in particular, Shumpert [0034]), Examiner finds this sub-argument to be not persuasive. Applicant argues that the term “appropriate” in the Shumpert reference means something other than selecting a model based on received input sensor data and the respective instances produced from it, where the received input sensor data further identifies behaviors or patterns within those instances that can be used to identify suitability or compatibility with a corresponding existing model. The Merriam-Webster dictionary defines “appropriate” as especially suitable or compatible. A person having ordinary skill in the art would understand that selecting a model involves identifying and determining a model that meets the criteria for performing the relevant task, where meeting these criteria is a measure of suitability or compatibility (i.e., “appropriateness”) for the relevant task. Shumpert [0034] teaches “… a system for detecting anomalies in data dynamically received from a plurality of sensors associated with one or more machines is provided … Processing resources include at least one processor and a memory … for each instance of data received via the one or more interfaces, to at least: classify, using a model retrieved from the model store, the respective instance as being one of a normal instance type and an anomalous instance type, the retrieved model being selected from the model store, as being appropriate for the machine that produced the data in the respective instance if such a model exists in the model store …”.
Hence, in the context of this paragraph, Shumpert teaches that the model is selected based on its suitability or compatibility with the machine that produced the data in the respective instance, where the data in each respective instance is classified as a normal or anomalous instance. Applicant asserts that Shumpert’s measure of suitability or compatibility for model selection could be based on other criteria, such as known machine types; however, this assertion overlooks the fact that Shumpert [0034] specifically states that the “appropriateness” (i.e., suitability or compatibility) of the model is determined with respect to the machine that produced the data in the respective instance. Under its broadest reasonable interpretation in light of Applicant’s specification paragraph [0052], Applicant’s term “machine behavioral model” broadly indicates a set of data and inferences representing a set of behaviors, patterns, or characteristics associated with a machine. As indicated in the Non-Final Office Action mailed April 12, 2022, Shumpert [0034] indicates that the respective instances associated with a machine are either normal instance types or anomalous instance types (Shumpert [0034]: “… for each instance of data received via the one or more interfaces, to at least: classify, using a model retrieved from the model store, the respective instance as being one of a normal instance type and an anomalous instance type …”).
Shumpert teaches that the initial instances of the received input sensor data are assumed to be normal, with further determination of these normal or anomalous instance types being the result of further processing of the received input sensor data using unsupervised machine learning (Shumpert [0064]: “… learning begins immediately with live data … as the first instances come in, they are used to build a model of the characteristics of the data … unsupervised learning techniques are used to build the initial model, and initial instances may be assumed normal. Using the unsupervised prediction techniques below, some instances eventually will be flagged as potentially anomalous …”). Shumpert [0053] additionally clarifies each instance of received input sensor data represents an instance of machine behavior (Shumpert [0053]: “Data from multiple sensors in the same time frame for the same machine may be thought of as forming an instance of machine behavior.”). Shumpert further teaches the unsupervised prediction techniques are used to initially generate a model, such that the model includes information representing normal or anomalous machine behavior and is stored as part of the model in the model store (Shumpert [0055]-[0056]: “The class of the data that is received is predicted (step S404) using the shared learning and prediction component 506. That is, the shared learning and prediction component 506 predicts the class (normal or anomalous) of data instances as they arrive … Current instances are fed to the shared model and classified as either normal or one of several anomaly types …”; [0075]: “The shared learning and prediction component 506 receives the processed sensor data of FIG. 7. It separates out the anomalous instances and uses the normal data and unsupervised machine learning techniques to train multiple models, one for each machine being monitored …”; [0097]: “… it continues to learn indefinitely as new instances are received. 
It builds the clusters incrementally, using a distance threshold parameter to decide when a new cluster is warranted … the number of clusters k does not have to declared upfront; rather, that may be learned over time and may continue to be dynamic as the system encounters new machine behavior …”; [0116]: “… Unsupervised clustering occurs for each new instance, as long as the instance is within the limits of existing clusters …”; Figure 12, [0130]: “To see the shared model in action … FIG.12 shows the state of the shared model after 5 data instances have been received. The centroid of cluster $c_1$ has evolved from its initial position at instance 1 and now represents the center of the cluster. This cluster is classified as normal, and therefore so are all 5 of these instances … instance 4 was some distance away from instances 1-3 but not so far away as to represent a potential anomaly …”; [0058]: “Because the shared model is kept in the model store 508, predictions can be made with very low latency.”; [0066]: “As noted above, the shared model is stored in the model store 508 …”; and Figures 12-14, [0133]: “The primary outputs of the shared learning and prediction component 506 are the trained models, which are sent to the model store 508 …”). Accordingly, the respective normal and anomalous instance types associated with a machine are part of the machine behavioral model stored in the model store, and can therefore be retrieved by the system according to the teachings in Shumpert [0034] to identify and select the corresponding model from the model store that is associated with that machine. In other words, the selection is based on the model being appropriate for the machine that produced the data in the respective instance. Hence, Applicant’s sub-argument is not persuasive, and the prior art rejection is maintained.
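For illustration only, the following Python sketch models the retrieval-and-classification flow attributed to Shumpert [0034] and [0080] above: a model store keyed by machine, from which the model associated with the machine that produced an instance is retrieved and then used to classify that instance by its distance to the nearest cluster centroid. The dictionary-based store, the Euclidean distance, the `threshold` values, the machine identifiers, and all names (`ClusterModel`, `select_model`, `classify`) are hypothetical assumptions, not Shumpert's implementation.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


@dataclass
class ClusterModel:
    """Hypothetical per-machine model: labeled cluster centroids and a distance threshold."""
    centroids: list[tuple[float, ...]] = field(default_factory=list)
    labels: list[str] = field(default_factory=list)  # "normal" or an anomaly type
    threshold: float = 1.0


def select_model(store: dict[str, ClusterModel], machine_id: str) -> ClusterModel | None:
    # Retrieve the model associated with the machine that produced the instance,
    # if such a model exists in the store (cf. Shumpert [0034]).
    return store.get(machine_id)


def classify(model: ClusterModel, instance: tuple[float, ...]) -> str:
    # Assumption: with no clusters yet, the instance is treated as normal
    # (cf. Shumpert [0064], "initial instances may be assumed normal").
    if not model.centroids:
        return "normal"
    # Predict the class of the nearest cluster centroid (cf. Shumpert [0080]).
    distances = [math.dist(instance, c) for c in model.centroids]
    nearest = min(range(len(distances)), key=distances.__getitem__)
    # An instance far from every existing cluster is a potential new anomaly.
    if distances[nearest] > model.threshold:
        return "potential anomaly"
    return model.labels[nearest]


# Hypothetical machine identifiers and model contents.
store = {
    "pump-1": ClusterModel(centroids=[(0.0, 0.0)], labels=["normal"], threshold=1.0),
    "fan-7": ClusterModel(centroids=[(5.0, 5.0)], labels=["normal"], threshold=0.5),
}
model = select_model(store, "pump-1")
print(classify(model, (0.3, 0.4)))  # about 0.5 from the centroid: normal
print(classify(model, (3.0, 4.0)))  # 5.0 from the centroid: potential anomaly
```

In this sketch the model is chosen solely by the machine that produced the data, and the learned clusters inside that model carry the normal and anomalous behavior information used for classification.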
Regarding Applicant’s sub-argument that the usage of “appropriate” in the Shumpert reference is not within the scope of Applicant’s recited limitation in the independent claim (“… selecting, based on the output of at least one normal behavior pattern, at least one machine behavioral model …”), and that the Non-Final Office Action mailed April 12, 2022 does not cite evidence as to what is “appropriate” as taught in the Shumpert reference, Examiner also finds this sub-argument to be not persuasive. Examiner points out that the term “appropriate” is used only in the Shumpert reference and is not part of Applicant’s recited claim limitation. Examiner further reminds Applicant that, under MPEP 2111, pending claims must be given their broadest reasonable interpretation consistent with the specification during patent examination, and that an Examiner must construe claim terms in the broadest reasonable manner during prosecution in an effort to establish a clear record of what Applicant intends to claim. Under its broadest reasonable interpretation in light of Applicant’s specification paragraph [0052], the term “machine behavioral model” broadly indicates a set of data and inferences representing a set of behaviors, patterns, or characteristics associated with a machine. Furthermore, under its broadest reasonable interpretation, the term “selecting” broadly indicates identifying and choosing an item from a group; hence, Applicant’s recited claim limitation broadly indicates identifying and choosing at least one machine behavioral model, where the selection criterion is a produced output that identifies at least one normal behavior pattern.
Examiner additionally notes that Applicant’s recited limitation neither specifies nor restricts the term “selecting” to a specific set of criteria or steps for selecting a model, other than indicating that a normal behavior pattern is used to identify (and hence select) the machine behavioral model. As established above, a person having ordinary skill in the art would select a model only if it meets certain criteria for performing the relevant task, where meeting these criteria is a measure of “appropriateness,” or suitability/compatibility of the model with respect to the relevant task. Hence, it would be obvious to a person having ordinary skill in the art that, given a choice among different models stored in a model store, one would select an appropriate model suited to the relevant task (rather than an inappropriate or random model). As established in response to Applicant’s sub-argument in the preceding paragraph, Shumpert [0034] specifically states that the “appropriateness” (i.e., suitability or compatibility) criterion for the model is based on the machine that produced the data in the respective instance, and the additional citations in Shumpert identified above provide evidence that such a machine is represented by a model containing the normal and abnormal machine behavior instances of data learned through unsupervised machine learning techniques. Thus, a model contains data that is also used to determine the model’s suitability or compatibility, and this data therefore represents the criteria used to identify and select the model from a plurality of stored models. Indeed, the Shumpert reference is consistent in scope with Applicant’s own specification [0062], where Applicant indicates “… based on the determined normal behavior patterns, at least one machine behavioral model is selected. The selected machine behavioral models may be selected from among a plurality of predetermined machine behavioral models stored in, e.g., at least one database.”. Hence, the Shumpert reference falls within the scope of Applicant’s recited claim limitation, and therefore Applicant’s sub-argument is not persuasive, and the prior art rejection is maintained.
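Purely to illustrate the broadest reasonable interpretation discussed above (identifying and choosing, from a plurality of stored models, the model that corresponds to an observed normal behavior pattern), the following hypothetical Python sketch treats a normal behavior pattern as a vector of values and selects the stored model whose reference normal behavior pattern is nearest. The vector representation, the nearest-match criterion, the model names, and the names `MODEL_DB` and `select_behavioral_model` are assumptions made for illustration; neither Shumpert nor Applicant's specification is represented as disclosing this particular code.

```python
from __future__ import annotations

import math

# Hypothetical database of predetermined machine behavioral models, each keyed
# by an illustrative name and described by a reference normal behavior pattern
# (here, a vector of per-sensor means; an assumption for this sketch).
MODEL_DB: dict[str, tuple[float, ...]] = {
    "steady-state-motor": (1.0, 0.2, 50.0),
    "cyclic-compressor": (3.5, 1.1, 20.0),
    "idle-conveyor": (0.1, 0.0, 5.0),
}


def select_behavioral_model(normal_pattern: tuple[float, ...],
                            model_db: dict[str, tuple[float, ...]]) -> str:
    """Return the stored model whose reference pattern is nearest (Euclidean
    distance) to the normal behavior pattern output by unsupervised learning."""
    return min(model_db, key=lambda name: math.dist(normal_pattern, model_db[name]))


print(select_behavioral_model((3.2, 1.0, 22.0), MODEL_DB))  # cyclic-compressor
```

Any other suitability measure could be substituted for the distance; the point illustrated is only that the selection is driven by the normal behavior pattern rather than by a separate learning step.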
Regarding Applicant’s sub-argument that the Shumpert reference does not teach identifying normal behavior patterns output via unsupervised machine learning, Examiner also finds this sub-argument to be not persuasive. Examiner notes that Applicant’s sub-argument is directed to the recited limitation in the independent claim: “… analyzing, via unsupervised machine learning, a plurality of sensory inputs associated with a machine, wherein the unsupervised machine learning outputs at least one normal behavior pattern of the machine”. Under its broadest reasonable interpretation, this limitation broadly recites identifying a normal behavior pattern as an output of unsupervised machine learning based on a plurality of sensory inputs associated with a machine. As established in response to Applicant’s sub-arguments in the preceding paragraphs and indicated in the Non-Final Office Action mailed April 12, 2022, Shumpert teaches that the initial instances of the received input sensor data are assumed to be normal, with further determination of these normal or anomalous instance types being the result of further processing of the received input sensor data using unsupervised machine learning (Shumpert [0034]: “… a system for detecting anomalies in data dynamically received from a plurality of sensors associated with one or more machines is provided. … the processing resources being configured, for each instance of data received via the one or more interfaces, to at least: classify, using a model retrieved from the model store, the respective instance as being one of a normal instance type and an anomalous instance type … Each model in the model store is implemented using a k-means cluster algorithm …”; [0064]: “… learning begins immediately with live data … as the first instances come in, they are used to build a model of the characteristics of the data … unsupervised learning techniques are used to build the initial model, and initial instances may be assumed normal.
Using the unsupervised prediction techniques below, some instances eventually will be flagged as potentially anomalous …”; [0074]: “… individual sensor readings have been grouped by timestamp and machine to capture the operational state of that machine at that time …”; [0080]: “… the shared model itself may be implemented using a modified k-means clustering algorithm, where the current instance is predicted to be the class of its nearest cluster (in terms of a multivariate distance measure to the centroid of the cluster) … The current instance is predicted to be a new potential anomaly if it is nowhere near any of the existing clusters …”). As indicated earlier, the shared learning and prediction component performs a classification process to initially generate a model using unsupervised prediction techniques. This classification process uses a modified k-means clustering algorithm to identify and group machine behavior instances into new and existing clusters according to a distance threshold parameter, with the eventual result over time being groups of machine behavior instances identified as normal or abnormal behavior. Using unsupervised machine learning techniques such as a modified k-means clustering algorithm to identify groups of machine behavior instances associated with normal behavior corresponds to a process that identifies normal behavior patterns output by an unsupervised machine learning process (Shumpert [0055]-[0056]: “… That is, the shared learning and prediction component 506 predicts the class (normal or anomalous) of data instances as they arrive … Current instances are fed to the shared model and classified as either normal or one of several anomaly types …”; [0075]: “The shared learning and prediction component 506 receives the processed sensor data of FIG. 7.
It separates out the anomalous instances and uses the normal data and unsupervised machine learning techniques to train multiple models, one for each machine being monitored …”; [0097]: “… it continues to learn indefinitely as new instances are received. It builds the clusters incrementally, using a distance threshold parameter to decide when a new cluster is warranted … the number of clusters k does not have to declared upfront; rather, that may be learned over time and may continue to be dynamic as the system encounters new machine behavior …”; and Figure 12, [0130]: “… FIG.12 shows the state of the shared model after 5 data instances have been received. The centroid of cluster $c_1$ has evolved from its initial position at instance 1 and now represents the center of the cluster. This cluster is classified as normal, and therefore so are all 5 of these instances … instance 4 was some distance away from instances 1-3 but not so far away as to represent a potential anomaly …”). Hence, Applicant’s sub-argument is not persuasive, and the prior art rejection is maintained.
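As an illustrative aid, the incremental, threshold-based clustering described in the Shumpert passages quoted above (clusters built one instance at a time, a new cluster created only when an instance lies beyond a distance threshold, and centroids that evolve as instances join them) can be sketched as follows. This is a minimal sketch assuming Euclidean distance, a running-mean centroid update, and a fixed `threshold`; it is not Shumpert's modified k-means algorithm itself, and the function name and sample data are hypothetical.

```python
from __future__ import annotations

import math


def incremental_cluster(instances: list[tuple[float, ...]], threshold: float):
    """Cluster instances one at a time. Each instance joins its nearest cluster
    if that centroid lies within `threshold`; otherwise it seeds a new cluster,
    so the number of clusters is learned rather than declared upfront.
    Returns (centroids, counts, assignments)."""
    centroids: list[list[float]] = []
    counts: list[int] = []
    assignments: list[int] = []
    for x in instances:
        # Find the nearest existing centroid (none exist for the first instance).
        k, d = -1, math.inf
        for i, c in enumerate(centroids):
            di = math.dist(x, c)
            if di < d:
                k, d = i, di
        if d > threshold:
            # Too far from every cluster: start a new cluster at this instance.
            centroids.append(list(x))
            counts.append(1)
            assignments.append(len(centroids) - 1)
        else:
            # Move the nearest centroid toward the instance (running mean).
            counts[k] += 1
            centroids[k] = [c + (xi - c) / counts[k] for c, xi in zip(centroids[k], x)]
            assignments.append(k)
    return centroids, counts, assignments


data = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2), (0.6, 0.6), (0.1, 0.1), (5.0, 5.0)]
centroids, counts, assignments = incremental_cluster(data, threshold=1.0)
print(counts, assignments)  # [5, 1] [0, 0, 0, 0, 0, 1]
```

In this run the fourth instance, (0.6, 0.6), lies farther from the first cluster's centroid than the earlier instances but still within the threshold, so it joins that cluster and shifts its centroid, loosely paralleling the description of instance 4 in Shumpert [0130]; the last instance, (5.0, 5.0), starts a second cluster.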
Regarding Applicant’s Remarks:
“The office action points to various portions of Shumpert for teaching use of unsupervised machine learning. In particular, the office action cites paragraph 64 of Shumpert as teaching "unsupervised learning techniques are used to build the initial model." However, using unsupervised learning to build a model is not the same as using unsupervised learning to select which model to use. Further, using the output of the model to classify an instance as being either normal or anomalous as mentioned in paragraph 34 of Shumpert is also different from using the output of the model to select a model to use. After diligent review, applicant could not identify any portion of Shumpert that suggests using outputs of machine learning to select machine behavior models, let alone machine behavior patterns output via unsupervised machine learning. The office action does not explain this discrepancy.”
Examiner has considered this argument and finds the argument to be not persuasive. Examiner points out that Applicant’s recited claim limitations in the independent claim (“… analyzing, via unsupervised machine learning, a plurality of sensory inputs associated with a machine, wherein the unsupervised machine learning outputs at least one normal behavior pattern of the machine; … selecting, based on the output of at least one normal behavior pattern, at least one machine behavioral model”) broadly recite identifying a normal behavior pattern as an output of unsupervised machine learning based on a plurality of sensory inputs associated with a machine, and using this normal behavior pattern (i.e., the output of the unsupervised machine learning analysis) to identify and choose at least one machine behavioral model. The claims do not recite using unsupervised machine learning to select a machine behavioral model, as alleged by Applicant. Applicant’s arguments appear to ask the Examiner to disregard Applicant’s own limitations requiring that the unsupervised machine learning output at least one normal behavior pattern of the machine and that this normal behavior pattern be used to select the machine behavioral model. Applicant’s specification is consistent with the Examiner’s above interpretation, as Applicant describes that it is the output of the unsupervised machine learning analysis (i.e., the output associated with a normal behavior pattern) that is used to select the machine behavioral model ([0059]: “At S520, the sensory inputs are analyzed to determine at least one normal behavior pattern. The analysis includes, but is not limited to, unsupervised machine learning using the preprocessed sensory inputs.
The outputs of the unsupervised machine learning process include the at least one normal behavior pattern.” and [0062]: “At S530, based on the determined normal behavior patterns, at least one machine behavioral model is selected. The selected machine behavioral models may be selected from among a plurality of predetermined machine behavioral models stored in, e.g., at least one database.”). Hence, Applicant’s argument is inconsistent with Applicant’s own specification and is not persuasive for that reason. Additionally, as established in response to Applicant’s sub-arguments in the preceding paragraphs, Shumpert teaches that the outputs of the model are the normal or abnormal machine behavior instances, where the normal machine behavior instances are used to select the corresponding machine behavioral model. Hence, Applicant’s argument is not persuasive, and the prior art rejection is maintained.
Regarding Applicant’s Remarks:
“Yu does not teach the missing features above. Accordingly, the references do not, either individually or in combination, teach all of the claim features. As to the alleged motivation for combining the references, any such motivation is irrelevant at least because the references do not teach all of the claim features. Thus, even assuming that a person having ordinary skill in the art would be motivated to combine the references, such a combination would not result in all of the claim features. Accordingly, the claimed invention would not be obvious to a person having ordinary skill in the art.”
Examiner has considered this argument and finds the argument to be not persuasive. Examiner notes that Applicant alleges that certain previously entered features or limitations were omitted from the rejection presented in the Examiner’s Non-Final Office Action mailed April 12, 2022, referring to these omitted features/limitations as “missing features.” However, none of Applicant’s earlier arguments labeled any previously entered feature or limitation as “missing.” Examiner therefore assumes that Applicant uses the term “missing features” as a blanket term encompassing the subject matter of Applicant’s earlier arguments, which Examiner has already addressed in the above paragraphs. Examiner also notes that Applicant further asserts that, because Yu does not teach the recited claim limitations that were identified as taught by the Shumpert reference, the combination of the Shumpert and Yu references fails. However, Applicant’s assertion that “any such motivation is irrelevant at least because the references do not teach all of the claim features” is not persuasive, as the Non-Final Office Action mailed April 12, 2022 presents the case of obviousness through the combination of the Shumpert and Yu references. Examiner further points to MPEP 2145(III), which provides guidance on the test for obviousness: “The test for obviousness is not whether the features of a secondary reference may be bodily incorporated into the structure of the primary reference …. Rather, the test is what the combined teachings of those references would have suggested to those of ordinary skill in the art.”, with further guidance indicating that "[I]t is not necessary that the inventions of the references be physically combinable to render obvious the invention under review." and that "Combining the teachings of references does not involve an ability to combine their specific structures."
MPEP 2141(II)(C) further states that “A person of ordinary skill in the art is also a person of ordinary creativity, not an automaton.”, and that “[I]n many cases a person of ordinary skill will be able to fit the teachings of multiple patents together like pieces of a puzzle.”, such that Office personnel may also take into account “the inferences and creative steps that a person of ordinary skill in the art would employ.”. As indicated in the Non-Final Office Action mailed April 12, 2022, the Yu reference is relied upon to teach the limitation “… generating … an optimal machine behavioral model …”. As indicated earlier, under its broadest reasonable interpretation in light of Applicant’s specification paragraph [0052], the term “machine behavioral model” broadly indicates a set of data and inferences representing a set of behaviors, patterns, or characteristics associated with a machine. Yu teaches an HMM modeling procedure in which effective features are extracted from healthy sensor data using a dynamic principal component analysis (DPCA) method and used as input to the HMM, and the Baum-Welch EM algorithm is used to generate HMMs from this extracted feature data set by estimating and adjusting a model parameter set λ={A,B,π} to maximize the probability P(O, λ) of observing the extracted features. Each model parameter set (containing the state transition probability distribution A, the observation probability distribution B, and the initial state probability distribution π) corresponds to a model, and each of these model parameter sets includes B, which defines the input observations/features associated with the machine representing normal machine behavior (i.e., healthy sensor data).
Yu further teaches that each estimated model λ produced by an EM algorithm is evaluated based on a Bayesian Information Criterion metric, such that the lowest metric value associated with a particular model parameter set is chosen as the best model, where the identification of a best HMM-based model from a plurality of λ models generated by the EM algorithm corresponds to the generation of an optimal machine behavioral model (Yu p.2200 Section I. Introduction; p.2203 Section IV.A. HMM and HMM-Based Bearing Health Monitoring Model 1st-2nd paragraphs; p.2205 col.2 Section IV.D. Application Procedure, Part 1: Off-line modeling Step 3: “… Start off HMM modeling with the feature data set extracted by DPCA. We first use Baum-Welch’s expectation maximization (EM) algorithm to estimate the model parameter set λ … of HMMs that optimizes the likelihood of the training set and then to select an HMM by using BIC criterion (discussed in Section IV-A).”; p.2203 col.2 4th paragraph: “… a complete specification of an HMM requires specifications of model parameters λ={A,B,π}, observation symbols, and the probability distribution P(O, λ) …”; p.2203 col.2 algorithm 2) Baum-Welch Algorithm (Section IV.A. Theoretical Background of HMM); p.2204 col.1 3rd paragraph: “… a more general and appealing alternative testing procedure is to use the Bayesian information criterion (BIC) [32]. For a model λ, BIC is defined as follows: BIC(λ) = -2LL(λ) + p(λ)log(n) … The model having the lowest value is chosen as the best model.”; p.2205 Section IV.D. Application Procedure Part 1: Off-line modeling; and p.2206 col.1 Section V.A Bearing Health Degradation Monitoring 1st paragraph: “The healthy data set (i.e., the first one-third of the whole life) from bearings 1 and 2 is used to construct HMMs … HMMs with three and two hidden states are enough to model the given data set of bearings 1 and 2 …”). 
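The BIC-based selection quoted from Yu above, BIC(λ) = -2LL(λ) + p(λ)log(n) with the lowest-scoring model chosen as the best model, can be illustrated with the following minimal Python sketch. The candidate log-likelihoods, sample size, and feature count are invented for illustration and do not come from Yu; the natural logarithm is assumed, and the parameter count assumes Gaussian emissions with diagonal covariance, which is an assumption of this sketch rather than Yu's stated configuration. In practice, LL(λ) for each candidate would come from Baum-Welch training of that HMM.

```python
import math


def bic(log_likelihood: float, n_params: int, n_samples: int) -> float:
    """BIC(lambda) = -2*LL(lambda) + p(lambda)*log(n); lower is better."""
    return -2.0 * log_likelihood + n_params * math.log(n_samples)


def hmm_param_count(n_states: int, n_features: int) -> int:
    """Free parameters of an HMM with diagonal-covariance Gaussian emissions
    (an assumption): initial probabilities pi, transition matrix A, and one
    mean and one variance per state and feature for B."""
    initial = n_states - 1
    transition = n_states * (n_states - 1)
    emission = 2 * n_states * n_features
    return initial + transition + emission


# Hypothetical training results: number of hidden states -> log-likelihood LL.
candidates = {2: -1510.0, 3: -1440.0, 4: -1432.0}
n_samples, n_features = 500, 3

scores = {
    n: bic(ll, hmm_param_count(n, n_features), n_samples)
    for n, ll in candidates.items()
}
best = min(scores, key=scores.get)
print(best)  # 3
```

Here the four-state candidate has the highest likelihood, but its additional parameters raise its BIC above that of the three-state candidate, so the three-state model is selected; this reflects how the complexity penalty in BIC guards against overfitting.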
The motivation to combine is taught in Yu, where the HMM modeling procedure is shown to efficiently characterize multimodal distributions of the healthy data from complicated data signals sampled from complicated working conditions. Furthermore, identifying the most effective features used for the HMM modeling also simplifies the model and prevents data overfitting, as well as providing improved sensitivity in detecting changes in sensor data (e.g., vibration signals) that would indicate machine degradation, such that these benefits result in improved efficiency and robustness in a system implementing this procedure for machine degradation monitoring and analysis (Yu p.2204 col.1 3rd paragraph: “… The reduction of data dimension by DPCA alleviates these difficulties due to reduction of the number of parameters to be determined in HMM. Moreover, the use of DPCA minimizes the information loss resulting from dimension reduction because DPCA aims at extracting the most important information from the given data. Thus, DPCA will be used to extract PCs as the input features to reduce the model complexity and training time cost of HMM …”; and pp.2206-2208 Section V.A. Bearing Health Degradation Monitoring 3rd paragraph: “… it can be observed that the MD charts do not trigger false alarms when bearings are in healthy states, which shows that the warning scheme is effective for reducing false alarms. … the proposed model effectively detected the slight degradation and gave the early alarm. For bearing 1, HMM with DPCA detects the earlier slight degradation (at time point 1760) than HMM with PCA (at time point 1780). … DPCA improves the sensitivity of HMM for detection of slight health degradation.”). Hence, given the evidence provided above, the Yu reference does indeed teach the limitation under the broadest reasonable interpretation of the recited claim, and as such, Applicant’s argument is not persuasive, and the prior art rejection is maintained.
Regarding Applicant’s Remarks:
“Claims 2 and 12 stand rejected under 35 USC 103 as allegedly being unpatentable over Shumpert and Yu in view of Kouadri (An adaptive threshold estimation scheme for abrupt changes detection algorithm in a cement rotary kiln). The rejection is traversed. 
Claims 2 and 12 depend from claims 1 and 11, respectively. For the reasons noted above, claims 1 and 11 are allowable over Shumpert and Yu. Accordingly, claims 2 and 12 are allowable at least by virtue of their respective dependencies from allowable base claims. Kouadri does not teach the missing features noted above. …”
Examiner has considered this argument and finds it not persuasive. In response to Applicant's arguments against the references individually, one cannot show nonobviousness by attacking references individually where the rejections are based on combinations of references. See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981); In re Merck & Co., 800 F.2d 1091, 231 USPQ 375 (Fed. Cir. 1986). Examiner further notes that Applicant alleges that certain previously entered features or limitations were omitted from the rejection presented in the Non-Final Office Action mailed April 12, 2022, referring to these omitted features/limitations as “missing features”. However, Examiner points out that none of Applicant’s earlier arguments labeled any previously entered feature or limitation as “missing”. Examiner also points out that the response to Applicant’s arguments addressed all recited limitations that were previously entered. Examiner therefore presumes that Applicant uses the term “missing features” as a blanket term encompassing Applicant’s earlier arguments, which Examiner has already addressed in the above paragraphs. Examiner further notes that Applicant provides no additional argument beyond asserting that the Kouadri reference does not teach the limitations of independent Claim 1 (which are also present in independent Claims 10 and 11), and Examiner has already shown that those limitations in the independent claims are taught by the Shumpert and Yu references. As indicated in the Non-Final Office Action mailed April 12, 2022, the Kouadri reference is used to teach the respective limitation from dependent claims 2 and 12 (“… generating, based on the analysis of the plurality of sensory inputs associated with the machine, at least one adaptive threshold for the at least one normal behavior pattern”). 
Examiner points to the same Non-Final Office Action for the claim mapping details for these dependent claim limitations, as well as the relevant 103 prior art section provided below. Hence, given the reasons provided above, Applicant’s argument is not persuasive, and the prior art rejection is maintained.
Regarding Applicant’s Remarks:
“Claims 7 and 11 stand rejected under 35 USC 103 as allegedly being unpatentable over Shumpert and Yu in view of Mehta (US 2015/0149134). The rejection is traversed. 
Claims 7 and 17 depend from claims 1 and 11, respectively. For the reasons noted above, claims 1 and 11 are allowable over Shumpert and Yu. Accordingly, claims 7 and 17 are allowable at least by virtue of their respective dependencies from allowable base claims. Mehta does not teach the missing features noted above. …”
Examiner has considered this argument and finds it not persuasive. In response to Applicant's arguments against the references individually, one cannot show nonobviousness by attacking references individually where the rejections are based on combinations of references. See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981); In re Merck & Co., 800 F.2d 1091, 231 USPQ 375 (Fed. Cir. 1986). Examiner further notes that Applicant alleges that certain previously entered features or limitations were omitted from the rejection presented in the Non-Final Office Action mailed April 12, 2022, referring to these omitted features/limitations as “missing features”. However, as indicated earlier, Examiner points out that none of Applicant’s earlier arguments labeled any previously entered feature or limitation as “missing”. Examiner also points out that the response to Applicant’s arguments addressed all recited limitations that were previously entered. Examiner therefore presumes that Applicant uses the term “missing features” as a blanket term encompassing Applicant’s earlier arguments, which Examiner has already addressed in the above paragraphs. Examiner further notes that Applicant provides no additional argument beyond asserting that the Mehta reference does not teach the limitations of independent Claim 1 (which are also present in independent Claims 10 and 11), and Examiner has already shown that those limitations in the independent claims are taught by the Shumpert and Yu references. As indicated in the Non-Final Office Action mailed April 12, 2022, the Mehta reference is used to teach the respective limitations from dependent claims 7 and 17. 
One of those limitations (“… wherein the clustered at least two machine behavioral models includes each determined representative model”) exhibited a 112(b) indefiniteness issue because the term “at least one machine behavioral model” in independent claim 1 requires only the identification and selection of a single machine behavioral model; thus, for the purposes of examination, this limitation in dependent claim 7 was interpreted as “… wherein the clustered at least one machine behavioral model includes each determined representative model”. Examiner notes that Applicant has since amended the claims by adding a new limitation in dependent claim 4 that resolves the 112(b) indefiniteness issue by reciting that the “at least one machine behavioral model” includes “a plurality of machine behavioral models”. However, Examiner points out that this amended scope does not change the application of the Mehta reference, since Mehta also teaches the grouping of models represented by metadata and associated analytics and parameters collected in a system. Mehta further teaches an example system, such as a fan component, being built and modeled through a series of subsystems, and shows an actual system representing this fan component, such as a jet engine propulsion system, which contains similar components such as an inlet flow and outlet flow, and respective compressors, turbines, and fuel injection components, where each of these components contains sensors and thus can be modeled and monitored as a different machine behavioral model. 
Hence, this fan component system corresponds to a group of at least two machine behavioral models, with each of these subsystem models being represented by the corresponding metadata and associated analytics and parameters, and associated with a jet propulsion system (containing corresponding compressor, turbine, and fuel injection components), resulting in each of these subsystem models including each determined representative model (i.e., the corresponding components in a jet propulsion system) (Mehta Figure 8, [0036]: “FIG. 8 shows an example machine health management system for generating models such as model 804 based on batch analytics 802, historical data 801, and metadata 800. For example, the metadata 800 may describe machines in a system and relationships between these machines. In the example, the metadata describes 12 subsystems, 24 parameters that are measured in those subsystems, and 8 dimensions to the measured parameters. The metadata 800 may include one or more generic definitions, one or more sub-definitions, and one or more relationships between definitions. The metadata 800 provides an organization, schema, or context in which the historical data may be analyzed … a model 804 is built for a fan component based on metadata 800 that describes which sensors relate to the fan component and optionally how these sensors relate to the fan component …”; and [0038]: “FIG. 9 shows an example system or unit with several components and several sensors that may measure operation of these components. The example system of FIG.9 is just one example system that may be monitored and diagnosed by the machine health management system of FIG. 8. As shown, the system is a jet engine propulsion system that has an inlet flow via an inlet. Compressors compress airflow to the combustion chamber where combustion is facilitated by a fuel injection component. Turbines operate due to the pressure from the combustion chamber, and byproduct escapes via the core exit flow.”). 
The motivation to combine the Shumpert, Yu, and Mehta references is provided in Mehta, since identifying and grouping models according to similar characteristics allows sharing of similar characteristics between models of the same group, as well as learning behaviors from each model that operate under certain operating states, thus making a system that performs this grouping more computationally efficient and storage efficient as only the representative characteristics and behaviors are stored and learned for each group of models (Mehta [0060]-[0061]). Hence, given the evidence provided above, the Mehta reference still teaches the amended limitation as recited, and as such, Applicant’s argument is not persuasive, and the prior art rejection is maintained.
As noted above, Applicant’s amended claims necessitate further examination and re-evaluation of the amended and related original claims. The updated claim mappings according to Applicant’s amended claims are provided in the relevant sections indicated below.

Claim Objections
Claims 4 and 14 are objected to because of the following informalities:
Claims 4 and 14: The term “clustering at least two of” in the limitation “… clustering at least two of the selected plurality of machine behavioral models” should be corrected to “… clustering at least two machine behavioral models from the selected plurality of machine behavioral models” to make clear that it is two of the machine behavioral models (from the selected plurality of machine behavioral models) that are being associated with a cluster/group. Appropriate correction is required.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1, 3-6, 8-11, 13-16, and 18-19 are rejected under 35 U.S.C. 103 as being unpatentable over Shumpert, James Michael, U.S. PGPUB 2016/0342903, filed 5/21/2015 [hereafter referred to as Shumpert] in view of Yu, Jianbo, Health Condition Monitoring of Machines Based on Hidden Markov Model and Contribution Analysis, IEEE August 2012 [hereafter referred to as Yu].
Regarding amended Claim 1, 
Shumpert teaches
A method for allocating machine behavioral models, comprising: 
analyzing, via unsupervised machine learning, a plurality of sensory inputs associated with a machine, wherein the unsupervised machine learning outputs at least one normal behavior pattern of the machine (Examiner’s note: Shumpert Figure 3 teaches a system for detecting anomalies from sensor data associated with machines, where this system incrementally trains a model using sensor data (where data from multiple sensors containing readings or characteristics from the same machine represent instances of machine behavior, Shumpert Figure 6 and [0053]). Shumpert additionally teaches that the received data instances are based on live unlabeled data, where the initial model is built using a modified k-means clustering algorithm that identifies normal and anomalous data instances through clustering of data instances. The initial data instances fed into the model are assumed to be normal, with each created model eventually being represented as a set of normal and anomalous clusters of instance data, such that the normal clusters represent normal behavior patterns/operational states of the machine. Shumpert additionally teaches that the outputs of the unsupervised machine learning are the trained models containing the normal and anomalous clusters; as such, the process of generating these trained models containing the normal and anomalous clusters by performing unsupervised machine learning on received sensor data corresponds to an analyzing process involving applying unsupervised machine learning to a plurality of sensory inputs associated with a machine, where the unsupervised machine learning produces trained models that contain at least one normal behavior pattern of the machine (Shumpert [0034]: “… a system for detecting anomalies in data dynamically received from a plurality of sensors associated with one or more machines is provided. … the processing resources being configured, for each instance of data received via the one or more interfaces, to at least: classify, using a model retrieved from the model store, the respective instance as being one of a normal instance type and an anomalous instance type … Each model in the model store is implemented using a k-means cluster algorithm …”; Figure 4 step S404, [0055]; [0064]: “... learning begins immediately with live data. … as the first instances come in, they are used to build a model of the characteristics of the data. … unsupervised learning techniques are used to build the initial model, and initial instances may be assumed normal. Using the unsupervised prediction techniques described below, some instances eventually will be flagged as potentially anomalous …”; [0074]: “… individual sensor readings have been grouped by timestamp and machine to capture the operational state of that machine at that time …”; [0080]: “… the shared model itself may be implemented using a modified k-means clustering algorithm, where the current instance is predicted to be the class of its nearest cluster (in terms of a multivariate distance measure to the centroid of the cluster) … The current instance is predicted to be a new potential anomaly if it is nowhere near any of the existing clusters …”; and Figures 12-14 and [0133]: “The primary outputs of the shared learning and prediction component 506 are the trained models, which are sent to the model store 508. …”).); 
selecting … at least one machine behavioral model (Examiner’s note: Under its broadest reasonable interpretation, the term “model” as defined by the Merriam-Webster dictionary broadly denotes a system of postulates (a set of conditions or hypotheses), data, and inferences presented as a mathematical description of an entity. In light of Applicant’s specification paragraph [0052], the term “machine behavioral model” broadly indicates a model (i.e., a set of data and inferences) representing a set of behaviors, patterns, or characteristics associated with a machine. As indicated earlier, Shumpert teaches a system for detecting anomalies associated with one or more machines, where the trained models are stored in a model store component. Shumpert teaches that the model is selected based on its being suitable for or compatible with (i.e., appropriate for) the machine that produced the data in the respective instance, where the data in the respective instance points to the identification of normal and anomalous instances of data. 
Shumpert indicates that the respective instances associated with a machine are either normal instance types or anomalous instance types, where the initial instances of the received input sensor data associated with a machine are assumed to be normal, and the further determination of these normal or anomalous instance types results from further processing of the received input sensor data using unsupervised machine learning to classify the respective instances as either normal or anomalous (Shumpert [0034]: “… the processing resources being configured, for each instance of data received via the one or more interfaces, to at least: classify, using a model retrieved from the model store, the respective instance as being one of a normal instance type and an anomalous instance type, the retrieved model being selected from the model store as being appropriate for the machine that produced the data in the respective instance if such a model exists in the model store …”; [0053]: “Data from multiple sensors in the same time frame for the same machine may be thought of as forming an instance of machine behavior.”; [0055]-[0056]; [0075]; [0097]; [0116]; and Figure 12, [0130]). Shumpert further indicates that the shared model, consisting of a set of data representing normal machine behavior and abnormal machine behavior (as shown in Shumpert Figures 12-14), is kept in the model store and retrieved for additional updating and re-training based on its suitability for or compatibility with the machine that produced the data in the respective instance (Shumpert [0066]; [0133]-[0134]: “The primary outputs of the shared learning and prediction component 506 are the trained models, which are sent to the model store 508 … These models are sent from and retrieved by the shared learning and prediction component 506 …”; and [0156]). 
Hence, these respective normal and anomalous instance types associated with a machine are part of the machine behavioral model stored in the model store, and can therefore be retrieved by the system according to the teachings in Shumpert [0034] to identify and select the corresponding model from the model store that is associated with that machine.); 
generating, based on the selected at least one machine behavioral model, an … machine behavioral model representing behavior of the machine (Examiner’s note: As indicated earlier, under its broadest reasonable interpretation in light of Applicant’s specification paragraph [0052], the term “machine behavioral model” broadly indicates a set of data and inferences representing a set of behaviors, patterns, or characteristics associated with a machine. Shumpert teaches a system containing a shared learning and prediction component that updates existing retrieved models (one model for each machine being monitored) from the model store by performing an adaptation process such that the model is more robust to changing conditions over time. Shumpert further teaches applying an instance-weighting window algorithm (based on a mean-time between failure calculation to estimate a machine’s useful life) to apply stronger weights/higher priority to new data instances by identifying the clusters being formed that are more relevant over the useful life of a machine, where the cluster mean and the associated covariance matrix data (which is part of the stored machine behavioral model) are re-computed over time to allow a model to adapt to non-anomalous but slowly changing sensor readings caused by machine age or different environmental conditions. 
Shumpert teaches that this adaptation allows the model to continue to predict new anomalies during a useful life period of the machine while also accommodating gradual concept drift over time, and hence this adaptation process corresponds to a generation of a machine behavioral model representing behavior of a machine (Shumpert [0065], [0075]-[0076]: “The shared learning and prediction component 506 receives the processed sensor data … It separates out the anomalous instances and uses the normal data and unsupervised machine learning techniques to train multiple models, one for each machine being monitored … the shared learning and prediction component 506 labels the anomalous instances and normal instances and uses them to update the shared models …”; [0077]-[0078]: “In order for the models to handle concept drift and adapt to changing conditions over time, the shared learning and prediction component 506 may give stronger weights to newer data instances than older data … Once the shared models are trained, the shared learning and prediction component 506 predicts new anomalies by looking for instances that do not match the learned parameters of the normal instances or that match the parameters of known anomalies.”; and [0123]-[0129]).); and 
allocating the generated … machine behavioral model to the machine by providing the … machine behavioral model to a machine monitoring system, wherein the machine monitoring system is configured to monitor behavior of the machine (Examiner’s note: Under its broadest reasonable interpretation, this limitation broadly recites applying the generated behavioral model in a machine monitoring system to monitor and detect anomalies in the machine behavior. As indicated earlier, Shumpert teaches a system containing a shared learning and prediction component that updates existing retrieved models (one model for each machine being monitored) from the model store by performing an adaptation process to generate machine behavioral models representing the behavior of a machine. Shumpert additionally teaches that the generated machine behavioral model is used to determine in real time whether the received input sensor data can be associated with the known groups of normal or abnormal behavior instances associated with the generated machine behavioral model. Shumpert further teaches that the same system contains a workflow management component that receives potentially anomalous data instances from the shared learning and prediction component so that a domain expert can confirm whether they are indeed anomalous, where the workflow management component takes these flagged instances and surrounding instances, and retrieves additional context data records from a knowledgebase including machine service history, failure type, known causes, and recommended remediation approaches associated with a machine. Shumpert also teaches that the anomalous indication is sent back to the shared learning and prediction component, which retrieves the shared model and re-trains it with this data. 
Hence, this process of using the generated machine behavior model on a system to identify and associate input sensor data with either normal or anomalous data instances in real-time, and associating the anomalous data instances with corresponding data records pertaining to a machine’s service history, failure type, recommended remediation approaches, and using the confirmed anomalous data instances as feedback into re-training the model corresponds to a monitoring process that applies a generated machine behavioral model in a machine monitoring system to monitor behavior of the machine (Shumpert [0064]; [0116]: “… Unsupervised clustering occurs for each new instance, as long as the instance is within the limits of existing clusters … potential anomalous instances are routed for manual review and classification by a domain expert. If they are confirmed as anomalous, the new cluster to which the instance is assigned is classified as anomalous. Thus, in real-time as the data stream is received, the shared model’s clusters are elaborated with a classification from various states of normal or abnormal behavior …”; [0133]; [0137]-[0138]: “The primary input to the workflow management component 510 is a flagged instance record along with the immediately surrounding instances for context. It then retrieves additional context data from the knowledgebase 514 such as machine service history … it also retrieves failure type, known causes, and recommended remediation approaches … the workflow management component 510 presents relevant information to … an operator in order for the recommended remediation actions to be taken …”; [0143]-[0147]; and [0152]: “… The shared learning and prediction component 506 is also informed that the suspect instance has indeed been judged to be anomalous, and … the shared model is retrieved from the model store 508 and retrained.”).).  
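As a rough illustration of the nearest-cluster classification and recency weighting that the mapping above attributes to Shumpert ([0077]-[0080]), the following Python sketch assigns an instance to its nearest centroid and flags it as a potential anomaly when no centroid lies within a cutoff. This is a simplified sketch under stated assumptions, not Shumpert's implementation: Euclidean distance stands in for the "multivariate distance measure" quoted from Shumpert [0080], the cutoff `max_distance` is a hypothetical parameter, and exponential decay of older instance weight is a swapped-in stand-in for Shumpert's instance-weighting window.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class Cluster:
    centroid: List[float]
    label: str            # "normal" or "anomalous"
    weight: float = 1.0   # accumulated, decayed weight of the assigned instances


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    # Euclidean distance (assumed stand-in for a multivariate distance measure).
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(instance: Sequence[float], clusters: List[Cluster],
             max_distance: float) -> Tuple[str, Optional[Cluster]]:
    """Predict the class of the nearest cluster, or a potential anomaly if none is near."""
    if not clusters:
        return "potential_anomaly", None
    nearest = min(clusters, key=lambda c: distance(instance, c.centroid))
    if distance(instance, nearest.centroid) > max_distance:
        return "potential_anomaly", None
    return nearest.label, nearest


def update(cluster: Cluster, instance: Sequence[float], decay: float = 0.9) -> None:
    """Move the centroid toward a new instance, down-weighting older instances."""
    old_weight = cluster.weight * decay  # older data counts for less (assumed decay scheme)
    new_weight = old_weight + 1.0        # the new instance has weight 1
    cluster.centroid = [
        (c * old_weight + x) / new_weight for c, x in zip(cluster.centroid, instance)
    ]
    cluster.weight = new_weight


clusters = [Cluster([0.0, 0.0], "normal"), Cluster([10.0, 10.0], "anomalous")]
label, match = classify([0.6, 0.3], clusters, max_distance=2.0)
if match is not None:
    update(match, [0.6, 0.3], decay=0.5)
print(label, clusters[0].centroid)
print(classify([5.0, 5.0], clusters, max_distance=2.0)[0])
```

Here the first instance is assigned to the "normal" cluster and pulls its centroid toward itself, while the second instance lies far from every centroid and is returned as a potential anomaly, which in Shumpert's workflow would be routed for expert review.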
While Shumpert teaches training and storing multiple models, where each training of a model generates a machine behavioral model using an instance-weighting window algorithm to identify normal and abnormal data instances in received input sensor data, Shumpert does not explicitly teach
… generating … an optimal machine behavioral model …
… by providing the optimal machine behavioral model … in order to detect anomalies using the allocated optimal machine behavioral model.
Yu teaches
… generating … an optimal machine behavioral model (Examiner’s note: As indicated earlier, under its broadest reasonable interpretation in light of Applicant’s specification paragraph [0052], the term “machine behavioral model” broadly indicates a set of data and inferences representing a set of behaviors, patterns, or characteristics associated with a machine. Yu teaches an HMM modeling procedure in which dynamic principal component analysis (DPCA) is used to extract effective features from healthy sensor data to be further used as input to the HMM. Yu additionally teaches applying the Baum-Welch EM algorithm to generate HMMs from this extracted feature data set by estimating and adjusting a model parameter set λ={A,B,π} to maximize the probability P(O, λ) of observing the extracted features. Each model parameter set (containing the state transition probability distribution A, observation probability distribution B, and initial state probability distribution π) corresponds to a model, where each of these model parameter sets includes B, which defines the input observations/features associated with the machine representing normal machine behavior (i.e., healthy sensor data). Yu further teaches that each estimated model λ produced by the EM algorithm is evaluated based on a Bayesian Information Criterion metric, such that the model parameter set with the lowest metric value is chosen as the best model, where the identification of a best HMM-based model from a plurality of λ models generated by the EM algorithm corresponds to the generation of an optimal machine behavioral model (Yu p.2200 Section I. Introduction; p.2203 Section IV.A. HMM and HMM-Based Bearing Health Monitoring Model 1st-2nd paragraphs; p.2205 col.2 Section IV.D. Application Procedure, Part 1: Off-line modeling Step 3: “… Start off HMM modeling with the feature data set extracted by DPCA. We first use Baum-Welch’s expectation maximization (EM) algorithm to estimate the model parameter set λ … of HMMs that optimizes the likelihood of the training set and then to select an HMM by using BIC criterion (discussed in Section IV-A).”; p.2203 col.2 4th paragraph: “… a complete specification of an HMM requires specifications of model parameters λ={A,B,π}, observation symbols, and the probability distribution P(O, λ) …”; p.2203 col.2 algorithm 2) Baum-Welch Algorithm (Section IV.A. Theoretical Background of HMM); p.2204 col.1 3rd paragraph: “… a more general and appealing alternative testing procedure is to use the Bayesian information criterion (BIC) [32]. For a model λ, BIC is defined as follows: BIC(λ) = -2LL(λ) + p(λ)log(n) … The model having the lowest value is chosen as the best model.”; p.2205 Section IV.D. Application Procedure Part 1: Off-line modeling; and p.2206 col.1 Section V.A Bearing Health Degradation Monitoring 1st paragraph: “The healthy data set (i.e., the first one-third of the whole life) from bearings 1 and 2 is used to construct HMMs … HMMs with three and two hidden states are enough to model the given data set of bearings 1 and 2 …”).) …
… by providing the optimal machine behavioral model … in order to detect anomalies using the allocated optimal machine behavioral model (Examiner’s note: As indicated earlier, Yu teaches generating an optimal machine behavioral model based on evaluating a Bayesian Information Criterion metric that identifies a best model parameter set λ out of a plurality of estimated model parameter sets produced by an EM algorithm. Yu further teaches an online health monitoring procedure where incoming observations of bearing data from different bearing types are processed to identify a set of principal components which are fed as inputs into the generated baseline HMM model (“optimal machine behavioral model”), to further calculate and plot a Mahalanobis distance for each received input, and comparing the Mahalanobis distance against a predetermined threshold to determine whether a respective bearing (corresponding to a machine) is in a degradation or healthy state over a period of time (where the detection of a degradation state represents detection of an anomaly). A person having ordinary skill in the art would understand that the Mahalanobis distance calculations and plots shown in Yu p.2207 Figures 4-5 are produced by a computing system, and hence this computing system that executes this online health monitoring procedure based on incoming observations of bearing data from different bearing types correspond to a process that provides the optimal machine behavioral model to a machine monitoring system in order to detect anomalies (Yu p.2200 Section I. Introduction 2nd paragraph: “… monitoring and assessing the trend of machine degradation allow the degraded behavior or faults to be corrected before they cause machine breakdown …”; p.2204 Section IV.B. 
HMM-Based Bearing Health Monitoring Model 1st-2nd paragraphs: “… uses HMM as a monitoring tool of bearing health state … Once an HMM is trained by using a vibration data set from healthy bearings, it is then used to monitor bearing health states …”; p.2205 Section IV.D. Application Procedure Part 2: Online health monitoring and assessment; pp.2205-2206 Section V. Experiment and Result Analysis and p.2206 Figure 2; p.2207 Figures 4-5 and pp.2206-2208 Section V.A Bearing Health Degradation Monitoring 2nd-3rd paragraphs: “… the full life cycle data from bearings 1 and 2 are inputted into the baseline HMM and the corresponding MDs are calculated. A threshold with Type I error 99.9% is used to trigger alarms for determining whether bearing health degradation is happening. … The MD charts based on HMM with DPCA are presented in Figs.4(b) and 5(c) for bearings 1 and 2 … the proposed model effectively detected the slight degradation and gave the early alarm …These features of the MD chart facilitate a reliable bearing health monitoring … and can be an effective health indication.”).).
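For illustration only, the BIC selection step and the Mahalanobis-distance alarm step described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not Yu’s implementation: the candidate log-likelihoods and parameter counts are assumed to come from HMMs fitted elsewhere (fitting is not shown), the Mahalanobis distance here is computed against a simple mean/covariance baseline of healthy feature vectors rather than from the HMM itself, and taking the 0.999 quantile of the healthy distances as the alarm threshold is an assumption. All function names are hypothetical.

```python
import numpy as np


def bic(log_likelihood: float, num_params: int, num_obs: int) -> float:
    """BIC(λ) = -2·LL(λ) + p(λ)·log(n); a lower value indicates a better model."""
    return -2.0 * log_likelihood + num_params * np.log(num_obs)


def select_model_by_bic(candidates, num_obs):
    """Return the label of the candidate with the lowest BIC.

    candidates: iterable of (label, log_likelihood, num_params) tuples, e.g. one
    tuple per HMM fitted (elsewhere) with a different number of hidden states.
    """
    return min(candidates, key=lambda c: bic(c[1], c[2], num_obs))[0]


def mahalanobis_distances(x, mean, cov):
    """Row-wise Mahalanobis distance of feature vectors x (n x d) from a baseline."""
    diff = np.atleast_2d(x) - mean
    inv_cov = np.linalg.inv(cov)
    return np.sqrt(np.einsum("ij,jk,ik->i", diff, inv_cov, diff))


def fit_baseline(healthy, quantile=0.999):
    """Estimate a baseline mean/covariance from healthy data plus an alarm threshold.

    The threshold is the given quantile of the healthy distances (an assumption).
    """
    mean = healthy.mean(axis=0)
    cov = np.cov(healthy, rowvar=False)
    threshold = np.quantile(mahalanobis_distances(healthy, mean, cov), quantile)
    return mean, cov, threshold


def flag_degradation(x, mean, cov, threshold):
    """True for each row whose Mahalanobis distance exceeds the threshold."""
    return mahalanobis_distances(x, mean, cov) > threshold
```

For example, with hypothetical values n = 500 and candidates ("2 states", -1200.0, 11), ("3 states", -1150.0, 20), and ("4 states", -1140.0, 31), the three-state candidate has the lowest BIC (about 2424.3) and would be selected.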
Both Shumpert and Yu are analogous art since they both teach generating models for monitoring and analyzing machine sensor data.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to take the model generation taught in Shumpert and incorporate the HMM modeling procedure taught in Yu as a way to generate and determine an optimal machine model for a machine. The motivation to combine is taught in Yu, where the HMM modeling procedure is shown to efficiently characterize multimodal distributions of the healthy data from complicated data signals sampled from complicated working conditions. Identifying the most effective features used for the HMM modeling also simplifies the model and prevents data overfitting, as well as provides improved sensitivity in detecting changes in sensor data (e.g., vibration signals) that would indicate machine degradation, and these benefits result in improved efficiency and robustness in a system implementing this procedure for machine degradation monitoring and analysis (Yu p.2204 col.1 3rd paragraph: “… The reduction of data dimension by DPCA alleviates these difficulties due to reduction of the number of parameters to be determined in HMM. Moreover, the use of DPCA minimizes the information loss resulting from dimension reduction because DPCA aims at extracting the most important information from the given data. Thus, DPCA will be used to extract PCs as the input features to reduce the model complexity and training time cost of HMM …”; and pp.2206-2208 Section V.A. Bearing Health Degradation Monitoring 3rd paragraph: “… it can be observed that the MD charts do not trigger false alarms when bearings are in healthy states, which shows that the warning scheme is effective for reducing false alarms. … the proposed model effectively detected the slight degradation and gave the early alarm. For bearing 1, HMM with DPCA detects the earlier slight degradation (at time point 1760) than HMM with PCA (at time point 1780). 
This illustrates that DPCA improves the sensitivity of HMM for detection of slight health degradation.”).
Regarding original Claim 3, 
Shumpert in view of Yu teaches
The method of claim 1, wherein selecting the at least one machine behavioral model further comprises: 
querying at least one database for machine behavioral models, wherein each selected machine behavioral model is among a plurality of machine behavioral models returned with respect to the query (Examiner’s note: Under its broadest reasonable interpretation, this limitation of querying at least one database for machine behavioral models broadly recites an action of requesting/accessing and retrieving a machine behavioral model from a database that stores a plurality of machine behavioral models. As indicated earlier, Shumpert teaches a model store serving as a repository for all trained models, where the model store is implemented as an in-memory data grid data structure (IMDG) with custom software code to perform retrieval of the IMDG data structure (Shumpert [0134]-[0135]: “The model store serves as a repository for all the trained models … The model store 508 may be implemented with or as an in-memory data grid (IMDG) … an IMDG is a data structure that resides entirely in RAM … the data model is non-relational and is object-based …”). 
As indicated earlier, this model store is accessed as incoming streaming data is being received, in order to select the appropriate model for the machine that produced the processed data instances (Shumpert [0034]: “… the retrieved model being selected from the model store as being appropriate for the machine that produced the data in the respective instance if such a model exists in the model store …”; [0075]: “The shared learning and prediction component 506 receives the processed sensor data … It separates out the anomalous instances and uses the normal data and unsupervised machine learning techniques to train multiple models, one for each machine being monitored …”; [0133]: “The primary outputs of the shared learning and prediction component 506 are the trained models … These models are retrieved for predictions and updated on a regular basis …”; and [0152]: “The shared learning and prediction component 506 … updates its training data to reflect this new information. The shared model is retrieved from the model store 508 and retrained …”).).  
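As a minimal illustration of the query-and-select behavior described above, the following Python sketch uses a plain in-memory dictionary keyed by machine identifier as a stand-in for Shumpert’s IMDG-based model store. The class names, method names, and metadata fields are hypothetical and are not taken from Shumpert; the sketch only shows that a query may return a plurality of stored models and that a per-machine lookup returns a model only if one exists in the store.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional


@dataclass
class StoredModel:
    machine_id: str
    machine_type: str
    model: Any  # e.g., a trained clustering model or HMM (placeholder here)


@dataclass
class InMemoryModelStore:
    """Hypothetical stand-in for an in-memory repository of trained models."""
    _models: Dict[str, StoredModel] = field(default_factory=dict)

    def put(self, entry: StoredModel) -> None:
        self._models[entry.machine_id] = entry

    def query(self, predicate: Callable[[StoredModel], bool]) -> List[StoredModel]:
        """Return every stored model matching the predicate (possibly several)."""
        return [m for m in self._models.values() if predicate(m)]

    def model_for_machine(self, machine_id: str) -> Optional[StoredModel]:
        """Return the model for this machine if one exists in the store, else None."""
        return self._models.get(machine_id)
```

For example, after storing models for two pumps and one fan, `store.query(lambda m: m.machine_type == "pump")` returns both pump models, from which one or more may then be selected.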
Regarding amended Claim 4, 
Shumpert in view of Yu teaches
The method of claim 1, 
wherein the selected at least one machine behavioral model is a plurality of machine behavioral models (Examiner’s note: Yu teaches performing bearing run-to-failure tests to collect observation data, associated test data, and the corresponding faults for each bearing device on a specially designed test rig. Each bearing device represents a machine, and as such, the whole life data and the associated test data (e.g., vibration data collected at every 20 min, experimental conditions such as radial load, alignment, rotation speed, and observed data representing different types of faults/failures such as inner race, outer race, and ball defect) from each bearing corresponds to a machine behavioral model. Yu further teaches using the first third of the whole life data set from bearings 1 and 2 (representing healthy bearing data) to construct the baseline HMM model (representing the “optimal machine behavioral model”), where the selection of data related to bearings 1 and 2 to generate the baseline HMM model corresponds to a selection of a plurality of machine behavioral models (Yu p.2206 Figure 2 and pp.2205-2206 Section V. Experiment and Result Analysis: “This experiment performed bearing run-to-failure tests under constant load conditions on a specially designed test rig, as shown in Fig. 2, and the bearing data are from … a prognostic data repository … Test will stop when the accumulated debris adhered to the magnetic plug exceeds a certain level … the collected debris by magnetic plug is still an effective indication as the evidence of bearing health degradation … The data sampling rate is 20kHz, and the data length is 20 480 points. Vibration data were collected every 20 min. Four testings … were implemented … other experimental conditions (e.g., radial load on the shaft, alignment, and rotation speed) were kept the same for the four testings. 
The full life data from five representative bearings (named bearings 1-5, respectively) whose faults include inner race, outer race, and ball defect are used to test the performance of the proposed methods…”; and p.2206 Section V.A. Bearing Health Degradation Monitoring 1st paragraph: “The healthy data set (i.e., the first one-third of the whole life) from bearings 1 and 2 is used to construct HMMs … HMMs with three and two hidden states are enough to model the given data set of bearings 1 and 2 …”).),
wherein generating the optimal machine behavioral model further comprises: clustering at least two machine behavioral models from the selected plurality of machine behavioral models (Examiner’s note: Under its broadest reasonable interpretation, this claim limitation broadly recites using a group containing at least two machine behavioral models to generate an optimal machine behavioral model. As indicated earlier, Yu teaches a method for performing bearing run-to-failure tests to generate models for several representative bearings (“machines”), where the respective vibration data, experimental condition data, and observed failure/fault data from each bearing correspond to a different machine behavioral model. Yu also teaches using the first third of the whole life data set from bearings 1 and 2 (representing healthy bearing data) to construct the baseline HMM (representing the “optimal machine behavioral model”), where the selection of data related to bearings 1 and 2 to generate the baseline HMM model corresponds to a selection of a plurality of machine behavioral models (Yu pp.2205-2206 Section V. Experiment and Result Analysis). Yu further teaches that the resulting baseline HMM model generated from these two models (“generated optimal machine behavioral model”) is further used to calculate Mahalanobis distances to determine whether a bearing (i.e., machine) is in a degraded or healthy state over a period of time. This process of using a group containing two bearings (e.g., bearings 1 and 2) to build respective HMM models to generate the baseline HMM model corresponds to a process that uses a group containing two machine behavioral models to generate an optimal machine behavioral model (Yu p.2206 Figure 2 and pp.2205-2206 Section V. Experiment and Result Analysis; p.2205 Section IV.D. Application Procedure Parts 1 and 2; and pp.2206-2208 Section V.A. Bearing Health Degradation Monitoring 1st-4th paragraphs).).  
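To illustrate the grouping step described above, the following minimal Python sketch pools the healthy portion (the first one-third of each run, per the quoted passage of Yu) of two or more run-to-failure records into one baseline training set. Representing each run as a time-ordered NumPy array of feature vectors, and the function names, are assumptions; fitting the baseline HMM on the pooled data is not shown.

```python
import numpy as np


def healthy_portion(run: np.ndarray, parts: int = 3) -> np.ndarray:
    """Return the first 1/parts of a run whose rows are ordered in time."""
    return run[: len(run) // parts]


def pool_healthy_data(runs, parts: int = 3) -> np.ndarray:
    """Stack the healthy portions of at least two runs into one training set."""
    if len(runs) < 2:
        raise ValueError("at least two runs are required")
    return np.vstack([healthy_portion(r, parts) for r in runs])
```

Integer division is used for the cutoff so that the selected prefix length does not depend on floating-point rounding of 1/3.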
Regarding original Claim 5, 
Shumpert in view of Yu teaches
The method of claim 4, wherein generating the optimal machine behavioral model further comprises: 
extracting, from the plurality of sensory inputs, at least one optimal parameter for each selected machine behavioral model (Examiner’s note: In light of Applicant’s specification paragraph [0070], the term “optimal parameter” used by the Applicant broadly recites parameter values that best represent the behavior of the machine with respect to the model. As indicated earlier, Yu teaches applying dynamic principal component analysis (DPCA) to perform feature extraction, where the identified features and their measurement readings (collected in an observation vector) are augmented with the previous l observations in order to produce lower-dimensional principal component vectors that are further used as inputs for generating and identifying the best HMM model. Yu further teaches that these lower-dimensional principal component vectors represent the features and associated parameters that retain the most variance information of the original data set; hence, this feature extraction process involving DPCA corresponds to an extraction of the most effective features (“optimal parameters”) from the collected machine sensor data (Yu pp.2202-2203 Section III.B DPCA for Feature Extraction: “… DPCA is applied to extract the effective features for bearing health monitoring. DPCA is an extension of PCA … by augmenting each observation vector with the previous l observations … each component (e.g., x_k^T) in the data matrix X(l) is a feature vector consisting of the original features generated from the collected vibration signals … By performing PCA on the matrix X(l), the DPCA model is extracted directly from the given data collected from the healthy bearing. DPCA transforms X from a d*(l+1)-dimensional space to a new matrix Y in a new m-dimensional space … These projected vectors in Y are called the principal components (PCs) of the original data set. Thus, the PCs where dynamicity is removed are extracted by DPCA and will then be used as input features of HMM. … The implementation of DPCA is given as follows: 1) Obtain normal data … 2) Carry out PCA on the augmented matrix … 3) Use (3) to project the input X to obtain the low-dimensional Y (i.e., PCs), which keeps the most variance information of the given data.”).); and 
calibrating each selected machine behavioral model based on the at least one optimal parameter extracted for the selected machine behavioral model (Examiner’s note: Under its broadest reasonable interpretation, the term “calibrate,” as defined by the Merriam-Webster dictionary, broadly recites an action for performing adjustments for a particular function. As indicated earlier, Yu teaches an HMM modeling procedure in which dynamic principal component analysis (DPCA) is used to extract effective features from healthy sensor data to be further used as input into the HMM model. Yu further teaches that the identified features and their measurement readings (collected in an observation vector) are augmented with the previous l observations in order to produce lower-dimensional principal component vectors that are further used as inputs for generating and identifying the best HMM model. The determination of the value of l is considered a form of calibration of the extracted features used to build the model. Yu additionally teaches applying the Baum-Welch EM algorithm to generate HMM models with this extracted feature data set by estimating and adjusting a model parameter set λ={A,B,π} to maximize the probability P(O, λ) of observing the extracted features. This process of adjusting the model parameter set to maximize the probability of observing the extracted features (observations) is also a form of calibrating a model based on the extracted features (observations) provided to the model. Yu further teaches performing bearing run-to-failure tests to generate models for several representative bearings (“machines”), where the respective vibration data, experimental condition data, and observed failure/fault data from each bearing correspond to a different machine behavioral model. 
Yu teaches performing two time-lagged arrangements as part of the DPCA analysis to capture the dynamics in the vibration data for bearings 1 and 2 when extracting feature data to build each HMM model representing each bearing, where the determination to use two time-lagged arrangements on the vibration data for bearings 1 and 2 during DPCA analysis corresponds to an adjustment (“calibration”) of the optimal parameters extracted for each machine behavioral model (Yu pp.2203-2204 III.B. DPCA for Feature Extraction: “Vibration data recorded through accelerometers often show inherent autocorrelation … For a dynamic system, the current values of the observed variables will depend on the past values … To address this important issue, DPCA is applied to extract the effective features for bearing health monitoring … by augmenting each observation vector with the previous l observations and stacking the data matrix … l is the number of lagged measurements … Thus, the PCs where dynamicity is removed are extracted by DPCA and will then be used as input features of HMM. The number l =1 or 2 is usually appropriate for dynamic systems …”; p.2203 col.2 4th paragraph; p.2203 col.2 algorithm 2) Baum-Welch algorithm; and p.2206 Section V.A. Bearing Health Degradation Monitoring 1st paragraph: “The healthy data set (i.e., the first one-third of the whole life) from bearings 1 and 2 is used to construct HMMs. DPCA is first used on the healthy data. It can be found that two time-lagged arrangements are good to capture the dynamics in the recorded vibration signals …”).).  
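The DPCA steps quoted above (augmenting each observation with the previous l observations, normalizing the augmented matrix by the mean and standard deviation of each feature, performing PCA, and projecting onto m principal components) can be sketched in Python as follows. This is a minimal sketch, not Yu’s code: choosing m by a cumulative explained-variance cutoff (90% here) is an assumption, as are the function names, since Yu’s criterion for m is not reproduced in this action.

```python
import numpy as np


def augment_with_lags(X: np.ndarray, l: int) -> np.ndarray:
    """Build the time-lagged data matrix X(l).

    X has shape (n, d); the result has shape (n - l, d * (l + 1)), where each
    row is [x_k, x_{k-1}, ..., x_{k-l}].
    """
    n = X.shape[0]
    return np.hstack([X[l - j : n - j] for j in range(l + 1)])


def fit_dpca(X_healthy: np.ndarray, l: int = 2, var_kept: float = 0.9):
    """Fit DPCA on healthy data; returns (mean, std, projection matrix P)."""
    Xa = augment_with_lags(X_healthy, l)
    mean = Xa.mean(axis=0)
    std = Xa.std(axis=0, ddof=1)  # assumes no constant (zero-variance) feature
    Z = (Xa - mean) / std
    eigvals, eigvecs = np.linalg.eigh(np.cov(Z, rowvar=False))
    order = np.argsort(eigvals)[::-1]  # components by decreasing variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = np.cumsum(eigvals) / eigvals.sum()
    m = min(int(np.searchsorted(explained, var_kept)) + 1, len(eigvals))
    return mean, std, eigvecs[:, :m]


def dpca_transform(X: np.ndarray, l: int, mean, std, P) -> np.ndarray:
    """Project observations onto the m retained principal components (PCs)."""
    return ((augment_with_lags(X, l) - mean) / std) @ P
```

The resulting low-dimensional PCs would then serve as the HMM input features; with l = 2, as in the quoted experiment, each augmented row carries three consecutive observations.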
Regarding original Claim 6, 
Shumpert in view of Yu teaches
The method of claim 5, wherein extracting the at least one optimal parameter for each selected machine behavioral model further comprises: 
applying, for the selected behavioral model, a set of heuristics to the plurality of sensory inputs to determine the at least one optimal parameter for the selected machine behavioral model (Examiner’s note: Under its broadest reasonable interpretation, the term “heuristic,” as defined in the Merriam-Webster dictionary, broadly recites actions involving or serving as an aid to learning, discovery, or problem-solving by experimental and especially trial-and-error methods. As indicated earlier, Yu teaches performing dynamic principal component analysis (DPCA), where the identified features and their measurement readings (collected in an observation vector) are augmented with the previous l observations to form a data matrix of observations, which is then normalized using the mean and standard deviation of each feature before applying DPCA. Yu further teaches that the value of l is typically set to either 1 or 2 as an appropriate value for dynamic systems, where this setting of l (as well as the normalization of the augmented data matrix using the mean and standard deviation of each feature) represents applying a set of heuristic actions for determining the most effective features and their associated parameter values (Yu p.2203 col.1 1st paragraph: “… The number l =1 or 2 is usually appropriate for dynamic systems [28]. … The implementation procedure of DPCA is given as follows: 1) Obtain normal data … Determine the time lag l and constitute the normal data matrix as shown in (2). Normalize the augmented matrix of normal data using the mean and standard deviation of each feature. …”).).  
Regarding original Claim 8, 
Shumpert in view of Yu teaches
The method of claim 1, wherein allocating the generated optimal machine behavioral model further comprises sending the generated optimal machine behavioral model to a machine monitoring system, wherein the machine monitoring system monitors behavior of the machine via unsupervised machine learning using the allocated model (Examiner’s note: Under its broadest reasonable interpretation, this claim limitation broadly recites providing a machine behavioral model to a machine monitoring system to perform further monitoring. As indicated earlier, Shumpert teaches a workflow management component that receives, from the shared learning and prediction component, flagged data instances that are subject to review by a domain expert to determine whether they are anomalous (Shumpert [0064], [0133], [0137]-[0138]). As indicated earlier, Shumpert teaches that if a flagged instance is determined to be anomalous, the shared learning and prediction component is informed, and the shared model is retrieved again from the model store and retrained using the modified k-means cluster algorithm to identify and associate a new cluster with the anomalous data instance. The updated model then performs further monitoring and identification of new data instances as they arrive over time, using the existing identified normal and anomalous clusters associated with the model, and this monitoring may also trigger retrieval of machine-related information from a knowledgebase to perform remediation tasks for repeated anomalies of the machine (Shumpert [0151]-[0152]; [0116]: “… Unsupervised clustering occurs for each new instance, as long as the instance is within the limits of existing clusters. …”; [0137]-[0138]; and [0143]-[0147]). 
As indicated earlier, Yu also teaches applying an HMM model to further monitor and detect changes in sensor data (vibration signals) that would indicate machine degradation, where this HMM model represents the optimal model (Yu p.2206 Section V.A Bearing Health Degradation Monitoring 1st-3rd paragraphs). Hence, the combination of the Shumpert and Yu references teaches a process in which a shared model retrieved from the model store identifies an incoming data instance (based on sensor data) as anomalous and triggers further retrieval and training of the shared model, and this process corresponds to sending a generated machine behavioral model to a machine monitoring system to perform further monitoring of a machine using the assigned optimal model.).  
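As a simplified illustration of the monitoring behavior described above (assigning each new instance to an existing cluster when it falls within that cluster’s limits, and otherwise flagging it for review), the following Python sketch uses a plain nearest-centroid rule with a fixed radius per cluster. It is not Shumpert’s modified k-means algorithm; the centroids, radii, and function names are hypothetical.

```python
import numpy as np


def assign_or_flag(instance, centroids, radii):
    """Return the index of the nearest cluster if the instance lies within that
    cluster's radius; otherwise return None to flag it for review."""
    distances = np.linalg.norm(centroids - instance, axis=1)
    nearest = int(np.argmin(distances))
    return nearest if distances[nearest] <= radii[nearest] else None


def monitor(stream, centroids, radii):
    """Split an iterable of instances into (cluster assignments, flagged instances)."""
    assignments, flagged = [], []
    for x in stream:
        label = assign_or_flag(np.asarray(x, dtype=float), centroids, radii)
        if label is None:
            flagged.append(x)
        else:
            assignments.append(label)
    return assignments, flagged
```

In the process described above, flagged instances would then go to review and, if confirmed anomalous, the model would be retrained; those steps are outside this sketch.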
Regarding original Claim 9, 
Shumpert in view of Yu teaches
The method of claim 1, further comprising: preprocessing the plurality of sensory inputs, wherein the preprocessing includes extracting at least one feature from raw sensory data (Examiner’s note: Shumpert teaches an ingestion, transformation, and aggregation system component that performs data transformation and aggregation of the received sensor data from each machine (Shumpert [0053]-[0054]), and the raw sensor data (Shumpert Figure 6) is further processed and aggregated into data instance records, in which the readings and characteristics from the different sensors of the machine (shown in the columns of Shumpert Figure 7) are extracted from the raw sensor data.).  
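For illustration, the feature extraction from raw sensor data described above can be sketched as aggregating fixed-length windows of raw readings into per-window records. The window length and the choice of features (mean, root mean square, and peak absolute value) are assumptions for this sketch and are not taken from Shumpert Figures 6-7; the function name is hypothetical.

```python
import numpy as np


def extract_window_features(raw: np.ndarray, window: int) -> np.ndarray:
    """Aggregate a 1-D raw signal into rows of [mean, RMS, peak] per window.

    Trailing samples that do not fill a complete window are dropped.
    """
    n_windows = len(raw) // window
    frames = raw[: n_windows * window].reshape(n_windows, window)
    mean = frames.mean(axis=1)
    rms = np.sqrt((frames ** 2).mean(axis=1))
    peak = np.abs(frames).max(axis=1)
    return np.column_stack([mean, rms, peak])
```

Each output row plays the role of one aggregated data instance record with one column per extracted feature.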
Regarding amended Claim 10,
Claim 10 recites a non-transitory computer readable storage medium storing instructions for causing a processing circuitry to perform a process comprising claim limitations that are similar in scope to corresponding claim limitations in Claim 1, and hence is rejected under similar rationale and motivations provided by Shumpert and Yu as indicated in Claim 1. In addition, Shumpert teaches processing resources including at least one processor and memory for performing the receiving of the sensor data and training of each model using a modified k-means cluster algorithm, as well as executing instructions to operate the model store and knowledgebase associated with the system, where these instructions are located on a non-transitory computer readable storage medium (Shumpert [0034]-[0035], [0069], [0073], [0153]).
Regarding amended Claim 11,
Claim 11 recites a system comprising claim limitations that are similar in scope to corresponding claim limitations in Claim 1, and hence is rejected under similar rationale and motivations provided by Shumpert and Yu as indicated in Claim 1. In addition, Shumpert teaches processing resources including at least one processor and memory for performing the receiving of the sensor data and training of each model using a modified k-means cluster algorithm, as well as executing instructions to operate the model store and knowledgebase associated with the system, where these instructions are located on a non-transitory computer readable storage medium (Shumpert [0034]-[0035], [0069], [0073], [0153]).
Regarding original Claim 13,
Claim 13 recites the system of claim 11, where the system is further configured to perform claim limitations that are similar in scope to corresponding claim limitations in Claim 3, and hence is rejected under similar rationale provided by Shumpert in view of Yu as indicated in Claim 3, in view of rejections from Claim 11.
Regarding amended Claim 14,
Claim 14 recites the system of claim 11, where the system is further configured to perform claim limitations that are similar in scope to corresponding claim limitations in Claim 4, and hence is rejected under similar rationale provided by Shumpert in view of Yu as indicated in Claim 4, in view of rejections from Claim 11.
Regarding original Claim 15,
Claim 15 recites the system of claim 14, where the system is further configured to perform claim limitations that are similar in scope to corresponding claim limitations in Claim 5, and hence is rejected under similar rationale provided by Shumpert in view of Yu as indicated in Claim 5, in view of rejections from Claim 14.
Regarding original Claim 16,
Claim 16 recites the system of claim 15, where the system is further configured to perform claim limitations that are similar in scope to corresponding claim limitations in Claim 6, and hence is rejected under similar rationale provided by Shumpert in view of Yu as indicated in Claim 6, in view of rejections from Claim 15.
Regarding original Claim 18,
Claim 18 recites the system of claim 11, where the system is further configured to perform claim limitations that are similar in scope to corresponding claim limitations in Claim 8, and hence is rejected under similar rationale provided by Shumpert in view of Yu as indicated in Claim 8, in view of rejections from Claim 11.
Regarding original Claim 19,
Claim 19 recites the system of claim 11, where the system is further configured to perform claim limitations that are similar in scope to corresponding claim limitations in Claim 9, and hence is rejected under similar rationale provided by Shumpert in view of Yu as indicated in Claim 9, in view of rejections from Claim 11.
Claims 2 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over 
Shumpert, James Michael, U.S. PGPUB 2016/0342903, filed 5/21/2015 [hereafter referred to as Shumpert] in view of Yu, Jianbo, Health Condition Monitoring of Machines Based on Hidden Markov Model and Contribution Analysis, IEEE August 2012 [hereafter referred to as Yu] as applied to Claims 1 and 11; in further view of Kouadri et al., An adaptive threshold estimation scheme for abrupt changes detection algorithm in a cement rotary kiln, Elsevier B.V. 2013 [hereafter referred to as Kouadri].
Regarding original Claim 2, 
Shumpert in view of Yu as applied to Claim 1 teaches
The method of claim 1.
While Shumpert in view of Yu teaches adapting distance thresholds based on an instance-weighting window algorithm to detect new anomalies over the useful life of a machine, Shumpert in view of Yu does not explicitly teach
… generating, based on the analysis of the plurality of sensory inputs associated with the machine, at least one adaptive threshold for the at least one normal behavior pattern.
Kouadri teaches
… generating, based on the analysis of the plurality of sensory inputs associated with the machine, at least one adaptive threshold for the at least one normal behavior pattern (Examiner’s note: Kouadri teaches generating adaptive thresholds based on determining mean and variance values for the collected time-series based data representing healthy mode and operating conditions, and computing a confidence interval for the center coordinates of the dataset, where circles whose centers and radii lie within the confidence region are considered normal, while circles whose centers lie outside the confidence region are considered anomalous (Kouadri p.838 Section 4. Proposed methodology 2nd-5th paragraphs: “… The proposed fault detection procedures are based on the repeated collected data from the process in a healthy case and for the same operating point. … the instantaneous mean and variance are evaluated using Eqs. (1) and (2). A circle of radius R_k … is calculated in order to cover all the dataset … The circle radius is obtained after computing the center coordinates … of the dataset … For the purpose of an adaptive thresholding, it is required to compute the confidence interval for both sequences {R_1, R_2, R_3, …, R_N} and {(m_{m_{i,1}}, m_{σ_{i,1}^2}), (m_{m_{i,2}}, m_{σ_{i,2}^2}), (m_{m_{i,3}}, m_{σ_{i,3}^2}), …, (
                            
                                
                                    m
                                
                                
                                    
                                        
                                            m
                                        
                                        
                                            i
                                            ,
                                            N
                                        
                                    
                                
                            
                        
                    ,                        
                             
                            
                                
                                    m
                                
                                
                                    
                                        
                                            σ
                                        
                                        
                                            i
                                            ,
                                            N
                                        
                                        
                                            2
                                        
                                    
                                
                            
                        
                    ), i=1, …,T}. … A fault in a system can be detected, if a sampled mean and variance of measurement signals is outside the corresponding confidence interval …”; p.839 Section 5. Experimental results and analysis 1st and 2nd paragraphs: “… These experiments were conducted under healthy mode and same operating conditions. … These database parts are recorded from 18 sensors … At each sample time and for each experiment, the instantaneous mean and variance are computed and the datasets centers of these statistical parameters are obtained. … The adaptive thresholding is based on the circles of centers belonging to the obtained confidence region and radii which are calculated under the assumption that the given sampled circles radius is normally distributed. … A fault is detected if the sampled m or                         
                            
                                
                                    σ
                                
                                
                                    2
                                
                            
                        
                     of measured signals do not belong to the confidence region as illustrated in Fig.4.”; and Figures 4, 6, and 7).). 
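For illustration only, the confidence-region thresholding quoted from Kouadri above can be sketched as follows. This is a simplified sketch under stated assumptions, not Kouadri's exact construction: each sample time i gets one circle in the (m, σ²) plane, centered on the average of the N healthy (m_{i,j}, σ²_{i,j}) points, with a radius equal to the mean plus k standard deviations of the healthy points' distances to that center. The names `fit_region`, `detect_faults`, and the factor `k` are hypothetical.

```python
import numpy as np


def fit_region(m_healthy, v_healthy, k=3.0):
    """Build a per-instant circular region from N healthy experiments.

    m_healthy, v_healthy: arrays of shape (T, N) holding the instantaneous
    mean m_{i,j} and variance sigma^2_{i,j} at sample time i for experiment j.
    ASSUMPTION: radius = mean + k * std of the healthy points' distances
    to the centre (a simplification of the quoted adaptive thresholding).
    """
    centres = np.stack([m_healthy.mean(axis=1), v_healthy.mean(axis=1)], axis=1)  # (T, 2)
    points = np.stack([m_healthy, v_healthy], axis=2)                             # (T, N, 2)
    dists = np.linalg.norm(points - centres[:, None, :], axis=2)                  # (T, N)
    radii = dists.mean(axis=1) + k * dists.std(axis=1)                            # (T,)
    return centres, radii


def detect_faults(m_new, v_new, centres, radii):
    """Flag sample times whose (m, sigma^2) pair falls outside its region."""
    points = np.stack([m_new, v_new], axis=1)                                     # (T, 2)
    return np.linalg.norm(points - centres, axis=1) > radii


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    m_h = rng.normal(0.0, 0.1, size=(50, 20))        # 20 healthy runs, 50 sample times
    v_h = 1.0 + rng.normal(0.0, 0.1, size=(50, 20))
    centres, radii = fit_region(m_h, v_h)
    m_new, v_new = np.zeros(50), np.ones(50)
    m_new[10] = 5.0                                  # injected mean shift at t = 10
    print(np.flatnonzero(detect_faults(m_new, v_new, centres, radii)))
```

Because each sample time has its own center and radius, the threshold adapts along the time axis, which is the property relied upon in the motivation statement below; a fixed threshold would instead use a single radius for all T instants.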
Both Shumpert in view of Yu and Kouadri are analogous art since they both teach detecting anomalies and faults in collected sensor data associated with a machine.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to take the adaptive anomaly detection analysis taught in Shumpert in view of Yu and incorporate the adaptive threshold technique taught in Kouadri as a way to improve fault and anomaly detection from collected sensor data. The motivation to combine is taught in Kouadri, since computing confidence regions based on received time-series sensor data allows a system to specify adaptive thresholds that are based on the statistical mean and variance of the received sensor data, and allows the system to overcome the difficulties of using fixed thresholds when observing long time durations between time-series sensor data (which tend to increase the rate of false alarms and/or non-detected faults), hence improving the accuracy and robustness of the system in detecting true faults associated with the machine (Kouadri pp.837-838 Section 3 Problem Statement and p.841 Section 6 Conclusion).
Regarding original Claim 12,
Claim 12 recites the system of claim 11, where the system is further configured to perform claim limitations that are similar in scope to corresponding claim limitations in Claim 2, and hence is rejected under similar rationale and motivations provided by Shumpert in view of Yu and Kouadri as indicated in Claim 2, in view of rejections from Claim 11.
Claims 7 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over 
Shumpert, James Michael, U.S. PGPUB 2016/0342903, filed 5/21/2015 [hereafter referred as Shumpert] in view of Yu, Jianbo, Health Condition Monitoring of Machines Based on Hidden Markov Model and Contribution Analysis, IEEE August 2012 [hereafter referred as Yu] as applied to Claims 5 and 15; in further view of Mehta et al., U.S. PGPUB 2015/0149134, published 5/28/2015 [hereafter referred as Mehta].
Regarding amended Claim 7, 
Shumpert in view of Yu as applied to Claim 5 teaches
The method of claim 5 …
…  the calibrated plurality of machine behavioral models (Examiner’s note: Under its broadest reasonable interpretation, the term “calibrate” broadly indicates performing an adjustment, and hence this limitation broadly recites performing adjustments for a plurality of machine behavioral models. As indicated earlier, Yu teaches performing dynamic principal component analysis (DPCA), where the identified features and their measurement readings (collected in an observation vector) are augmented with the previous l observations in order to produce lower-dimensional principal component vectors that are further used as inputs for generating and identifying the best HMM model. This determination of the value of l is considered a form of calibration of the extracted features used to build the model. Yu additionally teaches applying the Baum-Welch EM algorithm to generate HMM models with this extracted feature data set by estimating and adjusting a model parameter set λ={A,B,π} to maximize the probability P(O|λ) of observing the extracted features, where this process of estimating and adjusting is also a form of calibrating a model based on the extracted features (observations) provided to the model. As indicated earlier, Yu teaches performing bearing run-to-failure tests to generate models for several representative bearings (“machines”), where the respective vibration data, experimental condition data, and observed failure/fault data from each bearing correspond to a different machine behavioral model. 
Yu further teaches performing two time-lagged arrangements as part of the DPCA analysis to capture the dynamics in the vibration data for bearings 1 and 2 as part of extracting feature data to build each HMM model representing each bearing, where the determination to perform these two time-lagged arrangements on the vibration data for bearings 1 and 2 during DPCA analysis corresponds to an adjustment (“calibration”) performed on each machine behavioral model (Yu pp.2203-2204 III.B. DPCA for Feature Extraction; p.2203 col.2 4th paragraph; p.2203 col.2 algorithm 2) Baum-Welch algorithm; and p.2206 Section V.A. Bearing Health Degradation Monitoring 1st paragraph).) …
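For illustration only, the time-lagged augmentation relied upon above (each observation augmented with its previous l observations before principal component analysis) can be sketched as follows. This is a generic DPCA sketch, not Yu's implementation; the function names are hypothetical, and the choice of `lags` plays the role of the value of l discussed above (two lagged arrangements corresponds to `lags=2`).

```python
import numpy as np


def lag_augment(X, lags):
    """Row t of the result is [x_t, x_{t-1}, ..., x_{t-lags}].

    X has shape (n_samples, n_features); the first `lags` samples are
    dropped because they lack a full history.
    """
    n = X.shape[0]
    return np.hstack([X[lags - k : n - k] for k in range(lags + 1)])


def dpca(X, lags, n_components):
    """Principal component scores of the lag-augmented, mean-centred data."""
    Xa = lag_augment(np.asarray(X, dtype=float), lags)
    Xc = Xa - Xa.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing singular value.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s**2 / np.sum(s**2)          # fraction of variance per component
    return Xc @ Vt[:n_components].T, explained[:n_components]
```

Changing `lags` changes both the width of the augmented observation vector and the resulting lower-dimensional scores that would feed model training, which is why the selection of l is treated above as an adjustment of the model inputs.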
… clustered at least two machine behavioral models (Examiner’s note: As indicated earlier, Yu teaches a method for performing bearing run-to-failure tests to generate models for several representative bearings (“machines”), using the first third of the whole life data set from bearings 1 and 2 (representing healthy bearing data) to construct the baseline HMM (representing the “optimal machine behavioral model”), where the selection of data related to bearings 1 and 2 to generate the baseline HMM model corresponds to a selection of a plurality of machine behavioral models (Yu pp.2205-2206 Section V. Experiment and Result Analysis). Yu further teaches that the resulting baseline HMM model generated by these two models (“generated optimal machine behavioral model”) is further used to calculate Mahalanobis distances to determine whether a bearing (i.e., machine) is in a degradation or healthy state over a period of time. This process of using a group containing two bearings (e.g., bearings 1 and 2) to build respective HMM models to generate the baseline HMM model corresponds to a process that uses a group containing two machine behavioral models to generate an optimal machine behavioral model (Yu p.2206 Figure 2 and pp.2205-2206 Section V. Experiment and Result Analysis; p.2205 Section IV.D. Application Procedure Parts 1 and 2; and pp.2206-2208 Section V.A. Bearing Health Degradation Monitoring 1st-4th paragraphs).)…
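For illustration only, the use of a healthy baseline to compute Mahalanobis distances, as described above, can be sketched as follows. This is a generic sketch, not Yu's procedure: it fits a mean and covariance directly to healthy feature vectors (standing in for the baseline built from the healthy portion of bearings 1 and 2), and the percentile threshold in `degradation_flags` is an assumption introduced here. All function names are hypothetical.

```python
import numpy as np


def fit_baseline(H):
    """Mean and (pseudo-)inverse covariance of healthy feature vectors H, shape (n, d)."""
    H = np.asarray(H, dtype=float)
    return H.mean(axis=0), np.linalg.pinv(np.cov(H, rowvar=False))


def mahalanobis(X, mu, cov_inv):
    """Mahalanobis distance of each row of X from the healthy baseline."""
    D = np.atleast_2d(np.asarray(X, dtype=float)) - mu
    sq = np.einsum("ij,jk,ik->i", D, cov_inv, D)   # d_i^2 = D_i^T S^-1 D_i
    return np.sqrt(np.maximum(sq, 0.0))            # clip tiny negative round-off


def degradation_flags(X, H, percentile=99.0):
    """ASSUMPTION: flag rows whose distance exceeds a percentile of the baseline distances."""
    mu, cov_inv = fit_baseline(H)
    threshold = np.percentile(mahalanobis(H, mu, cov_inv), percentile)
    return mahalanobis(X, mu, cov_inv) > threshold
```

Tracking `mahalanobis(...)` over successive feature vectors gives a health indicator that stays near the baseline range while a bearing is healthy and grows as it degrades.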
While Shumpert in view of Yu teaches a grouping of machine models of the same type, Shumpert in view of Yu does not explicitly teach
… determining, for each portion of the machine, at least one representative model …
… wherein the clustered at least two machine behavioral models includes each determined representative model.
Mehta teaches
… determining, for each portion of the machine, at least one representative model (Examiner’s note: Under its broadest reasonable interpretation, the phrase “each portion of the machine” is interpreted as identifying different elements of a machine (i.e., components or units of a machine). Hence, this claim limitation broadly recites determining a grouping of machine behavioral models representing components or units of a machine, and selecting or assigning a representative model from each respective group for each component of a machine. Mehta teaches storing generic definitions for types of models in computer storage, analyzing the stored current operational data obtained from sensors on machines, and applying models to the operational data to characterize the current operational behavior or determine future expected operational behavior of the system. Mehta also teaches that these generic modeling definitions describe the expected operational behavior of the machines corresponding to states of the machines, where the types of machines include physical units and components representing machine functions (i.e., groups of components or units associated with a machine). Mehta further teaches a machine health management system that compares and matches/fuses measurements from incoming streaming operational data to existing data clusters in a meta-model description containing descriptions of different components in a machine, as a way to associate the operational data with different subsystems in the machine. Mehta teaches an example of analyzing a fan component in a machine containing multiple subsystems and components (with the data clusters representing normal and abnormal behavior of a machine component). This analysis process of comparing and matching incoming streaming data with existing data clusters associated with other subsystem models to further characterize a machine is a form of assigning or associating subsystem models to elements of a machine. 
A person having ordinary skill in the art would understand that the analysis process of identifying and selecting a representative model can be performed for all defined components in the meta-model description (Mehta Figure 1, [0021]-[0022]: “… The machine operating models may include patterns, and each pattern may be associated with a different set of operating states of machines. …”; [0029]-[0032]: “… the process includes … storing generic definitions that describe expected operational behavior of types of machines that may occur in multiple systems … analyzing data that describes past operation of machines in a system … one or more computing devices … may apply the models to new data to characterize current operational behavior or determine expected future operational behavior of the system …”; Figure 8 and [0035]-[0036]: “… a model 804 is built for a fan component based on metadata 800 … Online analytics 805 is performed by comparing streaming time-series data to model 804 … matching and fusing current measurements from streaming data 807 to clusters in model 804 …”; Figure 9 and [0038]; and [0039]: “… The generic operating definitions describe expected operating behavior for types of machines (that is, types of systems, units, components, or sensors) rather than individual machines … A single type may cover multiple individual machine, and each of these individual machines may differ in some characteristics that do not define the type.”; [0041]: “… A high-level generic definition may include details that apply to systems that are organized to accomplish a particular purpose without regard to which machines may be implemented in those systems. The high-level generic definition may be instantiated by one or more lower-level generic definitions of systems of different types that are implemented. The different types of systems may be further instantiated by definitions that are specific to a given system that is currently deployed at a site.”; and [0046]-[0048]).), 
… wherein the clustered at least two machine behavioral models includes each determined representative model (Examiner’s note: Under its broadest reasonable interpretation, this limitation broadly recites a group of at least two machine behavioral models that includes respective representative models. As indicated earlier, Mehta teaches that these generic modeling definitions describe the expected operational behavior of the machines corresponding to states of the machines, where the types of machines include physical units and components representing machine functions (i.e., groups of components or units associated with a machine). Mehta additionally teaches that the generic operating definitions store different levels of knowledge about the specific machine used, where the high-level definition contains details without regard to which specific machine may be implemented, with the lower-level definitions instantiating a more specific definition related to the machine being currently deployed. A person having ordinary skill in the art would understand that these generic definitions represent a grouping of machine model types, where each machine model type grouping contains a high-level definition (i.e., a grouping of at least one machine behavioral model) that includes multiple instantiations of specific models (i.e., the determined representative model), where these specific models are being used for characterization and analysis with respect to the incoming streaming data (Mehta [0039] and [0041]). 
As indicated earlier, Mehta further teaches an example system such as a fan component being built and modeled through a series of subsystems, and shows an actual system representing this fan component such as a jet engine propulsion system, which contains similar components such as an inlet flow and outlet flow, and respective compressors, turbines, and fuel injection components, where each of these components contains sensors and thus can be modeled and monitored as a different machine behavioral model. Hence, this fan component system corresponds to a group of at least two machine behavioral models, with each of these subsystem models being represented by the corresponding metadata and associated analytics and parameters, and associated with a jet propulsion system (containing corresponding compressor, turbine, and fuel injection components), resulting in each of these subsystem models including each determined representative model (i.e., the corresponding components in a jet propulsion system) (Mehta Figure 8, [0036]: “FIG. 8 shows an example machine health management system for generating models such as model 804 based on batch analytics 802, historical data 801, and metadata 800. For example, the metadata 800 may describe machines in a system and relationships between these machines. In the example, the metadata describes 12 subsystems, 24 parameters that are measured in those subsystems, and 8 dimensions to the measured parameters. The metadata 800 may include one or more generic definitions, one or more sub-definitions, and one or more relationships between definitions. The metadata 800 provides an organization, schema, or context in which the historical data may be analyzed … a model 804 is built for a fan component based on metadata 800 that describes which sensors relate to the fan component and optionally how these sensors relate to the fan component …”; and Figure 9, [0038]: “FIG. 9 shows an example system or unit with several components and several sensors that may measure operation of these components. The example system of FIG.9 is just one example system that may be monitored and diagnosed by the machine health management system of FIG. 8. As shown, the system is a jet engine propulsion system that has an inlet flow via an inlet. Compressors compress airflow to the combustion chamber where combustion is facilitated by a fuel injection component. Turbines operate due to the pressure from the combustion chamber, and byproduct escapes via the core exit flow.”).).
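For illustration only, the per-component matching of streaming measurements to existing clusters described in the Mehta passages above can be sketched as a nearest-centroid lookup. This is a hypothetical sketch, not Mehta's system: the dictionary layout (component name mapped to behavior labels and centroids), the Euclidean distance, and all names are assumptions introduced here.

```python
import numpy as np


def match_to_clusters(sample, clusters):
    """Return (label, distance) of the centroid in `clusters` nearest to `sample`.

    clusters: hypothetical dict mapping a behaviour label (e.g. "normal")
    to a centroid learned offline from historical data.
    """
    labels = list(clusters)
    centroids = np.array([clusters[label] for label in labels], dtype=float)
    d = np.linalg.norm(centroids - np.asarray(sample, dtype=float), axis=1)
    i = int(np.argmin(d))
    return labels[i], float(d[i])


def characterize(stream, component_models):
    """Label each streaming record {component: measurement vector} per component."""
    return [
        {comp: match_to_clusters(x, component_models[comp])[0] for comp, x in record.items()}
        for record in stream
    ]


if __name__ == "__main__":
    models = {  # hypothetical per-component models
        "fan": {"normal": [1.0, 0.0], "abnormal": [5.0, 5.0]},
        "compressor": {"normal": [0.0, 2.0], "abnormal": [3.0, -1.0]},
    }
    stream = [{"fan": [1.1, 0.1], "compressor": [0.2, 1.9]},
              {"fan": [4.8, 5.2], "compressor": [2.9, -0.8]}]
    print(characterize(stream, models))
```

Keeping one cluster model per component mirrors the idea that each portion of the machine (e.g., fan, compressor) is characterized by its own model, while the stream itself is processed record by record.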
Both Shumpert in view of Yu and Mehta are analogous art since they both teach performing monitoring of received sensor data using machine behavioral models.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to take the model generation and calibration steps taught in Shumpert in view of Yu and incorporate the process of grouping models taught in Mehta as a way to share and combine learnings from related behaviors of the models within a common group. The motivation to combine is taught in Mehta, since identifying and grouping models according to similar characteristics allows sharing of similar characteristics between models of the same group, as well as learning behaviors from each model that operate under certain operating states, thus making a system that performs this grouping more computationally and storage efficient, as only the representative characteristics and behaviors are stored and learned for each group of models (Mehta [0060]-[0061]).
Regarding amended Claim 17,
Claim 17 recites the system of claim 15, where the system is further configured to perform claim limitations that are similar in scope to corresponding claim limitations in Claim 7, and hence is rejected under similar rationale and motivations provided by Shumpert in view of Yu and Mehta as indicated in Claim 7, in view of rejections from Claim 15.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to WILLIAM WAI YIN KWAN whose telephone number is 303-297-4332. The examiner can normally be reached Monday-Friday 8:00am - 4:30pm PT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li B Zhen can be reached on 571-272-3768. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/WILLIAM WAI YIN KWAN/Examiner, Art Unit 2121                                                                                                                                                                                                        


/Li B. Zhen/Supervisory Patent Examiner, Art Unit 2121