DETAILED ACTION
1.	This request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 3 June 2021 has been entered. 
Furthermore, this action is in response to the amendments filed 3 June 2021 for application 15/695694 filed 5 September 2017. The claim and specification objections have been withdrawn in light of the amendments; however, new claim objections are set forth in this action. Claims 1, 4, and 6-10 are currently pending; claims 2, 3, and 5 have been canceled; claims 9 and 10 are new.
  Should applicant desire to obtain the benefit of foreign priority under 35 U.S.C. 119(a)-(d) prior to declaration of an interference, a certified English translation of the foreign application must be submitted in reply to this action.  37 CFR 41.154(b) and 41.202(e).
Failure to provide a certified translation may result in no benefit being accorded for the non-English application.

  
Notice of Pre-AIA or AIA Status
The present application is being examined under the pre-AIA first to invent provisions.

Claim Rejections - 35 USC § 101
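35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
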
Claims 1, 4, and 6-10 are rejected under 35 U.S.C. 101 because the claims are directed to an abstract idea and because the claims as a whole, considering all claim elements both individually and in combination, do not amount to significantly more than the abstract idea. See Alice Corporation Pty. Ltd. v. CLS Bank International, et al., 573 U.S. 208 (2014).
As an initial matter, according to the first part of the Alice analysis (Step 1), the claims were determined to be directed to one of the four statutory categories: an article of manufacture, a method/process (claim 7), a machine/system/product (claims 1, 4, 6, 8-10), and/or a composition of matter.
Secondly, based on the claims being determined to be within one of the four categories (i.e., process, machine, manufacture, or composition of matter), it must be determined whether the claims are directed to a judicial exception (i.e., law of nature, natural phenomenon, or abstract idea) (Step 2A). This step consists of a two-prong inquiry: (1) Does the claim recite an abstract idea, law of nature, or natural phenomenon? and (2) Does the claim recite additional elements that integrate the judicial exception into a practical application?
Claims 1, 4, and 6-10 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The claims recite mathematical concepts and mental processes. This judicial exception is not integrated into a practical application because the generically recited computer elements do not add meaningful limitations. The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception, as discussed in the following analysis.
Regarding independent claims 1, 7, and 8, the following analysis shows that the limitations recite the judicial exception of an abstract idea in the mathematical concepts and mental processes groups and do not recite additional elements that integrate the judicial exception into a practical application.
Claim 1 does not satisfy the two-Prong Test as explained in the analysis of each limitation below:
Step 2A
Prong 1:
… a plurality of sets of first data collected for each operation of an evaluation target, each of the set of first data having both a time width and a condition of an operation, and a plurality of sets of second data including a measured values of the evaluation target obtained by measuring operating states thereof, each of the set of second data being measured within a time width shorter than the time width of the set of first data; (Yes) The claim, under its broadest reasonable interpretation, recites the mental concepts of (sensed) information having a temporal extent and condition associated with an action/operation with values associated with that information corresponding to observed/measured states and target values such that the measured/observed state information covers a different (narrower) temporal extent than that associated with the target values. The mere recitation of a generic computer device/processors to represent these data does not take the claim limitation out of the mental processes group.
… generate a set of characteristic data from both a set of first data and at least a set of second data, the set of first data being included in plurality of sets of first data, the at least set of second data being associated in time information with the set of first data, the at least set of second data being included in the plurality of sets of second data, the set of characteristic data representing a plurality of characteristics; (Yes) The claim, under its broadest reasonable interpretation, recites the mental steps of associating data according to time and determining/identifying features/characteristics of data using both operational target and operational state information. The mere recitation of a generic computer device to perform this association and identification does not take the claim limitation out of the mental processes group.
define a plurality of classes for the generated plurality of sets of characteristic data (Yes) The claim, under its broadest reasonable interpretation, recites the mental step of identifying/grouping data into classes/categories. The mere recitation of a generic computer device to perform this step does not take the claim limitation out of the mental processes group.
divide the plurality of sets of characteristic data into a plurality of groups on the basis of the defined plurality of classes and condition of operations included in the set of first data … (Yes) The claim, under its broadest reasonable interpretation, recites the mental steps of dividing/organizing/grouping data/information according to their characteristics/categories. The mere recitation of a generic computer device to perform these steps does not take the claim limitation out of the mental processes group.
and determine whether there is an abnormality in the evaluation target using a first difference between characteristic data predicted using the first conditional models and the characteristic data and a second difference between characteristic data predicted using the overall model and the characteristic data. (Yes) The claim, under its broadest reasonable interpretation, recites the mathematical steps of determining an abnormality of an operating state through the evaluation of differences computed between each predictive model and (observed) characteristic data. The mere recitation of a generic computer device to perform these calculations/determinations does not take the claim limitation out of the mental processes and mathematical concepts groups.
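For illustration only, and not as a characterization of applicant's disclosed implementation, the grouping, model generation, and difference-based determination recited above can be sketched as follows. Every specific in the sketch is an assumption made for the example: the function names (build_models, differences, is_abnormal), the use of a simple mean as each "model", the (class, condition) group key, the two limits, and the rule that flags an abnormality when either difference exceeds its limit. The claim itself does not specify the form of the models or how the two differences are combined.

```python
from collections import defaultdict
from statistics import mean


def build_models(samples):
    """Build per-group and overall models from a list of samples.

    samples: list of (class_label, condition, value) tuples.
    Assumption: each "model" is just the mean of its training values.
    """
    groups = defaultdict(list)
    for class_label, condition, value in samples:
        # Group key combines the defined class and the operating condition.
        groups[(class_label, condition)].append(value)
    conditional_models = {key: mean(values) for key, values in groups.items()}
    # The overall model ignores classes and conditions entirely.
    overall_model = mean(value for _, _, value in samples)
    return conditional_models, overall_model


def differences(sample, conditional_models, overall_model):
    """Return (first_difference, second_difference) for one sample."""
    class_label, condition, value = sample
    first = abs(value - conditional_models[(class_label, condition)])
    second = abs(value - overall_model)
    return first, second


def is_abnormal(sample, conditional_models, overall_model,
                first_limit, second_limit):
    """Assumed decision rule: abnormal if either difference exceeds its limit."""
    first, second = differences(sample, conditional_models, overall_model)
    return first > first_limit or second > second_limit


if __name__ == "__main__":
    training = [
        ("A", "heating", 10.0), ("A", "heating", 11.0), ("A", "heating", 9.0),
        ("B", "cooling", 20.0), ("B", "cooling", 21.0), ("B", "cooling", 19.0),
    ]
    cond, overall = build_models(training)
    print(is_abnormal(("A", "heating", 30.0), cond, overall, 3.0, 12.0))
```

The sketch shows that each recited step reduces to grouping values, averaging them, and comparing two absolute differences against limits.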
Prong 2 (No): The claim recites additional elements:
An evaluation device, comprising: a storage …   - The processors and memory/storage in the computer system that perform the mathematical and mental steps of generating, associating, determining, defining, dividing, and evaluating are recited at a high level of generality and are no more than mere instructions to apply the exception using a generic computer component.
storing … stored in the storage  – The function of storing data is a mere data gathering step that is recited at a high level of generality that does not impose a meaningful limit on the judicial exception. 
At least one processor configured to execute instructions to: … the at least one processor is further configured to execute the instructions to – The processors and instructions in the computer system that perform the mathematical and mental steps of generating, associating, determining, defining, dividing, evaluating, representing, and learning (a condition) are recited at a high level of generality and are no more than mere instructions to apply the exception using a generic computer component.
generate a first conditional model for each one of the plurality of groups, each of the first conditional models corresponding to a combination of conditions of the characteristic data of a respective one of the plurality of groups; generate an overall model for all of the plurality of sets of characteristic data considering no combination of conditions of the characteristic data – the generation of conditional and overall (non-conditional) models from characteristic data (based on combinations or not based on combinations of conditions) is recited at a high level of generality that simply links to a field of use (model formation/training) and therefore is only generally linked to the judicial exception.
None of these additional elements integrate the judicial exception into a practical application because the computing devices and the training of a machine learning model are recited at a high level of generality and correspond to generic computer functions.  
In addition, according to the second part of the Alice/Mayo test (Step 2B), it must be determined whether the claim as a whole recites something significantly more than the judicial exception, when the claim elements are considered both individually and as an ordered combination. The recitation in the preamble is insufficient to transform a judicial exception into a patentable invention because the preamble elements are recited at a high level of generality and are simply linked to a field of use, see MPEP 2106.05(h). The examiner further notes that the claim limitation(s) below are deemed insufficient to transform a judicial exception into a patentable invention, as described in the analysis that follows:
The elements in the limitations below are insufficient to transform a judicial exception to a patentable invention because the recited elements are considered insignificant extra-solution activity, see MPEP 2106.05(g):
Generic computer implemented method, processing resources as noted above.
storing … stored in the storage – It is noted that the claimed extra-solution activity of data gathering is acknowledged to be well-understood, routine, conventional (WURC) activity (see, e.g., court-recognized WURC examples in MPEP 2106.05(d)(II)(i)). Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept.
generate a first conditional model for each one of the plurality of groups, each of the first conditional models corresponding to a combination of conditions of the characteristic data of a respective one of the plurality of groups; generate an overall model for all of the plurality of sets of characteristic data considering no combination of conditions of the characteristic data – as noted above. It is also noted that the generation of conditional models and (non-conditional) overall models is a well-known and conventional function. See, for example, Radhakrishnan et al. (“A Comparison between Polynomial and Locally Weighted Regression for Fault Detection and Diagnosis of HVAC Equipment”, IECON 2006-32nd Annual Conference on IEEE Industrial Electronics, IEEE, 2006, pp. 3668-3673), viz., [p. 3669, Section IB; p. 3669, Section II; p. 3671, Section IIIB]: “In this paper we present a comparison between one local and one global model for the regression step in the context of fault detection and diagnosis of HVACs. We use locally weighted regression for the local model and polynomial regression for the global model.”; “Such a model can either be a global or a local model. In the global model such as polynomial regression, each training point has the same influence on the model. In the local model such as locally weighted regression, the training points closer to the query have more influence on the model than the farther ones.”; “One alternative method for non-linear regression that also has close ties with ordinary linear regression is locally weighted regression (LWR) [3]. LWR operates by computing a custom model for each specific query point in input space, only after this query point is known. While in polynomial regression each training point has the same influence in determining the coefficients of the global model, in LWR the training points nearer to the query point have much more influence on the coefficients of the polynomial than the training points that are farther away.”
That is, a global/overall model is generated over all training points/characteristic data (all combinations of conditions), and a local/conditional model is generated over a smaller, contextualized/conditional set of training points/characteristic data for performing anomaly detection.
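The global versus local distinction described in the cited passages of Radhakrishnan et al. can be illustrated with a minimal sketch. It is not taken from the reference: it assumes a one-dimensional input, a first-degree polynomial as the global model, and a Gaussian weighting kernel for the local model, and the names weighted_linear_fit, global_predict, lwr_predict, and bandwidth are hypothetical.

```python
import math


def weighted_linear_fit(xs, ys, ws):
    """Closed-form weighted least squares for y = a + b * x."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    b = sxy / sxx if sxx > 0 else 0.0
    a = my - b * mx
    return a, b


def global_predict(xs, ys, query):
    """Global model: every training point has the same weight."""
    a, b = weighted_linear_fit(xs, ys, [1.0] * len(xs))
    return a + b * query


def lwr_predict(xs, ys, query, bandwidth=1.0):
    """Local model: a fit is computed per query point, and training
    points nearer to the query receive larger (Gaussian) weights."""
    ws = [math.exp(-((x - query) ** 2) / (2 * bandwidth ** 2)) for x in xs]
    a, b = weighted_linear_fit(xs, ys, ws)
    return a + b * query


if __name__ == "__main__":
    xs = [i * 0.5 for i in range(-6, 7)]
    ys = [x * x for x in xs]  # non-linear data, true value at 0 is 0
    print("global:", global_predict(xs, ys, 0.0))
    print("local: ", lwr_predict(xs, ys, 0.0, bandwidth=0.5))
```

On the non-linear data in the usage example, the local prediction at the query point lies closer to the true value than the global prediction, which matches the behavior the reference attributes to locally weighted regression.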
As discussed in the Step 1, Step 2A Prongs 1 and 2, and Step 2B analyses, the limitations of claim 1, examined individually or as an ordered combination, recite no meaningful limitations that amount to significantly more than the exception itself. In particular, there is no indication that the combination of elements improves the functioning of a computer or improves another technology. Therefore, when looking at the claim elements individually or as an ordered combination, claim 1 does not recite elements identified by the courts as “significantly more”.

Claim 7 does not satisfy the two-Prong Test as explained in the analysis of each limitation below:
Step 2A
Prong 1:
An evaluation method, comprising: generating, …, a plurality of sets of first data collected for each operation of an evaluation target, each of the set of first data having both a time width and a condition of an operation, a plurality of sets of second data including a measured values of the evaluation target … data being associated in time information with the set of first data, the at least set of second data being included in the plurality of sets of second data, the set of characteristic data representing a plurality of characteristics; (Yes) The claim, under its broadest reasonable interpretation, recites the mental concepts of forming/generating sets of data/information having a temporal extent and condition associated with an action/operation with values associated with that information corresponding to target values and to characteristics associated with those values (evaluation target data/state data). The mere recitation of a generic computer device/processors to generate these sets of data does not take the claim limitation out of the mental processes group.
defining a plurality of classes for the generated plurality of sets of characteristic data; (Yes) The claim, under its broadest reasonable interpretation, recites the mental step of identifying/grouping data into classes/categories. The mere recitation of a generic computer device to perform this step does not take the claim limitation out of the mental processes group.
dividing the plurality of sets of characteristic data into a plurality of groups on the basis of the plurality of defined classes and condition of operations included in the set of first data; (Yes) The claim, under its broadest reasonable interpretation, recites the mental steps of dividing/organizing/grouping data/information according to their characteristics/categories. The mere recitation of a generic computer device to perform these steps does not take the claim limitation out of the mental processes group.
and determining whether there is an abnormality in the evaluation target using a first difference between characteristic data predicted using the first conditional models and the characteristic data and a second difference between characteristic data predicted using the overall model and the characteristic data. (Yes) The claim, under its broadest reasonable interpretation, recites the mathematical steps of determining an abnormality of an operating state through the evaluation of differences computed between each predictive model and (observed) characteristic data. The mere recitation of a generic computer device to perform these calculations/determinations does not take the claim limitation out of the mental processes and mathematical concepts groups.
Prong 2 (No): The claim recites additional elements:
By a computer …   - The processors in the computer system that perform the mental steps of generating, defining, dividing, and evaluating are recited at a high level of generality and are no more than mere instructions to apply the exception using a generic computer component.
generate a first conditional model for each one of the plurality of groups, each of the first conditional models corresponding to a combination of conditions of the characteristic data of a respective one of the plurality of groups; generate an overall model for all of the plurality of sets of characteristic data considering no combination of conditions of the characteristic data – the generation of conditional and overall (non-conditional) models from characteristic data (based on combinations or not based on combinations of conditions) is recited at a high level of generality that simply links to a field of use (model formation/training) and therefore is only generally linked to the judicial exception.
None of these additional elements integrate the judicial exception into a practical application because the computing devices and the training of a machine learning model are recited at a high level of generality and correspond to generic computer functions.  
In addition, according to the second part of the Alice/Mayo test (Step 2B), it must be determined whether the claim as a whole recites something significantly more than the judicial exception, when the claim elements are considered both individually and as an ordered combination. The recitation in the preamble is insufficient to transform a judicial exception into a patentable invention because the preamble elements are recited at a high level of generality and are simply linked to a field of use, see MPEP 2106.05(h). The examiner further notes that the claim limitation(s) below are deemed insufficient to transform a judicial exception into a patentable invention, as described in the analysis that follows:
The elements in the limitations below are insufficient to transform a judicial exception to a patentable invention because the recited elements are considered insignificant extra-solution activity, see MPEP 2106.05(g):
Generic computer implemented method, processing resources as noted above.
generate a first conditional model for each one of the plurality of groups, each of the first conditional models corresponding to a combination of conditions of the characteristic data of a respective one of the plurality of groups; generate an overall model for all of the plurality of sets of characteristic data considering no combination of conditions of the characteristic data – as noted above. It is also noted that the generation of conditional models and (non-conditional) overall models is a well-known and conventional function. See, for example, Radhakrishnan et al. (“A Comparison between Polynomial and Locally Weighted Regression for Fault Detection and Diagnosis of HVAC Equipment”, IECON 2006-32nd Annual Conference on IEEE Industrial Electronics, IEEE, 2006, pp. 3668-3673), viz., [p. 3669, Section IB; p. 3669, Section II; p. 3671, Section IIIB]: “In this paper we present a comparison between one local and one global model for the regression step in the context of fault detection and diagnosis of HVACs. We use locally weighted regression for the local model and polynomial regression for the global model.”; “Such a model can either be a global or a local model. In the global model such as polynomial regression, each training point has the same influence on the model. In the local model such as locally weighted regression, the training points closer to the query have more influence on the model than the farther ones.”; “One alternative method for non-linear regression that also has close ties with ordinary linear regression is locally weighted regression (LWR) [3]. LWR operates by computing a custom model for each specific query point in input space, only after this query point is known. While in polynomial regression each training point has the same influence in determining the coefficients of the global model, in LWR the training points nearer to the query point have much more influence on the coefficients of the polynomial than the training points that are farther away.”
That is, a global/overall model is generated over all training points/characteristic data (all combinations of conditions), and a local/conditional model is generated over a smaller, contextualized/conditional set of training points/characteristic data for performing anomaly detection.
As discussed in the Step 1, Step 2A Prongs 1 and 2, and Step 2B analyses, the limitations of claim 7, examined individually or as an ordered combination, recite no meaningful limitations that amount to significantly more than the exception itself. In particular, there is no indication that the combination of elements improves the functioning of a computer or improves another technology. Therefore, when looking at the claim elements individually or as an ordered combination, claim 7 does not recite elements identified by the courts as “significantly more”.

Claim 8 does not satisfy the two-Prong Test as explained in the analysis of each limitation below:
Step 2A
Prong 1:
…: generating, …, a plurality of sets of first data collected for each operation of an evaluation target, each of the set of first data having both a time width and a condition, a plurality of sets of second data including a measured values of the evaluation target obtained by measuring operating states thereof, each of the set of second data being measured within a time width shorter than the time width of the set of first data, and a set of characteristic data from both a set of first data and at least a set of second data, the set of first data being included in the plurality of sets of first data, the at least set of second data being associated in time information with the set of first data, the at least set of second data being included in the plurality of sets of second data, the set of characteristic data representing a plurality of characteristics; (Yes) The claim, under its broadest reasonable interpretation, recites the mental concepts of forming/generating sets of data/information having a temporal extent and condition associated with an action/operation with values associated with that information corresponding to target values and to characteristics associated with those values (evaluation target data/state data). The mere recitation of a generic computer device/processors to generate these sets of data does not take the claim limitation out of the mental processes group.
defining a plurality of classes for the generated plurality of sets of characteristic data; (Yes) The claim, under its broadest reasonable interpretation, recites the mental step of identifying/grouping data into classes/categories. The mere recitation of a generic computer device to perform this step does not take the claim limitation out of the mental processes group.
dividing the plurality of sets of characteristic data into a plurality of groups on the basis of the plurality of defined classes and condition of operations included in the set of first data (Yes) The claim, under its broadest reasonable interpretation, recites the mental steps of dividing/organizing/grouping data/information according to their characteristics/categories. The mere recitation of a generic computer device to perform these steps does not take the claim limitation out of the mental processes group.
and determining whether there is an abnormality in the evaluation target using a first difference between characteristic data predicted using the first conditional models and the characteristic data and a second difference between characteristic data predicted using the overall model and the characteristic data. (Yes) The claim, under its broadest reasonable interpretation, recites the mathematical steps of determining an abnormality of an operating state through the evaluation of differences computed between each predictive model and (observed) characteristic data. The mere recitation of a generic computer device to perform these calculations/determinations does not take the claim limitation out of the mental processes and mathematical concepts groups.
Prong 2 (No): The claim recites additional elements:
non-transitory computer-readable recording medium storing an evaluation program for causing a computer to execute:…, by a computer,…   - The processors, memory, and program in the computer system that perform the mental steps of generating, defining, dividing, and evaluating are recited at a high level of generality and are no more than mere instructions to apply the exception using a generic computer component. 
generating a first conditional model for each one of the plurality of groups, each of the first conditional models corresponding to a combination of conditions of the characteristic data of a respective one of the plurality of groups; generating an overall model for all of the plurality of sets of characteristic data considering no combination of conditions of the characteristic data – the generation of conditional and overall (non-conditional) models from characteristic data (based on combinations or not based on combinations of conditions) is recited at a high level of generality that simply links to a field of use (model formation/training) and therefore is only generally linked to the judicial exception.
None of these additional elements integrate the judicial exception into a practical application because the computing devices and the training of a machine learning model are recited at a high level of generality and correspond to generic computer functions.  
In addition, according to the second part of the Alice/Mayo test (Step 2B), it must be determined whether the claim as a whole recites something significantly more than the judicial exception, when the claim elements are considered both individually and as an ordered combination. The recitation in the preamble is insufficient to transform a judicial exception into a patentable invention because the preamble elements are recited at a high level of generality and are simply linked to a field of use, see MPEP 2106.05(h). The examiner further notes that the claim limitation(s) below are deemed insufficient to transform a judicial exception into a patentable invention, as described in the analysis that follows:
The elements in the limitations below are insufficient to transform a judicial exception to a patentable invention because the recited elements are considered insignificant extra-solution activity, see MPEP 2106.05(g):
Generic computer implemented method, processing resources as noted above.
generating a first conditional model for each one of the plurality of groups, each of the first conditional models corresponding to a combination of conditions of the characteristic data of a respective one of the plurality of groups; generating an overall model for all of the plurality of sets of characteristic data considering no combination of conditions of the characteristic data – as noted above. It is also noted that the generation of conditional models and (non-conditional) overall models is a well-known and conventional function. See, for example, Radhakrishnan et al. (“A Comparison between Polynomial and Locally Weighted Regression for Fault Detection and Diagnosis of HVAC Equipment”, IECON 2006-32nd Annual Conference on IEEE Industrial Electronics, IEEE, 2006, pp. 3668-3673), viz., [p. 3669, Section IB; p. 3669, Section II; p. 3671, Section IIIB]: “In this paper we present a comparison between one local and one global model for the regression step in the context of fault detection and diagnosis of HVACs. We use locally weighted regression for the local model and polynomial regression for the global model.”; “Such a model can either be a global or a local model. In the global model such as polynomial regression, each training point has the same influence on the model. In the local model such as locally weighted regression, the training points closer to the query have more influence on the model than the farther ones.”; “One alternative method for non-linear regression that also has close ties with ordinary linear regression is locally weighted regression (LWR) [3]. LWR operates by computing a custom model for each specific query point in input space, only after this query point is known. While in polynomial regression each training point has the same influence in determining the coefficients of the global model, in LWR the training points nearer to the query point have much more influence on the coefficients of the polynomial than the training points that are farther away.”
That is, a global/overall model is generated over all training points/characteristic data (all combinations of conditions), and a local/conditional model is generated over a smaller, contextualized/conditional set of training points/characteristic data for performing anomaly detection.
As discussed in the Step 1, Step 2A Prongs 1 and 2, and Step 2B analyses, the limitations of claim 8, examined individually or as an ordered combination, recite no meaningful limitations that amount to significantly more than the exception itself. In particular, there is no indication that the combination of elements improves the functioning of a computer or improves another technology. Therefore, when looking at the claim elements individually or as an ordered combination, claim 8 does not recite elements identified by the courts as “significantly more”.
Furthermore, regarding dependent claims 4, 6, 9, and 10, which depend on claim 1, the recited limitations do not include elements identified by the courts as “significantly more”. The examiner notes that the elements of the dependent claims are deemed insufficient to transform a judicial exception into a patentable invention and are considered part of the abstract idea, as noted below:
Claim 4:
Step 2A
Prong 1 (Yes): wherein … to calculate the characteristic data predicted using the first conditional models by applying each of the set of characteristic data of the respective one of the plurality of groups to a corresponding first conditional model, evaluate an operating state of the evaluation target on the basis of the first difference and the second difference. (Yes) The claim, under its broadest reasonable interpretation, recites the mathematical calculation of predictive characteristic data associated with a group, the calculation of two differences between this data and characteristic data, and the calculation/evaluation of an operating state based on the differences. The mere recitation of a generic computer device to perform this calculation does not take the claim limitation out of the mental processes and mathematical concepts groups.
Prong 2 (No): The claim recites one additional element:
at least one processor is configured to execute the instructions to: - The processors and instructions in the computer system that perform the mathematical calculations of predictive characteristic data and an evaluative difference are recited at a high level of generality and are no more than mere instructions to apply the exception using a generic computer component.
Step 2B: The claim does not recite additional elements that the courts have identified as “significantly more” for the same reasons as pointed out in Claim 1 (i.e., generic computing resources for implementing a computer method are considered insignificant extra-solution activity (MPEP 2106.05(g))).
Claim 6:
Step 2A
Prong 1 (Yes):
wherein each of the characteristic data includes a plurality of characteristic label data representing measured values of the evaluation target by vectors; (Yes) The claim, under its broadest reasonable interpretation, recites mental steps of representing characteristic label data using vectors. The mere recitation of a generic computer device to perform this representation does not take the claim limitation out of the mental processes and mathematical concepts groups.
Prong 2 (No): The claim does not recite any additional element.
Step 2B: The claim does not recite additional elements that the courts have identified as “significantly more” for the same reasons as pointed out in Claim 1.
Claim 9:
Step 2A
Prong 1 (Yes):
wherein each of the first conditional models is a model representing a relationship between the plurality of sets of the characteristic data of the respective one of the plurality of groups; (Yes) The claim, under its broadest reasonable interpretation, recites the mental step of representing relationships between characteristics using a model for each group. The mere recitation of a generic computer device to perform this representation does not take the claim limitation out of the mental processes and mathematical concepts groups.
Prong 2 (No): The claim does not recite any additional element.
Step 2B: The claim does not recite additional elements that the courts have identified as “significantly more” for the same reasons as pointed out in Claim 1.
Claim 10:
Step 2A
Prong 1 (Yes):
wherein the first model is a model configured to calculate first label data, which is included in each set of the plurality of sets of the characteristic data, on the basis of second label data that is a remainder of each set of the plurality of sets of the characteristic data. (Yes)  The claim, under its broadest reasonable interpretation, recites the mathematical calculation of a label for characteristic data using other labeled characteristic data. The mere recitation of a generic computer device to perform this calculation does not take the claim limitation out of the mental processes and mathematical concepts groups. 
Prong 2 (No): The claim does not recite any additional element.
Step 2B: The claim does not recite additional elements that the courts have identified as “significantly more” for the same reasons as pointed out in Claim 1. 

Therefore, as a whole, claims 4, 6, and 9-10 do not recite what the courts have identified as "significantly more".
In summary, as shown in the analysis above, claims 1, 4, and 6-10 do not provide any additional elements that, when considered individually or as an ordered combination, amount to significantly more than the abstract idea identified. Therefore, as a whole, claims 1, 4, and 6-10 do not recite what the courts have identified as "significantly more". In particular, there is no indication that the combination of elements improves the functioning of a computer or improves another technology when the claims are considered individually or as an ordered combination.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1, 4, and 6-10 are rejected under 35 U.S.C. 103 as being unpatentable over Gil Thieberger (US2013/0103624), hereinafter referred to as Thieberger, in view of Jianbo Yu (“Hidden Markov models combining local and global information for nonlinear and multimodal process monitoring”, Journal of Process Control 20, 2010, pp. 344-359), hereinafter referred to as Yu.

In regards to claim 1, Thieberger teaches An evaluation device, comprising: a storage storing a plurality of sets of first data collected for each operation of an evaluation target, each of the set of first data having both a time width and a condition of an operation, ([0065, 0069, 0235, 0321, 0384, Figure 8], The term "temporal window of token instances", also referred to as "window", refers to a set of token instances and other optional values, which correspond to a temporal scope defined by the window. In one example, the window may contain token instances for which at least some portion of their existence took place within the temporal scope that defines the window., Optionally, a target value may be associated with a temporal window of token instances or with one or more token instances. For example, a target value may be an emotional state prediction of the user, or a value derived from the user measurement channels., In one embodiment, tokens may be grouped according to various criteria such as the tokens' typical context, and/or location of experience by the user. In one example, a high-level token group may be "activity type" which will typically include activities that may last hours like "watching a movie", "rock climbing", "reading a book", "surfing the web"., Optionally, the data in the database is accessible as a collection of temporal windows of token instances and their corresponding annotations….
Optionally, additional information is incorporated into the vectors of the training data, such as variables identifying the situation in which the user is at the time, and/or variables describing a predicted baseline level., wherein a method for predicting a user's response after being exposed to a stream of token instances uses a representation for the stream of token instances as multiple vectors, wherein the vectors represent consecutive temporal window of token instances of a substantially fixed duration, for example 10 seconds, wherein a dataset contains data associated with a set of token instances each of which is an operational event (for example the display of stimuli to a user) from which target information in the form, for example, of the emotional response of the user (so that the determination of the user’s response is the evaluation target) is to be determined such that the span of the token instance corresponds to  the duration of a temporal window (time width) associated with that token instance, wherein this duration may be variable (to correspond to the duration of a scene of interest for example) or fixed at 10 seconds, and wherein each operation (and associated dataset) has condition attributes such as the context/situation of the tokens, the user who is experiencing the stimuli, and the various specific tokens or combinations of tokens (array of stimuli).) 
and a plurality of sets of second data including measured values of the evaluation target obtained by measuring operating states thereof, each of the set of second data being measured within a time width shorter than the time width of the set of first data; ([0053, 0219, 0225, 0344], The term "user measurement channels", … refer to physiological measurements and/or measurements of unsolicited behavioral cues of the user, which may be either raw measurements and/or processed measurements … such as heart-rate (HR), Blood-Volume Pulse (BVP), Galvanic Skin Response (GSR), Skin Temperature (ST), respiration, electroencephalography (EEG), electrocardiography (ECG), electromyography (EMG), Electrodermal Activity (EDA), and others., In some cases, the user's affective response to token instances may be measured by the sensor substantially continually throughout the period in which the user is exposed to the token instances., In one embodiment, some values of the user measurement channels are stored in a database as time series with short durations between consecutive measurement points., Optionally, the user measurement channels may be stored at different time resolutions, for example, values of EEG signals may be stored every 50 milliseconds, while skin temperature may be stored every two seconds., wherein the raw user emotional response (values for evaluation target) data form the state data such that the measurement period (sampling frequency/time width) for the channel data is less than the token temporal window duration in order to detect changes in the user measurement channel in response to the token instance stimulus such that the measurement channel may include continually sampled measurements at high sample rates (EEG/heart rate – measured values obtained by measuring operating states) relative to the duration of the token window (first data).)
and at least one processor configured to execute instructions to: generate a set of characteristic data from both a  set of first data and at least a set of second data, the set of first data being included in plurality of sets of first data, the at least set of second data being associated in time information with the set of first data, the at least set of second data being included in the plurality of sets of second data, the set of characteristic data representing a plurality of characteristics; ([0200, 0201, 0226, 0087], For example, measurements of the user used that are taken at temporal proximity to the beginning of the exposure of the user to the token instances, may be taken a few seconds before and/or possibly a few seconds after the beginning of the exposure (some measurement channels such as GSR or skin temperature may change relatively slowly compared to fast changing measurement channel such as EEG)., For example, a response may be average of user channel measurement values (e.g., heart rate, GSR) taken during the exposure to the token instances. In another example, a response is a weighted average of values; for instance, user measurement values used to derive the response may be weighted according to the attention of the user as measured at when the user measurements were taken., The user measurement data may be processed and/or normalized in many ways, before, during and/or after the data is stored. In one example, the values of some of the measurements are scaled to be in the range -1, +1. 
…  user measurements are subjected to feature extraction and/or reduction techniques, such as Fisher projections, Principal Component Analysis (PCA), and/or feature selection techniques like Sequential Forward Selection (SFS) or Sequential Backward Selection (SBS)., The computation of the total response due to the user's exposure to the token instances maybe done with adjustments with respect to the baseline value., wherein characteristic data corresponding to (characteristic of) each token instance exposure is computed/generated by associating the user measurement channel data in time with the exposure onset and extent of the token instance and processing this data to obtain analyzable features, for example averaging, PCA, and SFS, and wherein the normalization of the state data or the modification of the state data based relative to existing baselines is another example of a data generation process applied to state data collected in association within a token instance temporal window.) ([0290, 0299, 0278], Expectation - using the current set of parameter values, compute for each sample the probability that it belongs to each of the situations., The sample generator 697 produces samples 702 corresponding to the temporal windows of token instances 693 and the other inputs 695; the samples 702 are provided to a clustering algorithm 722, which also receives a desired number of situations 704 (corresponding to the desired number of clusters). 
After running the clustering algorithm on the input data as described above, the clustering algorithm's output may be utilized as annotations describing the situations for the samples (e.g., the cluster index for each sample)., In one embodiment, the training data may be comprised of both labeled and unlabeled data (for which the situation is unknown), and a semi-supervised machine learning method may be employed to train the classifier., wherein an unsupervised learning algorithm, such as K-means clustering or the Expectation Maximization (EM) algorithm, assigns the state feature data (characteristic data including normalized, baselined, or derived pertinent features) associated with each token instance to a class/situation (associated with an emotional response or situation) and where it is noted that the assignment to classes (EM algorithm) may also be performed in the context of semi-supervised learning such as for the learning of the predictor model.) divide the plurality of sets of characteristic data into a plurality of groups on the basis of the defined plurality of classes and condition of operations included in the set of first data stored in the storage; ([0288, 0294, 0313, 0407], The assignment process to situations may be hierarchical, and use more than one round of partitioning. For example, all samples grouped together because of a low heart-rate may be further refined by clustering them into several clusters., After running the EM algorithm on the input data as described above, the EM situation annotator can add annotations describing the situation (e.g., group index) for the samples.
The samples representing windows that are annotated with situations 708 are provided to a module that performs machine learning classifier training 710, in order to produce the situation classifier model 710 … (e.g., the model may be for a classifier Such as a decision tree, neural network, or a naive Bayes classifier)., The module 710 utilizes a machine learning model training algorithm to train situation classifier model 712 (e.g., the model may be for a classifier such as a decision tree, neural network, or a naive Bayes classifier)., (ii) Situation identifiers and/or values of some token instances, or their attributes, at specific times (such as the time for which the baseline is predicted), which may be used to define the user's situation.,  The data is partitioned into multiple datasets according to the different sets of situations in which the user was in when the data was collected. … Each partitioned training dataset is used to train a separate situation-dependent machine learning-based user response model, from which a situation-dependent library may be derived, which describes the user's expected response to tokens when the user is in a specific situation.; wherein a situation/context (condition of operations) grouping of state response data (characteristic data) is performed given the situation/class labels (from the class definer) associated with that data such that this grouping may be in the form of a partitioning based on different sets of situations (based on situation identifiers for tokens/first data in storage and associated attributes or times) but also may be alternatively based on a situation classifier trained on the annotated situation cases (decision tree for instance) or on additional groupings formed in a hierarchical clustering process in which later rounds of clustering are conditioned on earlier rounds.) 
generate a first conditional model for each one of the plurality of groups, each of the first conditional models corresponding to a combination of conditions of the characteristic data of a respective one of the plurality of groups; ([0298, 0310, 0348, 0407, 0416, 0417], After running the clustering algorithm, each sample is assigned a situation identifier corresponding to its cluster. The samples with situation assignments may be used to train a classifier for predicting the situation for new unseen samples., A user's baseline level is predicted using a machine learning method, such as a Support vector machine, a regression method, a neural network, or Support vector machine for regression. …Optionally, the data for the samples is collected while the user is in specific situations, in order to train situation-specific baseline predictors for the user., In one embodiment, a machine learning-based predictor is trained for predicting the user's response when exposed to token instances., Each partitioned training dataset is used to train a separate situation-dependent machine learning-based user response model, from which a situation-dependent library may be derived, which describes the user's expected response to tokens when the user is in a specific situation., In one embodiment, a regression model is trained in order to create a library of the user's expected single dimensional real-valued response to token instances representing stimuli. Optionally, the model comprises the regression parameters fi, for 1 <i<N, that correspond to the N possible token instances included in the model. 
… Optionally, the regression model is a multidimensional regression, in which case, the response for each dimension may be evaluated in the library separately.,  In one embodiment, parameters from the regression model may be used to gain insights into the dynamics of the user's response., wherein various conditional response models (first conditional/baseline model which comprise, for example, a given clustering model) are learned/generated in which that conditional model (e.g., regression model, SVM, neural network, Hidden Markov Models) is learned/trained according to a set/combination (situational) conditions predicated on the user, the token/stimuli, the user emotional state, and/or the situation such that any of these conditioned aspects is a learned/model of a condition/conditional response given characteristic data (including operational state information) that has been organized/grouped according to the relevance to the prediction of that conditional response, including specifically the clustering of the data to learn (and predict) different situations using unsupervised learning (in other words, such that the learned situational condition forms a predicate for determining the particular division of training data used to train several user-specific prediction models that include baseline prediction, token response prediction, emotion prediction) so that the resultant learned predictive models associate the learned conditional operating state (e.g., emotional state, physiological response) with the learned conditional characteristic components of that operational data (conditional physiological response data associated with but not necessarily temporally aligned to stimuli/tokens).) generate an overall model for all of the plurality of sets of characteristic data …, ([0088, 0312, 0314, 0565], In one embodiment, the response to the token instance of interest is the response of the user to the token instance of interest. 
For example, the response to the token instance of interest is derived from a total response that was based on measurements of the user or on a prediction made from a model of the user (e.g., the model was trained on training data that includes measurements of the user). Additionally or alternatively, the response to the token instance of interest may be considered a response of a general and/or representative user. For example, the response to the token instance of interest is derived from a total response that is based on a prediction of a general model (trained mostly on data that does not involve the user)., (i) Computed baseline values for the user for the response variable and/or other variables (such as user measurement channels). Optionally, the baseline values are computed using data collected in different ways, such as by collecting values from time intervals of different durations and/or times in which the user was in certain situations., (iii) Baseline values computed or collected from other data sources, such as models of other users., Optionally, the emotional state predictor for the user uses collaborative filtering in order to estimate the user's response by weighting the response given to essentially similar token instances by other users., wherein a general/overall (baseline) model is generated from data across a set of users such that this model is not based on any combination of the set of conditions used for an (alternative) user-specific conditional model by virtue of forming (aggregating) the model across a plurality of users.) and determine whether there is an abnormality in the evaluation target using a first difference between characteristic data predicted using the first conditional models and the characteristic data …. ([0312, 0314, 0348, 0407, 0416, 0417], (i) Computed baseline values for the user for the response variable and/or other variables (such as user measurement channels).
Optionally, the baseline values are computed using data collected in different ways, such as by collecting values from time intervals of different durations and/ or times in which the user was in certain situations., (iii) Baseline values computed or collected from other data sources, such as models of other users., In one embodiment, a machine learning-based predictor is trained for predicting the user's response when exposed to token instances. Optionally, the predictor predicts the user's affective response when exposed to the token instances., The data is partitioned into multiple datasets according to the different sets of situations in which the user was in when the data was collected. … Each partitioned training dataset is used to train a separate situation-dependent machine learning-based user response model, from which a situation-dependent library may be derived, which describes the user's expected response to tokens when the user is in a specific situation., In one embodiment, a regression model is trained in order to create a library of the user's expected single dimensional real-valued response to token instances representing stimuli. Optionally, the model comprises the regression parameters fi, for 1 <i<N, that correspond to the N possible token instances included in the model. … Optionally, the regression model is a multidimensional regression, in which case, the response for each dimension may be evaluated in the library separately.,  In one embodiment, parameters from the regression model may be used to gain insights into the dynamics of the user's response. In one example, a certain variable in the samples holds the difference between a current state and a predicted baseline state, for instance, the user's arousal level computed by a prediction model using user measurement channel vs. 
the user's predicted baseline level of arousal., wherein a user response (conditional) model (for example, regression model with predicted baseline state) is learned based upon the situational grouping (operating conditions) of characteristic data (including operational state information) such that this model is applied to characterize or predict the particular (baseline conditional) emotional response (operating state of evaluation target) of the user, including strong emotions (abnormal relative to baseline) when presented with stimuli that may include a combination of token instances such that this predictor/model functions as an evaluator for the contextualized/situational operational data relative to (i.e., a computed difference with) baseline that associates the conditional model for a given operating state with the components of that operational data and wherein a general/overall model of a user’s response derived from a plurality of other users without considering a user-specific conditioning of the characteristic data can also be used to predict baseline values and associated deviations for anomaly detection.)
However, Thieberger does not explicitly teach … considering no combination of conditions of the characteristic data, … and a second difference between characteristic data predicted using the overall model and the characteristic data. In other words, although Thieberger teaches the generation of conditional models from characteristic data which are conditioned to a specific user (as well as to several other condition elements as noted above) as well as a general/overall model that is not conditioned to a specific user (it is derived from an aggregation over individual users), Thieberger does not explicitly disclose that the overall model considers no combination of conditions (i.e., the overall/general model possibly may be an aggregate over several conditional models with the overall/general model still potentially considering combinations of conditions). Furthermore, although Thieberger discloses the detection of anomalies according to deviations of observations from any particular conditional user-specific model and from deviations of observations from any non-user specific model, Thieberger does not disclose that this deviation is computed according to an overall model that considers no combinations of conditions of the characteristic data.
However, Yu, in the analogous environment of deriving fault detection models for multimodal process monitoring, teaches generate a first conditional model for each one of the plurality of groups, each of the first conditional models corresponding to a combination of conditions of the characteristic data of a respective one of the plurality of groups;  Generate an overall model for all of the plurality of sets of characteristic data considering no combination of conditions of the characteristic data, and determine whether there is an abnormality in evaluation target using a first difference between characteristic data predicted using the first conditional models and the characteristic data and a second difference between characteristic data predicted using the overall model and the characteristic data ([Abstract, pp. 347-348, Section 3.2, Figure 11], A novel quantification indication for process state is proposed, which effectively combines local information (Mahalanobis distance) and global information (negative log likelihood probability) in HMM. Once an HMM is trained by using normal data set, it is then used to monitor process states. The quantification criterion is needed to evaluate whether a new input is normal or abnormal, where a threshold is setup to ensure the required Type-I error. Abnormal detection can be implemented based on quantization indication…. (NLLP). For each new observation xt, HMM provides P(xt|k), the unconditional probability density, which indications how the input follow the probability distribution of the trained HMM by normal data set. The HMM output corresponding to a novel data should be enough smaller than outputs of the HMM for normal data, namely should be below a threshold. On the other hand, an input from the same region in input space as the training data should result in a P(xt|k) value that will be equal or greater than the threshold…. 
NLLP can be calculated as follows: <equation 18>… For each new observation xt at time t, use Viterbi algorithm [48] (see Section 3) to find the best match state SBMS, and then recognize the best match Gaussian component (BMGC) uBMGC in the state SBMS (i.e., xt and hBMGC have the minimum MD compared to that of xt and other Gaussian components hi), the MD between xt and hBMGC can be calculated as follows: <equation 19> where lBMGC and PBMGC are the mean and variance parameter of the Gaussian component hBMGC, respectively…. Extremely high Dmaha value, i.e., exceeding the threshold value, means that the input vector is an outlier or belongs to an abnormal class. Thus, Dmaha can be used for quantifying the deviation degree of current process with normal process state space…. Thus, Dmaha considers the local information from one Gaussian component of one state in HMM. In the cases we would like to use the MD and NLLP together for process monitoring, the combined indication of MD and NLLP is proposed as a convenient alternative for merging the local and global information from both into a single value. The combined indication is a summation of the MD and NLLP weighted against their respective control bounds as follows: <equation 20>… Thus, a local MD-based probability relative each Gaussian component hi can be calculated as follows: <equation 22>,… In this study, MDNLLP mainly is used to evaluate the process states, while BIP is used to provide process failure risk probability once MDNLLP exceeds a given threshold…. Step 1: After data pre-processing, extract PCs or ICs from input xt and then input them to HMM to calculate the MDNLLP using Eq. (20). Step 2: Compare MDNLLP against the threshold gMDNLLP.If MDNLLP < gMDNLLP, it is classified as an in-control sample. Otherwise, it is detected as a fault. Step 3: Calculate the failure probability Pg (i.e., BIP) of the process using Eq. 
(23)., wherein anomaly detection is performed based on the summation of a global model (NLLP) and local model (MD) evaluations such that each determines a difference between a new observation and a trained model with the difference for the conditional/local model corresponding to the Mahalanobis distance (equation 19) for a particular Gaussian component associated with the best match HMM state (having a combination of conditions which characterize that local state) and with the difference for the global model corresponding to the probability that the new observation is different from the training data as predictable from all states and HMM Gaussian components so that this global model does not consider/does not focus on any combination of conditions (since all of the data and conditions are used to form this estimate) and wherein the principal components of the data determined using PCA or ICA form groupings which also indicate particular combinations of characteristic data (used to form corresponding local components of the HMM models that thereby predict that characteristic data and compare it to the observation vector also projected by PCA or ICA into characteristic data).)
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Thieberger to incorporate the teachings of Yu to generate an overall model that considers no combinations of conditions of the characteristic data and to perform anomaly detection based on the differences between the observed data and the characteristic data predicted by both the conditional model and the overall model. The modification would have been obvious because one of ordinary skill would have been motivated to improve the effectiveness and accuracy of anomaly detection for process monitoring by combining global and local model predictive statistics, with the global model representing information from all process states and each conditional model representing information from a particular process state (Yu, [Abstract, p. 347, Section 3.2, p. 358, Section 5, Figure 16, Table 4]).
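For illustration only, the combination of a local difference (Mahalanobis distance to the best-match Gaussian component) and a global difference (negative log likelihood over all components) described in the cited passages of Yu can be sketched as follows. This is a minimal sketch under stated assumptions and is not Yu's implementation: it substitutes a plain Gaussian mixture with diagonal covariances for the HMM, selects the best-match component by minimum distance rather than by the Viterbi algorithm, and assumes the combined indication has the form MD/bound + NLLP/bound, since equation 20 is not reproduced in this action. All function names, bounds, thresholds, and data values are hypothetical.

```python
import math

def mahalanobis_diag(x, mean, var):
    """Mahalanobis distance of x from a Gaussian with diagonal covariance."""
    return math.sqrt(sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, var)))

def gaussian_pdf_diag(x, mean, var):
    """Density of a diagonal-covariance Gaussian evaluated at x."""
    norm = math.prod(2.0 * math.pi * vi for vi in var) ** -0.5
    return norm * math.exp(-0.5 * mahalanobis_diag(x, mean, var) ** 2)

def local_md(x, components):
    """Local statistic: distance to the closest (best-match) component."""
    return min(mahalanobis_diag(x, c["mean"], c["var"]) for c in components)

def global_nllp(x, components):
    """Global statistic: negative log likelihood under the whole mixture."""
    likelihood = sum(
        c["weight"] * gaussian_pdf_diag(x, c["mean"], c["var"]) for c in components
    )
    return -math.log(max(likelihood, 1e-300))  # guard against log(0)

def combined_indication(x, components, md_bound, nllp_bound):
    """Assumed form: each statistic weighted against its own control bound."""
    return local_md(x, components) / md_bound + global_nllp(x, components) / nllp_bound

def is_fault(x, components, md_bound, nllp_bound, threshold):
    """Flag an observation whose combined indication exceeds the threshold."""
    return combined_indication(x, components, md_bound, nllp_bound) > threshold

# Hypothetical two-mode process: each component stands for one operating mode.
components = [
    {"weight": 0.5, "mean": (0.0, 0.0), "var": (1.0, 1.0)},
    {"weight": 0.5, "mean": (10.0, 10.0), "var": (1.0, 1.0)},
]
in_control = (0.5, -0.5)   # close to the first mode
off_mode = (5.0, 5.0)      # far from both modes

print(is_fault(in_control, components, md_bound=3.0, nllp_bound=6.0, threshold=2.0))
print(is_fault(off_mode, components, md_bound=3.0, nllp_bound=6.0, threshold=2.0))
```

In this sketch the local statistic depends on a single component, while the global statistic depends on every component and its weight, which mirrors the local/global distinction relied upon in the rejection.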

In regards to claim 4, the rejection of claim 1 is incorporated and Thieberger further teaches, wherein the at least one processor is configured to execute the instructions to: calculate the characteristic data predicted using the first conditional models by applying each of the set of characteristic data of the respective one of the plurality groups to a corresponding conditional model, and evaluate an operating state of the evaluation target on the basis of the first difference …. ([0119, 0407, 0416, 0417] The response of a user to the token instance of interest is estimated based on the difference between the predicted response and the measured response. For example, the response to the token instance of interest essentially equals the difference between the value (e.g., heart rate) of the predicted response (which is a response to the background token instance), and the measured response (which is a response to both the background token instance and the token instance of interest). Optionally, the difference is computed by subtracting the value of the predicted response from the value of the measured response., The data is partitioned into multiple datasets according to the different sets of situations in which the user was in when the data was collected. … Each partitioned training dataset is used to train a separate situation-dependent machine learning-based user response model, from which a situation-dependent library may be derived, which describes the user's expected response to tokens when the user is in a specific situation., In one embodiment, a regression model is trained in order to create a library of the user's expected single dimensional real-valued response to token instances representing stimuli. Optionally, the model comprises the regression parameters fi, for 1 <i<N, that correspond to the N possible token instances included in the model. 
… Optionally, the regression model is a multidimensional regression, in which case, the response for each dimension may be evaluated in the library separately.,  In one embodiment, parameters from the regression model may be used to gain insights into the dynamics of the user's response. In one example, a certain variable in the samples holds the difference between a current state and a predicted baseline state, for instance, the user's arousal level computed by a prediction model using user measurement channel vs. the user's predicted baseline level of arousal., wherein the difference between observed and predicted states is used to determine the user response to a token instance through the application of a decomposer such that the predicted response (first conditional model), which is a prediction of characteristic data in a temporal window which is associated with a known background token instance using characteristic data associated with the conditional grouping of data, is subtracted from the actual measured characteristic data to determine a response (evaluate an operating state of the evaluation target) for a token instance of interest in that temporal window.)   
However, Thieberger does not explicitly teach … and the second difference. In other words, as previously pointed out, although Thieberger discloses the detection of anomalies according to deviations of observations from any particular conditional user-specific model and to deviations of observations from any non-user-specific model, he does not disclose that this deviation is computed according to an overall model that considers no combinations of characteristic data.
However, Yu, in the analogous environment of deriving fault detection models for multimodal process monitoring, teaches wherein the at least one processor is configured to execute the instructions to: calculate the characteristic data predicted using the first conditional models by applying each of the set of characteristic data of the respective one of the plurality groups to a corresponding conditional model, and evaluate an operating state of the evaluation target on the basis of the first difference and the second difference ([Abstract, pp. 347-348, Section 3.2, Figure 11], A novel quantification indication for process state is proposed, which effectively combines local information (Mahalanobis distance) and global information (negative log likelihood probability) in HMM.Once an HMM is trained by using normal data set, it is then used to monitor process states. The quantification criterion is needed to evaluate whether a new input is normal or abnormal, where a threshold is setup to ensure the required Type-I error. Abnormal detection can be implemented based on quantization indication…. (NLLP). For each new observation xt, HMM provides P(xt|k), the unconditional probability density, which indications how the input follow the probability distribution of the trained HMM by normal data set. The HMM output corresponding to a novel data should be enough smaller than outputs of the HMM for normal data, namely should be below a threshold. On the other hand, an input from the same region in input space as the training data should result in a P(xt|k) value that will be equal or greater than the threshold…. 
NLLP can be calculated as follows: <equation 18>… For each new observation xt at time t, use Viterbi algorithm [48] (see Section 3) to find the best match state SBMS, and then recognize the best match Gaussian component (BMGC) uBMGC in the state SBMS (i.e., xt and hBMGC have the minimum MD compared to that of xt and other Gaussian components hi), the MD between xt and hBMGC can be calculated as follows: <equation 19> where lBMGC and PBMGC are the mean and variance parameter of the Gaussian component hBMGC, respectively…. Extremely high Dmaha value, i.e., exceeding the threshold value, means that the input vector is an outlier or belongs to an abnormal class. Thus, Dmaha can be used for quantifying the deviation degree of current process with normal process state space…. Thus, Dmaha considers the local information from one Gaussian component of one state in HMM. In the cases we would like to use the MD and NLLP together for process monitoring, the combined indication of MD and NLLP is proposed as a convenient alternative for merging the local and global information from both into a single value. The combined indication is a summation of the MD and NLLP weighted against their respective control bounds as follows: <equation 20>… Thus, a local MD-based probability relative each Gaussian component hi can be calculated as follows: <equation 22>,… In this study, MDNLLP mainly is used to evaluate the process states, while BIP is used to provide process failure risk probability once MDNLLP exceeds a given threshold…. Step 1: After data pre-processing, extract PCs or ICs from input xt and then input them to HMM to calculate the MDNLLP using Eq. (20). Step 2: Compare MDNLLP against the threshold gMDNLLP.If MDNLLP < gMDNLLP, it is classified as an in-control sample. Otherwise, it is detected as a fault. Step 3: Calculate the failure probability Pg (i.e., BIP) of the process using Eq. 
(23)., wherein anomaly detection is performed based on the summation of a global model (NLLP) and local model (MD) evaluations such that each determines a difference between a new observation and a trained model with the difference for the conditional/local model corresponding to the Mahalanobis distance (equation 19) for a particular Gaussian component associated with the best match HMM state (having a combination of conditions which characterize that local state) and with the difference (a second difference) for the global model corresponding to the probability that the new observation is different from the training data as predictable from all states and HMM Gaussian components so that this global model does not consider/does not focus on any combination of conditions (since all of the data and conditions are used to form this estimate) and wherein the principal components of the data determined using PCA or ICA form groupings which also indicate particular combinations of characteristic data (used to form corresponding local components of the HMM models that thereby predict that characteristic data and compare it to the observation vector also projected by PCA or ICA into characteristic data).) 

It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Thieberger to incorporate the teachings of Yu to perform anomaly detection based on differences between the observed characteristic data and the characteristic data predicted by both the conditional model and the overall model. The modification would have been obvious because one of ordinary skill would have been motivated to improve the effectiveness and accuracy of anomaly detection for process monitoring by combining global and local model predictive statistics, with the global model representing information from all process states and each conditional model representing information from a particular process state (Yu, [Abstract, p. 347, Section 3.2, p. 358, Section 5, Figure 16, Table 4]).
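
For illustration only, the combined local and global evaluation that Yu describes in Section 3.2 may be sketched as follows. The sketch is not Yu's implementation: it substitutes a diagonal-covariance Gaussian mixture for the trained HMM, selects the closest component directly instead of running the Viterbi algorithm, and assumes a simple form for the weighting (each term divided by its own control bound), because Yu's equation 20 is not reproduced in this action. All function names, the control bounds, and the threshold value are hypothetical.

```python
import math


def mahalanobis_diag(x, mean, var):
    """Mahalanobis distance to a Gaussian with diagonal covariance."""
    return math.sqrt(sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, var)))


def gaussian_pdf_diag(x, mean, var):
    """Density of a diagonal-covariance Gaussian evaluated at x."""
    norm = math.prod(2.0 * math.pi * vi for vi in var) ** -0.5
    return norm * math.exp(-0.5 * mahalanobis_diag(x, mean, var) ** 2)


def combined_index(x, components, md_limit, nllp_limit):
    """Return (md, nllp, combined) for one observation.

    components: list of (weight, mean, var) tuples standing in for the
    Gaussian components of a trained model (assumption: no HMM states).
    """
    # Local term: distance to the best-matching (closest) component.
    md = min(mahalanobis_diag(x, m, v) for _, m, v in components)
    # Global term: negative log-likelihood under all components together.
    likelihood = sum(w * gaussian_pdf_diag(x, m, v) for w, m, v in components)
    nllp = -math.log(max(likelihood, 1e-300))
    # Assumed weighting: each term is scaled by its own control bound.
    combined = md / md_limit + nllp / nllp_limit
    return md, nllp, combined


def is_fault(x, components, md_limit, nllp_limit, threshold=2.0):
    """In-control if the combined index is below the threshold, else a fault."""
    return combined_index(x, components, md_limit, nllp_limit)[2] >= threshold
```

Under these assumptions, an observation near one of the trained components yields a small local distance and a small global negative log-likelihood, while an observation far from every component raises both terms and is flagged once their weighted sum reaches the threshold.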

In regards to claim 6, the rejection of claim 1 is incorporated and Thieberger further teaches wherein each of the characteristic data includes a plurality of characteristic label data representing measured values of the evaluation target by vectors. ([0263, 0272, 0278, 0394], In one embodiment, the user's emotional state is annotated at some time points, or for some temporal windows of token instances, using various methods for representing emotions. Optionally, the annotations are obtained utilizing a transformation from a domain representing measurements to a domain representing internal emotional states., In one embodiment, emotional states described as points in a multidimensional space are converted into a categorical representation in several ways. In one example, there are predefined categories, with each category having one or more representative points in the multidimensional space. An unassigned point P in the multidimensional space may be assigned to the category that has a representative point P' for which the Euclidian distance between P and P' is smaller or equal to the distance between P and all other category representative points., Optionally, the training samples used to train such a classifier comprised of one or more of the following elements corresponding to a certain time and/or event: values of some token instances and/or their attributes, values from one or more user measurement channels, an emotional state annotation, a base line value for the emotional state, and/or baseline values for one or more user measurement channels., A baseline function for the annotated emotional state may be used as an input to a machine learning algorithm for predicting the user's emotional state., wherein the measured data may include labeled data corresponding to emotional states either directly obtained from the raw measurements or derived from the raw measurements such that these categorical emotional states may be represented vectorially in a multi-dimensional space over which a Euclidean distance metric between states may be defined and wherein this data is part of the characteristic data used for model development and application (situation/class definition/grouping and emotion response prediction/evaluation).)
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Thieberger to incorporate the teachings of Yu for the same reasons as pointed out for claim 1.
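
For illustration only, the category assignment quoted above from Thieberger (an unassigned point takes the category of the nearest representative point) may be sketched as follows. The function name, the category labels, and the two-dimensional coordinates are hypothetical and are not taken from the reference.

```python
import math


def assign_category(point, representatives):
    """Assign a point to the category owning the nearest representative point.

    representatives: dict mapping a category label to a list of
    representative points (tuples) in the same multidimensional space.
    """
    best_category, best_distance = None, math.inf
    for category, reps in representatives.items():
        for rep in reps:
            distance = math.dist(point, rep)  # Euclidean distance
            if distance < best_distance:
                best_category, best_distance = category, distance
    return best_category
```

With hypothetical representatives {"calm": [(0.0, 0.0)], "excited": [(1.0, 1.0), (1.0, -1.0)]}, the point (0.9, 0.8) is assigned to "excited" because (1.0, 1.0) is its nearest representative.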

Claim 7 is also rejected because it is just a method implementation of the same subject matter of claim 1 which can be found in Thieberger and Yu.

Claim 8 is also rejected because it is just a computer readable medium implementation of the same subject matter of claim 1 which can be found in Thieberger and Yu. It is noted that claim 8 also recites a computer-readable medium with a program which is also found in Thieberger ([Figure 19, 0345], Further illustrated an optional computer readable medium 308 that includes instructions for training the machine learning based affective response model for the user 306.).

In regards to claim 9, the rejection of claim 1 is incorporated and Thieberger further teaches, wherein each of the first conditional models is a model representing a relationship between the plurality of sets of the characteristic data of the respective one of the plurality of groups.  ([0298, 0310, 0348, 0407, 0416, 0417], A maximal entropy model comprises weighting parameters lambdaij, for 1<i<N, and 1<j<C, that correspond to the NXC feature functions used to train the model (assuming the input vectors have N features and there are C categories to predict)., After running the clustering algorithm, each sample is assigned a situation identifier corresponding to its cluster. The samples with situation assignments may be used to train a classifier for predicting the situation for new unseen samples., A user's baseline level is predicted using a machine learning method, such as a Support vector machine, a regression method, a neural network, or Support vector machine for regression. …Optionally, the data for the samples is collected while the user is in specific situations, in order to train situation-specific baseline predictors for the user., In one embodiment, a machine learning-based predictor is trained for predicting the user's response when exposed to token instances., Each partitioned training dataset is used to train a separate situation-dependent machine learning-based user response model, from which a situation-dependent library may be derived, which describes the user's expected response to tokens when the user is in a specific situation., In one embodiment, a regression model is trained in order to create a library of the user's expected single dimensional real-valued response to token instances representing stimuli. Optionally, the model comprises the regression parameters fi, for 1 <i<N, that correspond to the N possible token instances included in the model. 
… Optionally, the regression model is a multidimensional regression, in which case, the response for each dimension may be evaluated in the library separately.,  In one embodiment, parameters from the regression model may be used to gain insights into the dynamics of the user's response., wherein various conditional response models (which comprise, for example, the clustering model) are learned in which that model (e.g., regression model) is learned/trained according to conditions predicated on the user, the token/stimuli, the user emotional state, and/or the situation such that any of these conditioned aspects is a learned/modeled condition/conditional response given characteristic data (including operational state information) that has been organized/grouped according to the relevance to the prediction of that conditional response, including specifically the clustering of the data to learn (and predict) different situations using unsupervised learning (in other words, such that the learned situational condition forms a predicate for determining the particular division of training data used to train several user-specific prediction models that include baseline prediction, token response prediction, emotion prediction) so that the resultant learned predictive conditional models associate the learned conditional operating state (e.g., emotional state, physiological response) with the learned conditional characteristic components of that operational data (conditional physiological response data associated with but not necessarily temporally aligned to stimuli/tokens).)

It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Thieberger to incorporate the teachings of Yu for the same reasons as pointed out for claim 1.
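
For illustration only, the situation-dependent modeling relied upon above (training data partitioned by situation, one model per partition, and a response obtained by subtracting the predicted value from the measured value) may be sketched as follows. As a loudly stated simplification, each situation-specific model here is just the mean of its partition, standing in for the regression or other machine learning models named in Thieberger. The function names and sample values are hypothetical.

```python
from collections import defaultdict
from statistics import fmean


def train_situation_baselines(samples):
    """Partition samples by situation and fit one baseline per partition.

    samples: iterable of (situation, measured_value) pairs.
    Returns a dict mapping each situation to its predicted baseline value.
    """
    partitions = defaultdict(list)
    for situation, value in samples:
        partitions[situation].append(value)
    # Assumption: the partition mean stands in for a trained predictor.
    return {situation: fmean(values) for situation, values in partitions.items()}


def response_to_token(baselines, situation, measured):
    """Measured value minus the value predicted for that situation."""
    return measured - baselines[situation]
```

For example, with hypothetical heart-rate samples of 60 and 64 in a "resting" situation, the predicted baseline is 62, so a measured value of 70 corresponds to a response of 8.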

In regards to claim 10, the rejection of claim 1 is incorporated and Thieberger further teaches, wherein each of the first conditional models is a model configured to calculate first label data which is included in each set of the plurality of sets of the characteristic data and is explained by second label data that is included in each set of the plurality of sets of the characteristic data except the first label data. ([0353] Some of the samples used for training the machine learning-based predictor do not have corresponding target values (also referred to as labels). In this case, training may be performed using semi-supervised machine learning techniques. Often semi-supervised methods are able to utilize unlabeled samples, in order to gain additional accuracy. Optionally, different methods for semi-supervised training are used to train more accurate predictors,…The unlabeled data may be utilized in the learning process, such as (i) mixture models in which the models parameters are learned also from the unlabeled data using an expectation maximization (EM) algorithm; (ii) self-training (also referred to as bootstrapping), wherein the predictor or classifier is used to assign target values to unlabeled samples, and is thus able to increase the body of labeled samples from which it can learn…, wherein the machine learning predictor model (first model) may be trained using semi-supervised learning such that given a model formed from a set of labeled data, the learning process then assigns labels to the unlabeled data to increase the model accuracy (such as through mixture models trained with an EM algorithm such that the first label characteristic data is being predicted by the second label data and thereby is not being predicted by a set of data that includes the first label data).)
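
For illustration only, the self-training (bootstrapping) variant of semi-supervised learning quoted above from Thieberger may be sketched as follows. The sketch assumes a nearest-centroid classifier and a distance-margin confidence rule, neither of which is specified in the reference. All names and parameter values are hypothetical.

```python
import math
from statistics import fmean


def centroids(labeled):
    """Fit a nearest-centroid model from (point, label) pairs."""
    groups = {}
    for point, label in labeled:
        groups.setdefault(label, []).append(point)
    return {label: tuple(fmean(axis) for axis in zip(*points))
            for label, points in groups.items()}


def predict(model, point):
    """Return the label whose centroid is closest to the point."""
    return min(model, key=lambda label: math.dist(point, model[label]))


def self_train(labeled, unlabeled, margin=1.0, max_rounds=5):
    """Grow the labeled set by assigning target values to unlabeled samples."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(max_rounds):
        model = centroids(labeled)
        confident, uncertain = [], []
        for point in pool:
            dists = sorted(math.dist(point, c) for c in model.values())
            # Assumed rule: accept when the closest centroid beats the
            # runner-up by at least `margin`.
            if len(dists) == 1 or dists[1] - dists[0] >= margin:
                confident.append((point, predict(model, point)))
            else:
                uncertain.append(point)
        if not confident:
            break
        labeled.extend(confident)
        pool = uncertain
    return centroids(labeled), labeled
```

Each round refits the model on the enlarged labeled set, so samples that were too ambiguous to label in one round may become confident in a later one.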

Response to Arguments
Applicant's arguments filed 3 June 2021 have been fully considered but they are not persuasive. 

Specifically, Applicant argues:
However, in order to expedite prosecution, claim 1 is amended to clarify generate a first conditional model for each one of the plurality of groups, each of the first conditional models corresponding to a combination of conditions of the characteristic data of a respective one of the plurality of groups; generate an overall model for all of the plurality of sets of characteristic data considering no combination of conditions of the characteristic data; and determine whether there is an abnormality in the evaluation target using a first difference between characteristic data predicted using the first conditional models and the characteristic data and a second difference between characteristic data predicted using the overall model and the characteristic data. As discussed during the interview, Thieberger fails to disclose or suggest a plurality of conditional models, an overall non-conditional model, and a first and second difference, as claimed. 

Examiner’s Response:
The Examiner respectfully disagrees in part. First, Thieberger clearly teaches a plurality of conditional models because he teaches that various conditional response models (first conditional/baseline model which comprise, for example, a given clustering model) are learned/generated in which each conditional model is learned/trained according to a set/combination of (situational) conditions predicated on the user, the token/stimuli, the user emotional state, and/or the situation such that any of these conditioned aspects is a learned/modeled condition/conditional response given characteristic data (including operational state information) that has been organized/grouped according to the relevance to the prediction of that conditional response, including specifically the clustering of the data to learn (and predict) different situations using unsupervised learning (in other words, such that the learned situational condition forms a predicate for determining the particular division of training data used to train several user-specific prediction models that include baseline prediction, token response prediction, emotion prediction) so that the resultant learned predictive models associate the learned conditional operating state (e.g., emotional state, physiological response) with the learned conditional characteristic components of that operational data (conditional physiological response data associated with but not necessarily temporally aligned to stimuli/tokens) (viz., [0298, 0310, 0348, 0407, 0416, 0417], After running the clustering algorithm, each sample is assigned a situation identifier corresponding to its cluster. The samples with situation assignments may be used to train a classifier for predicting the situation for new unseen samples., A user's baseline level is predicted using a machine learning method, such as a Support vector machine, a regression method, a neural network, or Support vector machine for regression. 
…Optionally, the data for the samples is collected while the user is in specific situations, in order to train situation-specific baseline predictors for the user., In one embodiment, a machine learning-based predictor is trained for predicting the user's response when exposed to token instances., Each partitioned training dataset is used to train a separate situation-dependent machine learning-based user response model, from which a situation-dependent library may be derived, which describes the user's expected response to tokens when the user is in a specific situation., In one embodiment, a regression model is trained in order to create a library of the user's expected single dimensional real-valued response to token instances representing stimuli. Optionally, the model comprises the regression parameters fi, for 1 <i<N, that correspond to the N possible token instances included in the model. … Optionally, the regression model is a multidimensional regression, in which case, the response for each dimension may be evaluated in the library separately.,  In one embodiment, parameters from the regression model may be used to gain insights into the dynamics of the user's response.). However, as also pointed out in this Office Action, although Thieberger teaches the generation of conditional models from characteristic data which are conditioned to a specific user (as well as to several other condition elements as noted above) as well as a general/overall model (see, for example, [0088]) that is not conditioned to a specific user (it is derived from an aggregation over individual users), Thieberger does not explicitly disclose that the overall model considers no combination of conditions (i.e., the overall/general model possibly may be an aggregate over several conditional models with the overall/general model still potentially considering combinations of conditions and thus also forming a conditional model). 
Furthermore, although Thieberger discloses the detection of anomalies according to deviations of observations from any particular conditional user-specific model and to deviations of observations from any non-user-specific model, he does not disclose that this deviation is computed according to an overall model that considers no combinations of characteristic data. Thus, the amended limitations that recite the training and application of the “overall model” have necessitated a new ground of rejection in view of Yu, which renders Applicant's specific arguments moot.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Baraldi et al. (“Robust signal reconstruction for condition monitoring of industrial components via a modified Auto Associative Kernel Regression method”, Mechanical Systems and Signal Processing 60-61 2015, pp. 29-44) teach anomaly detection in a process monitoring application that uses an auto-associative model representation for predicting state variables as a function of other state variables.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to ROBERT LEWIS KULP whose telephone number is (571)272-7983.  The examiner can normally be reached on M, Th, F 8-5:30; Tu 8-3.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki, can be reached on 571-272-3719.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/ROBERT LEWIS KULP/Examiner, Art Unit 2122

/KAKALI CHAKI/Supervisory Patent Examiner, Art Unit 2122