Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Examiner’s Note
Applicant is strongly requested to provide, in the Remarks, the supporting paragraph(s) and a clear explanation for each amended or new claim, so that the Examiner can interpret the claims clearly and definitely.

Priority
Acknowledgment is made of applicant’s claim for foreign priority (07/09/2018) under 35 U.S.C. 119 (a)-(d). The certified copy has been filed in parent Application No IN201821025626, filed on 06/06/2019.

Response to Arguments
Applicant’s arguments with respect to the independent claims have been considered but are moot because the arguments are directed to amended limitation(s) that has/have not been previously examined.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 2, 5, 8 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim 2 recites the limitation “target vector” in lines 6 and 8. However, it is not clear whether this refers to the “target vector” or the “partial target vector” of claim 1, or to something else. It appears that the limitation may need to read “a second target vector” or similar. For the purposes of examination, “a second target vector” and “the second target vector” are used. Claims 5 and 8 are rejected for the same reason.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-9 are rejected under 35 U.S.C. 103 as being unpatentable over Tian et al. (A neural network approach for remaining useful life prediction utilizing both failure and suspension histories) in view of Heng et al. (Intelligent Prognostics of Machinery Health Utilising Suspended Condition Monitoring Data) further in view of Hong et al. (PREDICTION OF REMAINING LIFE OF POWER TRANSFORMERS BASED ON LEFT TRUNCATED AND RIGHT CENSORED LIFETIME DATA) further in view of Li et al. (Happiness Level Prediction with Sequential Inputs via Multiple Regressions) further in view of MARTINSSON et al. (WTTE-RNN : Weibull Time To Event Recurrent Neural Network)

Regarding claim 1
Tian teaches
A processor implemented method, comprising: 

obtaining a first time series data and a second time series data pertaining to one or more entities, 
wherein the first time series data comprises time series data for one or more failed instances specific to one or more parameters associated with the one or more entities, and 
wherein the second time series data comprises time series data for one or more censored instances specific to the one or more parameters associated with the one or more entities (202); 
(Tian, [figs 2, 4-6] [sec 1] “A history of a unit refers to the period from the beginning of its life to the end of its life, failure or suspension, and the inspection data collected during this period. Thus, condition monitoring data consist of failure histories and suspension histories. In a failure history, a component ends up with a failure and it is replaced with a new component. In a suspension history, the component is taken out of service, i.e., replaced by a new component, before it fails and never used again in the equipment. Thus, a history is called a suspension history if the component is replaced with a new one during planned maintenance or inspection. If a component is replaced because other components in the system are damaged, we also call it a suspension history since the component has not failed when it is replaced” [sec 3] “The first step of the approach is to construct the failure history training data set, which will be combined with training data set based on the suspension histories to train the ANN. Suppose there are J condition monitoring measurements used in the ANN model. An ANN input vector based on failure history f takes the following form: 
[Equation image: media_image1.png]
 (7) where tf,i denotes the equipment age at inspection point i in failure history f, and zjf,i represents the measurement j at time tf,i. The corresponding output value is 
[Equation image: media_image2.png]
, (8) where TFf represents the failure time for failure history f. … As discussed before, the optimal failure time for a suspension history corresponds to the lowest validation MSE if we train the ANN using the training set constructed based on this suspension history and all the failure histories. For suspension history s, we specify L discrete possible failure time values, and obtain the corresponding ANN validation MSE values. The discrete failure time values are denoted by TSDs,1, TSDs,2, … ,TSDs,L, respectively. These values are selected based on the suspension time for the history, TSs.”;)

determining (i) a Remaining Useful Life (RUL) for the one or more failed instances and (ii) at least a minimum RUL for the one or more censored instances (204); 
(Tian, [figs 2, 4-6] [sec 2] “The output of the proposed ANN model is the life percentage, denoted by Pi. As an example, suppose the age of a bearing at the time of failure is 511 days and, at an inspection point i, the age is 400 days, then the life percentage at inspection point i would be Pi = 400/511 x 100% = 78.3% (1) … Life percentage is an excellent option for indicating the inherent health condition index of a piece of equipment due to the following two reasons: (1) the mapping between the inherent health condition index and the life percentage is monotonically non-decreasing, and (2) life percentage is also able to indicate when the failure occurs, that is, the failure occurs when the life percentage reaches 100%.” [sec 3] “The first step of the approach is to construct the failure history training data set, which will be combined with training data set based on the suspension histories to train the ANN. Suppose there are J condition monitoring measurements used in the ANN model. An ANN input vector based on failure history f takes the following form: 
[Equation image: media_image1.png]
 (7) where tf,i denotes the equipment age at inspection point i in failure history f, and zjf,i represents the measurement j at time tf,i. The corresponding output value is 
[Equation image: media_image2.png]
, (8) where TFf represents the failure time for failure history f. … As discussed before, the optimal failure time for a suspension history corresponds to the lowest validation MSE if we train the ANN using the training set constructed based on this suspension history and all the failure histories. For suspension history s, we specify L discrete possible failure time values, and obtain the corresponding ANN validation MSE values. The discrete failure time values are denoted by TSDs,1, TSDs,2, … ,TSDs,L, respectively. These values are selected based on the suspension time for the history, TSs. Specifically, we can have 
[Equation image: media_image3.png]
 for most of the failure time values, and have 1 and 2 values smaller than TSs, so that we can find the optimal failure time based on the validation MSE values at these discrete points.”; e.g., “
[Equation image: media_image2.png]
” along with “life percentage at inspection point i would be Pi = 400/511 x 100% = 78.3%” may read on “determining (i) a Remaining Useful Life (RUL)”. In addition, e.g., “TSDs,1, TSDs,2, … ,TSDs,L” may read on “(ii) at least a minimum RUL for the one or more censored instances”.)
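For illustration only, the life-percentage arithmetic quoted from Tian [sec 2] can be sketched as follows. The function name `life_percentage` is hypothetical and does not appear in the reference; only the worked numbers (400 days, 511 days) come from the quoted passage.

```python
def life_percentage(age_at_inspection: float, failure_time: float) -> float:
    """Return the equipment age as a percentage of its total life.

    Hypothetical helper: Tian describes the quantity, not this function.
    """
    if failure_time <= 0:
        raise ValueError("failure_time must be positive")
    return 100.0 * age_at_inspection / failure_time


# Worked example from the quoted passage: age 400 days, failure at 511 days.
p_i = life_percentage(400, 511)
print(f"{p_i:.1f}%")  # 78.3%
```

A life percentage of 100 corresponds to the failure point, which is consistent with the quoted statement that failure occurs when the life percentage reaches 100%.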

(Note: Hereinafter, where a limitation contains claim language in brackets (i.e., [ ]), the bracketed language has not yet been taught by the current prior art reference and will be taught by another prior art reference afterwards.)

generating (i) a first set of [binary] labels using the RUL for the one or more failed instances and (ii) a second set of [binary] labels using the at least a minimum RUL for the one or more censored instances respectively (206); and 
(Tian, [figs 2, 4-6] [sec 3] “Suppose there are J condition monitoring measurements used in the ANN model. An ANN input vector based on failure history f takes the following form: 
[Equation image: media_image1.png]
 (7) where tf,i denotes the equipment age at inspection point i in failure history f, and zjf,i represents the measurement j at time tf,i. The corresponding output value is 
[Equation image: media_image2.png]
, (8) where TFf represents the failure time for failure history f. … The discrete failure time values are denoted by TSDs,1, TSDs,2, … ,TSDs,L, respectively. These values are selected based on the suspension time for the history, TSs. Specifically, we can have 
[Equation image: media_image3.png]
 for most of the failure time values, and have 1 and 2 values smaller than TSs, so that we can find the optimal failure time based on the validation MSE values at these discrete points. … The input/output set is further divided into the ANN training set and the ANN validation set: 2/3 of the input/output pairs for the training set and 1/3 for the validation set.” [sec 4.1] “The condition monitoring data were collected from bearings on a group of Gould pumps at a Canadian Kraft pulp mill company [25]. In total, there are 10 bearing failure histories and 14 suspension histories available.”; e.g., “
[Equation image: media_image2.png]
” may read on “RUL”. In addition, e.g., “TSDs,1, TSDs,2, … ,TSDs,L” may read on “minimum RUL”. Furthermore, e.g., “training set … validation set” may read on “labels”.)

training, a [Recurrent] Neural Network [(RNN) based Ordinal Regression Model (ORM) comprising one or more binary classifiers], using (i) the first set of [binary] labels and (ii) the second set of [binary] labels and associated label information thereof, 
 (Tian, [figs 2, 4-6] [sec 3] “Suppose there are J condition monitoring measurements used in the ANN model. An ANN input vector based on failure history f takes the following form: 
[Equation image: media_image1.png]
 (7) where tf,i denotes the equipment age at inspection point i in failure history f, and zjf,i represents the measurement j at time tf,i. The corresponding output value is 
[Equation image: media_image2.png]
, (8) where TFf represents the failure time for failure history f. … The discrete failure time values are denoted by TSDs,1, TSDs,2, … ,TSDs,L, respectively. These values are selected based on the suspension time for the history, TSs. Specifically, we can have 
[Equation image: media_image3.png]
 for most of the failure time values, and have 1 and 2 values smaller than TSs, so that we can find the optimal failure time based on the validation MSE values at these discrete points. … The input/output set is further divided into the ANN training set and the ANN validation set: 2/3 of the input/output pairs for the training set and 1/3 for the validation set.” [sec 4.1] “The condition monitoring data were collected from bearings on a group of Gould pumps at a Canadian Kraft pulp mill company [25]. In total, there are 10 bearing failure histories and 14 suspension histories available.”; e.g., “training set … validation set” may read on “labels”.)
wherein during the training of [R]NN [based ORM], the RUL of the one or more failed instances is encoded into a target vector and the at least a minimum RUL of the one or more censored instances is encoded into a partial target vector, and wherein a set of target labels from a plurality of target labels in the partial target vector are [masked] (208).
(Tian, [figs 2, 4-6] [sec 3] as cited above, and “For a certain failure time value TSDs,l, we can obtain the ANN input/output pairs for suspension history s. The input vectors take the same form as that for failure histories, given in Eq. (7). The ANN output value corresponding to the i-th inspection point is given as 
[Equation image: media_image4.png]
, (10) where ts,i denotes the equipment age at inspection point i in suspension history s.” [sec 4.1] as cited above, [sec(s) 2] “During the training process, based on the training data set including a set of input vectors and the corresponding output values, the weights and the bias values of the ANN model are adjusted to minimize the error between the model outputs and the actual outputs.” [sec(s) 4.2] “The ANN has two hidden layers with 2 hidden neurons in each hidden layer, resulting in totally 23 trainable weights. The total number of inspection points in the two failure histories is 37, giving a total of 34 ANN input/output pairs, according to Eq. (9). The ANN training set and the validation set are constructed from the input/output pairs. The ANN is trained using the resilient backpropagation algorithm 30 times, and the ANN with the smallest validation MSE is saved for prediction performance testing.” [sec(s) 4.3] “10 actual suspension histories are used, and the suspension times range from approximately 500 to 1400 days. Similar to case study 1, for a certain suspension history s, we investigate the following 7 possible failure time values: TSs-300, TSs-150, TSs, TSs+150, TSs+300, TSs+450 and TSs+600, and we can obtain the ‘‘validation MSE vs. remaining life’’ points.”; e.g., “training set … validation set” may read on “labels”. In addition, e.g., “The total number of inspection points in the two failure histories is 37, giving a total of 34 ANN input/output pairs” may read on “RUL of the one or more failed instances is encoded into a target vector”. 
Furthermore, e.g., “For a certain failure time value TSDs,l, we can obtain the ANN input/output pairs for suspension history s” and “ts,i denotes the equipment age at inspection point i in suspension history s” along with “possible failure time values” may read on “at least a minimum RUL of the one or more censored instances is encoded into a partial target vector” since the “7 possible failure time values: TSs-300, TSs-150, TSs, TSs+150, TSs+300, TSs+450 and TSs+600” are part of “the suspension times range from approximately 500 to 1400 days”.)
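For illustration only, the candidate failure-time selection quoted from Tian [sec 3] and [sec 4.3] can be sketched as below. The seven offsets are taken from the quoted passage; the function names and the validation MSE values in the example are hypothetical and serve only to show the "lowest validation MSE" selection rule.

```python
# Offsets (in days) around the suspension time TSs, as quoted from Tian sec 4.3.
OFFSETS_DAYS = (-300, -150, 0, 150, 300, 450, 600)


def candidate_failure_times(suspension_time, offsets=OFFSETS_DAYS):
    """List the discrete candidate failure times for one suspension history."""
    return [suspension_time + d for d in offsets]


def pick_failure_time(candidates, validation_mse):
    """Return the candidate whose (externally computed) validation MSE is lowest."""
    best = min(range(len(candidates)), key=lambda k: validation_mse[k])
    return candidates[best]


candidates = candidate_failure_times(500)
# Hypothetical validation MSE values, one per candidate (not from the reference).
mse = [0.9, 0.7, 0.5, 0.3, 0.4, 0.6, 0.8]
print(candidates)                          # [200, 350, 500, 650, 800, 950, 1100]
print(pick_failure_time(candidates, mse))  # 650
```

In the reference, each MSE value would come from training the ANN on the failure histories plus the suspension history under the corresponding candidate failure time; that training step is not reproduced here.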

However, Tian does not appear to distinctly disclose
generating (i) a first set of [binary] labels using the RUL for the one or more failed instances and (ii) a second set of [binary] labels using the at least a minimum RUL for the one or more censored instances respectively (206); and 
training, a [Recurrent] Neural Network [(RNN) based Ordinal Regression Model (ORM) comprising one or more binary classifiers], using (i) the first set of [binary] labels and (ii) the second set of [binary] labels and associated label information thereof,
wherein during the training of [R]NN [based ORM], the RUL of the one or more failed instances is encoded into a target vector and the at least a minimum RUL of the one or more censored instances is encoded into a partial target vector, and wherein a set of target labels from a plurality of target labels in the partial target vector are [masked] (208).

(Note: Hereinafter, where a limitation contains underlined claim language, the underlined language is taught by the current prior art reference, while the non-underlined language has already been taught by one or more previous prior art references.)

Heng teaches
generating (i) a first set of binary labels using the RUL for the one or more failed instances and (ii) a second set of [binary] labels using the at least a minimum RUL for the one or more censored instances respectively (206); and 
training, a [Recurrent] Neural Network [(RNN) based Ordinal Regression Model (ORM) comprising one or more binary classifiers], using (i) the first set of binary labels and (ii) the second set of [binary] labels and associated label information thereof,
wherein during the training of [R]NN [based ORM], the RUL of the one or more failed instances is encoded into a target vector and the at least a minimum RUL of the one or more censored instances is encoded into a partial target vector, and wherein a set of target labels from a plurality of target labels in the partial target vector are [masked] (208).
(Heng, [fig 4-23] [sec 4.2.1] “(i) Training Targets for Complete Histories. A trending history is considered complete if the monitored unit has reached failure when removed from operation. … Let i = 1, 2, …, m , where m represents the number of monitored historical units. If unit i has reached failure before repair or replacement, its survival probability is assigned with a value of “1” (100% survival) up until its failure time, Ti , and a value of “0” thereafter 
[Equation image: media_image5.png]
 (4-29) … (ii) Training Targets for Suspended Degradation Datasets. A trending history is considered suspended if the unit has not reached failure when it is overhauled or removed from operation. For such suspended histories, the survival probability is similarly assigned a value of “1” up until the time interval in which survival was last observed. Survival probability for the following time intervals is computed using a variation of the KM estimator [137] based on the true survival rate of the other complete histories from this moment onwards. … For true suspensions … the adapted KM estimator tracks the cumulative survival probability of the suspended unit i in the following fashion: 
[Equation image: media_image6.png]
 (4-31) where Li denotes the time interval in which historical unit i was last observed to be still surviving. Note that we use the last observed survival interval Li, as the starting time, rather than time zero, to compute the cumulative survival probabilities of each suspended unit.”; e.g., (4-29) may read on “binary”.)
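For illustration only, the training targets described in the quoted passage of Heng [sec 4.2.1] can be sketched as follows. The function names, the zero-based interval indexing, and the boundary handling (whether the failure interval itself receives "0") are assumptions, not details confirmed by the reference. The KM-based values for a suspended history are not reproduced; they are marked `None` to show where a separate estimator would fill them in.

```python
from typing import List, Optional


def complete_history_targets(failure_interval: int, num_intervals: int) -> List[int]:
    """Survival targets for a unit that failed: 1 before failure, 0 from failure on.

    Assumption: intervals are indexed from 0 and the failure interval gets 0.
    """
    return [1 if t < failure_interval else 0 for t in range(num_intervals)]


def suspended_history_targets(last_survival_interval: int,
                              num_intervals: int) -> List[Optional[int]]:
    """Survival targets for a suspended unit.

    1 through the last interval in which survival was observed; None afterwards
    marks values that a separate estimator (not shown) would supply.
    """
    return [1 if t <= last_survival_interval else None for t in range(num_intervals)]


print(complete_history_targets(3, 6))   # [1, 1, 1, 0, 0, 0]
print(suspended_history_targets(3, 6))  # [1, 1, 1, 1, None, None]
```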

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the remaining useful life prediction system of Tian with the binary labels of Heng. 
Doing so would lead to computing the most accurate survival probability possible, taking into account all of the information available.
(Heng, [sec 4.1.2 (iv)] “In situations where some of the failure times are not known (e.g. preventive replacement of machine components before failure), the KM estimator can be used to produce the most accurate possible survivor/reliability function, taking into account all information available. The purpose here is to allow a unit to contribute to the survivor function for the entire length of time it was monitored, and to then statistically "remove" the unit from the function after that.”)

However, the combination of Tian, Heng does not appear to distinctly disclose
generating (i) a first set of binary labels using the RUL for the one or more failed instances and (ii) a second set of [binary] labels using the at least a minimum RUL for the one or more censored instances respectively (206); and 
training, a [Recurrent] Neural Network [(RNN) based Ordinal Regression Model (ORM) comprising one or more binary classifiers], using (i) the first set of binary labels and (ii) the second set of [binary] labels and associated label information thereof,
wherein during the training of [R]NN [based ORM], the RUL of the one or more failed instances is encoded into a target vector and the at least a minimum RUL of the one or more censored instances is encoded into a partial target vector, and wherein a set of target labels from a plurality of target labels in the partial target vector are [masked] (208).

Hong teaches
generating (i) a first set of binary labels using the RUL for the one or more failed instances and (ii) a second set of binary labels using the at least a minimum RUL for the one or more censored instances respectively (206); and
training, a [Recurrent] Neural Network [(RNN) based Ordinal Regression Model (ORM) comprising one or more binary classifiers], using (i) the first set of binary labels and (ii) the second set of binary labels and associated label information thereof,
wherein during the training of [R]NN [based ORM], the RUL of the one or more failed instances is encoded into a target vector and the at least a minimum RUL of the one or more censored instances is encoded into a partial target vector, and wherein a set of target labels from a plurality of target labels in the partial target vector are [masked] (208).
(Hong, [figs 1, 4] [sec 3] “Right-censored lifetime data result when unfailed units are still in service (unfailed) when data are analyzed. A transformer still in service in March 2008 (the “data-freeze” point) is considered as a censored unit in this study. … Let νi be the truncation indicator. In particular, νi = 0 if transformer i is truncated (installed before 1980) and νi = 1 if transformer i is not truncated (installed after 1980). Let ci be the censoring time (time that a transformer has survived) and let δi be the censoring indicator. In particular, δi = 1 if transformer i failed and δi = 0 if it was censored (not yet failed). The likelihood function for the transformer lifetime data is (1) 
[Equation image: media_image7.png]
” [sec 5] “we present prediction intervals for the remaining life for individual transformers based on using the Weibull distribution and a stratification cutting at year 1987. Figure 4 shows 90% prediction intervals for remaining life for a subset of individual transformers that are at risk.”; e.g., “νi = 0 if transformer i is truncated (installed before 1980) and νi = 1 if transformer i is not truncated (installed after 1980)” and/or “δi = 1 if transformer i failed and δi = 0 if it was censored (not yet failed)” may read on “binary”.)

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the remaining useful life prediction system of Tian, Heng with the binary labels for censored instances of Hong. 
Doing so would provide an effective prediction model with reasonable precision, using a generic statistical procedure for the reliability prediction problem that can be used with complicatedly censored and truncated data.
(Hong, [secs 7-8] “Figure 10 also shows 90% pointwise prediction intervals. The zigzag in the prediction intervals is caused by the new units entering into the risk set over the time period. The prediction results agree reasonably well with the nonparametric estimate. … In this paper we developed a generic statistical procedure for the reliability prediction problem that can be used with complicatedly censored and truncated data.”)

In the alternative, Hong can also be interpreted to teach the following limitation:
Hong teaches
determining (i) a Remaining Useful Life (RUL) for the one or more failed instances and (ii) at least a minimum RUL for the one or more censored instances (204);
(Hong, [figs 1, 4] [sec 3] “Right-censored lifetime data result when unfailed units are still in service (unfailed) when data are analyzed. A transformer still in service in March 2008 (the “data-freeze” point) is considered as a censored unit in this study. Truncation, which is similar to but different from censoring, arises when failure times are observed only when they take on values in a particular range. When the existence of the unseen “observation” is not known for observations that fall outside the particular range, the data that are observed are said to be truncated. Because we have no information about transformers that were installed and failed before 1980, the units that were installed before 1980 and failed after 1980 should be modeled as having been sampled from a left-truncated distribution(s). Ignoring truncation causes bias in estimation.” [sec 5] “we present prediction intervals for the remaining life for individual transformers based on using the Weibull distribution and a stratification cutting at year 1987. Figure 4 shows 90% prediction intervals for remaining life for a subset of individual transformers that are at risk.”; e.g., “prediction intervals” may read on “minimum RUL”. Note that Tian teaches “(i) a Remaining Useful Life (RUL) for the one or more failed instances”.)

Tian and Heng are combinable with Hong for the same rationale as set forth above.

However, the combination of Tian, Heng, Hong does not appear to distinctly disclose
training, a [Recurrent] Neural Network [(RNN) based Ordinal Regression Model (ORM) comprising one or more binary classifiers], using (i) the first set of binary labels and (ii) the second set of binary labels and associated label information thereof,
wherein during the training of [R]NN [based ORM], the RUL of the one or more failed instances is encoded into a target vector and the at least a minimum RUL of the one or more censored instances is encoded into a partial target vector, and wherein a set of target labels from a plurality of target labels in the partial target vector are [masked] (208).

Li teaches
training, a Recurrent Neural Network (RNN) based Ordinal Regression Model (ORM) comprising one or more binary classifiers, using (i) the first set of binary labels and (ii) the second set of binary labels and associated label information thereof,
wherein during the training of RNN based ORM, the RUL of the one or more failed instances is encoded into a target vector and the at least a minimum RUL of the one or more censored instances is encoded into a partial target vector, and wherein a set of target labels from a plurality of target labels in the partial target vector are [masked] (208).
(Li, [figs 3-4] [table 1] “Conversion from the original happiness level ground truth label to the labels of all the 6 binary classifiers” [sec 2.5] “To use ordinal regression within the neural network framework, we convert the ordinal regression problem into a series of binary classification problems adopted from [9]. In order to predict the happiness value of 0 to 5, we use a committee of 6 binary classifiers. We denote the 6 classifiers as ci, where i = 1, 2,... 6. We define the rank of each classifier as rci = i − 1. The binary label for each classifier ci indicates whether the target happiness value is larger than the rank of the classifier ci. The labels for each classifier is 1 if the image happiness level is larger than its rank else the label is 0. Formally, if we use lci to denote the label for the i-th classifier, we have 
[Equation image: media_image8.png]
 (3) … With the above modification, the ordinal regression problem can be handled by several parallel neural network layers, which can be incorporated in the framework of LSTM. To be more specific, the hidden layer of the last LSTM unit is connected to 6 softmax layers in parallel to act as the committee of 6 binary classifiers. During testing, each of the binary classifiers in the committee will produce a probability of whether the image’s happiness level is above its rank. So the prediction is simply the summation of all the predicted probabilities of all the six classifiers. Note that the last classifier will always produce 0, so including or excluding its prediction will not have a big impact on the final result.”;)
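For illustration only, the label conversion described in the quoted passage of Li [sec 2.5] can be sketched as below. The rule used (classifier i has rank i − 1 and its label is 1 when the target level is larger than that rank) and the summation used for prediction are taken from the quoted text; the function names are hypothetical.

```python
def ordinal_binary_labels(level: int, num_classifiers: int = 6) -> list:
    """Convert an ordinal level into one binary label per classifier.

    Classifier i (1-based) has rank i - 1; its label is 1 when level > rank.
    """
    return [1 if level > rank else 0 for rank in range(num_classifiers)]


def predict_level(probabilities) -> float:
    """Prediction as the sum of the classifiers' predicted probabilities."""
    return sum(probabilities)


print(ordinal_binary_labels(3))  # [1, 1, 1, 0, 0, 0]
print(ordinal_binary_labels(5))  # [1, 1, 1, 1, 1, 0]
```

With levels ranging from 0 to 5, the sixth classifier (rank 5) always receives the label 0, which matches the quoted remark that the last classifier will always produce 0.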

In the alternative, Li can also be interpreted to teach the following limitation:
Li teaches
wherein during the training of RNN based ORM, the RUL of the one or more failed instances is encoded into a target vector and the at least a minimum RUL of the one or more censored instances is encoded into a partial target vector, and wherein a set of target labels from a plurality of target labels in the partial target vector are [masked] (208).
(Li, [figs 3-4] [table 1] “Conversion from the original happiness level ground truth label to the labels of all the 6 binary classifiers” [sec 2.5] as cited above; e.g., “Conversion from the original happiness level ground truth label to the labels of all the 6 binary classifiers” and “binary label for each classifier ci indicates whether the target happiness value is larger than the rank of the classifier ci” may read on “target vector”.)

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the remaining useful life prediction system of Tian, Heng, Hong with the RNN based ORM with binary classifiers of Li. 
Doing so would improve prediction accuracy, since the ANN model is able to capture useful information in the suspension histories that is not available in the failure histories, and to use that information to achieve more accurate prediction.
(Tian, [sec 4.2.2] “The prediction accuracy improvement is due to fact that using the proposed approach, the ANN model is able to capture useful information in the four suspension histories which is not available in the two failure histories, and use the information to achieve more accurate prediction.”)

However, the combination of Tian, Heng, Hong, Li does not appear to distinctly disclose:
wherein during the training of RNN based ORM, the RUL of the one or more failed instances is encoded into a target vector and the at least a minimum RUL of the one or more censored instances is encoded into a partial target vector, and wherein a set of target labels from a plurality of target labels in the partial target vector are [masked] (208).

MARTINSSON teaches
wherein during the training of RNN based ORM, the RUL of the one or more failed instances is encoded into a target vector and the at least a minimum RUL of the one or more censored instances is encoded into a partial target vector, and wherein a set of target labels from a plurality of target labels in the partial target vector are masked (208).
(MARTINSSON, [fig(s) 1.1] [fig(s) 1.2] “yt” [fig(s) 4.14] “During training we randomly chose the sequence lengths n ∈ [10, 100] which can be seen as masking all but the first n timesteps and calculating the censored time to event with step n being the horizon. Models were evaluated on the full 100 step sequences after calculating the true time to event.” [sec(s) 2-2.1] “Definition 2.1 (Waiting times). We refer to data or random variables that are modeling time between-, since- or to events as waiting times. … For the generative framework we will be concerned with the special case when x = t represents the current position on a timeline from where we want to make predictions about the future, represented by s. In this framework the coordinate (t, s) can be interpreted as looking s steps into the future from time t.” [sec(s) 3, pp. 39-45] “ut {0,1} 0 if timestep t is right censored. 1 otherwise … The main problem is then to find a functional Rˆ that maximizes the log-likelihood for censored observed waiting times y given feature data x0:t observed until timesteps t for each timestep: 
    [equation image: media_image9.png]
  The optimum is some Recurrent Cumulative Hazard function R which fits the data.”; e.g., “censored observed waiting times y given feature data x0:t observed until timesteps t for each timestep”, “ut {0,1} 0 if timestep t is right censored. 1 otherwise” along with eq (3.3) and fig 1.2 may read on “a set of target labels from a plurality of target labels in the partial target vector are masked” since “censored observed waiting times y” and “feature data x0:t observed” are masked based on ut.)
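For illustration only, and not as a characterization of the claim language or of MARTINSSON's formulation: the following minimal sketch shows one way a RUL could be encoded into a vector of binary target labels, with the unknown labels of a censored instance masked out of the training loss. The function names (encode_targets, masked_bce), the parameter num_bins, and the unit-spaced thresholds are all assumptions introduced for this sketch.

```python
import math


def encode_targets(rul, censored, num_bins):
    """Return (labels, mask) for one instance (hypothetical encoding).

    labels[j] is 1 if the RUL (or the minimum RUL, when censored)
    exceeds threshold j, else 0.
    mask[j] is 1 if labels[j] is actually known, 0 if it is masked.
    """
    labels = [1 if rul > j else 0 for j in range(num_bins)]
    if censored:
        # For a censored instance only "RUL > j" below the observed
        # minimum RUL is known; the remaining labels are masked.
        mask = [1 if rul > j else 0 for j in range(num_bins)]
    else:
        # For a failed instance every label is known.
        mask = [1] * num_bins
    return labels, mask


def masked_bce(probs, labels, mask, eps=1e-12):
    """Mean binary cross-entropy over the unmasked labels only."""
    total, count = 0.0, 0
    for p, y, m in zip(probs, labels, mask):
        if m:
            total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
            count += 1
    return total / count if count else 0.0
```

Under these assumptions, a failed instance with RUL 3 and a censored instance with minimum RUL 3 receive the same labels, but only the first three labels of the censored instance contribute to the loss.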

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the remaining useful life prediction system of Tian, Heng, Hong, Li with the masked partial target vector of MARTINSSON. 
Doing so could yield better performance than the competing Box-model after proper calibration when the events are very rare.
(MARTINSSON, [sec 4.2.2] “This gives some basis to believe the Weibull model could perform better than the Box-model after proper calibration when we have very rare events. The first reason is that the Weibull model gets to see longer sequences as it can use all training data. The second is that that the Weibull model will get continuous feedback in each step unlike the box-model which only gets binary feedback. We did not evaluate the models performance on scale-invariant measures such as AUC. A hypothesis is that the Weibull models would outperform the Box-model but verifying this is proposed as future work.”)

Regarding claim 2
The combination of Tian, Heng, Hong, Li, MARTINSSON teaches claim 1.

Tian further teaches 
obtaining a time series data pertaining to one or more parameters of the one or more entities, wherein the time series data comprises one or more test instances (210); 
(Tian, [sec 4.1] “Condition monitoring data collected from the field is used to validate the proposed ANN approach for RUL prediction utilizing both failure and suspension histories, particularly when there are only few failure histories available. The condition monitoring data were collected from bearings on a group of Gould pumps at a Canadian Kraft pulp mill company [25]. In total, there are 10 bearing failure histories and 14 suspension histories available. Vibration monitoring data were collected from the pump bearings using accelerometers. The collected vibration measurements include the overall vibration magnitudes in the axial, horizontal and vertical directions, and in each of these directions, the vibration magnitude values are obtained in five frequency bands. In addition, the overall acceleration values are also measured in the three directions. … We compare the prediction performance of the proposed approach and the ANN method using only the failure histories. Several failure histories are used as test histories to test the prediction performance.” [sec 4.2] “Using the training set and validation set constructed based on the suspension histories with the optimal failure times and the failure histories, the ANN can be trained for RUL prediction. The prediction performance of the trained ANN is tested using the same four test histories at the 127 inspection points.”;)

applying the trained [R]NN [based ORM comprising the one or more trained binary classifiers] on the time series data comprising the one or more test instances to obtain an estimate of target label [for each of the one or more trained binary classifiers], wherein an estimate of target vector is obtained using the estimate of target label [obtained for each of the one or more trained binary classifiers] (212); and 
(Tian, [figs 2, 4-6] “Train the ANN”, “Training”, “Prediction” [sec 3] “The input/output set is further divided into the ANN training set and the ANN validation set: 2/3 of the input/output pairs for the training set and 1/3 for the validation set. … Once the ANN is trained, as discussed in the previous section, it can be used for RUL prediction for other equipments being monitored, as shown in Fig. 2. The age and condition monitoring measurements at the current and previous data points are used as inputs to the trained ANN, and the current life percentage can be obtained. The RUL is obtained by dividing the current age by the predicted life percentage.” [sec 4.1] “The condition monitoring data were collected from bearings on a group of Gould pumps at a Canadian Kraft pulp mill company [25]. In total, there are 10 bearing failure histories and 14 suspension histories available.” [sec 4.3] “The prediction performance of the proposed approach is tested using the 5 test histories at the 156 inspection points, and the results are given in Table 3.”; e.g., “current life percentage can be obtained. The RUL is obtained by dividing the current age by the predicted life percentage” may read on “estimate of target label”. In addition, e.g., “inspection points” along with fig 6 may read on “target vector”.)
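The arithmetic in the passage quoted from Tian [sec 3] can be illustrated as follows. This is a sketch only: the function names are hypothetical, and reading the quotient of current age and predicted life percentage as a predicted total life, from which the current age is then subtracted to give the remaining life, is an assumption of the sketch rather than a statement in the quoted text.

```python
def predicted_total_life(current_age, life_fraction):
    """Current age divided by the predicted life percentage.

    life_fraction is the life percentage expressed as a fraction
    in (0, 1]; this representation is an assumption of the sketch.
    """
    if not 0.0 < life_fraction <= 1.0:
        raise ValueError("life_fraction must be in (0, 1]")
    return current_age / life_fraction


def remaining_life(current_age, life_fraction):
    """Assumed remaining life: predicted total life minus current age."""
    return predicted_total_life(current_age, life_fraction) - current_age
```

For example, under these assumptions an age of 300 with a predicted life percentage of 75% corresponds to a predicted total life of 400 and a remaining life of 100.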

generating, by using the estimate of target vector, a RUL estimate specific to the one or more test instances of the one or more entities (214).
(Tian, [figs 2, 4-6] “Train the ANN”, “Training”, “Prediction” [sec 4.3] “for a certain suspension history s, we investigate the following 7 possible failure time values: TSs-300, TSs-150, TSs, TSs+150, TSs+300, TSs+450 and TSs+600, and we can obtain the ‘‘validation MSE vs. remaining life’’ points. The results for the 10 suspension histories are shown in Fig. 5 … The prediction performance of the proposed approach is tested using the 5 test histories at the 156 inspection points, and the results are given in Table 3. … It is particularly worth pointing out, as shown in Table 3, that the proposed approach achieves excellent prediction performance over the last 10 inspection points in the test histories: the average prediction error is 3.65%, and the standard deviation of the prediction error is 1.09%. The prediction results on a sample test history is shown in Fig. 6. This case study illustrates the capability of the proposed ANN approach to achieve more accurate and stable remaining life prediction.”; e.g., “inspection points” along with “The prediction results on a sample test history is shown in Fig. 6” may read on “generating, by using the estimate of target vector, a RUL estimate”.)

Li further teaches 
applying the trained RNN based ORM comprising the one or more trained binary classifiers on the time series data comprising the one or more test instances to obtain an estimate of target label for each of the one or more trained binary classifiers, wherein an estimate of target vector is obtained using the estimate of target label obtained for each of the one or more trained binary classifiers (212); and 
(Li, [figs 3-4] [table 1] “Conversion from the original happiness level ground truth label to the labels of all the 6 binary classifiers” [sec 2.2] “The flow diagram details both the training and testing work-flow. Hence during the training stage, the combination of Stage 3 and 4 is used to learn a model optimized against the ground-truth group level score available. During the testing stage, given an image, the output of Stage 4 is the group level score” [sec 2.5] “To use ordinal regression within the neural network framework, we convert the ordinal regression problem into a series of binary classification problems adopted from [9]. In order to predict the happiness value of 0 to 5, we use a committee of 6 binary classifiers. We denote the 6 classifiers as ci, where i = 1, 2,... 6. We define the rank of each classifier as rci = i − 1. The binary label for each classifier ci indicates whether the target happiness value is larger than the rank of the classifier ci. The labels for each classifier is 1 if the image happiness level is larger than its rank else the label is 0. Formally, if we use lci to denote the label for the i-th classifier, we have 
    [equation image: media_image8.png] (3) …
With the above modification, the ordinal regression problem can be handled by several parallel neural network layers, which can be incorporated in the framework of LSTM. To be more specific, the hidden layer of the last LSTM unit is connected to 6 softmax layers in parallel to act as the committee of 6 binary classifiers. 
During testing, each of the binary classifiers in the committee will produce a probability of whether the image’s happiness level is above its rank. So the prediction is simply the summation of all the predicted probabilities of all the six classifiers. Note that the last classifier will always produce 0, so including or excluding its prediction will not have a big impact on the final result.”; e.g., “During testing, each of the binary classifiers in the committee will produce a probability of whether the image’s happiness level is above its rank” may read on “obtain an estimate of target label for each of the one or more trained binary classifiers” . In addition, e.g., “During testing, each of the binary classifiers in the committee will produce a probability of whether the image’s happiness level is above its rank” along with “all the predicted probabilities of all the six classifiers” may read on “estimate of target vector is obtained using the estimate of target label obtained for each of the one or more trained binary classifiers”.)
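The label conversion and test-time prediction described in the passage quoted from Li [sec 2.5] can be sketched as follows. The sketch follows the quoted description (6 classifiers, rank of classifier ci equal to i − 1, label 1 when the level is larger than the rank, prediction equal to the sum of the predicted probabilities); the function names are hypothetical, and the neural network producing the probabilities is not modeled.

```python
def ordinal_labels(level, num_classifiers=6):
    """Convert an ordinal level (0..5) into one binary label per classifier.

    Classifier c_i (i = 1..num_classifiers) has rank i - 1; its label
    is 1 if the level is larger than that rank, else 0.
    """
    return [1 if level > (i - 1) else 0 for i in range(1, num_classifiers + 1)]


def predict_level(probabilities):
    """Test-time prediction: the sum of the per-classifier probabilities."""
    return sum(probabilities)
```

With this conversion, a level of 3 maps to the labels [1, 1, 1, 0, 0, 0], and the label of the last classifier is 0 for every level from 0 to 5, consistent with the quoted remark that the last classifier will always produce 0.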

Tian, Heng, Hong are combinable with Li for the same rationale as set forth above with respect to claim 1.

In the alternative, Li can also be interpreted to teach the following limitation:
generating, by using the estimate of target vector, a RUL estimate specific to the one or more test instances of the one or more entities (214).
(Li, [figs 3-4] [table 1] “Conversion from the original happiness level ground truth label to the labels of all the 6 binary classifiers” [sec 2.5] “During testing, each of the binary classifiers in the committee will produce a probability of whether the image’s happiness level is above its rank. So the prediction is simply the summation of all the predicted probabilities of all the six classifiers. Note that the last classifier will always produce 0, so including or excluding its prediction will not have a big impact on the final result.”; e.g., “During testing, each of the binary classifiers in the committee will produce a probability of whether the image’s happiness level is above its rank” along with “all the predicted probabilities of all the six classifiers” may read on “estimate of target vector”. Note that Tian teaches “RUL estimate”.)

Tian, Heng, Hong, MARTINSSON are combinable with Li for the same rationale as set forth above with respect to claim 1.

Regarding claim 3
The combination of Tian, Heng, Hong, Li, MARTINSSON teaches claim 1.

Tian further teaches 
the one or more parameters are obtained from one or more sensors.
(Tian, [sec 4.1] “Condition monitoring data collected from the field is used to validate the proposed ANN approach for RUL prediction utilizing both failure and suspension histories, particularly when there are only few failure histories available. The condition monitoring data were collected from bearings on a group of Gould pumps at a Canadian Kraft pulp mill company [25]. In total, there are 10 bearing failure histories and 14 suspension histories available. Vibration monitoring data were collected from the pump bearings using accelerometers. The collected vibration measurements include the overall vibration magnitudes in the axial, horizontal and vertical directions, and in each of these directions, the vibration magnitude values are obtained in five frequency bands. In addition, the overall acceleration values are also measured in the three directions.”;)

Regarding claim 4
Claim 4 is a system claim corresponding to the method claim 1, and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejection of claim 1. Note that Tian teaches a memory, one or more communication interfaces, and one or more hardware processors ([sec 4.1] “Significance analysis, which is built into the software EXAKT developed by OMDEC Inc., is utilized to identify the significant condition monitoring measurements”) since it is appreciated by one of ordinary skill in the art that software is run on a computer.

Regarding claim 5
Claim 5 is a system claim corresponding to the method claim 2, and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejection of claim 2. 

Regarding claim 6
Claim 6 is a system claim corresponding to the method claim 3, and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejection of claim 3. 

Regarding claim 7
Claim 7 is a machine readable information storage medium claim corresponding to the method claim 1, and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejection of claim 1. Note that Tian teaches one or more mediums, and one or more hardware processors ([sec 4.1] “Significance analysis, which is built into the software EXAKT developed by OMDEC Inc., is utilized to identify the significant condition monitoring measurements”) since it is appreciated by one of ordinary skill in the art that software is run on a computer.

Regarding claim 8
Claim 8 is a machine readable information storage medium claim corresponding to the method claim 2, and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejection of claim 2.

Regarding claim 9
Claim 9 is a machine readable information storage medium claim corresponding to the method claim 3, and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejection of claim 3.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Niu et al. (Ordinal Regression with Multiple Output CNN for Age Estimation) teaches ordinal regression for CNN.
Falcaro et al. (A flexible model for multivariate interval-censored survival times with complex correlation structure) teaches binary labels (pp. 666-667).
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SEHWAN KIM whose telephone number is (571)270-7409. The examiner can normally be reached Mon - Thu 7:00 AM - 5:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Michael J Huntley, can be reached at (303) 297-4307. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/S.K./Examiner, Art Unit 2129               

8/25/2022
/MICHAEL J HUNTLEY/Supervisory Patent Examiner, Art Unit 2129