DETAILED ACTION
This action is in response to a request for continued examination under 37 CFR 1.114. The request, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 7 June 2022 has been entered. Furthermore, this action is in response to the amendments filed 7 June 2022 for application 16/555644, filed on 29 August 2019. Claims 1-9 are currently pending. The claim rejections under 35 USC 112(b) have also been withdrawn in light of the amendments and arguments. It is noted that an English translation of the certified copy of the Japanese application (JP2018-170769) to which the instant application claims priority is not currently on file.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments
Applicant's arguments filed 7 June 2022 have been fully considered but they are not persuasive.

Specifically, Applicant argues:
The Applicant respectfully traverses the 35 U.S.C. §103 rejections and submits the cited references do not disclose, teach or suggest the elements as recited in the Applicant's amended claims. … The Examiner has introduced the Esteban reference to cure the deficiencies of the Su reference with respect to the Applicant's independent claims, for example, Applicant's amended independent claim 1. Although the Esteban reference teaches the combining of static and dynamic information to predict future events, the reference is not directed to an optimization process similar to that described in the  Applicant's amended independent claims. The Esteban reference discloses a formula That is, there is no disclosure, teaching or suggestion in Esteban of  "an optimization process optimizing the first learning parameter, the second learning parameter, and the third learning parameter by statistical gradient on a basis of the response variable and the first predicted value calculated by the first calculation process, wherein the optimizing is calculated in accordance with the following equation:  
[Equation image: media_image1.png]
 wherein RWs is the first learning parameter, W is the second learning parameter, and w is the third learning parameter, and Y(n) is a response variable, and y(n) is the first predicted value, wherein the optimized first learning parameter, second learning parameter, and third learning parameter are input as learning inputs during importance calculation for utilization in machine learning to optimize the learning based on prior learning inputs”

Examiner’s Response:
The Examiner respectfully disagrees, noting that a claim must be given its broadest reasonable interpretation consistent with the specification. M.P.E.P. 2173.01(I), M.P.E.P. 2111.01(II). As set forth in the current Office action and the 15 March 2022 Final Office action, Su partially teaches these limitations. Specifically, Su teaches “an optimization process optimizing the first learning parameter, the second learning parameter, and the third learning parameter by statistical gradient on a basis of the response variable and the first predicted value calculated by the first calculation process; wherein the optimizing is calculated in accordance with the following equation: … wherein the optimized first learning parameter, second learning parameter, and third learning parameter are input as learning inputs during importance calculation for utilization in machine learning to optimize the learning based on prior learning inputs” because Su teaches that all model parameters (in particular, all W’s) are learned/optimized using backpropagation (statistical) gradient procedures for minimizing the MSE (cost/loss/objective function) as shown in equation 15, with the first learning parameter corresponding to W_xc (but also, more generally, W_xo, W_ho, W_hc), the second learning parameter corresponding to W_hz, and the third learning parameter corresponding to W_xz^L (but also b^L) (viz., [p. 2, Section IIC, p. 3, Section III, Figure 1] Accordingly, the training objective is to minimize the mean squared error (MSE) of total N training samples as follow: <equation 15> where yt = [SBP, DBP, MBP] represents ground truth, zt is corresponding prediction. And ‖θ‖^2 represents the L2 regulation of model parameters and λ is the corresponding penalty coefficient., To simplify the analysis, here we mainly focus on the gradient flow along the depth of layers. Through recursively updating Equation 12, we will have: <equation 16>.) However, Su does not explicitly disclose “
[Equation image: media_image1.png]
wherein RWs is the first learning parameter, W is the second learning parameter, and w is the third learning parameter, and Y(n) is a response variable, and y(n) is the first predicted value”; Su makes use of a MSE cost function for determining neural network parameters (e.g., equation 15) rather than the recited cross-entropy equation. However, Esteban teaches wherein “the optimizing is calculated in accordance with the following equation:  
[Equation image: media_image1.png]
 wherein RWs is the first learning parameter, W is the second learning parameter, and w is the third learning parameter, and Y(n) is a response variable, and y(n) is the first predicted value, wherein the optimized first learning parameter, second learning parameter, and third learning parameter are input as learning inputs during importance calculation for utilization in machine learning to optimize the learning based on prior learning inputs”
because Esteban teaches that model parameters of an RNN (A, B, W, U) are learned (optimized) using a Binary Cross Entropy cost/loss/objective function that includes predicted outputs (y_tilde) and corresponding true outputs (y), such that the parameter U corresponds to a second learning parameter (applied to the Hadamard product between the reset gates and the time-delayed hidden states as shown in equation 10, but also interpreted to include the model parameters U_r and U_z shown in equations 9 and 11); such that the parameter W_o is a third learning parameter (applied to the concatenated hidden state vector to form the activation response as shown in equations 1, 17, and 18, and interpreted as being included in the parameter W in equation 19, in which a re-allocation function is performed in equation 18 similarly to the corresponding re-allocation function disclosed by Su as previously indicated, but also interpreted as corresponding to W_z or U_z in equation 11, similarly interpreted as being included in the parameters W and U in equation 19); and such that each of the parameters A, B, W_i is a first learning parameter (applied to the input xt in the determination of the hidden state as shown in Figure 3, where W is interpreted as including W_i, and where the parameter W_xc in equation 6 is also interpreted to be a first model parameter included in W to be learned) (viz., [p. 98, Section IVD, Figure 1, Figure 3, Equations 1, 9, 11, 17, and 19]

[Image: media_image2.png]
).

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1, 2, 4-6, 8, and 9 are rejected under 35 U.S.C. 103 as being unpatentable over Su et al. (“Long-term Blood Pressure Prediction with Deep Recurrent Neural Networks”, http://arxiv.org/abs/1705.04524, arXiv:1705.04524v3 [cs.LG], 14 January 2018, pp. 1-7), hereinafter referred to as Su, in view of Esteban et al. (“Predicting Clinical Events by Combining Static and Dynamic Information using Recurrent Neural Networks”, 2016 IEEE International Conference on Healthcare Informatics, 2016, pp. 93-101), hereinafter referred to as Esteban.

In regards to claim 1, Su teaches A time series data analysis apparatus accessible to a database, comprising: a processor that executes a program; and a storage device that stores the program, the database storing a training data set having a predetermined number of first feature data groups in each of which plural pieces of first feature data each containing a plurality of features are present in time series and a predetermined number of response variables each corresponding to each piece of the first feature data in each of the first feature data groups, wherein the processor executes: ([Abstract, p. 1, Section II, p.3, Section IVA, p. 3, Section IVB,  p. 4, Section IVC, Figure 1, Figure 3] In this work, we address this issue by formulating BP estimation as a sequence prediction problem in which both the input and target are temporal sequences. We propose a novel deep recurrent neural network (RNN) consisting of multilayered Long Short-Term Memory (LSTM) networks, which are incorporated with (1) a bidirectional structure to access larger-scale context information of input sequence, and (2) residual connections to allow gradients in deep RNN to propagate more effectively., The goal of arterial BP prediction is to use multiple temporal physiological signals to predict BP sequence. Let XT = [x1, x2 . . . , xT ] be the input features extracted from electrocardiography (ECG) and photoplethysmogram (PPG) signals, and YT = [y1, y2 . . . , yT ] denote the target BP sequence., Similar dataset was obtained from 12 healthy subjects including 11 males and 1 female… , We simply select 7 representative handcrafted features of ECG and PPG signals (shown in Fig 3) as follows …, All the RNN models were trained using mini-batches of size 64 and the Adam optimizer [16]…. Each training dataset was divided such that 70% of the data was used for training, 10% for validation and 20% for test. SBP, DBP and MBP were normalized to (0, 1] by their corresponding maximum, respectively. 
For evaluation on the multi-day continuous BP dataset, all deep RNN models were first pretrained on the static BP dataset then fine-tuned using part of the first-day data, and finally tested on the rest of the first-day data as well as the following days’ data., wherein a computer-based machine learning framework (program executing ML-based functions on processors including backpropagation for training) implements time series analysis (recurrent neural network) that uses training data that includes different event sequences (Figure 3) for patients (each patient being a different group) each represented by (clinical) features (either static, but more pertinently, dynamic) in a feature space (handcrafted features of ECG and PPG signals) so that each event in the timeline is a piece of the first feature data (element in a sequence), wherein the “first feature” attribute corresponds to the training data, wherein the number of first feature data groups is predetermined based on the number of patients used in the training data (but also predetermined in a similar sense in the second feature data group interpreted as a testing set).) a first generation process generating first internal data based on time of one piece of the first feature data for each piece of the first feature data on a basis of the first feature data groups, a first internal parameter that is at least part of other piece of the first feature data at time before the time of the one piece of the first feature data, and a first learning parameter; ([p. 2, Section IVA, p. 3, Section IVC, Figure 1] First, we introduce the basic block of our deep RNN model, a one-layer bidirectional Long short-term memory (LSTM). LSTM [8] was designed to address the vanishing gradient problem of conventional RNN by introducing a memory cell state ct and multiple gating mechanisms inside a standard RNN hidden state transition process. 
The hidden state ht in LSTM is generated by: <equations 3-7>, And ‖θ‖^2 represents the L2 regulation of model parameters and λ is the corresponding penalty coefficient. One advantage of multi-task training is that learning to predict different BP values simultaneously could implicitly encode the quantitative constrains among SBP, DBP and MBP., wherein each temporal element (piece) in the training data (first feature data) is input into the RNN framework (e.g., for learning model parameters) and processed in that framework to generate the internal states of the RNN (h_t) using the (first) internal parameter c_(t-1) (which is used to determine h_t as expressed in equation 7 in view of equations 3-6) that corresponds to a result formed in the RNN framework using an element (piece) of the feature data at a preceding time step (equation 6 via dependence on c_(t-1)) and at least one model parameter of the RNN framework (e.g., W_xc but also, more generally, W_xo, W_ho, W_hc) to be learned/optimized during the training process.) a first transform process transforming a position of the one piece of the first feature data in a feature space on a basis of a plurality of first internal data each generated by the first generation process for each piece of the first feature data and a second learning parameter; ([p. 2, Section IIA, p. 3, Section IVC, Figure 2, Figure 3] First, we introduce the basic block of our deep RNN model, a one-layer bidirectional Long short-term memory (LSTM). LSTM [8] was designed to address the vanishing gradient problem of conventional RNN by introducing a memory cell state ct and multiple gating mechanisms inside a standard RNN hidden state transition process. The hidden state ht in LSTM is generated by: <equations 3-7>, And ‖θ‖^2 represents the L2 regulation of model parameters and λ is the corresponding penalty coefficient. 
One advantage of multi-task training is that learning to predict different BP values simultaneously could implicitly encode the quantitative constrains among SBP, DBP and MBP., wherein, in the matrix product of W_hz and h_t^L shown in the argument on the right hand side of equation 14, the internal data h_t^L associated with an element/piece of the input sequence is transformed using a second learning parameter W_hz^L such that this is a transformation in position of the information contained in that element/piece as represented by the internal/hidden state by virtue of communicating that information forward (closer to an output – see Figure 1 for example) in the RNN framework and wherein this transformation is applied to the first feature (training) data with the learning parameter W_hz to be optimized/learned in the training process.) a reallocation process reallocating each piece of the first feature data into a transform destination position in the feature space on a basis of a first transform result in time series by the first transform process for each piece of the first internal data and the first feature data groups; ([Abstract, p. 2, Section IIB, Figure 1, Figure 2] In this work, we address this issue by formulating BP estimation as a sequence prediction problem in which both the input and target are temporal sequences. We propose a novel deep recurrent neural network (RNN) consisting of multilayered Long Short-Term Memory (LSTM) networks, which are incorporated with (1) a bidirectional structure to access larger-scale context information of input sequence, and (2) residual connections to allow gradients in deep RNN to propagate more effectively., The LSTM block with residual connections can be implemented by: <equations 11-13> …. 
Once the top-layer hidden state is computed, the output zt can be obtained by <equation 14>, wherein the input feature data (first feature data, including for training) is reallocated in the RNN framework using residual connections, including a reallocation ultimately at the transformation result prior to target prediction in which (as shown in equation 14), the transformation Wht is augmented by Wxz^L x_t^L in which the x_t^L includes both the input feature data piece/element as well as intervening internal data h_t (as shown in Figure 2 and equations 11-13) such that the result of these residual connections is to communicate both the feature data and intervening internal data into the transform result corresponding to the transformation of a position (in the structure of the LSTM) of the feature data.) a first calculation process calculating a first predicted value corresponding to the first feature data groups on a basis of a reallocation result by the reallocation process and a third learning parameter; ([p. 2, Section IIB, p. 2, Section IIC, Figure 1, Figure 2] Once the top-layer hidden state is computed, the output zt can be obtained by <equation 14>, Accordingly, the training objective is to minimize the mean squared error (MSE) of total N training samples as follow:, wherein an output is calculated based on the combination of the skip connections and the transformation/output of the internal state (reallocation result – argument of the logistic function in equation 14) such that this output is based on the reallocation process described above and at least one learning parameter such as W_xz^L in equation 14 (but also b^L).) 
an optimization process optimizing the first learning parameter, the second learning parameter, and the third learning parameter by statistical gradient on a basis of the response variable and the first predicted value calculated by the first calculation process; wherein the optimizing is calculated in accordance with the following equation: … wherein the optimized first learning parameter, second learning parameter, and third learning parameter are input as learning inputs during importance calculation for utilization in machine learning to optimize the learning based on prior learning inputs; ([p. 2, Section IIC, p. 3, Section III, Figure 1] Accordingly, the training objective is to minimize the mean squared error (MSE) of total N training samples as follow: <equation 15> where yt = [SBP, DBP, MBP] represents ground truth, zt is corresponding prediction. And ‖θ‖^2 represents the L2 regulation of model parameters and λ is the corresponding penalty coefficient., To simplify the analysis, here we mainly focus on the gradient flow along the depth of layers. 
Through recursively updating Equation 12, we will have: <equation 16>, wherein all model parameters (in particular, all W’s) are learned/optimized using backpropagation (statistical) gradient procedures for minimizing the MSE (cost/loss/objective function) as shown in equation 15 with, as previously pointed out, the first learning parameter corresponding to W_xc (but also, more generally, W_xo, W_ho, W_hc), the second learning parameter corresponding to W_hz, and the third learning parameter corresponding to W_xz^L (but also b^L).) a second generation process generating second internal data based on time of one piece of second feature data among plural pieces of the second feature data each containing a plurality of features, the second internal data being generated for each piece of the second feature data on a basis of second feature data groups in each of which the plural pieces of the second feature data each containing the plurality of features are present in time series, a second internal parameter that is at least part of other piece of the second feature data at time before the time of the one piece of the second feature data, and a first learning parameter optimized by the optimization process; ([p. 2, Section IIA, p. 4, Section IVC, Figure 1] First, we introduce the basic block of our deep RNN model, a one-layer bidirectional Long short-term memory (LSTM). LSTM [8] was designed to address the vanishing gradient problem of conventional RNN by introducing a memory cell state ct and multiple gating mechanisms inside a standard RNN hidden state transition process. 
The hidden state ht in LSTM is generated by: <equations 3-7>, For evaluation on the multi-day continuous BP dataset, all deep RNN models were first pretrained on the static BP dataset then fine-tuned using part of the first-day data, and finally tested on the rest of the first-day data as well as the following days’ data., wherein each temporal element (piece) in the test input (second feature data) is input into the RNN framework (e.g., for evaluation) and processed in that framework to generate the internal states of the RNN (h_t) using the (second) internal parameter c_t-1 (which is used to determine ht as expressed in equation 7 in view of equations 3-6) that corresponds to a result formed in the RNN framework using an element (piece) of the feature data at a preceding time step (equation 6 via dependence on c_(t-1)) and at least one model parameter of the RNN framework (e.g., W_xc but also, more generally, also, W_xo, Who, Whc) optimized during the training process.) a second transform process transforming a position of the one piece of the second feature data in the feature space on a basis of a plurality of second internal data generated by the second generation process for each piece of the second feature data and a second learning parameter optimized by the optimization process; ([p. 2, Section IIA] First, we introduce the basic block of our deep RNN model, a one-layer bidirectional Long short-term memory (LSTM). … The deep RNN model can be created by stacking multiple such LSTM blocks on top of each other, with the output of previous block forming the input of the next. 
Once the top layer hidden state is computed, the output zt can be obtained by <equation 14>, wherein, in the matrix product of W_hz and h_t^L shown in the argument on the right hand side of equation 14, the internal data h_t^L associated with an element/piece of the input sequence is transformed using a second learning parameter W_hz^L such that this is a transformation in position of the information contained in that element/piece as represented by the internal/hidden state by virtue of communicating that information forward (closer to an output – see Figure 1 for example) in the RNN framework and wherein this transformation is applied to the second feature (testing) data with the learning parameter W_hz having been optimized/learned in the training process.) and an importance calculation process calculating importance data indicating an importance of each piece of the second feature data on a basis of a second transform result in time series by the second transform process for each piece of the second internal data and a third learning parameter optimized by the optimization process. ([p. 2, Section IIB, Figure 1] Once the top-layer hidden state is computed, the output zt can be obtained by <equation 14>, wherein, in the argument of equation 14 as part of the calculation of a predictive output (such as from the second/test feature data), a weighted sum of features is computed from the product W_xz^L x_t^L which, together with the product W_hz^L h_t^L (second transform result for second internal data), is being interpreted as an importance calculation process because it applies a weight to each of the elements in the feature data (with the weight being indicative of the relative importance of the features) calculated based on the combination of the skip connections and includes the weighted transformation/output of the internal state such that this calculation is based on both the transformation process and at least one learning parameter such as W_xz^L in equation 14 (but also b^L).)
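For illustration only, the reading of equation 14 applied above can be sketched in Python. This is a minimal sketch under stated assumptions and is not code from Su: it assumes a single scalar output, and the names `sigmoid`, `output_with_contributions`, `w_hz`, `w_xz`, and `b` are hypothetical stand-ins for the logistic function, W_hz^L, W_xz^L, and b^L. The per-feature products returned alongside the prediction correspond to the weighted feature terms that the mapping above treats as importance data.

```python
import math


def sigmoid(a):
    """Logistic function applied to the argument of the output equation."""
    return 1.0 / (1.0 + math.exp(-a))


def output_with_contributions(w_hz, h_t, w_xz, x_t, b):
    """Scalar-output sketch of z_t = sigmoid(W_hz h_t + W_xz x_t + b).

    All names are hypothetical. Returns the prediction and the list of
    per-feature products w_xz[i] * x_t[i] from the residual (skip) term.
    """
    # Transform of the top-layer hidden state by the second learning parameter.
    hidden_term = sum(w * h for w, h in zip(w_hz, h_t))
    # Weighted input features carried by the residual connection.
    contributions = [w * x for w, x in zip(w_xz, x_t)]
    z_t = sigmoid(hidden_term + sum(contributions) + b)
    return z_t, contributions
```

In this sketch a feature whose weight is zero contributes nothing to the argument of the logistic function, which is the sense in which a learned weight can be read as indicating relative importance.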
However, Su does not explicitly disclose 
[Equation image: media_image1.png]
wherein RWs is the first learning parameter, W is the second learning parameter, and w is the third learning parameter, and Y(n) is a response variable, and y(n) is the first predicted value. In other words, Su makes use of a MSE cost function for determining neural network parameters (e.g., equation 15) rather than the recited equation.
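As a point of reference for the distinction drawn here, a mean squared error objective with an L2 penalty of the kind attributed to Su's equation 15 can be sketched as follows. This is an illustrative sketch only: the function name `mse_with_l2`, the use of scalar targets, and the averaging over N samples are assumptions, and the exact normalization of Su's equation 15 is not reproduced in this action.

```python
def mse_with_l2(targets, predictions, params, lam):
    """Mean squared error over N samples plus lam * (squared L2 norm of params).

    Hypothetical helper: targets and predictions are equal-length lists of
    floats, params is a flat list of model parameters, lam is the penalty
    coefficient.
    """
    n = len(targets)
    mse = sum((y - z) ** 2 for y, z in zip(targets, predictions)) / n
    penalty = lam * sum(p * p for p in params)
    return mse + penalty
```

An objective of this form penalizes the squared distance between prediction and ground truth, which is the feature that differs from the recited equation.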
However, Esteban, in the analogous environment of training RNN’s, teaches wherein the optimizing is calculated in accordance with the following equation:  
[Equation image: media_image1.png]
 wherein RWs is the first learning parameter, W is the second learning parameter, and w is the third learning parameter, and Y(n) is a response variable, and y(n) is the first predicted value, wherein the optimized first learning parameter, second learning parameter, and third learning parameter are input as learning inputs during importance calculation for utilization in machine learning to optimize the learning based on prior learning inputs;
([p. 98, Section IVD, Figure 1, Figure 3, Equations 1, 9, 11, 17, and 19]

[Image: media_image2.png]

wherein model parameters of an RNN (A, B, W, U) are determined (optimized) using a Binary Cross Entropy cost/loss/objective function that includes predicted outputs (y_tilde) and corresponding true outputs (y), such that the parameter U corresponds to a second learning parameter (applied to the Hadamard product between the reset gates and the time-delayed hidden states as shown in equation 10, but also interpreted to include the model parameters U_r and U_z shown in equations 9 and 11); such that the parameter W_o is a third learning parameter (applied to the concatenated hidden state vector to form the activation response as shown in equations 1, 17, and 18, and interpreted as being included in the parameter W in equation 19, in which a re-allocation function is performed in equation 18 similarly to the corresponding re-allocation function disclosed by Su as previously indicated, but also interpreted as corresponding to W_z or U_z in equation 11, similarly interpreted as being included in the parameters W and U in equation 19); and such that each of the parameters A, B, W_i is a first learning parameter (applied to the input xt in the determination of the hidden state as shown in Figure 3, where W is interpreted as including W_i, and where the parameter W_xc in equation 6 is also interpreted to be a first model parameter included in W to be learned).)
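For reference, the general form of a binary cross entropy objective over true outputs y and predicted outputs y_tilde can be sketched as follows. This is the standard textbook form and is offered only as an illustration; the function name `binary_cross_entropy`, the summation without averaging, and the clipping constant `eps` are assumptions and are not taken from Esteban or from the recited equation.

```python
import math


def binary_cross_entropy(targets, predictions, eps=1e-12):
    """Sum over samples of -[y*log(p) + (1-y)*log(1-p)].

    Hypothetical helper: targets are 0/1 labels, predictions are
    probabilities. Predictions are clipped to (eps, 1-eps) so the
    logarithm stays finite.
    """
    total = 0.0
    for y, p in zip(targets, predictions):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
    return total
```

Because the loss depends on the learned parameters only through the predicted outputs, gradient-based training updates every parameter that feeds those outputs.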
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Su to incorporate the teachings of Esteban for the optimizing to be calculated in accordance with the following equation:  
[Equation image: media_image1.png]
wherein RWs is the first learning parameter, W is the second learning parameter, and w is the third learning parameter, and Y(n) is a response variable and wherein the optimized first learning parameter, second learning parameter, and third learning parameter are input as learning inputs during importance calculation for utilization in machine learning to optimize the learning based on prior learning inputs. The modification would have been obvious because one of ordinary skill would have been motivated to achieve improved prediction of target variables for temporally evolving input variables by using an RNN with training of that RNN based on the Binary Cross Entropy when predicted target outputs are binary (Esteban, [Abstract, p. 93, Section 1, p. 100, Section VI, Table 1]).

In regards to claim 2, the rejection of claim 1 is incorporated and Su further teaches wherein the processor executes the first generation process and the second generation process using a recurrent neural network.  ([p. 2, Section IIA, pp. 4, Section IVC] First, we introduce the basic block of our deep RNN model, a one-layer bidirectional Long short-term memory (LSTM). LSTM [8] was designed to address the vanishing gradient problem of conventional RNN by introducing a memory cell state ct and multiple gating mechanisms inside a standard RNN hidden state transition process. The hidden state ht in LSTM is generated by: <equations 3-7>, For evaluation on the multi-day continuous BP dataset, all deep RNN models were first pretrained on the static BP dataset then fine-tuned using part of the first-day data, and finally tested on the rest of the first-day data as well as the following days’ data., wherein each temporal element (piece) in the test or training input (second feature data) is input into the RNN framework (e.g., for evaluation or training) and processed in that framework to generate the internal states of the RNN (h_t) using the (second) internal parameter c_t-1 (which is used to determine ht as expressed in equation 7 in view of equations 3-6) that corresponds to a result formed in the RNN framework using an element (piece) of the feature data at a preceding time step (equation 6 via dependence on c_(t-1)) and at least one model parameter of the RNN framework (e.g., W_xc but also, more generally, also, W_xo, Who, Whc) optimized during the training process.)
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Su to incorporate the teachings of Esteban for the same reasons as pointed out for claim 1.
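The recurrent generation of internal data relied on for claims 1 and 2 can be illustrated with a generic LSTM step. The sketch below uses the textbook LSTM formulation with scalar states; Su's equations 3-7 are not reproduced in this action and may differ in detail (for example, in any peephole terms). The function names `lstm_step` and `run_sequence` and the dictionary keys are hypothetical and are chosen only to echo the W_xc, W_xo, W_ho, W_hc notation used above.

```python
import math


def sigmoid(a):
    """Logistic function used by the LSTM gates."""
    return 1.0 / (1.0 + math.exp(-a))


def lstm_step(x_t, h_prev, c_prev, p):
    """One step of a scalar LSTM cell (textbook form, hypothetical names).

    x_t is the current input, h_prev and c_prev are the hidden and cell
    states from the preceding time step, p maps weight names to floats.
    """
    i_t = sigmoid(p["w_xi"] * x_t + p["w_hi"] * h_prev + p["b_i"])    # input gate
    f_t = sigmoid(p["w_xf"] * x_t + p["w_hf"] * h_prev + p["b_f"])    # forget gate
    o_t = sigmoid(p["w_xo"] * x_t + p["w_ho"] * h_prev + p["b_o"])    # output gate
    g_t = math.tanh(p["w_xc"] * x_t + p["w_hc"] * h_prev + p["b_c"])  # candidate
    c_t = f_t * c_prev + i_t * g_t  # new cell state depends on c_(t-1)
    h_t = o_t * math.tanh(c_t)      # new hidden state (internal data)
    return h_t, c_t


def run_sequence(xs, p):
    """Apply lstm_step across a time series, returning each hidden state."""
    h, c = 0.0, 0.0
    states = []
    for x in xs:
        h, c = lstm_step(x, h, c, p)
        states.append(h)
    return states
```

The same step function is applied whether the sequence is training data or test data; only the status of the weights (being learned versus already optimized) differs.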

In regards to claim 4, the rejection of claim 1 is incorporated and Su further teaches wherein the processor executes the first calculation process as an identification operation of the first feature data groups. ([p. 1, Section I, pp. 2-3, Section IIIC, p. 3, Section IVB, Figure 1, Figure 3, Figure 5] As the leading risk factor of cardiovascular diseases (CVD) [1], high blood pressure (BP) has been commonly used as the critical criterion for diagnosing and preventing CVD. Therefore, accurate and continuous BP monitoring during people’s daily life is imperative for early detection and intervention of CVD., Given that we have multiple supervision signals like systolic BP (SBP), diastolic BP (DBP) and mean BP (MBP) which are closely related to each other, we adopt multi-task training strategy … where yt = [SBP, DBP, MBP] represents ground truth, zt is corresponding prediction., Since the primary goal of this paper is to prove the importance of modeling temporal dependencies in BP dynamics for accurate BP prediction, we simply select 7 representative handcrafted features of ECG and PPG signals (shown in Fig 3) as follows:…,  wherein a predictive output (either training/first feature data groups or testing/second feature data) is calculated in order to identify/predict temporal variations in blood pressure (in other words, the predictive output is a BP trend/variation identification that may be used for diagnosis of CVD symptoms).)
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Su to incorporate the teachings of Esteban for the same reasons as pointed out for claim 1.

In regards to claim 5, the rejection of claim 1 is incorporated and Su further teaches wherein the processor executes the first calculation process as a regression operation of the first feature data groups. ([p. 2, Section IIB, Figure 1] Once the top-layer hidden state is computed, the output zt can be obtained by <equation 14>, wherein a predicted output is calculated based on the combination of the skip connections and the transformation/output of the internal state (reallocation result – argument of the logistic function in equation 14) such that this output is not only based on the reallocation process described above and at least one learning parameter such as W_xz^L in equation 14 (but also b^L) but is itself a regression operation over the input features by representing the prediction as a sum of weights (W_xz^L) applied to the (first) feature data (more precisely, this is a logistic regression operation because of the application of the sigmoid function) and wherein this operation is applied to all patients/groups in the training data.)
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Su to incorporate the teachings of Esteban for the same reasons as pointed out for claim 1.
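As a minimal sketch of what is meant above by a logistic regression operation, the following computes an output as the sigmoid of a weighted sum of input features, a weighted sum of hidden-state values, and a bias. It does not reproduce Su's equation 14; the function name `logistic_output` and the arguments `w_xz`, `w_hz`, and `b` are hypothetical stand-ins for learned parameters such as W_xz^L and b^L.

```python
import math


def logistic_output(x, h, w_xz, w_hz, b):
    """Sigmoid of (weights . features) + (weights . hidden state) + bias."""
    s = sum(w * xi for w, xi in zip(w_xz, x))   # direct (skip) path from features
    s += sum(w * hi for w, hi in zip(w_hz, h))  # path from the hidden state
    s += b
    return 1.0 / (1.0 + math.exp(-s))
```

With all weights and the bias set to zero, the weighted sum is zero and the output is 0.5; a positive weight on a positive feature moves the output above 0.5.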

In regards to claim 6, the rejection of claim 1 is incorporated and Su further teaches wherein the processor executes a second calculation process calculating a second predicted value corresponding to the second feature data groups on a basis of the importance data calculated by the importance calculation process and the second feature data groups.  ([p. 2, Section IIB, p. 4, Section V, Figure 1] Once the top-layer hidden state is computed, the output zt can be obtained by <equation 14>, The best accuracy was obtained by our 4-layer deep RNN (DeepRNN-4L) model which achieves a RMSE of 3.73 and 2.43 for SBP and DBP prediction respectively. The Bland-Altman plots (Figure 4) indicate that the DeepRNN4L predictions agreed well with the ground truth, with 95% of the differences lie within the agreement area., wherein a prediction output is calculated for test data (a second/test prediction corresponding to the second/test feature data) based on the combination of the skip connections and the transformation/output of the internal state (including the weighted feature data – importance data as seen in the argument of the logistic function in equation 14) such that this output is based on the importance calculation process described above (using the learned weights) and the (second) feature data (for a set of patients/groups) as weighted according to W_xz^L in equation 14.)
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Su to incorporate the teachings of Esteban for the same reasons as pointed out for claim 1.

Claim 8 is also rejected because it is just a method implementation of the same subject matter of claim 1 which can be found in Su and Esteban. 

Claim 9 is also rejected because it is just a data analysis program implementation of the same subject matter of claim 1 which can be found in Su and Esteban. 

Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Su, in view of Esteban, and in further view of Xue et al. (“Full Quantification of Left Ventricle via Deep Multitask Learning Network Respecting Intra- and Inter-Task Relatedness”, http://arxiv.org/abs/1706.01912, arXiv:1706.01912v2 [cs.CV], 14 June 2017, pp. 1-8), hereinafter referred to as Xue.

In regards to claim 3, the rejection of claim 1 is incorporated and Su and Esteban do not further teach wherein the processor executes the first generation process and the second generation process using a convolutional neural network. Su does not make use of CNNs (i.e., the time series data is not dependent on a CNN to extract features such as from images). Esteban also does not make use of a CNN.
However, Xue, in the analogous environment of using an RNN for predictive time series analysis, teaches wherein the processor executes the first generation process and the second generation process using a convolutional neural network. ([p. 4, Section 2.1, p. 6, Section 3, Figure 2] To obtain expressive and task-aware features, we design a specially tailored deep CNN for cardiac images, as shown in the left of Fig. 2… As a feature embedding network, our CNN maps each cardiac image X^(s,f) into a fixed-length low dimension vector…w_cnn…. In this work, two RNN modules, as shown by the green and yellow blocks in Fig. 2, are deployed for the regression tasks and the classification task. For the three regression tasks, the indices to be estimated are mainly related to the spatial structure of cardiac LV in each frame. For the classification task, the cardiac phase is mainly related to the structure difference between successive frames. Therefore, the two RNN modules are designed to capture these two kinds of dependencies. The outputs of RNN modules are {h_m^(s,1), ..., h_m^(s,F)} = f_rnn([e^(s,1), ..., e^(s,F)] | w_m), m ∈ {rnn1, rnn2}., We apply a two-step strategy for training our network to alleviate the difficulties caused by the different learning rate and loss function in multitask learning [15,16]. Firstly the CNN embedding, the first RNN module and the three regression models are learned together with no back propagation from the classification task, to obtain accuracy prediction for the regression tasks; with the obtained CNN embedding, the second RNN module and the linear classification model are then learned while the rest of the network are kept frozen., wherein a CNN performs feature extraction from a sequence of images and embeds them in a lower dimensional space for use in an RNN for predicting various medical parameters from the temporal sequence of the embedded feature data such that not only do the RNNs perform a generation process (for either first/training or second/evaluation data) to generate internal data (hidden states) but the CNN also generates internal data in the form of the representation of the features in the embedded feature space (also for either first/training or second/evaluation data) according to (learned) CNN model parameters (such as w_cnn).)
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Su and Esteban to incorporate the teachings of Xue to generate internal data for first and second feature data groups using a CNN. The modification would have been obvious because one of ordinary skill would have been motivated to improve predictive accuracy for multitask deep learning frameworks according to temporally sequenced feature data using a CNN when that feature data is derived from image sequences, such as cardiac images (Xue, [Abstract, p. 7, Section 5, Table 1]).

Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Su, in view of Esteban, and in further view of Che et al. (“Interpretable Deep Models for ICU Outcome Prediction”, AMIA annual symposium proceedings, Vol. 2016, pp. 371-380), hereinafter referred to as Che.

In regards to claim 7, the rejection of claim 6 is incorporated and Su and Esteban do not further teach wherein the processor executes an output process outputting the second feature data and the importance data to be associated with each other. Neither Su nor Esteban discloses an outputting process that indicates the relative importance (i.e., weighting) associated with the features used to form the predictions.
However, Che, in the analogous environment of using an RNN for time series analysis, teaches wherein the processor executes an output process outputting the second feature data and the importance data to be associated with each other. ([pp. 372-373, Section 2.2, p. 373, Section 3.1, p. 374, Section 3.3, p. 377, Section 4.4.1, Figure 1, Figure 2, Figure 5] The structure of GRU is shown in Figure 1(b). Let x_t ∈ R^P denotes the variables at time t, where 1 ≤ t ≤ T. At each time t, GRU has a reset gate r_t^j and an update gate z_t^j for each of the hidden state h_t^j. The update function of GRU is shown as follows:… where matrices Wz, Wr, W, Uz, Ur, U and vectors bz, br, b are model parameters. At time t, we take the hidden states ht and treat it as the output of GRU xnnt at that time. As shown in Figure 1(c), we flatten the output of GRU at each time step and add another sigmoid layer on top of them to get the prediction ynn., The way of distilling knowledge, a.k.a. mimicking the complex models, is to utilize the soft labels learned from the teacher/base model as the target labels while training the student/mimic model…. The parameters of the student model can be estimated by minimizing the squared loss between the soft labels from the teacher model and the predictions by the student model., In Pipeline 1 (Figure 2), we directly use the predicted soft labels from deep learning models. In the first step, we train a deep learning model, which can be a simple feedforward network or GRU, given the input X and the original target y (which is either 0 or 1 for binary classification). Then, for each input sample X, we obtain the soft prediction score ynn ∈ [0, 1] from the prediction layer of the neural network. Usually, the learned soft score ynn is close but not exactly the same as the original binary label y. In the second step, we train a mimic Gradient boosting model, given the raw input X and the soft label ynn as the model input and target, respectively. We train the mimic model to minimize the mean squared error of the output ym to the soft label ynn., We show the aggregated feature importance scores on different days in Figure 5. The trend of feature importance for GBTmimic methods is Day 1 > Day 0 > Day 2 > Day 3, which means early observations are more useful for both MOR and VFD prediction tasks. On the other hand, for GBT methods, the trend is Day 1 > Day 3 > Day 2 > Day 0 for both the tasks. Overall, Day-1 features are more useful across all the tasks and models., wherein, in an RNN-based deep learning framework, an output process outputs the association between feature data (second feature data – test/evaluation data) and respective importance measures in a predicted output in which the importance metrics are derived using Gradient Boosted trees and in which the prediction output is based on the output of the RNN (hidden states and transformations leading to the soft prediction) as well as the input features themselves for which the importance is computed.)
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Su and Esteban to incorporate the teachings of Che to generate an output that associates the feature importance measures with the corresponding/respective (second) feature data. The modification would have been obvious because one of ordinary skill would have been motivated to improve the interpretability of time feature data in an effective deep learning time series analysis framework used to predict future events, such as to facilitate decision making in a clinical or medical setting (Che, [Abstract, p. 378, Section 5, Figure 8]).
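The two-step mimic pipeline quoted from Che for claim 7 can be sketched, under stated simplifications, as follows. This is not Che's implementation: the teacher here is a fixed stand-in function rather than a trained GRU, the student is gradient boosting over depth-1 stumps rather than full gradient boosting trees, and importance is taken as each feature's share of total squared-error reduction. All function names, the toy data `X`, and the settings `rounds` and `lr` are assumptions made for illustration.

```python
import math


def teacher_soft_label(row):
    # Stand-in for a network's soft score in [0, 1]; depends on feature 0 only.
    return 1.0 / (1.0 + math.exp(-(3.0 * row[0] - 1.5)))


def fit_stump(X, r):
    """Best single-feature threshold split of residuals r by squared error."""
    n = len(X)
    mean = sum(r) / n
    base = sum((v - mean) ** 2 for v in r)
    best = None  # (gain, feature index, threshold, left mean, right mean)
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X})[:-1]:
            left = [r[i] for i in range(n) if X[i][j] <= t]
            right = [r[i] for i in range(n) if X[i][j] > t]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if best is None or base - sse > best[0]:
                best = (base - sse, j, t, lm, rm)
    return best


def fit_mimic(X, soft_labels, rounds=20, lr=0.5):
    """Fit boosted stumps to the soft labels; return model and importances."""
    init = sum(soft_labels) / len(X)
    pred = [init] * len(X)
    importance = [0.0] * len(X[0])
    stumps = []
    for _ in range(rounds):
        resid = [y - p for y, p in zip(soft_labels, pred)]
        best = fit_stump(X, resid)
        if best is None or best[0] <= 1e-12:
            break
        gain, j, t, lm, rm = best
        importance[j] += gain  # credit the feature used for this split
        stumps.append((j, t, lr * lm, lr * rm))
        pred = [p + (lr * lm if row[j] <= t else lr * rm) for p, row in zip(pred, X)]
    total = sum(importance) or 1.0
    return init, stumps, [g / total for g in importance]


def predict(init, stumps, row):
    return init + sum(l if row[j] <= t else r for j, t, l, r in stumps)


# Toy data: feature 0 drives the teacher's score, feature 1 is unrelated.
X = [[i / 9.0, ((i * 7) % 10) / 9.0] for i in range(10)]
soft = [teacher_soft_label(row) for row in X]
init, stumps, importance = fit_mimic(X, soft)
# Output that associates each feature index with its importance score.
feature_importance = list(enumerate(importance))
```

Here `feature_importance` pairs each input feature with its normalized importance, which is the sort of association between feature data and importance measures that the rejection relies on Che to show.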
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Lu et al. (“Knowledge Distillation for Small-Footprint Highway Networks”, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017, pp. 4820-4824) teach the learning of RNN model parameters (including 3 distinct weight matrices to be learned) using cross-entropy training for efficient RNN implementation.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to ROBERT LEWIS KULP whose telephone number is (571)272-7983. The examiner can normally be reached M, Th, F 8-5:30; Tu 8-3.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda Huang, can be reached on 571-270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/ROBERT LEWIS KULP/Examiner, Art Unit 2124                                                                                                                                                                                                        
/MIRANDA M HUANG/Supervisory Patent Examiner, Art Unit 2124