Detailed Action
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 1-4 and 7-8 are pending.
Claims 5-6 are cancelled.

Priority
Acknowledgment is made of applicant’s claim for foreign priority under 35 U.S.C. 119 (a)-(d). The certified copy has been filed in parent Application EP 18170554.2, filed on 2018-05-03.

Claim Objections
The amended claims were received on 07/22/2022. The amended claims are acceptable.

Claim Interpretation
The amended claims were received on 07/22/2022. The amended claims are acceptable.

Claim Rejections - 35 USC § 112
	The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claim 7 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim 7 recites the limitation "a computer system, comprising a processor, a memory, and a computer readable hardware storage device, said code executable by the processor via a memory to implement a method" in line 2. There is insufficient antecedent basis for the limitation ‘code’ in the claim.
	For purposes of examination, the claim is interpreted as: ‘said a code executable …’
	
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-4 and 7-8 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.

Regarding claim 1,
2A Prong 1: The limitation of determining influence of attributes in and trained on therapy prediction, comprising the following steps starting at time step is a mental process, as the limitation merely recites training and making predictions using a model, which can be performed in the human mind.
The limitation of b) determining for each output proportions for each input vector, where the proportions are each based on a respective component of the input vector, a weight for the respective component and the respective output, wherein the weight is known from the respective layer l is a mathematical process, as it merely recites calculating proportions using the respective components and the weight.
The limitation of c) decomposing for each output a relevance score Rki, wherein said relevance score Rki is known from a relevance score Rjl" of the previous step l+1 or in step L from the first relevance score RkL, into decomposed relevance scores Rkeji for each component xj1 of the input vector x based on the proportions pji is a mathematical concept, because the limitation recites a mathematical calculation to compute the relevance score for each of the neurons.
The limitation of d) combining all decomposed relevance scores Rkeji of the present step l to the relevance score Rl for the next step l-1 is a mathematical concept, because the limitation recites a process of computing the sum of the calculated scores.
The limitation of e) executing steps a) to d) for the next time step, wherein the layers are the layers for the next time step, the input vector is a last hidden state, which is based on the output of the previous time step, and the first relevance score is a relevance score of the previous hidden state which is the last relevance score of the first layer of the previous time step is a mathematical concept, as it merely recites a process of calculating output using the given vectors and network.
The limitation of f) outputting a sequence of relevance scores of the respective first layer of all time steps t is a mathematical process, because the limitation merely recites returning the result of calculations.
2A Prong 2: This judicial exception is not integrated into a practical application. The limitation of further comprising the following iterative steps for each layer l starting at layer L is a form of insignificant extra-solution activity. The limitation of a) receiving the layers l, an input vector xi of size M for the first layer l=1 comprising input features and a first relevance score RkL of size M for each output neuron zk, where k is 1 to N is also a form of insignificant extra-solution activity.
2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The limitation of Recurrent Neural Networks, RNN, having l layers, where l is 1 to L, and time steps t, where t is 1 to T, all of the neurons, and hidden-to-hidden networks, merely says which particular technological field or environment the abstract idea is performed in (MPEP 2106.05(h)). The limitation of further comprising the following iterative steps for each layer l starting at layer L was considered to be insignificant extra-solution activity in Step 2A Prong 2, and is thus re-evaluated in Step 2B to determine if it is more than what is well-understood, routine, conventional activity in the field. The limitation merely recites performing repetitive calculations (MPEP 2106.05(d)(II)ii). The limitation of receiving the layers l of an input-to-hidden network, an input vector xi of size M for the first layer l=1 comprising input features and a first relevance score RkL of size M for each output neuron zk, where k is 1 to N is mere data gathering (MPEP 2106.05(g)). The claim is not patent eligible.
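
For illustration only, steps b) through d) as characterized above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the applicant's or any reference's implementation: the function name decompose_relevance, the bias-free linear layer, the proportion formula p_kj = x_j * w_kj / z_k, and the sample values are all hypothetical, and every output z_k is assumed to be nonzero.

```python
def decompose_relevance(x, W, R_out):
    """One backward relevance step through a bias-free linear layer.

    Hypothetical helper for illustration only.
    x:     input vector (length M)
    W:     weights, W[k][j] connects input component j to output k
    R_out: relevance score of each output k
    Assumes every output z_k is nonzero (no stabilizer is used here).
    """
    # Forward pass: z_k = sum_j x_j * w_kj
    z = [sum(xj * wkj for xj, wkj in zip(x, row)) for row in W]
    # Step b): proportion of output k attributable to input component j
    p = [[xj * wkj / zk for xj, wkj in zip(x, row)] for row, zk in zip(W, z)]
    # Step c): decompose each output relevance according to its proportions
    messages = [[pkj * Rk for pkj in row] for row, Rk in zip(p, R_out)]
    # Step d): combine (sum) the messages arriving at each input component
    return [sum(column) for column in zip(*messages)]


# Assumed sample values: two input components, two output neurons.
x = [1.0, 2.0]
W = [[0.5, 0.25], [1.0, -0.25]]
R_out = [1.0, 0.5]
R_in = decompose_relevance(x, W, R_out)
print(R_in)  # [1.5, 0.0]
```

In this sketch the total relevance is conserved across the layer: the output relevances sum to 1.5, and so do the combined input relevances.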

Regarding claim 7, the limitation of a computer system, comprising a processor, a memory, and a computer readable hardware storage device, said code executable by the processor via a memory to implement a method is generic computer hardware that performs generic computer functions being used to implement the abstract idea.
Claim 7 is a system claim having limitations similar to those of method claim 1. Therefore, it is rejected under the same rationale as claim 1.

Regarding claim 8, the limitation of a computer program product, comprising a computer readable hardware storage device having computer readable program code stored therein, said program code executable by a processor of a computer system to implement a method is generic computer hardware that performs generic computer functions being used to implement the abstract idea.
Claim 8 is a computer program product claim having limitations similar to those of method claim 1. Therefore, it is rejected under the same rationale as claim 1.

Regarding claim 2, the limitation of wherein in step b) the respective output is determined by the input vector xi and a respective weight vector wij is a mathematical concept, as it merely recites using two vectors to compute the value of the output neuron. 
The limitation of output neuron k merely says which particular technological field or environment the abstract idea is performed in (MPEP 2106.05(h)).

Regarding claim 3, the limitation of wherein in step b) stabilizers are introduced to avoid numerical instability is a mathematical concept, as it merely recites introducing a variable needed to calculate the final output.
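
For illustration only, the effect of such a stabilizer can be sketched using the relevance-message rule quoted from Arras later in this action. This is a hedged sketch, not the claimed or cited implementation: the function names sign and lrp_message, the default delta = 1.0, and the sample values are assumptions; eps = 0.001 is the value the quoted passage reports.

```python
import math


def sign(z):
    # sign(z) is +1 for z >= 0 and -1 otherwise, as defined in the quoted rule
    return 1.0 if z >= 0 else -1.0


def lrp_message(z_i, w_ij, z_j, b_j, R_j, N, eps=0.001, delta=1.0):
    """Relevance message from upper-layer neuron j to lower-layer neuron i.

    Hypothetical helper for illustration only; delta = 1.0 is an assumed default.
    """
    stab = eps * sign(z_j)  # stabilizer term: keeps the denominator away from zero
    return (z_i * w_ij + (stab + delta * b_j) / N) / (z_j + stab) * R_j


# Assumed sample values chosen so that the upper-layer value z_j is exactly 0,
# where an unstabilized division by z_j would fail.
z_lower = [1.0, -1.0]
weights = [1.0, 1.0]
b_j = 0.0
z_j = sum(z * w for z, w in zip(z_lower, weights)) + b_j
messages = [
    lrp_message(z, w, z_j, b_j, R_j=1.0, N=len(z_lower))
    for z, w in zip(z_lower, weights)
]
total = sum(messages)
print(all(math.isfinite(m) for m in messages), round(total, 6))
```

With the stabilizer, both messages stay finite even though z_j is zero, and with delta = 1.0 they still sum to the upper-layer relevance R_j up to floating-point rounding.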

Regarding claim 4, the limitation of wherein the RNN is a simple RNN or a Long Short-Term Memory, LSTM, network or a Gated Recurrent Unit, GRU, network merely says which particular technological field or environment the abstract idea is performed in (MPEP 2106.05(h)).


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-4 and 7-8 are rejected under 35 U.S.C. 103 over Arras (Arras et al, 2017, “Explaining Recurrent Neural Network Predictions in Sentiment Analysis”) in view of Bach (Bach et al, 2015, “On Pixel-Wise Explanations for Non-Linear Classifier Decisions by Layer-Wise Relevance Propagation”) and further in view of Corrado (US 20170032241 A1).

Regarding claim 1, Arras teaches a method of determining influence of attributes in Recurrent Neural Networks, RNN, having n layers, where n is 1 to L, and time steps t, where t is 1 to T ([Arras, page 2, right column, first paragraph, line 2-8] “for a given input x, it redistributes the quantity fc(x), starting from the output layer of the network and backpropagating this quantity up to the input layer. The LRP relevance propagation procedure can be described layer-by-layer for each type of layer occurring in a deep convolutional neural network”, teaches a variety of layers. 1 to L is merely a number given to each of the layers, [Arras, page 9, Appendix] “the last hidden state hT is eventually attached to a fully-connected linear layer yielding a prediction score vector f(x), with one entry fc(x) per class, which is used for sentiment prediction”, discloses accepting the last hidden state as an input, and as shown in the equation $f_t = \mathrm{sigm}(W_f h_{t-1} + U_f x_t + b_f)$, the LSTM involves the time steps t-1 and t), comprising the following steps starting at time step T: 
a) receiving the layers n of an RNN, an input vector for the first layer n=1 comprising input features for the RNN and a first relevance score RkL for each output neuron zk, where k is 1 to N ([Arras, page 2, right column, Weighted Connections, second paragraph] “Given the relevances Rj of the upper-layer neurons zj , the goal is to compute the lower-layer relevances Ri of the neurons zi. (In the particular case of the output layer, we have a single upper-layer neuron zj , whose relevance is set to its value, more precisely we set Rj = fc(x) to start the LRP procedure.)”, receiving the layers and the input vector is inherent, because they are needed to get an output from the RNN. According to [Arras, page 2, left column, 2 Methods, first paragraph] “Given a trained neural network that models a scalar-valued prediction function fc (also commonly referred to as a prediction score) for each class c of a classification problem, and given an input vector x,”, fc can be interpreted as a layer, and x is the input vector);
further comprising the following iterative steps for each layer ([Arras, page 2, right column, first paragraph, line 1-12] “It is based on a layer-wise relevance conservation principle, and, for a given input x, it redistributes the quantity fc(x), starting from the output layer of the network and backpropagating this quantity up to the input layer. The LRP relevance propagation procedure can be described layer-by-layer for each type of layer occurring in a deep convolutional neural network (weighted linear connections following non-linear activation, pooling, normalization), and consists in defining rules for attributing relevance to lower-layer neurons given the relevances of upper-layer neurons”, discloses that the process is an iterative process performed in each of the layers): 
b) determining for each output neuron zkn proportions pkjn for each input vector x, where the proportions pkjn are each based on a respective component xj1 of the input vector x, a weight wkjn for the respective component xij and the respective output neuron zkn, wherein the weight wkjn is known from the respective layer n ([Arras, page 3, left column, first paragraph] “The messages Ri←j are computed as a fraction of the relevance Rj accordingly to the following rule: $R_{i \leftarrow j} = \frac{z_i \cdot w_{ij} + \frac{\epsilon \cdot \mathrm{sign}(z_j) + \delta \cdot b_j}{N}}{z_j + \epsilon \cdot \mathrm{sign}(z_j)} \cdot R_j$, where N is the total number of lower-layer neurons to which zj is connected, $\epsilon$ is a small positive number which serves as a stabilizer (we use $\epsilon = 0.001$ in our experiments), and $\mathrm{sign}(z_j) = (1_{z_j \geq 0} - 1_{z_j < 0})$ is defined as the sign of zj”, shows how Arras calculates the proportions, 
[Arras, page 2, right column, Weighted Connections] “Let $z_j$ be an upper-layer neuron, whose value in the forward pass is computed as $z_j = \sum_i z_i \cdot w_{ij} + b_j$, where zi are the lower-layer neurons, and $w_{ij}$, $b_j$ are the connection weights and biases”, $R_{i \leftarrow j}$ corresponds to the proportion, the value of the $z_j$ is the xj1, $z_i$ corresponds to the xjl, zj corresponds to zkl, and $w_{ij}$ corresponds to the weight); 
c) decomposing for each output neuron a relevance score, wherein said relevance score is known from a relevance score of the previous step ([Arras, page 3, right column, last paragraph, line 1-4] “Using this trained bi-LSTM, we compare two relevance decomposition methods: sensitivity analysis (SA) and Layer-wise Relevance Propagation (LRP)”, so the relevance propagation method is the decomposition process. 
[Arras, page 3, left column, first paragraph] “The relevance redistribution onto lower-layer neurons is performed in two steps. First, by computing relevance messages Ri←j going from upper-layer neurons zj to lower layer neurons zi … The messages Ri←j are computed as a fraction of the relevance Rj accordingly to the following rule: $R_{i \leftarrow j} = \frac{z_i \cdot w_{ij} + \frac{\epsilon \cdot \mathrm{sign}(z_j) + \delta \cdot b_j}{N}}{z_j + \epsilon \cdot \mathrm{sign}(z_j)} \cdot R_j$, where N is the total number of lower-layer neurons to which zj is connected, $\epsilon$ is a small positive number which serves as a stabilizer (we use $\epsilon = 0.001$ in our experiments), and $\mathrm{sign}(z_j) = (1_{z_j \geq 0} - 1_{z_j < 0})$ is defined as the sign of zj”. According to Arras, this process is the decomposition process, and Rj is the previous relevance score. The formula in front of the Rj is the proportion. All of the process involves time steps as described in [Arras, page 9, right column - page 10, entire Appendix], as all of the models used in this experiment include time steps, i.e. variable t of $h_{t-1}$); 
d) combining all decomposed relevance scores ([Arras, page 3, left column, first paragraph] “The relevance redistribution onto lower-layer neurons is performed in two steps. First, by computing relevance messages Ri←j going from upper-layer neurons zj to lower layer neurons zi. Then, by summing up incoming messages for each lower-layer neuron zi to obtain the relevance Ri. The messages Ri←j are computed as a fraction of the relevance Rj accordingly to the following rule: $R_{i \leftarrow j} = \frac{z_i \cdot w_{ij} + \frac{\epsilon \cdot \mathrm{sign}(z_j) + \delta \cdot b_j}{N}}{z_j + \epsilon \cdot \mathrm{sign}(z_j)} \cdot R_j$, where N is the total number of lower-layer neurons to which zj is connected, $\epsilon$ is a small positive number which serves as a stabilizer (we use $\epsilon = 0.001$ in our experiments), and $\mathrm{sign}(z_j) = (1_{z_j \geq 0} - 1_{z_j < 0})$ is defined as the sign of zj”, the summing up process corresponds to the combining process); 
e) executing steps a) to d) for the next time step t-1 of the RNN, wherein the layers n are the layers n of a hidden-to-hidden network of the neural network for the next time step t-1, the input vector x^n is a last hidden state h~t, which is based on the output neuron zit of the neural network of the previous time step t, and the first relevance score RkL is a relevance score of the previous hidden state Rjnt which is the last relevance score Rj of the first layer n=1 of the previous time step t ([Arras, page 3, right column, 3 Recurrent Model and Data, first paragraph] “As a recurrent neural network model we employ a one hidden-layer bi-directional LSTM (bi-LSTM), … This model takes as input a sequence of words x1, x2, ..., xT (as well as this sequence in reversed order), where each word is represented by a word embedding of dimension 60, and has a hidden layer size of 60”, discloses the hidden layers. 
[Arras, page 9, right column, Appendix] “the last hidden state hT is eventually attached to a fully-connected linear layer yielding a prediction score vector f(x), with one entry fc(x) per class, which is used for sentiment prediction”, discloses accepting the last hidden state as an input, and as shown in the equation $f_t = \mathrm{sigm}(W_f h_{t-1} + U_f x_t + b_f)$, the LSTM involves the time steps t-1 and t, and each calculation is done at each of the time steps. 
[Arras, page 2, right column, Weighted Connections, second paragraph – page 3, left column, first and second paragraph] “Given the relevances $R_j$ of the upper-layer neurons $z_j$, the goal is to compute the lower-layer relevances $R_i$ of the neurons $z_i$. (In the particular case of the output layer, we have a single upper-layer neuron $z_j$, whose relevance is set to its value, more precisely we set $R_j = f_c(x)$ to start the LRP procedure.) The relevance redistribution onto lower-layer neurons is performed in two steps. First, by computing relevance messages $R_{i \leftarrow j}$ going from upper-layer neurons $z_j$ to lower-layer neurons $z_i$. Then, by summing up incoming messages for each lower-layer neuron $z_i$ to obtain the relevance $R_i$. The messages $R_{i \leftarrow j}$ are computed as a fraction of the relevance $R_j$ accordingly to the following rule: $R_{i \leftarrow j} = \frac{z_i \cdot w_{ij} + \frac{\epsilon \cdot \mathrm{sign}(z_j) + \delta \cdot b_j}{N}}{z_j + \epsilon \cdot \mathrm{sign}(z_j)} \cdot R_j$ where N is the total number of lower-layer neurons to which $z_j$ is connected”, discloses the previous relevance score fed into the current );
f) outputting a sequence of scores Rjnt of the respective first layer n=1 of all time steps t ([Arras, page 7, left column, second paragraph, lines 9-19] “The resulting distributions, for different relevance target classes, are reported in Fig. 4. Interestingly, the relevance distributions are not symmetric w.r.t. to the sentence middle, and the major part of the relevance is attributed to the second half of the sentences, except for the target class “neutral”, where the most relevance is attributed to the last computational time steps of the left or the right encoder, resulting in an almost symmetric distribution of the total relevance for that class”, which discloses that time steps are involved in the calculation; [Arras, page 5, Figure 1; page 8, Figure 4] Figure 1 shows the measured relevance of each of the words processed at each time step, and Figure 4 shows the distribution of the sequence of scores).
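For illustration only, the relevance redistribution rule quoted from Arras above can be sketched in Python as follows. This is a minimal sketch and not code from Arras: the function name lrp_linear, the array shapes, the default values of eps and delta, the random test data, and the convention sign(0) = +1 are all assumptions made for the example.

```python
import numpy as np

def lrp_linear(z_in, W, b, R_out, eps=0.001, delta=1.0):
    """Redistribute upper-layer relevances R_j onto lower-layer neurons z_i.

    z_in  : (N,)   lower-layer activations z_i
    W     : (N, M) connection weights w_ij
    b     : (M,)   biases b_j
    R_out : (M,)   upper-layer relevances R_j
    """
    z_out = z_in @ W + b                       # forward pass: z_j = sum_i z_i w_ij + b_j
    sign = np.where(z_out >= 0, 1.0, -1.0)     # sign(z_j); sign(0) taken as +1 (assumption)
    n_lower = z_in.shape[0]                    # N in the quoted rule
    # Numerator of the rule: z_i w_ij + (eps sign(z_j) + delta b_j) / N
    numer = z_in[:, None] * W + (eps * sign + delta * b) / n_lower
    denom = z_out + eps * sign                 # z_j + eps sign(z_j)
    messages = numer / denom * R_out           # relevance messages R_{i<-j}, shape (N, M)
    return messages.sum(axis=1)                # R_i = sum_j R_{i<-j}

# Hypothetical data, used only to exercise the function.
rng = np.random.default_rng(0)
z_in = rng.normal(size=4)
W = rng.normal(size=(4, 3))
b = rng.normal(size=3)
R_out = rng.normal(size=3)
R_in = lrp_linear(z_in, W, b, R_out)
```

With delta = 1 the numerators for a given upper-layer neuron sum to the denominator, so the total relevance of the lower layer equals the total relevance of the upper layer in this sketch.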
Arras does not specifically disclose the following limitations:
an input vector xi of size M for the first layer n=1 comprising input features and a first relevance score RkL of size M for each output neuron zk, where k is 1 to N, further comprising the following iterative steps for each layer n starting at layer L , 
further comprising the following iterative steps for each layer 1 starting at layer L:
decomposing for each output neuron zkl a relevance score Rki, into decomposed relevance scores Rkeji for each component xj1 of the input vector x based on the proportions pji, 
combining all decomposed relevance scores Rkeji of the present step 1 to the relevance score Rl for the next step 1-1. Arras failed to teach the RNN trained on therapy prediction.
Bach teaches a) an input vector xi of size M for the first layer 1=1 comprising input features for the neural network and a first relevance score RkL of size M for each output neuron zk, where k is 1 to N ([Bach, page 4, second paragraph] “The first layer are the inputs, the pixels of the image, the last layer is the real-valued prediction output of the classifier f. The l-th layer is modeled as a vector $z = (z_d^{(l)})_{d=1}^{V(l)}$ with dimensionality V(l). Layer-wise relevance propagation assumes that we have a Relevance score $R_d^{(l+1)}$ for each dimension $z_d^{(l+1)}$ of the vector z at layer l + 1. The idea is to find a Relevance score $R_d^{(l)}$ for each dimension $z_d^{(l)}$ of the vector z at the next layer l which is closer to the input layer such that the following equation holds”, which discloses that the vector size is equal to the relevance score size);
further comprising the following iterative steps for each layer 1 starting at layer L ([Bach, page 4, second paragraph, lines 1-2] “Iterating Eq (2) from the last layer which is the classifier output f(x) down to the input layer x consisting of image pixels then yields the desired Eq (1)”, which discloses that the process is an iterative process performed at each of the layers. See page 4 for Eq (2), and see page 3 for Eq (1)):
c) decomposing for each output neuron zkl a relevance score Rki, wherein said relevance score Rki is known from a relevance score Rjl" of the previous step 1+1 or in step L from the first relevance score RkL, into decomposed relevance scores Rkeji for each component xj1 of the input vector x based on the proportions pji ([Bach, page 4, second paragraph - third paragraph] “The first layer are the inputs, the pixels of the image, the last layer is the real-valued prediction output of the classifier f. The l-th layer is modeled as a vector $z = (z_d^{(l)})_{d=1}^{V(l)}$ with dimensionality V(l). Layer-wise relevance propagation assumes that we have a Relevance score $R_d^{(l+1)}$ for each dimension $z_d^{(l+1)}$ of the vector z at layer l + 1. The idea is to find a Relevance score $R_d^{(l)}$ for each dimension $z_d^{(l)}$ of the vector z at the next layer l which is closer to the input layer such that the following equation holds … Iterating Eq (2) from the last layer which is the classifier output f(x) down to the input layer x consisting of image pixels then yields the desired Eq (1). The Relevance for the input layer will serve as the desired sum decomposition in Eq (1). In the following we will derive further constraints beyond Eqs (1) and (2) and motivate them by examples. As we will show now, a decomposition satisfying Eq (2) per se is neither unique, nor it is guaranteed that it yields a meaningful interpretation of the classifier prediction”, all of the process that calculates the relevance score of each of the layers, is the decomposition process, [Bach, page 4, 3rd paragraph and equation (4)] “Let us define the relevance for the second layer trivially as $R_1^{(2)} = f(x)$. Then, one possible layer-wise relevance propagation formula would be to define the relevance $R^{(1)}$ for the inputs x as $R_d^{(1)} = f(x) \frac{|\alpha_d \phi_d(x_d)|}{\sum_d |\alpha_d \phi_d(x_d)|}$”, shows that the relevance score is based on the proportions);
d) combining all decomposed relevance scores Rkeji of the present step 1 to the relevance score Rl for the next step 1-1 ([Bach, page 5, lines 11-13, Equations (6) and (7)] “The top layer consists of one output neuron, indexed by 7. For each neuron i we would like to compute a relevance $R_i$. We initialize the top layer relevance $R_7^{(3)}$ as the function value, thus $R_7^{(3)} = f(x)$. Layer-wise relevance propagation in Eq (2) requires now to hold $R_7^{(3)} = R_4^{(2)} + R_5^{(2)} + R_6^{(2)}$ (6), $R_4^{(2)} + R_5^{(2)} + R_6^{(2)} = R_1^{(1)} + R_2^{(1)} + R_3^{(1)}$ (7) ... The messages are, however, directed from a neuron towards its input neurons, in contrast to what happens at prediction time, as shown in the right panel of Fig 2. Secondly, we define the relevance of any neuron except neuron 7 as the sum of incoming messages: $R_i^{(l)} = \sum_{k:\ i \text{ is input for neuron } k} R_{i \leftarrow k}^{(l,l+1)}$”, the summing up process corresponds to the combining process, $l$ in the equation is the next step, $l+1$ is the present step);
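For illustration only, the layer-by-layer iteration and the summing of incoming messages quoted from Bach above can be sketched in Python as follows. This is a minimal sketch, not code from Bach: the fully connected three-layer toy network, its weights, the input values, the simple proportional message rule without biases, and the names propagate, W1, W2 are all hypothetical and chosen only so that the sums in Eqs (6) and (7) can be checked numerically.

```python
import numpy as np

def propagate(a, W, R_upper, eps=1e-9):
    """Send relevance from an upper layer to the layer below it.

    a       : (N,)   activations of the lower layer
    W       : (N, M) weights from lower to upper layer
    R_upper : (M,)   relevances of the upper-layer neurons
    """
    z = a[:, None] * W                                       # contributions z_ik = a_i w_ik
    total = z.sum(axis=0)                                    # z_k = sum_i z_ik
    total = total + eps * np.where(total >= 0, 1.0, -1.0)    # tiny guard against division by zero
    messages = z / total * R_upper                           # messages R_{i<-k}
    return messages.sum(axis=1)                              # R_i = sum of incoming messages

# Hypothetical toy network: neurons 1-3 (inputs), 4-6 (hidden), 7 (output).
x = np.array([1.0, 2.0, 0.5])
W1 = np.array([[0.5, 0.2, 0.1],
               [0.3, 0.4, 0.6],
               [0.7, 0.1, 0.2]])
W2 = np.array([[1.0], [0.5], [0.8]])

h = np.maximum(0.0, x @ W1)      # hidden activations
f = h @ W2                       # classifier output f(x), shape (1,)

R3 = f                           # top-layer relevance initialised as f(x)
R2 = propagate(h, W2, R3)        # relevances of neurons 4-6
R1 = propagate(x, W1, R2)        # relevances of neurons 1-3
```

In this sketch the relevances of each layer sum, up to the small guard term, to the same value f(x), which is the layer-wise conservation that Eqs (6) and (7) express.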
Before the effective filing date of the invention, it would have been obvious to a person of ordinary skill in the art, having the teachings of both Arras and Bach, to use the time-step dependent process of using the previous hidden state and relevance score to calculate the result score (also known as LSTM) of Bach to implement the method of determining influence of attributes of Arras. The suggestion and/or motivation for doing so is to obtain a relevance score for each of the words in a paragraph so that the relevance scores of the words can be compared.
Arras in view of Bach failed to teach the RNN trained on therapy prediction. 
Corrado teaches the RNN trained on therapy prediction ([Corrado, Abstract] “One of the methods includes obtaining a first temporal sequence of health events, wherein the first temporal sequence comprises respective health-related data associated with a particular patient at each of a plurality of time steps; processing the first temporal sequence of health events using a recurrent neural network to generate a neural network output for the first temporal sequence”).
Before the effective filing date of the invention, it would have been obvious to a person of ordinary skill in the art, having the teachings of Arras, Bach and Corrado, to use the process of using therapy prediction data to train the RNN of Corrado to implement the method of determining influence of attributes of Arras and Bach. The suggestion and/or motivation for doing so is to predict which kind of therapy works better for a patient.

Regarding claim 7, Arras in view of Bach and further in view of Corrado teaches a computer system, comprising a processor, a memory, and a computer readable hardware storage device, said code executable by the processor via a memory to implement a method ([Arras, page 1, left column, Abstract] “Recently, a technique called Layer-wise Relevance Propagation (LRP) was shown to deliver insightful explanations in the form of input space relevances for understanding feed-forward neural network classification decisions. In the present work, we extend the usage of LRP to recurrent neural networks. We propose a specific propagation rule applicable to multiplicative connections as they arise in recurrent network architectures such as LSTMs and GRUs. We apply our technique to a word-based bi-directional LSTM model on a five-class sentiment prediction task …”. LSTMs and GRUs are neural network architectures and software elements executed on computers, which inherently include a processor, a memory, a hardware storage device, and code executable by the processor.).
Claim 7 is a system claim having limitations similar to those of method claim 1. Therefore, it is rejected under the same rationale as claim 1.

Regarding claim 8, Arras in view of Bach and further in view of Corrado teaches a computer program product, comprising a computer readable hardware storage device having computer readable program code stored therein, said program code executable by a processor of a computer system to implement a method ([Arras, page 1, left column, Abstract] “Recently, a technique called Layer-wise Relevance Propagation (LRP) was shown to deliver insightful explanations in the form of input space relevances for understanding feed-forward neural network classification decisions. In the present work, we extend the usage of LRP to recurrent neural networks. We propose a specific propagation rule applicable to multiplicative connections as they arise in recurrent network architectures such as LSTMs and GRUs. We apply our technique to a word-based bi-directional LSTM model on a five-class sentiment prediction task …”. LSTMs and GRUs are neural network architectures and software elements executed on computers, which inherently include a processor, a memory, a hardware storage device, and code executable by the processor.).
Claim 8 is a computer program product claim having limitations similar to those of method claim 1. Therefore, it is rejected under the same rationale as claim 1.

Regarding claim 2, Arras in view of Bach, and further in view of Corrado teaches wherein in step b) the respective output neuron k is determined by the input vector x1 and a respective weight vector wk1 ([Arras, page 2, right column, Weighted Connections] “Let $z_j$ be an upper-layer neuron, whose value in the forward pass is computed as $z_j = \sum_i z_i \cdot w_{ij} + b_j$, where $z_i$ are the lower-layer neurons, and $w_{ij}$, $b_j$ are the connection weights and biases”).

Regarding claim 3, Arras in view of Bach, and further in view of Corrado teaches wherein in step b) stabilizers are introduced to avoid numerical instability ([Bach, page 21, first paragraph, lines 1-6 and equations 58, 59] “A drawback of the propagation rule of Eq (56) is that for small values $z_j$, relevances $R_{i \leftarrow j}$ can take unbounded values. Unboundedness can be overcome by introducing a predefined stabilizer $\epsilon \geq 0$: … where we can observe that some further relevance is absorbed by the stabilizer. In particular, relevance is fully absorbed if the stabilizer $\epsilon$ becomes very large”).
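For illustration only, the effect of the stabilizer described in the Bach passage quoted above can be sketched in Python as follows. This is a minimal sketch, not code from Bach: the function name messages, the two contribution values, and the stabilizer value 0.1 are hypothetical, and the stabilizer is applied with the sign of $z_j$ as an assumed reading of the rule.

```python
import numpy as np

def messages(z_ij, R_j, eps=0.0):
    """Relevance messages R_{i<-j} = z_ij / (z_j +/- eps) * R_j for one upper neuron j."""
    z_j = z_ij.sum()                      # pre-activation of neuron j
    stab = eps if z_j >= 0 else -eps      # stabilizer pushes the denominator away from zero
    return z_ij / (z_j + stab) * R_j

# Two contributions that nearly cancel, so z_j is small (about 0.001).
z_ij = np.array([1.0, -0.999])

raw = messages(z_ij, R_j=1.0)               # no stabilizer: messages of magnitude about 1000
stable = messages(z_ij, R_j=1.0, eps=0.1)   # with stabilizer: messages of magnitude below 10
```

Without the stabilizer the messages still sum to $R_j$ but are individually very large; with the stabilizer they stay bounded, and their sum falls below $R_j$ because part of the relevance is absorbed.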

Regarding claim 4, Arras in view of Bach, and further in view of Corrado teaches wherein the RNN is a simple RNN or a Long Short-Term Memory, LSTM, network or a Gated Recurrent Unit, GRU, network ([Arras, page 1, left column, Abstract, line 6-15] “In the present work, we extend the usage of LRP to recurrent neural networks. We propose a specific propagation rule applicable to multiplicative connections as they arise in recurrent network architectures such as LSTMs and GRUs. We apply our technique to a word-based bi-directional LSTM model on a five-class sentiment prediction task”).


Response to Argument
Applicant’s arguments filed 07/22/2022 have been fully considered but they are not persuasive.
The applicant respectfully argues that the 35 U.S.C. 101 rejection failed to show that the invention is an abstract idea, because the limitation of “determining influence of attributes in and trained on therapy prediction, comprising the following steps” cannot be performed practically in human mind, and step c) decomposing for each output a relevance score Rkn, wherein said relevance score Rkn is known from a relevance score Rkn+1 of the previous step n+1 or in step L from the first relevance score RkL, into decomposed relevance scores Rk->jn for each component xjn of the input vector xn based on the proportions pkjn; is not a mathematical concept.
The examiner respectfully disagrees. Regarding the limitation of “determining influence of attributes in and trained on therapy prediction, comprising the following steps”, it is a mental process, as it is possible for humans to be trained to predict the type of therapy that needs to be performed and its influence by performing data analysis mentally with the aid of pen and paper, and also in combination with a mathematical function or operation. Regarding the step c) decomposing for each output, according to page 11 of the specification, the decomposing process recites a multiplication operation between pkjn and Rkn, which is a mathematical function, as disclosed below from page 11 of the specification.

    [Equation image from page 11 of the specification: media_image1.png]

Therefore, applicant’s argument regarding the 101 rejection is not persuasive.

Regarding the 35 U.S.C. 103 rejection of claims 1, 7, and 8, the applicant respectfully argues that the combination of Arras, Bach, and Corrado fails to disclose or suggest the following steps, because Arras fails to teach the proportions for each input vector:
b) determining for each output neuron z_k^l proportions p_kj^l for each input vector x, where the proportions p_kj^l are each based on a respective component x_j^l of the input vector x, a weight w_kj^l for the respective component x_j^l and the respective output neuron z_k^l, wherein the weight w_kj^l is known from the respective layer l;
c) decomposing for each output neuron a relevance score, wherein said relevance score is known from a relevance score of the previous step;
e) executing steps a) to d) for the next time step t-1 of the RNN, wherein the layers l are the layers l of a hidden-to-hidden network of the neural network for the next time step t-1, the input vector xi is a last hidden state h~t, which is based on the output neuron zit of the neural network of the previous time step t, and the first relevance score R_k^L is a relevance score of the previous hidden state Rjlt which is the last relevance score Rj of the first layer l=1 of the previous time step t.
The examiner respectfully disagrees. Arras teaches the fraction of the relevance for each of the output neurons, which can be calculated using the equation on page 3, left column, first paragraph of Arras. That equation calculates the proportion of the relevance R_j for each output neuron, which can be interpreted as a proportion of the relevance for each neuron of the neural network. Furthermore, the w_ij of Arras and the w_kj^n of the application differ only in labeling: the weights w_ij in Arras are different for each layer of the neural network. The difference lies in how the weights are referred to, not in the structure of the network or in the steps of the method.
Regarding the decomposing of step (c), Arras, page 3, right column, last paragraph, lines 1-4, discloses that the relevance propagation corresponds to the relevance decomposition process described at page 3, left column, first paragraph. The R_i->j corresponds to the propagation of the relevance from i to j. Therefore, the cited portion can be interpreted as the decomposition of the relevance score for each of the neurons.
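For illustration only, the proportion-based decomposition discussed above can be sketched in Python as follows. This is a minimal sketch under stated assumptions and reproduces neither the equation of Arras nor the equation of the specification: the proportion p_kj is assumed to be x_j * w_kj divided by the sum of x_j * w_kj over j, no bias or stabilizer term is included, and the function names `proportions` and `decompose` are hypothetical.

```python
def proportions(x, w):
    """Return p[k][j] = x[j] * w[k][j] / z_k, with z_k = sum_j x[j] * w[k][j].

    Assumed form for illustration: no bias and no stabilizer term.
    """
    p = []
    for w_k in w:
        contributions = [x_j * w_kj for x_j, w_kj in zip(x, w_k)]
        z_k = sum(contributions)
        if z_k == 0:
            raise ValueError("output neuron has zero pre-activation")
        p.append([c / z_k for c in contributions])
    return p


def decompose(p, relevance):
    """Split each output relevance R_k into messages R_{k->j} = p[k][j] * R_k.

    Also returns the input relevances R_j = sum_k R_{k->j}.
    """
    messages = [[p_kj * r_k for p_kj in p_k] for p_k, r_k in zip(p, relevance)]
    n_inputs = len(p[0])
    r_in = [sum(m[j] for m in messages) for j in range(n_inputs)]
    return messages, r_in


# Hypothetical values: two inputs, two output neurons.
p = proportions([1.0, 2.0], [[0.5, 0.25], [1.0, 1.0]])
messages, r_in = decompose(p, [2.0, 3.0])
```

Because the proportions of each output neuron sum to one in this sketch, the total relevance assigned to the inputs equals the total relevance of the outputs.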
Regarding step (e), the claim requires repeating steps (a) to (d) for the given time steps. Arras, page 2, right column, last paragraph, discloses a relevance redistribution process that calculates the relevance message R_i<-j from upper neurons to lower neurons. This process can be interpreted as repeating steps (a) to (d), which constitute the relevance decomposition process. Therefore, applicant’s argument regarding the 35 U.S.C. 103 rejection is not persuasive.
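For illustration only, repeating a relevance redistribution step backward over time steps can be sketched in Python as follows. This is a simplified sketch under stated assumptions, not the method of Arras or of the application: only a hidden-to-hidden weight matrix is considered, the share of relevance that would flow to the external input at each time step is ignored, no bias or stabilizer term is used, and the names `redistribute` and `propagate_through_time` are hypothetical.

```python
def redistribute(x, w, relevance):
    """One backward step: R_j = sum_k (x[j] * w[k][j] / z_k) * R_k.

    Assumed, unstabilized form; z_k = sum_j x[j] * w[k][j].
    """
    r_in = [0.0] * len(x)
    for w_k, r_k in zip(w, relevance):
        z_k = sum(x_j * w_kj for x_j, w_kj in zip(x, w_k))
        if z_k == 0:
            raise ValueError("output neuron has zero pre-activation")
        for j, (x_j, w_kj) in enumerate(zip(x, w_k)):
            r_in[j] += x_j * w_kj / z_k * r_k
    return r_in


def propagate_through_time(hidden_states, w_hh, relevance_last):
    """Repeat the backward step for time steps t, t-1, ... .

    hidden_states lists the hidden vectors in chronological order; each one
    is the input of the hidden-to-hidden network at the following time step.
    """
    relevance = relevance_last
    history = [relevance]
    for h_prev in reversed(hidden_states):
        relevance = redistribute(h_prev, w_hh, relevance)
        history.append(relevance)
    return history


# Hypothetical values: two earlier hidden states of size two.
history = propagate_through_time(
    [[1.0, 1.0], [2.0, 1.0]],
    [[0.5, 0.5], [1.0, 0.0]],
    [1.0, 3.0],
)
```

In this sketch, each pass through the loop reuses the relevance computed for the later time step as the starting relevance for the earlier one, so the total relevance is preserved from step to step.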

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
US-20090064332-A1
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JUN KWON whose telephone number is (571)272-2072. The examiner can normally be reached on M-F 7:30AM – 4:00PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Abdullah Kawsar can be reached on (571)270-3169. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/JUN KWON/
Patent Examiner, Art Unit 2127

/ABDULLAH AL KAWSAR/
Supervisory Patent Examiner, Art Unit 2127