Detailed Action
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 1-34 are pending.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 19-33 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter.
Regarding claim 19, the claim does not fall within at least one of the four categories of patent-eligible subject matter because it is directed to software per se: although it is a system claim, it does not recite any hardware component.
The claim recites ‘A system comprising: a history recurrent neural network configured to process a set of hidden states … and an updated recurrent neural network configured to update a current cell state …’. The specification describes the system with reference to the figures as follows: ‘[0007] Figure 2 illustrates a block diagram of a dual recurrent neural network architecture system, in accordance with an embodiment. [0008] Figure 3 illustrates an attention mechanism for one implementation of the dual recurrent neural network architecture system of Figure 2, in accordance with an embodiment. [0009] Figure 4 illustrates skip connections for one implementation of the dual recurrent neural network architecture system of Figure 2, in accordance with an embodiment. [0010] Figure 5 illustrates an architecture for each of the recurrent neural networks in the dual recurrent neural network architecture system of Figure 2, in accordance with an embodiment’. Neither these paragraphs nor the figures disclose any hardware implementing the claimed elements. Therefore, under the broadest reasonable interpretation, the claimed elements can be either software or hardware elements. Accordingly, the claim is directed to software per se.
Claims 20-33 depend on claim 19 and therefore inherit the same deficiency.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1, 7, 11-12, 17-19, 22, 26-27, and 32-34 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Ortiz (Ortiz et al., 2018, “Learning State Representations for Query Optimization with Deep Reinforcement Learning”).

Regarding claim 1, Ortiz teaches a method, comprising:
 identifying a set of hidden states associated with an input sequence ([Ortiz, Figure 3; page 3, left column, line 5-23] “Before using the recursive NNST model, we must learn an additional function, NNinit , as shown in Figure 3. NNinit takes as input (x0,a0), where x0 is a vector that captures the properties of the database D and a0 is a single relational operator. The model outputs the cardinality of the subquery that executes the operation encoded in a0 on D. We define the vector, x0 to represent simple properties of the database, D. The list of properties we provide next is not definitive and more features can certainly be added. Currently, for each attribute in the dataset D, we use the following features to define x0: the min value, the max value, the number of distinct values, and a representation of a 1-D equi-width histogram.” 
[Ortiz, page 3, 3.2 Preliminary Results, second paragraph, line 6-8] “NNinit contains 50 hidden nodes in the hidden layer. We update the model via stochastic gradient descent with a loss based on relative error and a learning rate of .01.” The paragraph teaches NNinit contains 50 hidden states.); 
processing, by a history recurrent neural network, the set of hidden states to learn a cell state transition function associated with the input sequence ([Ortiz, Figure 3; page 3, left column, line 5-23] “Before using the recursive NNST model, we must learn an additional function, NNinit , as shown in Figure 3. NNinit takes as input (x0,a0), where x0 is a vector that captures the properties of the database D and a0 is a single relational operator. The model outputs the cardinality of the subquery that executes the operation encoded in a0 on D. We define the vector, x0 to represent simple properties of the database, D. The list of properties we provide next is not definitive and more features can certainly be added. Currently, for each attribute in the dataset D, we use the following features to define x0: the min value, the max value, the number of distinct values, and a representation of a 1-D equi-width histogram.” 
[Ortiz, page 3, 3.2 Preliminary Results, second paragraph, line 6-8] “NNinit contains 50 hidden nodes in the hidden layer. We update the model via stochastic gradient descent with a loss based on relative error and a learning rate of .01.” The paragraph teaches NNinit contains 50 hidden states 
[Ortiz, page 2, right column, 3.1 Approach, lines 14-18] “The NNST function generates these representations by adjusting the weights based on feedback from the NNObserved function. This NNObserved function learns to map a subquery representation to predict a set of observed variables. As we train this model, we use back propagation to adjust the weights for both functions.” Both NNinit and NNObserved process the set of hidden states and correspond to the history recurrent neural network.);
updating, by an update recurrent neural network, a current cell state and corresponding hidden states for each input of the input sequence, based on the cell state transition function ([Ortiz, page 3, left column, line 5-23; Figure. 3] “... Currently, for each attribute in the dataset D, we use the following features to define x0: the min value, the max value, the number of distinct values, and a representation of a 1-D equi-width histogram. As shown in the figure, we then include the recursive model, NNST , that takes (ht , at ) as input and predicts the observed variables of the subqueries as well as the representation, ht+1 of the new subquery. We combine these models to train them together. During training, the weights are adjusted based on the combined loss from observed variable predictions ...” 
[Ortiz, page 3, right column, lines 9-21] “Training NNinit and NNST : … For this next experiment, we predict the cardinality of a query containing both a selection and join operation by using the combined model. Here, a0 represents the selection, while the subsequent action a1 represents the join. Through this combined model, we can ensure that h1 (the hidden state for the selection) captures enough information to be able to predict the cardinality after the join. In Figure 5, we show the cardinality prediction for h1 and h2. In these scatter plots, the x-axis shows the real cardinality, while the y-axis shows the predicted cardinality from the model. Although there is some variance, h1 was able to hold enough information about the underlying data to make reasonable predictions for h2.” The paragraph and the figure disclose training NNST with hidden states. NNST corresponds to the update neural network.
[Ortiz, page 4, left column, 5th paragraph] “Initially, all state-action pairs are random values. At each timestep, the agent selects an action and observes the reward, r_{t+1} at state s_{t+1}. As the agent explores, these state-action pairs will converge to represent the expected reward of the states in future timesteps. At each state transition, each QL(s, a) is updated as follows: QL(s_t, a_t) ← QL(s_t, a_t) + α[r_{t+1} + γ max_{a′} QL(s_{t+1}, a′) − QL(s_t, a_t)] Where the max_{a′} QL(s_{t+1}, a′) represents the maximum value from s_{t+1} given the target policy. We compute the subsequent state given the state transition function, NNST .”).
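As an illustrative sketch only, the state-action update quoted from Ortiz above has the form of a standard tabular Q-learning step and could be written as follows. The function name q_update, the zero-initialized table, and the values chosen for alpha and gamma are assumptions made for this example and do not appear in Ortiz. In Ortiz the subsequent state is computed by NNST and the table starts from random values; here the subsequent state is simply passed in.

```python
from collections import defaultdict


def q_update(q, s, a, r_next, s_next, actions, alpha=0.1, gamma=0.9):
    """Apply one step of Q(s,a) <- Q(s,a) + alpha*[r + gamma*max_a' Q(s',a') - Q(s,a)]."""
    # Largest stored value over all actions available from the next state.
    best_next = max(q[(s_next, a2)] for a2 in actions)
    # Move the current estimate toward the one-step target.
    q[(s, a)] += alpha * (r_next + gamma * best_next - q[(s, a)])
    return q[(s, a)]


# Hypothetical two-action example; the state and action labels are placeholders.
q = defaultdict(float)           # assumption: unseen pairs start at 0.0
q[("s1", "join")] = 2.0          # pretend a later state already has a value
value = q_update(q, "s0", "select", r_next=1.0, s_next="s1",
                 actions=["select", "join"], alpha=0.5, gamma=0.9)
```

With these placeholder numbers the target is 1.0 + 0.9 × 2.0 = 2.8, so the stored value for ("s0", "select") moves halfway from 0 toward 2.8.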

Claim 19 is a system claim reciting limitations similar to those of method claim 1. Therefore, it is rejected under the same rationale as claim 1.

Regarding claim 34, Ortiz teaches a non-transitory computer-readable media storing computer instructions that, when executed by one or more processors, cause the one or more processors to perform a method ([Ortiz, page 1, Introduction, 2nd paragraph] “Recently, thanks to dropping hardware costs and growing datasets available for training, deep learning has successfully been applied to solving computationally intensive learning tasks in other domains. The advantage of these type of models comes from their ability to learn unique patterns and features of the data that are difficult to manually find or design [3].” Ortiz discloses a machine learning process, which runs on a computer having at least one processor that executes the program. A computer-readable medium storing computer instructions executed by processors is an inherent feature of machine learning systems.). Claim 34 is a non-transitory computer-readable media claim reciting limitations similar to those of method claim 1. Therefore, it is rejected under the same rationale as claim 1.

Regarding claim 7, Ortiz teaches wherein the set of hidden states associated with the input sequence includes all hidden states associated with the input sequence ([Ortiz, page 3, second paragraph – third paragraph; Figure 3] “Before using the recursive NNST model, we must learn an additional function, NNinit , as shown in Figure 3. NNinit takes as input (x0,a0), where x0 is a vector that captures the properties of the database D and a0 is a single relational operator. The model outputs the cardinality of the subquery that executes the operation encoded in a0 on D … As shown in the figure, we then include the recursive model, NNST , that takes (ht , at ) as input and predicts the observed variables of the subqueries as well as the representation, ht+1 of the new subquery. We combine these models to train them together. During training, the weights are adjusted based on the combined loss from observed variable predictions. We want to learn an h1 representation that captures not only enough information to predict the cardinality of that subquery but of other subqueries built by extending it.” The input x0 passes through the set of hidden states h1 and h2, which are all of the hidden states.
[Ortiz, page 3, 3.2 Preliminary Results, second paragraph] “Training NNinit : As a first experiment, we initialize x0 with properties of the IMDB dataset and train NNinit to learn h1. a0 represents a conjunctive selection operation over m attributes from the aka_title relation. We generate 20k unique queries, where 15k are used for training the model and the rest are used for testing. NNinit contains 50 hidden nodes in the hidden layer. We update the model via stochastic gradient descent with a loss based on relative error and a learning rate of .01.” This paragraph teaches that NNinit contains 50 hidden states.).

Claim 22 is a system claim reciting limitations similar to those of method claim 7. Therefore, it is rejected under the same rationale as claim 7.

Regarding claim 11, Ortiz teaches wherein a loss function is utilized to train the history recurrent neural network and the update recurrent neural network ([Ortiz, page 3, left column, 2nd paragraph – 3rd paragraph] “Before using the recursive NNST model, we must learn an additional function, NNinit , as shown in Figure 3. NNinit takes as input (x0,a0), where x0 is a vector that captures the properties of the database D and a0 is a single relational operator. The model outputs the cardinality of the subquery that executes the operation encoded in a0 on D. We define the vector, x0 to represent simple properties of the database, D. The list of properties we provide next is not definitive and more features can certainly be added. Currently, for each attribute in the dataset D, we use the following features to define x0: the min value, the max value, the number of distinct values, and a representation of a 1-D equi-width histogram. As shown in the figure, we then include the recursive model, NNST , that takes (ht , at ) as input and predicts the observed variables of the subqueries as well as the representation, ht+1 of the new subquery. We combine these models to train them together. During training, the weights are adjusted based on the combined loss from observed variable predictions. We want to learn an h1 representation that captures not only enough information to predict the cardinality of that subquery but of other subqueries built by extending it.”).

Claim 26 is a system claim reciting limitations similar to those of method claim 11. Therefore, it is rejected under the same rationale as claim 11.

Regarding claim 12, Ortiz teaches wherein a perceptual loss is further utilized to train the history recurrent neural network and the update recurrent neural network ([Ortiz, page 3, left column, 3.2 Preliminary Results, second paragraph] “Training NNinit : As a first experiment, we initialize x0 with properties of the IMDB dataset and train NNinit to learn h1. a0 represents a conjunctive selection operation over m attributes from the aka_title relation. We generate 20k unique queries, where 15k are used for training the model and the rest are used for testing. NNinit contains 50 hidden nodes in the hidden layer. We update the model via stochastic gradient descent with a loss based on relative error and a learning rate of .01.” This corresponds to the use of a loss in the history RNN.
[Ortiz, page 3, left column, 3rd paragraph] “As shown in the figure, we then include the recursive model, NNST , that takes (ht , at ) as input and predicts the observed variables of the subqueries as well as the representation, ht+1 of the new subquery. We combine these models to train them together. During training, the weights are adjusted based on the combined loss from observed variable predictions.” This corresponds to the use of a loss in the update RNN.).

Claim 27 is a system claim reciting limitations similar to those of method claim 12. Therefore, it is rejected under the same rationale as claim 12.

Regarding claim 17, Ortiz teaches wherein the history recurrent neural network and the update recurrent neural network form a dual recurrent neural network architecture modeling long-term dependencies in sequential data represented by the input sequence ([Ortiz, page 3, left column, 2nd paragraph – 3rd paragraph] “Before using the recursive NNST model, we must learn an additional function, NNinit , as shown in Figure 3. NNinit takes as input (x0,a0), where x0 is a vector that captures the properties of the database D and a0 is a single relational operator. The model outputs the cardinality of the subquery that executes the operation encoded in a0 on D. We define the vector, x0 to represent simple properties of the database, D. The list of properties we provide next is not definitive and more features can certainly be added. Currently, for each attribute in the dataset D, we use the following features to define x0: the min value, the max value, the number of distinct values, and a representation of a 1-D equi-width histogram. As shown in the figure, we then include the recursive model, NNST , that takes (ht , at ) as input and predicts the observed variables of the subqueries as well as the representation, ht+1 of the new subquery. We combine these models to train them together. During training, the weights are adjusted based on the combined loss from observed variable predictions. We want to learn an h1 representation that captures not only enough information to predict the cardinality of that subquery but of other subqueries built by extending it.”).

Claim 32 is a system claim reciting limitations similar to those of method claim 17. Therefore, it is rejected under the same rationale as claim 17.

Regarding claim 18, Ortiz teaches further comprising using the dual recurrent neural network architecture to predict long-term future data from the input sequence ([Ortiz, page 4, left column, 5th paragraph] “Initially, all state-action pairs are random values. At each timestep, the agent selects an action and observes the reward, r_{t+1} at state s_{t+1}. As the agent explores, these state-action pairs will converge to represent the expected reward of the states in future timesteps. At each state transition, each QL(s, a) is updated as follows: QL(s_t, a_t) ← QL(s_t, a_t) + α[r_{t+1} + γ max_{a′} QL(s_{t+1}, a′) − QL(s_t, a_t)] Where the max_{a′} QL(s_{t+1}, a′) represents the maximum value from s_{t+1} given the target policy. We compute the subsequent state given the state transition function, NNST .”).

Claim 33 is a system claim reciting limitations similar to those of method claim 18. Therefore, it is rejected under the same rationale as claim 18.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 2, 4-5, 8-10, 20-21, and 23-25 are rejected under 35 U.S.C. 103 as being unpatentable over Ortiz (Ortiz et al., 2018, “Learning State Representations for Query Optimization with Deep Reinforcement Learning”) in view of Jain (US 20170262996 A1).

Regarding claim 2, Ortiz teaches the method of claim 1. 
However, Ortiz does not specifically teach wherein the input sequence is a sequence of frames of video.
Jain teaches wherein the input sequence is a sequence of frames of video ([Jain, 0087] “Based on the training, as each frame is received, the attention recurrent neural network outputs a classification score for a certain action and an attention feature map 704 for each frame. As shown in FIG. 7, multiple attention feature maps 704 are generated from a frame sequence. In one configuration, the attention recurrent neural network generates a classification score for an action class in each frame based on the training.”).
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention and having the teachings of Ortiz and Jain, to use the sequence of video frames of Jain as the input sequence in the prediction method of Ortiz. The suggestion and/or motivation for doing so is to perform prediction on video frames, since such prediction requires the video frames as input.

Claim 20 is a system claim reciting limitations similar to those of method claim 2. Therefore, it is rejected under the same rationale as claim 2.

Regarding claim 4, Ortiz in view of Jain teaches wherein the history recurrent neural network and the update recurrent neural network are long short-term memory (LSTM) networks ([Jain, Fig. 6] discloses that both the inference network and the prediction network are LSTMs.
[Jain, 0067] “In addition, the first upper layer unit r.sub.p1 outputs its hidden state h.sub.p1 as an input to the subsequent lower hidden unit r.sub.l2 at a next frame. In this example, the second lower hidden unit r.sub.I2 receives a lower unit hidden state h.sub.I1 from the first lower hidden unit r.sub.I1. The hidden state of the first upper hidden unit r.sub.p1 is also provided to the second upper hidden unit r.sub.p2 of the upper layer LSTM.”).
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention and having the teachings of Ortiz and Jain, to use the LSTM networks of Jain to implement the prediction method of Ortiz. The suggestion and/or motivation for doing so is to improve the efficiency of the method, because LSTMs are designed to process sequential input and video frames are sequential.

Regarding claim 5, Ortiz in view of Jain teaches wherein the history recurrent neural network and the update recurrent neural network are convolutional long short-term memory (ConvLSTM) networks ([Jain, 0063] “FIG. 6 is a diagram illustrating an exemplary architecture 600 for predicting action in a frame and/or a sequence of frames, in accordance with aspects of the present disclosure. The exemplary architecture 600 may be configured as a stratified network including upper layer recurrent neural networks and lower layer recurrent neural networks (e.g., two layers of recurrent neural networks). Although the exemplary architecture includes recurrent neural networks, this is exemplary, as other neural network architectures are also considered. For example, the upper layer may be a recurrent neural network and the lower layer may be an upper convolution layer of a convolutional neural network. For example, the upper layer or lower layer may comprise a recurrent neural network and the upper layer or lower layer may comprise multiple stacked recurrent neural network or long short-term memory layers.” The LSTM of Jain can be a combination of convolution layers and LSTM layers, which corresponds to a ConvLSTM.).

Regarding claim 8, Ortiz in view of Jain teaches wherein the history recurrent neural network includes an attention mechanism ([Jain, 0066] “In operation, as shown in FIG. 6, for a first frame (time step t.sub.1), an attention map a.sub.1 may be predicted using a first lower hidden unit r.sub.I1, which receives the motion information f.sub.t as input.” The lower unit, which corresponds to the history RNN, generates an attention map, which is an attention mechanism.).

Claim 23 is a system claim reciting limitations similar to those of method claim 8. Therefore, it is rejected under the same rationale as claim 8.

Regarding claim 9, Ortiz in view of Jain teaches wherein the history recurrent neural network applies the attention mechanism to the set of hidden states associated with the input sequence ([Jain, 0066] “In operation, as shown in FIG. 6, for a first frame (time step t.sub.1), an attention map a.sub.1 may be predicted using a first lower hidden unit r.sub.I1, which receives the motion information f.sub.t as input.” The lower unit, which corresponds to the history RNN, generates an attention map, which is an attention mechanism.
[Jain, 0064] “The lower layer recurrent neural network uses the motion information f.sub.t and the hidden states from the previous frame to generate an attention saliency map for the current frame t. The motion information ƒ.sub.t may be produced from optical flow, which may be estimated using the current frame and the next frame.” This teaches that f.sub.t is the sequence (optical flow).).

Claim 24 is a system claim reciting limitations similar to those of method claim 9. Therefore, it is rejected under the same rationale as claim 9.

Regarding claim 10, Ortiz in view of Jain teaches the method of claim 9, wherein the attention mechanism computes, for a time step k, a relationship between a last hidden state and each earlier hidden state to indicate a weight for each earlier hidden state ([Jain, 0058] “… As shown in FIG. 5A, an attention map a.sub.t and a feature map X.sub.t may be combined by a weighted sum over all the spatial locations in the frame to compute a weighted feature map x.sub.t (e.g., x.sub.t=Σ.sub.k a.sub.t.sup.kX.sub.t.sup.k) as an input, where X.sub.t.sup.k indicates a feature vector (slice) in the feature map X.sub.t at each location k, and a.sub.t.sup.k is the weight in the attention map at its location.” The paragraph discloses the attention map is used to compute the weighted feature map.
[Jain, 0064] “The lower layer recurrent neural network uses the motion information f.sub.t and the hidden states from the previous frame to generate an attention saliency map for the current frame t. The motion information ƒ.sub.t may be produced from optical flow, which may be estimated using the current frame and the next frame. The motion information ƒ.sub.t may be produced via an upper convolution layer of a CNN. For example, the lower layer recurrent neural network unit r.sub.l2 may use the hidden state of the previous hidden unit r.sub.l1, the hidden state h.sub.p1 of the upper layer recurrent neural network unit r.sub.p1 along with motion information f.sub.2 to generate the attention map a.sub.2 for a second frame. As such, the lower layer recurrent neural network may provide the upper layer recurrent neural network with attention maps. The layer units may be artificial neurons or neural units.”
[Jain, 0065] “The generated attention map a.sub.t for a current frame may be combined with a representation of the current frame of a sequence of frames (e.g., video stream). The frame representation may be a frame feature map X.sub.t to create the input x.sub.t for the upper layer recurrent neural networks. In turn, the upper layer recurrent neural network may be configured to output a classification label y.sub.t for the current frame t. In addition, the upper layer recurrent neural network r.sub.pt may output a hidden state h.sub.t of the upper layer recurrent neural network unit, which may be supplied to a subsequent hidden units r.sub.It of the lower layer recurrent neural network and used to calculate or infer the attention map for the subsequent frame a.sub.t+1.” The paragraphs disclose the lower-layer LSTM (previous hidden state) providing an attention map to the upper layer (next hidden state) to indicate a weight, as disclosed in paragraph 0058.).
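As an illustrative sketch only, the weighted sum quoted from Jain at paragraph 0058 (x.sub.t = Σ.sub.k a.sub.t.sup.k X.sub.t.sup.k) could be computed as follows. The function name weighted_feature, the list-based data layout, and the numeric values are assumptions made for this example and do not appear in Jain.

```python
def weighted_feature(attention, feature_map):
    """Combine K feature vectors into one, weighting vector k by attention[k]."""
    if len(attention) != len(feature_map):
        raise ValueError("one attention weight is needed per location")
    dim = len(feature_map[0])
    # For each feature dimension d, sum a_k * X_k[d] over all locations k.
    return [sum(a * x[d] for a, x in zip(attention, feature_map))
            for d in range(dim)]


# Hypothetical frame with two spatial locations and two feature channels.
attention_map = [0.25, 0.75]              # a_t^k, one weight per location
feature_map = [[4.0, 0.0], [0.0, 4.0]]    # X_t^k, one vector per location
x_t = weighted_feature(attention_map, feature_map)
```

In this placeholder example the second location carries three times the weight of the first, so the second channel dominates the combined input x_t.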

Claim 25 is a system claim reciting limitations similar to those of method claim 10. Therefore, it is rejected under the same rationale as claim 10.

Regarding claim 21, Ortiz in view of Jain teaches wherein the history recurrent neural network and the update recurrent neural network are: long short-term memory (LSTM) networks, convolutional long short-term memory (ConvLSTM) networks, or gated recurrent unit (GRU) networks ([Jain, Fig. 6] discloses that both the inference network and the prediction network are LSTMs.
[Jain, 0067] “In addition, the first upper layer unit r.sub.p1 outputs its hidden state h.sub.p1 as an input to the subsequent lower hidden unit r.sub.l2 at a next frame. In this example, the second lower hidden unit r.sub.I2 receives a lower unit hidden state h.sub.I1 from the first lower hidden unit r.sub.I1. The hidden state of the first upper hidden unit r.sub.p1 is also provided to the second upper hidden unit r.sub.p2 of the upper layer LSTM.”).

Claims 3 and 6 are rejected under 35 U.S.C. 103 as being unpatentable over Ortiz (Ortiz et al., 2018, “Learning State Representations for Query Optimization with Deep Reinforcement Learning”) in view of Li (US 20170293836 A1).

Regarding claim 3, Ortiz teaches the method of claim 1. 
Ortiz does not specifically teach wherein the input sequence is a sequence of speech.
Li teaches wherein the input sequence is a sequence of speech ([Li, 0036] “Each layer can have multiple neuron-like units or nodes (hereinafter “nodes”), with each of the nodes connected to the other nodes. The input layer 210 includes input nodes, the output layer 240 includes output nodes, while the recurrent layer 220 and the aggregate layer 230 include hidden nodes. As an example to which RNN 200 can be applied, in the case of speech where a person utters a spoken digit, the input sequence is the speech signal corresponding to the spoken digit or a representation thereof, which can be unlabeled, while the output can be a label classifying the spoken digit.”).
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention and having the teachings of Ortiz and Li, to use the speech sequence of Li as the input sequence in the prediction method of Ortiz. The suggestion and/or motivation for doing so is to make predictions on speech, since speech prediction requires speech data as input.

Regarding claim 6, Ortiz teaches the method of claim 1. 
Ortiz does not specifically teach wherein the history recurrent neural network and the update recurrent neural network are gated recurrent unit (GRU) networks.
Li teaches wherein the history recurrent neural network and the update recurrent neural network are gated recurrent unit (GRU) networks ([Li, 0046] “FIG. 6 shows an exemplary method 600 for customer profile learning based on a recurrent neural network that uses partially labelled sequence data, in accordance with an embodiment of the present principles. Method 600 corresponds to the recurrent neural network 200 of FIG. 2. The model stacks multiple recurrent neural networks (RNNs) to form the recurrent layer, which captures the long-range temporal dependency of the data. Each RNN is a feed-forward neural network with self-loops. Popular choices of a RNN include a long-short term memory (LSTM) and a gated recurrent unit (GRU). The outputs of the recurrent layer is aggregated in the aggregate layer to predict a label, which includes an auto-encoder/decoder structure.” Each RNN of the stacked RNNs can be a GRU.).

Claims 13-14 and 28-29 are rejected under 35 U.S.C. 103 as being unpatentable over Ortiz (Ortiz et al., 2018, “Learning State Representations for Query Optimization with Deep Reinforcement Learning”) in view of Kaplanyan (US 20180204314 A1).

Regarding claim 13, Ortiz teaches the method of claim 1. 
Ortiz does not specifically teach wherein a skip connection is utilized between previous and current recurrent layers.
Kaplanyan teaches wherein a skip connection is utilized between previous and current recurrent layers ([Kaplanyan, 0113; Figure 9] “FIG. 9 illustrates an exemplary internal structure 900 of a recurrent RCNN connection, according to one embodiment. As shown, a first plurality of convolutions 902A-C receives a first input 904, and a second plurality of convolutions 902D-F receives a second input 910. A feedback loop 906 provides a hidden, recurrent state 908 from the first plurality of convolutions 902A-C as input to a second plurality of convolutions 902E-F. In this way, information may be retained between inputs of the recurrent RCNN.”).
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention and having the teachings of Ortiz and Kaplanyan, to use the skip connection of Kaplanyan in implementing the prediction method of Ortiz. The suggestion and/or motivation for doing so is to improve the accuracy of the prediction method, as a skip connection between layers enables the network to bypass hidden layers that may otherwise introduce additional error into the result.

Claim 28 is a system claim reciting limitations similar to those of method claim 13. Therefore, it is rejected under the same rationale as claim 13.

Regarding claim 14, Ortiz in view of Kaplanyan teaches the method of claim 13.
Ortiz does not specifically teach wherein the skip connection concatenates output of the previous and current recurrent layers.
Kaplanyan teaches wherein the skip connection concatenates output of the previous and current recurrent layers ([Kaplanyan, 0113; Figure 9] “FIG. 9 illustrates an exemplary internal structure 900 of a recurrent RCNN connection, according to one embodiment. As shown, a first plurality of convolutions 902A-C receives a first input 904, and a second plurality of convolutions 902D-F receives a second input 910. A feedback loop 906 provides a hidden, recurrent state 908 from the first plurality of convolutions 902A-C as input to a second plurality of convolutions 902E-F. In this way, information may be retained between inputs of the recurrent RCNN.”).

Claim 29 is a system claim reciting limitations similar to those of method claim 14. Therefore, it is rejected under the same rationale as claim 14.

Claims 15-16 and 30-31 are rejected under 35 U.S.C. 103 as being unpatentable over Ortiz (Ortiz et al, 2018, “Learning State Representations for Query Optimization with Deep Reinforcement Learning”) in view of Kearney (US 20200364624 A1).

Regarding claim 15, Ortiz teaches the method of claim 1. 
Ortiz does not specifically teach wherein a gated skip connection is utilized across layers.
Kearney teaches wherein a gated skip connection is utilized across layers ([Kearney, 0153; Fig. 9] “In the illustrated embodiment, the machine learning model 910 is a CNN including seven multi-scale stages 912 followed by a fully connected layer 914 that outputs a CAL estimate 916, such as a CAL estimate 916 for each tooth identified in the labels 904b. Each multi-scale stage 912 may contain three 3×3 convolutional layers, paired with batch normalization and leaky rectified linear units (LeakyReLU). The first and last convolutional layers of each stage 912 may be concatenated via dense connections which help reduce redundancy within the network by propagating shallow information to deeper parts of the network. Each multi-scale stage 912 may be downscaled by a factor of two at the end of each multi-scale stage by convolutional downsampling with stride 2. The third and fifth multi-scale stages 912 may be passed through attention gates 918a, 918b before being concatenated with the last multi-scale stage 912. The attention gate 918a applied to the third stage 912 may be gated by a gating signal derived from the fifth stage 912. The attention gate 918b applied to the fifth stage 912 may be gated by a gating signal derived from the seventh stage 912. Not all regions of the image are relevant for estimating CAL, so attention gates 918a, 918b may be used to selectively propagate semantically meaningful information to deeper parts of the network. Adam optimization may be used during training which automatically estimates the lower order moments and helps estimate the step size which desensitizes the training routine to the initial learning rate.” As shown in Figure 9, the gated connection connects different layers.).
Before the effective filing date of the invention, it would have been obvious to a person of ordinary skill in the art, having the teachings of Ortiz and Kearney, to use the gated skip connection of Kearney in implementing the prediction method of Ortiz. The suggestion and/or motivation to do so is to improve the accuracy of the prediction method, as using a gated skip connection enables the network to select which layer outputs to pass.

Claim 30 is a system claim having limitations similar to those of method claim 15. Therefore, it is rejected under the same rationale as claim 15.

Regarding claim 16, Ortiz in view of Kearney teaches wherein the gated skip connection is a multiplicative gate added to control a flow of information across layers ([Kearney, 0153; Fig. 9] “In the illustrated embodiment, the machine learning model 910 is a CNN including seven multi-scale stages 912 followed by a fully connected layer 914 that outputs a CAL estimate 916, such as a CAL estimate 916 for each tooth identified in the labels 904b. Each multi-scale stage 912 may contain three 3×3 convolutional layers, paired with batch normalization and leaky rectified linear units (LeakyReLU). The first and last convolutional layers of each stage 912 may be concatenated via dense connections which help reduce redundancy within the network by propagating shallow information to deeper parts of the network. Each multi-scale stage 912 may be downscaled by a factor of two at the end of each multi-scale stage by convolutional downsampling with stride 2. The third and fifth multi-scale stages 912 may be passed through attention gates 918a, 918b before being concatenated with the last multi-scale stage 912. The attention gate 918a applied to the third stage 912 may be gated by a gating signal derived from the fifth stage 912. The attention gate 918b applied to the fifth stage 912 may be gated by a gating signal derived from the seventh stage 912. Not all regions of the image are relevant for estimating CAL, so attention gates 918a, 918b may be used to selectively propagate semantically meaningful information to deeper parts of the network. Adam optimization may be used during training which automatically estimates the lower order moments and helps estimate the step size which desensitizes the training routine to the initial learning rate.” The selective propagation process corresponds to a multiplication process, as passing only the selected elements can be interpreted as multiplying the non-selected elements by zero.).
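By way of illustration only, the interpretation above, that selective propagation can be read as a multiplication in which non-selected elements are multiplied by zero, can be sketched as follows. This sketch is not taken from Kearney or the application; the function names (hard_select, multiplicative_gate, soft_gate) and the example values are hypothetical, and the sigmoid-based soft gate is an assumed common form of a multiplicative gate rather than the gate of any cited reference.

```python
import math

def hard_select(values, selected):
    """Pass only the selected elements; non-selected elements are dropped to zero."""
    return [v if keep else 0.0 for v, keep in zip(values, selected)]

def multiplicative_gate(values, gate):
    """Multiply each skipped value elementwise by its gate value."""
    return [g * v for v, g in zip(values, gate)]

def soft_gate(values, gate_logits):
    """Assumed soft variant: gate values in (0, 1) obtained from a sigmoid."""
    gate = [1.0 / (1.0 + math.exp(-z)) for z in gate_logits]
    return multiplicative_gate(values, gate)

# Assumed example: output of an earlier layer carried over a skip connection.
skip_values = [2.0, -3.0, 4.0]
selected = [True, False, True]

# Selection written as multiplication by a 0/1 mask gives the same result.
mask = [1.0 if keep else 0.0 for keep in selected]
by_selection = hard_select(skip_values, selected)
by_multiplication = multiplicative_gate(skip_values, mask)
```

With a 0/1 mask the two results coincide, while a soft gate scales each element by a value between zero and one instead of passing or blocking it outright.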

Claim 31 is a system claim having limitations similar to those of method claim 16. Therefore, it is rejected under the same rationale as claim 16.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
The following references relate to attention mechanisms and skip connections:
US 20190294970 A1
US 20170293836 A1
US 20180067558 A1
US 20180144245 A1
US 10210860 B1
US 20200226751 A1
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JUN KWON whose telephone number is (571)272-2072. The examiner can normally be reached 1.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Abdullah Kawsar, can be reached at (571)270-3169. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/JUN KWON/
Patent Examiner, Art Unit 2127

/ABDULLAH AL KAWSAR/
Supervisory Patent Examiner, Art Unit 2127