DETAILED ACTION
This action is in response to claims filed on 06/01/2021 for application 15895182. 
Claims 1 and 3-7 have been amended, and Claim 2 has been cancelled. Currently claims 1 and 3-7 remain pending. 
Priority
Acknowledgment is made of applicant’s claim for foreign priority under 35 U.S.C. 119(a)-(d). The certified copy has been filed in parent Application No. JP2017-026630, filed on 02/16/2017.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1 and 3-7 are rejected under 35 U.S.C. 103 as being unpatentable over Marx et al. (US 2014/0116776 A1) (hereinafter “Marx”) in view of Mendes et al. (US 2021/0165963 A1) (hereinafter “Mendes”) and in view of Xu et al., "Show, attend and tell: Neural image caption generation with visual attention," International Conference on Machine Learning, PMLR, 2015 (hereinafter “Xu”).
Regarding claim 1, Marx teaches a text preparation apparatus comprising:
a storage device; and a processor configured to operate in accordance with a program stored in the storage device, wherein the processor is configured to(Marx, para. 0293, “Embodiments of the invention, or aspects thereof, may be provided in a computer program comprising computer readable instructions for execution on a computer. The computer program is storable on any suitable computer storage medium so as to comprise a computer program product. Such a computer program may provide a graphical user interface which may allow the drilling operator to view and override adjusted drilling parameters as desired.”): 
retrieve a plurality of variables gathered from a plurality of sensors attached to a drill performing a drilling process in an oil well (Marx, para. 0096, “According to an embodiment as shown in FIG. 1a, which depicts the system architecture of a drilling system in the standalone arrangement, there is a drilling system 10 comprising a well being drilled 320. Sensors 101, such as through an electronic drilling recorder, such as the Pason™ EDR or the NOV MID Totco™, or other logging tools, acquire data and the data is transmitted over a rig site network 103 to the DFA server software 106. According to an embodiment, historical database…”),
the plurality of variables comprising more than one of a gamma ray variable, a rate of penetration variable, a methane variable, an ethane variable, or a resistance variable (Marx, para. 0104, “As discussed above, according to an embodiment, the database 202 may include pre-processed and mapped real-time drilling data 203 and logged data 204. The real-time drilling data 203 may include one or more data inputs relating to depth 2031, Rate of Penetration (ROP) 2032, Weight On Bit (WOB) 2033, rotary speed 2034, flow rate 2035, and torque 2036, as shown in Table 1, below, according to an embodiment. The logged data 204 may include data inputs from a bit record 2041, mud log 2042, lithology core database 2043, sonic log 2044, gamma ray log 2045, neutron density log 2046, resistivity log 2047, and micro log 2048.”; see also Marx, para. 0101, “FIG. 3 shows a flow chart of a prediction engine 20 according to an embodiment of the invention. The data inputs 201 to the prediction engine 20 may be the real-time data received from the sensors 101 (e.g. MWD or logged data) as shown in FIG. 1.”);
perform encoding processing to generate feature vectors from … (Marx, para. 0164, “According to an embodiment, the overall processing system for real-time lithology prediction 2057 may be as shown in FIG. 9. As shown, the inputs to the system may include downhole mechanic measurements 801 (e.g., ROP 2032, WOB 2033, torque 2036, rotary speed 2034), mud log 2042 (e.g., mud density), and bit record 2041 (e.g., bit size, PDC cutter size). The downhole mechanics measurements 801 may undergo pre-processing 810 before being fed into the feature extraction unit 820…The pre-processed…”);
and perform decoding processing to generate a geology report indicating rock characteristics of rocks subject to the drilling process in the oil well that are consistent with the feature vectors encoded from the plurality of variables (Marx, paras. 0164-0165, “These four features may be used as inputs to a multilayer neural network 70 trained as discussed above. The output of the multilayer neural network is the predicted lithology type 840. Sample prediction results are now provided using the embodiment discussed above. A three layer 12-20-12 (tansig-logsig-tansig) feed forward neural network was used with the four features for training and testing in Well # 1. The predicted lithology 1002 using the same network is shown in FIG. 10. In the sample prediction results, the lithology prediction 1002 had a rate of successful prediction of 82.33% when compared to the true lithology 1001.” Note: The neural network is interpreted as the decoding processing, and the predicted lithology output by the neural network is interpreted as the geology report indicating rock characteristics of rocks subject to the drilling process in the oil well),
wherein the feature vectors include a first feature vector representing features extracted from the entirety of th… (Marx, para. 0164, “According to an embodiment, the overall processing system for real-time lithology prediction 2057 may be as shown in FIG. 9. As shown, the inputs to the system may include downhole mechanic measurements 801 (e.g., ROP 2032, WOB 2033, torque 2036, rotary speed…”); … to be used in the geology report (Marx, paras. 0164-0165, “These four features may be used as inputs to a multilayer neural network 70 trained as discussed above. The output of the multilayer neural network is the predicted lithology type 840. Sample prediction results are now provided using the embodiment discussed above. A three layer 12-20-12 (tansig-logsig-tansig) feed forward neural network was used with the four features for training and testing in Well # 1. The predicted lithology 1002 using the same network is shown in FIG. 10. In the sample prediction results, the lithology prediction 1002 had a rate of successful prediction of 82.33% when compared to the true lithology 1001.” Note: It is being interpreted that the lithology prediction 1002 of FIG. 10 represents the geology report); and generate the geology report (Marx, paras. 0164-0165, “These four features may be used as inputs to a multilayer neural network 70 trained as discussed above. The output of the multilayer neural network is the predicted lithology type 840. Sample prediction results are now provided using the embodiment discussed above. A three layer 12-20-12 (tansig-logsig-tansig) feed forward neural network was used with the four features for training and testing in Well # 1. The predicted lithology 1002 using the same network is shown in FIG. 10. In the sample prediction results, the lithology prediction 1002 had a rate of successful prediction of 82.33% when compared to the true lithology 1001.” Note: It is being interpreted that the lithology prediction 1002 of FIG. 10 represents the geology report).
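For illustration only, the forward pass of a 12-20-12 (tansig-logsig-tansig) feed-forward network of the kind quoted from Marx can be sketched as follows. This is a minimal sketch and not Marx's implementation: the weights are untrained random values, the four feature values are invented, and reading the predicted lithology type as the index of the largest output is an assumption made here. The names `tansig` and `logsig` follow the quoted terminology (hyperbolic tangent and logistic sigmoid, respectively).

```python
import math
import random


def tansig(x):
    """Hyperbolic tangent sigmoid activation."""
    return math.tanh(x)


def logsig(x):
    """Logistic sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-x))


def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(W x + b), one output per weight row."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def init_layer(n_in, n_out, rng):
    """Untrained placeholder weights (assumption: small uniform random values)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def forward(features, layers):
    """Pass the feature vector through each (parameters, activation) pair in order."""
    out = features
    for (weights, biases), activation in layers:
        out = dense(out, weights, biases, activation)
    return out


rng = random.Random(0)
sizes = [4, 12, 20, 12]                  # four input features, then 12-20-12 units
activations = [tansig, logsig, tansig]   # tansig-logsig-tansig
layers = [
    (init_layer(n_in, n_out, rng), act)
    for n_in, n_out, act in zip(sizes, sizes[1:], activations)
]

features = [0.2, -0.1, 0.4, 0.3]         # hypothetical values for the four features
outputs = forward(features, layers)

# Assumption: the predicted class is the index of the largest output unit.
predicted_class = max(range(len(outputs)), key=outputs.__getitem__)
```

With trained weights in place of the random placeholders, `predicted_class` would play the role of the predicted lithology type 840 in the quoted passage.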
 
However, Mendes teaches: and wherein, in the decoding processing, the processor is configured to: perform first-layer recurrent neural network processing for rock properties and second-layer recurrent neural network processing for words appropriate for each of th… (Mendes, para. 0068, “FIG. 6C illustrates a long short-term memory network 640. This example can begin with an embedding layer 642 on input sentence including word vectors 602B-M. A long short-term memory layer 644 with 100 neurons can be appended, followed by a dropout layer 646, which can be a 0.5 dropout.” Note: Element 644 of FIG. 6C is interpreted as the first-layer recurrent neural network processing for rock properties to be used in the geology report, and element 646 of FIG. 6C is interpreted as the second-layer recurrent neural network processing for words appropriate for each of th…); determine a phrase appropriate for each of the rock properties based on a phrase-pattern relation table and outputs of the second-layer recurrent neural network processing (Mendes, para. 0074-0075, “FIG. 10A illustrates an example classification 1000 of sentences. The classification 1000 can be generated through a neural network [such as the long short-term memory network 640 of FIG. 6C] as previously described. In this example, the sentences can be classified by symptom 1002, action 1004, an event 1006, or a result 1008.…” Note: Classifying sentences in drilling reports as symptoms, actions, events, or results by the long short-term memory network 640 of FIG. 6C is interpreted as determining a phrase appropriate for each of the rock properties based on a phrase-pattern relation table and outputs of the second-layer recurrent neural network processing); including the phrase appropriate for each of the rock properties determined based on the phrase-pattern relation table and the outputs of the second-layer recurrent neural network processing (Mendes, para. 0074-0075, “FIG. 10A illustrates an example classification 1000 of sentences. The classification 1000 can be generated through a neural network [such as the long short-term memory network 640 of FIG. 6C] as previously described. In this example, the sentences can be classified by symptom 1002, action 1004, an event 1006, or a result 1008. For example, sentences in drilling reports can be classified as pertaining to a symptom, an action, an event, or a symptom. Other classifications can also be performed based on the needs or context for the classification 1000. In this example, symptoms, actions, events, and results are provided as nonlimiting examples for the sake of explanation and clarity.” Note: Classifying sentences in drilling reports as symptoms, actions, events, or results by the long short-term memory network 640 of FIG. 6C is interpreted as including the phrase appropriate for each of the rock properties determined based on the phrase-pattern relation table and the outputs of the second-layer recurrent neural network processing).
 The motivation to do so would be to do automatic pattern matching of drill reports with drilling operations of an oil well (Mendes, para(s). 0027-28, “[T]he system can generate, based on the drilling reports, word vectors, where each word vector represents a respective word in the drilling reports. The system can also partition sentences in the drilling reports into respective words and, for each sentence, identify respective word vectors corresponding to the respective words associated with the sentence. Based on the respective word vectors, the system can classify the sentences into respective events, respective symptoms, respective actions, respective results, and so forth. The system can classify the sentences using a neural network into concepts, categories, etc.”). 
Marx also fails to teach: generate a first vector set from a state vector of a previous step in the first-layer recurrent neural network processing and the feature vector sets, each vector of the first vector set being generated based on similarity degrees between individual vectors in one of the feature vector sets and the state vector; generate a second vector based on similarity degrees between individual vectors in the first vector set and the state vector; input the second vector to a given step in the first-layer recurrent neural network processing. However, Xu teaches:
generate a first vector set from a state vector of a previous step in the first-layer recurrent neural network processing and the feature vector sets, each vector of the first vector set being generated based on similarity degrees between individual vectors in one of the feature vector sets and the state vector (Xu, pg. 3, right-column, sec. 3.1.2. Decoder: Long Short-Term Memory Network; see also fig. 1 detailing the word/image alignment model, “The weight $\alpha_i$ of each annotation vector $a_i$ is computed by an attention model $f_{att}$ for which we use a multilayer perceptron conditioned on the previous hidden state $h_{t-1}$… $e_{ti} = f_{att}(a_i, h_{t-1})$….” Note: It is being interpreted that $f_{att}$ represents generate a first vector set from a state vector of a previous step (i.e., $h_{t-1}$) in the first-layer recurrent neural network processing and the feature vector sets (i.e., $a_i$) and $e_{ti}$ represents similarity degrees between individual vectors in one of the feature vector sets and the state vector); generate a second vector based on similarity degrees between individual vectors in the first vector set and the state vector (Xu, pg. 3, right-column, sec. 3.1.2. Decoder: Long Short-Term Memory Network, “To emphasize, we note that the hidden state varies as the output RNN advances in its output sequence: “where” the network looks next depends on the sequence of words that has already been generated. $e_{ti} = f_{att}(a_i, h_{t-1})$, $\alpha_{ti} = \frac{\exp(e_{ti})}{\sum_{k=1}^{L} \exp(e_{tk})}$. Once the weights (which sum to one) are computed, the context vector $\hat{z}_t$ is computed by $\hat{z}_t = \phi(\{a_i\}, \{\alpha_i\})$, where $\phi$ is a function that returns a single vector….” Note: It is being interpreted that the context vector $\hat{z}_t$ represents generate the second vector and $\alpha_i$ represents similarity degrees between individual vectors in the first vector set and the state vector); input the second vector to a given step in the first-layer recurrent neural network processing (Xu, pg. 3, left-column, sec. 3.1.2. Decoder: Long Short-Term Memory Network, “Our implementation of LSTMs, shown in Fig. 2...[where] $i_t = \sigma(W_i E y_{t-1} + U_i h_{t-1} + Z_i \hat{z}_t + b_i)$, $f_t = \sigma(W_f E y_{t-1} + U_f h_{t-1} + Z_f \hat{z}_t + b_f)$, $c_t = f_t c_{t-1} + i_t \tanh(W_c E y_{t-1} + U_c h_{t-1} + Z_c \hat{z}_t + b_c)$, $o_t = \sigma(W_o E y_{t-1} + U_o h_{t-1} + Z_o \hat{z}_t + b_o)$, $h_t = o_t \tanh(c_t)$.” Note: It is being interpreted that the vector $\hat{z}_t$ being a variable of $i_t$, $f_t$, $c_t$, $o_t$ and indirectly of $h_t$ of the LSTM represents the limitation of input the second vector to a given step in the first-layer recurrent neural network processing).
 Accordingly, one of ordinary skill in the art would modify Marx’s apparatus in view of Xu to teach: generate a first vector set from a state vector of a previous step in the first-layer recurrent neural network processing and the feature vector sets, each vector of the first vector set being generated based on similarity degrees between individual vectors in one of the feature vector sets and the state vector; generate a second vector based on similarity degrees between individual vectors in the first vector set and the state vector; input the second vector to a given step in the first-layer recurrent neural network processing. The motivation to do so would be to automatically generate text for an image using backpropagation learning (Xu, pg. 1, Abstract, “Inspired by recent work in machine translation and object detection, we introduce an attention based model that automatically learns to describe the content of images. We describe how we can train this model in a deterministic manner using standard backpropagation techniques….”). 
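For illustration only, the attention computation quoted from Xu (scores from the annotation vectors and the previous hidden state, softmax weights that sum to one, and a single context vector) can be sketched as follows. This is a minimal sketch under stated assumptions and not Xu's implementation: `dot_score` is a hypothetical stand-in for the attention model, which Xu describes as a multilayer perceptron; the context vector is taken as the weighted sum of the annotation vectors, which is the soft (deterministic) choice of the function that returns a single vector; and the numeric values are invented.

```python
import math


def attention_scores(annotations, h_prev, score_fn):
    """Compute one score per annotation vector from it and the previous hidden state."""
    return [score_fn(a, h_prev) for a in annotations]


def softmax(scores):
    """Normalize scores into weights that sum to one (max subtracted for stability)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]


def soft_context(annotations, weights):
    """Context vector as the weighted sum of annotation vectors (soft attention)."""
    dim = len(annotations[0])
    return [
        sum(w * a[d] for w, a in zip(weights, annotations))
        for d in range(dim)
    ]


def dot_score(a, h):
    """Hypothetical scoring function (dot product); an assumption, not Xu's MLP."""
    return sum(x * y for x, y in zip(a, h))


# Invented example: three annotation vectors and a previous hidden state.
annotations = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
h_prev = [0.5, -0.5]

e = attention_scores(annotations, h_prev, dot_score)  # scores per annotation
alpha = softmax(e)                                     # weights summing to one
z_hat = soft_context(annotations, alpha)               # single context vector
```

In this sketch `z_hat` corresponds to the vector that the quoted LSTM equations take as an additional input at each step, alongside the previous word embedding and the previous hidden state.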
Regarding dependent claim 3, Marx in view of Mendes and in view of Xu teaches the text preparation apparatus according to claim 1, wherein the processor (Xu, pg. 6, left-column, “On our largest dataset…our soft attention model took less than 3 days to train on an NVIDIA Titan Black GPU.”) is configured to learn parameters in the encoding processing and the decoding processing from a plurality of training data pairs (Xu, pgs. 5-6, sec. 4.3 Training Procedure, “To create the annotations ai used by our decoder, we used the Oxford VGGnet pretrained on ImageNet without finetuning. In our experiments we use the 14 × 14 × 512 feature map of the fourth convolutional layer before max pooling. This means our decoder operates on the flattened 196 × 512 (i.e. L × D) encoding.”), and wherein each pair of the plurality of training data pairs comprises measured data including a plurality of second variables of a certain depth range of a second oil well and a second geology report of the certain depth range for the second oil well (Marx, para. 0099, “According to an embodiment as shown in FIG. 2a, the expert decision engine 328 may receive the adjusted real-time drilling parameters 329 and as well as analogous well data from a historical database 309… the drilling fusion engine 301 comprises monitoring a well for signs of impending influx in the formation being drilled in real time, and providing predictive capability for well control by combining real-time drilling data and trends in analogous well data into an expert knowledge system. The expert decision engine 328 may offer decision support…”).
It would have been obvious to one of ordinary skill in the art before the effective filing date
of the claimed invention to modify the teachings of Marx with the above teachings of Xu for the same rationale stated at Claim 1.
Regarding dependent claim 4, Marx in view of Mendes and in view of Xu teaches the text preparation apparatus according to claim 1, wherein the processor (Xu, pg. 6, left-column, “On our largest dataset…our soft attention model took less than 3 days to train on an NVIDIA Titan Black GPU.”) is configured to learn parameters in the encoding processing and the decoding processing from a plurality of training data pairs (Xu, pgs. 5-6, sec. 4.3 Training Procedure, “To create the annotations ai used by our decoder, we used the Oxford VGGnet pretrained on ImageNet without finetuning. In our experiments we use the 14 × 14 × 512 feature map of the fourth convolutional layer before max pooling. This means our decoder operates on the flattened 196 × 512 (i.e. L × D) encoding.”), wherein each pair of the plurality of training data pairs comprises measured data of a plurality of second variables corresponding to a plurality of second oil wells and a second geology report of a second oil well (Marx, para. 0099, “According to an embodiment as shown in FIG. 2a, the expert decision engine 328 may receive…analogous well data from a historical database 309…the drilling fusion engine 301 comprises monitoring a well for signs of impending influx in the formation being drilled in real time, and providing predictive capability for well control by combining real-time drilling data and trends in analogous well data into an expert knowledge system. The expert decision engine 328 may offer decision support…”), and wherein the processor (Xu, pg. 6, left-column, “On our largest dataset…our soft attention model took less than 3 days to train on an NVIDIA Titan Black GPU.”) is configured to determine relations between the rock properties and the plurality of second variables of the second oil well (Marx, para. 0099, “According to an embodiment as shown in FIG. 2a, the expert decision engine 328 may receive…analogous well data from a historical database 309….” & see also Marx, para. 0264, “According to an embodiment, the expert decision engine 328 may include six decision trees: mud volume, porosity, permeability, stuck pipe, bit wear and environmental. Further expert decision trees may be added according to the format described below in the discretion of the operator. The expert decision recommendations may appear as prompts to the user during operation of the client or standalone software.”) based on similarity degrees between individual vectors in the first vector set and the state vector in the learning from the plurality of training data pairs (Xu, pg. 3, right-column, sec. 3.1.2. Decoder: Long Short-Term Memory Network; see also fig. 1 detailing the word/image alignment model, “The weight $\alpha_i$ of each annotation vector $a_i$ is computed by an attention model $f_{att}$ for which we use a multilayer perceptron conditioned on the previous hidden state $h_{t-1}$… $e_{ti} = f_{att}(a_i, h_{t-1})$….” Note: It is being interpreted that $e_{ti}$ represents similarity degrees, $a_i$ represents individual vectors in the first vector set, and $h_{t-1}$ represents the state vector in the learning from the plurality of training data pairs).
It would have been obvious to one of ordinary skill in the art before the effective filing date
of the claimed invention to modify the teachings of Marx with the above teachings of Xu for the same rationale stated at Claim 1.
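Note: For illustration only, the attention computation quoted from Xu above may be sketched as follows. This is a minimal sketch under stated assumptions and is not Xu's implementation: the scoring function `f_att` below is a plain dot product standing in for the multilayer perceptron described in Xu, and the names `soft_attention`, `annotations`, and `h_prev` are hypothetical. The normalization of the scores into weights and the weighted sum that forms the context vector follow the soft attention formulation as generally described in Xu.

```python
import math


def softmax(scores):
    # Normalize raw scores into weights that sum to 1.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]


def f_att(a_i, h_prev):
    # Assumption: a dot product replaces the multilayer perceptron of Xu.
    return sum(x * y for x, y in zip(a_i, h_prev))


def soft_attention(annotations, h_prev):
    # e_ti: one score per annotation vector a_i, conditioned on h_{t-1}.
    e = [f_att(a_i, h_prev) for a_i in annotations]
    # alpha_i: weights obtained by normalizing the scores.
    alpha = softmax(e)
    # z_hat: weighted sum of the annotation vectors (the context vector).
    dim = len(annotations[0])
    z_hat = [
        sum(alpha[i] * annotations[i][d] for i in range(len(annotations)))
        for d in range(dim)
    ]
    return e, alpha, z_hat


# Hypothetical toy inputs: three annotation vectors and one state vector.
annotations = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
h_prev = [2.0, 0.0]
e, alpha, z_hat = soft_attention(annotations, h_prev)
```

In this toy input, the first and third annotation vectors score equally against the state vector and therefore receive equal weights, while the second receives the smallest weight.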
Regarding dependent claim 5, Marx in view of Mendes and in view of Xu teaches the text preparation apparatus according to claim 1, wherein the processor (Xu, pg. 6, left-column, “On our largest dataset…our soft attention model took less than 3 days to train on an NVIDIA Titan Black GPU.”) is configured to learn parameters in the encoding processing and the decoding processing from a plurality of training data pairs (Xu, pgs. 5-6, sec. 4.3 Training Procedure, “To create the annotations ai used by our decoder, we used the Oxford VGGnet pretrained on ImageNet without finetuning. In our experiments we use the 14 × 14 × 512 feature map of the fourth convolutional layer before max pooling. This means our decoder operates on the flattened 196 × 512 (i.e. L × D) encoding.”), wherein each pair of the plurality of training data pairs comprises measured data on a plurality of second variables and a second geology report of a second oil well (Marx, para. 0099, “According to an embodiment as shown in FIG. 2a, the expert decision engine 328 may receive…analogous well data from a historical database 309…the drilling fusion engine 301 comprises monitoring a well for signs of impending influx in the formation being drilled in real time, and providing predictive capability for well control by combining real-time drilling data and trends in analogous well data into an expert knowledge system. The expert decision engine 328 may offer decision support…”), and wherein the processor (Xu, pg. 6, left-column, “On our largest dataset…our soft attention model took less than 3 days to train on an NVIDIA Titan Black GPU.”) is configured to determine a feature pattern relevant to a phrase consistent with the state vector in the measured data on a variable represented by a feature vector set (Xu, pg. 4, left-column, “In this work, we use a deep output layer to compute the output word probability. Its input are cues from the image [(i.e. $\hat{z}_t$)], the previously generated word [(i.e. $y_{t-1}$)], and the decoder state ($h_t$). $p(y_t \mid a, y_1^{t-1}) \propto \exp(L_o(E y_{t-1} + L_h h_t + L_z \hat{z}_t))$.”), based on similarity degrees between individual vectors in the feature vector set and the state vector in the learning from the plurality of training data pairs (Xu, pg. 3, right-column, sec. 3.1.2. Decoder: Long Short-Term Memory Network; see also fig. 1 detailing the word/image alignment model, “The weight $\alpha_i$ of each annotation vector $a_i$ is computed by an attention model $f_{att}$ for which we use a multilayer perceptron conditioned on the previous hidden state $h_{t-1}$… $e_{ti} = f_{att}(a_i, h_{t-1})$….” Note: It is being interpreted that $e_{ti}$ represents similarity degrees, $a_i$ represents individual vectors in the first vector set, and $h_{t-1}$ represents the state vector in the learning from the plurality of training data pairs).
It would have been obvious to one of ordinary skill in the art before the effective filing date
of the claimed invention to modify the teachings of Marx with the above teachings of Xu for the same rationale stated at Claim 1.
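Note: For illustration only, the deep output layer quoted from Xu above, in which the word probability is proportional to exp(Lo(Eyt−1 + Lhht + Lz ẑt)), may be sketched as follows. This is a minimal sketch under stated assumptions: the matrices, vector sizes, and the function name `word_distribution` are hypothetical toy values chosen only to show how the three inputs are combined and normalized, and are not taken from Xu.

```python
import math


def matvec(matrix, vector):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def vec_add(*vectors):
    # Element-wise sum of equally sized vectors.
    return [sum(components) for components in zip(*vectors)]


def word_distribution(L_o, E_y_prev, L_h, h_t, L_z, z_hat_t):
    # Combine the previous-word embedding, decoder state, and context vector.
    combined = vec_add(E_y_prev, matvec(L_h, h_t), matvec(L_z, z_hat_t))
    # Project to one logit per vocabulary word, then normalize with a softmax,
    # which realizes "proportional to exp(...)".
    logits = matvec(L_o, combined)
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [x / total for x in exps]


# Hypothetical toy sizes: 2-dimensional vectors and a 3-word vocabulary.
identity = [[1.0, 0.0], [0.0, 1.0]]
L_o = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
probs = word_distribution(
    L_o,
    E_y_prev=[0.5, -0.5],
    L_h=identity,
    h_t=[1.0, 0.0],
    L_z=identity,
    z_hat_t=[0.0, 1.0],
)
best_word = max(range(len(probs)), key=lambda k: probs[k])
```

The context vector enters the word probability alongside the decoder state, so a change in the attention weights changes which word is most probable.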
Referring to independent claim 6, it is rejected on the same basis as independent claim 1 since they are analogous claims. 
Regarding claim 7, Marx teaches a text preparation apparatus comprising: an encoder configured to: retrieve a plurality of variables gathered from a plurality of sensors attached to a drill performing a drilling process in an oil well (Marx, para. 0096, “According to an embodiment as shown in FIG. 1a, which depicts the system architecture of a drilling system in the standalone arrangement, there is a drilling system 10 comprising a well being drilled 320. Sensors 101, such as through an electronic drilling recorder, such as the Pason™ EDR or the NOV MID…”), the plurality of variables comprising more than one of a gamma ray variable, a rate of penetration variable, a methane variable, an ethane variable, or a resistance variable (Marx, para. 0104, “As discussed above, according to an embodiment, the database 202 may include pre-processed and mapped real-time drilling data 203 and logged data 204. The real-time drilling data 203 may include one or more data inputs relating to depth 2031, Rate of Penetration (ROP) 2032, Weight On Bit (WOB) 2033, rotary speed 2034, flow rate 2035, and torque 2036, as shown in Table 1, below, according to an embodiment. The logged data 204 may include data inputs from a bit record 2041, mud log 2042, lithology core database 2043, sonic log 2044, gamma ray log 2045, neutron density log 2046, resistivity log 2047, and micro log 2048.” & see also Marx, para. 0101, “FIG. 3 shows a flow chart of a prediction engine 20 according to an embodiment of the invention. The data inputs 201 to the prediction engine 20 may be the real-time data received from the sensors 101 (e.g. MWD or logged data) as shown in FIG. 1.”);
and generate feature vectors from the plurality of variables gathered from the plurality of sensors attached to the drill performing the drilling process in the oil well (Marx, para. 0164, “According to an embodiment, the overall processing system for real-time lithology prediction 2057 may be as shown in FIG. 9. As shown, the inputs to the system may include downhole mechanic measurements 801 (e.g., ROP 2032, WOB 2033, torque 2036, rotary speed 2034), mud log 2042 (e.g., mud density), and bit record 2041 (e.g., bit size, PDC cutter size). The downhole mechanics measurements 801 may undergo pre-processing 810 before being fed into…”);
and a decoder configured to generate a geology report indicating rock characteristics of rocks subject to the drilling process in the oil well that are consistent with the feature vectors encoded from the plurality of variables (Marx, para(s). 0164-0165, “These four features may be used as inputs to a multilayer neural network 70 trained as discussed above. The output of the multilayer neural network is the predicted lithology type 840. Sample prediction results are now provided using the embodiment discussed above. A three layer 12-20-12 (tansig-logsig-tansig) feed forward neural network was used with the four features for training and testing in Well # 1. The predicted lithology 1002 using the same network is shown in FIG. 10. In the sample prediction results, the lithology prediction 1002 had a rate of successful prediction of 82.33% when compared to the true lithology 1001.” Note: It is being interpreted that the neural network is being mapped to the decoding processing, and the predicted lithology from the neural network is being mapped to the geology report indicating rock characteristics of rocks subject to the drilling process in the oil well),
wherein the feature vectors include a first feature vector representing features extracted from the entirety of the plurality of variables and feature vector sets on individual variables of the plurality of variables, wherein each feature vector in a feature vector set represents a feature of a part of a corresponding variable of the plurality of variables (Marx, para. 0164, “According to an embodiment, the overall processing system for real-time lithology prediction 2057 may be as shown in FIG. 9. As shown, the inputs to the system may include downhole mechanic measurements 801 (e.g., ROP 2032, WOB 2033, torque 2036, rotary speed…”); to be used in the geology report (Marx, para(s). 0164-0165, “These four features may be used as inputs to a multilayer neural network 70 trained as discussed above. The output of the multilayer neural network is the predicted lithology type 840. Sample prediction results are now provided using the embodiment discussed above. A three layer 12-20-12 (tansig-logsig-tansig) feed forward neural network was used with the four features for training and testing in Well # 1. The predicted lithology 1002 using the same network is shown in FIG. 10. In the sample prediction results, the lithology prediction 1002 had a rate of successful prediction of 82.33% when compared to the true lithology 1001.” Note: It is being interpreted that the lithology prediction 1002 of fig. 10 represents the geology report); and generate the geology report (Marx, para(s). 0164-0165, “These four features may be used as inputs to a multilayer neural network 70 trained as discussed above. The output of the multilayer neural network is the predicted lithology type 840. Sample prediction results are now provided using the embodiment discussed above. A three layer 12-20-12 (tansig-logsig-tansig) feed forward neural network was used with the four features for training and testing in Well # 1. The predicted lithology 1002 using the same network is shown in FIG. 10. In the sample prediction results, the lithology prediction 1002 had a rate of successful prediction of 82.33% when compared to the true lithology 1001.” Note: It is being interpreted that the lithology prediction 1002 of fig. 10 represents the geology report).
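Note: For illustration only, the three layer 12-20-12 (tansig-logsig-tansig) feed forward network quoted from Marx above may be sketched as follows. This is a minimal sketch under stated assumptions and is not Marx's implementation: it assumes that four input features feed layers of 12, 20, and 12 neurons, that "tansig" and "logsig" denote the hyperbolic tangent and logistic sigmoid, and that the lithology type is read as the index of the largest output. The weights are random and untrained, and all function names and feature values are hypothetical.

```python
import math
import random


def tansig(x):
    # "tansig" is taken here to be the hyperbolic tangent.
    return math.tanh(x)


def logsig(x):
    # "logsig" is taken here to be the logistic sigmoid.
    return 1.0 / (1.0 + math.exp(-x))


def build_network(seed=0):
    # Assumption: 4 input features, then layers of 12, 20, and 12 neurons.
    rng = random.Random(seed)
    sizes = [4, 12, 20, 12]
    activations = [tansig, logsig, tansig]
    layers = []
    for n_in, n_out, act in zip(sizes[:-1], sizes[1:], activations):
        weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases, act))
    return layers


def forward(layers, features):
    # Propagate the feature vector through each layer in turn.
    x = list(features)
    for weights, biases, act in layers:
        x = [act(sum(w * v for w, v in zip(row, x)) + b)
             for row, b in zip(weights, biases)]
    return x


def predict_lithology_index(layers, features):
    # Assumption: the predicted type is the index of the largest output.
    outputs = forward(layers, features)
    return max(range(len(outputs)), key=lambda k: outputs[k])


network = build_network(seed=0)
features = [0.2, -0.1, 0.4, 0.05]  # hypothetical normalized feature values
outputs = forward(network, features)
predicted = predict_lithology_index(network, features)
```

Because the weights are untrained, the predicted index carries no geological meaning; the sketch only shows the shape of the computation that maps four features to a single lithology label.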
 
However, Mendes teaches: wherein the decoder includes a first-layer recurrent neural network for rock properties and second-layer recurrent neural network processing for words appropriate for each of th[e rock properties] (Mendes, para. 0068, “FIG. 6C illustrates a long short-term memory network 640. This example can begin with an embedding layer 642 on input sentence including word vectors 602B-M. A long short-term memory layer 644 with 100 neurons can be appended, followed by a dropout layer 646, which can be a 0.5 dropout.” Note: It is being interpreted that element 644 of fig. 6C represents the first-layer recurrent neural network processing for rock properties to be used in the geology report and element 646 of fig. 6C represents the second-layer recurrent neural network processing for words appropriate for each of th[e rock properties]); and wherein the decoder is configured to: determine a phrase appropriate for each of the rock properties based on a phrase-pattern relation table and outputs of the second-layer recurrent neural network (Mendes, para. 0074-0075, “FIG. 10A illustrates an example classification 1000 of sentences. The classification 1000 can be generated through a neural network [such as the long short-term memory network 640 of FIG. 6C] as previously described. In this example, the sentences can be classified by symptom 1002, action 1004, an event 1006, or a result 1008. For…” Note: It is being interpreted that classifying sentences in drilling reports to symptoms, actions, events, or results by the long short-term memory network 640 of FIG. 6C represents mapping the limitation to determine a phrase appropriate for each of the rock properties based on a phrase-pattern relation table and outputs of the second-layer recurrent neural network processing); including the phrase appropriate for each of the rock properties determined based on the phrase-pattern relation table and the outputs of the second-layer recurrent neural network processing (Mendes, para. 0074-0075, “FIG. 10A illustrates an example classification 1000 of sentences. The classification 1000 can be generated through a neural network [such as the long short-term memory network 640 of FIG. 6C] as previously described. In this example, the sentences can be classified by symptom 1002, action 1004, an event 1006, or a result 1008. For example, sentences in drilling reports can be classified as pertaining to a symptom, an action, an event, or a symptom. Other classifications can also be performed based on the needs or context for the classification 1000. In this example, symptoms, actions, events, and results are provided as nonlimiting examples for the sake of explanation and clarity.” Note: It is being interpreted that classifying sentences in drilling reports to symptoms, actions, events, or results by the long short-term memory network 640 of FIG. 6C represents mapping the limitation of including the phrase appropriate for each of the rock properties determined based on the phrase-pattern relation table and the outputs of the second-layer recurrent neural network processing).
The motivation to do so would be to perform automatic pattern matching of drilling reports with drilling operations of an oil well (Mendes, para(s). 0027-0028, “[T]he system can generate, based on the drilling reports, word vectors, where each word vector represents a respective word in the drilling reports. The system can also partition sentences in the drilling reports into respective words and, for each sentence, identify respective word vectors corresponding to the respective words associated with the sentence. Based on the respective word vectors, the system can classify the sentences into respective events, respective symptoms, respective actions, respective results, and so forth. The system can classify the sentences using a neural network into concepts, categories, etc.”).
Marx also fails to teach: generate a first vector set from a state vector of a previous step in the first-layer recurrent neural network processing and the feature vector sets, each vector of the first vector set being generated based on similarity degrees between individual vectors in one of the feature vector sets and the state vector; generate a second vector based on similarity degrees between individual vectors in the first vector set and the state vector; input the second vector to a given step in the first-layer recurrent neural network processing.
However, Xu teaches: generate a first vector set from a state vector of a previous step in the first-layer recurrent neural network processing and the feature vector sets, each vector of the first vector set being generated based on similarity degrees between individual vectors in one of the feature vector sets and the state vector (Xu, pg. 3, right-column, sec. 3.1.2. Decoder: Long Short-Term Memory Network; see also fig. 1 detailing the word/image alignment model, “The weight $\alpha_i$ of each annotation vector $a_i$ is computed by an attention model $f_{att}$ for which we use a multilayer perceptron conditioned on the previous hidden state $h_{t-1}$…
                            
                                
                                    e
                                
                                
                                    t
                                    i
                                
                            
                            =
                            
                                
                                    f
                                
                                
                                    a
                                    t
                                    t
                                
                            
                            
                                
                                    
                                        
                                            a
                                        
                                        
                                            i
                                        
                                    
                                    ,
                                     
                                    
                                        
                                            h
                                        
                                        
                                            t
                                            -
                                            1
                                        
                                    
                                
                            
                        
                    ….” Note: It is being interpreted that                         
                            
                                
                                    f
                                
                                
                                    a
                                    t
                                    t
                                
                            
                        
                    represents generate a first vector set from a state vector of a previous step (i.e.,                        
                             
                            
                                
                                    h
                                
                                
                                    t
                                    -
                                    1
                                
                            
                        
                    ) in the first-layer recurrent neural network processing and the feature vector sets (i.e.                         
                            
                                
                                    a
                                
                                
                                    i
                                
                            
                        
                    ) and                         
                            
                                
                                    e
                                
                                
                                    t
                                    i
                                
                            
                        
                     represents similarity degrees between individual vectors in one of the feature vector sets and the state vector), generate a second vector based on similarity degrees between individual vectors in the first vector set and the state vector(Xu, pg. 3, right-column,  sec. 3.1.2. Decoder: Long Short-Term Memory Network, “To emphasize, we note that the hidden state varies as the output RNN advances in its output sequence: “where” the network looks next depends on the sequence of words that has already been generated.                        
                             
                            
                                
                                    e
                                
                                
                                    t
                                    i
                                
                            
                            =
                            
                                
                                    f
                                
                                
                                    a
                                    t
                                    t
                                
                            
                            
                                
                                    
                                        
                                            a
                                        
                                        
                                            i
                                        
                                    
                                    ,
                                     
                                    
                                        
                                            h
                                        
                                        
                                            t
                                            -
                                            1
                                        
                                    
                                
                            
                            ,
                             
                             
                            
                                
                                    α
                                
                                
                                    t
                                    i
                                
                            
                            =
                            
                                
                                    
                                        
                                            exp
                                        
                                        ⁡
                                        
                                            
                                                
                                                    
                                                        
                                                            e
                                                        
                                                        
                                                            t
                                                            i
                                                        
                                                    
                                                
                                            
                                        
                                    
                                
                                
                                    
                                        
                                            ∑
                                            
                                                k
                                                =
                                                1
                                            
                                            
                                                L
                                            
                                        
                                        
                                            e
                                            x
                                            p
                                            ⁡
                                            (
                                            
                                                
                                                    e
                                                
                                                
                                                    t
                                                    k
                                                
                                            
                                            )
                                        
                                    
                                
                            
                        
                     .  Once the weights (which sum to one) are computed, the context vector                         
                            
                                
                                    
                                        
                                            z
                                        
                                        ^
                                    
                                
                                
                                    t
                                
                            
                        
                     is computed by                         
                            
                                
                                    
                                        
                                            z
                                        
                                        ^
                                    
                                
                                
                                    t
                                
                            
                            =
                            ϕ
                            
                                
                                    
                                        
                                            
                                                
                                                    a
                                                
                                                
                                                    i
                                                
                                            
                                        
                                    
                                    ,
                                    
                                        
                                            
                                                
                                                    α
                                                
                                                
                                                    i
                                                
                                            
                                        
                                    
                                
                            
                            ,
                             
                        
                    where                         
                            ϕ
                        
                     is a function that returns a single vector….” Note: It is being interpreted that the context vector                         
                            
                                
                                    
                                        
                                            z
                                        
                                        ^
                                    
                                
                                
                                    t
                                
                            
                        
                     represents generate the second vector and                         
                            
                                
                                    α
                                
                                
                                    i
                                
                            
                        
                     represents similarity degrees between individual vectors in the first vector set and the state vector);input the second vector to a given step in the first-layer recurrent neural network processing(Xu, pg. 3, left-column,  sec. 3.1.2. Decoder: Long Short-Term Memory Network, “Our implementation of LSTMs, shown in Fig. 2...[where] it = σ(WiEyt−1 + Uiht−1 + Zi                        
                            
                                
                                    
                                        
                                            z
                                        
                                        ^
                                    
                                
                                
                                    t
                                
                            
                        
                    + bi),‌    ft = σ(Wf Eyt−1 + Uf ht−1 + Zf                         
                            
                                
                                    
                                        
                                            z
                                        
                                        ^
                                    
                                
                                
                                    t
                                
                            
                        
                     + bf ),    ct = ftct−1 + it tanh(WcEyt−1 + Ucht−1 + Zc                        
                            
                                
                                    
                                        
                                            z
                                        
                                        ^
                                    
                                
                                
                                    t
                                
                            
                        
                    + bc), ot = σ(WoEyt−1 + Uoht−1 + Zo                        
                            
                                
                                    
                                        
                                            z
                                        
                                        ^
                                    
                                
                                
                                    t
                                
                            
                        
                    + bo), ht = ot tanh(ct).” Note: It is being interpreted that the vector                         
                            
                                
                                    
                                        
                                            z
                                        
                                        ^
                                    
                                
                                
                                    t
                                
                            
                        
                     being a variable of it, ft ,ct, ot and indirectly of ht  of the LSTM represents the limitation of input the second vector to a given step in the first-layer recurrent neural network processing).
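Note: For illustration only, the computation described in the passages of Xu quoted above can be sketched as follows. This is a minimal sketch under stated assumptions, not Xu's implementation: the one-hidden-layer form of f_att and its parameter names (w_a, w_h, w_out) are hypothetical, ϕ is assumed to be the weighted sum of the annotation vectors (the deterministic "soft" attention variant), only the input gate of the LSTM step is shown, and all numeric values are made up.

```python
import math

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def matvec(m, v):
    return [dot(row, v) for row in m]

def f_att(a_i, h_prev, w_a, w_h, w_out):
    # Hypothetical perceptron scoring annotation a_i against the previous state h_{t-1}.
    hidden = [math.tanh(p + q)
              for p, q in zip(matvec(w_a, a_i), matvec(w_h, h_prev))]
    return dot(w_out, hidden)

def softmax(scores):
    # alpha_ti = exp(e_ti) / sum_k exp(e_tk); the max is subtracted for stability.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]

def soft_attention(annotations, h_prev, w_a, w_h, w_out):
    # e_ti = f_att(a_i, h_{t-1}) for every annotation vector a_i.
    e_t = [f_att(a_i, h_prev, w_a, w_h, w_out) for a_i in annotations]
    alpha_t = softmax(e_t)
    # Assumption: phi is the alpha-weighted sum of the annotation vectors.
    z_hat = [sum(w * a_i[d] for w, a_i in zip(alpha_t, annotations))
             for d in range(len(annotations[0]))]
    return alpha_t, z_hat

def input_gate(ey_prev, h_prev, z_hat, w_i, u_i, z_i, b_i):
    # i_t = sigma(W_i E y_{t-1} + U_i h_{t-1} + Z_i z_hat_t + b_i)
    pre = [a + b + c + d
           for a, b, c, d in zip(matvec(w_i, ey_prev), matvec(u_i, h_prev),
                                 matvec(z_i, z_hat), b_i)]
    return [1.0 / (1.0 + math.exp(-x)) for x in pre]

# Made-up example: three 2-dimensional annotation vectors and a 2-dimensional state.
identity = [[1.0, 0.0], [0.0, 1.0]]
annotations = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
h_prev = [0.5, -0.5]
alpha_t, z_hat = soft_attention(annotations, h_prev, identity, identity, [1.0, 1.0])
i_t = input_gate([0.0, 0.0], h_prev, z_hat, identity, identity, identity, [0.0, 0.0])
```

In this sketch the weights alpha_t depend on h_prev, and the context vector z_hat enters the gate computation of the next recurrent step, which mirrors the mapping relied upon in the notes above.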
Accordingly, one of ordinary skill in the art would modify Marx’s apparatus in view of Xu to teach: generate a first vector set from a state vector of a previous step in the first-layer recurrent neural network processing and the feature vector sets, each vector of the first vector set being generated based on similarity degrees between individual vectors in one of the feature vector sets and the state vector; generate a second vector based on similarity degrees between individual vectors in the first vector set and the state vector; input the second vector to a given step in the first-layer recurrent neural network processing. The motivation to do so would be to automatically generate text for an image using backpropagation learning (Xu, pg. 1, Abstract, “Inspired by recent work in machine translation and object detection, we introduce an attention based model that automatically learns to describe the content of images. We describe how we can train this model in a deterministic manner using standard backpropagation techniques….”).  

Response to Arguments
Applicant’s arguments with respect to independent claim 1 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Adam Clark Standke whose telephone number is (571)270-1806.  The examiner can normally be reached on 9:30AM-6:30PM M-F. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki can be reached on (571) 272-3719.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications 

Adam Clark Standke
Assistant Examiner
Art Unit 2122



/LUIS A SITIRICHE/Primary Examiner, Art Unit 2126