DETAILED ACTION
Notice of Pre-AIA or AIA Status
1.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination
2.	A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 15 November 2021 has been entered, where:
Claims 1, 7, 8, and 27 have been amended.
Claims 6, 12-14, 16, 21, and 29 have been cancelled.
Claims 1-5, 7-11, 15, 17-20, 22-28, 30 and 31 are pending.
Claims 1-5, 7-11, 15, 17-20, 22-28, 30 and 31 are rejected.
Claim Rejections - 35 U.S.C. § 103
3.	The following is a quotation of 35 U.S.C. § 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
4.	The factual inquiries for determining obviousness under 35 U.S.C. § 103 are summarized as follows:
1.	Determining the scope and contents of the prior art.
2. 	Ascertaining the differences between the prior art and the claims at issue.
3. 	Resolving the level of ordinary skill in the pertinent art.
4. 	Considering objective evidence present in the application indicating obviousness or nonobviousness.
5.	This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
6.	Claims 1, 18-20, 23, 27, 28, and 31 are rejected under 35 U.S.C. § 103 as being unpatentable over US Published Application 20180373982 to Salakhutdinov et al. [hereinafter Salakhutdinov] in view of Seo et al., “Bi-Directional Attention Flow for Machine Comprehension,” 24 Feb 2017 [hereinafter Seo], and Choi et al., “Multi-Focus Attention Network for Efficient Deep Reinforcement Learning,” AAAI (2017) [hereinafter Choi] and further in view of Kataoka et al., “Anticipating Traffic Accidents with Adaptive Loss and Large-Scale Incident DB,” 08 April 2018 [hereinafter Kataoka]. 
Regarding claims 1 and 31, Salakhutdinov respectively teaches [a] computer-implemented neural network system for reinforcement learning, wherein the neural network system is used to control an agent interacting with an environment to perform a task in an attempt to achieve a specified result (Salakhutdinov ¶ 0006 teaches [n]eural networks that utilize external memories can be distinguished along two main axes: memories with write operators and those without. Writeless external memory systems, often referred to as “Memory Networks,” typically fix which memories are stored (that is, a computer-implemented neural network system)), and [o]ne or more non-transitory computer-readable storage media storing instructions that when executed by one or more computers cause the one or more computers (Salakhutdinov ¶¶ 0052-53 teaches a machine-readable medium for storing code), . . . comprising:
an input network configured, at each of a plurality of time steps, to receive state data (Salakhutdinov ¶ 0020 teaches $s_t$ can represent the current state embedding, $M_t$ can represent the current neural map, and (x, y) can represent the current position of the agent (that is, an input network configured to receive state data); Salakhutdinov ¶ 0036 teaches the neural map instead utilizes the relative position of the agent as input. This is expressed mathematically by assuming that the agent moves through the environment between time steps with a velocity (u, v) (that is, agent states are at each of a plurality of time steps)) comprising an image in pixel form that characterizes the environment . . . and extract, based on processing the state data using at least one convolutional layer (Salakhutdinov ¶ 0001, Appendix A at p. 7, first partial paragraph, teaches [t]he Neural Map architectures have an internal map size of 15 x 15 with a feature channel size of 32. To get $r_t$, the global read operation passes the neural map first through a 3-layer convolutional network, with each convolution having filter size 3 x 3 and 8 channels, followed by a 256 unit linear layer and then a final 32 unit linear layer (that is, an input network configured to . . . extract based on processing the state data using at least one convolutional layer)), respective convolutional features for each of a plurality of spatially distinct cells in the image (Salakhutdinov, Fig. 5, teaches a maze to be traversed by a DRL agent utilizing a neural map (Examiner’s annotations in text blocks):

[Figure: Salakhutdinov FIG. 5 with Examiner’s annotations (media_image1.png, greyscale)]

(Salakhutdinov ¶ 0044 teaches a maze 400 to be traversed by a [Deep Reinforcement Learning (DRL)] agent 401 utilizing a neural map memory architecture . . . . Reference should also be made to FIG. 6, which depicts the section 405 of the maze 400 of FIG. 5 that is observed by the DRL agent 401 at the agent's 401 current location (that is, to receive state data); Salakhutdinov ¶ 0046 teaches the agent's 401 state observations are a 5x15x3 subsample of the complete maze, as depicted in FIG. 6 (Examiner Annotations in text box):

[Figure: Salakhutdinov FIG. 6 with Examiner annotations (media_image2.png, greyscale)]

so that the agent is able to see fifteen pixels forward and three pixels on the side (i.e., including the center pixel and one pixel on each side of the agent (that is, respective convolutional features for each of a plurality of spatially distinct cells in the image)) . . . ;
a relational network configured to generate, for each cell in the image . . . , respective final features for the cell by updating the respective convolutional features for the cell using the respective convolutional features for the other cells in the image (Salakhutdinov ¶ 0048 teaches the DRL agent can be trained to traverse the three - dimensional environment by receiving color images (that is, cell) of the environment as the agent moves therethrough, using a deep network as a state embedding (that is, “state” is a respective convolutional feature). The state observations can be, for example, some number of previous color images (e.g., 100x75 RGB) (that is, with “some number of previous color images,” is using the respective convolutional features for the other cells in the image). The network can be, for example, pre-initialized with the weights from a network trained to traverse the environment (that is, to “train” is to update the respective features for the cell)) . . . , the relational network comprising at least one attention block (Salakhutdinov ¶ 0024 teaches the localized read operation can include utilizing a Spatial Transformer Network to attentively subsample (that is, at least one attention block) the neural map at particular locations scales (that is, the neural map is a relational network configured to generate, for each cell, respective final features for the cell, the relational network comprising at least one attention block)) comprising: 
(i) at least one query network configured to generate as output a query vector (Salakhutdinov ¶ 0025 teaches [t]he local read operation takes the current state embedding $s_t$ and the current global read vector $r_t$ as input and produces a query vector $q_t$ (that is, at least one query network configured to generate as output a query vector)) . . . , (ii) at least one key network configured to generate a key vector for each of the plurality of cells in the image . . . , and (iii) at least one value network configured to generate a value vector for each of the plurality of cells in the image . . . (Salakhutdinov ¶ 0001, Appendix A, at p. 5, “3.5.2 Key-Value Context Read Operation,” first paragraph, teaches stronger bias on the context addressing operation by splitting each feature of the neural map into two parts $M_t^{(x,y)} = [k_t^{(x,y)}, v_t^{(x,y)}]$, where $k_t^{(x,y)}$ is the . . . “key” feature (that is, key vector) and $v_t^{(x,y)}$ is the . . . value feature (that is, value vector) . . . . The key features are matched against the query vector (which is now a (C/2)-dimensional vector) to get the probability distribution (that is, (ii) at least one key network configured to generate a key vector for each of the plurality of cells in the image . . . , and (iii) at least one value network configured to generate a value vector for each of the plurality of cells in the image)), each attention block further comprising a respective transform network for each of the plurality of cells in the image (Salakhutdinov ¶ 0024 teaches the localized read operation can include utilizing a Spatial Transformer Network to attentively subsample the neural map at particular locations scales (that is, a respective transform network)) . . . , each transform network being arranged to:
* * *
and an output network arranged to receive the respective final features (Salakhutdinov ¶ 0031 teaches process 100, at step 104, executes a map update operation that updates the neural map with the new data as per the local write operation at step 103 (that is, an output network arranged to receive the respective final features)) . . . and use the respective final features to select an action to be performed by the agent in response to receiving the state data at the time step (Salakhutdinov ¶ 0022 teaches [i]n the algorithms above, $w_t^{(x_t, y_t)}$ represents the feature at position $(x_t, y_t)$ at time $t$, $[X_1, \ldots, X_k]$ represents a concatenation operation, and $o_t$ is the output of the neural map at time $t$, which is then processed by another deep network $f$ to get the policy outputs $\pi_t(a \mid s)$ (that is, use the respective final features to select an action to be performed by the agent in response to receiving the state data)).
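For illustration only, and not as a characterization of any reference's actual implementation, the key-value context read quoted above from Salakhutdinov (each map feature split into a key half and a value half, with the keys matched against a (C/2)-dimensional query to obtain a probability distribution) may be sketched as follows. The function names `softmax` and `key_value_read`, the toy map, the use of a dot product as the matching rule, and the weighted sum of values as the read output are all assumptions made for this sketch.

```python
import math


def softmax(scores):
    """Normalize raw scores into a probability distribution."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def key_value_read(neural_map, query):
    """Hypothetical key-value read over a list of C-dimensional map features.

    Each feature is split into a key half and a value half; the keys are
    matched against the (C/2)-dimensional query.
    """
    half = len(query)
    keys = [feature[:half] for feature in neural_map]
    values = [feature[half:] for feature in neural_map]
    # Assumed matching rule: dot product between the query and each key.
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    # Assumed read output: probability-weighted sum of the value halves.
    read = [
        sum(w * value[d] for w, value in zip(weights, values))
        for d in range(len(values[0]))
    ]
    return weights, read


# Toy map with three positions and C = 4 (two key dims, two value dims).
toy_map = [
    [1.0, 0.0, 10.0, 0.0],
    [0.0, 1.0, 0.0, 10.0],
    [1.0, 1.0, 5.0, 5.0],
]
weights, read = key_value_read(toy_map, query=[2.0, 0.0])
```

In this toy example the first and third positions have keys that match the query equally well, so they receive equal weight, and the read vector leans toward their value halves.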
	Though Salakhutdinov teaches the features of a memory architecture to store features located at a particular position in the environment in a memory location specific to that location, so that as the agent traverses the environment, the agent compares the features at the agent's particular position to a summary of the features stored throughout the memory architecture and writes the features that correspond to the summary to the coordinates in the memory architecture that correspond to the agent’s position, Salakhutdinov does not explicitly teach that the Spatial Transformer Network includes the feature of “vector [generation] for each of the plurality of cells based on applying a . . . linear transformation to the convolutional features for the cell”. 
But Seo teaches the feature of a “linear transformation to the convolutional features for the cell” (Seo, at p. 6, “4. Question Answering Experiments - Model Details,” first paragraph, teaches the linear transformation before the softmax (that is, before “attention”) for the answers).
Salakhutdinov and Seo are from the same or similar field of endeavor. Salakhutdinov teaches an agent comparing the features at the agent's particular position to a summary of the features stored throughout the memory architecture and writes the features that correspond to the summary to the coordinates in the memory architecture that correspond to the agent's position as the agent traverses the environment. Seo teaches the feature of a linear transformation in machine comprehension responsive to a query and a context based on a bidirectional attention flow mechanism to obtain a query-aware context representation without early summarization. Thus, it would have been obvious to a person having ordinary skill in the art as of the effective filing date of the Applicant’s invention to modify Salakhutdinov pertaining to deep reinforcement learning with the linear transformation of Seo.
The motivation for doing so is to allow the attention computed at each time step to flow to the subsequent modeling layer, reducing the information loss otherwise caused by early summarization (Seo, at page 1, “1. Introduction,” second paragraph).
Though Salakhutdinov and Seo teach the features of Deep Reinforcement Learning with an attention feature including a linear transformation, the combination of Salakhutdinov and Seo does not explicitly teach -
* * *
. . . each transform network being arranged to:
determine a respective attention weight between the cell and each of the plurality of cells in the image . . . by (i) generating respective salience values for each of the plurality of cells based on using at least the query vector that is generated as output by the at least one query network and the key vector that is generated by the at least one key network, and (ii) combining the respective salience values using a non-linear function to form the respective attention weights; and
generate, using the respective attention weights and the value vectors that are generated by the at least one value network, respective modified features for the cell in the image . . . ; and
* * *
But Choi teaches -
* * *
. . . each transform network being arranged to:
determine a respective attention weight between the cell and each of the plurality of cells in the image . . . by (i) generating respective salience values for each of the plurality of cells in the image (Choi, right column of p. 3, “Multi-focus Attention Network - Parallel Attention,” first paragraph, teaches [u]sing key features extracted from the feature extraction module (that is, key vector), parallel attention layers determine what partial states are important . . . where N is the number of attention layers, $A_i^n$ is the i-th element of n-th soft attention weight vector (that is, the weight vector is generating respective salience values for each of the plurality of cells)) . . . based on using at least the query vector that is generated as output by the at least one query network and the key vector that is generated by the at least one key network (Choi, Figure 2, teaches (Examiner annotations in text boxes):

[Figure: Choi, Figure 2, with Examiner annotations (media_image3.png, greyscale)]

Choi, left column of p. 3, “Multi-focus Attention Network - Single Agent Setting, Input Segmentation”, second paragraph, teaches [w]e partitioned input image into uniform grid and used the cells (small image patches) in the grid as partial states; Choi, right column of p. 3, “Multi-focus Attention Network - Single Agent Setting, Parallel Attentions”, first paragraph, teaches [u]sing the key features extracted from the feature extraction module, parallel attention layers determine what partial states are important (that is, salient, which is using at least the query vector and at least the key vector) by using [equation (5)], where N is the number of attention layers, $A_i^n$ is i-th element of n-th soft attention weight vector, $i' \in \{0, 1, \ldots, K\} - i$ and $a^n$ is n-th selector vector which is trainable like other weights of [the] network (that is, determine a respective attention weight between the cell and each of the plurality of cells using at least the query vector that is generated by the at least one query network and the key vector that is generated by the at least one key network)
[Examiner note: though the claims do not recite the term “saliency,” the specification recites that “respective attention weights for each set of entity data by generating respective salience values,” (PGPUB ¶ 0011), where “[t]o generate salience values for the respective plurality of entities, the head section for a given entity may multiply the query vector for the given entity with the respective key vectors.” (PGPUB ¶ 0013)]), and (ii) combining the respective salience values using a non-linear function to form the respective attention weights (Choi, right column of p. 3, “Multi-focus Attention Network - Parallel Attentions,” first paragraph, teaches distance regularization, which encourages each attention layer to attend to different partial states (that is, combining) from other attention layers (that is, combining the salience values using a non-linear function to form the respective attention weights)
[Examiner notes that the Specification recites “each qi is compared to all entities’ keys k1:N via a dot product [attention]. The result are respective unnormalized saliencies, si, where the vector si denotes the set of saliences . . . . The saliencies are normalised into weights” (Specification ¶ 0062); that is, a “salience” is the output of the “attention mechanism,” which outputs are simply “attention weights”]); and
generate, using the respective attention weights and the value vectors that are generated by the at least one value network, respective modified features for the cell in the image (Choi, left column of p. 4, “Multi-focus Attention Network - Single Agent Setting, State-Action Value Estimation”, first paragraph, teaches [u]sing attention weights from parallel attention layers, weighted value feature is defined as [equation (8)] . . . . Then the concatenated feature is used to estimate state-action value as follows: [equation (10)] (that is, generate, using the respective attention weights and the value vectors that are generated by the at least one value network, respective modified features for the cell)) . . . ; and
* * *
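For illustration only, the limitation at issue (salience values formed from a query vector and key vectors, combined by a non-linear function into attention weights, which are then applied to value vectors to produce modified features for each cell) may be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Choi, Salakhutdinov, or the specification: the function name `attend`, the toy vectors, the use of a dot product for salience (consistent with the specification passage quoted in the Examiner's note), and the choice of softmax as the non-linear function are all assumptions.

```python
import math


def attend(queries, keys, values):
    """Hypothetical per-cell attention over a set of cells.

    For cell i, the salience toward cell j is the dot product q_i . k_j.
    A softmax (assumed non-linear function) turns the saliences into
    attention weights, and the modified feature for cell i is the
    weighted sum of all value vectors.
    """
    all_weights = []
    modified = []
    dims = len(values[0])
    for q in queries:
        saliences = [sum(a * b for a, b in zip(q, k)) for k in keys]
        peak = max(saliences)  # subtract the maximum for numerical stability
        exps = [math.exp(s - peak) for s in saliences]
        total = sum(exps)
        weights = [e / total for e in exps]
        all_weights.append(weights)
        modified.append(
            [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dims)]
        )
    return all_weights, modified


# Two toy cells with two-dimensional query, key, and value vectors.
cell_queries = [[1.0, 0.0], [0.0, 1.0]]
cell_keys = [[3.0, 0.0], [0.0, 3.0]]
cell_values = [[1.0, 0.0], [0.0, 1.0]]
attn, updated = attend(cell_queries, cell_keys, cell_values)
```

In this toy setup each cell's query aligns with its own key, so each cell attends mostly to itself, and each modified feature is dominated by that cell's own value vector.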
Salakhutdinov, Seo, and Choi are from the same or similar field of endeavor. Salakhutdinov teaches an agent comparing the features at the agent's particular position to a summary of the features stored throughout the memory architecture and writes the features that correspond to the summary to the coordinates in the memory architecture that correspond to the agent's position as the agent traverses the environment. Seo teaches machine comprehension responsive to a query and a context based on a bidirectional attention flow mechanism to obtain a query-aware context representation without early summarization. Choi teaches applying, in a reinforcement learning environment, a multi-focus attention network (AM Net) that enhances the agent’s ability to attend to important entities by using multiple parallel attentions. Thus, it would have been obvious to a person having ordinary skill in the art as of the effective filing date of the Applicant’s invention to modify the combination of Salakhutdinov and Seo pertaining to deep reinforcement learning including a linear transformation with the multi-focus attention network of Choi.
The motivation for doing so is to mimic the human ability to spatially abstract the low-level sensory input into multiple entities and attend to them simultaneously, which also achieves faster learning than existing models. (Choi, Abstract).
Though Salakhutdinov, Seo, and Choi teach the features of Deep Reinforcement Learning with an attention feature including a linear transformation with a multi-focus attention mechanism, the combination of Salakhutdinov, Seo, and Choi does not explicitly teach the instances of a “cell in the image that is captured by one or more sensors of the agent or one or more sensors that are located separately from the agent in the environment at the time step.” 
But Kataoka teaches “cell in the image that is captured by one or more sensors of the agent or one or more sensors that are located separately from the agent in the environment at the time step.” (Kataoka, left column of p. 1, “1. Introduction,” first paragraph, teaches one vital technology field for achieving this target [of a car that carries humans to their destination safely], three-dimensional (3D) environment sensing, has seen significant improvements recently. For example, laser sensors such as light detection and ranging (LiDAR) and visual simultaneous localization and mapping (vSLAM) (that is, sensors, in which an image is captured by one or more sensors of the agent or one or more sensors that are located separately from the agent in the environment at the time step) are among the most active topics in the race for practical self-driving cars capable of transporting human passengers; Kataoka, left column of p. 3, “3. Our Approach,” first paragraph, teaches the system extracts global and local feature from each frame (that is, a frame is a “cell in the image”)).
Salakhutdinov, Seo, Choi, and Kataoka are from the same or similar field of endeavor. Salakhutdinov teaches an agent comparing the features at the agent's particular position to a summary of the features stored throughout the memory architecture and writes the features that correspond to the summary to the coordinates in the memory architecture that correspond to the agent's position as the agent traverses the environment. Seo teaches machine comprehension responsive to a query and a context based on a bidirectional attention flow mechanism to obtain a query-aware context representation without early summarization. Choi teaches applying, in a reinforcement learning environment, a multi-focus attention network (AM Net) that enhances the agent’s ability to attend to important entities by using multiple parallel attentions. Kataoka teaches a model that gradually learns an earlier anticipation as training progresses, where a loss function adaptively assigns penalty weights depending on how early the model can anticipate a traffic accident at each time epoch. Thus, it would have been obvious to a person having ordinary skill in the art as of the effective filing date of the Applicant’s invention to modify the combination of Salakhutdinov, Seo and Choi pertaining to deep reinforcement learning including a linear transformation of a multi-focus attention network with the sensor image capture and feature extraction with dynamic soft-attention of Kataoka.
The motivation for doing so is to achieve the objective, in self-driving cars, of producing “a car that carries humans to their destination safely”, for which one vital technology is 3D environment sensing (Kataoka, left column of p. 1, “1. Introduction,” first paragraph).
Regarding claim 18, the combination of Salakhutdinov, Seo, Choi and Kataoka teaches all of the limitations of claim 1, as described above.
Salakhutdinov teaches the input network including at least one recurrent layer (Salakhutdinov ¶ 0001, Appendix A at p. 5, “3.5.3 GRU-based Local Write Operation,” first paragraph, teaches [using] a gated write operation based on the recurrent update equations of the Gated Recurrent Unit (GRU) (that is, at least one recurrent layer)).
Regarding claim 19, the combination of Salakhutdinov, Seo, Choi and Kataoka teaches all of the limitations of claim 18, as described above.
	Salakhutdinov teaches the recurrent layer is a LSTM layer (Salakhutdinov ¶ 0047 teaches a DRL agent 401 utilizing the presently described neural map memory architecture generally outperforms baseline memory networks utilizing 128 LSTM units and MemNN (the recurrent layer is a LSTM layer)).
Regarding claim 20, the combination of Salakhutdinov, Seo, Choi and Kataoka teaches all of the limitations of claim 19, as described above.
Salakhutdinov further teaches wherein the LSTM layer is a convolutional LSTM layer (Salakhutdinov ¶ 0004 teaches Deep Reinforcement Learning (DRL) agents have so far used relatively simple memory architectures, with the main methods to overcome partial observability being either a temporal convolution over the past k frames or a long short-term memory (LSTM) layer).
Regarding claim 23, the combination of Salakhutdinov, Seo, Choi and Kataoka teaches all of the limitations of claim 1, as described above.
Salakhutdinov teaches -
wherein the output network is configured to generate a baseline value (Salakhutdinov ¶ 0047 teaches a DRL agent 401 utilizing the presently described neural map memory architecture generally outperforms baseline memory networks utilizing 128 LSTM units and MemNN, which is a memory network based architecture that performs attention over the past 32 states seen (the output network is configured to generate a baseline value)).
Regarding claim 27, Salakhutdinov teaches [a] method for controlling an agent interacting with an environment to perform a task in an attempt to achieve a specified result (Salakhutdinov ¶ 0019 teaches a method usable for a DRL agent acting in a multi-dimensional space), the method comprising:
receiving, at each of a plurality of time steps, state data comprising an image in pixel form that characterizes an environment (Salakhutdinov ¶ 0020 teaches $s_t$ can represent the current state embedding, $M_t$ can represent the current neural map, and (x, y) can represent the current position of the agent (that is, an input network configured to receive state data); Salakhutdinov ¶ 0036 teaches the neural map instead utilizes the relative position of the agent as input. This is expressed mathematically by assuming that the agent moves through the environment between time steps with a velocity (u, v) (that is, agent states are at each of a plurality of time steps)) comprising an image in pixel form that characterizes the environment (Salakhutdinov, Fig. 5, teaches a maze to be traversed by a DRL agent utilizing a neural map (Examiner’s annotations in text blocks):

[Image: Salakhutdinov Fig. 5 with Examiner's annotations (media_image1.png)]

(Salakhutdinov ¶ 0044 teaches a maze 400 to be traversed by a [Deep Reinforcement Learning (DRL)] agent 401 utilizing a neural map memory architecture . . . . Reference should also be made to FIG. 6, which depicts the section 405 of the maze 400 of FIG. 5 that is observed by the DRL agent 401 at the agent's 401 current location (that is, to receive state data); Salakhutdinov ¶ 0046 teaches the agent's 401 state observations are a 5x15x3 subsample of the complete maze, as depicted in FIG. 6 (Examiner Annotations in text box):

[Image: Salakhutdinov FIG. 6 with Examiner's annotations (media_image2.png)]

so that the agent is able to see fifteen pixels forward and three pixels on the side (i.e., including the center pixel and one pixel on each side of the agent (that is, an image in pixel form that characterizes the environment)) . . . ;
extracting based on processing the state data using at least one convolutional layer (Salakhutdinov ¶ 0001, Appendix A at p. 7, first partial paragraph, teaches [t]he Neural Map architectures have an internal map size of 15 x 15 with a feature channel size of 32. To get rt, the global read operation passes the neural map first through a 3-layer convolutional network, with each convolution having filter size 3 x 3 and 8 channels, followed by a 256 unit linear layer and then a final 32 unit linear layer (that is, an input network configured to . . . extract based on processing the state data using at least one convolutional layer)), respective convolutional features for each of a plurality of spatially distinct cells in the image . . . (Salakhutdinov, Fig. 5, teaches a maze to be traversed by a DRL agent utilizing a neural map (Examiner’s annotations in text blocks):

[Image: Salakhutdinov Fig. 5 with Examiner's annotations (media_image1.png)]

(Salakhutdinov ¶ 0044 teaches a maze 400 to be traversed by a [Deep Reinforcement Learning (DRL)] agent 401 utilizing a neural map memory architecture . . . . Reference should also be made to FIG. 6, which depicts the section 405 of the maze 400 of FIG. 5 that is observed by the DRL agent 401 at the agent's 401 current location (that is, to receive state data); Salakhutdinov ¶ 0046 teaches the agent's 401 state observations are a 5x15x3 subsample of the complete maze, as depicted in FIG. 6 (Examiner Annotations in text box):

[Image: Salakhutdinov FIG. 6 with Examiner's annotations (media_image4.png)]

so that the agent is able to see fifteen pixels forward and three pixels on the side (i.e., including the center pixel and one pixel on each side of the agent (that is, respective convolutional features for each of a plurality of spatially distinct cells in the image));
generating, for each cell in the image . . . , respective final features for the cell by using a relational network configured to update the respective convolutional features for the cell using the respective convolutional features for the other cells in the image (Salakhutdinov ¶ 0048 teaches the DRL agent can be trained to traverse the three-dimensional environment by receiving color images (that is, cell) of the environment as the agent moves therethrough, using a deep network as a state embedding (that is, “state” is a respective convolutional feature). The state observations can be, for example, some number of previous color images (e.g., 100x75 RGB) (that is, with “some number of previous color images,” is using the respective convolutional features for the other cells in the image). The network can be, for example, pre-initialized with the weights from a network trained to traverse the environment (that is, to “train” is to update the respective features for the cell)) . . . , the relational network comprising:
at least one attention block (Salakhutdinov ¶ 0024 teaches the localized read operation can include utilizing a Spatial Transformer Network to attentively subsample (that is, at least one attention block) the neural map at particular locations scales (that is, the neural map is a relational network configured to generate, for each cell, respective final features for the cell, the relational network comprising at least one attention block)) comprising (i) at least one query network configured to generate as output a query vector (Salakhutdinov ¶ 0025 teaches [t]he local read operation takes the current state embedding st, and the current global read vector rt, as input and produces a query vector qt (that is, at least one query network configured to generate as output a query vector)) . . . , (ii) at least one key network configured to generate a key vector for each of the plurality of cells . . . , and (iii) at least one value network configured to generate a value vector for each of the plurality of cells . . . (Salakhutdinov ¶ 0001, Appendix A, at p. 5, “3.5.2 Key-Value Context Read Operation,” first paragraph, teaches stronger bias on the context addressing operation by splitting each feature of the neural map into two parts M_t^{(x,y)} = [k_t^{(x,y)}, v_t^{(x,y)}], where k_t^{(x,y)} is the . . . “key” feature (that is, key vector) and v_t^{(x,y)} is the . . . value feature (that is, value vector) . . . . The key features are matched against the query vector (which is now a (C/2)-dimensional vector) to get the probability distribution (that is, (ii) at least one key network configured to generate a key vector for each of the plurality of cells . . . , and (iii) at least one value network configured to generate a value vector for each of the plurality of cells)), each attention block further comprising a plurality of transform networks that correspond to the plurality of cells in the image . . . (Salakhutdinov ¶ 0024 teaches the localized read operation can include utilizing a Spatial Transformer Network to attentively subsample the neural map at particular locations scales (that is, a respective transform network)) and that are each configured to:
* * *
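For illustration only, the key-value split quoted above from Salakhutdinov, Appendix A, § 3.5.2, can be sketched as follows. The sketch assumes that "matching" a key against the query is a dot product and that the probability distribution is obtained with a softmax; neither assumption is stated in the quoted passage. The function names and the toy 2 x 2 map are hypothetical and are not taken from the reference.

```python
import math

def split_key_value(feature):
    """Split a C-dimensional map feature into (key, value) halves of size C/2."""
    half = len(feature) // 2
    return feature[:half], feature[half:]

def key_match_distribution(neural_map, query):
    """Match the key half at every map position against the query.

    Assumption: matching is a dot product and normalization is a softmax.
    Returns a dict mapping each (x, y) position to its probability.
    """
    scores = {}
    for pos, feature in neural_map.items():
        key, _value = split_key_value(feature)
        scores[pos] = sum(q * k for q, k in zip(query, key))
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {pos: math.exp(s - top) for pos, s in scores.items()}
    total = sum(exps.values())
    return {pos: e / total for pos, e in exps.items()}

# Hypothetical 2 x 2 neural map with C = 4 (first two entries are the key).
toy_map = {
    (0, 0): [1.0, 0.0, 0.5, 0.5],
    (0, 1): [0.0, 1.0, 0.2, 0.8],
    (1, 0): [1.0, 1.0, 0.9, 0.1],
    (1, 1): [0.0, 0.0, 0.3, 0.7],
}
query = [1.0, 1.0]  # a (C/2)-dimensional query vector
probs = key_match_distribution(toy_map, query)
```

In this toy example the position whose key half best aligns with the query, (1, 0), receives the largest probability.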
Though Salakhutdinov teaches the features of a memory architecture to store features located at a particular position in the environment in a memory location specific to that location, so that as the agent traverses the environment, the agent compares the features at the agent's particular position to a summary of the features stored throughout the memory architecture and writes the features that correspond to the summary to the coordinates in the memory architecture that correspond to the agent’s position, Salakhutdinov does not explicitly teach that the Spatial Transformer Network includes the feature of “vector [generation] for each of the plurality of cells based on applying a . . . linear transformation to the convolutional features for the cell”.
But Seo teaches the feature of a “linear transformation to the convolutional features for the cell” (Seo, at p. 6, “4. Question Answering Experiments - Model Details,” first paragraph, teaches the linear transformation before the softmax [(that is, before “attention”)] for the answers).
Salakhutdinov and Seo are from the same or similar field of endeavor. Salakhutdinov teaches an agent comparing the features at the agent's particular position to a summary of the features stored throughout the memory architecture and writes the features that correspond to the summary to the coordinates in the memory architecture that correspond to the agent's position as the agent traverses the environment. Seo teaches the feature of a linear transformation in machine comprehension responsive to a query and a context based on a bidirectional attention flow mechanism to obtain a query-aware context representation without early summarization. Thus, it would have been obvious to a person having ordinary skill in the art as of the effective filing date of the Applicant’s invention to modify Salakhutdinov pertaining to deep reinforcement learning with the linear transformation of Seo.
The motivation for doing so is to allow attention to be computed for each time step to flow to the subsequent modeling layer to reduce information loss otherwise lost by summarization (Seo, at page 1, “1. Introduction,” second paragraph).
Though Salakhutdinov and Seo teach the features of Deep Reinforcement Learning with an attention feature including a linear transformation, the combination of Salakhutdinov and Seo does not explicitly teach -
* * *
. . . each attention block further comprising a plurality of transform networks that correspond to the plurality of cells in the image . . . and that are each configured to:
determine a respective attention weight between the cell and each of the plurality of cells in the image . . . by (i) generating respective salience values for each of the plurality of cells based on using at least the query vector that is generated as output by the at least one query network and the key vector that is generated by the at least one key network, and (ii) combining the respective salience values using a non-linear function to form the respective attention weights; 
generate, using the respective attention weights and the value vectors that are generated by the at least one value network, respective modified features for the cell in the image . . . ; and
selecting an action to be performed by the agent in response to the received state data based on the respective final features for each of the cells at the time step.
But Choi teaches -
* * *
. . . each attention block further comprising a plurality of transform networks that correspond to the plurality of cells and that are each configured to:
determine a respective attention weight between the cell and each of the plurality of cells in the image . . . by (i) generating respective salience values for each of the plurality of cells (Choi, right column of p. 3, “Multi-focus Attention Network - Parallel Attention,” first paragraph, teaches [u]sing key features extracted from the feature extraction module (that is, key vector), parallel attention layers determine what partial states are important . . . where N is the number of attention layers, A_i^n is the i-th element of n-th soft attention weight vector (that is, weights of the “weight vector” is generating respective salience values for each of the plurality of cells)) based on using at least the query vector that is generated as output by the at least one query network and the key vector that is generated by the at least one key network (Choi, Figure 2, teaches (Examiner annotations in text boxes):

[Image: Choi Figure 2 with Examiner's annotations (media_image3.png)]

Choi, left column of p. 3, “Multi-focus Attention Network - Single Agent Setting, Input Segmentation”, second paragraph, teaches [w]e partitioned input image into uniform grid and used the cells (small image patches) in the grid as partial states; Choi, right column of p. 3, “Multi-focus Attention Network - Single Agent Setting, Parallel Attentions”, first paragraph, teaches [u]sing the key features extracted from the feature extraction module, parallel attention layers determine what partial states are important (that is, salient, which is using at least the query vector and at least the key vector) by using [equation (5)], where N is the number of attention layers, A_i^n is i-th element of n-th soft attention weight vector, i’ ϵ {0,1,…,K} - i and a^n is n-th selector vector which is trainable like other weights of [the] network (that is, determine a respective attention weight between the cell and each of the plurality of cells using at least the query vector that is generated by the at least one query network and the key vector that is generated by the at least one key network)
[Examiner note: though the claims do not recite the term “saliency,” the specification recites that “respective attention weights for each set of entity data by generating respective salience values,” (PGPUB ¶ 0011), where “[t]o generate salience values for the respective plurality of entities, the head section for a given entity may multiply the query vector for the given entity with the respective key vectors.” (PGPUB ¶ 0013)]), and (ii) combining the respective salience values using a non-linear function to form the respective attention weights (Choi, right column of p. 3, “Multi-focus Attention Network - Parallel Attentions,” first paragraph, teaches distance regularization, which encourages each attention layer to attend to different partial states (that is, combining) from other attention layers (that is, combining the salience values using a non-linear function to form the respective attention weights)
[Examiner notes that the Specification recites “each qi is compared to all entities’ keys k1:N via a dot product [attention]. The result are respective unnormalized saliencies, si, where the vector si denotes the set of saliences . . . . The saliencies are normalised into weights” (Specification ¶ 0062); that is, a “salience” is the output of the “attention mechanism,” which outputs are simply “attention weights”]); and
generate, using the respective attention weights and the value vectors that are generated by the at least one value network, respective modified features for the cell in the image (Choi, left column of p. 4, “Multi-focus Attention Network - Single Agent Setting, State-Action Value Estimation”, first paragraph, teaches [u]sing attention weights from parallel attention layers, weighted value feature is defined as [equation (8)] . . . . Then the concatenated feature is used to estimate state-action value as follows: [equation (10)] (that is, generate, using the respective attention weights and the value vectors that are generated by the at least one value network, respective modified features for the cell)) . . . ; and
selecting an action to be performed by the agent in response (Choi, right column of p. 5, “Combat Experiments for Multi-Agent Task Environment,” first paragraph, teaches Agents can take 10 actions: moving up, down, left, right, attacking enemy by specifying its index (1~5), or doing nothing (that is, selecting an action to be performed by the agent in response)) to the received state data based on the respective final features for each of the cells at the time step (Choi, right column of p. 2, “Related Works - Attention in Deep Reinforcement Learning,” first paragraph, teaches we use attention to extract important partial states in the input of one time-step and apply multiple attention layers to attend to multiple partial states for fast and efficient learning).
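For illustration only, the sequence recited in the limitations above (query, key, and value vectors obtained by linear transformation of each cell's features; salience values from the query and key vectors; attention weights from a non-linear function; modified features from the weighted value vectors) can be sketched as follows. The sketch follows the dot-product description quoted from the Specification (PGPUB ¶ 0013; Specification ¶ 0062) and assumes a softmax as the non-linear function. The function names, weight matrices, and toy cell features are hypothetical and are not taken from the application or the cited references.

```python
import math

def linear(weights, x):
    """Apply a linear transformation (matrix-vector product) to a feature vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def softmax(values):
    """Normalize a list of scores into weights that sum to 1."""
    top = max(values)
    exps = [math.exp(v - top) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def attend(cell_features, w_q, w_k, w_v):
    """Return (attention_weights, modified_features), one row per cell."""
    queries = [linear(w_q, f) for f in cell_features]
    keys = [linear(w_k, f) for f in cell_features]
    values = [linear(w_v, f) for f in cell_features]
    all_weights, modified = [], []
    for q in queries:
        # (i) salience of every cell with respect to this cell: query . key
        salience = [sum(a * b for a, b in zip(q, k)) for k in keys]
        # (ii) non-linear combination (softmax assumed) into attention weights
        weights = softmax(salience)
        # modified feature: sum of value vectors weighted by the attention weights
        mixed = [sum(w * v[d] for w, v in zip(weights, values))
                 for d in range(len(values[0]))]
        all_weights.append(weights)
        modified.append(mixed)
    return all_weights, modified

# Hypothetical example: three cells with 2-dimensional features and
# identity matrices standing in for the learned query/key/value transformations.
identity = [[1.0, 0.0], [0.0, 1.0]]
cells = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, modified = attend(cells, identity, identity, identity)
```

Each row of `weights` holds one cell's attention weights over all cells, and the corresponding row of `modified` is that cell's modified feature vector.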
Salakhutdinov, Seo, and Choi are from the same or similar field of endeavor. Salakhutdinov teaches an agent comparing the features at the agent's particular position to a summary of the features stored throughout the memory architecture and writes the features that correspond to the summary to the coordinates in the memory architecture that correspond to the agent's position as the agent traverses the environment. Seo teaches machine comprehension responsive to a query and a context based on a bidirectional attention flow mechanism to obtain a query-aware context representation without early summarization. Choi teaches applying, in a reinforcement learning environment, a multi-focus attention network (AM Net) that enhances the agent’s ability to attend to important entities by using multiple parallel attentions. Thus, it would have been obvious to a person having ordinary skill in the art as of the effective filing date of the Applicant’s invention to modify the combination of Salakhutdinov and Seo pertaining to deep reinforcement learning including a linear transformation with the multi-focus attention network of Choi.
The motivation for doing so is to mimic the human ability to spatially abstract the low-level sensory input into multiple entities and attend to them simultaneously, which also achieves faster learning than existing models. (Choi, Abstract).
Though Salakhutdinov, Seo, and Choi teach the features of Deep Reinforcement Learning with an attention feature including a linear transformation with a multi-focus attention mechanism, the combination of Salakhutdinov, Seo, and Choi does not explicitly teach the feature of the instances of a “cell in the image that is captured by one or more sensors of the agent or one or more sensors that are located separately from the agent in the environment at the time step.” But Kataoka teaches this feature (Kataoka, left column of p. 1, “1. Introduction,” first paragraph, teaches one vital technology field for achieving this target [of a car that carries humans to their destination safely], three-dimensional (3D) environment sensing, has seen significant improvements recently. For example, laser sensors such as light detection and ranging (LiDAR) and visual simultaneous localization and mapping (vSLAM) (that is, sensors, in which an image is captured by one or more sensors of the agent or one or more sensors that are located separately from the agent in the environment at the time step) are among the most active topics in the race for practical self-driving cars capable of transporting human passengers; Kataoka, left column of p. 3, “3. Our Approach,” first paragraph, teaches the system extracts global and local feature from each frame (that is, a frame is a “cell in the image”)).
Salakhutdinov, Seo, Choi, and Kataoka are from the same or similar field of endeavor. Salakhutdinov teaches an agent comparing the features at the agent's particular position to a summary of the features stored throughout the memory architecture and writes the features that correspond to the summary to the coordinates in the memory architecture that correspond to the agent's position as the agent traverses the environment. Seo teaches machine comprehension responsive to a query and a context based on a bidirectional attention flow mechanism to obtain a query-aware context representation without early summarization. Choi teaches applying, in a reinforcement learning environment, a multi-focus attention network (AM Net) that enhances the agent’s ability to attend to important entities by using multiple parallel attentions. Kataoka teaches a model to gradually learn an earlier anticipation as training progresses where a loss function adaptively assigns penalty weights depending on how early the model can anticipate a traffic accident at each time epoch. Thus, it would have been obvious to a person having ordinary skill in the art as of the effective filing date of the Applicant’s invention to modify the combination of Salakhutdinov, Seo and Choi pertaining to deep reinforcement learning including a linear transformation of a multi-focus attention network with the sensor image capture and feature extraction with dynamic soft-attention of Kataoka.
The motivation for doing so is to achieve the objective, in self-driving cars, of producing “a car that carries humans to their destination safely,” for which one vital technology is 3D environment sensing. (Kataoka, left column of p. 1, “1. Introduction,” first paragraph).
Regarding claim 28, the combination of Salakhutdinov, Seo, Choi and Kataoka teaches all of the limitations of claim 27, as described above. 
Salakhutdinov teaches -
wherein extracting the respective features comprises:
processing the state data using an input neural network (Salakhutdinov ¶ 0028 teaches for an agent's current position (xt, yt) at time t, the local write operation takes as input the current state embedding st, the global read output rt, the context read vector ct, and the current feature at position (xt, yt) in the neural map M(xt,yt) and produces, using a deep neural network f (that is, processing the state data using an input neural network), a new C-dimensional vector Wt+1 (xt,yt) (that is, to generate the respective features for each of the plurality of spatially distinct cells)).
7.	Claims 2, 3, and 8-11 are rejected under 35 U.S.C. § 103 as being unpatentable over US Published Application 20180373982 to Salakhutdinov et al. [hereinafter Salakhutdinov] in view of Seo et al., “Bi-Directional Attention Flow for Machine Comprehension,” 24 Feb 2017 [hereinafter Seo], and Choi et al., “Multi-Focus Attention Network for Efficient Deep Reinforcement Learning,” AAAI (2017) [hereinafter Choi] and further in view of Kataoka et al., “Anticipating Traffic Accidents with Adaptive Loss and Large-Scale Incident DB,” 08 April 2018 [hereinafter Kataoka] and Battaglia et al., Interaction Networks for Learning about Objects, Relations and Physics (2016) [hereinafter Battaglia].
Regarding claim 2, the combination of Salakhutdinov, Seo, Choi and Kataoka teaches all of the limitations of claim 1, as described above.
Though the combination of Salakhutdinov, Seo, Choi and Kataoka teaches the features of deep reinforcement learning with linear transformations in an attention function, the combination of Salakhutdinov, Seo, Choi and Kataoka does not explicitly teach -
wherein each of the transform networks comprises one or more head sections, and an adaptive network to generate the modified features from the outputs of head sections.
But Battaglia teaches -
wherein each of the transform networks comprises one or more head sections, and an adaptive network (Battaglia, page 3, second full paragraph, teaches [a] standard deep neural network (transform network) building blocks (one or more head sections), multilayer perceptrons (MLP) (that is, an adaptive network), matrix operations, etc., . . . ) to generate the modified features from the outputs of head sections (Battaglia, Figure 1(b) & caption, teaches the model takes as input a graph that represents a system of objects . . . and relations . . . , instantiates the pairwise interaction terms bk, and computes their effects ek (that is, modified features) . . . to generate input (as cj), for an object model fO (to generate the modified features from the outputs of head sections) . . . ).
Salakhutdinov, Seo, Choi, Kataoka, and Battaglia are from the same or similar field of endeavor. Salakhutdinov teaches an agent comparing the features at the agent's particular position to a summary of the features stored throughout the memory architecture and writes the features that correspond to the summary to the coordinates in the memory architecture that correspond to the agent's position as the agent traverses the environment. Seo teaches machine comprehension responsive to a query and a context based on a bidirectional attention flow mechanism to obtain a query-aware context representation without early summarization. Choi teaches applying, in a reinforcement learning environment, a multi-focus attention network (AM Net) that enhances the agent’s ability to attend to important entities by using multiple parallel attentions. Kataoka teaches a model to gradually learn an earlier anticipation as training progresses where a loss function adaptively assigns penalty weights depending on how early the model can anticipate a traffic accident at each time epoch. Battaglia teaches an interaction network that can reason about how objects in a complex system interact. Thus, it would have been obvious to a person having ordinary skill in the art as of the effective filing date of the Applicant’s invention to modify the combination of Salakhutdinov, Seo, Choi and Kataoka pertaining to deep reinforcement learning including a linear transformation with the multi-focus attention network with the transform networks of Battaglia.
The motivation for doing so is to capture complex interactions for predicting future states to improve the interaction of an agent with an environment in deep reinforcement learning. (Battaglia, at p. 2, “1. Introduction,” second full paragraph).
Regarding claim 3, the combination of Salakhutdinov, Seo, Choi, Kataoka, and Battaglia teaches all of the limitations of claim 2, as described above.
Choi teaches -
wherein, denoting the number of head sections in each transform network as h, each attention block is operative to, for each of the h generate h value vectors for each cell using the convolutional features for the plurality of cells, and each head section is operative to form a sum of the value vectors for the plurality of cells weighted by respective attention weights (Choi, left column of p. 3, “Multi-focus Attention network, Single Agent Setting - Input Segmentation,” first paragraph, teaches partition[ing] input image into uniform grid and used the cells (small image patches) in the grid as partial states; also, Choi, right column of p. 3, “Multi-focus Attention Network - Feature Extraction,” first paragraph, teaches extraction using a convolutional neural network (that is, Choi teaches applying attention weights for each cell using the convolutional features); Choi, left column of p. 4, “Multi-focus Attention Network, Single Agent Setting - Parallel Attentions,” first paragraph, teaches [u]sing attention weights from parallel attention layers (that is, denoting the number of head sections in each transform network as h), weighted value feature is defined as [equation (8)] wherein hn (that is, for each of the h generate h value vectors) is the n-th sum of value features weighted by [attention weight vector] An (that is, each head section is operative to form a sum of the value vectors weighted by respective attention weights)).
Salakhutdinov, Seo, Choi, Kataoka and Battaglia are from the same or similar field of endeavor. Salakhutdinov teaches an agent comparing the features at the agent's particular position to a summary of the features stored throughout the memory architecture and writes the features that correspond to the summary to the coordinates in the memory architecture that correspond to the agent's position as the agent traverses the environment. Seo teaches machine comprehension responsive to a query and a context based on a bidirectional attention flow mechanism to obtain a query-aware context representation without early summarization. Choi teaches applying, in a reinforcement learning environment, a multi-focus attention network (AM Net) that enhances the agent’s ability to attend to important entities by using multiple parallel attentions. Kataoka teaches a model to gradually learn an earlier anticipation as training progresses where a loss function adaptively assigns penalty weights depending on how early the model can anticipate a traffic accident at each time epoch. Battaglia teaches an interaction network that can reason about how objects in a complex system interact. Thus, it would have been obvious to a person having ordinary skill in the art as of the effective filing date of the Applicant’s invention to modify the combination of Salakhutdinov, Seo, Choi and Kataoka pertaining to deep reinforcement learning including a linear transformation with the multi-focus attention network with the transform networks of Battaglia.
The motivation for doing so is to mimic the human ability to spatially abstract the low-level sensory input into multiple entities and attend to them simultaneously, which also achieves faster learning than existing models. (Choi, Abstract).
Regarding claim 8, the combination of Salakhutdinov, Seo, Choi, Battaglia, and Mottaghi teaches all of the limitations of claim 6, as described above.
Battaglia teaches wherein, denoting the number of head sections in each transform network as h, each attention block comprises h query networks for generating a query vector for each cell from the plurality of cells (Battaglia, Figs. 1a, 1b & caption, teaches the model takes objects [(O1, O2, O3)] and relations [(r1, r2)] as input (denoting the number of head sections in each transform network as h), reasons about their interactions (each attention block comprises h query networks for generating a query vector for each entity); Examiner points out that query pertains to determining/querying the next state based on a given state, and that a query vector includes those elements, or tuple, relating to the query), and h key networks for generating a key vector (Battaglia, Figs. 1a, 1b & caption, teaches applies the effects and physical dynamics (h key networks for generating a key vector for each entity from corresponding entity data) . . . . ; Examiner points out that key pertains to answering/solving the next state based on the given state, and that a key vector includes those elements, or tuple, relating to the key), 
each head section being arranged to use the query vector (Battaglia, page 4, sixth full paragraph, teaches [t]he G, X, and E are input (each head section being arranged to use the query vector for the corresponding entity to generate the salience values for each of the plurality of entities) to a, which computes the DE x NO matrix product (dot product of the query vector and the respective key vector), $\bar{E} = E R_r^{T}$) in a deep reinforcement learning.
Regarding claim 9, the combination of Salakhutdinov, Seo, Choi, Kataoka and Battaglia teaches all of the limitations of claim 3, as described above.
Battaglia teaches wherein each transform network is arranged to concatenate the weighted value vectors (Battaglia, page 4, fourth full paragraph, teaches marshalling function, m, computes the matrix products, ORr and ORs (weighted value vectors), and concatenates them with Ra: m(G) = [ORr;ORs;Ra] = B) (each transform network is arranged to concatenate the weighted value vectors)), and generate the modified features using the concatenated weighted value vectors (Battaglia, page 4, fourth to fifth full paragraph, teaches [t]he resulting B is a (2DS + DR) x NR matrix . . . . The [resulting] B is input to [relational model] φR (generate the modified features using the concatenated weighted value vectors) . . . . ).
Regarding claim 10, the combination of Salakhutdinov, Seo, Choi, Kataoka and Battaglia teaches all of the limitations of claim 9, as described above.
Battaglia teaches wherein each transform network is arranged to add the concatenated weighted value vectors to the convolutional features for the corresponding cell to form a summed vector (Battaglia, page 4, sixth-to-seventh full paragraph, teaches resulting C is a (DS +DX + DE) x NO matrix, whose NO columns represent the object states, external effects, and per-object aggregate interaction effects (each transform network is arranged to add the concatenated weight value vectors to the convolutional features for the corresponding cell to form a summed vector)), and transmit the summed vector to the adaptive network (Battaglia, page 4, sixth-to-seventh full paragraph, teaches [t]he C is input to co. (transmit the summed vector to the adaptive network) . . . . ).
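For illustration only, the matrix shapes recited from Battaglia above (B being (2DS + DR) x NR and C being (DS + DX + DE) x NO) can be checked with a minimal sketch. The helper names (`matmul`, `vstack`, `marshal`, `aggregate`) and all toy values are hypothetical and are not taken from Battaglia; the relational model is replaced by a fixed stand-in matrix E.

```python
# Illustrative sketch only; helper names and toy values are assumptions.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    """Swap rows and columns."""
    return [list(col) for col in zip(*a)]

def vstack(*blocks):
    """Concatenate matrices with equal column counts row-wise."""
    return [row for block in blocks for row in block]

def marshal(O, Rr, Rs, Ra):
    """m(G) = [O Rr; O Rs; Ra] = B, a (2*DS + DR) x NR matrix."""
    return vstack(matmul(O, Rr), matmul(O, Rs), Ra)

def aggregate(O, X, E, Rr):
    """Stack O, X, and E Rr^T into C, a (DS + DX + DE) x NO matrix."""
    E_bar = matmul(E, transpose(Rr))  # DE x NO per-object aggregate effects
    return vstack(O, X, E_bar)

# Toy system: NO = 3 objects, NR = 2 relations, DS = 2, DR = 1, DX = 1, DE = 2.
O = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]            # DS x NO object states
Rr = [[0, 0], [1, 0], [0, 1]]    # NO x NR, marks the receiver of each relation
Rs = [[1, 0], [0, 1], [0, 0]]    # NO x NR, marks the sender of each relation
Ra = [[0.5, 0.5]]                # DR x NR relation attributes
X = [[0.0, 0.0, -9.8]]           # DX x NO external effects
E = [[1.0, 2.0], [3.0, 4.0]]     # DE x NR effects (stand-in for the relational model output)

B = marshal(O, Rr, Rs, Ra)
C = aggregate(O, X, E, Rr)
print(len(B), len(B[0]))  # rows and columns of B
print(len(C), len(C[0]))  # rows and columns of C
```

The sketch only demonstrates the concatenation and matrix-product bookkeeping; it does not model the learned functions applied to B and C.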
Regarding claim 11, the combination of Salakhutdinov, Seo, Choi, Kataoka and Battaglia teaches all of the limitations of claim 2, as described above.
Battaglia teaches wherein the adaptive network comprises a multi-layer perceptron (Battaglia, page 3, second full paragraph, teaches multilayer perceptrons (MLP) (an adaptive network comprises a multi-layer perceptron), . . . ).
8.	Claims 4 and 5 are rejected under 35 U.S.C. § 103 as being unpatentable over US Published Application 20180373982 to Salakhutdinov et al. [hereinafter Salakhutdinov] in view of Seo et al., “Bi-Directional Attention Flow for Machine Comprehension,” 24 Feb 2017 [hereinafter Seo], and Choi et al., “Multi-Focus Attention Network for Efficient Deep Reinforcement Learning,” AAAI (2017) [hereinafter Choi] and further in view of Kataoka et al., “Anticipating Traffic Accidents with Adaptive Loss and Large-Scale Incident DB,” 08 April 2018 [hereinafter Kataoka], Battaglia et al., Interaction Networks for Learning about Objects, Relations and Physics (2016) [hereinafter Battaglia] and Duan et al., “One-Shot Imitation Learning,” (2017) [hereinafter Duan].
Regarding claim 4, the combination of Salakhutdinov, Seo, Choi, Kataoka and Battaglia teaches all of the limitations of claim 3, as described above.
However, the combination of Salakhutdinov, Seo, Choi, Kataoka and Battaglia does not explicitly teach -
wherein the attention block comprises h value networks, each value network being for generating value vectors from the convolutional features.
But Duan teaches -
wherein the attention block comprises h value networks, each value network being for generating value vectors from input features (Duan, right column of p. 5, “4.2.1. Neighborhood Attention,” second paragraph, teaches [t]he input to neighborhood attention is a list of embeddings $h_1^{in}, \ldots, h_B^{in}$ of the same dimension, which can be the result of a projection operation over a list of block positions; Duan, Fig. 4 & caption, teaches:

[Image: Duan, Fig. 4 (media_image5.png, greyscale)]

in which Duan caption teaches for each block, performs one attention query corresponding to each block, and outputs a list of embeddings which have the same dimension as the input (that is, (the attention block comprises h value networks, each value network for generating value vectors from input features)).
Salakhutdinov, Seo, Choi, Kataoka, Battaglia, and Duan are from the same or similar field of endeavor. Salakhutdinov teaches an agent comparing the features at the agent's particular position to a summary of the features stored throughout the memory architecture and writes the features that correspond to the summary to the coordinates in the memory architecture that correspond to the agent's position as the agent traverses the environment. Seo teaches machine comprehension responsive to a query and a context based on a bidirectional attention flow mechanism to obtain a query-aware context representation without early summarization. Choi teaches applying, in a reinforcement learning environment, a multi-focus attention network (AM Net) that enhances the agent’s ability to attend to important entities by using multiple parallel attentions. Kataoka teaches model to gradually learn an earlier anticipation as training progresses where a loss function adaptively assigns penalty weights depending on how early the model can anticipate a traffic accident at each time epoch. Battaglia teaches an interaction network that can reason about how objects in a complex system interact. Duan teaches reinforcement learning implementing soft attention to generalize conditions and tasks unseen in training data. Thus, it would have been obvious to a person having ordinary skill in the art as of the effective filing date of the Applicant’s invention to modify the combination of Salakhutdinov, Seo, Choi, Kataoka and Battaglia pertaining to a deep reinforcement learning based on environment images having cells incorporating multi-focus attention networks with the soft attention embodiment of Duan.
The motivation for doing so is to broaden the policies to accomplish a variety of reinforcement learning tasks. (Duan, Abstract).
Regarding claim 5, the combination of Salakhutdinov, Seo, Choi, Kataoka, Battaglia, and Duan teaches all of the limitations of claim 4, as described above.
Battaglia teaches wherein each value network produces value vectors by applying a value linear transform to convolutional features (Battaglia, page 6, first full paragraph, teaches [t]he [relational] fR and [object] fO MLPs contained multiple hidden layers of linear transforms plus biases (each value network produces value vector by applying a value linear transform to entity data to convolutional features)).
9.	Claims 7, 15, and 22 are rejected under 35 U.S.C. § 103 as being unpatentable over US Published Application 20180373982 to Salakhutdinov et al. [hereinafter Salakhutdinov] in view of Seo et al., “Bi-Directional Attention Flow for Machine Comprehension,” 24 Feb 2017 [hereinafter Seo], and Choi et al., “Multi-Focus Attention Network for Efficient Deep Reinforcement Learning,” AAAI (2017) [hereinafter Choi] and further in view of Kataoka et al., “Anticipating Traffic Accidents with Adaptive Loss and Large-Scale Incident DB,” 08 April 2018 [hereinafter Kataoka], and Mottaghi et al., “‘What happens if . . .’ Learning to Predict the Effect of Forces in Images,” (2016) [hereinafter Mottaghi].
Regarding claim 7, the combination of Salakhutdinov, Seo, Choi, and Kataoka teaches all of the limitations of claim 1, as described above.
Though Salakhutdinov, Seo, Choi, and Kataoka teach the features of Deep Reinforcement Learning with an attention feature including a linear transformation with a multi-focus attention mechanism, the combination of Salakhutdinov, Seo, Choi, and Kataoka, however, does not explicitly teach -
wherein the non-linear function is a soft-max function.
But Mottaghi teaches wherein the non-linear function is a soft-max function (Mottaghi, page 7, Section 5.1, second full paragraph, teaches [output at each timestep is] ot = SoftMax(g(ht)) (the non-linear function is a soft-max function) . . . . ). 
Salakhutdinov, Seo, Choi, Kataoka, and Mottaghi are from the same or similar field of endeavor. Salakhutdinov teaches an agent comparing the features at the agent's particular position to a summary of the features stored throughout the memory architecture and writes the features that correspond to the summary to the coordinates in the memory architecture that correspond to the agent's position as the agent traverses the environment. Seo teaches machine comprehension responsive to a query and a context based on a bidirectional attention flow mechanism to obtain a query-aware context representation without early summarization. Choi teaches applying, in a reinforcement learning environment, a multi-focus attention network (AM Net) that enhances the agent’s ability to attend to important entities by using multiple parallel attentions. Kataoka teaches a model to gradually learn an earlier anticipation as training progresses where a loss function adaptively assigns penalty weights depending on how early the model can anticipate a traffic accident at each time epoch. Mottaghi teaches a deep neural network model that learns long-term sequential dependencies of object movements while taking into account the geometry and appearance of the scene. Thus, it would have been obvious to a person having ordinary skill in the art as of the effective filing date of the applicant’s invention to modify the teachings of the combination of Salakhutdinov, Seo, Choi, and Kataoka pertaining to deep reinforcement learning including a plurality of spatially distinct cells with a multi-focus attention network with the long-term sequential dependencies of object movement of Mottaghi.
The motivation for doing so is to predict the long-term movement of objects as their reaction to external forces from a single image. (Mottaghi, Abstract).
Regarding claim 15, the combination of Salakhutdinov, Seo, Choi and Kataoka teaches all of the limitations of claim 1, as described above.
However, the combination of Salakhutdinov, Seo, Choi and Kataoka does not explicitly teach - 
wherein, for each cell, the respective convolutional features further comprise data indicative of a position of the cell in the input image.
But Mottaghi teaches -
wherein, for each cell, the respective convolutional features further comprise data indicative of a position of the cell in the input image (Mottaghi, Fig. 4 & caption, teaches input to the model is a force image and an RGB-M image . . . [depicting a region of interest relating to an object] (for each entity, the respective entity data further comprises data indicative of a position of the cell in the input image)).
Salakhutdinov, Seo, Choi, Kataoka, and Mottaghi are from the same or similar field of endeavor. Salakhutdinov teaches an agent comparing the features at the agent's particular position to a summary of the features stored throughout the memory architecture and writes the features that correspond to the summary to the coordinates in the memory architecture that correspond to the agent's position as the agent traverses the environment. Seo teaches machine comprehension responsive to a query and a context based on a bidirectional attention flow mechanism to obtain a query-aware context representation without early summarization. Choi teaches applying, in a reinforcement learning environment, a multi-focus attention network (AM Net) that enhances the agent’s ability to attend to important entities by using multiple parallel attentions. Kataoka teaches model to gradually learn an earlier anticipation as training progresses where a loss function adaptively assigns penalty weights depending on how early the model can anticipate a traffic accident at each time epoch. Mottaghi teaches a deep neural network model that learns long-term sequential dependencies of object movements while taking into account the geometry and appearance of the scene. Thus, it would have been obvious to one of ordinary skill in the art as of the effective filing date of the applicant’s invention to modify the combination of Salakhutdinov, Seo, Choi, and Kataoka pertaining to an interaction network including a plurality of spatially distinct cells with multi-focus attention network with the CNNs as force tower and image tower respectively relating to object movement indicative of object position of Mottaghi.
The motivation for doing so is to predict the long-term movement of objects as their reaction to external forces from a single image. (Mottaghi, Abstract).
Regarding claim 22, the combination of Salakhutdinov, Seo, Choi, and Kataoka teaches all of the limitations of claim 1, as described above.
However, the combination of Salakhutdinov, Seo, Choi and Kataoka does not explicitly teach wherein the output network comprises a rectified linear unit.
	But Mottaghi teaches wherein the output network comprises a rectified linear unit (Mottaghi, Fig. 4 & caption, teaches an output network that includes a [rectified linear unit (ReLU)] (output network comprises a rectified linear unit)).
Salakhutdinov, Seo, Choi, Kataoka, and Mottaghi are from the same or similar field of endeavor. Salakhutdinov teaches an agent comparing the features at the agent's particular position to a summary of the features stored throughout the memory architecture and writes the features that correspond to the summary to the coordinates in the memory architecture that correspond to the agent's position as the agent traverses the environment. Seo teaches machine comprehension responsive to a query and a context based on a bidirectional attention flow mechanism to obtain a query-aware context representation without early summarization. Choi teaches applying, in a reinforcement learning environment, a multi-focus attention network (AM Net) that enhances the agent’s ability to attend to important entities by using multiple parallel attentions. Kataoka teaches a model to gradually learn an earlier anticipation as training progresses where a loss function adaptively assigns penalty weights depending on how early the model can anticipate a traffic accident at each time epoch. Mottaghi teaches a deep neural network model that learns long-term sequential dependencies of object movements while taking into account the geometry and appearance of the scene. Thus, it would have been obvious to one of ordinary skill in the art as of the effective filing date of the applicant’s invention to modify the teachings of the combination of Salakhutdinov, Seo, Choi and Kataoka pertaining to an interaction network including a plurality of spatially distinct cells with multi-focus attention network with the CNNs as force tower and image tower respectively relating to object movement indicative of object position of Mottaghi.
The motivation for doing so is that RNNs composed of ReLUs and initialized with identity weight matrix are as powerful as standard LSTMs. (Mottaghi, page 7, second full paragraph).
10.	Claims 24-26 are rejected under 35 U.S.C. § 103 as being unpatentable over US Published Application 20180373982 to Salakhutdinov et al. [hereinafter Salakhutdinov] in view of Seo et al., “Bi-Directional Attention Flow for Machine Comprehension,” 24 Feb 2017 [hereinafter Seo], and Choi et al., “Multi-Focus Attention Network for Efficient Deep Reinforcement Learning,” AAAI (2017) [hereinafter Choi], Kataoka et al., “Anticipating Traffic Accidents with Adaptive Loss and Large-Scale Incident DB,” 08 April 2018 [hereinafter Kataoka], and US Published Application 20190266489 to Hu et al. [hereinafter Hu].
Regarding claim 24, the combination of Salakhutdinov, Seo, Choi, and Kataoka teaches all of the limitations of claim 1, as described above.
However, the combination of Salakhutdinov, Seo, Choi, and Kataoka does not explicitly teach -
wherein the output network is configured to generate a policy defining a distribution of respective probability values for each action of a space of possible actions, and select the action stochastically using the policy.
But Hu teaches -
wherein the output network is configured to generate a policy defining a distribution of respective probability values for each action of a space of possible actions (Hu ¶ 0054 teaches [a] policy (π) may be a strategy (distribution of respective probability values) employed to determine the next action for the agent based on the current state (for each action of a space of possible actions)), and select the action stochastically using the policy (Hu ¶ 0060 teaches [t]he agent may select one action from a set of available actions (select the action stochastically using the policy), which results in a new state and a new reward for a subsequent time step).
	Salakhutdinov, Seo, Choi, Kataoka, and Hu are from the same or similar field of endeavor. Salakhutdinov teaches an agent comparing the features at the agent's particular position to a summary of the features stored throughout the memory architecture and writes the features that correspond to the summary to the coordinates in the memory architecture that correspond to the agent's position as the agent traverses the environment. Seo teaches machine comprehension responsive to a query and a context based on a bidirectional attention flow mechanism to obtain a query-aware context representation without early summarization. Choi teaches applying, in a reinforcement learning environment, a multi-focus attention network (AM Net) that enhances the agent’s ability to attend to important entities by using multiple parallel attentions. Kataoka teaches model to gradually learn an earlier anticipation as training progresses where a loss function adaptively assigns penalty weights depending on how early the model can anticipate a traffic accident at each time epoch. Hu teaches interaction-aware decision making may include generating a multi-goal, multi-agent, multi-stage, interaction-aware decision making network policy based on the first agent neural network and the second agent neural network. Thus, it would have been obvious to one of ordinary skill in the art as of the effective filing date of applicant’s invention to modify the combination of Salakhutdinov, Seo, Choi, and Kataoka pertaining to an interaction network of a multi-cell image environment with the multi-goal, multi-agent, multi-stage, interaction-aware decision making network policy of Hu.
	The motivation for doing so is for a decentralized policy trained using a double critic, including a decentralized value function for learning local objectives and a centralized action-value function for learning cooperation, thereby enabling local objectives or goals to be considered while also considering cooperation between N number of agents by showing two equivalent views of the policy gradient and implementing the new actor-critic or agent-critic adaptation (Hu ¶ 0085).
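For illustration only, the claim 24 language of a policy that defines a probability value for each action, from which an action is selected stochastically, can be sketched as follows. The use of a soft-max to form the distribution, the function names, and the example logits are assumptions made for this sketch; they are not drawn from Hu or from the claims.

```python
import math
import random

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1 (assumed normalization)."""
    m = max(logits)                          # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def select_action(logits, rng):
    """Build a policy over the action space and sample one action from it."""
    policy = softmax(logits)                 # one probability per possible action
    action = rng.choices(range(len(policy)), weights=policy, k=1)[0]
    return action, policy

rng = random.Random(0)                       # seeded only to make the sketch repeatable
action, policy = select_action([2.0, 1.0, 0.1], rng)
print(policy, action)
```

Sampling from the distribution, rather than always taking the highest-probability action, is what makes the selection stochastic in this sketch.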
Regarding claim 25, the combination of Salakhutdinov, Seo, Choi, Kataoka, and Hu teaches all of the limitations of claim 24, as described above.
Hu teaches wherein the output network is arranged to generate one or more action-related arguments, whereby the agent can perform the selected action based on the action-related arguments (Hu ¶ 0060 teaches [t]he agent may select one action from a set of available actions (whereby the agent can perform the selected action based on the action-related arguments), which results in a new state and a new reward for a subsequent time step. The goal of the agent is generally to collect the greatest amount of rewards possible (output network is arranged to generate one or more action-related arguments)).
	Salakhutdinov, Seo, Choi, Kataoka, and Hu are from the same or similar field of endeavor. Salakhutdinov teaches an agent comparing the features at the agent's particular position to a summary of the features stored throughout the memory architecture and writes the features that correspond to the summary to the coordinates in the memory architecture that correspond to the agent's position as the agent traverses the environment. Seo teaches machine comprehension responsive to a query and a context based on a bidirectional attention flow mechanism to obtain a query-aware context representation without early summarization. Choi teaches applying, in a reinforcement learning environment, a multi-focus attention network (AM Net) that enhances the agent’s ability to attend to important entities by using multiple parallel attentions. Kataoka teaches model to gradually learn an earlier anticipation as training progresses where a loss function adaptively assigns penalty weights depending on how early the model can anticipate a traffic accident at each time epoch. Hu teaches interaction-aware decision making may include generating a multi-goal, multi-agent, multi-stage, interaction-aware decision making network policy based on the first agent neural network and the second agent neural network. Thus, it would have been obvious to one of ordinary skill in the art as of the effective filing date of applicant’s invention to modify the combination of Salakhutdinov, Seo, Choi, and Kataoka pertaining to an interaction network of an image environment having a plurality of cells with the multi-goal, multi-agent, multi-stage, interaction-aware decision making network policy of Hu.
The motivation for doing so is for a decentralized policy trained using a double critic, including a decentralized value function for learning local objectives and a centralized action-value function for learning cooperation, thereby enabling local objectives or goals to be considered while also considering cooperation between N number of agents by showing two equivalent views of the policy gradient and implementing the new actor-critic or agent-critic adaptation (Hu ¶ 0085).
Regarding claim 26, the combination of Salakhutdinov, Seo, Choi, Kataoka, and Hu teaches all of the limitations of claim 25, as described above.
Salakhutdinov teaches -
wherein the action-related arguments comprise respective values for each of plurality of locations in an array having the same number of dimensions as the environment (Salakhutdinov ¶ 0043 teaches the present system and method has been described in the context of general applications to a DRL agent traversing a two-dimensional environment. In order to further illustrate the principles described herein, the present disclosure will now turn to particular implementations of the present system as utilized in the context of a two-dimensional and a three-dimensional maze-based environment wherein a memory is crucial for optimizing the behavior of a DRL agent (that is, the action-related arguments comprise respective values for each of plurality of locations in an array having the same number of dimensions as the environment)).
11.	Claims 17 and 30 are rejected under 35 U.S.C. § 103 as being unpatentable over US Published Application 20180373982 to Salakhutdinov et al. [hereinafter Salakhutdinov] in view of Seo et al., “Bi-Directional Attention Flow for Machine Comprehension,” 24 Feb 2017 [hereinafter Seo], and Choi et al., “Multi-Focus Attention Network for Efficient Deep Reinforcement Learning,” AAAI (2017) [hereinafter Choi], Kataoka et al., “Anticipating Traffic Accidents with Adaptive Loss and Large-Scale Incident DB,” 08 April 2018 [hereinafter Kataoka] and Per-Arne Andersen, “Deep Reinforcement Learning using Capsules in Advanced Game Environments,” University of Agder (Thesis, Jan 2018) [hereinafter Andersen].
Regarding claim 17, the combination of Salakhutdinov, Seo, Choi, and Kataoka teaches all of the limitations of claim 1, as described above.
However, the combination of Salakhutdinov, Seo, Choi and Kataoka does not explicitly teach -
wherein the output network comprises a max pooling layer for combining the respective final features for the plurality of cells.
But Andersen teaches -
wherein the output network comprises a max pooling layer for combining the respective final features for the plurality of cells (Andersen, page 14, 2.2.1 Pooling, second paragraph, teaches [t]here are several ways to perform pooling. Max and Average pooling are considered the most stable methods in whereas Max pooling is most used in state-of-the-art research).
Salakhutdinov, Seo, Choi, Kataoka, and Andersen are from the same or similar field of endeavor. Salakhutdinov teaches an agent comparing the features at the agent's particular position to a summary of the features stored throughout the memory architecture and writes the features that correspond to the summary to the coordinates in the memory architecture that correspond to the agent's position as the agent traverses the environment. Seo teaches machine comprehension responsive to a query and a context based on a bidirectional attention flow mechanism to obtain a query-aware context representation without early summarization. Choi teaches applying, in a reinforcement learning environment, a multi-focus attention network (AM Net) that enhances the agent’s ability to attend to important entities by using multiple parallel attentions. Kataoka teaches a model to gradually learn an earlier anticipation as training progresses where a loss function adaptively assigns penalty weights depending on how early the model can anticipate a traffic accident at each time epoch. Andersen teaches a model-free RL technique for solving difficult game environments, extending Deep Q-learning to combine RL and ANN (Artificial Neural Networks). Thus, it would have been obvious to a person having ordinary skill in the art as of the effective filing date of the Applicant’s invention to modify Salakhutdinov, Seo, Choi and Kataoka pertaining to an interaction network with the image with cells of an environment and further with the max pooling of Andersen.
The motivation for doing so is that pooling reduces the number of parameters to optimize, thus decreasing the computational requirement of the system. (Andersen, page 15, 2.2.1 Pooling, second paragraph).
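For illustration only, the claim 17 language of a max pooling layer that combines the respective final features for the plurality of cells can be sketched as a feature-wise maximum taken across cells. The function name `max_pool_cells` and the example values are hypothetical, and the sketch is not a characterization of the pooling layers described in Andersen.

```python
def max_pool_cells(final_features):
    """Combine per-cell feature vectors into one vector by feature-wise maximum."""
    if not final_features:
        raise ValueError("at least one cell is required")
    # zip(*...) groups the i-th feature of every cell; max keeps the largest.
    return [max(column) for column in zip(*final_features)]

# Four cells, each with a three-component final feature vector (toy values).
cells = [
    [0.1, 0.9, -0.3],
    [0.4, 0.2, 0.0],
    [0.3, 0.5, -0.1],
    [-0.2, 0.7, 0.6],
]
pooled = max_pool_cells(cells)
print(pooled)
```

The pooled vector has the same length regardless of how many cells are supplied, which is why such a layer can feed a fixed-size output network.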
Regarding claim 30, the combination of Salakhutdinov, Seo, Choi, and Kataoka teaches all of the limitations of claim 27, as described above. 
However, the combination of Salakhutdinov, Seo, Choi and Kataoka does not explicitly teach -
wherein selecting the action to be performed comprises processing the respective final features using an output neural network comprising a max pooling layer for combining the respective final features for the plurality of cells.
But Andersen teaches -
wherein selecting the action to be performed comprises processing the respective final features using an output neural network comprising a max pooling layer for combining the respective final features for the plurality of cells. (Andersen, page 14, 2.2.1 Pooling, second paragraph, teaches [t]here are several ways to perform pooling. Max and Average pooling are considered the most stable methods in whereas Max pooling is most used in state-of-the-art research).
Salakhutdinov, Seo, Choi, Kataoka, and Andersen are from the same or similar field of endeavor. Salakhutdinov teaches an agent comparing the features at the agent's particular position to a summary of the features stored throughout the memory architecture and writes the features that correspond to the summary to the coordinates in the memory architecture that correspond to the agent's position as the agent traverses the environment. Seo teaches machine comprehension responsive to a query and a context based on a bidirectional attention flow mechanism to obtain a query-aware context representation without early summarization. Choi teaches applying, in a reinforcement learning environment, a multi-focus attention network (AM Net) that enhances the agent’s ability to attend to important entities by using multiple parallel attentions. Kataoka teaches a model to gradually learn an earlier anticipation as training progresses where a loss function adaptively assigns penalty weights depending on how early the model can anticipate a traffic accident at each time epoch. Andersen teaches a model-free RL technique for solving difficult game environments, extending Deep Q-learning to combine RL and ANN (Artificial Neural Networks). Thus, it would have been obvious to a person having ordinary skill in the art as of the effective filing date of the Applicant’s invention to modify Salakhutdinov, Seo, Choi, and Kataoka pertaining to an interaction network with the image with cells of an environment and further with the max pooling of Andersen.
The motivation for doing so is that pooling reduces the number of parameters to optimize, thus decreasing the computational requirement of the system. (Andersen, page 15, 2.2.1 Pooling, second paragraph).
Response to Arguments
12.	Applicant’s arguments have been fully considered but are moot in view of the new grounds of rejection in view of Applicant’s amendments. 
13.	Applicant argues “Applicant respectfully submits that the cited portions of Salakhutdinov, Seo, and Choi do not disclose or suggest these features of amended claim 1.” In particular, Applicant argues that “the ‘deep convolutional network’ of Salakhutdinov operates on a ‘neural map.’ With respect to the ‘neural map,’ Salakhutdinov explicitly teaches that ‘[t]he present system can include a memory architecture, referred to throughout this disclosure as a 'neural map,' which may serve as the internal memory storage of a DRL agent." Id, emphasis added, at paragraph 0019.” (Response at pp. 12-13).
Examiner points out that Applicant appears to argue limitations not present in the claim (such as a Deep Convolutional Network). In this regard, in response to applicant’s argument that the references fail to show certain features of applicant’s invention, it is noted that the features upon which applicant relies (i.e., a deep convolutional network) are not recited in the rejected claim(s). Although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims. See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993).
Applicant’s claim 1 recites, in the preamble, “[a] computer-implemented neural network system for reinforcement learning . . . .” (Claim 1, lines 1-2). The system is depicted in Fig. 2 of the Specification:

[Fig. 2 of the Specification; image not reproduced]

(Specification ¶ 0032 & Fig. 2). Salakhutdinov teaches, for example, “embodiments of the present invention can be used to improve many different types of machine learning systems, including deep neural networks, in a variety of applications.” (Salakhutdinov ¶ 0049). The BRI of Applicant’s computer-implemented neural network system for reinforcement learning covers the teachings of Salakhutdinov.
Examiner agrees that the cited prior art references of Salakhutdinov, Seo, and Choi do not explicitly teach the amended language of “for each cell in the image that is captured by one or more sensors of the agent or one or more sensors that are located separately from the agent in the environment.” Examiner points to Kataoka as teaching this feature as set out in detail in the rejections hereinabove.
14.	Applicant argues that the prior art reference of Choi “cannot disclose or suggest ‘generating respective salience values for each of the plurality of cells based on using at least the query vector that is generated as output by the at least one query network.’” (Response at p. 15).
Examiner respectfully disagrees. The Specification recites:
For each entity, the attention block may generate h query vectors by inputting the respective entity data to h respective query networks. Likewise, for each entity the attention block may generate h key vectors by inputting the respective entity data to h respective key networks. To generate the salience values for the respective plurality of entities, the head section for a given entity may multiply the query vector for the given entity with the respective key vectors. The result may be normalized by a normalization factor which is a function of the number of components in the query vector and key vector (which is typically the same).
(Specification ¶ 0013 (Emphasis added)). Examiner points to Figure 2 of Choi in the rejections above, in which the “query vector” covers the “selector” of the feature extraction of the Choi model. Choi teaches that with feature extraction, “each agent required different selector vectors. For this purpose, we extract both key features and selector vectors from inputs of each agent.” (Choi, left column of p. 4, “Multi-focus Attention Network - Feature Extraction,” first paragraph). Moreover, any distinction between a “key vector” and a “query vector” appears to be a distinction without a difference in view of Applicant’s specification.
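For illustration only, the salience computation described in the quoted passage of Specification ¶ 0013 can be sketched as follows. This is a minimal sketch, not Applicant's or Choi's implementation. The function names are hypothetical, and the choice of the square root of the vector length as the normalization factor is an assumption: the Specification states only that the factor is a function of the number of components. The optional softmax step is likewise an assumption and is not recited in the quoted passage.

```python
import math


def salience_values(query, keys):
    """Multiply one entity's query vector with each entity's key vector.

    Assumption: the normalization factor is sqrt(number of components).
    """
    scale = math.sqrt(len(query))
    return [sum(q * k for q, k in zip(query, key)) / scale for key in keys]


def softmax(scores):
    """Optional step (assumption): turn raw salience values into weights summing to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical 4-component query for one entity and keys for two entities.
query = [1.0, 0.0, 0.0, 0.0]
keys = [[2.0, 0.0, 0.0, 0.0], [0.0, 2.0, 0.0, 0.0]]
scores = salience_values(query, keys)
weights = softmax(scores)
```

In this sketch the query and key vectors enter the product symmetrically, and the only thing distinguishing them is which network produced them.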
Conclusion
15.	The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
(Vaswani et al., “Attention is All You Need,” NIPS (2017)) teaches a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely.
(Im et al., “Distance-based Self-Attention Network for Natural Language Inference,” 06 Dec 2017) teaches a Distance-based Self-Attention Network, which considers word distance by using a simple distance mask in order to model local dependency without losing the ability to model global dependency that is inherent to attention.
16.	Any inquiry concerning this communication or earlier communications from the Examiner should be directed to KEVIN L. SMITH whose telephone number is (571) 272-5964. Normally, the Examiner is available on Monday-Thursday 0730-1730. 
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the Examiner by telephone are unsuccessful, the Examiner’s supervisor, KAKALI CHAKI, can be reached on 571-272-3719. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/K.L.S./
Examiner, Art Unit 2122

/KAKALI CHAKI/
Supervisory Patent Examiner, Art Unit 2122

1 Salakhutdinov ¶ 0001 recites “The present application claims priority under 35 U.S.C. § 119 (e) to U.S. Provisional Patent Application No. 62/524,183, titled NEURAL MAP, filed 23 June 2017, the disclosure of which is herein incorporated by reference in its entirety.” Appendix A of US Provisional Patent Application 62524183 is Parisotto et al., “Neural Map: Structured Memory for Deep Reinforcement Learning,” Carnegie Mellon University (2017) [hereinafter Appendix A].
2 US Published Application 20190354885 to Li et al., entitled “Reinforcement learning using a relational network for generating data encoding relationships between entities in an environment,” filed 20 May 2016 [hereinafter PGPUB].
3 Salakhutdinov ¶ 0001 recites “The present application claims priority under 35 U.S.C. § 119 (e) to U.S. Provisional Patent Application No. 62/524,183, titled NEURAL MAP, filed 23 June 2017, the disclosure of which is herein incorporated by reference in its entirety.” Appendix A of US Provisional Patent Application 62524183 is Parisotto et al., “Neural Map: Structured Memory for Deep Reinforcement Learning,” Carnegie Mellon University (2017) [hereinafter Appendix A].