DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This Office Action is in response to communications received 02/25/2020.
Claims 1 – 21 are allowed.  
Examiner’s Amendment
An examiner’s amendment to the record appears below.  Should the changes and/or additions be unacceptable to applicant, an amendment may be filed as provided by 37 CFR 1.312.  To ensure consideration of such an amendment, it MUST be submitted no later than the payment of the issue fee. 
Authorization for this examiner’s amendment was given in a communication with Paul Otterstedt on June 29, 2021.

Claims:
1.  (Currently Amended)  A multi-agent cooperation decision-making and training method, comprising the following steps:
S1: encoding, by an encoder, local observations obtained by agents by using a multi-layer perceptron or a convolutional neural network as feature vectors in a receptive field;
S2: calculating, by a graph convolution layer, relationship strength between the agents by using a relationship unit of a multi-headed attention mechanism, integrating, by a relationship convolution kernel of the relationship unit, the feature vectors in the receptive field into new feature vectors, and iterating the graph convolution layer for multiple times to obtain a relationship description of the multi-headed attention mechanism in a larger receptive field and at a higher order;
S3: splicing the feature vectors in the receptive field and the new feature vectors integrated by the graph convolution layer, sending the spliced vectors to a value network, wherein the value network selects and performs an action decision with [[the]]a highest future feedback expectation; and
S4: storing a local observation set and related sets of the agents in a buffer region, collecting samples in the buffer region for training, and optimizing and rewriting a loss function.

3.  (Currently Amended)  The decision-making and training method according to claim 1, wherein in each layer of graph convolution operation, each agent acquires the feature vector in the receptive field through a communication channel;
the feature vectors of all the agents are spliced into one feature matrix $F_t$ in a size of N x L,
wherein N is a total number of the agents in the environment, and L is a length of the feature vector;
an adjacent matrix $C_t^i$ in a size of (K+1) x N is constructed for each agent i, and K is a[[the]] number of the agents in the receptive field, and t is the time;
the first line of the adjacent matrix $C_t^i$ is expressed by a one-hot of an index of the agent i, and the residual jth line is expressed by a one-hot of an index of an agent j in the receptive field; and a feature vector set $C_t^i \times F_t$ in a local region of the agent i is obtained through point multiplication operation.
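
For illustration only, and not as part of the claim language, the construction recited in claim 3 might be sketched in Python with NumPy as follows. The function name `local_feature_set`, the argument names, and the reading of the "point multiplication operation" as an ordinary matrix product are assumptions made for this sketch.

```python
import numpy as np

def local_feature_set(features, agent, neighbors):
    """Build the (K+1) x N one-hot matrix for one agent and select its local features.

    features:  N x L matrix, one feature vector per agent
    agent:     index i of the center agent
    neighbors: indices of the K agents in the receptive field of agent i
    """
    n_agents = features.shape[0]
    rows = [agent] + list(neighbors)               # first row is agent i itself
    adjacency = np.zeros((len(rows), n_agents))
    adjacency[np.arange(len(rows)), rows] = 1.0    # one one-hot index per row
    # (K+1) x N times N x L gives the (K+1) x L local feature set
    return adjacency, adjacency @ features

# Example: N = 4 agents, L = 3 features, agent 2 with neighbors 0 and 3
features = np.arange(12, dtype=float).reshape(4, 3)
adjacency, local = local_feature_set(features, agent=2, neighbors=[0, 3])
```

Under these assumptions the product simply gathers the rows of the feature matrix belonging to agent i and its K neighbors, in that order.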

4.  (Currently Amended)  The decision-making and training method according to claim 3, wherein the relationship strength is expressed by:
$$\alpha_{ij} = \frac{\exp\left(\tau \left[ W_q h_i \cdot (W_k h_j)^{T} \right]\right)}{\sum_{e \in \varepsilon_i} \exp\left(\tau \left[ W_q h_i \cdot (W_k h_e)^{T} \right]\right)}$$
where $\alpha_{ij}$ is a strength relationship between an[[the]] agent i and an[[the]] agent j; $\varepsilon_i$ is the local region of the agent i, and includes k adjacent agents and a center agent; $\tau$ is a scale factor; $h_i$ represents a feature vector of the agent i; similarly, j and e represent the agents; T represents matrix transposition; $W_q$ and $W_k$ are respectively a query vector parameter and a key vector parameter of each attention head to be learnt; q is query; and k is key.
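
For illustration only, and not as part of the claim language, the relationship strength of claim 4 might be computed for one attention head as sketched below. The function name `relation_strength`, the array shapes, and the convention that row 0 of `h_local` holds the center agent are assumptions of this sketch.

```python
import numpy as np

def relation_strength(h_local, w_q, w_k, tau):
    """Softmax attention weights of the center agent over its local region.

    h_local:  (K+1) x L feature vectors, row 0 being the center agent i
    w_q, w_k: D x L query and key parameters of one attention head
    tau:      scale factor
    """
    query = w_q @ h_local[0]           # W_q h_i, length D
    keys = h_local @ w_k.T             # row j holds W_k h_j
    scores = tau * (keys @ query)      # tau * [W_q h_i . (W_k h_j)^T] for each j
    scores = scores - scores.max()     # numerical stability; the ratio is unchanged
    weights = np.exp(scores)
    return weights / weights.sum()     # normalise over the local region

rng = np.random.default_rng(0)
h_local = rng.normal(size=(3, 4))      # K = 2 neighbors, L = 4
w_q = rng.normal(size=(5, 4))          # D = 5
w_k = rng.normal(size=(5, 4))
alpha = relation_strength(h_local, w_q, w_k, tau=0.5)
```

Subtracting the maximum score before exponentiation is a common numerical safeguard and is not recited in the claim; it leaves the resulting weights unchanged.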

5.  (Currently Amended)  The decision-making and training method according to claim 4, wherein the new feature vectors generated by the multi-headed attention mechanism are weighted and averaged according to the relationship strength, and a feature vector $h_i'$ of this layer of graph convolution is obtained through a nonlinear transformation function $\sigma$:
$$h_i' = \sigma\left( \frac{1}{M} \sum_{m=1}^{M} \sum_{j \in \varepsilon_i} \alpha_{ij}^{m} W_v^{m} h_j \right)$$
[[W]]where $W_v$ is a value vector parameter of each attention head to be learnt; v is value, and M is the number of attention heads.
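
For illustration only, and not as part of the claim language, the aggregation of claim 5 might be sketched as follows. The function name `aggregate_heads`, the array shapes, and the choice of ReLU for the nonlinear transformation function are assumptions of this sketch; the claim does not name a particular nonlinearity.

```python
import numpy as np

def aggregate_heads(alpha, w_v, h_local):
    """Average the attention-weighted values over M heads, then apply a nonlinearity.

    alpha:   M x (K+1) relationship strengths, one row per attention head
    w_v:     M x D x L value parameters, one D x L matrix per head
    h_local: (K+1) x L feature vectors in the local region of agent i
    """
    values = np.einsum("mdl,jl->mjd", w_v, h_local)     # W_v^m h_j for every head m, agent j
    weighted = np.einsum("mj,mjd->md", alpha, values)   # sum over j in the local region
    return np.maximum(weighted.mean(axis=0), 0.0)       # mean over heads, then ReLU (assumed)

# Example: M = 2 heads with identity value matrices and uniform attention
h_local = np.array([[1.0, -2.0], [3.0, -4.0]])
alpha = np.full((2, 2), 0.5)
w_v = np.stack([np.eye(2), np.eye(2)])
h_new = aggregate_heads(alpha, w_v, h_local)
```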

6.  (Currently Amended)  The decision-making and training method according to claim 5, wherein the value network generates an expected value of future feedback for each feasible action, and executes an action with a[[the]] highest expected value by a probability of $1 - \epsilon$, or a random action by a probability of $\epsilon$; and $\epsilon$ represents an execution probability, and is more than or equal to 0 and less than or equal to 1.
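
For illustration only, and not as part of the claim language, the action selection of claim 6 might be sketched as follows. The function name `select_action` and the use of a NumPy random generator are assumptions of this sketch.

```python
import numpy as np

def select_action(q_values, epsilon, rng):
    """Return a random action with probability epsilon, else the highest-valued action."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))   # uniform over feasible actions
    return int(np.argmax(q_values))               # highest expected future feedback

rng = np.random.default_rng(0)
q_values = np.array([0.1, 0.7, 0.2])
greedy = select_action(q_values, epsilon=0.0, rng=rng)     # never explores
explored = select_action(q_values, epsilon=1.0, rng=rng)   # always explores
```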

7.  (Currently Amended)  The decision-making and training method according to claim 6, wherein after the value network executes each action, a quintuple $(O, A, O', R, C)$ is stored in the buffer region; $O = \{o_1, o_2, \ldots, o_N\}$ represents a local observation set of the agents at a[[the]] current time step; $A = \{a_1, a_2, \ldots, a_N\}$ represents an action set selected by the agents; $O' = \{o_1', o_2', \ldots, o_N'\}$ represents a local observation set of the agents at a[[the]] next time step; $R = \{r_1, r_2, \ldots, r_N\}$ represents a real-time environment feedback set obtained by the agents; and C represents a local connection structure of the agents.
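
For illustration only, and not as part of the claim language, the buffer region of claim 7 might be sketched as a bounded queue of quintuples. The class name `ReplayBuffer`, the fixed capacity, and uniform sampling without replacement are assumptions of this sketch; the claim does not recite a capacity.

```python
import random
from collections import deque

class ReplayBuffer:
    """Stores (O, A, O', R, C) quintuples and samples them uniformly."""

    def __init__(self, capacity):
        # Oldest quintuples are discarded once the assumed capacity is reached.
        self.storage = deque(maxlen=capacity)

    def store(self, obs, actions, next_obs, rewards, connections):
        self.storage.append((obs, actions, next_obs, rewards, connections))

    def sample(self, batch_size):
        # Uniform sampling without replacement from the stored quintuples.
        return random.sample(list(self.storage), batch_size)

buffer = ReplayBuffer(capacity=2)
for step in range(3):
    buffer.store([step], [0], [step + 1], [1.0], [[0]])
batch = buffer.sample(2)
```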

8.  (Currently Amended)  The decision-making and training method according to claim 7, wherein the training is performed by using time series differential learning of Q-learning; a small set including S samples is randomly sampled from the buffer region at each time, and the loss function is optimized by a back propagation method:
$$L(\theta) = \frac{1}{S} \sum_{S} \frac{1}{N} \sum_{i=1}^{N} \left( y_i - Q(O_i, a_i; \theta) \right)^{2}$$
$$y_i = r_i + \gamma \max_{a_i'} Q(O_i', a_i'; \theta')$$
where $O_i$ represents a local observation set of the agent i in the receptive field; $O_i'$ represents a local observation set of the agent i in the receptive field at the next time step; $a_i'$ represents an action of the agent i at the next time step; $\gamma$ is a discount factor; $\theta$ is a current network parameter; and $\theta'$ is a target network parameter;
the target network parameter is updated by using the following rule:
$$\theta' = \beta \theta + (1 - \beta) \theta'$$
$\beta$ is a soft update superparameter.
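
For illustration only, and not as part of the claim language, the target, loss, and soft update of claim 8 might be sketched as follows. The function names and the array layout (S samples, N agents, one Q-value per feasible action) are assumptions of this sketch, and the Q-values are taken as given rather than produced by a network.

```python
import numpy as np

def td_targets(rewards, next_q, gamma):
    """y_i = r_i + gamma * max over a_i' of Q(O_i', a_i'; theta').

    rewards: S x N rewards; next_q: S x N x A target-network Q-values.
    """
    return rewards + gamma * next_q.max(axis=-1)

def td_loss(q_taken, targets):
    """Mean over S samples and N agents of the squared difference (y_i - Q)."""
    return float(np.mean((targets - q_taken) ** 2))

def soft_update(theta, theta_target, beta):
    """theta' = beta * theta + (1 - beta) * theta'."""
    return beta * theta + (1.0 - beta) * theta_target

rewards = np.array([[1.0, 0.0]])                           # S = 1, N = 2
next_q = np.array([[[0.0, 1.0, 2.0], [3.0, 0.0, 1.0]]])    # A = 3 actions
targets = td_targets(rewards, next_q, gamma=0.5)
loss = td_loss(np.array([[1.0, 1.5]]), targets)
new_target = soft_update(np.array([1.0]), np.array([0.0]), beta=0.1)
```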

9.  (Currently Amended)  The decision-making and training method according to claim 8, wherein a regular term, which is a KL divergence represented by a relationship at a higher order in two continuous steps, is added into the loss function, and the loss function is rewritten as:
            
                L
                
                    
                        θ
                    
                
                =
                
                    
                        1
                    
                    
                        S
                    
                
                
                    
                        ∑
                        
                            S
                        
                    
                    
                        
                            
                                1
                            
                            
                                N
                            
                        
                        
                            
                                ∑
                                
                                    i
                                    =
                                    1
                                
                                
                                    N
                                
                            
                            
                                
                                    
                                        
                                            
                                                
                                                    
                                                        y
                                                    
                                                    
                                                        i
                                                    
                                                
                                                -
                                                Q
                                                
                                                    
                                                        
                                                            
O_i, a_i; \theta))^2 + \lambda D_{KL}(R(O_i; \theta) \,\|\, R(O_i'; \theta))
[[W]]where $D_{KL}(\cdot \,\|\, \cdot)$ is a KL divergence calculation function; and $R(O_i; \theta)$ is an attention parameter distribution represented by a relationship of the agent i on a certain convolution layer.
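For illustration only, and not as part of the claims or the amendment, the regularized loss recited above might be sketched as follows. The function names, the averaging over agents, and the value of `lam` are assumptions made for this sketch; the squared-error term is passed in already computed, because only its general form (a squared difference plus a KL penalty weighted by λ) is taken from the claim.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """D_KL(p || q) for discrete distributions along the last axis."""
    # Clipping avoids log(0); eps is an assumption of this sketch.
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return np.sum(p * (np.log(p) - np.log(q)), axis=-1)

def regularized_loss(squared_errors, attn_now, attn_next, lam=0.03):
    """Mean squared-error term plus lam times the mean KL penalty.

    squared_errors: shape (N,), one squared error per agent (assumed precomputed).
    attn_now:  shape (N, K), playing the role of R(O_i; theta).
    attn_next: shape (N, K), playing the role of R(O_i'; theta).
    lam: hypothetical value for the coefficient lambda.
    """
    kl = kl_divergence(attn_now, attn_next)
    return float(np.mean(squared_errors) + lam * np.mean(kl))

# Example: two agents, each attending over two neighbors.
attn = np.array([[0.2, 0.8], [0.6, 0.4]])
loss_same = regularized_loss(np.array([1.0, 3.0]), attn, attn)
loss_diff = regularized_loss(np.array([1.0, 3.0]), attn, attn[::-1])
```

When the two attention distributions coincide, the KL penalty vanishes and only the squared-error term remains; any difference between them increases the loss.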
Reasons for Allowance
The following is an examiner's statement of reasons for allowance: Independent claim 1, when considered as a whole, is allowable over the prior art of record. Specifically, the prior art of record fails to clearly teach or fairly suggest the combination of the following limitations, as recited in independent claim 1:
S2: calculating, by a graph convolution layer, relationship strength between the agents by using a relationship unit of a multi-headed attention mechanism, integrating, by a relationship convolution kernel of the relationship unit, the feature vectors in the receptive field into new feature vectors, and iterating the graph convolution layer for multiple times to obtain a relationship description of the multi-headed attention mechanism in a larger receptive field and at a higher order;
S3: splicing the feature vectors in the receptive field and the new feature vectors integrated by the graph convolution layer, sending the spliced vectors to a value network, wherein the value network selects and performs an action decision with a highest future feedback expectation.
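For illustration only, and not as a characterization of the claimed implementation, the two limitations quoted above might be sketched as follows. Every name here (`relation_layer`, `select_actions`, the projection matrices `Wq`, `Wk`, `Wv`, `Wout`), the scaled dot-product form of the attention, the ReLU, and the linear value head are assumptions chosen to make the sketch runnable.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)

def relation_layer(h, adj, Wq, Wk, Wv):
    """One graph convolution layer with multi-head attention (assumed form).

    h:   (N, d) feature vectors, one per agent.
    adj: (N, N) mask; adj[i, j] > 0 if agent j is in agent i's receptive field.
    Wq, Wk, Wv: (M, d, dk) hypothetical per-head projections.
    Returns new features of shape (N, M * dk) and attention weights (M, N, N).
    """
    q = np.einsum('nd,mdk->mnk', h, Wq)
    k = np.einsum('nd,mdk->mnk', h, Wk)
    v = np.einsum('nd,mdk->mnk', h, Wv)
    scores = np.einsum('mik,mjk->mij', q, k) / np.sqrt(q.shape[-1])
    scores = np.where(adj[None] > 0, scores, -1e9)  # ignore agents outside the field
    alpha = softmax(scores, axis=-1)                # relationship strength per head
    out = np.einsum('mij,mjk->mik', alpha, v)       # integrate neighbor features
    out = np.transpose(out, (1, 0, 2)).reshape(h.shape[0], -1)  # concatenate heads
    return np.maximum(out, 0.0), alpha

def select_actions(h0, layer_outputs, Wout):
    """Splice encoder and layer features, then pick the highest-valued action."""
    spliced = np.concatenate([h0] + layer_outputs, axis=-1)
    q_values = spliced @ Wout                       # linear value head (assumption)
    return np.argmax(q_values, axis=-1), q_values

# Example: 4 agents on a ring; each sees itself and its two neighbors.
rng = np.random.default_rng(0)
N, d, M, dk, A = 4, 8, 2, 4, 5
h0 = rng.normal(size=(N, d))
adj = np.eye(N) + np.roll(np.eye(N), 1, axis=1) + np.roll(np.eye(N), -1, axis=1)
W1 = [rng.normal(size=(M, d, dk)) for _ in range(3)]
W2 = [rng.normal(size=(M, M * dk, dk)) for _ in range(3)]
h1, alpha1 = relation_layer(h0, adj, *W1)
h2, alpha2 = relation_layer(h1, adj, *W2)  # second pass widens the receptive field
Wout = rng.normal(size=(d + 2 * M * dk, A))
actions, q_values = select_actions(h0, [h1, h2], Wout)
```

In this sketch, stacking two layers lets an agent's features depend on agents two hops away, which is one way to read "a larger receptive field and at a higher order".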
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to CRAIG C DORAIS whose telephone number is (571)270-3371. The examiner can normally be reached M-S 6:00 - 10:00am.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Hyung Sough, can be reached on 571-272-6799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/CRAIG C DORAIS/
Primary Examiner, Art Unit 2194