Examiner’s Amendment
Examiner called Applicant’s attorney, Mr. Risenmay Reed (Registration No. 75,395), on 02 May 2022 regarding a claim amendment concerning the deleted claimed elements in claims 1, 5, 8, 13, 14 that are described in Applicant’s specification. Today, 04 May 2022, Mr. Reed authorized the Examiner’s amendment to said claims as follows:
1. (Currently Amended) A computing device for controlling uplink transmission power of a plurality of terminal devices in a plurality of cells,
wherein each terminal device is configured to determine uplink transmission power based on at least a target received power per physical resource block, PRB, for full pathloss compensation and a pathloss compensation coefficient, the computing device comprising at least one processor; and at least one memory including computer program code, said at least one memory and computer program code being configured, with said at least one processor, to cause the computing device to perform:
maintaining, in a database, information on data traffic in the plurality of cells involving the plurality of terminal devices;
initializing a deep Q-learning network in which 
- a state is defined as a set of pairs of the target received power per PRB for full pathloss compensation and the pathloss compensation coefficient, wherein each pair corresponds to one of the plurality of cells, 
- an action in a given state is defined as a selection of valid values of the target received power per PRB for full pathloss compensation and the pathloss compensation coefficient for a pair currently having invalid values and 
- a reward of taking an action is calculated based on the information on the data traffic in the plurality of cells so as to optimize overall uplink performance over all of the plurality of cells;
training the deep Q-learning network with a plurality of random states and a plurality of random actions to approximate a Q value function, wherein each random state comprises initially a pre-defined number of pairs of the target received power per PRB for full pathloss compensation and the pathloss compensation coefficient having random valid values with a rest of the pairs in each random state having invalid values, said rest of the pairs comprising at least one pair;
determining, for each cell, an optimal target received power per PRB for full pathloss compensation and an optimal pathloss compensation coefficient based on the Q value function; and
causing transmitting optimized values of the target received power per PRB for full pathloss compensation and the pathloss compensation coefficient to the plurality of access nodes for further transmission to the plurality of terminal devices.  
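By way of illustration only, the state and action definitions recited in claim 1 may be sketched in Python as follows. The value grids, the helper names (`possible_actions`, `apply_action`, `random_state`) and the convention of filling the first invalid pair are assumptions made for this sketch and are not recited in the claim; the invalid values (-∞ dBm and null) follow claim 8.

```python
import math
import random

# Illustrative grids of valid values; ranges and spacings are assumptions.
P0_VALUES = [-110.0, -100.0, -90.0]   # target received power per PRB, dBm
ALPHA_VALUES = [0.6, 0.8, 1.0]        # pathloss compensation coefficient

INVALID_PAIR = (-math.inf, None)      # invalid values as defined in claim 8


def is_valid(pair):
    """A pair is valid once values have been selected for it."""
    return pair != INVALID_PAIR


def possible_actions(state):
    """Valid (P0, alpha) selections for a pair currently holding invalid values.

    Returns an empty list when every pair in the state is already valid.
    """
    if all(is_valid(p) for p in state):
        return []
    return [(p0, a) for p0 in P0_VALUES for a in ALPHA_VALUES]


def apply_action(state, action):
    """Fill the first invalid pair with the selected values (ordering is an assumption)."""
    idx = next(i for i, p in enumerate(state) if not is_valid(p))
    return state[:idx] + (action,) + state[idx + 1:]


def random_state(num_cells, num_valid, rng):
    """Random state: `num_valid` pairs with random valid values, the rest invalid."""
    if not 0 <= num_valid < num_cells:
        raise ValueError("at least one pair must keep invalid values")
    valid = tuple(
        (rng.choice(P0_VALUES), rng.choice(ALPHA_VALUES)) for _ in range(num_valid)
    )
    return valid + (INVALID_PAIR,) * (num_cells - num_valid)
```

In this sketch a state is a tuple with one pair per cell, so a four-cell state with two valid pairs admits nine candidate actions, one per grid combination.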
2. (Previously Presented) The computing device of claim 1, wherein the information on data traffic may comprise, for each cell of the plurality of cells, information on one or more of the following: traffic density, user distribution, configuration of terminal devices within said cell, channel characteristics, intra-cell interference and inter-cell interference.  
3. (Previously Presented) The computing device of claim 1, wherein said at least one memory and computer program code are further configured, with said at least one processor, to cause the computing device to perform: receiving further information on data traffic in the plurality of cells involving the plurality of terminal devices; storing the further information on data traffic to the database to supplement or update the information on data traffic maintained in the database; and re-optimizing the deep Q-learning network to take into account the further information by repeating the training, the determining and the causing transmitting.  
4. (Previously Presented) The computing device according to claim 1, wherein the Q value function for a state and an action in the deep Q-learning network is defined as a sum of the reward for said state and action and a maximum cumulative reward of all states and actions following from said state when said action is performed.  
5. (Currently Amended) The computing device according to claim 1, wherein the training of the deep Q-learning network comprises:
a) generating a set of random states and a set of possible random actions in each random state in the set of random states;
b) calculating, using the set of random states and the sets of possible actions as input, a target Q value function as 1-step iterations of the Bellman equation  
$$Q(s_n, a^k_{s_n}) = r(s_n, a^k_{s_n}) + \max_{a^k_{s_{n+1}} \in A_{s_{n+1}}} Q(s_{n+1}, a^k_{s_{n+1}})$$
wherein $Q(s_n, a^k_{s_n})$ is the target Q value function, $s_{n+1}$ is a state following an initial state $s_n$ when an action $a^k_{s_n}$ is taken, $a^k_{s_{n+1}}$ is an action performed in the state $s_{n+1}$, $r(s_n, a^k_{s_n})$ is a reward from taking the action $a^k_{s_n}$ in the state $s_n$, $A_{s_{n+1}}$ is an action space comprising all possible actions $a^k_{s_{n+1}}$, and $\max_{a^k_{s_{n+1}} \in A_{s_{n+1}}} Q(s_{n+1}, a^k_{s_{n+1}})$ is calculated by using a deep neural network to evaluate all possible Q values
[Equation image: media_image2.png]
c) feeding the set of random states and the sets of possible random actions to the deep neural network to produce an approximate Q value function Q (s, a), wherein s is an initial state and a is an action performed in the initial state;
d) evaluating a mean squared error between the approximate Q value function and the target Q value function;
e) updating weights of the deep neural network to minimize the mean squared error;
f) repeating steps c) and e) with the same set of random states and the same sets of possible random actions until a pre-defined number of repetitions is reached; and
g) repeating steps a) to f) with a different set of random states and corresponding sets of possible random actions generated each time until the mean squared error between the approximate Q value function and the target Q value function is detected to converge.
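For illustration only, steps a) to g) of claim 5 may be sketched as the following training loop. This is a toy under stated assumptions, not the claimed implementation: a linear function approximator stands in for the deep neural network, `toy_reward` is a hypothetical reward rather than one derived from data-traffic information, the grids and the cell count are invented, and the number of valid pairs per random state is drawn at random.

```python
import math
import random

# Illustrative grids and cell count (assumptions, not taken from the claims).
P0_VALUES = [-100.0, -90.0]           # dBm
ALPHA_VALUES = [0.8, 1.0]
ACTIONS = [(p0, a) for p0 in P0_VALUES for a in ALPHA_VALUES]
INVALID = (-math.inf, None)           # invalid pair, per claim 8
NUM_CELLS = 3


def encode_pair(pair):
    """Map a pair to [valid flag, scaled P0, scaled alpha]; invalid pairs map to zeros."""
    p0, alpha = pair
    if alpha is None:
        return [0.0, 0.0, 0.0]
    return [1.0, (p0 + 95.0) / 5.0, (alpha - 0.9) / 0.1]


def features(state, action):
    x = [1.0]                         # bias term
    for pair in state + (action,):
        x += encode_pair(pair)
    return x


def q_value(w, state, action):
    """Linear stand-in for the deep neural network (assumption of this sketch)."""
    return sum(wi * xi for wi, xi in zip(w, features(state, action)))


def next_state(state, action):
    idx = state.index(INVALID)        # first pair still holding invalid values
    return state[:idx] + (action,) + state[idx + 1:]


def toy_reward(state, action):
    """Hypothetical reward; the claimed reward is based on data-traffic information."""
    p0, alpha = action
    return -abs(p0 + 90.0) / 10.0 - abs(alpha - 0.8)


def bellman_target(w, state, action):
    nxt = next_state(state, action)
    if INVALID not in nxt:            # all pairs set: no following action exists
        return toy_reward(state, action)
    return toy_reward(state, action) + max(q_value(w, nxt, a) for a in ACTIONS)


def random_batch(rng, size):
    batch = []
    for _ in range(size):
        n_valid = rng.randrange(NUM_CELLS)    # at least one pair stays invalid
        state = tuple(rng.choice(ACTIONS) for _ in range(n_valid))
        state += (INVALID,) * (NUM_CELLS - n_valid)
        batch.append((state, rng.choice(ACTIONS)))
    return batch


def train(seed=0, max_rounds=100, reps=20, batch_size=32, lr=0.1, tol=1e-3):
    rng = random.Random(seed)
    w = [0.0] * (1 + 3 * (NUM_CELLS + 1))
    mse = math.inf
    for _ in range(max_rounds):                                   # step g)
        batch = random_batch(rng, batch_size)                     # step a)
        targets = [bellman_target(w, s, a) for s, a in batch]     # step b)
        for _ in range(reps):                                     # step f)
            grad = [0.0] * len(w)
            mse = 0.0
            for (s, a), target in zip(batch, targets):
                x = features(s, a)
                err = sum(wi * xi for wi, xi in zip(w, x)) - target   # step c)
                mse += err * err / batch_size                     # step d)
                for i, xi in enumerate(x):
                    grad[i] += 2.0 * err * xi / batch_size
            w = [wi - lr * g for wi, g in zip(w, grad)]           # step e)
        if mse < tol:                                             # crude convergence check
            break
    return w, mse
```

Calling `train()` returns the fitted weights and the last mean squared error. The targets are held fixed while the weights are updated for `reps` repetitions, which mirrors the separation between step b) and steps c) to e).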
6. (Original) The computing device of claim 5, wherein the calculating of the 1-step iterations of the Bellman equation comprises:
calculating the reward $r(s_n, a^k_{s_n})$, using a lookup table maintained in the database or using online calculation;
calculating $\max_{a^k_{s_{n+1}} \in A_{s_{n+1}}} Q(s_{n+1}, a^k_{s_{n+1}})$ by generating, for each combination of an initial state $s_n$ and an action $a^k_{s_n}$ performed in the initial state, a following state $s_{n+1}$ and all allowed actions $a^k_{s_{n+1}} \in A_{s_{n+1}}$ in said following state $s_{n+1}$, feeding said following state and all of said allowed actions to the neural network and taking a maximum of Q values produced as outputs of the neural network; and
calculating a sum of $r(s_n, a^k_{s_n})$ and $\max_{a^k_{s_{n+1}} \in A_{s_{n+1}}} Q(s_{n+1}, a^k_{s_{n+1}})$.
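As a hedged sketch of the calculation recited in claim 6, the function below looks up the reward, generates the following state, evaluates every allowed action in that state and returns the sum. The names `reward_table`, `stub_q_net` and `allowed`, together with all numeric values, are hypothetical stand-ins for the database lookup table and the neural network.

```python
import math

INVALID = (-math.inf, None)   # invalid pair, per claim 8


def bellman_target(state, action, reward_table, q_net, allowed_actions):
    """Sum of the looked-up reward and the maximum Q value in the following state."""
    reward = reward_table[(state, action)]            # lookup-table variant
    idx = state.index(INVALID)                        # pair the action fills (assumed order)
    following = state[:idx] + (action,) + state[idx + 1:]
    candidates = allowed_actions(following)
    if not candidates:                                # no pair left to set
        return reward
    return reward + max(q_net(following, a) for a in candidates)


# Usage with hypothetical stand-ins for the database table and the network.
ACTIONS = [(-100.0, 0.8), (-90.0, 1.0)]


def stub_q_net(state, action):
    """Fake network: returns the action's alpha as its Q value."""
    return action[1]


def allowed(state):
    return ACTIONS if INVALID in state else []


state = ((-100.0, 0.8), INVALID, INVALID)
action = (-90.0, 1.0)
reward_table = {(state, action): 0.5}
target = bellman_target(state, action, reward_table, stub_q_net, allowed)
```

With these stand-ins the following state still has one invalid pair, so the target is the reward of 0.5 plus the largest stub Q value of 1.0.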
7. (Previously Presented) The computing device according to claim 1, wherein the determining of the optimal target received power per PRB for full pathloss compensation and the optimal pathloss compensation coefficient comprises:
1) generating a zero state in which the target received power per PRB for full pathloss compensation and the pathloss compensation coefficient for all of the plurality of cells have invalid values, wherein the zero state is defined as a current state;
2) inputting the current state along with all possible actions in said current state into the deep Q-learning network to produce as an output a plurality of Q values for the current state;
3) finding, from the plurality of Q values for the current state, an optimal action which is an action which when taken in the current state leads to a maximum Q value of the plurality of Q values and associated optimal values for the target received power per PRB for full pathloss compensation and the pathloss compensation coefficient;
4) setting said optimal values for the next cell in sequence for which valid values have not yet been defined;
5) setting the state following the optimal action taken in the current state as the current state; and
6) repeating steps 2) to 5) until optimal values for all of the plurality of cells have been determined.  
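Steps 1) to 6) of claim 7 may be sketched, for illustration only, as a greedy pass over the cells. The callable `q_net` stands for the trained deep Q-learning network; the stub used in the usage lines and the three candidate actions are hypothetical.

```python
import math

INVALID = (-math.inf, None)   # invalid pair, per claim 8


def greedy_configuration(q_net, actions, num_cells):
    """Set one cell at a time to the action with the largest Q value."""
    state = (INVALID,) * num_cells                        # step 1: zero state
    for cell in range(num_cells):                         # step 6: repeat per cell
        q_values = [q_net(state, a) for a in actions]     # step 2
        best = actions[q_values.index(max(q_values))]     # step 3
        state = state[:cell] + (best,) + state[cell + 1:]  # steps 4 and 5
    return state


# Usage with a hypothetical stub whose preferred P0 drops by 5 dB
# for every cell that has already been configured.
def stub_q_net(state, action):
    n_valid = sum(1 for pair in state if pair != INVALID)
    return -abs(action[0] - (-90.0 - 5.0 * n_valid))


ACTIONS = [(-100.0, 0.8), (-95.0, 0.8), (-90.0, 0.8)]
result = greedy_configuration(stub_q_net, ACTIONS, 3)
```

Because each selection is fed back as part of the next current state, the values chosen for later cells can depend on those already set for earlier cells.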
8. (Currently Amended) The computing device according to claim 1, wherein valid values of the target received power per PRB for full pathloss compensation and the pathloss compensation coefficient are limited to a pre-defined range of values with a pre-defined spacing defined separately for the target received power per PRB for full pathloss compensation and the pathloss compensation coefficient, an invalid value of the target received power per PRB for full pathloss compensation is defined as -∞ dBm and an invalid value of the pathloss compensation coefficient is defined as null.  
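The value sets of claim 8 may be illustrated with the following minimal sketch. The ranges and spacings shown are assumptions chosen for the example; the claim requires only that they be pre-defined, separately for each of the two parameters.

```python
import math


def value_grid(low, high, spacing):
    """Evenly spaced valid values from `low` to `high` inclusive."""
    count = int(round((high - low) / spacing)) + 1
    return [low + i * spacing for i in range(count)]


# Separate range and spacing for each parameter (illustrative numbers).
P0_GRID = value_grid(-110.0, -80.0, 5.0)    # dBm
ALPHA_GRID = value_grid(0.4, 1.0, 0.1)

INVALID_P0 = -math.inf      # invalid target received power, in dBm
INVALID_ALPHA = None        # invalid pathloss compensation coefficient (null)


def is_valid_pair(p0, alpha, tol=1e-9):
    """True only if both values lie on their respective grids."""
    if p0 == INVALID_P0 or alpha is INVALID_ALPHA:
        return False
    on_p0_grid = any(abs(p0 - v) < tol for v in P0_GRID)
    on_alpha_grid = any(abs(alpha - v) < tol for v in ALPHA_GRID)
    return on_p0_grid and on_alpha_grid
```

A tolerance is used in the membership check because grid points built from a decimal spacing such as 0.1 are not always exactly representable in floating point.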
9. (Previously Presented) The computing device according to claim 1, wherein the reward is defined as a difference of a sum-utility of all cells in the plurality of cells for which valid values have been set including the new cell and sum-utility of all cells in the plurality of cells for which valid values have been set excluding the new cell, the sum-utility being defined as a geometric mean or a sum of terminal device throughputs of terminal devices in one or more cells for which valid values have been set or as a sum of signal to interference and noise ratios of terminal devices in one or more cells for which valid values have been set.  
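The reward of claim 9 may be sketched, under stated assumptions, as a marginal sum-utility. The callable `throughput_fn`, which returns terminal device throughputs for a given list of configured cells, and the stub interference model are hypothetical; treating the sum-utility of an empty set of cells as zero is also an assumption of this sketch.

```python
import math


def sum_utility(throughputs, kind="sum"):
    """Sum or geometric mean of terminal device throughputs."""
    if not throughputs:
        return 0.0                     # assumption: no configured cells, zero utility
    if kind == "sum":
        return sum(throughputs)
    if kind == "geometric_mean":
        return math.exp(sum(math.log(t) for t in throughputs) / len(throughputs))
    raise ValueError(f"unknown sum-utility kind: {kind}")


def marginal_reward(throughput_fn, configured_cells, new_cell, kind="sum"):
    """Sum-utility including the new cell minus sum-utility excluding it."""
    with_new = throughput_fn(configured_cells + [new_cell])
    without_new = throughput_fn(configured_cells)
    return sum_utility(with_new, kind) - sum_utility(without_new, kind)


# Hypothetical stub: two terminals per cell, each losing one unit of
# throughput for every other configured cell (a crude interference model).
def stub_throughputs(cells):
    return [10.0 - (len(cells) - 1)] * (2 * len(cells))


reward = marginal_reward(stub_throughputs, [0, 1], 2)
```

In the stub, adding a third cell brings two new terminals but lowers every terminal's throughput from 9 to 8, so the sum-based reward is 48 - 36 = 12, whereas the geometric-mean variant would be negative.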
10. (Previously Presented) The computing device according to claim 9, wherein the sum-utility is calculated as:

[Equation image: media_image3.png]
wherein
[Equation image: media_image4.png]
$P_j(x, y, z)$ is a traffic density of a given $(x, y, z)$ coordinate point for a cell $j$, $\sigma^2$ is the thermal noise variance, $G_i(x, y, z)$ is the channel gain to a presumed terminal device at the point $(x, y, z)$ in the cell $i$ to the serving cell $i$, $P_i(x, y, z)$ is the transmit power of a terminal device at the point $(x, y, z)$ served by the cell $i$ dependent on values of the target received power per PRB for full pathloss compensation $P_{0,i}$ and the pathloss compensation coefficient $\alpha_i$ for the cell $i$, and $-\infty$ and null are, respectively, invalid values for the target received power per PRB for full pathloss compensation and the pathloss compensation coefficient.
 11. (Previously Presented) The computing device according to claim 1, wherein each terminal device of the plurality of terminal devices is configured to determine uplink transmission power as a minimum of a maximum transmission power configured for said terminal device in decibels and a sum of two or more terms in decibels, the two or more terms comprising at least the target received power per PRB for full pathloss compensation in decibels and the pathloss compensation coefficient multiplied by a downlink pathloss calculated by said terminal device in decibels.  
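The power determination of claim 11 may be illustrated as follows. The function name, the parameter names and the numeric values are assumptions made for the example; `other_terms_db` is a hypothetical catch-all for any further terms of the sum beyond the two recited.

```python
def uplink_tx_power_dbm(p_max_dbm, p0_dbm, alpha, pathloss_db, other_terms_db=0.0):
    """Minimum of the configured maximum power and the open-loop sum, in dB units."""
    open_loop = p0_dbm + alpha * pathloss_db + other_terms_db
    return min(p_max_dbm, open_loop)


# Usage with illustrative numbers: a 23 dBm cap, P0 = -90 dBm, alpha = 0.8.
near = uplink_tx_power_dbm(23.0, -90.0, 0.8, 120.0)   # -90 + 96 = 6 dBm
far = uplink_tx_power_dbm(23.0, -90.0, 0.8, 150.0)    # -90 + 120 = 30 dBm, capped at 23
```

With a pathloss compensation coefficient below 1, only part of the downlink pathloss is compensated, so the terminal device with the larger pathloss is received at a lower power unless it reaches the cap.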
12. (Previously Presented) The computing device according to claim 1, wherein the computing device is a network element for a core network.  
13. (Currently Amended)  A method for controlling uplink transmission power of a plurality of terminal devices in a plurality of cells, wherein each terminal device is configured to determine uplink transmission power based on at least a target received power per physical resource block, PRB, for full pathloss compensation and a pathloss compensation coefficient, the method comprising:
maintaining, in a database, information on data traffic in the plurality of cells involving the plurality of terminal devices;
initializing a deep Q-learning network in which
- a state is defined as a set of pairs of the target received power per PRB for full pathloss compensation and the pathloss compensation coefficient, wherein each pair corresponds to one of the plurality of cells,
- an action in a given state is defined as a selection of valid values of the target received power per PRB for full pathloss compensation and the pathloss compensation coefficient for a pair currently having invalid values and 
- a reward of taking an action is calculated based on the information on the data traffic in the plurality of cells so as to optimize overall uplink performance over all of the plurality of cells;
training the deep Q-learning network with a plurality of random states and a plurality of random actions to approximate a Q value function, wherein each random state comprises initially a pre-defined number of pairs of the target received power per PRB for full pathloss compensation and the pathloss compensation coefficient having random valid values with a rest of the pairs in each random state having invalid values, said rest of the pairs comprising at least one pair;
determining, for each cell, an optimal target received power per PRB for full pathloss compensation and an optimal pathloss compensation coefficient based on the Q value function; and
causing transmitting optimized values of the target received power per PRB for full pathloss compensation and the pathloss compensation coefficient to the plurality of access nodes for further transmission to the plurality of terminal devices.  
14. (Currently Amended) A non-transitory computer readable medium stores instructions, which when executed by at least one processor, causes a computing device including the processor to perform at least the following:
initializing a deep Q-learning network in which 
- a state is defined as a set of pairs of a target received power per physical resource block, PRB, for full pathloss compensation and a pathloss compensation coefficient, wherein each pair corresponds to one of a plurality of cells,
- an action in a given state is defined as a selection of valid values of the target received power per PRB for full pathloss compensation and the pathloss compensation coefficient for a pair currently having invalid values and 
- a reward of taking an action is calculated based on information on data traffic in the plurality of cells by a plurality of terminal devices so as to optimize overall uplink performance over all of the plurality of cells, the information on data traffic being maintained in a database; 
training the deep Q-learning network with a plurality of random states and a plurality of random actions to approximate a Q value function, wherein each random state comprises initially a pre-defined number of pairs of the target received power per PRB for full pathloss compensation and the pathloss compensation coefficient having random valid values with a rest of the pairs in each random state having invalid values, said rest of the pairs comprising at least one pair; 
determining, for each cell, an optimal target received power per PRB for full pathloss compensation and an optimal pathloss compensation coefficient based on the Q value function; and 
causing transmitting optimized values of the target received power per PRB for full pathloss compensation and the pathloss compensation coefficient to the plurality of access nodes for further transmission to a plurality of terminal devices, wherein each terminal device is configured to determine uplink transmission power based on at least the target received power per PRB for full pathloss compensation and the pathloss compensation coefficient.  
---------------------------------





Reasons for Allowance
The following is an examiner’s statement of reasons for allowance:
The primary reasons for allowance of the claims are Applicant’s remarks on pages 8-11, received on 15 February 2022, and the fact that the prior art of record does not disclose said underlined amended claimed elements of
“initializing a deep Q-learning network in which 
- a state is defined as a set of pairs of a target received power per physical resource block, PRB, for full pathloss compensation and a pathloss compensation coefficient, wherein each pair corresponds to one of a plurality of cells,
- an action in a given state is defined as a selection of valid values of the target received power per PRB for full pathloss compensation and the pathloss compensation coefficient for a pair currently having invalid values and 
- a reward of taking an action is calculated based on information on data traffic in the plurality of cells by a plurality of terminal devices so as to optimize overall uplink performance over all of the plurality of cells, the information on data traffic being maintained in a database; 
training the deep Q-learning network with a plurality of random states and a plurality of random actions to approximate a Q value function, wherein each random state comprises initially a pre-defined number of pairs of the target received power per PRB for full pathloss compensation and the pathloss compensation coefficient having random valid values with a rest of the pairs in each random state having invalid values, said rest of the pairs comprising at least one pair” in independent claims 1, 13, 14.
CALBRESE et al. WO 2018/068857 A1 does not disclose said bold elements in independent claims 1, 13, 14.
LEE et al. US 2016/0135128 A1 does not disclose said bold elements in independent claims 1, 13, 14. 
Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”
Claims 1-14 are allowed.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to HAI V NGUYEN whose telephone number is (571)272-3901. The examiner can normally be reached M-F 8:00AM -5:00PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kevin Pan, can be reached on 571-272-7855. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center/ for more information about Patent Center and https://www.uspto.gov/patents/docx/ for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/HAI V NGUYEN/Primary Examiner, Art Unit 2649