DETAILED ACTION
Allowable Subject Matter
1.	Claims 1, 3-4, 7-8, 11-14, 16-17, 20, 23, 26, 29, and 34-35 are allowed.
	Hottinen (WO 2012073059 A1) teaches an apparatus having a processor and a memory storing executable computer program code that causes the apparatus to at least perform operations including analyzing data related to at least one attribute of the apparatus and information received from at least one network device. The computer program code may cause the apparatus to enable utilization of at least one first resource of a network operator to facilitate communications in a first direction. The computer program code may further cause the apparatus to enable utilization of at least one second resource of a different network operator to facilitate communications in a different direction.
	Wu, Qi-hui (CN 103327556 A) teaches a dynamic network selection method for optimizing user QoE in a heterogeneous wireless network, which takes into account the service type the user is currently transmitting and periodically and dynamically updates the access network. The method comprises the following steps: establishing user QoE requirement functions for three service types and initializing the Q-learning variables; using the Q-learning method to apply the network selection policy and execute the switch; and updating the Q-learning variables. The invention starts from the user's point of view, distinguishes different traffic characteristics, and optimizes the user's QoE, providing a dynamic network selection method capable of efficiently using heterogeneous wireless network resources.
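For illustration only, the three steps summarized above (QoE functions and variable initialization, policy-based selection and switching, variable update) can be sketched as generic tabular Q-learning. This is not the method of CN 103327556 A as filed: the service labels, network names, QoE scores, and hyperparameters below are all hypothetical assumptions chosen to make the sketch runnable.

```python
import random

# Hypothetical labels; the reference only says there are three service types.
SERVICES = ["voice", "video", "data"]
NETWORKS = ["wlan", "cellular"]  # hypothetical candidate access networks

# Hypothetical QoE scores in [0, 1]; a real system would measure or model these.
QOE = {
    ("voice", "wlan"): 0.4, ("voice", "cellular"): 0.9,
    ("video", "wlan"): 0.8, ("video", "cellular"): 0.5,
    ("data", "wlan"): 0.9, ("data", "cellular"): 0.6,
}


def train(episodes=5000, alpha=0.1, gamma=0.5, epsilon=0.2, seed=0):
    """Learn Q(service, network) with epsilon-greedy tabular Q-learning."""
    rng = random.Random(seed)
    # Step 1: initialize the Q-learning variables.
    q = {(s, n): 0.0 for s in SERVICES for n in NETWORKS}
    service = rng.choice(SERVICES)
    for _ in range(episodes):
        # Step 2: select a network under the current policy and "switch" to it.
        if rng.random() < epsilon:
            network = rng.choice(NETWORKS)
        else:
            network = max(NETWORKS, key=lambda n: q[(service, n)])
        reward = QOE[(service, network)]
        next_service = rng.choice(SERVICES)  # next traffic type arrives at random
        # Step 3: update the Q-learning variables.
        best_next = max(q[(next_service, n)] for n in NETWORKS)
        td_target = reward + gamma * best_next
        q[(service, network)] += alpha * (td_target - q[(service, network)])
        service = next_service
    return q


def best_network(q, service):
    """Greedy network choice for a service type under the learned Q-table."""
    return max(NETWORKS, key=lambda n: q[(service, n)])
```

Under these assumed scores, the learned table prefers the network with the higher QoE for each service type; the point of the sketch is only to show where the per-service QoE function enters the update.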
The prior art of record fails to anticipate or render obvious the limitations of the claims.
The cited art of record, individually or in combination, fails to explicitly disclose the combination of the following limitations: “acquiring a determination from a first reinforcement learning agent of whether to roam from the first wireless access point to a second wireless access point in a second wireless communications network, the second wireless communications network being operated by a second network operator, wherein the determination is based at least partially on a reward function that is also shared from the first reinforcement learning agent to a second reinforcement learning agent that is associated with a second wireless device; and roaming from the first wireless access point to the second wireless access point, based on the determination.” Furthermore, it would not have been obvious to one of ordinary skill in the art to modify the prior art in order to arrive at the claimed invention.
The cited art of record, individually or in combination, fails to explicitly disclose the combination of the following limitations: “allocating a parameter indicative of a reward according to a first reward function to a first reinforcement learning agent based on an action determined by the first reinforcement learning agent, the action comprising providing an instruction to a first wireless device served by a first wireless access point in a first wireless communications network operated by a first network operator, the instruction instructing the first wireless device to roam from the first wireless access point to a second wireless access point in a second wireless communications network operated by a second network operator; and allocating a parameter indicative of the reward to a second wireless device using the first reward function.” Furthermore, it would not have been obvious to one of ordinary skill in the art to modify the prior art in order to arrive at the claimed invention.
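For illustration only, the distinguishing feature quoted above (a single reward function used by a first reinforcement learning agent and also shared to a second agent associated with a second wireless device) can be sketched as follows. This is a simplified reading, not the claimed implementation: the class name `Agent`, the throughput-based reward, and all values are hypothetical assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# A reward function maps (throughput before, throughput after) to a reward.
RewardFn = Callable[[float, float], float]


def throughput_gain_reward(before_mbps: float, after_mbps: float) -> float:
    """Hypothetical reward: the throughput gained by roaming."""
    return after_mbps - before_mbps


@dataclass
class Agent:
    """Hypothetical reinforcement learning agent tied to one wireless device."""
    device_id: str
    reward_fn: Optional[RewardFn] = None
    total_reward: float = 0.0

    def share_reward_fn(self, other: "Agent") -> None:
        # The same function object is handed to the second agent.
        other.reward_fn = self.reward_fn

    def decide_roam(self, current_mbps: float, candidate_mbps: float) -> bool:
        # Roam to the other operator's access point only if the reward is positive.
        return self.reward_fn(current_mbps, candidate_mbps) > 0

    def allocate(self, before_mbps: float, after_mbps: float) -> float:
        # Allocate a parameter indicative of the reward for the roaming action.
        reward = self.reward_fn(before_mbps, after_mbps)
        self.total_reward += reward
        return reward
```

A possible usage, again under the stated assumptions: create `first = Agent("device-1", throughput_gain_reward)` and `second = Agent("device-2")`, call `first.share_reward_fn(second)`, and both agents then score roaming actions with the same function.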
	Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”
Conclusion
2.	Any inquiry concerning this communication or earlier communications from the examiner should be directed to KHAI MINH NGUYEN whose telephone number is (571)272-7923.  The examiner can normally be reached on 6-3.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.  
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Charles Appiah, can be reached at 571-272-7904.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/KHAI M NGUYEN/
Primary Examiner, Art Unit 2641