DETAILED ACTION
Claim Rejections - 35 USC § 102
1.	The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claims 1-4, 7-8, 11, 14-16, 20, 23, 26, 29, and 33-34 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Hottinen Ari (WO 2012073059 A1).
Considering claims 1, 11, 23, and 26, Hottinen teaches a method performed by a first wireless device, the first wireless device being served by a first wireless access point in a first wireless communications network, the first wireless communications network being operated by a first network operator, a method performed by a node of a wireless communication network, the method comprising:
a processor and a memory (processor and memory, Fig.1-6, abstract), said memory containing instructions executable by said processor whereby said first wireless device is operative to
acquiring a determination from a first reinforcement learning agent (reinforcement learning algorithm) of whether to roam from the first wireless access point to a second wireless access point in a second wireless communications network, the second wireless communications network being operated by a second network operator (pg.21, lines 9-13: processor 82 of the UE 70 may receive the data corresponding to the channels from network devices (e.g., eNB 72, eNB 73) of the network operators; pg.25, lines 28-34: a reinforcement learning algorithm implemented
roaming from the first wireless access point to the second wireless access point, based on the determination (pg.17, lines 28-34; pg.18, lines 31-33: UE 70 may notify the network devices of the operators to make the decision for the UE 70 regarding the best way to optimize connectivity. In this regard, an eNB (e.g., eNB 72, eNB 73) or an AP (e.g., AP 62, AP 64) of an operator may evaluate the information regarding its own resources (e.g., different types of traffic/services being provided, congestion, etc.) as well as analyzing the resources of other operators to determine the best usage of the uplink and downlink channels).
Considering claim 2, Hottinen teaches wherein the first reinforcement learning agent shares a reward function with a second reinforcement learning agent, the second reinforcement learning agent being associated with a second wireless device (pg.26, lines 3-14).  
Considering claim 3, Hottinen teaches wherein the first wireless device and the second wireless device form part of a group of wireless devices and the shared reward function is shared between the wireless devices in the group of wireless devices (pg.26, lines 3-14).  
Considering claim 4, Hottinen teaches wherein the devices in the group of wireless devices have at least one common connection parameter (pg.26, lines 3-14).  
Considering claim 7, Hottinen teaches wherein the first reinforcement learning agent receives a parameter indicative of a positive reward when one or more of: the second wireless access point is in a home network associated with the first wireless device; and/or roaming to the second wireless access point from the first wireless access point improves connectivity of the first wireless device (pg.25, line 16 to pg.26, line 15).  

Considering claims 14 and 29, Hottinen teaches a method performed by a node of a wireless communications network, the method comprising:
a processor and a memory (processor and memory, Fig.1-6, abstract), said memory containing instructions executable by said processor whereby said first wireless device is operative to
allocating a parameter indicative of a reward to a first reinforcement learning agent based on an action determined by the first reinforcement learning agent, the action comprising providing an instruction to a first wireless device served by a first wireless access point in a first wireless communications network operated by a first network operator (pg.21, lines 9-13 processor 82 of the UE 70 may receive the data corresponding to the channels from network devices (e.g., eNB 72, eNB 73) of the network operators, pg.24, line 29 to pg.25,  line 34 the UE 70 may assign a reward (e.g., a positive value) to the action resulting in the decision. In this regard, when the UE 70 subsequently encounters a similar situation with respect to the corresponding network operator and/or channel, the UE 70 may analyze the data (e.g., positive value) associated with the reward and may determine that it is acceptable to make a similar decision… a reinforcement learning algorithm implemented (e.g., by the processor 82 of UE 70, 
Attorney Docket: 3602-2015US1
Considering claim 15, Hottinen teaches allocating the parameter indicative of a reward according to a first reward function (pg.25, line 16 to pg.26, line 15).
Considering claim 16, Hottinen teaches wherein the first wireless device is part of a first group of wireless devices and wherein the method further comprises: allocating a parameter indicative of a reward to another wireless device in the first group of wireless devices using the first reward function (pg.24, line 28 to pg.26, line 15).  
Considering claim 20, Hottinen teaches allocating a parameter indicative of a reward to a third reinforcement learning agent based on an action determined by the third reinforcement learning agent for a third wireless device, wherein the third wireless device is part of a second group of wireless devices and wherein the step of allocating a parameter indicative of a reward to the from the first reward function (pg.25, line 16 to pg.26, line 15).
Considering claim 33, Hottinen teaches wherein the first reinforcement learning agent shares a reward function with a second reinforcement learning agent, the second reinforcement learning agent being associated with a second wireless device (pg.25, line 16 to pg.26, line 15).  
Considering claim 34, Hottinen teaches wherein the first reinforcement learning agent receives a parameter indicative of a negative reward (negative value) when one or more of: the first wireless device roams to the second wireless access point in the second network; roaming to the second wireless access point decreases the connectivity of the first wireless device; roaming leads to a loss of connectivity of the first wireless device; and when an inter-network operator handover procedure is performed (pg.24, line 29 to pg.26, line 15).  
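For illustration only, the reward conditions recited in claims 7 and 34 can be sketched as a signed reward function. The sketch below is not taken from Hottinen or from the application; the function name, its arguments, and the unit reward magnitudes are assumptions chosen for the example.

```python
def roaming_reward(connectivity_before, connectivity_after,
                   inter_operator_handover, in_home_network=False):
    """Hypothetical reward for one roaming decision.

    Connectivity values are assumed to be non-negative scores,
    with 0.0 meaning the device has no connectivity at all.
    """
    reward = 0.0
    # Positive contributions (conditions of the kind recited in claim 7).
    if in_home_network:
        reward += 1.0
    if connectivity_after > connectivity_before:
        reward += 1.0
    # Negative contributions (conditions of the kind recited in claim 34).
    if connectivity_after < connectivity_before:
        reward -= 1.0
    if connectivity_before > 0.0 and connectivity_after <= 0.0:
        reward -= 1.0  # connectivity was lost entirely
    if inter_operator_handover:
        reward -= 1.0  # handover between network operators was performed
    return reward


# Roaming to another operator and losing the connection is penalised.
penalised = roaming_reward(0.8, 0.0, inter_operator_handover=True)
# Returning to the home network with better connectivity is rewarded.
rewarded = roaming_reward(0.5, 0.9, inter_operator_handover=False,
                          in_home_network=True)
```

Under these assumed magnitudes, the first call yields a negative value and the second a positive value, which corresponds to the "negative value" and "positive value" language quoted from Hottinen above.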
Claim Rejections - 35 USC § 103
2.	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 12-13, 17, and 35 are rejected under 35 U.S.C. 103 as being unpatentable over Hottinen Ari (WO 2012073059 A1) in view of Wu Qui-hui (CN 103327556 A).
Considering claims 12, 35, Hottinen teaches wherein the first reinforcement learning agent implements a learning method (pg.24, lines 8-21). 

Wu teaches the Q-learning method (Abstract, [0008]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Hottinen with the teachings of Wu to provide a dynamic network selection method for optimizing user QoE in a heterogeneous wireless network, combining the service type and user of transmitting current access network, access network dynamically updating period.
Considering claim 13, Hottinen and Wu further teach wherein the first reinforcement learning agent is associated with a plurality of wireless devices and wherein the method further comprises the first reinforcement learning agent updating a central Q-table based on actions performed by the plurality of wireless devices (Hottinen: pg.24, lines 8-21, Wu: abstract, [0008]).
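For illustration only, a central Q-table updated from the actions of several wireless devices, as recited in claim 13, can be sketched with the standard tabular Q-learning update Q(s,a) ← Q(s,a) + α(r + γ·max Q(s',a') − Q(s,a)). The sketch is not taken from Hottinen, Wu, or the application; the class name, state labels, action names, and the values of α and γ are assumptions chosen for the example.

```python
from collections import defaultdict

# Hypothetical action set for a roaming decision.
ACTIONS = ("stay", "roam")


class CentralQTable:
    """One Q-table shared by every device that reports to it."""

    def __init__(self, alpha=0.5, gamma=0.9):
        self.alpha = alpha            # learning rate (assumed value)
        self.gamma = gamma            # discount factor (assumed value)
        self.q = defaultdict(float)   # unseen (state, action) pairs start at 0

    def update(self, state, action, reward, next_state):
        # Standard tabular Q-learning update.
        best_next = max(self.q[(next_state, a)] for a in ACTIONS)
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])

    def best_action(self, state):
        # Greedy choice over the shared table.
        return max(ACTIONS, key=lambda a: self.q[(state, a)])


table = CentralQTable()
# Two hypothetical devices report the same kind of experience.
table.update("weak_signal", "roam", 1.0, "strong_signal")  # device A
table.update("weak_signal", "roam", 1.0, "strong_signal")  # device B
```

Because both devices write into the same table, the experience of device A influences the action later selected for device B in the same state.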
Considering claim 17, Hottinen does not clearly teach wherein the wireless devices in the first group of wireless devices have at least one common connection parameter.
Wu teaches wherein the wireless devices in the first group of wireless devices have at least one common connection parameter (Abstract, [0003], claim 3).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Hottinen with the teachings of Wu to provide a dynamic network selection method for optimizing user QoE in a heterogeneous wireless network, combining the service type and user of transmitting current access network, access network dynamically updating period.
Conclusion
3.	Any inquiry concerning this communication or earlier communications from the examiner should be directed to KHAI MINH NGUYEN whose telephone number is (571)272-7923. The examiner can normally be reached 6-3.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Charles Appiah, can be reached at 571-272-7904. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.






/KHAI M NGUYEN/
Primary Examiner, Art Unit 2641