DETAILED ACTION   
Claims 1-20 are pending.
Claims 1, 6, 8, 9, 14, 16 and 17 are amended.
No Claim(s) is/are canceled.
No Claim(s) is/are added.
Claims 1-20 are rejected. This rejection is made FINAL.

Response to Arguments
Applicant's arguments filed on 01/12/2021 have been fully considered but are moot in view of the new ground of rejection.
35 U.S.C. §112 Rejections
The rejections of claims 1-20 under 35 U.S.C. §112(b) are withdrawn in view of the applicant’s amendment.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 12/04/2020 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered if signed and initialed by the Examiner.

Claim Objections
Claim 9 is objected to under 37 CFR 1.75(c) because of the following informality. It is suggested that the claim recite “when executed by a processor”.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale or otherwise available to the public before the effective filing date of the claimed invention.

Claims 1, 3, 4, 8, 9, 11, 12, 16, 17, 19 and 20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by British Telecomm (EP 1843283 A1) hereinafter “British”.  The reference “British” was cited in IDS filed on 12/04/2020.
As per claim 1, British discloses a system configured to optimize Border Gateway Protocol (BGP) traffic in a telecommunications network ([0003], computer networks are controlled by different organizations is governed by the Border Gateway Protocol (BGP)) the system comprising: 
a network interface configured for communication with the telecommunication network  ([0013-0014], FIG. 1)
a processing device interconnected with the network interface ([0013-0014], FIG. 1)
and a memory device configured to store instructions that (Claim 15, a storage medium storing computer interpretable instructions to cause a programmable computer to become configured…), when executed, enable the processing device to 
perform an action in the telecommunications network when one or more inter-Autonomous Systems (AS) links are in a current state ([0064], the selection module 22 is caused to select one of the actions as an appropriate action to undertake. The selection module 22 then proceeds to set the tariff data 15 on the basis of the selected action; see [0061], Having determined the current state of the communications system 3-1; ...The selection module 22 then (S9) proceeds to select a tariff 15 to be set for utilizing the communications network)
wherein the action in the telecommunication network is configured to have an effect on BGP traffic flow on the one or more inter-AS links ([0058], where traffic is a measure of the volume of data carried by a link in the communications network 3-1)
determine a metric based on the effect of the action on the BGP traffic flow to determine an updated current state of the one or more inter-AS links ([0057], The monitoring module 24 then (s8) acquires data directly from the communications network 3-1; ...; 3-4 associated with the control computer 5-1; ...; 5-4 to determine a current state for the network 3-1; ...; 3-4)
and utilize the metric to perform a further action to achieve one or more rewards associated with the one or more inter-AS links ([0048], determines (s2) the immediate reward obtained which resulted from the previously taken action having been undertaken which resulted in the previously observed state of the observed system being in a particular state)
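
For illustration only, the loop recited in claim 1 (act in a current state, measure the effect, derive an updated state, act again toward a reward) can be sketched as follows. This is a toy model under stated assumptions and is not taken from British or from the application: the two-link setup, the names `ACTIONS`, `apply_action`, `metric`, `reward` and `control_loop`, and the greedy selection rule are all hypothetical, and no real BGP behavior is modeled.

```python
# Hypothetical toy model: total traffic is split across two inter-AS links.
# The state is the percentage of traffic carried by link A (0 to 100).

# Hypothetical actions: shift 10 points of traffic toward link B (-10),
# do nothing (0), or shift 10 points toward link A (+10).
ACTIONS = (-10, 0, 10)


def apply_action(share_a, shift):
    """Perform an action and return the updated state, clamped to 0..100."""
    return min(100, max(0, share_a + shift))


def metric(share_a):
    """Metric: load imbalance between link A and link B (0 means balanced)."""
    return abs(share_a - (100 - share_a))


def reward(share_a):
    """Assumed reward: balancing traffic, so a lower imbalance is better."""
    return -metric(share_a)


def control_loop(share_a, steps):
    """Repeatedly act, measure the effect, and use the metric to pick the next action."""
    history = [metric(share_a)]
    for _ in range(steps):
        # Greedy choice: pick the action whose resulting state has the best reward.
        shift = max(ACTIONS, key=lambda a: reward(apply_action(share_a, a)))
        share_a = apply_action(share_a, shift)  # updated current state
        history.append(metric(share_a))         # metric after the action
    return share_a, history
```

The greedy rule stands in for whatever selection policy a real system would use; a learning agent would estimate the reward from observations instead of evaluating a known model.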

As per claim 3, British discloses the system of claim 1, wherein the current state and the updated current state are characterized by the metric which is a measurement based on any of ingress traffic, egress traffic, latency, dropped packets, and business metrics ([0059], the monitoring module 24 then converts the calculated measure into an indication of a current state for the communications network 3)

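As per claim 4, British discloses the system of claim 1, wherein the action is a direct action for outbound traffic on the one or more inter-AS links ([0076], where the actions undertaken by agents correspond to actions other than setting tariffs for a communication network)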

As per claim 8, British discloses the system of claim 1, wherein the one or more rewards ([0011], using a calculated immediate reward) include one or more of balancing traffic across a plurality of inter-AS links, maximizing Quality of Experience, minimizing Service Layer Agreement penalties, minimizing a cost per bit, minimizing latency, and minimizing a penalty to change routing data ([0015], determined the route associated with the lowest cost is then selected)

As per claim 9, British discloses a non-transitory computer-readable medium comprising software logic adapted to optimize Border Gateway Protocol (BGP) traffic in a telecommunications network, the software logic, when executed, enabling a processing device to: 
perform an action in the telecommunications network when one or more inter-Autonomous Systems (AS) links are in a current state ([0064], the selection module 22 is caused to select one of the actions as an appropriate action to undertake. The selection module 22 then proceeds to set the tariff data 15 on the basis of the selected action; [0061], Having determined the current state of the communications system 3-1; ...the selection module 22 then (S9) proceeds to select a tariff 15 to be set for utilizing the communications network associated with the routing agent 10), wherein the action in the telecommunication network is configured to have an effect on BGP traffic flow on the one or more inter-AS links ([0058], where traffic is a measure of the volume of data carried by a link in the communications network 3-1)
determine a metric based on the effect of the action on the BGP traffic flow to determine an updated current state of the one or more inter-AS links ([0057], The monitoring module 24 then (s8) acquires data directly from the communications network 3-1; ...; 3-4 associated with the control computer 5-1; ...; 5-4 to determine a current state for the network 3-1; ...; 3-4)
and utilize the metric to perform a further action to achieve one or more rewards associated with the one or more inter-AS links ([0048], determines (s2) the immediate reward obtained which resulted from the previously taken action having been undertaken which resulted in the previously observed state of the observed system being in a particular state)

As per claim 11, British discloses the non-transitory computer-readable medium of claim 9, wherein the current state and the updated current state are characterized by the metric which is a measurement based on any of ingress traffic, egress traffic, latency, dropped packets, and business metrics ([0059], the monitoring module 24 then converts the calculated measure into an indication of a current state for the communications network 3)

As per claim 12, British discloses the non-transitory computer-readable medium of claim 9, wherein the action is a direct action for outbound traffic on the one or more inter-AS links ([0076], where the actions undertaken by agents correspond to actions other than setting tariffs for a communication network)

As per claim 16, British discloses the non-transitory computer-readable medium of claim 9, wherein the one or more rewards ([0011], using a calculated immediate reward) include one or more of balancing traffic across a plurality of inter-AS links, maximizing Quality of Experience, minimizing Service Layer Agreement penalties, minimizing a cost per bit, minimizing latency, and minimizing a penalty to change routing data ([0015], determined the route associated with the lowest cost is then selected)

As per claim 17, British discloses a method comprising: 
performing an action in the telecommunications network when one or more inter-Autonomous Systems (AS) links are in a current state ([0064], the selection module 22 is caused to select one of the actions as an appropriate action to undertake. The selection module 22 then proceeds to set the tariff data 15 on the basis of the selected action; [0061], Having determined the current state of the communications system 3-1; ...the selection module 22 then (S9) proceeds to select a tariff 15 to be set for utilising the communications network associated with the routing agent 10)
wherein the action in the telecommunication network is configured to have an effect on BGP traffic flow on the one or more inter-AS links ([0058], where traffic is a measure of the volume of data carried by a link in the communications network 3-1)
determining a metric based on the effect of the action on the BGP traffic flow to determine an updated current state of the one or more inter-AS links ([0057], The monitoring module 24 then (s8) acquires data directly from the communications network 3-1; ...; 3-4 associated with the control computer 5-1; ...; 5-4 to determine a current state for the network 3-1; ...; 3-4)
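and utilizing the metric to perform a further action to achieve one or more rewards associated with the one or more inter-AS links ([0048], determines (s2) the immediate reward obtained which resulted from the previously taken action having been undertaken which resulted in the previously observed state of the observed system being in a particular state)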

As per claim 19, British discloses the method of claim 17, wherein the current state and the updated current state are characterized by the metric which is a measurement based on any of ingress traffic, egress traffic, latency, dropped packets, and business metrics ([0059], the monitoring module 24 then converts the calculated measure into an indication of a current state for the communications network 3)

As per claim 20, British discloses the method of claim 17, wherein the action is one of a direct action for outbound traffic on the one or more inter-AS links, and an indirect action to influence inbound traffic on the one or more inter-AS links ([0076], where the actions undertaken by agents correspond to actions other than setting tariffs for a communication network)

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
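A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.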

Claims 2, 10 and 18 are rejected under 35 U.S.C. §103 as being unpatentable over British in view of Jacobson et al. (US Patent No. 7,197,573 B1) hereinafter “Jacobson ‘573”.
As per claim 2, British discloses the system of claim 1. British does not explicitly disclose wherein the one or more rewards relate to optimization of one or more of inbound traffic and outbound traffic on the one or more inter-AS links.
Jacobson ‘573 discloses wherein the one or more rewards relate to optimization of one or more of inbound traffic and outbound traffic on the one or more inter-AS links (Jacobson ‘573, col. 1, lines 37-43, send traffic over a border router capable of forwarding traffic to its intended destination that is different from the border router having the least cost path that is capable of forwarding traffic to its intended destination)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of British with the teaching of Jacobson ‘573, wherein the one or more rewards relate to optimization of one or more of inbound traffic and outbound traffic on the one or more inter-AS links, in order to reduce overloading (Jacobson ‘573, col. 1, lines 48-50)

As per claim 10, British discloses the non-transitory computer-readable medium of claim 9. British does not explicitly disclose wherein the one or more rewards relate to optimization of one or more of inbound traffic and outbound traffic on the one or more inter-AS links.
Jacobson ‘573 discloses wherein the one or more rewards relate to optimization of one or more of inbound traffic and outbound traffic on the one or more inter-AS links (Jacobson ‘573, col. 1, lines 37-43, send traffic over a border router capable of forwarding traffic to its intended destination that is different from the border router having the least cost path that is capable of forwarding traffic to its intended destination)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of British with the teaching of Jacobson ‘573, wherein the one or more rewards relate to optimization of one or more of inbound traffic and outbound traffic on the one or more inter-AS links, in order to reduce overloading (Jacobson ‘573, col. 1, lines 48-50)

As per claim 18, British discloses the method of claim 17. British does not explicitly disclose wherein the one or more rewards relate to optimization of one or more of inbound traffic and outbound traffic on the one or more inter-AS links.
Jacobson ‘573 discloses wherein the one or more rewards relate to optimization of one or more of inbound traffic and outbound traffic on the one or more inter-AS links (Jacobson ‘573, col. 1, lines 37-43, send traffic over a border router capable of forwarding traffic to its intended destination that is different from the border router having the least cost path that is capable of forwarding traffic to its intended destination)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of British with the teaching of Jacobson ‘573, wherein the one or more rewards relate to optimization of one or more of inbound traffic and outbound traffic on the one or more inter-AS links, in order to reduce overloading (Jacobson ‘573, col. 1, lines 48-50)

Claims 5 and 13 are rejected under 35 U.S.C. §103 as being unpatentable over British in view of Jacobson et al. (US Patent 7,120,792 B1) hereinafter “Jacobson ‘792”.
As per claim 5, British discloses the system of claim 1. British does not explicitly disclose wherein the action is an indirect action to influence inbound traffic on the one or more inter-AS links.
Jacobson ‘792 discloses wherein the action is an indirect action to influence inbound traffic on the one or more inter-AS links (Jacobson ‘792, col. 30, lines 10-12, When messages are received, crypto manager 1720 checks the original source from which messages inbound to BGP processing module 1710)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of British with the teaching of Jacobson ‘792, wherein the action is an indirect action to influence inbound traffic on the one or more inter-AS links, in order to improve the performance of the network (Jacobson ‘792, Background of the invention)

As per claim 13, British discloses the non-transitory computer-readable medium of claim 9. British does not explicitly disclose wherein the action is an indirect action to influence inbound traffic on the one or more inter-AS links.
Jacobson ‘792 discloses wherein the action is an indirect action to influence inbound traffic on the one or more inter-AS links (Jacobson ‘792, col. 30, lines 10-12, When messages are received, crypto manager 1720 checks the original source from which messages inbound to BGP processing module 1710)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of British with the teaching of Jacobson ‘792, wherein the action is an indirect action to influence inbound traffic on the one or more inter-AS links, in order to improve the performance of the network (Jacobson ‘792, Background of the invention)

Claims 6, 7, 14 and 15 are rejected under 35 U.S.C. §103 as being unpatentable over British in view of Cote et al. (US 2018/0248905 A1) hereinafter “Cote”.
As per claim 6, British discloses the system of claim 1. British does not explicitly disclose wherein the instructions, when executed, further enable the processing device to receive training related to what actions are effective for the one or more rewards based on the current state.
Cote discloses wherein the instructions, when executed, further enable the processing device to receive training related to what actions are effective for the one or more rewards based on the current state (Cote, [0004], determining a model based on machine learning training with the PM data)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of British with the teaching of Cote, wherein the instructions, when executed, further enable the processing device to receive training related to what actions are effective for the one or more rewards based on the current state, in order to improve the stability of the network (Cote, [0103])

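As per claim 7, British in view of Cote disclose the system of claim 6, wherein the training includes offline training using one of i) historical data based on actions taken in a production network, and ii) a simulation (Cote, [0047], The actual training depends on the machine-learning algorithm. It is usually computationally expensive and is usually performed offline)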

As per claim 14, British discloses the non-transitory computer-readable medium of claim 9. British does not explicitly disclose wherein the instructions, when executed, further enable the processing device to receive training related to what actions are effective for the one or more rewards based on the current state.
Cote discloses wherein the instructions, when executed, further enable the processing device to receive training related to what actions are effective for the one or more rewards based on the current state (Cote, [0004], determining a model based on machine learning training with the PM data)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of British with the teaching of Cote, wherein the instructions, when executed, further enable the processing device to receive training related to what actions are effective for the one or more rewards based on the current state, in order to improve the stability of the network (Cote, [0103])

As per claim 15, British in view of Cote disclose the non-transitory computer-readable medium of claim 14, wherein the training includes offline training using one of i) historical data based on actions taken in a production network, and ii) a simulation (Cote, [0047], The actual training depends on the machine-learning algorithm. It is usually computationally expensive and is usually performed offline)

Conclusion
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
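
For illustration only, the timing rule stated above can be sketched as a date calculation. This is a minimal sketch under stated assumptions and not an official calculator: the helper names `add_months` and `reply_period_end` are hypothetical, any dates passed in are made up by the caller, and adjustments for weekends, federal holidays, and extension fees are not modeled.

```python
import calendar
from datetime import date


def add_months(d, months):
    """Return d shifted forward by whole calendar months, clamping the day
    to the last day of the target month (e.g. Nov 30 + 3 months -> Feb 28)."""
    index = d.month - 1 + months
    year = d.year + index // 12
    month = index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def reply_period_end(mailing, first_reply=None, advisory_mailed=None):
    """End of the shortened statutory period for reply to a final action.

    Default: three months from the mailing date. If a first reply was filed
    within two months and the advisory action was mailed after the
    three-month date, the period ends on the advisory mailing date, but
    never later than six months from the mailing date.
    """
    three_months = add_months(mailing, 3)
    six_months = add_months(mailing, 6)
    replied_early = first_reply is not None and first_reply <= add_months(mailing, 2)
    advisory_late = advisory_mailed is not None and advisory_mailed > three_months
    if replied_early and advisory_late:
        return min(advisory_mailed, six_months)
    return three_months
```

The function returns only the end of the shortened statutory period; whether a later reply is timely with an extension under 37 CFR 1.136(a) is a separate question that this sketch does not address.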
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JEFF BANTHRONGSACK whose telephone number is (571) 270-7090.  The examiner can normally be reached on M-F 9:00am - 5:00pm.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, YEMANE MESFIN, can be reached on 571-272-3927.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.


/JEFF BANTHRONGSACK/Examiner, Art Unit 2462

/PETER CHEN/Primary Examiner, Art Unit 2462