Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Detailed Action
2.	Claims 1-16 are pending.
3.	Claims 1-24 were filed on 06/07/2021. Applicant elects claims 1-16 without traverse. Claims 17-24 are withdrawn (see Restriction/Election response filed on 10/03/2022).

Information Disclosure Statement
4.	The information disclosure statements (IDS) submitted on 06/08/2022 and 07/23/2021 were filed. The submissions are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.

Claim Rejections - 35 USC § 102
5.	The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale or otherwise available to the public before the effective filing date of the claimed invention.

6.	Claims 1-3, 5-8, 11, 13 and 15-16 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Yanjio et al. (CN 110581808 A), hereinafter referred to as Yanjio.

Regarding claim 1, Yanjio discloses a method comprising, at a device: 
receiving at a reinforcement learning agent environmental feedback from a data transmission network indicating a speed at which data is currently being transmitted through the data transmission network (page 5, para. [5]: a congestion control method based on deep reinforcement learning, which comprises the steps of firstly, initializing a network environment and obtaining network state data, wherein the network state data comprises network delay, transmission rate, sending rate and congestion window size; then initializing parameters of a congestion control model; the invention can train the congestion control model by utilizing the selected network state data (network time delay, transmission rate, sending rate and congestion window size). Also, page 6, paras. [5]-[6]: Step S1: initializing a network environment, and generating network state data, wherein the network state data comprises a network delay, a transmission rate, a sending rate, and a congestion window size. Step S1.2: and calculating the network delay, the transmission rate, the sending rate and the size of a congestion window according to the data sent by the two communication parties through the established connection. Specifically, before the program starts, it is necessary to initialize a network environment, establish a connection between both communication parties, calculate status data such as a network delay (RTT), a transmission rate (delivery rate), a sending rate (sending rate), and a congestion window size (cwnd) of a network by data transmission of both communication parties, and store the data in an experience pool); and
adjusting, by the reinforcement learning agent, a transmission rate of one or more of a plurality of data flows within a data transmission network, based on the environmental feedback (page 7, paras. [1]-[2]: data may be randomly selected as the target network status data from the network status data generated in step S1. And then training the neural network by using the target network state data, wherein the reward function and the loss function are used for adjusting occurrence of network congestion and achieving the purpose of optimizing network performance. Also, para. [6]: when the parameters are updated, the Q network and target Q parameters need to be adjusted according to the feedback of the reward function received by the Agent, and are updated respectively. Further, page 10, para. [4]: the method of the invention can fully utilize the performance index of the network, adopts a proper value through deep reinforcement learning to adjust the size and the direction of the network congestion window, thereby improving the network throughput, reducing the packet loss rate and the time delay and further solving the network congestion).

Regarding claim 2, claim 1 is incorporated and Yanjio further discloses wherein the reinforcement learning agent includes a trained neural network that takes the environmental feedback as input and outputs adjustments to be made to one or more of the plurality of data flows, based on the environmental feedback (page 7, paras. [1]-[2]: data may be randomly selected as the target network status data from the network status data generated in step S1. And then training the neural network by using the target network state data, wherein the reward function and the loss function are used for adjusting occurrence of network congestion and achieving the purpose of optimizing network performance. Also, para. [6]: when the parameters are updated, the Q network and target Q parameters need to be adjusted according to the feedback of the reward function received by the Agent, and are updated respectively. Also, page 6, paras. [4]-[7] and page 7, para. [1]: Step S1: initializing a network environment, and generating network state data, wherein the network state data comprises network delay, transmission rate, sending rate and congestion window size; Step S2: initializing parameters of a congestion control model, wherein the parameters of the congestion control model comprise a reward function, an experience pool size, a neural network structure and a learning rate; Step S3: selecting target network state data from the generated network state data, updating parameters of a neural network according to the target network state data, a reward function and a loss function, and generating different congestion control models).

Regarding claim 3, claim 1 is incorporated and Yanjio further discloses wherein environmental feedback is retrieved in response to establishing, by the reinforcement learning agent, an initial transmission rate of each of the plurality of data flows within the data transmission network (page 6, para. [4]: before the program starts, it is necessary to initialize a network environment, establish a connection between both communication parties, calculate status data such as a network delay (RTT), a transmission rate (delivery rate), a sending rate (sending rate), and a congestion window size (cwnd) of a network by data transmission of both communication parties, and store the data in an experience pool. After a certain amount of data is stored in the experience pool, a certain amount of state data can be randomly taken from the experience pool to prepare for the operation of each step (i.e., subsequent training). Also, page 8, para. [8]: when training the neural network, a part of samples in the "experience pool" may be randomly extracted to train the model (the experience pool is a container for storing data, and stores historical data)).
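
For illustration only, the experience pool behavior relied upon above (storing network state data and later drawing a random subset for training) may be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Yanjio or from the application; the names `NetworkState`, `ExperiencePool`, `store` and `sample`, and the fixed-capacity eviction, are hypothetical.

```python
import random
from collections import deque
from typing import List, NamedTuple, Optional


class NetworkState(NamedTuple):
    """One observation of the four state values named in the cited passage."""
    rtt_ms: float          # network delay (RTT)
    delivery_rate: float   # transmission rate (delivery rate)
    sending_rate: float    # sending rate
    cwnd: int              # congestion window size


class ExperiencePool:
    """Hypothetical container that stores historical state data."""

    def __init__(self, capacity: int) -> None:
        # Assumption: the oldest entries are dropped once capacity is reached.
        self._buffer: deque = deque(maxlen=capacity)

    def store(self, state: NetworkState) -> None:
        self._buffer.append(state)

    def sample(self, batch_size: int,
               rng: Optional[random.Random] = None) -> List[NetworkState]:
        # Draw a random subset without replacement for a training step.
        source = rng if rng is not None else random
        return source.sample(list(self._buffer), batch_size)

    def __len__(self) -> int:
        return len(self._buffer)
```

In this sketch, sampling is only meaningful after enough observations have been stored, which mirrors the cited statement that state data is taken from the pool after a certain amount of data is stored.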

Regarding claim 5, claim 1 is incorporated and Yanjio further discloses wherein each of the plurality of data flows include a transmission of data from a source to a destination (page 4, paras. [2]-[3]: Step S1.1: establishing connection between two communication parties; Step S1.2: and calculating the network delay, the transmission rate, the sending rate and the size of a congestion window according to the data sent by the two communication parties through the established connection. Also, page 8, para. [11] and page 9, para. [1]: when model training is carried out, an Agent is adopted to observe a Sender and a Receiver in Environment, and observed data states are sent to a DQN neural network. The DQN continuously learns through reward of data sending and Environment feedback, and an action is adopted as a mode for adjusting the congestion window).

Regarding claim 6, claim 1 is incorporated and Yanjio further discloses wherein the transmission rate for each of the plurality of data flows is established by the reinforcement learning agent located on each of one or more sources of communications data (page 6, para. [4]: a congestion control method based on deep reinforcement learning, which comprises the following steps: Step S1: initializing a network environment, and generating network state data, wherein the network state data comprises network delay, transmission rate, sending rate and congestion window size; Step S2: initializing parameters of a congestion control model, wherein the parameters of the congestion control model comprise a reward function, an experience pool size, a neural network structure and a learning rate; Step S3: selecting target network state data from the generated network state data, updating parameters of a neural network according to the target network state data, a reward function and a loss function, and generating different congestion control models).

Regarding claim 7, claim 1 is incorporated and Yanjio further discloses wherein the environmental feedback includes measurements extracted by the reinforcement learning agent from data packets sent within the data transmission network (page 7, para. [5]: it is necessary to initialize a network environment, establish a connection between both communication parties, calculate status data such as a network delay (RTT), a transmission rate (delivery rate), a sending rate (sending rate), and a congestion window size (cwnd) of a network by data transmission of both communication parties, and store the data in an experience pool. After a certain amount of data is stored in the experience pool, a certain amount of state data can be randomly taken from the experience pool to prepare for the operation of each step (i.e., subsequent training). Also, page 8, para. [5]: acquiring the current network state, and obtaining an action according to the network state, such as cwnd (×2). This action is then executed to expand the current congestion window by a factor of two. The sender makes a decision as to whether or not an ack (acknowledgement message) is obtained from the receiver, and if not, waits until it is obtained. After the ack is obtained, the state and the reward are updated, the update of the state is to observe the network delay (RTT), the transmission rate (delivery rate), the sending rate (sending rate) and the congestion window size (cwnd) network state of the network link in step S1, the update of the reward is to calculate according to a reward function).
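
For illustration only, the sequence quoted above (select an action on the congestion window, execute it, wait for the ack, then update the state and the reward) may be sketched as follows. This is a sketch under stated assumptions and not Yanjio's implementation: only the doubling action appears in the quoted passage, so the other entries in `ACTIONS`, the reward formula in `reward`, and the names `LinkState` and `step_on_ack` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical action set. Only "double" (cwnd x 2) is mentioned in the
# quoted passage; "hold" and "halve" are assumptions added for illustration.
ACTIONS = {
    "double": lambda cwnd: cwnd * 2,
    "hold": lambda cwnd: cwnd,
    "halve": lambda cwnd: max(1, cwnd // 2),
}


@dataclass(frozen=True)
class LinkState:
    rtt_ms: float          # network delay (RTT)
    delivery_rate: float   # transmission rate (delivery rate)
    sending_rate: float    # sending rate
    cwnd: int              # congestion window size


def reward(state: LinkState) -> float:
    # Assumed reward: favor delivery rate, penalize delay. The cited
    # reference only says a reward function is used, not its form.
    return state.delivery_rate - 0.1 * state.rtt_ms


def step_on_ack(state: LinkState, action: str, rtt_ms: float,
                delivery_rate: float, sending_rate: float) -> Tuple[LinkState, float]:
    """Apply the chosen action, then refresh state and reward from the
    measurements observed once the ack has arrived."""
    new_cwnd = ACTIONS[action](state.cwnd)
    next_state = LinkState(rtt_ms, delivery_rate, sending_rate, new_cwnd)
    return next_state, reward(next_state)
```

The measurements passed to `step_on_ack` stand in for values derived from acknowledged packets, which is the point the rejection draws from the passage.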

Regarding claim 8, claim 7 is incorporated and Yanjio further discloses wherein the measurements include a state value indicating a speed at which data is currently being transmitted within the transmission network (page 6, paras. [5]-[6]: it is necessary to initialize a network environment, establish a connection between both communication parties, calculate status data such as a network delay (RTT), a transmission rate (delivery rate), a sending rate (sending rate), and a congestion window size (cwnd) of a network by data transmission of both communication parties).

Regarding claim 11, claim 1 is incorporated and Yanjio further discloses wherein a granularity of the adjustments made by the reinforcement learning agent is adjusted during a training of a neural network included within the reinforcement learning agent (page 7, paras. [4]-[6]: Step S3.2: and updating the neural network parameters in a mode of minimizing a loss function according to the obtained value of the Reward function, and generating different congestion control models. Specifically, in reinforcement learning, the congestion control is to change the size of a window by an action, and the basis of what action is taken is status data. The two modes are two mechanisms of action selection by DQN (deep reinforcement learning). Reinforcement learning defines an environment for an Agent to implement certain actions to maximize rewards. When the parameters are updated, the Q network and target Q parameters need to be adjusted according to the feedback of the reward function received by the Agent, and are updated respectively. In the optimization process, after a certain number of time steps, the target Q parameter of the target network is updated to be the parameter of the eval net of the current training network).
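
For illustration only, the last step of the quoted passage (copying the eval net parameters into the target network after a certain number of time steps) may be sketched as follows. This is a sketch under stated assumptions and not Yanjio's implementation: parameters are modeled as a plain dictionary of floats, and the names `TargetSync`, `sync_every` and `step` are hypothetical.

```python
from typing import Dict


class TargetSync:
    """Hypothetical holder for eval-net and target-net parameters."""

    def __init__(self, eval_params: Dict[str, float], sync_every: int) -> None:
        if sync_every < 1:
            raise ValueError("sync_every must be at least 1")
        self.eval_params = dict(eval_params)
        self.target_params = dict(eval_params)  # target starts as a copy
        self.sync_every = sync_every            # assumed fixed interval
        self.steps = 0

    def step(self, updated_eval_params: Dict[str, float]) -> bool:
        """Record one training step; return True when the target was synced."""
        self.eval_params = dict(updated_eval_params)
        self.steps += 1
        if self.steps % self.sync_every == 0:
            # Target Q parameters are overwritten with the eval-net parameters.
            self.target_params = dict(self.eval_params)
            return True
        return False
```

Between synchronizations the target parameters stay fixed while the eval parameters continue to change on every step.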

Regarding claim 13, claim 1 is incorporated and Yanjio further discloses wherein the environmental feedback includes signals from the environment, or estimations thereof, or predictions of the environment (page 8, para. [8]: …the number of channels of input signals of the first layer of the convolutional layer is 16, the number of output channels is 32, the activation function is ReLU and is used as the input of the second layer of the convolutional layer, the number of channels of input signals of the second layer of the convolutional layer is 32, the number of output channels is 64, the activation function is also ReLU, after the two layers of the convolutional layer are processed, extracted characteristic data are flattened (Flatter), then the flattened data are input into the first layer of the fully-connected layer, the output is processed through the ReLU and then enters the second layer of the fully-connected layer, and the output of the second layer of the fully-connected layer is the Q value corresponding to different actions required by the invention).

Regarding independent claim 15, the claim corresponds to independent claim 1 and is therefore rejected for similar reasoning. Yanjio further discloses a non-transitory computer-readable media storing computer instructions which when executed by one or more processors of a device (page 6, para. [6] and page 8, para. [8]).

Regarding claim 16, claim 15 is incorporated. Claim 16 corresponds to claim 2 and is therefore rejected for similar reasoning.


Claim Rejections - 35 USC § 103

7.	In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
8.	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.


9.	Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Yanjio et al. (CN 110581808 A), hereinafter referred to as Yanjio, in view of Ganguli et al. (US 20190044849 A1), hereinafter referred to as Ganguli.

Regarding claim 4, claim 1 is incorporated. Yanjio may not explicitly disclose that the data transmission network includes one or more sources of transmitted data, the one or more sources of transmitted data include one or more network interface cards (NICs) located on one or more computing devices, and each of the one or more NICs implement one or more of the plurality of data flows within the data transmission network.
However, Ganguli discloses the data transmission network includes one or more sources of transmitted data, the one or more sources of transmitted data include one or more network interface cards (NICs) located on one or more computing devices, and each of the one or more NICs implement one or more of the plurality of data flows within the data transmission network (para. [0108]: the telemetry collector 1936 is to monitor network flows and connections between ports 1652 of the NIC 1648 and the plies 1662. The telemetry collector 1936 is also to receive notifications on telemetry data such as latency, congestion, and link failures and provide the information to management software. The policy manager 1938 is to determine which of the policy data 1902 is to be applied in programming the SDN table data 1904. For example, the policy manager 1938 may identify a load balancing policy based on QoS requirements and congestion data in response to the SDN management agent 1930 detecting that a link failure or congestion has occurred between a specified NIC port and a switch. The policy manager 1938 is also to determine (e.g., based on DHT data 1908) a path hierarchy for a load balancing path after the link failure).
Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Yanjio, using the teaching of Ganguli, such that the data transmission network includes one or more sources of transmitted data, the one or more sources of transmitted data include one or more network interface cards (NICs) located on one or more computing devices, and each of the one or more NICs implement one or more of the plurality of data flows within the data transmission network. One would have been motivated to do so in order to accurately balance resource utilization in the data center.

10.	Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Yanjio et al. (CN 110581808 A), hereinafter referred to as Yanjio, in view of Zhou et al. (US 20200120036 A1), hereinafter referred to as Zhou.

Regarding claim 9, claim 1 is incorporated. Yanjio may not explicitly disclose wherein the measurements include statistics derived from signals implemented within the data transmission network, the statistics including one or more of latency measurements, congestion notification packets, and a transmission rate. However, Zhou discloses wherein the measurements include statistics derived from signals implemented within the data transmission network, the statistics including one or more of latency measurements, congestion notification packets, and a transmission rate (Abstract: obtaining a transmitted data volume of a flow, and identifying a predictable flow and a non-predictable flow in the flow; collecting statistics about total data transmission volumes of the predictable flow and the non-predictable flow; obtaining a congestion transmission model of the predictable flow, and solving the congestion transmission model to obtain a scheduling policy for the predictable flow; and allocating bandwidths to the predictable flow and the non-predictable flow to obtain a bandwidth allocation result, and sending the bandwidth allocation result and the scheduling policy to the host, so that the host executes the scheduling policy in a scheduling period. This can prevent congestion in advance and reduce a delay of a delay-sensitive flow, and is applicable to a large data center).
Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Yanjio, using the teaching of Zhou, such that the measurements include statistics derived from signals implemented within the data transmission network, the statistics including one or more of latency measurements, congestion notification packets, and a transmission rate. One would have been motivated to do so in order to prevent congestion in advance and reduce the delay of a delay-sensitive flow.



11.	Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Yanjio et al. (CN 110581808 A), hereinafter referred to as Yanjio, in view of Park et al. (US 20200187016 A1), hereinafter referred to as Park.

Regarding claim 10, claim 1 is incorporated. Yanjio may not explicitly disclose wherein the data transmission network includes a distributed computing environment for performing ray tracing computations. However, Park discloses wherein the data transmission network includes a distributed computing environment for performing ray tracing computations (para. [0039]: a ray tracing (RT) method is used as an analysis method for the network design of a new communication system. The RT method has an advantage in that results in which an electromagnetic wave characteristic is taken into consideration in addition to a local characteristic can be derived because analysis is performed by taking into consideration the path of an electromagnetic wave changed due to a structure, such as a building or a tree, within an area and the characteristics of the area. That is, in the RT method, a plurality of rays transmitted in a sphere form of 360° from a given transmission location within a map 100 of a given area illustrated in FIG. 1 is taken into consideration. The transmission, reflection and diffraction of a ray are computed for each path. Para. [0040]: In the RT method, a long time is taken for the computation process in order to derive such an advantage. That is, a long analysis time is taken to obtain accurate results because the transmission, reflection and diffraction of a ray transmitted to a given path need to be separately computed by taking into consideration a structure on the path).
Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Yanjio, using the teaching of Park, such that the data transmission network includes a distributed computing environment for performing ray tracing computations. One would have been motivated to do so in order to reduce the time required for the total analysis process by performing decentralized processing reflecting the property of another given area, which suggests the reliability and accuracy of the result.


12.	Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Yanjio et al. (CN 110581808 A), hereinafter referred to as Yanjio, in view of Di Pietro et al. (US 20150195146 A1), hereinafter referred to as Di Pietro.

Regarding claim 12, claim 1 is incorporated and Yanjio discloses receiving, by the reinforcement learning agent, environmental feedback, and performing adjustments, based on the environmental feedback as recited in claim 1 above. Yanjio may not explicitly disclose receiving, by the reinforcement learning agent, additional environmental feedback, and performing additional adjustments, based on the additional environmental feedback. However, Di Pietro discloses receiving, by the reinforcement learning agent, additional environmental feedback, and performing additional adjustments, based on the additional environmental feedback (para. [0072]: a mechanism that dynamically adjusts the sending of information required by an LM hosted in the network, according to the network congestion and learning machine state (e.g., using a closed-loop feedback mechanism). The techniques herein prevent the network from being flooded by messages carrying features, which would prevent the learning machine from being operational and could by itself be a sophisticated form of attack against to the learning network. First, detection of congestion generated by feature messages is performed both locally and on an end-to-end basis, as illustrated generally in FIGS. 4E-4F. Then, aggregating nodes are chosen based on the feature relevance to the LM, on the overall feature traffic rate and on the nodes' capabilities, as illustrated generally in FIG. 4G. Once aggregation has been set up, both the traffic rate and the LM performance are monitored by the LM, allowing the aggregation parameters to be adjusted dynamically (e.g., via a closed loop control mechanism, etc.)).
Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Yanjio and include receiving, by the reinforcement learning agent, additional environmental feedback, and performing additional adjustments, based on the additional environmental feedback using the teaching of Di Pietro. One would have been motivated to do so in order to effectively reduce the number of collisions, thereby preventing the network from being flooded by messages.

13.	Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over Yanjio et al. (CN 110581808 A), hereinafter referred to as Yanjio, in view of Changuel et al. (US 20150334028 A1), hereinafter referred to as Changuel.

Regarding claim 14, claim 1 is incorporated and Yanjio discloses wherein the reinforcement learning agent learns a congestion control, and the congestion control is modified in reaction to observed data (page 7, para. [11] and page 8, para. [1]: parameters of the model need to be initialized, then state data are randomly selected, when model training is carried out, an Agent is adopted to observe a Sender and a Receiver in Environment, and observed data states are sent to a DQN neural network. The DQN continuously learns through reward of data sending and Environment feedback, and an action is adopted as a mode for adjusting the congestion window).
Yanjio may not explicitly disclose a congestion control policy. However, Changuel discloses a congestion control policy (para. [0016]: a learning step to associate appropriate congestion control policies, respectively, with estimated video categories from a plurality of training streaming video traffics; para. [0017]: an estimating step of the video category of an ongoing streaming video traffic; para. [0018]: an application step of the congestion control policy associated with the estimated video category of the ongoing streaming video traffic to the said ongoing streaming video traffic. Para. [0038]: during a learning phase, a set of parameters are observed in corpus of representative video contents and learned using reinforcement learning, so that different categories of video and corresponding optimal policies are obtained on the basis of the values at convergence of the considered parameters).
Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Yanjio and include a congestion control policy using the teaching of Changuel. One would have been motivated to do so in order to improve performance and reduce the computational complexity of congestion control mechanisms in TCP for best-effort services on wireless networks, thus allowing multiple video flows to be handled simultaneously for congestion avoidance.

Conclusion
14.	Any inquiry concerning this communication or earlier communications from the examiner should be directed to Kidest Mendaye whose telephone number is (571) 272-2603. The examiner can normally be reached on Monday through Friday, 7:00 am-5:00 pm EST. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
	If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Peter Pappas, can be reached on (571) 272-7646. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
	Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/KIDEST MENDAYE/
Examiner, Art Unit 2448

/JONATHAN A BUI/
Primary Examiner, Art Unit 2448