DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments
Applicant’s arguments, see page 6, filed 06/24/2021, with respect to 35 U.S.C. 112(f) interpretations have been fully considered and are persuasive.  The 112(f) interpretation of claim 16 has been withdrawn. 
Applicant’s arguments, see page 6, filed 06/24/2021, with respect to 35 U.S.C. 112(b) rejections have been fully considered and are persuasive.  The 35 U.S.C. 112(b) rejections of claims 7, 18 and 20 have been withdrawn. 
Applicant's arguments filed 06/24/2021 have been fully considered but they are not persuasive. The applicant has argued in substance:
Ma (Pub. No. US 20200162535 A1, hereinafter Ma) does not disclose “receiving a reward from the receiving computing device, wherein the reward is based on one or more reception parameters associated with the transmitted real-time communication received at the receiving computing device”.  The examiner respectfully disagrees.  Specifically, the applicant argues that “[the applicant’s] structure is different from that of MA in that the reward is provided by a QoE metric module 224 of the receiving device”.  However, while the examiner agrees with the applicant’s assertion that Ma’s disclosure shows the reward being provided by the environment, further observation of FIG. 3 (which supports previously cited para. [0033] and [0041]) shows that the reward Rt is then sent by the agents to a central agent.  Therefore, the reward Rt is “received from the receiving computing device” as claimed, since the agents represent end users (para. [0030]) which receive video streams (para. [0028]).
Ma does not disclose the QoE machine learning model to calculate QoE, since Ma teaches an equation to calculate QoE.  The examiner respectfully disagrees.  Looking to the applicant’s specification, the QoE machine learning model only necessitates “analyz[ing] various reception parameters such as the payload of the received audio and video data streams, wherein the payload is the part of the received data that is the actual intended message” (the other surrounding disclosure merely exemplifies aspects of the QoE machine learning model).  The equation in Ma accounts for a freezing time that results from streaming the video (i.e. payload) at a certain bitrate, and as such, the examiner equates the QoE equation in Ma (see para. [0033], [0034]) with the QoE machine learning model.


Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1-20 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Ma, et al (Pub. No. US 20200162535 A1 hereinafter Ma).

Claim 1 is an independent claim and Ma discloses a method of optimizing expected user-perceived quality of experience (QoE) in real-time communications between a sending computing device and a receiving computing device, comprising:
determining a current state of the sending computing device (receive network/playback status including retransmission packet count, para. [0010], [0015]); 
	determining a current action of the sending computing device, wherein the current action comprises a plurality of transmission parameters (current sending bitrate, throughput and frame size (among others) used in calculations, para. [0014], [0034]); 
	transmitting, in accordance with the current action (current sending bitrate and throughput for packet transmissions in video streaming in an environment, para. [0030], [0043]), a real-time communication from the sending computing device to the receiving computing device, wherein the real-time communication includes one or more of a real-time audio communication and a real-time video communication (streaming real-time video, para. [0010], [0030]); 
	receiving a reward from the receiving computing device (environment also provides a reward Rt to the agent, para. [0033], para. [0041]; reward Rt is received in tuples from other agents (i.e. receiving computing device), see FIG. 3), wherein the reward is based on one or more reception parameters associated with the transmitted real-time communication received at the receiving computing device (QoE or reward calculation based on end-to-end interaction latency, para. [0033], [0034]); 
determining, based on the current state, the current action, and the reward, an expected value of a sum of future rewards (state inputs and sending bitrates used in reinforcement learning method, para. [0038]; expected cumulative reward that it receives from the environment...the reward reflects the performance of each streaming ground according to the QoE matrices ARS intends to optimize as discussed above...estimate the gradient of the expected total reward by observing the trajectories of executions obtained by following the policy, para. [0041]); and 
	changing at least one of the plurality of transmission parameters to maximize the expected value of the sum of future rewards (make decisions and adjust the sending bitrate based on trained and optimized ABR algorithm, para. [0042]; trained policy is based on maximizing estimated cumulative reward, para. [0041]).
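
For illustration only, the general kind of loop recited in the limitations above (a state, an action, a reward, an expected sum of future rewards, and a parameter change that maximizes it) can be sketched as a tabular Q-learning update. This sketch is not taken from Ma or from the applicant's specification; the bitrate set, the state labels, the reward value, and the function names are hypothetical, and tabular Q-learning is merely one well-known state-action value technique chosen for brevity.

```python
import random

# Hypothetical discrete action set: candidate sending bitrates in kbps.
BITRATES_KBPS = [300, 750, 1200]


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step.

    Moves Q(state, action) toward reward + gamma * max_a Q(next_state, a),
    i.e. toward an estimate of the expected discounted sum of future rewards.
    """
    best_next = max(q.get((next_state, a), 0.0) for a in BITRATES_KBPS)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]


def choose_action(q, state, epsilon=0.0, rng=random):
    """Pick the bitrate with the highest estimated value (epsilon-greedy)."""
    if rng.random() < epsilon:
        return rng.choice(BITRATES_KBPS)
    return max(BITRATES_KBPS, key=lambda a: q.get((state, a), 0.0))


if __name__ == "__main__":
    q = {}
    # Assumed example: the receiver reports a reward of 1.0 after a 750 kbps send.
    q_update(q, "good", 750, 1.0, "good")
    print(choose_action(q, "good"))
```

In this sketch the reward passed to `q_update` stands in for whatever value the receiving side reports, and `choose_action` stands in for changing a transmission parameter based on the estimated values.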

As per claim 2, claim 1 is incorporated and Ma further discloses wherein a state-action value function of a reinforcement learning model determines the expected value of the sum of future rewards (reinforcement learning tools to train and optimize the ABR algorithm, para. [0013]; state inputs and sending bitrates used in reinforcement learning method, para. [0038]; expected cumulative reward that it receives from the environment...the reward reflects the performance of each streaming ground according to the QoE matrices ARS intends to optimize as discussed above...estimate the gradient of the expected total reward by observing the trajectories of executions obtained by following the policy, para. [0041]).  

As per claim 3, claim 2 is incorporated and Ma further discloses providing an output of the state-action value function to a control policy learning model of the reinforcement learning model (state inputs and sending bitrates used in reinforcement learning method, para. [0038]) and changing, by the control policy learning model, the at least one of the plurality of transmission parameters based on the output of the state-action value function (make decisions and adjust the sending bitrate based on trained and optimized ABR algorithm, para. [0042]; trained policy is based on maximizing estimated cumulative reward, para. [0041]).  

As per claim 4, claim 1 is incorporated and Ma further discloses wherein the reward comprises a user-perceived quality of experience (QoE) metric (reward reflects the performance of each streaming ground according to the QoE matrices ARS intends to optimize, para. [0041]) based on the one or more reception parameters associated with the transmitted real-time communication received at the receiving computing device (QoE matrices to calculate reward, where the calculation is based on end-to-end interaction latency, para. [0033], [0034]).  

As per claim 5, claim 4 is incorporated and Ma further discloses determining the user-perceived QoE with a QoE machine learning model (equation for calculating QoE, para. [0033], [0034]), wherein the QoE machine learning model assesses the payload of the transmitted real-time communication received at the receiving computing device (by using DRL-based ARS to handle ABR control in real-time video streaming systems, it optimizes its policy for different network characteristics and QoE matrices directly from user QoE, para. [0043]).  

As per claim 6, claim 4 is incorporated and Ma further discloses determining the user-perceived QoE with a QoE machine learning model, wherein the QoE machine learning model assesses: a network statistic at the receiving computing device (end-to-end interaction latency in QoE matrix, para. [0034]); a receiving computing device statistic (freezing time from streaming the video, para. [0034]); or a user feedback of the transmitted real-time communication received at the receiving computing device (quality perceived by the user, para. [0034]).  

As per claim 7, claim 4 is incorporated and Ma further discloses wherein the at least one of the plurality of transmission parameters includes a send rate parameter (adjusted sending rate, para. [0042]), a resolution parameter, a frame rate parameter, a quantization parameter (QP), or a forward error correction (FEC) parameter.

As per claim 8, claim 1 is incorporated and Ma further discloses wherein the sending computing device additionally operates as a receiving computing device and wherein the receiving computing device additionally operates as a sending computing device for two-way real-time communication (real-time video systems such as video conferencing, para. [0003]).

Claim 9 is an independent claim and Ma discloses a method of training a reinforcement learning model for optimizing expected user-perceived quality of experience (QoE) in real-time communications, the method including 
	determining a current state of a sender and providing the current state to an agent in communication with the sender (network QoS parameters provided to an agent from the environment comprising round-trip time, received bitrate, packet loss rate, retransmission packet count, and so on, para. [0015]); 
	determining a current action of the sender, wherein the current action is known by the agent and wherein the current action comprises a plurality of transmission parameters (current sending bitrate, throughput and frame size (among others) used in calculations, para. [0014], [0034]); 
	transmitting, in accordance with the current action (current sending bitrate and throughput for packet transmissions in video streaming in an environment, para. [0030], [0043]), a real-time communication from the sender to a receiver (video streaming in an environment, para. [0030]; streaming real-time video, para. [0010], [0030]); 
	receiving, from the receiver, at the agent, a reward determined at the receiver (environment also provides a reward to the agent, para. [0033], para. [0041]), wherein the reward is based on one or more reception parameters associated with the real-time communication received at the receiver (QoE or reward calculation based on end-to-end interaction latency, para. [0033], [0034]); 
	determining at the agent, based on the current state, the current action and the reward, an expected value of a sum of future rewards (state inputs and sending bitrates used in reinforcement learning method, para. [0038]; expected cumulative reward that it receives from the environment...the reward reflects the performance of each streaming ground according to the QoE matrices ARS intends to optimize as discussed above...estimate the gradient of the expected total reward by observing the trajectories of executions obtained by following the policy, para. [0041]); and 
	changing at least one of the plurality of transmission parameters to maximize the expected value of the sum of future rewards (make decisions and adjust the sending bitrate based on trained and optimized ABR algorithm, para. [0042]; trained policy is based on maximizing estimated cumulative reward, para. [0041]).

As per claim 10, claim 9 is incorporated and Ma further discloses wherein the sender, receiver and network are simulated (learning process in simulated environment, abstract, para. [0020], [0037], [0041]).

As per claim 11, claim 10 is incorporated and Ma further discloses wherein the sender, receiver and network are simulated with discrete events (simulate real video streaming processes, para. [0037]).  

As per claim 12, Ma further discloses wherein the network is emulated (simulated network environment, para. [0020], [0037]) between the sender, comprising a sending computing device (streaming server, para. [0028]), and the receiver, comprising a receiving computing device (user end in the environment, para. [0014]).  

As per claim 13, claim 12 is incorporated and Ma further discloses wherein each of the sending computing device and the receiving computing device execute a communications application (client applications, para. [0037]; program code, para. [0051]) and wherein one or more conditions of the network are controlled according to one or more predetermined parameters (simulate network conditions for learning algorithm, para. [0037]).  

As per claim 14, claim 9 is incorporated and Ma further discloses wherein the sender comprises a sending computing device (streaming server, para. [0028]), the receiver comprises a receiving computing device (user end in the environment, para. [0014]) and the network comprises a live, real network (underlying network, para. [0010]; see also FIG. 1-3).  

As per claim 15, claim 14 is incorporated and Ma further discloses wherein the sender, receiver and network are in a live environment and the method further comprises continuously training the agent based on live real-time communication transmissions (real-time streaming and adapting bitrate based on status feedback, para. [0010]; training agent over time based on observations of real time streaming simulations, para. [0037], [0038]).

Claim 16 is an independent claim corresponding to independent claim 1 and is therefore rejected for similar reasoning.  Additionally, Ma further discloses a memory storing executable instructions and a processing device executing the executable instructions (see para. [0046], [0047]).

As per claim 17, claim 16 is incorporated.  Claim 17 corresponds to claim 8 and is therefore rejected for similar reasoning.

As per claim 18, claim 16 is incorporated.  Claim 18 corresponds to claim 7 and is therefore rejected for similar reasoning.

As per claim 19, claim 16 is incorporated and Ma further discloses performing the determination of the expected value of the sum of future rewards with a reinforcement learning model (reinforcement learning tools to train and optimize the ABR algorithm, para. [0013]).

As per claim 20, claim 19 is incorporated and Ma further discloses wherein the reinforcement learning model comprises at least one of an actor-critic model, a q-learning model (q-learning, para. [0044]), a policy gradient model (gradient descent method of learning, para. [0035]), a temporal difference model and a monte-carlo tree search model.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
CN 109255443 – generally discloses training a depth reinforcement learning model based on a first state, a first action, a reward score, and a second state.
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  

Any inquiry concerning this communication or earlier communications from the examiner should be directed to JONATHAN A BUI whose telephone number is (571)270-7168.  The examiner can normally be reached on Mon-Fri: 9AM - 5:30PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Peter Pappas, can be reached on (571)272-7646.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/JONATHAN A BUI/Primary Examiner, Art Unit 2448