DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This Office Action is a response to communications dated 05/18/2020.  Claims 1-20 are pending in the application.

Information Disclosure Statement
The information disclosure statement filed 05/19/2020 complies with the provisions of 37 CFR 1.97, 1.98 and MPEP § 609.  It has been considered and placed in the application file.

Claim Objections
Claims 1, 12 and 19 are objected to because of the following informalities:
As per claim 1, lines 13-14, “the action a stretch, compress, or hold action” should be changed to either --the action includes a stretch, compress, or hold action-- or --the action is a stretch, compress, or hold action--.
As per claim 12, line 9, “the action a stretch, compress, or hold action” should be changed to either --the action includes a stretch, compress, or hold action-- or --the action is a stretch, compress, or hold action--.
As per claim 19, lines 9-20, “the action a stretch, compress, or hold action” should be changed to either --the action includes a stretch, compress, or hold action-- or --the action is a stretch, compress, or hold action--.


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Fang et al. (Reinforcement learning for bandwidth estimation and congestion control in real-time communications, arXiv, 6 pages, December 4, 2019) (hereinafter “Fang”) in view of Ma et al. (US 2020/0162535) (hereinafter “Ma”).
Regarding claim 1, in accordance with the Fang reference in its entirety, Fang discloses a device for controlling jitter-buffer delay in a media streaming session (a real-time communications (RTC) system is disclosed in the Abstract and thereinafter), the device comprising:
identifying a jitter buffer state (bandwidth estimation and congestion control) of a jitter buffer; the jitter buffer storing media data of an ongoing media streaming session taking place over a network, the media data (RTP packets) stored in the jitter buffer prior to processing of the media data (RTP packets), the jitter buffer state (bandwidth estimation and congestion control) comprising an indicator of a delay (e.g., jitter-buffer control, packet loss resiliency, video encoding, etc) in processing of media data in the jitter buffer (page 2, section 2 R3Net: An Initial Approach: “In RL the formulation of the problem in terms of states, actions, and reward is crucial. In RTC, the ultimate reward is to deliver excellent QoE to end users, although the actions that can be taken to achieve this can vary widely. In our present work, we focus on a subset of the problem (bandwidth estimation and congestion control), but we note that there are numerous other sub-problems in RTC can naturally be posed as RL problems (e.g., jitter-buffer control, packet loss resiliency, video encoding, etc).” And section 2.2 RL formulation: "To estimate bandwidth, the receiver can use the incoming RTP packets and the measured roundtrip time (RTT). In our formulation, we use the aggregated RTP and RTT information in a fixed time-window of 50 ms as the environment state, and estimated bandwidth as the agent’s action. The environment updates the next state and reward based on the input action. State and Action: The state is a 4-dimension vector consisting of receive rate (kb/s), average packet interval (ms), packet loss rate (%), and average RTT (ms)");
identifying a network delay (RTT) of the network (page 2, section 2.2 RL formulation: "To estimate bandwidth, the receiver can use the incoming RTP packets and the measured roundtrip time (RTT). In our formulation, we use the aggregated RTP and RTT information in a fixed time-window of 50 ms as the environment state, and estimated bandwidth as the agent’s action. The environment updates the next state and reward based on the input action. State and Action: The state is a 4-dimension vector consisting of receive rate (kb/s), average packet interval (ms), packet loss rate (%), and average RTT (ms)". And section 2.2 RL formulation: "Reward Design: We define reward per 50ms time step as 0.6 ln(4R + 1) - D - 10L, where R is receive rate in that time step, in Mb/s, D is the average RTT in that time step, in seconds, and L is packet loss rate. This means that receiving more packets is rewarded (since this should lead to higher QoE), but delay and packet loss are penalized (since these degrade QoE)");
determining an action for a media frame of media data (RTP packets) in the jitter buffer based upon the jitter buffer state (bandwidth estimation and congestion control) and the network delay (RTT), the action a stretch, compress, or hold action (page 2, section 2 R3Net: An Initial Approach: "an RL agent might control all actions in an RTC system, continuously improving QoE in an online manner." And section 2.2 RL formulation: "In our formulation, we use the aggregated RTP and RTT information in a fixed time-window of 50 ms as the environment state, and estimated bandwidth as the agent’s action. The environment updates the next state and reward based on the input action").
It appears that Fang fails to explicitly disclose the details of the device comprising a computer processor and a memory storing instructions which, when executed by the computer processor, cause the computer processor to perform the operations, and the functional limitation of “determining a playback duration of the media frame based upon the action.” However, such limitations, although lacking from Fang’s teaching, are well-known in the art and taught by Ma.
In an analogous art in the same field of endeavor, Ma teaches methods and apparatus for learning based adaptive real-time streaming comprising a device (FIG. 5 and its corresponding description in paragraph [0045]) having a computer processor (520); a memory (510), storing instructions (515a), which when executed by the computer processor (520) cause the computer processor (520) to perform at least the functional limitation of “determining a playback duration of the media frame based upon the action” (Ma; para [0015]: "two categories of states S, including the network QoS and the playback status are provided to the agent from the environment. For example, the network QoS parameters comprise the round-trip time (RTT), the received bitrate, the packet loss rate, the retransmission packet count and so on. The play back status includes the received frame rate, the maximum received frame interval and the minimum received frame interval." And para [0031]: "Two categories of states S, including the network QoS (such as the round-trip time (RTT), the received bitrate, the packet loss rate, the retransmission packet count) and the playback status (such as received frame rate, maximum received frame interval, and minimum received frame interval) are provided to the agent 301/302/303/304 by the environment 321." And paragraph [0037]: "The training set used for simulation is obtained by simulating real video streaming processes to get state observations (i.e., the network QoS and the playback status) over various patterns of network environment ... playback status can be obtained.").
Thus, it would have been obvious to a person having ordinary skill in the art to which the claimed invention pertains before the effective filing date of the claimed invention to incorporate/combine/implement Ma’s device having the above discussed details and function into Fang’s teaching to arrive at the claimed invention. A motivation for doing so would be to overcome the prior art’s drawbacks as well as to improve bitrate adaptation, user QoE, and network utilization (Ma; para [0009]).
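For illustration only, the per-step reward quoted from Fang, section 2.2, in the claim 1 analysis can be written out as a short Python sketch. The function and parameter names are hypothetical and do not appear in Fang, and the sketch assumes the packet loss rate L is supplied as a fraction between 0 and 1, which the quoted passage does not state.

```python
import math


def step_reward(receive_rate_mbps: float, avg_rtt_s: float, loss_rate: float) -> float:
    """Reward for one 50 ms step: 0.6 * ln(4R + 1) - D - 10L.

    R is the receive rate in Mb/s, D is the average RTT in seconds, and
    L is the packet loss rate (assumed here to be a fraction in [0, 1]).
    """
    return 0.6 * math.log(4.0 * receive_rate_mbps + 1.0) - avg_rtt_s - 10.0 * loss_rate


if __name__ == "__main__":
    # A higher receive rate raises the reward; delay and loss lower it.
    print(step_reward(1.0, 0.05, 0.0))
    print(step_reward(1.0, 0.20, 0.02))
```

The logarithmic term rewards throughput with diminishing returns, while the delay and loss terms enter linearly as penalties, consistent with the quoted explanation.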
Regarding claim 2, in addition to features recited in base claim 1 (see rationales discussed above), Fang in view of Ma also discloses wherein the operations of determining the action for the frame of media data in the jitter buffer based upon the jitter buffer state and network delay comprises determining the action for the frame of media data based upon past jitter buffer states, past network states, past actions (Fang; pages 2-3, section 2.2 RL formulation: "State and Action: The state is a 4-dimension vector consisting of receive rate (kb/s), average packet interval (ms), packet loss rate (%), and average RTT (ms). We further scale the state to produce inputs with the same order of magnitude to the neural network. We use sigmoid activation as the last layer of the network, yielding outputs in the (0, 1) range. We then map the output to (0, 8) Mb/s as the bandwidth estimate, corresponding to an appropriate range for our RTC application." And section 2.1 Simulator: the collection of training data and real-time continuous updating of the model needs to be considered.).
Thus, it would have been obvious to a person having ordinary skill in the art to which the claimed invention pertains before the effective filing date of the claimed invention to incorporate/combine/implement Ma’s device having the above discussed details and function into Fang’s teaching to arrive at the claimed invention for the same rationales discussed above.
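For illustration only, the output mapping quoted from Fang, section 2.2, in the claim 2 analysis (a sigmoid output in the (0, 1) range mapped to a bandwidth estimate in the (0, 8) Mb/s range) can be sketched as follows. The names are hypothetical, and the linear scaling is an assumption; the quoted passage does not say how the mapping is performed.

```python
import math

MAX_BANDWIDTH_MBPS = 8.0  # upper end of the (0, 8) Mb/s range quoted from Fang


def sigmoid(x: float) -> float:
    """Numerically stable logistic function with outputs in (0, 1)."""
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def bandwidth_estimate_mbps(pre_activation: float) -> float:
    """Map the last-layer pre-activation to a bandwidth estimate in Mb/s.

    Assumption: the (0, 1) sigmoid output is scaled linearly to (0, 8) Mb/s.
    """
    return MAX_BANDWIDTH_MBPS * sigmoid(pre_activation)


if __name__ == "__main__":
    for z in (-5.0, 0.0, 5.0):
        print(z, bandwidth_estimate_mbps(z))
```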
Regarding claim 3, in addition to features recited in base claim 1 (see rationales discussed above), Fang in view of Ma also discloses wherein the jitter buffer state comprises a current jitter buffer delay, current received frames in the jitter buffer, total delay of the media frame, whether the media frame is concealed or not, whether the media frame is newly received, and a previously taken action (Fang; page 2, section 2 R3Net: An Initial Approach: "In our present work, we focus on a subset of the problem (bandwidth estimation and congestion control), but we note that there are numerous other sub-problems in RTC can naturally be posed as RL problems (e.g., jitter-buffer control, packet loss resiliency, video encoding, etc). Eventually, an RL agent might control all actions in an RTC system, continuously improving QoE in an online manner." And section 2.1 Simulator: the collection of training data and real-time continuous updating of the model needs to be considered.).

Regarding claim 4, in addition to features recited in base claim 1 (see rationales discussed above), Fang in view of Ma also discloses wherein the operations of determining the action for the frame of media data in the jitter buffer based upon the jitter buffer state and network delay comprises determining the action by using a model trained using past network traces of previous media streaming sessions (Fang; page 2, section 2.1 Simulator: "In RL training, it is common to use a simulation environment to speed up training, allowing agents to learn from vast numbers of observations before they are deployed into their target environment. For RTC, this means that a realistic simulation of Internet and application performance is required. Training may also be done online, with observations being collected from a full-scale RTC system, even learning from production calls. In the latter case, the collection of training data and real-time continuous updating of the model needs to be considered. We can train an RL model either in the real RTC process or in a simulator. As an initial approach, we use a simulator that can mimic the RTC process, but runs 1000x faster than real-time, to speed up training. Our simulator consists of the caller and callee RTC endpoints connected by a simulated network link; we can also simulate cross traffic (e.g., from TCP senders). The network simulator uses trace-replay-based simulation to control the parameters of the bottleneck link (including capacity, delay, and packet loss) in a discrete event simulation.").
Thus, it would have been obvious to a person having ordinary skill in the art to which the claimed invention pertains before the effective filing date of the claimed invention to incorporate/combine/implement Ma’s device having the above discussed details and function into Fang’s teaching to arrive at the claimed invention for the same rationales discussed above.
Regarding claim 5, in addition to features recited in base claim 4 (see rationales discussed above), Fang in view of Ma also discloses training the model by: simulating a jitter buffer using the past network traces; producing a training action using the model for the simulated jitter buffer; producing an estimated value of the training action using a second model; and modifying the model based upon the estimated value and a reward signal (Fang; page 2, section 2.1 Simulator: "In RL training, it is common to use a simulation environment to speed up training, allowing agents to learn from vast numbers of observations before they are deployed into their target environment. For RTC, this means that a realistic simulation of Internet and application performance is required. Training may also be done online, with observations being collected from a full-scale RTC system, even learning from production calls. In the latter case, the collection of training data and real-time continuous updating of the model needs to be considered. We can train an RL model either in the real RTC process or in a simulator. As an initial approach, we use a simulator that can mimic the RTC process, but runs 1000x faster than real-time, to speed up training. Our simulator consists of the caller and callee RTC endpoints connected by a simulated network link; we can also simulate cross traffic (e.g., from TCP senders). The network simulator uses trace-replay-based simulation to control the parameters of the bottleneck link (including capacity, delay, and packet loss) in a discrete event simulation." And page 3, section 2.3 Model and Training: "The input of the neural network is a time series, representing the state of the path between sender and receiver over time. The history information has impact on the estimated bandwidth (e.g., increasing RTT may mean previous bandwidth estimates were too large). Thus, we use a recurrent neural network with Gated Recurrent Units (GRUs) [3] to estimate bandwidth, as shown in Figure 1. For the leaky ReLU layer, we use the negative slope of 0.01. For the rest of the paper, we refer to this neural network as R3Net (RL-based Recurrent Network for RTC). We train R3Net using an actor-critic framework, where the actor and critic share the first few layers. The model is updated using Proximal Policy Optimization (PPO) [15] and the Adam optimizer with a learning rate of 3 x 10^-5, implemented using PyTorch, based on DeepRL [17]. We used around 10,000 network traces for simulation in training, and tested on 1150 different network traces." In addition, Figure 1 also depicts the R3Net structure (RL-based Recurrent Network for RTC) having all features explained in the Legend).
Thus, it would have been obvious to a person having ordinary skill in the art to which the claimed invention pertains before the effective filing date of the claimed invention to incorporate/combine/implement Ma’s device having the above discussed details and function into Fang’s teaching to arrive at the claimed invention for the same rationales discussed above.

Regarding claim 6, in addition to features recited in base claim 5 (see rationales discussed above), Fang in view of Ma also discloses wherein the model and second model share a layer (Fang; page 3; Figure 1 depicts the R3Net structure (RL-based Recurrent Network for RTC) having all features per layer, as well as a shared layer, as explained in the Legend).
Thus, it would have been obvious to a person having ordinary skill in the art to which the claimed invention pertains before the effective filing date of the claimed invention to incorporate/combine/implement Ma’s device having the above discussed details and function into Fang’s teaching to arrive at the claimed invention for the same rationales discussed above.
Regarding claim 7, in addition to features recited in base claim 4 (see rationales discussed above), Fang in view of Ma also discloses wherein the model comprises at least two layers wherein at least one layer includes a leaky rectified linear unit activation function (Fang; page 3; Figure 1 depicts the R3Net structure (RL-based Recurrent Network for RTC) having all features per layer, as well as a shared layer, as explained in the Legend, including Leaky ReLU. In addition, page 3, section 2.3 Model and Training: "The input of the neural network is a time series, representing the state of the path between sender and receiver over time. The history information has impact on the estimated bandwidth (e.g., increasing RTT may mean previous bandwidth estimates were too large). Thus, we use a recurrent neural network with Gated Recurrent Units (GRUs) [3] to estimate bandwidth, as shown in Figure 1. For the leaky ReLU layer, we use the negative slope of 0.01. For the rest of the paper, we refer to this neural network as R3Net (RL-based Recurrent Network for RTC). We train R3Net using an actor-critic framework, where the actor and critic share the first few layers. The model is updated using Proximal Policy Optimization (PPO) [15] and the Adam optimizer with a learning rate of 3 x 10^-5, implemented using PyTorch, based on DeepRL [17]. We used around 10,000 network traces for simulation in training, and tested on 1150 different network traces.").
Thus, it would have been obvious to a person having ordinary skill in the art to which the claimed invention pertains before the effective filing date of the claimed invention to incorporate/combine/implement Ma’s device having the above discussed details and function into Fang’s teaching to arrive at the claimed invention for the same rationales discussed above.
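For illustration only, the leaky rectified linear unit activation referenced in the claim 7 analysis, with the negative slope of 0.01 quoted from Fang, section 2.3, can be sketched in plain Python. The function name is hypothetical; Fang states that its model is implemented using PyTorch, and this scalar version only shows the arithmetic.

```python
def leaky_relu(x: float, negative_slope: float = 0.01) -> float:
    """Return x for non-negative inputs and negative_slope * x otherwise."""
    return x if x >= 0.0 else negative_slope * x


if __name__ == "__main__":
    # Positive inputs pass through; negative inputs are scaled by 0.01.
    print([leaky_relu(v) for v in (-3.0, -0.5, 0.0, 2.0)])
```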
Regarding claim 8, in addition to features recited in base claim 4 (see rationales discussed above), Fang in view of Ma also discloses wherein the model comprises: a first layer that comprises a leaky rectified linear unit (ReLu) activation function; a second layer that comprises a leaky ReLu activation function; a third layer that comprises a gated recurrent unit (GRU); a fourth layer that comprises a leaky ReLu activation function; and a fifth layer implementing a soft-max function (Fang; page 3; Figure 1 depicts the R3Net structure (RL-based Recurrent Network for RTC) having all features per layer, as well as a shared layer, as explained in the Legend, including Leaky ReLU. And page 3, section 2.3 Model and Training: "The input of the neural network is a time series, representing the state of the path between sender and receiver over time. The history information has impact on the estimated bandwidth (e.g., increasing RTT may mean previous bandwidth estimates were too large). Thus, we use a recurrent neural network with Gated Recurrent Units (GRUs) [3] to estimate bandwidth, as shown in Figure 1. For the leaky ReLU layer, we use the negative slope of 0.01. For the rest of the paper, we refer to this neural network as R3Net (RL-based Recurrent Network for RTC). We train R3Net using an actor-critic framework, where the actor and critic share the first few layers. The model is updated using Proximal Policy Optimization (PPO) [15] and the Adam optimizer with a learning rate of 3 x 10^-5, implemented using PyTorch, based on DeepRL [17]. We used around 10,000 network traces for simulation in training, and tested on 1150 different network traces.").
Thus, it would have been obvious to a person having ordinary skill in the art to which the claimed invention pertains before the effective filing date of the claimed invention to incorporate/combine/implement Ma’s device having the above discussed details and function into Fang’s teaching to arrive at the claimed invention for the same rationales discussed above.
Regarding claim 9, in addition to features recited in base claim 1 (see rationales discussed above), Fang in view of Ma also discloses wherein the operations further comprise playing back the media frame at the playback duration (Ma; para [0015]: "two categories of states S, including the network QoS and the playback status are provided to the agent from the environment. For example, the network QoS parameters comprise the round-trip time (RTT), the received bitrate, the packet loss rate, the retransmission packet count and so on. The play back status includes the received frame rate, the maximum received frame interval and the minimum received frame interval." And para [0031]: "Two categories of states S, including the network QoS (such as the round-trip time (RTT), the received bitrate, the packet loss rate, the retransmission packet count) and the playback status (such as received frame rate, maximum received frame interval, and minimum received frame interval) are provided to the agent 301/302/303/304 by the environment 321." And paragraph [0037]: "The training set used for simulation is obtained by simulating real video streaming processes to get state observations (i.e., the network QoS and the playback status) over various patterns of network environment ... playback status can be obtained.").
Thus, it would have been obvious to a person having ordinary skill in the art to which the claimed invention pertains before the effective filing date of the claimed invention to incorporate/combine/implement Ma’s device having the above discussed details and function into Fang’s teaching to arrive at the claimed invention for the same rationales discussed above.
Regarding claim 10, in addition to features recited in base claim 1 (see rationales discussed above), Fang in view of Ma also discloses wherein the operations of determining the action for the frame of media data in the jitter buffer based upon the jitter buffer state and network delay comprises using a machine-learned model, and wherein the operations further comprise: producing an estimated value of the action using a second model; and modifying the model based upon the estimated value and a reward signal (Fang; page 3, section 2.3 Model and Training: "The input of the neural network is a time series, representing the state of the path between sender and receiver over time. The history information has impact on the estimated bandwidth (e.g., increasing RTT may mean previous bandwidth estimates were too large). Thus, we use a recurrent neural network with Gated Recurrent Units (GRUs) [3] to estimate bandwidth, as shown in Figure 1. For the leaky ReLU layer, we use the negative slope of 0.01. For the rest of the paper, we refer to this neural network as R3Net (RL-based Recurrent Network for RTC). We train R3Net using an actor-critic framework, where the actor and critic share the first few layers. The model is updated using Proximal Policy Optimization (PPO) [15] and the Adam optimizer with a learning rate of 3 x 10^-5, implemented using PyTorch, based on DeepRL [17]. We used around 10,000 network traces for simulation in training, and tested on 1150 different network traces.").
Thus, it would have been obvious to a person having ordinary skill in the art to which the claimed invention pertains before the effective filing date of the claimed invention to incorporate/combine/implement Ma’s device having the above discussed details and function into Fang’s teaching to arrive at the claimed invention for the same rationales discussed above.
Regarding claim 11, in addition to features recited in base claim 1 (see rationales discussed above), Fang in view of Ma also discloses wherein the media data comprises audio data, video data, or both audio and video data (RTP packets) (Fang; page 2, section 2 R3Net: An Initial Approach: “We take a receiver side approach to bandwidth estimation in RTC calls, using incoming RTP packets to estimate available bandwidth on the path between sender and receiver.” It is inherent that Real-time Transport Protocol (RTP) packets include both audio and video data).
As per group claims 12-18, the claims appear to call for a method having limitations that variously and essentially mirror the functional limitations of apparatus claims 1-5, 5 and 11, respectively. Thus, they are deemed obvious over Fang in view of Ma for the same rationales applied to claims 1-5 and 11 as discussed above.


Conclusion
The prior/related art made of record and not relied upon is considered pertinent to applicant's disclosure.
Fang et al. (US 2021/0012227).
Taylor et al. (US 2021/0158141).
Singh et al. (US 2020/0074296).
Haggerty (US 10,965,806).
Pugaczewski (US 2020/0044955).
Jay et al., A Deep Reinforcement Learning Perspective on Internet Congestion Control, downloadable at http://proceedings.mlr.press/v97/jay19a/jay19a.pdf,  10 pages, 2019.
Mnih et al., Asynchronous Methods for Deep Reinforcement Learning, arXiv, 19 pages, June 16, 2016.
Creusen et al., Control of jitter buffer size using machine learning, Technical Disclosure Commons, 7 pages, December 6, 2017.
Tian et al., Deeplive: QoE Optimization for Live Video Streaming through Deep Reinforcement Learning, EasyChair Preprint, No. 2149, 6 pages, December 12, 2019.
Huang et al., QARC: Video Quality Aware Rate Control for Real-Time Video Streaming via Deep Reinforcement Learning, arXiv, 10 pages, October 27, 2018.
Bhattacharyya et al., QFlow: A Reinforcement Learning Approach to High QoE Video Streaming over Wireless Networks, AC, 10 pages, July 5, 2019.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to FRANK DUONG whose telephone number is (571)272-3164. The examiner can normally be reached 7:00AM-3:30PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, MICHAEL THIER, can be reached at 571-272-2832. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 
Applicant is encouraged to submit a written authorization for Internet communications (PTO/SB/439, http://www.uspto.gov/sites/default/files/documents/sb0439.pdf) in the instant patent application to authorize the examiner to communicate with the applicant via email. The authorization will allow the examiner to better practice compact prosecution. The written authorization can be submitted via one of the following methods only: (1) Central Fax which can be found in the Conclusion section of this Office action; (2) regular postal mail; (3) EFS WEB; or (4) the service window on the Alexandria campus. EFS web is the recommended way to submit the form since this allows the form to be entered into the file wrapper within the same day (system dependent). Written authorization submitted via other methods, such as direct fax to the examiner or email, will not be accepted. See MPEP § 502.03.





/FRANK DUONG/
Primary Examiner, Art Unit 2474
February 16, 2022