Reasons for Allowance
Claims 1-20 are considered allowable because, when the claims are read in light of the specification (MPEP § 2111.01; In re Sneed, 710 F.2d 1544, 1548, 218 USPQ 385, 388 (Fed. Cir. 1983)), none of the references of record, alone or in combination, discloses or suggests the combination of limitations specified in independent claim 1, in which the Applicant recites:
electronically receive, from a video file repository, a video file demonstrating regulatory compliance requirements for one or more users associated with a resource distribution entity, wherein the video file comprises one or more image frames;
transmit control signals configured to cause one or more computing devices of one or more users to display the video file in one or more interactive application environments stored thereon;
initiate, via the one or more interactive application environments, a reinforcement learning algorithm on the video file, wherein implementing further comprises electronically receiving, via the one or more interactive application environments, one or more user inputs from the one or more users providing feedback for at least one or more portions of the video file;
initiate an optimization policy generation engine on the one or more user inputs to generate an optimization policy, wherein the optimization policy generation engine is configured to encode the one or more user inputs into shaping rewards, wherein encoding further comprises assigning a cost to a first portion of one or more image frames associated with one or more negative feedbacks and assigning a reward to a second portion of the one or more image frames associated with one or more positive feedbacks, wherein the first portion and the second portion are associated with at least the one or more portions of the video file;
initiate an implementation of the optimization policy on the video file, wherein initiating further comprises generating a modified video file based on at least the optimization policy to maximize an aggregated reward calculated based on the one or more positive feedbacks;
initiate a validation engine on the modified video file, wherein the validation engine is configured to validate one or more changes implemented on the video file with the one or more user inputs and the optimization policy; and
initiate a deployment of the modified video file to the one or more users.
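For illustration only, the encoding limitation of claim 1 (a cost for image frames tied to negative feedback, a reward for image frames tied to positive feedback, and an aggregated reward over the result) might be sketched as follows. This is not taken from the specification or the claims: the names Feedback, encode_shaping_rewards, select_frames and aggregated_reward, the unit reward and cost values, and the rule of keeping frames with a non-negative net value are all assumptions made for the sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Feedback:
    """Hypothetical user input covering frames [start_frame, end_frame)."""
    start_frame: int
    end_frame: int
    positive: bool


def encode_shaping_rewards(num_frames: int, feedbacks: List[Feedback],
                           reward: float = 1.0, cost: float = 1.0) -> List[float]:
    """Assign a reward to frames with positive feedback and a cost
    (a negative value) to frames with negative feedback."""
    shaping = [0.0] * num_frames
    for fb in feedbacks:
        delta = reward if fb.positive else -cost
        first = max(0, fb.start_frame)
        last = min(num_frames, fb.end_frame)
        for i in range(first, last):
            shaping[i] += delta
    return shaping


def select_frames(shaping: List[float]) -> List[int]:
    """Assumed modification rule: keep frames whose net value is not negative."""
    return [i for i, value in enumerate(shaping) if value >= 0]


def aggregated_reward(shaping: List[float], kept: List[int]) -> float:
    """Sum only the positive contributions over the kept frames."""
    return sum(max(shaping[i], 0.0) for i in kept)


if __name__ == "__main__":
    inputs = [Feedback(0, 4, True), Feedback(6, 9, False)]
    values = encode_shaping_rewards(10, inputs)
    frames = select_frames(values)
    print(frames, aggregated_reward(values, frames))
```

The sketch treats the modified video file as a list of retained frame indices; the claim itself does not limit the modification to frame removal.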
The prior art of Petander (U.S. Pub. No. 20190222895 A1) discloses a system 100, shown in FIG. 1, that uses positive and negative rewards to teach itself when to show videos and which videos to show, on the assumption that the user is willing to watch the right videos if they are shown in suitable situations. Positive and negative in this disclosure are not meant to be limited to positive and negative numbers, respectively. Instead, positive and negative are to be understood in the sense that the rewards (i.e., feedback) influence the decision in a positive or negative way. For example, a positive reward associated with a particular decision outcome means that this decision outcome is marked as preferred. Conversely, a negative reward marks the decision outcome as less desirable. From an algorithmic point of view, positive rewards can be encoded by negative numbers, such as cost, or as positive numbers, such as value. Equally, negative rewards can be encoded by positive numbers, such as cost, or as negative numbers, such as value. When the user is incentivized and the user's goals are aligned with the operator's goals, the assumption that the user is willing to watch can be expected to hold. When the system has an opportunity to show a video, it can choose between two actions: wait or show video. It is noted that the terms ‘wait’ and ‘delay’ are used interchangeably herein, as are the terms ‘show’ and ‘play’. These actions result in the system receiving rewards depending on the outcomes of the action: a positive reward for the user watching the video and a negative reward for the user skipping the video. The system learns from the rewards to choose the action that maximizes the current and future rewards. It is noted here that rewards refer to technical parameters, such as weight values or factors used for calculation, as opposed to commercial rewards of, for example, a reward point scheme.
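The wait-or-show decision described in Petander can be pictured with a minimal sketch. It is not Petander's implementation: it assumes an epsilon-greedy learner that tracks only the average immediate reward per action (Petander also refers to future rewards, which this sketch ignores), a reward of zero for waiting, and the hypothetical names WaitOrShowLearner and outcome_reward.

```python
import random

ACTIONS = ("wait", "show")


class WaitOrShowLearner:
    """Hypothetical learner keeping a running mean reward per action."""

    def __init__(self, epsilon: float = 0.1, seed: int = 0):
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.value = {a: 0.0 for a in ACTIONS}
        self.count = {a: 0 for a in ACTIONS}

    def choose(self) -> str:
        # Explore with probability epsilon, otherwise pick the best estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.value[a])

    def update(self, action: str, reward: float) -> None:
        # Incremental mean of the rewards observed for this action.
        self.count[action] += 1
        self.value[action] += (reward - self.value[action]) / self.count[action]


def outcome_reward(action: str, watched: bool) -> float:
    """Positive reward if a shown video is watched, negative if skipped.
    The zero reward for waiting is an assumption of this sketch."""
    if action == "wait":
        return 0.0
    return 1.0 if watched else -1.0
```

With exploration disabled (epsilon of zero), the learner prefers "show" after a watched video and falls back to "wait" once skipped videos pull the average reward for "show" below zero.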
The prior art of Zhang (U.S. Pub. No. 20200124429 A1) discloses adjusting the policy based on a reward received by applying one or more of the plurality of improvement actions, wherein the reward is calculated based on cost changes after applying the one or more actions. In some embodiments, the reward for applying the one or more improvement actions is a predetermined positive number if the one or more improvement actions reduce the cost of the routing solution or a predetermined negative number if the one or more improvement actions do not reduce the cost of the routing solution. In some embodiments, applying the one or more of the plurality of improvement actions to the state before applying the perturbation action corresponds to a first iteration; and the reward for applying the one or more improvement actions in an i-th iteration comprises a difference between a prior total cost reduction of a prior iteration and a total cost reduction of the i-th iteration.
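The two reward schemes attributed to Zhang can be illustrated with a short sketch. The function names, the default values of plus and minus one, and the sign convention of the second function (current reduction minus prior reduction) are assumptions; the passage above states only that the reward comprises a difference between the two reductions.

```python
def fixed_reward(cost_before: float, cost_after: float,
                 positive: float = 1.0, negative: float = -1.0) -> float:
    """Predetermined positive number if the improvement actions reduce the
    cost of the routing solution, predetermined negative number otherwise."""
    return positive if cost_after < cost_before else negative


def relative_reward(prior_total_reduction: float,
                    current_total_reduction: float) -> float:
    """Difference between the total cost reduction of the i-th iteration and
    that of a prior iteration (sign convention assumed for this sketch)."""
    return current_total_reduction - prior_total_reduction
```

Under these assumptions, an iteration that lowers the routing cost from 100 to 90 earns the fixed positive reward, while an iteration that leaves the cost unchanged earns the fixed negative reward.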
The prior art of Gupta (U.S. Pub. No. 20210142387 A1) discloses adaptively selecting personalized contents for users that help achieve certain goals. For instance, a machine learning model may use artificial intelligence techniques, such as reinforcement learning, to select a set of personalized contents based on feedback (e.g., rewards) received in response to user actions. The rewards assigned to the user actions, which are provided as feedback to the machine learning model, may be adapted to achieve specific business goals.

Independent claims 8 and 15 recite similar limitations and are allowed for reasons similar to those given above.

Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee. Such submissions should be clearly labeled "Comments on Statement of Reasons for Allowance."

Correspondence Information
Any inquiry concerning this communication or earlier communications from the examiner should be directed to KYU CHAE, whose telephone number is (571) 270-5696. The examiner can normally be reached from 8:00 am to 4:30 pm.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, NASSER MOAZZAMI, can be reached at 571-272-4195. The fax number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/KYU CHAE/
Primary Examiner, Art Unit 2426