DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant’s arguments with respect to claim(s) 1-4, 7-14, and 17-21 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 2, 11, 12, and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Weiss (US20140095258) in view of Francis (US20190025917).

Regarding claim 2, Weiss teaches a system for determining incentive distribution (Weiss: Paragraph [0184] “The system described above may, in some embodiments, provide incentives to consumers in exchange for participating in market research.” Determining incentive distribution is taught as provide incentives to consumers.), the system comprising: one or more processors (Weiss: Paragraph [0048] “deployed in part or in whole through a machine that executes computer software, program codes, and/or instructions on a processor.” One or more processors is taught as machine that executes computer software, program codes, and/or instructions on a processor.); 
and a memory storing instructions that, when executed by the one or more processors, cause the system to perform (Weiss: Paragraph [0048] “The processor may include memory that stores methods, codes, instructions and programs as described herein and elsewhere.” A memory storing instructions that, when executed by the one or more processors, cause the system to perform is taught as processor may include memory that stores methods, codes, instructions and programs as described herein and elsewhere.): obtaining feature information of entities, the feature information characterizing features of individual entities (Weiss: Paragraph [0024] “the system 108 may acquire information from at least one data source external to the system 108. The information acquired from the at least one data source may be any suitable information, as embodiments are not limited in this respect. In some cases, the information may include information regarding the consumer 102, regarding an inferred characteristic, and/or regarding a commercial entity or a product or service offered by a commercial entity. For example, in response to inferring a characteristic of the consumer 102, the system 108 may obtain social networking data provided by a consumer to a social networking service or that relates to the consumer 102.” Obtaining feature information for entities, the feature information characterizing features of individual entities is taught as the information may include information regarding the consumer, regarding an inferred characteristic, and/or regarding a commercial entity or a product or service offered by a commercial entity. 
For example, in response to inferring a characteristic of the consumer, the system may obtain social networking data provided by a consumer to a social networking service or that relates to the consumer.);… determining return metric from providing the individual incentives… to the individual entities based on the predicted returns and costs of the individual incentives (Weiss: Paragraph [0029] “In addition to or as an alternative to characteristics of the consumer, the system 108 may determine the reward to offer the consumer 102 based on one or more metrics regarding the consumer's performance of one or more tasks requested by the system 108. Such metrics may relate to a quality of the consumer's performance of the task for which the reward is offered or tasks previously performed. Examples of such metrics include a timeliness of the performance or a usefulness of information provided to the system 108 by the consumer 102 as a part of performing the task.” Determining return metric from providing the individual incentives to the individual entities based on the predicted returns is taught as the system may determine the reward to offer the consumer based on one or more metrics regarding the consumer's performance of one or more tasks requested by the system.); and identifying a set of incentives to be provided to one or more of the entities based on the return metric (Weiss: Paragraph [0029] “In addition to or as an alternative to characteristics of the consumer, the system 108 may determine the reward to offer the consumer 102 based on one or more metrics regarding the consumer's performance of one or more tasks requested by the system 108. 
Such metrics may relate to a quality of the consumer's performance of the task for which the reward is offered or tasks previously performed.” Identifying a set of incentives to be provided to one or more of the entities based on the return metric is taught as may determine the reward to offer the consumer based on one or more metrics regarding the consumer's performance of one or more tasks.).
Weiss does not explicitly disclose … training, using reinforcement learning, a deep-Q network based on historical data for predicting a reward from providing a given incentive to a given entity, wherein: the historical data comprise transition information of the entities, and a last layer of the deep-Q network represents individual incentives and a null incentive, wherein the null incentive represents no incentive to be provided; feeding the feature information of each of the entities into the deep-Q network to obtain predicted returns from providing the individual incentives and the null incentive to the entity;… and the null incentive
Francis further teaches … training, using reinforcement learning (Francis: Paragraph [0133] “The theory of ‘reinforcement learning’ formulates the environment as a Markov Decision Process. Given an environment and the current state of the actor (animals or automata) in the environment, RL suggests that the actor chooses an action not only to maximize its immediate expected reward but also its future expected rewards.” Training, using reinforcement learning is taught as using ‘reinforcement learning’ to predict the actor's next action.), a deep-Q network (Francis: Paragraph [0135] “The state-action value, Qπ(s,a), is the expected return starting from state ‘s’ given that the RL agent executes the action ‘a’ in state ‘s’ under a policy. Specifically, we use an ε-greedy policy as the actor and the Q learning paradigm, augmented with Eligibility Trace Q(λ), as the actor's update rule.” A deep-Q network is taught as the RL (reinforcement learning) system that uses the Q learning paradigm.) based on historical data for predicting a reward from providing a given incentive to a given entity (Francis: Paragraph [0181] “In an embodiment of the architecture, r is the class label predicted by a reward classifier (critic) whose input is the M1 neural activity. Specifically, when population firing is classified as rewarding, r is set to 1, whereas when the neural activity is classified as non-rewarding, r is set to −1. As such, a classifier outputs a binary evaluative measure by decoding the neural signal, which critiques the executed action.” Based on historical data for predicting a reward from providing a given incentive to a given entity is taught as a class label predicted by a reward classifier whose input is the M1 neural activity to evaluate a measure of the neural signal to critique the executed action. 
Further refer to paragraph [0211].), wherein: the historical data comprise transition information of the entities (Francis: Paragraph [0133] “Given an environment and the current state of the actor (animals or automata) in the environment, RL suggests that the actor chooses an action not only to maximize its immediate expected reward but also its future expected rewards. The term environment in our case includes the neural activation patterns from M1.” The historical data comprise transition information of the entities is taught as given the current state of the actor predicting the future actions and rewards.), and a last layer of the deep-Q network represents individual incentives and a null incentive, wherein the null incentive represents no incentive to be provided (Francis: Paragraph [0111] “The goal of the actor (SAC-BMI agent) 102 is to maximize its immediate and future rewards provided by the critic 104. The multisensory feedback to the brain 106 with respect to the action performed results in a critic signal, which is labeled as rewarding or non-rewarding by a classifier.” A last layer of the deep-Q network represents individual incentives and a null incentive, wherein the null incentive represents no incentive to be provided is taught as the reinforcement learning network that uses Q values to predict whether the action depicted by the neural signal is to be rewarded (i.e. incentive) or non-rewarded (i.e. null incentive).); feeding the feature information of each of the entities into the deep-Q network to obtain predicted returns from providing the individual incentives and the null incentive to the entity (Francis: Paragraph [0111] “The goal of the actor (SAC-BMI agent) 102 is to maximize its immediate and future rewards provided by the critic 104. 
The multisensory feedback to the brain 106 with respect to the action performed results in a critic signal, which is labeled as rewarding or non-rewarding by a classifier.” Feeding the feature information of each of the entities into the deep-Q network to obtain predicted returns from providing the individual incentives and the null incentive to the entity is taught as the reinforcement learning network that uses Q values to predict whether the action depicted by the neural signal (i.e. feeding the feature information of each of the entities into the deep-Q network) is to be rewarded (i.e. incentive) or non-rewarded (i.e. null incentive). Specifically, to obtain the predicted returns of the reward or null reward is taught as the reward expectation which is calculated from the learning architecture (refer to paragraph [0009]).);… and the null incentive (Francis: Paragraph [0111] “The goal of the actor (SAC-BMI agent) 102 is to maximize its immediate and future rewards provided by the critic 104. The multisensory feedback to the brain 106 with respect to the action performed results in a critic signal, which is labeled as rewarding or non-rewarding by a classifier.” Null incentive is taught as non-rewarded (i.e. null incentive).).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the consumer analytics system of Weiss with the deep Q-learning network of Francis in order to utilize reinforcement learning based on Q-learning, so that a reward expectation signal extracted from the same neural ensemble could be utilized as an evaluative signal (critic) of the performed action, thereby allowing autonomous improvement (Francis: Paragraph [0009] “reward expectation signal extracted from the same neural ensemble could be utilized as an evaluative signal (critic) of the performed action”).
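For illustration only, the claim language "a last layer of the deep-Q network represents individual incentives and a null incentive" and "feeding the feature information ... to obtain predicted returns" may be pictured with the minimal sketch below. It is not drawn from Weiss, Francis, or the applicant's specification. The layer sizes, the function names `make_q_network` and `predict_returns`, the random untrained weights, and the choice of output index 0 for the null incentive are all assumptions made for this example, and no training step is shown.

```python
import random

NULL_INCENTIVE = 0  # assumed convention: output index 0 means "no incentive"


def make_q_network(n_features, n_hidden, n_incentives, seed=0):
    """Build random weights for a tiny two-layer network (hypothetical sizes).

    The output layer has n_incentives + 1 units: one per individual
    incentive plus one extra unit for the null incentive.
    """
    rng = random.Random(seed)
    w1 = [[rng.uniform(-0.1, 0.1) for _ in range(n_features)]
          for _ in range(n_hidden)]
    w2 = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden)]
          for _ in range(n_incentives + 1)]
    return w1, w2


def predict_returns(network, features):
    """Forward pass: one predicted return per output unit (null + incentives)."""
    w1, w2 = network
    # Hidden layer with ReLU activation.
    hidden = [max(0.0, sum(w * x for w, x in zip(row, features))) for row in w1]
    # Linear last layer: entry i is the predicted return of incentive i.
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]


# Hypothetical entity described by three features, two individual incentives.
network = make_q_network(n_features=3, n_hidden=4, n_incentives=2)
q_values = predict_returns(network, [0.5, 1.0, -0.2])
null_return = q_values[NULL_INCENTIVE]
```

In this sketch the predicted return for withholding an incentive is read from the same output vector as the returns for the individual incentives, which is the structural point of the limitation.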
  
Claim 12 is similarly rejected; refer to claim 2 for further analysis.

Regarding claim 11, Weiss in view of Francis teaches the system of claim 2, Weiss further teaches wherein the entities include at least one passenger of a vehicle (Weiss: Paragraph [0087] “As an example, if the time and distance between points and altitude indicate the consumer is likely traveling in a car, the points obtained during this time could be cross-referenced with the known location of roads.” The entities include at least one passenger of a vehicle is taught as the consumer is likely traveling in a car.) or one driver of the vehicle.
Claim 21 is similarly rejected; refer to claim 11 for further analysis.

Claim 1 is rejected under 35 U.S.C. 103 as being unpatentable over Weiss (US20140095258) in view of Francis (US20190025917) and Carpenter (US20130103476).

Regarding claim 1, Weiss teaches a non-transitory computer-readable storage medium for determining incentive distribution (Weiss: Paragraph [0184] “The system described above may, in some embodiments, provide incentives to consumers in exchange for participating in market research.” Determining incentive distribution is taught as provide incentives to consumers. [0048] “Tangible storage media are non-transitory and have at least one physical, structural component.”), configured with instructions executable by one or more processors (Weiss: Paragraph [0048] “deployed in part or in whole through a machine that executes computer software, program codes, and/or instructions on a processor.” One or more processors is taught as machine that executes computer software, program codes, and/or instructions on a processor.): obtaining feature information of entities, the feature information characterizing features of individual entities (Weiss: Paragraph [0024] “the system 108 may acquire information from at least one data source external to the system 108. The information acquired from the at least one data source may be any suitable information, as embodiments are not limited in this respect. In some cases, the information may include information regarding the consumer 102, regarding an inferred characteristic, and/or regarding a commercial entity or a product or service offered by a commercial entity. For example, in response to inferring a characteristic of the consumer 102, the system 108 may obtain social networking data provided by a consumer to a social networking service or that relates to the consumer 102.” Obtaining feature information for entities, the feature information characterizing features of individual entities is taught as the information may include information regarding the consumer, regarding an inferred characteristic, and/or regarding a commercial entity or a product or service offered by a commercial entity. 
For example, in response to inferring a characteristic of the consumer, the system may obtain social networking data provided by a consumer to a social networking service or that relates to the consumer.);… determining return metric from providing the individual incentives … to the individual entities based on the predicted returns (Weiss: Paragraph [0029] “In addition to or as an alternative to characteristics of the consumer, the system 108 may determine the reward to offer the consumer 102 based on one or more metrics regarding the consumer's performance of one or more tasks requested by the system 108. Such metrics may relate to a quality of the consumer's performance of the task for which the reward is offered or tasks previously performed. Examples of such metrics include a timeliness of the performance or a usefulness of information provided to the system 108 by the consumer 102 as a part of performing the task.” Determining return metric from providing the individual incentives to the individual entities based on the predicted returns is taught as the system may determine the reward to offer the consumer based on one or more metrics regarding the consumer's performance of one or more tasks requested by the system.) and costs of the individual incentives (Weiss: Paragraph [0033] “Though, investment may alternatively or additionally be computed in other ways, such as based on the cost to the business of supplying rewards redeemed by consumers.” The costs of the individual incentives is taught as the cost to the business of supplying rewards redeemed by consumers.); and identifying a set of incentives to be provided to one or more of the entities based on the return metric (Weiss: Paragraph [0029] “In addition to or as an alternative to characteristics of the consumer, the system 108 may determine the reward to offer the consumer 102 based on one or more metrics regarding the consumer's performance of one or more tasks requested by the system 108. 
Such metrics may relate to a quality of the consumer's performance of the task for which the reward is offered or tasks previously performed.” Identifying a set of incentives to be provided to one or more of the entities based on the return metric is taught as may determine the reward to offer the consumer based on one or more metrics regarding the consumer's performance of one or more tasks.)  …, wherein identifying the set of incentives includes: identifying incentives with highest return metric for the individual entities (Weiss: Paragraph [0176] “The system may then use the total gross profit to compute a specific Return On Investment (ROI) for the reward, such as by comparing the cost of the reward with the total gross profit. The system may then provide the ROI to a business at which the reward was redeemable to provide the business with a measure of the effective profit the rewards program.” [0177] “calculates an ROI of the type described above, the measure of ROI may be computed individually for each type of reward given out. The ROI on each reward may then be compared with the ROI of one or more other rewards, such as multiple rewards offered by the same business, to determine which reward provides the best ROI.” Identifying incentives with highest return metric for the individual entities is taught as The ROI on each reward may then be compared with the ROI of one or more other rewards, such as multiple rewards offered by the same business, to determine which reward provides the best ROI (i.e. highest return metric)); and selecting the incentives with the highest return metric in an order of highest to lowest return metric (Weiss: Paragraph [0177-0178] “In embodiments in which the system calculates an ROI of the type described above, the measure of ROI may be computed individually for each type of reward given out. 
The ROI on each reward may then be compared with the ROI of one or more other rewards, such as multiple rewards offered by the same business, to determine which reward provides the best ROI. [0178] Determining which reward provides the best ROI may provide an indication of which reward the system should offer more often in the future or otherwise expand on.” The examiner notes selecting the incentives with the highest return metric in an order of highest to lowest return metric is taught as the rewards with the best ROI (i.e. highest return metric) are determined in the system and then favored for use more often in the future than the rewards with lower ROI. Refer to paragraph [0179] for further analysis.) ….

Francis further teaches … training, using reinforcement learning (Francis: Paragraph [0133] “The theory of ‘reinforcement learning’ formulates the environment as a Markov Decision Process. Given an environment and the current state of the actor (animals or automata) in the environment, RL suggests that the actor chooses an action not only to maximize its immediate expected reward but also its future expected rewards.” Training, using reinforcement learning is taught as using ‘reinforcement learning’ to predict the actor's next action.), a deep-Q network (Francis: Paragraph [0135] “The state-action value, Qπ(s,a), is the expected return starting from state ‘s’ given that the RL agent executes the action ‘a’ in state ‘s’ under a policy. Specifically, we use an ε-greedy policy as the actor and the Q learning paradigm, augmented with Eligibility Trace Q(λ), as the actor's update rule.” A deep-Q network is taught as the RL (reinforcement learning) system that uses the Q learning paradigm.) based on historical data for predicting a reward from providing a given incentive to a given entity (Francis: Paragraph [0181] “In an embodiment of the architecture, r is the class label predicted by a reward classifier (critic) whose input is the M1 neural activity. Specifically, when population firing is classified as rewarding, r is set to 1, whereas when the neural activity is classified as non-rewarding, r is set to −1. As such, a classifier outputs a binary evaluative measure by decoding the neural signal, which critiques the executed action.” Based on historical data for predicting a reward from providing a given incentive to a given entity is taught as a class label predicted by a reward classifier whose input is the M1 neural activity to evaluate a measure of the neural signal to critique the executed action. 
Further refer to paragraph [0211].), wherein: the historical data comprise transition information of the entities (Francis: Paragraph [0133] “Given an environment and the current state of the actor (animals or automata) in the environment, RL suggests that the actor chooses an action not only to maximize its immediate expected reward but also its future expected rewards. The term environment in our case includes the neural activation patterns from M1.” The historical data comprise transition information of the entities is taught as given the current state of the actor predicting the future actions and rewards.), and a last layer of the deep-Q network represents individual incentives and a null incentive, wherein the null incentive represents no incentive to be provided (Francis: Paragraph [0111] “The goal of the actor (SAC-BMI agent) 102 is to maximize its immediate and future rewards provided by the critic 104. The multisensory feedback to the brain 106 with respect to the action performed results in a critic signal, which is labeled as rewarding or non-rewarding by a classifier.” A last layer of the deep-Q network represents individual incentives and a null incentive, wherein the null incentive represents no incentive to be provided is taught as the reinforcement learning network that uses Q values to predict whether the action depicted by the neural signal is to be rewarded (i.e. incentive) or non-rewarded (i.e. null incentive).); feeding the feature information of each of the entities into the deep-Q network to obtain predicted returns from providing the individual incentives and the null incentive to the entity (Francis: Paragraph [0111] “The goal of the actor (SAC-BMI agent) 102 is to maximize its immediate and future rewards provided by the critic 104. 
The multisensory feedback to the brain 106 with respect to the action performed results in a critic signal, which is labeled as rewarding or non-rewarding by a classifier.” Feeding the feature information of each of the entities into the deep-Q network to obtain predicted returns from providing the individual incentives and the null incentive to the entity is taught as the reinforcement learning network that uses Q values to predict whether the action depicted by the neural signal (i.e. feeding the feature information of each of the entities into the deep-Q network) is to be rewarded (i.e. incentive) or non-rewarded (i.e. null incentive). Specifically, to obtain the predicted returns of the reward or null reward is taught as the reward expectation which is calculated from the learning architecture (refer to paragraph [0009]).);… and the null incentive (Francis: Paragraph [0111] “The goal of the actor (SAC-BMI agent) 102 is to maximize its immediate and future rewards provided by the critic 104. The multisensory feedback to the brain 106 with respect to the action performed results in a critic signal, which is labeled as rewarding or non-rewarding by a classifier.” Null incentive is taught as non-rewarded (i.e. null incentive).).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the consumer analytics system of Weiss with the deep Q-learning network of Francis in order to utilize reinforcement learning based on Q-learning, so that a reward expectation signal extracted from the same neural ensemble could be utilized as an evaluative signal (critic) of the performed action, thereby allowing autonomous improvement (Francis: Paragraph [0009] “reward expectation signal extracted from the same neural ensemble could be utilized as an evaluative signal (critic) of the performed action”).
Weiss in view of Francis does not explicitly disclose and a budget for a period of time … until a sum of the costs of the selected incentives reaches the budget.
Carpenter further teaches …and a budget for a period of time (Carpenter: Paragraph [0024] “There is a need for campaign offers and rewards systems that enable advertisers to create multiple purchase offers over time and run simultaneous campaigns.” [0044] “An analytics engine, predicts a probability that a user will redeem a given offer and estimates a reward the user will be owed when it redeems the offer and calculates an expected amount of the campaign's budget that will be consumed if the user is presented with that offer.” A budget for a period of time is taught as calculates an expected amount of the campaign's budget. The campaigns occur over time.)… until a sum of the costs of the selected incentives reaches the budget (Carpenter: Paragraph [0038] “that control redemption of rewards in view of a reward budget, and turns off the campaign when a campaign budget has been met.” A sum of the costs of the selected incentives reaches the budget is taught as control redemption of rewards in view of a reward budget and turns off the campaign when a campaign budget has been met.).  
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Weiss and Francis with the incentive campaigns of Carpenter in order to allow improved systems for offering incentive programs and for tracking participation in an incentive program, thereby allowing companies to use rewards and incentives to increase awareness of product offerings, to launch new products, to attract the attention of a newly identified audience (Carpenter: Paragraph [0007] “Companies use rewards and incentives to increase awareness of product offerings, to launch new products, to attract the attention of a newly identified audience”).
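For illustration only, the selection limitation addressed above (identifying the incentive with the highest return metric for each entity, then selecting in order of highest to lowest return metric until the sum of the costs reaches the budget) may be pictured with the minimal sketch below. It is not drawn from Weiss, Francis, Carpenter, or the applicant's specification. The function name `select_incentives`, the sample figures, and in particular the return metric used here (predicted uplift over the null incentive divided by cost) are assumptions made for this example.

```python
def select_incentives(predicted_returns, costs, budget):
    """Greedy, budget-limited selection (hypothetical helper).

    predicted_returns: dict mapping entity -> list of predicted returns,
        where index 0 is the null incentive (assumed convention).
    costs: list of per-incentive costs; costs[0] is the null incentive.
    budget: total spend allowed for the period.
    """
    best_per_entity = []
    for entity, returns in predicted_returns.items():
        baseline = returns[0]  # predicted return with no incentive
        # Assumed return metric: uplift over the null incentive per unit cost.
        candidates = [((returns[i] - baseline) / costs[i], i)
                      for i in range(1, len(returns)) if costs[i] > 0]
        if not candidates:
            continue
        metric, incentive = max(candidates)
        if metric > 0:
            best_per_entity.append((metric, entity, incentive))

    # Select from highest to lowest metric until the budget is reached.
    best_per_entity.sort(reverse=True)
    selected, spent = {}, 0.0
    for metric, entity, incentive in best_per_entity:
        if spent + costs[incentive] > budget:
            break
        selected[entity] = incentive
        spent += costs[incentive]
    return selected, spent


# Hypothetical figures: three entities, two incentives plus the null incentive.
predicted_returns = {
    "a": [1.0, 3.0, 4.0],
    "b": [2.0, 2.5, 2.2],
    "c": [0.5, 2.5, 5.5],
}
costs = [0.0, 1.0, 2.0]
selected, spent = select_incentives(predicted_returns, costs, budget=3.0)
```

With these figures, entity "c" is selected first (incentive 2), then entity "a" (incentive 1), at which point the budget of 3.0 is reached and entity "b" receives the null incentive by default.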

Claims 3, 4, 13, and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Weiss (US20140095258) in view of Francis (US20190025917) and Carpenter (US20130103476).

Regarding claim 3, Weiss in view of Francis teaches the system of claim 2, Weiss further teaches wherein the set of incentives (Weiss: Paragraph [0029] “In addition to or as an alternative to characteristics of the consumer, the system 108 may determine the reward to offer the consumer 102 based on one or more metrics regarding the consumer's performance of one or more tasks requested by the system 108. Such metrics may relate to a quality of the consumer's performance of the task for which the reward is offered or tasks previously performed.” Identifying a set of incentives to be provided to one or more of the entities based on the return metric is taught as may determine the reward to offer the consumer based on one or more metrics regarding the consumer's performance of one or more tasks.) is…
	Weiss in view of Francis does not explicitly disclose identified further based on a budget for a period of time.
Carpenter further teaches identified further based on a budget for a period of time (Carpenter: Paragraph [0024] “There is a need for campaign offers and rewards systems that enable advertisers to create multiple purchase offers over time and run simultaneous campaigns.” [0044] “An analytics engine, predicts a probability that a user will redeem a given offer and estimates a reward the user will be owed when it redeems the offer and calculates an expected amount of the campaign's budget that will be consumed if the user is presented with that offer.” A budget for a period of time is taught as calculates an expected amount of the campaign's budget. The campaigns occur over time.).  
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Weiss and Francis with the incentive campaigns of Carpenter in order to allow improved systems for offering incentive programs and for tracking participation in an incentive program, thereby allowing companies to use rewards and incentives to increase awareness of product offerings, to launch new products, to attract the attention of a newly identified audience (Carpenter: Paragraph [0007] “Companies use rewards and incentives to increase awareness of product offerings, to launch new products, to attract the attention of a newly identified audience”).
Claim 13 is similarly rejected; refer to claim 3 for further analysis.

Regarding claim 4, Weiss in view of Francis and Carpenter teaches the system of claim 3, wherein identifying the set of incentives (Weiss: Paragraph [0029] “In addition to or as an alternative to characteristics of the consumer, the system 108 may determine the reward to offer the consumer 102 based on one or more metrics regarding the consumer's performance of one or more tasks requested by the system 108. Such metrics may relate to a quality of the consumer's performance of the task for which the reward is offered or tasks previously performed.” Identifying a set of incentives to be provided to one or more of the entities based on the return metric is taught as may determine the reward to offer the consumer based on one or more metrics regarding the consumer's performance of one or more tasks.) includes: identifying incentives with highest return metric for the individual entities (Weiss: Paragraph [0176] “The system may then use the total gross profit to compute a specific Return On Investment (ROI) for the reward, such as by comparing the cost of the reward with the total gross profit. The system may then provide the ROI to a business at which the reward was redeemable to provide the business with a measure of the effective profit the rewards program.” [0177] “calculates an ROI of the type described above, the measure of ROI may be computed individually for each type of reward given out. The ROI on each reward may then be compared with the ROI of one or more other rewards, such as multiple rewards offered by the same business, to determine which reward provides the best ROI.” Identifying incentives with highest return metric for the individual entities is taught as The ROI on each reward may then be compared with the ROI of one or more other rewards, such as multiple rewards offered by the same business, to determine which reward provides the best ROI (i.e. 
highest return metric)); and selecting the incentives with the highest return metric in an order of highest to lowest return metric (Weiss: Paragraph [0177-0178] “In embodiments in which the system calculates an ROI of the type described above, the measure of ROI may be computed individually for each type of reward given out. The ROI on each reward may then be compared with the ROI of one or more other rewards, such as multiple rewards offered by the same business, to determine which reward provides the best ROI. [0178] Determining which reward provides the best ROI may provide an indication of which reward the system should offer more often in the future or otherwise expand on.” The examiner notes selecting the incentives with the highest return metric in an order of highest to lowest return metric is taught as the rewards with the best ROI (i.e. highest return metric) are determined in the system and then favored for use more often in the future than the rewards with lower ROI. Refer to paragraph [0179] for further analysis.) …
Weiss does not explicitly disclose until a sum of the costs of the selected incentives reaches the budget.  
Carpenter further teaches until a sum of the costs of the selected incentives reaches the budget (Carpenter: Paragraph [0038] “that control redemption of rewards in view of a reward budget, and turns off the campaign when a campaign budget has been met.” A sum of the costs of the selected incentives reaches the budget is taught as control redemption of rewards in view of a reward budget and turns off the campaign when a campaign budget has been met.).  
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Weiss and Francis with Carpenter in order to allow improved systems for offering incentive programs and for tracking participation in an incentive program, thereby allowing companies to use rewards and incentives to increase awareness of product offerings, to launch new products, to attract the attention of a newly identified audience (Carpenter: Paragraph [0007] “Companies use rewards and incentives to increase awareness of product offerings, to launch new products, to attract the attention of a newly identified audience”).
Claim 14 is similarly rejected; refer to claim 4 for further analysis.
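
For illustration only, the selection recited in claim 4 (the highest-return incentive for each entity, taken in order from highest to lowest return metric until the summed cost reaches the budget) can be sketched as below. This is a hypothetical sketch and is not drawn from Weiss, Francis, Carpenter, or the application: the names `Incentive` and `select_incentives` are invented, and stopping at the first incentive that would exceed the budget is an assumed reading of "reaches the budget".

```python
from typing import NamedTuple


class Incentive(NamedTuple):
    """Hypothetical record: one candidate incentive for one entity."""
    entity_id: str
    cost: float
    return_metric: float


def select_incentives(candidates, budget):
    """Greedy selection sketch; returns (selected incentives, total cost)."""
    # Step 1: for each entity, keep only its incentive with the highest return metric.
    best = {}
    for cand in candidates:
        current = best.get(cand.entity_id)
        if current is None or cand.return_metric > current.return_metric:
            best[cand.entity_id] = cand

    # Step 2: take incentives from highest to lowest return metric.
    selected, spent = [], 0.0
    ranked = sorted(best.values(), key=lambda inc: inc.return_metric, reverse=True)
    for inc in ranked:
        # Assumption: stop as soon as the next incentive would exceed the budget.
        if spent + inc.cost > budget:
            break
        selected.append(inc)
        spent += inc.cost
    return selected, spent
```

Under these assumptions, an entity with two candidate incentives contributes only the one with the higher return metric, and selection ends once the running cost meets the budget.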

Claims 7-10 and 17-20 are rejected under 35 U.S.C. 103 as being unpatentable over Weiss (US20140095258) in view of Francis (US20190025917) and Osband (US20170032245).

Regarding claim 7, Weiss in view of Francis teaches the system of claim 2. Weiss in view of Francis does not explicitly disclose wherein the training the deep-Q network further comprises: storage of at least a portion of the historical data in a replay memory; sampling of a first dataset of the information stored in the replay memory; and training of the deep-Q network using the sampled first dataset.  
	*The examiner notes that Francis further teaches training a deep-Q network; however, for consistency, the limitations are maintained as rejected by Osband.*
Osband further teaches wherein the training the deep-Q network (Osband: Paragraph [0040] “a Deep Q-learning Network (DQN)” A deep-Q network is taught as a Deep Q-learning Network (DQN).) further comprises: storage of at least a portion of the historical data in a replay memory (Osband: Paragraph [0067] “A set of data including historical and artificial data is obtained (705). In accordance with some embodiments, the observed data is read from a memory. In accordance with some other embodiments, the observed data is received from another system.” Storage of at least a portion of the historical data in a replay memory is taught as a set of data including historical read from a memory.); sampling of a first dataset of the information stored in the replay memory (Osband: Paragraph [0064] “The observed and artificial data are sampled to obtain a training set of data (615). In accordance with some of these embodiments, the training data includes M samples of data.” [0067] “A set of data including historical and artificial data is obtained (705). In accordance with some embodiments, the observed data is read from a memory. In accordance with some other embodiments, the observed data is received from another system.” Sampling of a first dataset of the information stored in the replay memory is taught as obtain a training set of data where the training dataset includes samples of data which are read from a memory.); and training of the deep-Q network using the sampled first dataset (Osband: Paragraph [0064] “The observed and artificial data are sampled to obtain a training set of data (615). In accordance with some of these embodiments, the training data includes M samples of data.” Training of the deep-Q network using the sampled first dataset is taught as obtain a training set of data where the training dataset includes samples of data.).  
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Weiss and Francis with the deep Q-learning Network of Osband in order to utilize Q-learning, thereby improving the stability for a deep learning network using reinforcement learning (Osband: Paragraph [0036] “Several important modifications to the updating process in Q-learning improve stability for a deep learning network using reinforcement learning provided in accordance with some embodiments of the invention.”).
Claim 17 is similarly rejected; refer to claim 7 for further analysis.
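
For illustration only, the three steps recited in claim 7 (storing historical data in a replay memory, sampling a first dataset from it, and training on the sampled dataset) can be sketched as below. This is a hypothetical sketch and is not taken from Osband, Francis, or the application: `ReplayMemory` and `train_on_batch` are invented names, the record layout (state, action, reward, next state) is assumed, and a tabular Q-learning update stands in for training an actual deep-Q network.

```python
import random


class ReplayMemory:
    """Hypothetical replay memory of (state, action, reward, next_state) records."""

    def __init__(self):
        self.records = []

    def store(self, record):
        self.records.append(record)

    def sample(self, batch_size, rng):
        # Uniform sampling without replacement, capped at the memory size.
        return rng.sample(self.records, min(batch_size, len(self.records)))


def train_on_batch(q_values, batch, actions, lr=0.1, gamma=0.9):
    """One Q-learning pass over a sampled batch (tabular stand-in for a network)."""
    for state, action, reward, next_state in batch:
        # Bootstrapped target: reward plus discounted best value of the next state.
        best_next = max(q_values.get((next_state, a), 0.0) for a in actions)
        target = reward + gamma * best_next
        old = q_values.get((state, action), 0.0)
        q_values[(state, action)] = old + lr * (target - old)
    return q_values


# Usage: store historical records, sample a first dataset, train on it.
memory = ReplayMemory()
for _ in range(5):
    memory.store(("s0", "offer", 1.0, "s1"))
first_dataset = memory.sample(3, random.Random(0))
q_table = train_on_batch({}, first_dataset, actions=["offer", "none"])
```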

Regarding claim 8, Weiss in view of Francis and Osband teaches the system of claim 7. Weiss further teaches … characterizing activities of the entities (Weiss: Paragraph [0024] “the system 108 may acquire information from at least one data source external to the system 108. The information acquired from the at least one data source may be any suitable information, as embodiments are not limited in this respect. In some cases, the information may include information regarding the consumer 102, regarding an inferred characteristic, and/or regarding a commercial entity or a product or service offered by a commercial entity. For example, in response to inferring a characteristic of the consumer 102, the system 108 may obtain social networking data provided by a consumer to a social networking service or that relates to the consumer 102.” Obtaining feature information for entities, the feature information characterizing features of individual entities is taught as the information may include information regarding the consumer, regarding an inferred characteristic, and/or regarding a commercial entity or a product or service offered by a commercial entity.) after the set of incentives have been provided to the one or more of the entities (Weiss: Paragraph [0031] “may provide the information about the reward(s) before the consumer 102 performs the task, when the consumer 102 is to be incentivized to perform the task, or after the consumer 102 performs the task.” After the set of incentives have been provided to the one or more of the entities is taught as provide the information about the reward(s) after the consumer performs the task.).  
	*The examiner notes that Francis further teaches training a deep-Q network; however, for consistency, the limitations are maintained as rejected by Osband.*
Osband further teaches wherein the deep-Q network (Osband: Paragraph [0040] “a Deep Q-learning Network (DQN)” A deep-Q network is taught as a Deep Q-learning Network (DQN).) is updated using the transition information of the entities, the transition information (Osband: Paragraph [0071] “The reward, rlt realized and resulting transition state, slt+1 are observed (735). The state, s, the action, a, and the resulting transition state, slt+1 are stored as resulting data in memory.” [0008] “selected by the process and results for the action including a reward and a transition state that result from the selected action are determined.” Updated using transition information for the entities, the transition information is taught as the transition states resulting from an action of a user.).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Weiss and Francis with the deep Q-learning Network of Osband in order to utilize Q-learning, thereby improving the stability for a deep learning network using reinforcement learning (Osband: Paragraph [0036] “Several important modifications to the updating process in Q-learning improve stability for a deep learning network using reinforcement learning provided in accordance with some embodiments of the invention.”).
Claim 18 is similarly rejected; refer to claim 8 for further analysis.

Regarding claim 9, Weiss in view of Francis and Osband teaches the system of claim 8. Osband further teaches wherein the deep-Q network (Osband: Paragraph [0040] “a Deep Q-learning Network (DQN)” A deep-Q network is taught as a Deep Q-learning Network (DQN).) is updated using the transition information of the entities (Osband: Paragraph [0071] “The reward, rlt realized and resulting transition state, slt+1 are observed (735). The state, s, the action, a, and the resulting transition state, slt+1 are stored as resulting data in memory.” [0008] “selected by the process and results for the action including a reward and a transition state that result from the selected action are determined.” Updated using transition information for the entities is taught as the transition states resulting from an action for a user.) based on: storage of at least a portion of the transition information in the replay memory (Osband: Paragraph [0064] “The state, s, the action, a, and the resulting transition state, slt+1 are stored as resulting data in memory.” Storage of at least a portion of the transition information in the replay memory is taught as the state, s, the action, a, and the resulting transition state, slt+1 are stored as resulting data in memory), the storage of the at least the portion of the transition information in the replay memory causing at least some of the historical information stored in the replay memory to be removed from the replay memory (Osband: Paragraph [0064] “The state, s, the action, a, and the resulting transition state, slt+1 are stored as resulting data in memory. The selecting (630) and observing of the results are repeated until the time period ends (640). The observed set of data is then updated with the results (650).” The storage of the at least the portion of the transition information in the replay memory causing at least some of the historical information stored in the replay memory to be removed from the replay memory is taught as selecting (630) and observing of the results are repeated until the time period ends and the observed set of data is then updated with the results); sampling of a second dataset of the information stored in the replay memory (Osband: Paragraph [0066] “As such, it is desirable to use incremental methods to incorporate new data sample into the fitting process as the data is generated.” Sampling of a second dataset of the information stored in the replay memory is taught as incorporate new data sample into the fitting process as the data is generated.); and updating of the deep-Q network using the sampled second dataset (Osband: Paragraph [0066] “As such, it is desirable to use incremental methods to incorporate new data sample into the fitting process as the data is generated.” Updating of the deep-Q network using the sampled second dataset is taught as incorporate new data sample into the fitting process as the data is generated by the deep Q-learning model.).  
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Weiss and Francis with the deep Q-learning Network of Osband in order to utilize Q-learning, thereby improving the stability for a deep learning network using reinforcement learning (Osband: Paragraph [0036] “Several important modifications to the updating process in Q-learning improve stability for a deep learning network using reinforcement learning provided in accordance with some embodiments of the invention.”).
Claim 19 is similarly rejected; refer to claim 9 for further analysis.
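
For illustration only, the replay-memory behavior recited in claim 9 (storing new transition information causes some historical information to be removed, after which a second dataset is sampled) can be sketched as below. This is a hypothetical sketch and is not taken from Osband or the application: the function names are invented, and a fixed-capacity buffer with first-in, first-out eviction is an assumption, since the claim language does not say which stored information is removed.

```python
import random
from collections import deque


def make_replay_memory(capacity):
    # A deque with maxlen silently discards its oldest item when a new one
    # is appended to a full buffer (assumed FIFO eviction policy).
    return deque(maxlen=capacity)


def store_transitions(memory, transitions):
    for transition in transitions:
        memory.append(transition)


def sample_dataset(memory, batch_size, rng):
    # Uniform sampling without replacement from whatever is currently stored.
    return rng.sample(list(memory), min(batch_size, len(memory)))


# Usage: historical data fills the memory, then new transitions displace it.
memory = make_replay_memory(capacity=3)
store_transitions(memory, ["hist_1", "hist_2", "hist_3"])
store_transitions(memory, ["trans_1", "trans_2"])
second_dataset = sample_dataset(memory, 2, random.Random(1))
```

With a capacity of three, storing the two new transitions removes the two oldest historical records, so the second dataset is drawn from a mix of remaining historical data and new transition data.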

Regarding claim 10, Weiss in view of Francis and Osband teaches the system of claim 9, wherein updating of the deep-Q network (Osband: Paragraph [0040] “a Deep Q-learning Network (DQN)” A deep-Q network is taught as a Deep Q-learning Network (DQN).) includes a change in a last layer of the deep-Q network (Osband: Paragraph [0004] “One manner of training a deep learning network is referred to as reinforcement learning in which the system takes a sequence of actions in order to maximize cumulative rewards.” [0081] “The convolutional part of the network (a deep learning network that uses reinforcement learning as provided in accordance with an embodiment of the invention) used is identical to the one used in other systems.…The fully connected layers all use Rectified Linear Units (ReLU) as a non-linearity. Gradients 1/K that flow from each head are normalized.” Network includes a change in a last layer of the deep-Q network is taught as the sequence of actions at each time step from the training data which are used to update the layers of the convolutional neural network.).  
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Weiss and Francis with the deep Q-learning Network of Osband in order to utilize Q-learning, thereby improving the stability for a deep learning network using reinforcement learning (Osband: Paragraph [0036] “Several important modifications to the updating process in Q-learning improve stability for a deep learning network using reinforcement learning provided in accordance with some embodiments of the invention.”).
Claim 20 is similarly rejected; refer to claim 10 for further analysis.
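
For illustration only, an update that "includes a change in a last layer" as recited in claim 10 can be sketched as a gradient step applied to the output-layer weights of a small network while the earlier layer is left as it was. This is a hypothetical sketch and is not taken from Osband or the application: the two-layer network, the squared-error objective, and the names `forward` and `update_last_layer` are all assumptions, and the claim does not require that the earlier layers stay fixed.

```python
def forward(hidden_weights, output_weights, inputs):
    """Two-layer network: ReLU hidden layer followed by a linear output."""
    hidden = [
        max(0.0, sum(w * x for w, x in zip(row, inputs)))
        for row in hidden_weights
    ]
    prediction = sum(w * h for w, h in zip(output_weights, hidden))
    return hidden, prediction


def update_last_layer(hidden_weights, output_weights, inputs, target, lr=0.1):
    """Return new output-layer weights after one squared-error gradient step.

    The hidden-layer weights are read but never modified (an assumption made
    here to isolate the change to the last layer).
    """
    hidden, prediction = forward(hidden_weights, output_weights, inputs)
    error = prediction - target
    # Gradient of 0.5 * error**2 with respect to each output weight is error * h.
    return [w - lr * error * h for w, h in zip(output_weights, hidden)]


# Usage: one update moves the prediction toward the target.
hidden_w = [[1.0, 0.0], [0.0, 1.0]]
output_w = [0.0, 0.0]
new_output_w = update_last_layer(hidden_w, output_w, [1.0, 2.0], target=1.0)
```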

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to AHSIF A. SHEIKH whose telephone number is (571)272-2607. The examiner can normally be reached Mon-Fri 7:30-5:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/A.A.S./Examiner, Art Unit 2127

/ABDULLAH AL KAWSAR/Supervisory Patent Examiner, Art Unit 2127