DETAILED ACTION
This action is in response to the initial filing of Application No. 16/720374 on 12/12/2019.
Claims 1-20 are pending in this application, with claims 1, 12 and 20 being independent.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Allowable Subject Matter
Claims 7, 10 and 18 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims, since it has been determined, based on search and consideration, that the prior art fails to teach or suggest, in reasonable combination, the limitations recited in these claims.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 2, 5, 8, 11, 12, 13, 16, 19 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Reiss et al. (US 2021/0090017) (“Reiss”) in view of Forster et al. (“A Cognitive Computing Approach for Classification of Complaints in the Insurance Industry”) (“Forster”) and further in view of Hu et al. (US 2018/0012153) (“Hu”).
For claims 1 and 12, Reiss discloses a non-transitory computer-readable storage medium for automatically adjusting strategies (Abstract), configured with instructions executable by one or more processors to cause the one or more processors to perform operations (Fig. 8, 804; [0115] [0117]) comprising: receiving a plurality of complaints (negative feedback about an item from a buyer and/or courier), wherein each of the complaints corresponds to an order ([0025] [0046] [0048] [0050 – 0052] [0054] [0064] [0074]); determining and selecting a category from a plurality of categories of complaints based on a number of complaints in the selected category, wherein each of the complaints corresponds to an order (“In some cases, the computational model 532 may apply parsing and text recognition for determining, from particular orders, the item to which the negative feedback applies … consequently, when a threshold level of negative feedback has been received for a category of items from a merchant, the predicted spoilage times for the item or category of items may be adjusted to be a shorter length of time”, [0076]); from a group of strategies each associated with one or more conditions and one or more actions (e.g. variable delivery zones, wherein items available for delivery are based on spoilage time and courier travel time conditions to ensure the quality of delivered items, and/or affinity-based delivery, wherein couriers are selected for merchants based on mutual affinity conditions to improve efficiency, [0014] [0015] [0018 – 0020] [0025] [0026] [0075] [0076] [0085] [0086] [0088]), selecting a candidate strategy causing the complaints of the selected category (“Accordingly, when feedback from a plurality of buyers and/or couriers indicates that a particular item has been delivered in a spoiled or otherwise unsatisfactory condition, and the unsatisfactory condition may be attributed at least in part to the amount of time expended for delivery of the item, the spoilage time for the item may be shortened or otherwise revised”, wherein the candidate strategy is variable delivery zones, in which items available for delivery are based on spoilage time and courier travel time to ensure the quality of delivered items, [0019] [0025] [0026] [0075] [0076] [0086]), wherein the one or more actions are executed in response to the one or more conditions being satisfied (an item is allowed to be bought by a buyer based on whether the spoilage time is less than a courier travel time to the buyer’s delivery location, and/or a courier is selected based on positive affinity between a merchant and the courier, [0019] [0025] [0026] [0075] [0086]); and optimizing the candidate strategy by changing the one or more conditions of the candidate strategy based on a plurality of historical orders (a predicted spoilage time for a category of items for a particular merchant is adjusted to be shorter based on the threshold level of negative feedback from a plurality of historical orders, so that a buyer at a destination is prevented from ordering the category of items based on the spoilage time being less than the available courier travel times to the destination, [0026] [0027] [0029 – 0032] [0040] [0043] [0044] [0049 – 0054] [0062] [0063] [0064] [0075] [0076] [0078] [0080]). Yet, Reiss fails to teach the following: the receiving a plurality of complaints and the determining and selecting a category from a plurality of categories of the complaints further comprise determining one or more characteristics of a plurality of complaints, wherein each of the complaints corresponds to an order, and classifying the plurality of complaints into a plurality of categories based on the one or more characteristics by using a trained classifier; and optimizing the candidate strategy using a reinforcement learning model.
However, Forster discloses a method for classifying complaints (Abstract), wherein features (characteristics) of a plurality of complaints are determined (5.1 Dataset and 5.2 Learning and Testing Procedure), and the plurality of complaints are classified into a plurality of categories based on the one or more characteristics by using a trained classifier (MaxEnt, 3.2 Machine Learning) (2. Problem Definition and 5.2 Learning and Testing Procedure).
Additionally, Hu discloses a system and method for order allocation (Abstract), wherein reinforcement (reinforced) learning is used to continuously optimize algorithms used to process information in an order allocation system (e.g. food ordering system) ([0081 – 0087]).
Therefore, it would have been obvious to one of ordinary skill in the art at the time of applicant’s filing to improve the invention disclosed by Reiss in the same way that Forster’s and Hu’s inventions have been improved, to achieve the following predictable results for the purpose of enabling people to participate as couriers in a crowdsourced service economy and make decisions that can assist in courier retention, while also ensuring customer satisfaction with the service (Reiss, [0001] [0014]): the receiving a plurality of complaints and the determining and selecting a category from a plurality of categories of the complaints further comprise determining one or more characteristics (features) of the plurality of complaints, wherein each of the complaints corresponds to the order, and classifying the complaints into a plurality of categories based on the one or more characteristics by using a trained classifier; and the algorithm(s) used to process information and perform actions associated with the candidate strategy (Reiss, [0078 – 0080]) are further optimized using reinforcement learning.
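The mapping above reads Reiss’s variable delivery zones as a strategy whose condition compares courier travel time with a predicted spoilage time, and reads the feedback-driven shortening of that spoilage time as a change to the condition based on historical orders. A minimal sketch of that reading follows, for illustration only; the class and function names, the minute units, the threshold and the shortening factor are hypothetical and are not taken from Reiss or from the claims.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeliveryZoneStrategy:
    """Hypothetical condition/action strategy modeled on Reiss's variable delivery zones."""

    spoilage_minutes: float  # predicted spoilage time for an item category (assumed unit: minutes)

    def condition(self, courier_travel_minutes: float) -> bool:
        # Condition: the courier can reach the buyer before the item spoils.
        return courier_travel_minutes < self.spoilage_minutes

    def action(self, courier_travel_minutes: float) -> str:
        # Action: offer the item only when the condition is satisfied.
        return "offer_item" if self.condition(courier_travel_minutes) else "hide_item"


def adjust_condition(strategy, negative_feedback_count, feedback_threshold, shorten_factor=0.5):
    """Shorten the spoilage time once negative feedback from historical orders
    reaches the threshold (threshold and factor are hypothetical)."""
    if negative_feedback_count >= feedback_threshold:
        return DeliveryZoneStrategy(strategy.spoilage_minutes * shorten_factor)
    return strategy


zone = DeliveryZoneStrategy(spoilage_minutes=30)
print(zone.action(28))  # offer_item
tightened = adjust_condition(zone, negative_feedback_count=12, feedback_threshold=10)
print(tightened.spoilage_minutes, tightened.action(28))  # 15.0 hide_item
```

Under this reading, changing the condition (a shorter spoilage time) changes which destinations the action applies to, without changing the action itself.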

For claims 2 and 13, Reiss and Forster further disclose wherein the determining one or more characteristics of a plurality of complaints comprises, for each of the complaints, using Natural Language Processing (NLP) to: extract one or more first features from a content of the each complaint (Reiss, parsing and text recognition are applied to comments associated with orders, [0064] [0076]) (Forster, features include bag-of-words, 5.1 Dataset and 5.2 Learning and Testing Procedure); extract one or more second features from the order corresponding to the each complaint (Reiss, parsing and text recognition are applied to comments associated with orders, [0064] [0076]) (Forster, bi-grams are extracted, 5.1 Dataset and 5.2 Learning and Testing Procedure); and extract one or more third features from a user profile associated with the order corresponding to the each complaint (Reiss, feedback is a part of the buyer’s historic information, wherein parsing and text recognition are applied to comments associated with feedback, [0065] [0076]) (Forster, lemmas with the highest tf-idf and log-likelihood values were calculated and sorted by their POS, 5.1 Dataset and 5.2 Learning and Testing Procedure), wherein the one or more characteristics comprise the first, second, and third features.

	For claims 5 and 16, Reiss further discloses, wherein the selecting a category from the categories based on a number of complaints in the selected category comprises: selecting the category if the number of complaints in the category is greater than a threshold (Reiss, [0076]).
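A minimal sketch of the count-versus-threshold selection recited in claims 5 and 16, assuming each complaint has already been assigned a category label; the category names and the threshold value are hypothetical.

```python
from collections import Counter


def select_categories(complaint_categories, threshold):
    """Return the categories whose complaint count is greater than the threshold."""
    counts = Counter(complaint_categories)
    return sorted(category for category, n in counts.items() if n > threshold)


complaints = ["late_delivery", "spoiled_item", "spoiled_item", "spoiled_item", "rude_courier"]
print(select_categories(complaints, threshold=2))  # ['spoiled_item']
```

The strict comparison (`n > threshold`) follows the claim language “greater than a threshold”.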
 	For claim 8, Reiss further discloses, wherein: the one or more conditions are based on one or more of the following parameters: time of the order, pickup location and destination (Reiss, an increase in spoilage time changes or affects the destinations/delivery zones, [0025] [0075] [0076]).
	For claims 11 and 19, Reiss further discloses, wherein the optimizing a candidate strategy at least lowers a false positive rate (Reiss, a buyer is prevented from ordering an item, [0025] [0075] [0076]).

For claim 20, Reiss discloses a method for automatically adjusting strategies (Abstract), configured with instructions executable by one or more processors to cause the one or more processors to perform operations (Fig. 8, 804; [0115] [0117]) comprising: receiving a plurality of feedbacks (feedback about an item from a buyer and/or courier), wherein each of the feedbacks corresponds to an order ([0025] [0046] [0048] [0050 – 0052] [0054] [0064] [0074]); determining and selecting a category from a plurality of categories of complaints based on a number of feedbacks in the selected category, wherein each of the complaints corresponds to an order (“In some cases, the computational model 532 may apply parsing and text recognition for determining, from particular orders, the item to which the negative feedback applies … consequently, when a threshold level of negative feedback has been received for a category of items from a merchant, the predicted spoilage times for the item or category of items may be adjusted to be a shorter length of time”, [0076]); from a group of strategies each associated with one or more conditions and one or more actions (e.g. variable delivery zones, wherein items available for delivery are based on spoilage time and courier travel time conditions to ensure the quality of delivered items, and/or affinity-based delivery, wherein couriers are selected for merchants based on mutual affinity conditions to improve efficiency, [0014] [0015] [0018 – 0020] [0025] [0026] [0075] [0076] [0085] [0086] [0088]), identifying a candidate strategy resulting in feedbacks of the selected category (“Accordingly, when feedback from a plurality of buyers and/or couriers indicates that a particular item has been delivered in a spoiled or otherwise unsatisfactory condition, and the unsatisfactory condition may be attributed at least in part to the amount of time expended for delivery of the item, the spoilage time for the item may be shortened or otherwise revised”, wherein the candidate strategy is variable delivery zones, in which items available for delivery are based on spoilage time and courier travel time to ensure the quality of delivered items, [0019] [0025] [0026] [0075] [0076] [0086]), wherein the one or more actions are executed in response to the one or more conditions being satisfied (an item is allowed to be bought by a buyer based on whether the spoilage time is less than a courier travel time to the buyer’s delivery location, and/or a courier is selected based on positive affinity between a merchant and the courier, [0019] [0025] [0026] [0075] [0086]); and in response to the feedbacks of the selected category comprising complaints, optimizing the candidate strategy by changing the one or more conditions of the candidate strategy based on a plurality of historical orders (a predicted spoilage time for a category of items for a particular merchant is adjusted to be shorter based on the threshold level of negative feedback from a plurality of historical orders, so that a buyer at a destination is prevented from ordering the category of items based on the spoilage time being less than the available courier travel times to the destination, [0026] [0027] [0029 – 0032] [0040] [0043] [0044] [0049 – 0054] [0062] [0063] [0064] [0075] [0076] [0078] [0080]). Yet, Reiss fails to teach the following: the receiving a plurality of complaints and the determining and selecting a category from a plurality of categories of the complaints further comprise determining one or more characteristics of a plurality of complaints, wherein each of the complaints corresponds to an order, and classifying the plurality of complaints into a plurality of categories based on the one or more characteristics by using a trained classifier; and optimizing the candidate strategy using a reinforcement learning model.
However, Forster discloses a method for classifying complaints (Abstract), wherein features (characteristics) of a plurality of complaints are determined (5.1 Dataset and 5.2 Learning and Testing Procedure), and the plurality of complaints are classified into a plurality of categories based on the one or more characteristics by using a trained classifier (MaxEnt, 3.2 Machine Learning) (2. Problem Definition and 5.2 Learning and Testing Procedure).
Additionally, Hu discloses a system and method for order allocation (Abstract), wherein reinforcement (reinforced) learning is used to continuously optimize algorithms used to process information in an order allocation system (e.g. food ordering system) ([0081 – 0087]).
Therefore, it would have been obvious to one of ordinary skill in the art at the time of applicant’s filing to improve the invention disclosed by Reiss in the same way that Forster’s and Hu’s inventions have been improved, to achieve the following predictable results for the purpose of enabling people to participate as couriers in a crowdsourced service economy and make decisions that can assist in courier retention, while also ensuring customer satisfaction with the service (Reiss, [0001] [0014]): the receiving a plurality of complaints and the determining and selecting a category from a plurality of categories of the complaints further comprise determining one or more characteristics (features) of the plurality of complaints, wherein each of the complaints corresponds to the order, and classifying the complaints into a plurality of categories based on the one or more characteristics by using a trained classifier; and the algorithm(s) used to process information and perform actions associated with the candidate strategy (Reiss, [0078 – 0080]) are further optimized using reinforcement learning.

Claims 3 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Reiss et al. (US 2021/0090017) (“Reiss”) in view of Forster et al. (“A Cognitive Computing Approach for Classification of Complaints in the Insurance Industry”) (“Forster”), and further in view of Hu et al. (US 2018/0012153) (“Hu”) and further in view of Wang et al. (“Inductive and Example-Based Learning for Text Classification”) (“Wang”).
For claims 3 and 14, the combination of Reiss, Forster and Hu fails to teach that the classifier is trained using semi-supervised machine learning based on a first plurality of historical complaints with corresponding categorical labels and a second plurality of historical complaints that are not labeled.
However, Wang discloses a method for text classification (Abstract), wherein a classifier is trained using a semi-supervised (graph-based) algorithm based on labeled and unlabeled data (2.3. Example-Based Learning, pgs. 1611 and 1612) or a supervised learning algorithm (2.2. MaxEnt Model Parameterization with TF*IDF Weighted Vector Space Model, pg. 1611).
Therefore, it would have been obvious to one of ordinary skill in the art at the time of applicant’s filing to substitute the classifier trained by supervised learning disclosed by the combination of Reiss, Forster and Hu with Wang’s classifier, which is trained using semi-supervised machine learning, to achieve the predictable results of further classifying text including the complaints, for the purpose of efficiently enabling people to participate as couriers in a crowdsourced service economy and make decisions that can assist in courier retention, while also ensuring customer satisfaction with the service (Reiss, [0001] [0014]), wherein a component of providing this service involves using classifiers that can be trained using a small amount of labeled data.

Claims 4 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Reiss et al. (US 2021/0090017) (“Reiss”) in view of Forster et al. (“A Cognitive Computing Approach for Classification of Complaints in the Insurance Industry”) (“Forster”), and further in view of Hu et al. (US 2018/0012153) (“Hu”) and further in view of Arivoli et al. (“Document Classification Using Machine Learning Algorithms - A Review”) (“Arivoli”).
For claims 4 and 15, the combination of Reiss, Forster and Hu fails to teach wherein the classifier comprises an unsupervised machine learning model trained to group the complaints based on vector representations of the one or more characteristics of the plurality of complaints.
However, Arivoli discloses a method for document classification (Abstract), wherein a classifier comprises an unsupervised machine learning model trained to group documents based on vector representations of the one or more characteristics of the plurality of documents (2. Document Representation, 3.13. RBM and 3.17 DBN, pgs. 48, 49 and 51) or a supervised learning algorithm (3.1 Maximum Entropy Classifier, pg. 49).
Therefore, it would have been obvious to one of ordinary skill in the art at the time of applicant’s filing to substitute the classifier trained by supervised learning disclosed by the combination of Reiss, Forster and Hu with Arivoli’s classifier, which comprises an unsupervised machine learning model, to achieve the predictable results of further classifying text including the complaints, for the purpose of efficiently enabling people to participate as couriers in a crowdsourced service economy and make decisions that can assist in courier retention, while also ensuring customer satisfaction with the service (Reiss, [0001] [0014]), wherein a component of providing this service involves using classifiers that can be trained without using labeled data.

Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Reiss et al. (US 2021/0090017) (“Reiss”) in view of Forster et al. (“A Cognitive Computing Approach for Classification of Complaints in the Insurance Industry”) (“Forster”), and further in view of Hu et al. (US 2018/0012153) (“Hu”) and further in view of Sutcliffe (US 2006/0069620).
For claim 6, the combination of Reiss, Forster and Hu fails to teach wherein the selecting a category from the plurality of categories based on a number of complaints in the selected category comprises: selecting the category if an increase of the number of complaints in the selected category during a period of time is greater than a threshold.
However, Sutcliffe discloses a food brokering method (Abstract), wherein a remedial action is performed when the number of complaints received for a restaurant is above a predetermined threshold for a predetermined period of time ([0107]).
Therefore, it would have been obvious to one of ordinary skill in the art at the time of applicant’s filing to improve the invention disclosed by the combination of Reiss, Forster and Hu in the same way that Sutcliffe’s invention has been improved, to achieve the predictable results of further providing a remedial action, e.g. selecting the category if an increase of the number of complaints in the selected category during a period of time is greater than a threshold, for the purpose of enabling people to participate as couriers in a crowdsourced service economy and make decisions that can assist in courier retention, while also ensuring customer satisfaction with the service (Reiss, [0001] [0014]).
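Claim 6 differs from claims 5 and 16 in that the threshold is applied to an increase in the complaint count during a period of time rather than to the count itself. A minimal sketch, assuming the increase is measured as the difference between the count in the most recent window and the count in the immediately preceding window of equal length; the function name, window length, timestamps and threshold are hypothetical.

```python
from datetime import datetime, timedelta


def increase_exceeds_threshold(timestamps, now, window, threshold):
    """True if complaints in (now - window, now] exceed those in the preceding
    window of equal length by more than `threshold` (hypothetical definition)."""
    current = sum(1 for t in timestamps if now - window < t <= now)
    previous = sum(1 for t in timestamps if now - 2 * window < t <= now - window)
    return current - previous > threshold


now = datetime(2020, 1, 8)
day = timedelta(days=1)
# One complaint in the earlier window, five in the most recent window.
complaint_times = [datetime(2020, 1, 6, 12)] + [datetime(2020, 1, 7, h) for h in range(1, 6)]
print(increase_exceeds_threshold(complaint_times, now, day, threshold=3))  # True
```

By contrast, Sutcliffe as cited compares the number of complaints within a predetermined period against a threshold, which this sketch would reduce to testing `current > threshold`.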

Claims 9 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Reiss et al. (US 2021/0090017) (“Reiss”) in view of Forster et al. (“A Cognitive Computing Approach for Classification of Complaints in the Insurance Industry”) (“Forster”), and further in view of Hu et al. (US 2018/0012153) (“Hu”), and further in view of Ren et al. (“Reinforcement Learning for On-Demand Logistics”) (“Ren”) and further in view of Tesauro et al. (US 9,047,423) (“Tesauro”).
For claims 9 and 17, the combination of Reiss, Forster and Hu fails to teach wherein the optimizing a candidate strategy using a reinforcement learning model comprises: building one or more search graphs using Monte Carlo Graph Search (MCGS) algorithm based on a plurality of historical orders.
However, Ren discloses a method for performing reinforcement learning for on-demand logistics (Abstract), wherein an optimal policy is obtained using Monte Carlo methods (“The goal of reinforcement learning is to find the optimal policy. This is not trivial since unlike Markov Decision Processes, the rewards and transition probabilities between states are unknown, as seen in Figure 3. For this reason, there are many techniques to either obtain this information (model-based) or obtain the optimal policy directly (model-free), such as Monte Carlo, Bootstrap, and SARSA.”). Furthermore, Ren discloses that features of a delivery system embody the current status of an environment (“Reinforcement learning is one of the most popular and powerful artificial intelligence algorithms today. Instances of reinforcement learning have reached mainstream news, such as AlphaGo, the reinforcement learned computer program that defeated the world’s top Go player. In short, the goal of reinforcement learning is to learn the best action given a state of the environment, in order to maximize the overall rewards. Here are the fundamental concepts in reinforcement learning, summarized in Figure 2 … Now we will discuss how we applied reinforcement learning to the DoorDash assignment problem. To formulate the assignment problem in a way that’s suitable for reinforcement learning, we made the following definitions. State: The outstanding deliveries and working Dashers, since they represent the current status of the world from an assignment perspective. Note that this means that the state space is practically infinite, since deliveries and Dashers individually can have many different characteristics (pick up location/time, drop off location/time, etc.) and there can be many different combinations of deliveries and Dashers.”).
Additionally, Tesauro discloses a Monte Carlo planning method (Abstract), wherein search graphs comprising states and actions are built using a Monte Carlo Graph Search algorithm (Fig. 2; column 4 lines 8 – 25, column 7 lines 15 – 54).
Therefore, it would have been obvious to one of ordinary skill in the art at the time of applicant’s filing to improve the invention disclosed by the combination of Reiss, Forster and Hu in the same way that Ren’s and Tesauro’s inventions have been improved, to achieve the following predictable results for the purpose of efficiently enabling people to participate as couriers in a crowdsourced service economy and make decisions that can assist in courier retention, while also ensuring customer satisfaction with the service (Reiss, [0001] [0014]): the optimizing a candidate strategy using a reinforcement learning model comprises building one or more search graphs using a Monte Carlo Graph Search (MCGS) algorithm based on a plurality of delivery system features, including historical orders.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SONIA L GAY whose telephone number is (571)270-1951. The examiner can normally be reached Monday-Friday 9-5 ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel Washburn can be reached on 571-272-5551. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/SONIA L GAY/Primary Examiner, Art Unit 2657