DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Status of the Claims
The amendment received on 08 March 2022 has been acknowledged and entered.
Claims 1, 10, and 19 have been amended.
No new claims have been added.
Claims 1-20 are currently pending.

Drawings
The drawings were received on 08 March 2022.  These drawings are acceptable.

Response to Amendments and Arguments
Applicant’s arguments, see REMARKS, page 12, filed 08 March 2022, with respect to claims 1-20 have been fully considered and are persuasive.  The rejection of claims 1-20 under 35 U.S.C. 101 has been withdrawn. 

Terminal Disclaimer
The terminal disclaimer filed on 08 March 2022 disclaiming the terminal portion of any patent granted on this application which would extend beyond the expiration date of US Patent Application No. 16/237,103 has been reviewed and is accepted.  The terminal disclaimer has been recorded.

EXAMINER’S AMENDMENT
An examiner’s amendment to the record appears below. Should the changes and/or additions be unacceptable to applicant, an amendment may be filed as provided by 37 CFR 1.312. To ensure consideration of such an amendment, it MUST be submitted no later than the payment of the issue fee.
					
In the Claims
 	1. (Currently Amended) A method for ride order dispatching and vehicle repositioning, comprising: 
 	obtaining information comprising a location of a vehicle, current orders, and a current time; 
 	inputting the obtained information to a trained model; and 
 	determining action information for the vehicle based on an output of the trained model, the action information comprising: re-positioning the vehicle or accepting a ride order, wherein:  
 	the model comprises a multi-driver deep-Q network (MD-DQN) and is configured with model instructions for performing: 
receiving information of drivers and information of orders as inputs; 
obtaining a global state based on the information of drivers, the information of orders, and a global time, each state transition of the global state being from a driver of the drivers becoming available to another driver of the drivers becoming available; 
querying a plurality of driver-order pairs and driver-reposition pairs based at least on the obtained global state to determine a Q-value of the MD-DQN for the driver; 
determining the action information as the output based at least on the determined Q-value to optimize a total return for the drivers; and 
sending the determined action information to the vehicle for execution, 
wherein the method further comprises: 
training the MD-DQN based on training data comprising historical driver information, historical order information, and expected action values, wherein the training comprises: 
predicting a Q-value for each entry in the historical data; 
Application No.: 16/237,175 Attorney Docket No.: 55KS-288773 Client Ref. No.: D19F00116USL-US-US1 
determining an error based on the Q-value and an expected action value using a loss function [[and]]; and 
propagating the error back through the MD-DQN and updating weights of a plurality of layers of the MD-DQN according to the propagated error.
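
For illustration only, the querying and determining steps recited in claim 1 can be sketched as follows. This is a minimal sketch under stated assumptions, not the applicant's implementation: the names Action, select_action, and toy_q are hypothetical, the global state is reduced to a short list of numbers, and the Q-function is a hand-written stand-in for a trained MD-DQN.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Action:
    """One candidate pairing for the driver who just became available."""
    kind: str    # "order" (driver-order pair) or "reposition" (driver-reposition pair)
    target: str  # hypothetical order id or destination zone id


def select_action(
    global_state: Sequence[float],
    candidates: List[Action],
    q_function: Callable[[Sequence[float], Action], float],
) -> Action:
    """Query every candidate pair and return the one with the highest Q-value."""
    if not candidates:
        raise ValueError("no candidate actions for the available driver")
    return max(candidates, key=lambda action: q_function(global_state, action))


def toy_q(state: Sequence[float], action: Action) -> float:
    """Stand-in Q-function (assumption): a trained network would go here."""
    base = 5.0 if action.kind == "order" else 1.0
    bonus = 0.5 if action.target == "zone-B" else 0.0
    return base + bonus + 0.01 * state[0]


# Hypothetical global state: [idle drivers, open orders, normalized global time].
state = [3.0, 2.0, 0.25]
candidates = [
    Action("order", "order-17"),
    Action("reposition", "zone-A"),
    Action("reposition", "zone-B"),
]
best = select_action(state, candidates, toy_q)
print(best)
```

In this sketch the selected action would then be sent to the vehicle; the transport used for that step is outside the scope of the example.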

10. (Currently Amended) A system for ride order dispatching and vehicle repositioning, comprising a processor and a non-transitory computer-readable storage medium storing instructions that, when executed by the processor, cause the system to perform operations comprising: 
obtaining information comprising a location of a vehicle, current orders, and a current time; 
inputting the obtained information to a trained model; and 
determining action information for the vehicle based on an output of the trained model, the action information comprising: 
re-positioning the vehicle or accepting a ride order, wherein: 
the model comprises a multi-driver deep-Q network (MD-DQN) and is configured with model instructions for performing: receiving information of drivers and information of orders as inputs; 
obtaining a global state based on the information of drivers, the information of orders, and a global time, each state transition of the global state being from a driver of the drivers becoming available to another driver of the drivers becoming available; 
querying a plurality of driver-order pairs and driver-reposition pairs based at least on the obtained global state to determine a Q-value of the MD-DQN for the driver; 
determining the action information as the output based at least on the determined Q-value to optimize a total return for the drivers; and 
sending the determined action information to the vehicle for execution, 
wherein the method further comprises: 
training the MD-DQN based on training data comprising historical driver information, historical order information, and expected action values, wherein the training comprises: 
predicting a Q-value for each entry in the historical data; 
determining an error based on the Q-value and an expected action value using a loss function [[and]]; and propagating the error back through the MD-DQN and updating weights of a plurality of layers of the MD-DQN according to the propagated error.

 	19. (Currently Amended) A non-transitory computer-readable storage medium storing instructions that, when executed by a processor, cause the processor to perform operations comprising: 
 	obtaining information comprising a location of a vehicle, current orders, and a current time; 
 	inputting the obtained information to a trained model; and 
 	determining action information for the vehicle based on an output of the trained model, the action information comprising: re-positioning the vehicle or accepting a ride order, wherein: 
 	the model comprises a multi-driver deep-Q network (MD-DQN) and is configured with model instructions for performing: 
 	receiving information of drivers and information of orders as inputs; 
 	obtaining a global state based on the information of drivers, the information of orders, and a global time, each state transition of the global state being from a driver of the drivers becoming available to another driver of the drivers becoming available; 
 	querying a plurality of driver-order pairs and driver-reposition pairs based at least on the obtained global state to determine a Q-value of the MD-DQN for the driver; 
 	determining the action information as the output based at least on the determined Q-value to optimize a total return for the drivers; and 
 	sending the determined action information to the vehicle for execution, wherein the method further comprises: 
 	training the MD-DQN based on training data comprising historical driver information, historical order information, and expected action values, wherein the training comprises: 
 	predicting a Q-value for each entry in the historical data; 
 	determining an error based on the Q-value and an expected action value using a loss function [[and]]; and propagating the error back through the MD-DQN and updating weights of a plurality of layers of the MD-DQN according to the propagated error.
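
For illustration only, the training steps recited in the independent claims (predicting a Q-value, determining an error with a loss function, and propagating the error back to update the weights of a plurality of layers) can be sketched as follows. This is a minimal sketch under stated assumptions, not the applicant's implementation: the two-layer network, the squared-error loss, the learning rate, and the toy data are all hypothetical choices, and the function names are invented for the example.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple


def init_network(n_in: int, n_hidden: int, seed: int = 0) -> Dict[str, list]:
    """Create a small two-layer network with random weights (assumed architecture)."""
    rng = random.Random(seed)
    return {
        "w1": [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "w2": [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)],
        "b2": [0.0],
    }


def forward(net: Dict[str, list], x: Sequence[float]) -> Tuple[List[float], float]:
    """Predict a Q-value; also return hidden activations for backpropagation."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(net["w1"], net["b1"])
    ]
    q_value = sum(w * h for w, h in zip(net["w2"], hidden)) + net["b2"][0]
    return hidden, q_value


def train_step(net: Dict[str, list], x: Sequence[float], expected: float,
               lr: float = 0.05) -> float:
    """One update: predict, measure squared error, backpropagate, adjust both layers."""
    hidden, q_value = forward(net, x)
    error = q_value - expected          # derivative of 0.5 * error**2 w.r.t. q_value
    loss = 0.5 * error * error
    for j, h in enumerate(hidden):
        # Gradient reaching hidden unit j, computed before w2[j] is changed.
        grad_h = error * net["w2"][j] * (1.0 - h * h)
        net["w2"][j] -= lr * error * h
        for i, xi in enumerate(x):
            net["w1"][j][i] -= lr * grad_h * xi
        net["b1"][j] -= lr * grad_h
    net["b2"][0] -= lr * error
    return loss


def train(net: Dict[str, list], data: List[Tuple[List[float], float]],
          epochs: int) -> List[float]:
    """Run several passes over the historical entries; return the loss per epoch."""
    return [sum(train_step(net, x, y) for x, y in data) for _ in range(epochs)]


# Hypothetical historical entries: (state-action features, expected action value).
data = [([0.0, 0.0], 0.0), ([1.0, 0.0], 1.0), ([0.0, 1.0], 0.5), ([1.0, 1.0], 1.5)]
net = init_network(n_in=2, n_hidden=4)
losses = train(net, data, epochs=300)
print(f"first epoch loss: {losses[0]:.4f}, last epoch loss: {losses[-1]:.4f}")
```

The sketch updates one entry at a time; batching, target networks, and replay buffers commonly used with deep Q-networks are omitted because the claims do not recite them.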

Allowable Subject Matter
Claims 1-20 are allowed.
The following is an examiner’s statement of reasons for allowance:
Independent claims 1, 10, and 19 apply, rely on, or use the judicial exception in a manner that imposes a meaningful limit on the judicial exception, such that the claims are more than a drafting effort designed to monopolize the judicial exception.

Further, as per independent claims 1, 10, and 19, the best prior art:
 	1) Tulabandhula et al. (US PG Pub. 2018/0046961 A1) discloses a method and system for dispatching of vehicles in a public transportation network, which includes a processing device which may apply a Markov Decision Process (MDP) model to determine a score for each of multiple decision rules, in which each score represents a number of passengers waiting at the stop at the end of a time interval, and use the scores to identify a number of waiting passengers at which a reserve vehicle should be dispatched. 
	However, Tulabandhula et al. does not disclose or fairly teach:
the model comprises a multi-driver deep-Q network (MD-DQN) and is configured with model instructions for performing: 
 	receiving information of drivers and information of orders as inputs; 
obtaining a global state based on the information of drivers, the information of orders, and a global time, each state transition of the global state being from a driver of the drivers becoming available to another driver of the drivers becoming available; 
querying a plurality of driver-order pairs and driver-reposition pairs based at least on the obtained global state to determine a Q-value of the MD-DQN for the driver; and 
determining the action information as the output based at least on the determined Q-value to optimize a total return for the drivers; and
 	wherein the method further comprises: 
 	training the MD-DQN based on training data comprising historical driver information, historical order information, and expected action values, wherein the training comprises: 
predicting a Q-value for each entry in the historical data; determining an error based on the Q-value and an expected action value using a loss function; and 
propagating the error back through the MD-DQN and updating weights of a plurality of layers of the MD-DQN according to the propagated error.

As per independent claims 1, 10, and 19, the best NPL prior art:
 	1) Lin et al., “Efficient Large-Scale Fleet Management via Multi-Agent Deep Reinforcement Learning”; 23 August 2018; Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD'18), pages 1774-1783, discloses balancing the supply of drivers and the demand of rider requests by directing drivers to locations of high demand.
 	2) Xu et al., “Large-Scale Order Dispatch in On-Demand Ride-Hailing Platforms: A Learning and Planning Approach”; 23 August 2018; Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD'18), pages 905-913, discloses using a dispatch algorithm to optimize order dispatch over a long period of time by fulfilling current passenger demand and optimizing the anticipated future gain by relying on modeling of spatiotemporal passenger demand and taxi mobility patterns.
 	However, neither Lin et al. nor Xu et al. discloses or fairly teaches:
 	the model comprises a multi-driver deep-Q network (MD-DQN) and is configured with model instructions for performing: 
 	receiving information of drivers and information of orders as inputs; 
obtaining a global state based on the information of drivers, the information of orders, and a global time, each state transition of the global state being from a driver of the drivers becoming available to another driver of the drivers becoming available; 
querying a plurality of driver-order pairs and driver-reposition pairs based at least on the obtained global state to determine a Q-value of the MD-DQN for the driver;  
determining the action information as the output based at least on the determined Q-value to optimize a total return for the drivers; and
	wherein the method further comprises: 
 	training the MD-DQN based on training data comprising historical driver information, historical order information, and expected action values, wherein the training comprises: 
predicting a Q-value for each entry in the historical data; determining an error based on the Q-value and an expected action value using a loss function; and 
propagating the error back through the MD-DQN and updating weights of a plurality of layers of the MD-DQN according to the propagated error.

As per independent claims 1, 10, and 19, the best foreign prior art:
 	1) Zu et al. (CN 107832882 A) discloses a method for recommending a taxi passenger-searching strategy based on a Markov decision process, the method providing a taxi platform with a searching strategy that obtains an optimal solution for finding passengers and utilizing Q-learning to solve the optimization problem.
 	However, Zu et al. does not disclose or fairly teach:
 	the model comprises a multi-driver deep-Q network (MD-DQN) and is configured with model instructions for performing: 
 	receiving information of drivers and information of orders as inputs; 
obtaining a global state based on the information of drivers, the information of orders, and a global time, each state transition of the global state being from a driver of the drivers becoming available to another driver of the drivers becoming available; 
querying a plurality of driver-order pairs and driver-reposition pairs based at least on the obtained global state to determine a Q-value of the MD-DQN for the driver; and 
determining the action information as the output based at least on the determined Q-value to optimize a total return for the drivers; and
 	wherein the method further comprises: 
 	training the MD-DQN based on training data comprising historical driver information, historical order information, and expected action values, wherein the training comprises: 
predicting a Q-value for each entry in the historical data; determining an error based on the Q-value and an expected action value using a loss function; and 
propagating the error back through the MD-DQN and updating weights of a plurality of layers of the MD-DQN according to the propagated error.

Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
	1)  Ferguson et al. (US PG Pub. 2019/0025820 A1) discloses autonomous vehicle repositioning by providing a platform for distributing and navigating an autonomous or semi-autonomous fleet throughout a plurality of pathways, and wherein the platform may employ demand distribution prediction algorithms, and interim repositioning algorithms to distribute the autonomous or semi-autonomous fleet for performing orders and tasks.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to FREDA A. NELSON whose telephone number is (571)272-7076. The examiner can normally be reached Monday-Friday, 10:00am - 6:30pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Shannon Campbell can be reached on 571-272-5587. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





Please address mail to be delivered by the United States Postal Service (USPS) as follows: 
Commissioner of Patents and Trademarks
Washington, D.C. 20231

Or faxed to: (571) 273-7076 [Informal/Draft Communications, labeled "PROPOSED" or "DRAFT"]
 	Hand delivered responses should be brought to the Customer Service Window, Randolph Building, 401 Dulany Street, Alexandria, VA 22314
/F.A.N/Examiner, Art Unit 3628
/SHANNON S CAMPBELL/Supervisory Patent Examiner, Art Unit 3628