DETAILED ACTION
The following NON-FINAL Office Action is in response to application 16/639889 filed on 02/18/2020.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

Status of Claims
Claims 1-20 are currently pending, and Claims 1-3, 9-11, and 17-18 are rejected as follows.

Priority
The Examiner notes that the Applicant has claimed priority from provisional application 62/555879, filed on 09/08/2017, and international application PCT/US2018/049493, filed on 09/05/2018.

Information Disclosure Statement
The information disclosure statements (IDSs) submitted on 02/26/2020 and 12/28/2020 comply with the provisions of 37 CFR 1.97, 1.98, and MPEP 609 and have been considered by the Examiner.

	Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1, 3, 9, 11, 17, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Haparnas et al. (US2017/0132540) in view of Petroff (US2009/0327011), and further in view of non-patent literature Finnman et al. (published May 13, 2016, “Deep reinforcement learning compared with Q-table learning applied to backgammon,” https://www.kth.se/social/files/58865ec8f27654607fb6e9a4/PFinnman_MWinberg_dkand16.pdf).

As per independent Claim 1, Claim 9, and Claim 17,
Haparnas teaches a ride order dispatching system, comprising: a processor (para. 26-31); and a non-transitory computer-readable storage medium (para. 122) storing instructions that, when executed by the processor, cause the processor to perform a method, the method comprising: / A computer-implemented method for ride order dispatching, comprising: / A non-transitory computer-readable storage medium storing instructions that, when executed by a processor, cause the processor to perform a ride order dispatching method, the method comprising:
obtaining, from a computing device, a current location of a vehicle (see para. 21 where the current location of the mobile device of a driver is transmitted to the system; para. 61 where driver availability data identifies the current location of the driver, whether the driver is available to transport, a destination location of a current trip, etc.)
using the current location of the vehicle and a time to obtain action information for the vehicle, the action information comprising: staying at the current location of the vehicle, re-positioning the vehicle, or accepting a ride order; and transmitting the action information to the computing device to cause the vehicle to stay at the current location, re-position to another location, or accept the ride order by proceeding to a pick-up location of the ride order (see para. 84-88, where, based on the current availability of drivers and the event time, the system may navigate drivers to the location of an event before the event ends; e.g., in para. 86, if a driver is 15 minutes away and the event ends at 9 PM, the system begins navigation at 8:45 PM; para. 87, where the system tracks the number of drivers en route to the event location and may request additional drivers to meet demand; para. 88, where the system navigates drivers to the event location in anticipation of receiving passenger requests and may distribute the drivers among locations around the event location; see Figure 4 and para. 101-104, where drivers may wait at event locations and may be selected based on their proximity to the event location; and Figure 5 and para. 105-107, where the event end time changes and the actions provided to the drivers change)

Haparnas does not teach inputting the vehicle schedule to a trained model to obtain action information for the vehicle.

Petroff teaches:
inputting the vehicle schedule to a trained model to obtain action information for the vehicle (see Petroff para. 18, where the vehicle dispatch system uses linear programming and reinforcement learning; para. 25-32, where reinforcement learning is used to create a schedule for a vehicle, such as the “number of loads that should be picked up at each source location and dropped off at each destination location”, and the schedule is input into the reinforcement learning algorithm as state S to obtain the appropriate action; see also para. 35-37 for the Q-table, where Q(s,a) is the value of taking action a in state s and π(s) is the action that should be taken in state s)
transmitting the action information to the computing device (see Petroff para. 32 where the vehicle is dispatched to take the appropriate action towards meeting the schedule and achieving the goal) 

The Examiner clarifies that Haparnas teaches using the current location of a vehicle and the time of an event as a schedule for the dispatch actions of the vehicle, while Petroff uses a vehicle schedule as an input to a trained model to obtain action information.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the Haparnas invention with Petroff's teaching of inputting the vehicle schedule to a trained model to obtain action information for the vehicle, with the motivation of increasing the efficiency of the system, as stated in Petroff para. 4: “Vehicles and events in a work area are monitored so that vehicles can be dispatches when an event occurs that affects efficiency. For example, the object may be to maximize the amount of material hauled while minimizing operational costs. In another example, the object may be to maximize the number of deliveries over a period of time”; and para. 36: “the goal of reinforcement learning is to maximize the reward R to both identify the appropriate (e.g., best) action for each state and designate that action as the policy for that state”.

Haparnas/Petroff do not teach a trained neural network model.

Finnman teaches:
a trained neural network model (see Finnman page 11, deep reinforcement learning, where the Q-table is replaced with a deep learning network)

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the Haparnas/Petroff invention with Finnman's trained neural network model, with the motivation of increasing the efficiency of the model. As Finnman explains on page 11, replacing the Q-table with a deep learning network increases the learning capacity of the network: “increasing the number of hidden layers and nodes therein increases the learning capacity of the network”.

As per dependent Claim 3, Claim 11, and Claim 18,
Haparnas/Petroff/Finnman teaches the system of claim 1, the method of claim 9, and the storage medium of claim 17. 
Haparnas and Petroff do not teach wherein the neural network model comprises: an input layer comprising one or more action inputs and one or more state inputs; one or more hidden layers; and an output layer comprising a state-action value output. However, Petroff does teach one or more action inputs, one or more state inputs, and a state-action value output (para. 35-37).
Finnman teaches:
wherein the neural network model comprises: an input layer comprising one or more action inputs and one or more state inputs; one or more hidden layers; and an output layer comprising a state-action value output (see page 15-18 where the neural network has two hidden layers, 

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the Haparnas/Petroff invention with Finnman's teaching wherein the neural network model comprises: an input layer comprising one or more action inputs and one or more state inputs; one or more hidden layers; and an output layer comprising a state-action value output, with the motivation of increasing the efficiency of the model. As Finnman explains on page 11, replacing the Q-table with a deep learning network increases the learning capacity of the network: “increasing the number of hidden layers and nodes therein increases the learning capacity of the network”.

Claims 2 and 10 are rejected under 35 U.S.C. 103 as being unpatentable over Haparnas et al. (US2017/0132540) in view of Petroff (US2009/0327011) and non-patent literature Finnman et al. (published May 13, 2016, “Deep reinforcement learning compared with Q-table learning applied to backgammon”) as applied to claim 1 above, and further in view of non-patent literature Hasselt et al. (published 2016, “Deep reinforcement learning with Double Q-learning,” https://ojs.aaai.org/index.php/AAAI/article/view/10295).

As per dependent Claim 2 and Claim 10,
Haparnas/Petroff/Finnman teaches the system of claim 1 and the method of claim 9.
Haparnas does not teach, but Petroff teaches:
the model comprises a reinforcement learning algorithm (see Petroff para. 18, where the vehicle dispatch system uses linear programming and reinforcement learning, and para. 25-32)

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the Haparnas invention with Petroff's teaching that the model comprises a reinforcement learning algorithm, with the motivation of increasing the efficiency of the system, as stated in Petroff para. 4: “Vehicles and events in a work area are monitored so that vehicles can be dispatches when an event occurs that affects efficiency. For example, the object may be to maximize the amount of material hauled while minimizing operational costs. In another example, the object may be to maximize the number of deliveries over a period of time”; and para. 36: “the goal of reinforcement learning is to maximize the reward R to both identify the appropriate (e.g., best) action for each state and designate that action as the policy for that state”.

Haparnas/Petroff do not teach, but Finnman teaches:
the neural network model comprises a deep neural network and a reinforcement learning algorithm (see pages 17-18)

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the Haparnas/Petroff invention with Finnman's teaching that the neural network model comprises a deep neural network and a reinforcement learning algorithm, with the motivation of increasing the efficiency of the model, as discussed above with respect to claim 1 (Finnman page 11).

Haparnas/Petroff/Finnman do not teach that the deep neural network comprises two deep-Q networks.

Hasselt teaches:
the deep neural network comprises two deep-Q networks (see pages 2094-2095, describing the Double Q-learning algorithm, where DQN combines Q-learning with a deep neural network and the authors construct a new algorithm called Double DQN)

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the Haparnas/Petroff/Finnman invention with Hasselt's teaching that the deep neural network comprises two deep-Q networks, with the motivation of reducing unrealistically high action values, which may negatively affect the quality of the resulting policy (page 2094), since “this algorithm not only yields more accurate value estimates…overestimations of DQN indeed lead to poorer policies and that it is beneficial to reduce them” (page 2094).

Closest Prior Art
The currently available prior art, alone or in combination, fails to disclose every element of claims 4-8, 12-16, and 19-20. The Examiner notes the limitations of Claims 4, 12, and 19: “wherein: the action comprises a destination and a drop-off time associated with performing a vehicle trip; the state comprises geo-coordinates of the vehicle and a pick-up time associated with the vehicle trip; the state-action value comprises a cumulative reward; and the input layers, the hidden layers, and the output layer are in a 
The following are the closest prior art:
Kivlovskiy et al. (US2018/0341881) teaches determining the optimal post-trip action for an autonomous vehicle, such as parking or re-positioning to a higher-demand area to service rides.
Barahona et al. (US2013/0159206) teaches dispatching instructions provided to vehicles to satisfy demand at multiple locations.
Gao et al. (US2017/0193360) teaches a deep Q neural network.
Palanisamy et al. (US10732639) and Yao (US2019/0220737) are not available as prior art; however, they teach using neural networks and reinforcement learning to refine the action policies of autonomous vehicles.
 
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Lisa Ma, whose telephone number is (571) 272-2495. The examiner can normally be reached Monday to Thursday, 7 AM - 5 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Shannon Campbell, can be reached at (571) 272-5587. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To 



/L.M./Examiner, Art Unit 3628
/GEORGE CHEN/Primary Examiner, Art Unit 3628